Skip to content

spec: add loop invariant for timsort#3135

Open
Yu-zh wants to merge 1 commit intomainfrom
Yu-zh/timsort-invariant
Open

spec: add loop invariant for timsort#3135
Yu-zh wants to merge 1 commit intomainfrom
Yu-zh/timsort-invariant

Conversation

@Yu-zh
Copy link
Collaborator

@Yu-zh Yu-zh commented Jan 9, 2026

For testing new nightly feature

@Yu-zh
Copy link
Collaborator Author

Yu-zh commented Jan 9, 2026

I'll analyze the loop invariants and reasoning in this file. Let me examine the key parts.

  TimSort Invariants (lines 27-28)

  The stated invariants are:
  1. runs[i - 1].len > runs[i].len for every i in 1..runs.len()
  2. runs[i - 2].len > runs[i - 1].len + runs[i].len for every i in 2..runs.len()

  Analysis of collapse function (lines 157-173):

  The collapse function correctly detects invariant violations:
  - runs[n - 2].len <= runs[n - 1].len → violates invariant 1
  - runs[n - 3].len <= runs[n - 2].len + runs[n - 1].len → violates invariant 2

  The merge choice logic (line 165) is correct:
  - If runs[n - 3].len < runs[n - 1].len, merge at n - 3 (merges runs n-3 and n-2)
  - Otherwise merge at n - 2 (merges runs n-2 and n-1)

  However, this implementation only checks the top 3 runs on the stack. This matches the original TimSort which had a https://bugs.openjdk.org/browse/JDK-8072909 - invariants can be violated deeper in the stack. In practice, this only manifests with arrays larger than ~2^30 elements, so it's not a practical issue.

  Bug Found: Missing Early Return (lines 37-39)

  if len <= max_insertion {
    MutArrayView::insertion_sort(arr)
  }  // <-- Missing return here!

  The comment on line 35 says "Short arrays get sorted in-place via insertion sort to avoid allocations", but after insertion sort, the function continues into the TimSort logic rather than returning. This is inefficient - the already-sorted array will be processed through find_streak, run detection, etc.

  Fix: Add return after line 38:
  if len <= max_insertion {
    MutArrayView::insertion_sort(arr)
    return
  }

  Other Invariants - Correct

  1. merge function (lines 77-100): Correctly implements a stable merge by copying the right half to a buffer and merging from high to low indices.
  2. find_streak (lines 106-129): Correctly distinguishes:
    - Strictly descending: arr[end] < arr[end - 1]
    - Non-descending: arr[end] >= arr[end - 1]
  3. QuickSort limit logic (lines 243-260): The introsort-style fallback to heapsort when limit == 0 correctly prevents O(n²) worst-case. The limit decreases when partitions are unbalanced (< len/8), which is standard.
  4. Heap sort sift_down (lines 419-434): Correctly maintains the max-heap property.

  Summary
  ┌─────────────────────────────────────┬─────────────────────────────────────────────┐
  │              Component              │                   Status                    │
  ├─────────────────────────────────────┼─────────────────────────────────────────────┤
  │ TimSort invariant reasoning         │ Correct (with known theoretical limitation) │
  ├─────────────────────────────────────┼─────────────────────────────────────────────┤
  │ Missing return after insertion_sort │ Bug - causes unnecessary work               │
  ├─────────────────────────────────────┼─────────────────────────────────────────────┤
  │ merge function                      │ Correct                                     │
  ├─────────────────────────────────────┼─────────────────────────────────────────────┤
  │ find_streak                         │ Correct                                     │
  ├─────────────────────────────────────┼─────────────────────────────────────────────┤
  │ QuickSort limit/fallback            │ Correct                                     │
  ├─────────────────────────────────────┼─────────────────────────────────────────────┤
  │ Heap sort                           │ Correct                                     │
  └─────────────────────────────────────┴─────────────────────────────────────────────┘
  Would you like me to fix the missing return statement?

✻ Brewed for 7m 57s

This is the analyze result when I changed it into an incorrect implementation. I'm not sure the invariants and reasoning helped, but the Claude's knowledge about Timsort definitely helped a lot here.

@mndrix
Copy link
Contributor

mndrix commented Jan 9, 2026

This new invariant feature looks exciting

@bobzhang bobzhang force-pushed the Yu-zh/timsort-invariant branch from fb5ef5d to 7d013b5 Compare January 15, 2026 06:49
@coveralls
Copy link
Collaborator

Pull Request Test Coverage Report for Build 2393

Details

  • 3 of 23 (13.04%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.2%) to 97.287%

Changes Missing Coverage Covered Lines Changed/Added Lines %
builtin/array_sort_impl.mbt 3 23 13.04%
Totals Coverage Status
Change from base Build 2391: -0.2%
Covered Lines: 10939
Relevant Lines: 11244

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants