bpo-34561: Switch to Munro & Wild "powersort" merge strategy. #28108

tim-one · 2021-09-01T03:27:43Z

First stab at this - work in progress, but so far so good. Still needs changes to listsort.txt, and, in listobject.c, at least reducing the max stack space needed (for remembering pending runs to be merged ).

https://bugs.python.org/issue34561

member recording a run's starting index - that can be computed cheaply on-the-fly, by storing instead the key array's base address just once in the MergeState struct.

code, and a new writeup for listsort.txt.

Misc/NEWS.d/next/Core and Builtins/2021-09-01-19-21-48.bpo-34561.uMAVA-.rst

…61.uMAVA-.rst

tim-one · 2021-09-02T19:12:28Z

@rhettinger , I requested your review because you're the only active developer I recall who did significant work on fiddling our sort algorithm - although this change is at a deeper level than, e.g., adding a key= argument.

@taleinat , I requested your review because you once said you'd be happy to review a change of mine nobody else wanted to touch. So I'm calling your bluff 😉.

Both, this change isn't expected to make dramatic changes in timing. Indeed, I can barely detect any in ordinary cases (for example, running Lib/test/sortperf.py). It's aimed instead at replacing my long-ago ad hoc made-up merging strategy with one from later academic work that has provably stellar worst-case behavior, and a published paper readers can consult for details and proofs.

To see a clear - but still modest - timing difference, here's a case contrived to be unusually poor for the current merge strategy:

runs = 190000, 180000, 10000, 10000

from time import perf_counter as now
xs = []
for r in runs:
    xs.extend(range(r * 320))

t0 = now()
xs.sort()
t1 = now()
print(len(xs), t1 - t0)

tim-one · 2021-09-02T23:42:09Z

Here's a slightly more effective way to contrive a "bad case" for the current code, which also allows specifying (a close approximation to) the desired list length.

N = 100_000_000  # target length of list

MAX_MINRUN = 64
base = N >> 1

runs = base, base - MAX_MINRUN, MAX_MINRUN, 1

xs = []
for r in runs:
    xs.extend(range(r))

from time import perf_counter as now
t0 = now()
xs.sort()
t1 = now()
print(len(xs), t1 - t0)

tim-one · 2021-09-03T02:06:29Z

And another bad-case generator. This exaggerates the effects of merging patterns by using only two distinct objects as list elements, so essentially all operations are done on objects sitting in L1 cache.

It can also generate bad cases for powersort. However, note that they're relatively less bad than the timsort bad cases are for timsort. In fact, before I pulled the "only two objects" trick to slash cache misses, it was hard to be sure they were "bad" cases (ya, they were provably so, but not really measurably so 😉).

N = 100_000_000  # target length of list

MAX_MINRUN = 64

if 1:   # bad for timsort
    base = N >> 1
    runs = base, base - MAX_MINRUN, MAX_MINRUN, 1
else:   # bad for powersort
    base = N // (2 * MAX_MINRUN)
    runs = (base - 1)*MAX_MINRUN, MAX_MINRUN, MAX_MINRUN, base * MAX_MINRUN

xs = []
from itertools import repeat
for r in runs:
    xs.append(0)
    xs.extend(repeat(1, r-1))

from time import perf_counter as now
t0 = now()
xs.sort()
t1 = now()
print(len(xs), t1 - t0)
assert xs[: len(runs) + 1] == [0] * len(runs) + [1]

cfbolz

not a super thorough review, but I did read through all the code and noticed a typo and a nitpick. (aside: it's a pity that we can't easily test those implementation-level C functions, it would be super nice to stick a random test for powerloop somewhere and compare it against a high level python implementation).

Objects/listobject.c

cfbolz · 2021-09-03T11:29:23Z

Objects/listobject.c

+     * b = s1 + n1 + n2/2 = a + (n1 + n2)/2
+     */
+    Py_ssize_t a = 2 * s1 + n1;  /* 2*a */
+    Py_ssize_t b = a + n1 + n2;  /* 2*b */


bit of a nitpick, but I found this confusing when reading the code and trying to follow along with the comments: the math a and b from the comments having the same names as the code a and b, despite those storing 2*math a (and b respectively).

I added new comments to explain it. Does that help enough?

They definitely help, but what tripped me up is really the reuse of the letter "a" to mean both "a" (in the comment) and "2*a" (in the code). so maybe the comment or the code could instead talk about x and y instead?

Thanks, but I'm disinclined to make more changes. The overwhelmingly important concept here is "midpoint", and I don't want to obscure it under layers of pedantic name proliferation. In my mind, the midpoints are named "a" and "b", period, and the shift in perspective by 1 bit position is a triviality much better overlooked than emphasized 😉

reflect that it never merges the most recent run anymore. Putting the new run on the stack before that function is called is a waste of time now, and just complicated the function with now-silly code to pop the run off at the start and push it back at the end. Now the caller pushes it on the stack only after that function returns.

except in an assert. So get rid of it. Assert the run is adjacent in its caller instead.

bedevere-bot · 2021-09-06T17:54:44Z

@tim-one: Please replace # with GH- in the commit message next time. Thanks!

bpo-34561: Switch to Munro & Wild "powersort" merge strategy.

e311b34

the-knights-who-say-ni added the CLA signed label Sep 1, 2021

bedevere-bot added the awaiting core review label Sep 1, 2021

tim-one and others added 4 commits September 1, 2021 11:01

Various changes. Most significantly, got rid of the new struct

1c6dfcf

member recording a run's starting index - that can be computed cheaply on-the-fly, by storing instead the key array's base address just once in the MergeState struct.

Close to done! Some code fiddling, changed comments to match the

b253852

code, and a new writeup for listsort.txt.

Typo repair.

cbd2e6d

📜🤖 Added by blurb_it.

c6b1806

tim-one commented Sep 1, 2021

View reviewed changes

Misc/NEWS.d/next/Core and Builtins/2021-09-01-19-21-48.bpo-34561.uMAVA-.rst Outdated Show resolved Hide resolved

Update Misc/NEWS.d/next/Core and Builtins/2021-09-01-19-21-48.bpo-345…

560a3f1

…61.uMAVA-.rst

tim-one self-assigned this Sep 2, 2021

tim-one requested review from rhettinger and taleinat September 2, 2021 18:59

cfbolz reviewed Sep 3, 2021

View reviewed changes

tim-one added 3 commits September 3, 2021 12:09

Comment changes to address good points raised in review.

70fd24e

The starting-index argument to found_new_run wasn't actually used,

56f8ccf

except in an assert. So get rid of it. Assert the run is adjacent in its caller instead.

tim-one merged commit 5cb4c67 into python:main Sep 6, 2021

bedevere-bot removed the awaiting core review label Sep 6, 2021

tim-one deleted the power branch September 6, 2021 17:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bpo-34561: Switch to Munro & Wild "powersort" merge strategy. #28108

bpo-34561: Switch to Munro & Wild "powersort" merge strategy. #28108

tim-one commented Sep 1, 2021 •

edited by bedevere-bot

Loading

tim-one commented Sep 2, 2021

tim-one commented Sep 2, 2021

tim-one commented Sep 3, 2021 •

edited

Loading

cfbolz left a comment

cfbolz Sep 3, 2021

tim-one Sep 3, 2021

cfbolz Sep 4, 2021

tim-one Sep 4, 2021

bedevere-bot commented Sep 6, 2021

bpo-34561: Switch to Munro & Wild "powersort" merge strategy. #28108

bpo-34561: Switch to Munro & Wild "powersort" merge strategy. #28108

Conversation

tim-one commented Sep 1, 2021 • edited by bedevere-bot Loading

tim-one commented Sep 2, 2021

tim-one commented Sep 2, 2021

tim-one commented Sep 3, 2021 • edited Loading

cfbolz left a comment

Choose a reason for hiding this comment

cfbolz Sep 3, 2021

Choose a reason for hiding this comment

tim-one Sep 3, 2021

Choose a reason for hiding this comment

cfbolz Sep 4, 2021

Choose a reason for hiding this comment

tim-one Sep 4, 2021

Choose a reason for hiding this comment

bedevere-bot commented Sep 6, 2021

tim-one commented Sep 1, 2021 •

edited by bedevere-bot

Loading

tim-one commented Sep 3, 2021 •

edited

Loading