-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
bpo-34561: Switch to Munro & Wild "powersort" merge strategy. #28108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
member recording a run's starting index - that can be computed cheaply on-the-fly, by storing instead the key array's base address just once in the MergeState struct.
code, and a new writeup for listsort.txt.
Misc/NEWS.d/next/Core and Builtins/2021-09-01-19-21-48.bpo-34561.uMAVA-.rst
Outdated
Show resolved
Hide resolved
@rhettinger , I requested your review because you're the only active developer I recall who did significant work on fiddling our sort algorithm - although this change is at a deeper level than, e.g., adding a @taleinat , I requested your review because you once said you'd be happy to review a change of mine nobody else wanted to touch. So I'm calling your bluff 😉. Both, this change isn't expected to make dramatic changes in timing. Indeed, I can barely detect any in ordinary cases (for example, running To see a clear - but still modest - timing difference, here's a case contrived to be unusually poor for the current merge strategy: runs = 190000, 180000, 10000, 10000
from time import perf_counter as now
xs = []
for r in runs:
xs.extend(range(r * 320))
t0 = now()
xs.sort()
t1 = now()
print(len(xs), t1 - t0) |
Here's a slightly more effective way to contrive a "bad case" for the current code, which also allows specifying (a close approximation to) the desired list length. N = 100_000_000 # target length of list
MAX_MINRUN = 64
base = N >> 1
runs = base, base - MAX_MINRUN, MAX_MINRUN, 1
xs = []
for r in runs:
xs.extend(range(r))
from time import perf_counter as now
t0 = now()
xs.sort()
t1 = now()
print(len(xs), t1 - t0) |
And another bad-case generator. This exaggerates the effects of merging patterns by using only two distinct objects as list elements, so essentially all operations are done on objects sitting in L1 cache. It can also generate bad cases for powersort. However, note that they're relatively less bad than the timsort bad cases are for timsort. In fact, before I pulled the "only two objects" trick to slash cache misses, it was hard to be sure they were "bad" cases (ya, they were provably so, but not really measurably so 😉). N = 100_000_000 # target length of list
MAX_MINRUN = 64
if 1: # bad for timsort
base = N >> 1
runs = base, base - MAX_MINRUN, MAX_MINRUN, 1
else: # bad for powersort
base = N // (2 * MAX_MINRUN)
runs = (base - 1)*MAX_MINRUN, MAX_MINRUN, MAX_MINRUN, base * MAX_MINRUN
xs = []
from itertools import repeat
for r in runs:
xs.append(0)
xs.extend(repeat(1, r-1))
from time import perf_counter as now
t0 = now()
xs.sort()
t1 = now()
print(len(xs), t1 - t0)
assert xs[: len(runs) + 1] == [0] * len(runs) + [1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a super thorough review, but I did read through all the code and noticed a typo and a nitpick. (aside: it's a pity that we can't easily test those implementation-level C functions, it would be super nice to stick a random test for powerloop
somewhere and compare it against a high level python implementation).
* b = s1 + n1 + n2/2 = a + (n1 + n2)/2 | ||
*/ | ||
Py_ssize_t a = 2 * s1 + n1; /* 2*a */ | ||
Py_ssize_t b = a + n1 + n2; /* 2*b */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bit of a nitpick, but I found this confusing when reading the code and trying to follow along with the comments: the math a and b from the comments having the same names as the code a and b, despite those storing 2*math a (and b respectively).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added new comments to explain it. Does that help enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They definitely help, but what tripped me up is really the reuse of the letter "a" to mean both "a" (in the comment) and "2*a" (in the code). so maybe the comment or the code could instead talk about x and y instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, but I'm disinclined to make more changes. The overwhelmingly important concept here is "midpoint", and I don't want to obscure it under layers of pedantic name proliferation. In my mind, the midpoints are named "a" and "b", period, and the shift in perspective by 1 bit position is a triviality much better overlooked than emphasized 😉
reflect that it never merges the most recent run anymore. Putting the new run on the stack before that function is called is a waste of time now, and just complicated the function with now-silly code to pop the run off at the start and push it back at the end. Now the caller pushes it on the stack only after that function returns.
except in an assert. So get rid of it. Assert the run is adjacent in its caller instead.
@tim-one: Please replace |
First stab at this - work in progress, but so far so good. Still needs changes to listsort.txt, and, in listobject.c, at least reducing the max stack space needed (for remembering pending runs to be merged ).
https://bugs.python.org/issue34561