
Sharing some personal thoughts:

Publishing a paper can be an imperfect but useful 'expensive signal' that increases visibility. For vqsort, we initiated a collaboration with two university groups that had published relevant papers: KIT's ips4o for parallel sorting, and Jena's fast AVX2 partition and 2D sorting network.

It needn't be AVX-512, but I'd consider AVX2 to be table stakes for any sorting algorithm. It's not clear that scalar techniques transfer to vector approaches (for example, multi-pivot partitioning did not help), and the speedup from AVX2 vs scalar is at least 3x.

It's not clear to me that special-casing already-sorted inputs is a net positive in practice outside of benchmarks. Sure, we can early-out as soon as a difference is found, but we're still potentially paying for an entire scan through the array, or a slowdown in partitioning (we found that even tracking the min/max had a noticeable cost). This check can instead be done at the level of callers, who can then remember that the dataset is sorted and even skip the is-sorted check on subsequent calls.

In-place seems helpful for benefiting from limited-size but faster memory such as L3 or even HBM.

Finally, why C? As you mention, AVX-512 is not ubiquitous, and we'd prefer portability over duplicating several thousand LOC for each instruction set. That's much easier (via Highway) in C++. Would a C-compatible FFI be enough?



pdqsort paper: https://arxiv.org/abs/2106.05123

fluxsort compiles in C++ just fine.

Pre-sorted data is just the simplest example of adaptivity. Partially-sorted input can't be tracked easily by the caller, and I believe patterns like merging a few sorted arrays by concatenating and then sorting them are pretty common. Fluxsort, glidesort, ipnsort, and vergesort all have methods that merge such patterns quickly. According to the benchmarks in the article, many are also doing better on low-cardinality data.

If you don't consider any of this adaptivity useful, it's very strange to compare against std::sort, which does make some attempt to be adaptive. And strange that your documentation wouldn't mention the difference. But we all know why you're doing that. I suppose you'll keep your fingers in your ears regarding 32-bit radix sort, which has benchmarked at 1450 MB/s for a million elements (2.77 ns/v) on M1: https://github.com/mlochbaum/rhsort/issues/2#issuecomment-15... .

The fulcrum partition from crumsort is an in-place adaptation of fluxsort's out-of-place partition, which is why I mentioned it. It's faster when things stop fitting in L2 in my testing.


> According to the benchmarks in the article, many are also doing better at low-cardinality data.

VQSort handles low cardinality quite well. Unfortunately only a few benchmarks in the article (not including that one) were re-run after our performance bug fix.

> it's very strange to compare to std::sort which does make some attempt to be adaptive.

We chose std::sort mainly because it seems to be widely used.

> But we all know why you're doing that. I suppose you'll keep your fingers in your ears regarding 32-bit radix sort

Huh, I'm curious what the background or context for this remark is. FYI, I published a fast 32-bit radix sort in 2010: https://arxiv.org/abs/1008.2849 . The trend has been towards less and less memory bandwidth per core, and we often see servers bottlenecked on that.





