Those projects often have dozens of other priorities beyond just one specific AP...

jart · on Oct 2, 2024

> musl doesn't even have things like SIMD optimized string comparison routines. You would be shocked at how many CPU cycles in a non-trivial program are spent on those tasks

Building GNU Make with Cosmo or glibc makes cold startup go 2x faster for me on large repos compared to building it with Musl, due to vectorized strlen() alone (since SIMD is 2x faster than SWAR). I sent Rich a patch last decade adding sse to strlen(), since I love Musl, and Cosmo is based on it. But alas he didn't want it. Even though he seems perfectly comfortable using ARM's strlen() assembly.

> glibc's malloc is tolerable but fails handily to more modern alternatives in overall speed and scalability

The focus and attention I put into cosmo mutexes isn't unique. I put that care into everything else too, and malloc() is no exception. Cosmo does very well at multi-threaded memory allocation. I can pick benchmark parameters where it outperforms glibc and jemalloc by 100x. I can also pick params where jemalloc wins by 100x. But I'm reasonably certain cosmo can go faster than glibc and musl in most cases while using less memory too. You have Doug Lea to thank for that.

Every day is a good day working on cosmo, because I can always find an opportunity to dive into another rabbit hole. Even ones as seemingly unimportant as clocks: https://github.com/jart/cosmopolitan/commit/dd8544c3bd7899ad...

senderista · on Oct 3, 2024

> You have Doug Lea to thank for that.

Wait, you don't mean your allocator is based on dlmalloc, do you?

jart · on Oct 4, 2024

Yes. Is there something wrong with that? If your program links pthread_create() then cosmo creates a dlmalloc arena for each core on your computer and uses sched_getcpu() to index them combined with raw nsync locks.

senderista · on Oct 5, 2024

I thought that’s pretty much what ptmalloc did (glibc malloc is forked ptmalloc3)?

wrsh07 · on Oct 3, 2024

Another example is hash maps: all the large companies built better maps in cpp (folly, absl), but the number of apps that are performance sensitive and still use std::unordered_map will be astounding forever.

(Why not upstream? ABI compatibility which is apparently a sufficient veto reason for anything in cpp)

menaerus · on Oct 3, 2024

No, that's rubbish that became a meme. There is no universally the best design of a hash-map. Each design always comes with its own trade-offs and so is the case with open-addressing vs separate chaining in conflict resolution. Iterator instability and memory-bandwidth intensive rehashing just to name a few.

Design of your hash-map, or any other algorithm or data-structure, is always and only dictated by your workload. Microbenchmark suite from xyz company/person will hardly represent that reality.

So the more generous answer would be it's not because of "ABI" meme but because std::unordered_map is just fine for the most cases out there. And this is what general purpose standard library is ought to be able to support.

wrsh07 · on Oct 3, 2024

Lol this is so incorrect it's almost funny. Google saved something like single digit % of CPU and memory fleet wide by switching to this map. That's not a micro benchmark

menaerus · on Oct 3, 2024

Learn to read with understanding and learn to have some respect as well. Never have I said that it cannot make a difference but that it's not universal and that it depends on the workload.

wrsh07 · on Oct 5, 2024

Ok let me try to unpack what I'm saying so that we're on the same page:

1. Until c++ [toolchains] do an abi break, you will find better performance in libraries like boost/folly/absl than in the STL. Titus wrote about this in "abi now or never" ~4 years ago. His thesis was "the language is leaving performance on the table, and by default if you care about performance you should not use the STL." Afaict this is all still true. If you think this is a meme, you're wrong, it's existential for companies like Google.

2. You mention a bunch of things that one might consider when picking a hashmap. "There might be a workload where I need one map vs another." Yes, that's true. However, my original statement was saying: if you care about performance you should not use unordered map. This is sort of like saying "if you care about performance you should not use bubble sort." Of course there are lots of considerations when choosing a sorting algorithm. Does it need to be cache tuned/oblivious? Do you have a lot of runs in your data (eg is it mostly sorted?)? HOWEVER, I will happily tell you to prefer a gently optimized quick sort over bubble sort for ~all workloads, and if you need further performance then go consider those things.

The performance of the absl/folly maps is not a joke. It's the basis for the rust hash map. These are good, general purpose hash maps. That's not a meme. It's also not a meme to say that the cpp working groups are generally unwilling to make ABI breaks for small improvements (indeed, in the past decade they haven't made an ABI break to bundle large improvements)

wrsh07 · on Oct 5, 2024

Microbenchmark suite is very different from fleet wide profiling

The whole point of switching every single use of unordered map to node/flat hash map is because it is always better. It always uses less memory (much less memory per node).

Edit: what did I say that was disrespectful? I called you out for being wrong because you're very wrong