Hacker News

In the mid 80s to 90s, there was a lot of interest around "query responsive" data structures. There still is, I'm sure, but that was what I studied and I've moved on since then.

Of these, the Fibonacci and pairing heaps were the best-known products, I reckon.

They essentially move "active" parts of the data structure up toward the top of the tree. The use case is different (being heaps), but I wouldn't be surprised if that line of R&D has continued, or if there's a huge gap in performance that can be reclaimed with "intelligent" guesses like this one.

I'm rambling and conjecturing, but maybe someday branch prediction will get a huge boost to its already tangible smarts and we'll see another explosion in practical CPU throughput, with a little help from algorithms and data structures, of course.



My personal favorite of these is the Splay Tree[1]. I've never used it in production, but it's really simple to implement and reason about. Though my main experience was trying (and failing) to prove amortized bounds on the operations.

[1] https://en.wikipedia.org/wiki/Splay_tree
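For anyone who hasn't written one: here's a toy Python sketch (my own, nothing production-grade) using the simplified recursive splay rather than Sleator and Tarjan's bottom-up version — it really is short enough to reason about.

```python
# Toy splay tree: the simplified recursive splay brings the accessed key
# (or the last node on the search path) to the root.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay(root, key):
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                      # zig-zig
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:                    # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    else:
        if root.right is None:
            return root
        if key > root.right.key:                     # zag-zag
            root.right.right = splay(root.right.right, key)
            root = rotate_left(root)
        elif key < root.right.key:                   # zag-zig
            root.right.left = splay(root.right.left, key)
            if root.right.left is not None:
                root.right = rotate_right(root.right)
        return root if root.right is None else rotate_left(root)

def insert(root, key):
    # splay first, then split the old tree around the new root
    root = splay(root, key)
    if root is None:
        return Node(key)
    if key == root.key:
        return root
    node = Node(key)
    if key < root.key:
        node.right = root
        node.left, root.left = root.left, None
    else:
        node.left = root
        node.right, root.right = root.right, None
    return node
```

Accessing a key is just `root = splay(root, key)`; hot keys end up clustered near the root.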


If anything, the incredibly convoluted amortized analysis that goes into proving bounds on the cost of splay tree operations is perhaps more iconic than the actual splay tree data structure.


Tarjan and Sleator are legends.


Does that mean any access to these data structures needs an exclusive lock? One advantage to the "path hints" approach described in the article is that the path hints can be stored separately from the btree itself. The author writes:

> For single-threaded programs, it’s possible to use one shared path hint per B-tree for the life of the program.

> For multi-threaded programs, I find it best to use one path hint per B-tree, per thread.

> For server-client programs, one path hint per B-tree, per client should suffice.

Seems likely multiple threads/clients would have different access patterns anyway, so the hints might actually be better this way, in addition to being more concurrent.

edit: also, this approach feels a lot lighter-weight to me than mutating the tree: I can imagine the mutations involving frequent allocation/deallocation, whereas the hint is just a vector that grows to the maximum depth and stays there.
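To make the idea concrete (this is not the article's actual btree.c code — the names and the tiny fanout are made up), here's a rough Python sketch of a search that tries a per-caller path hint before falling back to binary search inside each node:

```python
import bisect

FANOUT = 4  # children per node; hypothetical and tiny, real B-trees are much wider

def build(sorted_keys):
    """Build a static B-tree-ish node: {'keys': separators, 'children': subtrees}."""
    if len(sorted_keys) < FANOUT:
        return {"keys": list(sorted_keys), "children": []}
    n = len(sorted_keys)
    cuts = [i * n // FANOUT for i in range(1, FANOUT)]
    seps = [sorted_keys[c] for c in cuts]
    chunks, prev = [], 0
    for c in cuts:
        chunks.append(sorted_keys[prev:c])
        prev = c + 1
    chunks.append(sorted_keys[prev:])
    return {"keys": seps, "children": [build(ch) for ch in chunks]}

def search(root, key, hint):
    """hint[d] caches the child slot taken at depth d on the previous search."""
    node, depth = root, 0
    while True:
        keys = node["keys"]
        i = None
        if depth < len(hint):
            h = hint[depth]
            # accept the hinted slot only if it is provably the right one
            if h <= len(keys) and (h == 0 or keys[h - 1] < key) \
                              and (h == len(keys) or key <= keys[h]):
                i = h
        if i is None:
            i = bisect.bisect_left(keys, key)    # fall back to binary search
        if depth < len(hint):
            hint[depth] = i                      # remember the path for next time
        else:
            hint.append(i)
        if i < len(keys) and keys[i] == key:
            return True
        if not node["children"]:
            return False
        node, depth = node["children"][i], depth + 1
```

Each thread (or client) keeps its own `hint` list, so the hint needs no synchronization and the tree itself is only ever read.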


Had the same thought. Making every access RW seems like a cure worse than the disease in a multithreaded situation.



There are ways to make it work. The simplest would be to just not do any splaying most of the time. No splaying means no locking is necessary, just an atomic read to check whether anything changed. Apparently it's actually good for performance[1] to splay only occasionally instead of on every access. Another way would be to use atomics to do lock-free updates. Probably best to only do partial splaying, or only occasional splaying like above, but it's doable[2].

If the data is actually stored in an array, and the tree structure is just a map that stores the "nodes" as tuples (left, element, right) in another array, where left, element and right are indexes into the backing array, then you could have one such map per thread and only need synchronization when anything is added or deleted. Or even purely thread-local addition/deletion, if that's sufficient and no locking at all is desirable. A multi-splay tree, essentially.
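Roughly what I mean, as a single-threaded Python sketch (in the real thing, appending to the backing array would be the only synchronized step, and each thread would own its own `nodes` list):

```python
# One shared, append-only backing array of elements; per-thread index trees.
# A "node" is [left, elem, right]: left/right index into `nodes` (-1 = none),
# elem indexes into `backing`.

def tree_insert(nodes, root, elem_idx, backing):
    """Insert backing[elem_idx] into one thread's index tree; returns new root."""
    if root == -1:
        nodes.append([-1, elem_idx, -1])
        return len(nodes) - 1
    left, elem, right = nodes[root]
    if backing[elem_idx] < backing[elem]:
        nodes[root][0] = tree_insert(nodes, left, elem_idx, backing)
    elif backing[elem_idx] > backing[elem]:
        nodes[root][2] = tree_insert(nodes, right, elem_idx, backing)
    return root

def tree_search(nodes, root, key, backing):
    while root != -1:
        left, elem, right = nodes[root]
        if key == backing[elem]:
            return elem              # index into the shared backing array
        root = left if key < backing[elem] else right
    return -1
```

Two threads can then build differently shaped trees over the exact same backing array without ever touching each other's nodes.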

Storing nodes as elements in an array is probably a good idea for performance in general, even in the single-threaded case. It means you don't have to allocate each element individually, and everything is stored in a contiguous block of memory. The elements won't be in the correct order, since they're fixed in place in the backing array, but they'll still be much more local than if they were allocated individually. Allocating from an arena/bump allocator would have similar advantages, though.

You could also store everything in a single array, replacing the node's element index with the data itself and then use atomics again. This is how I've seen it done in the wild, just without the atomics. Really depends on what you're doing, though. If the individual items are small because they mostly store pointers into some other heap allocated data or just a bunch of numbers, so that the whole map+element structure could fit in the cache, then using multiple maps probably gives better performance.

Can't find any references for this because I just came up with the idea of using multiple maps (not the array storage itself) into a single backing array. Multi-splay trees[3] look close in principle, but they use a reference tree instead of a flat array. It's probably not substantial enough of a difference to really call it a different data structure though.

Edit: Another idea would be a delayed-queue splay tree, where some kind of reference to the accessed node gets added to a queue, and once the queue reaches a certain size, or some other condition is met, the tree gets locked and all the splaying is done in a single batch. Some operations could even be optimized away, if computing that is less costly than doing the operations themselves. In that case it might actually be useful even in the single-threaded case. Somewhat related to the randomized splaying, and it has probably been done before, though.
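Something like this, as a single-threaded Python sketch (in a concurrent version only `_flush` would take the write lock; the dedup there is one example of optimizing queued operations away):

```python
class BatchedSplayTree:
    """Toy delayed-queue variant: buffer accessed keys, restructure in batches."""
    BATCH = 8  # flush threshold; arbitrary

    def __init__(self):
        self.root = None       # nodes are {'key', 'left', 'right'} dicts
        self.pending = []

    def insert(self, key):
        self.root = _insert(self.root, key)

    def access(self, key):
        found = self._find(key)
        if found:
            self.pending.append(key)
            if len(self.pending) >= self.BATCH:
                self._flush()
        return found

    def _find(self, key):
        node = self.root
        while node is not None and node["key"] != key:
            node = node["left"] if key < node["key"] else node["right"]
        return node is not None

    def _flush(self):
        # restructuring for a key only needs doing once per batch, so keep
        # just the most recent occurrence of each queued key
        seen, batch = set(), []
        for key in reversed(self.pending):
            if key not in seen:
                seen.add(key)
                batch.append(key)
        for key in reversed(batch):            # replay oldest-first
            self.root = _to_root(self.root, key)
        self.pending.clear()

def _insert(n, key):
    if n is None:
        return {"key": key, "left": None, "right": None}
    if key < n["key"]:
        n["left"] = _insert(n["left"], key)
    elif key > n["key"]:
        n["right"] = _insert(n["right"], key)
    return n

def _to_root(n, key):
    # simplified move-to-root restructure, standing in for a real splay
    if n is None or n["key"] == key:
        return n
    if key < n["key"]:
        n["left"] = _to_root(n["left"], key)
        c = n["left"]
        if c is not None and c["key"] == key:
            n["left"], c["right"] = c["right"], n
            return c
    else:
        n["right"] = _to_root(n["right"], key)
        c = n["right"]
        if c is not None and c["key"] == key:
            n["right"], c["left"] = c["left"], n
            return c
    return n
```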

[1]: http://www14.in.tum.de/personen/albers/papers/ipl02.pdf

[2]: https://digitalscholarship.unlv.edu/cgi/viewcontent.cgi?arti...

[3]: https://www.cs.cmu.edu/~jonderry/multi-splay.pdf


Those techniques sound possible but the path hint approach seems so much simpler to apply than batching splay updates.

Particularly with epoch GC: When there's heavy reading and rare writes, I like using epoch GC so readers never block. Writes then need to create a whole new map. I suppose one could use something like the delayed queue splay tree you're describing, publishing a new map occasionally, and ensuring this doesn't race with other updates. But it'd be a pain to get right when you could just use path hints instead. Is there an overwhelming benefit I'm missing to justify this complexity and the expense of full map rebuilds?


> we'll see another explosion in practical CPU throughput, combined with a little help from algs and data structures

The learned index[1] is basically a variation of this sort of "hint", which relies on getting the right answer most of the time, but by learning the distribution function as a model instead of relying on the programmer to hand-craft a heuristic.

(from that paper)

> The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records

I haven't seen anything mix this sort of work with an Eytzinger traversal, but there are always diminishing returns for these.

[1] - https://arxiv.org/abs/1712.01208





