
"Batch-mode processing is always more efficient."

Not sure where I first heard that, but it applies here. It is essentially another way of saying that computers are often set up to exploit economies of scale.

Thus building an index all at once, after a large set of changes has been made, is more efficient than incrementally updating the index as each change arrives.



It is more accurate to say that sequential I/O is always more efficient than random I/O, even now that there is no physical seek time.
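A minimal sketch of the two access patterns, in Python (function names are mine, chosen for illustration): both routines read the same file, but one walks it front to back while the other seeks to blocks in shuffled order. On real hardware the sequential version benefits from OS readahead and contiguous access; the logical result is identical either way.

```python
import os
import random

def sequential_read(path, block=4096):
    """Read the file front to back, one block at a time."""
    data = bytearray()
    with open(path, "rb") as f:
        while chunk := f.read(block):
            data += chunk
    return bytes(data)

def random_read(path, block=4096):
    """Read the same blocks, but seek to them in shuffled order,
    defeating readahead and forcing a new I/O position each time."""
    size = os.path.getsize(path)
    offsets = list(range(0, size, block))
    random.shuffle(offsets)
    parts = {}
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            parts[off] = f.read(block)
    # Reassemble in file order so the result matches the sequential read.
    return b"".join(parts[off] for off in sorted(parts))
```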


The reasons why batch mode processing can be more efficient go beyond just sequential I/O.

If I do a little of something now, a little more later, and yet more even later, I will probably find the caches no longer hold my data, because other work ran in the intervening time. If I do it all at once, most of the work benefits from already-warm caches. (This applies to disk caches, CPU data caches, and even instruction caches.)
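The two schedules can be sketched in Python (names hypothetical). Both do exactly the same work; the batched version just finishes everything for one item while that item is "hot" instead of cycling back to it after everything else has been touched.

```python
def process_interleaved(items, passes):
    """Run each pass over all items: by the time a later pass
    revisits an item, its cached state has likely been evicted."""
    for p in passes:
        for item in items:
            p(item)

def process_batched(items, passes):
    """Do all the work for one item while it is still warm in the
    caches, then move on to the next item."""
    for item in items:
        for p in passes:
            p(item)
```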

Not to mention that batch-mode processing sometimes opens up opportunities to use a more efficient algorithm (even if only by a constant factor). For example, if you maintain an index in a balanced tree and keep adding to it piecemeal, you do extra work continually rebalancing that tree. If instead you build the index all at once, you can collect all the data, sort it with a fast sort like mergesort, and then write out a final tree that is already balanced and never needs to be rearranged as data trickles in.
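A sketch of that last point in Python (trees as `(left, key, right)` tuples, names mine): building from sorted input by picking the middle element as the root yields a tree of logarithmic height for free, while piecemeal insertion with no rebalancing degenerates badly on the same (sorted) input.

```python
def build_balanced(sorted_keys):
    """Build an already-balanced BST from sorted input: middle
    element becomes the root, recurse on each half."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return (build_balanced(sorted_keys[:mid]),
            sorted_keys[mid],
            build_balanced(sorted_keys[mid + 1:]))

def insert_naive(node, key):
    """Piecemeal insertion with no rebalancing: sorted input is the
    worst case and degenerates into a linked list."""
    if node is None:
        return (None, key, None)
    left, k, right = node
    if key < k:
        return (insert_naive(left, key), k, right)
    return (left, k, insert_naive(right, key))

def height(node):
    if node is None:
        return 0
    left, _, right = node
    return 1 + max(height(left), height(right))

keys = list(range(500))
balanced = build_balanced(sorted(keys))  # sorted() is a mergesort variant
naive = None
for k in keys:  # inserting already-sorted keys one at a time
    naive = insert_naive(naive, k)
```

A real incremental index would rebalance (red-black tree, B-tree) and stay logarithmic, but at the cost of rotation work on every insert; the bulk build skips all of that.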



