
"Batch-mode processing is always more efficient."

Not sure where I first heard that, but it applies here. It is essentially another way of saying that computers are often set up to exploit economies of scale.

Thus building an index all at once, after a large set of changes has been made, is more efficient than incrementally updating the index as each change arrives.



It is more accurate to say that sequential I/O is always more efficient than random I/O, even now that there is no physical seek time.
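A minimal sketch of the two access patterns, in Python (function names are mine, chosen for illustration): both routines read the same file, but one walks it front to back while the other seeks to blocks in shuffled order. On real hardware the sequential version benefits from OS readahead and contiguous access; the logical result is identical either way.

```python
import os
import random

def sequential_read(path, block=4096):
    """Read the file front to back, one block at a time."""
    data = bytearray()
    with open(path, "rb") as f:
        while chunk := f.read(block):
            data += chunk
    return bytes(data)

def random_read(path, block=4096):
    """Read the same blocks, but seek to them in shuffled order,
    defeating readahead and forcing a new I/O position each time."""
    size = os.path.getsize(path)
    offsets = list(range(0, size, block))
    random.shuffle(offsets)
    parts = {}
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            parts[off] = f.read(block)
    # Reassemble in file order so the result matches the sequential read.
    return b"".join(parts[off] for off in sorted(parts))
```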


The reasons why batch mode processing can be more efficient go beyond just sequential I/O.

If I do a little of something now, a little more later, and yet more even later, I will probably find the caches no longer hold my data, because other work ran in the intervening time. If I do it all at once, most of the work benefits from already-warm caches. (This applies to disk caches, CPU data caches, and even instruction caches.)
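The two schedules can be sketched in Python (names hypothetical). Both do exactly the same work; the batched version just finishes everything for one item while that item is "hot" instead of cycling back to it after everything else has been touched.

```python
def process_interleaved(items, passes):
    """Run each pass over all items: by the time a later pass
    revisits an item, its cached state has likely been evicted."""
    for p in passes:
        for item in items:
            p(item)

def process_batched(items, passes):
    """Do all the work for one item while it is still warm in the
    caches, then move on to the next item."""
    for item in items:
        for p in passes:
            p(item)
```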

Not to mention that batch-mode processing sometimes opens up opportunities to use a more efficient algorithm (even if only by a constant factor). For example, if you maintain an index in a balanced tree and keep adding to it piecemeal, you do extra work continually rebalancing that tree. If instead you build the index all at once, you can collect all the data, sort it with a fast sort like mergesort, and then write out a final tree that is already balanced and never needs to be rearranged as data trickles in.
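A sketch of that last point in Python (trees as `(left, key, right)` tuples, names mine): building from sorted input by picking the middle element as the root yields a tree of logarithmic height for free, while piecemeal insertion with no rebalancing degenerates badly on the same (sorted) input.

```python
def build_balanced(sorted_keys):
    """Build an already-balanced BST from sorted input: middle
    element becomes the root, recurse on each half."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return (build_balanced(sorted_keys[:mid]),
            sorted_keys[mid],
            build_balanced(sorted_keys[mid + 1:]))

def insert_naive(node, key):
    """Piecemeal insertion with no rebalancing: sorted input is the
    worst case and degenerates into a linked list."""
    if node is None:
        return (None, key, None)
    left, k, right = node
    if key < k:
        return (insert_naive(left, key), k, right)
    return (left, k, insert_naive(right, key))

def height(node):
    if node is None:
        return 0
    left, _, right = node
    return 1 + max(height(left), height(right))

keys = list(range(500))
balanced = build_balanced(sorted(keys))  # sorted() is a mergesort variant
naive = None
for k in keys:  # inserting already-sorted keys one at a time
    naive = insert_naive(naive, k)
```

A real incremental index would rebalance (red-black tree, B-tree) and stay logarithmic, but at the cost of rotation work on every insert; the bulk build skips all of that.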



