No matter how fast CRC32 is, you still have to transfer the data from disk to the CPU. I suspect reading the entire DB would introduce unacceptable latency and I/O strain in many use cases for SQLite.
Pages? Postgres apparently does that if enabled. And when a page is read, it is already read… the server just checks the checksum. The only overhead should be recomputing the checksum and comparing it against the stored value.
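A minimal sketch of that per-page idea. The page size and the "CRC32 in a 4-byte trailer" layout here are made up for illustration (Postgres actually stores a 16-bit checksum in the page header); the point is just that verification only touches pages you were reading anyway:

```python
import zlib

PAGE_SIZE = 4096  # hypothetical page size

def write_page(payload: bytes) -> bytes:
    # Pad the payload to the page body size and append its CRC32
    # as a 4-byte little-endian trailer.
    body = payload.ljust(PAGE_SIZE - 4, b"\x00")
    return body + zlib.crc32(body).to_bytes(4, "little")

def read_page(page: bytes) -> bytes:
    # Recompute the checksum over the body and compare to the stored one.
    body, stored = page[:-4], int.from_bytes(page[-4:], "little")
    if zlib.crc32(body) != stored:
        raise IOError("page checksum mismatch: possible corruption")
    return body

page = write_page(b"row data")
assert read_page(page).startswith(b"row data")
```

The cost per read is one CRC over a page that is already in memory, not extra I/O.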
Now, if a bit flips somewhere you don't touch often, it would probably keep chugging along without noticing. Which kind of makes sense?
All of it. With a checksums-of-checksums scheme like a Merkle tree, you can effectively and efficiently checksum all the data while keeping incremental changes cheap: you only need to update the checksums of the data blocks you touched and their ancestor nodes in the tree.
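A toy sketch of that incremental update, using a hypothetical four-page "database" and SHA-256 (the page contents and tree shape are invented for illustration). Modifying one page only rehashes its leaf and the nodes on the path to the root, i.e. O(log n) work instead of rehashing everything:

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

# Hypothetical 4-page database.
pages = [b"page0", b"page1", b"page2", b"page3"]

def build(pages):
    # Full rebuild: leaf hashes, one internal level, then the root.
    leaves = [h(p) for p in pages]
    mids = [h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])]
    return leaves, mids, h(mids[0] + mids[1])

leaves, mids, root = build(pages)

# Incremental update: touching page 2 rehashes only its leaf,
# one internal node, and the root.
pages[2] = b"page2-modified"
leaves[2] = h(pages[2])
mids[1] = h(leaves[2] + leaves[3])
root = h(mids[0] + mids[1])

# Sanity check: the incrementally updated root matches a full rebuild.
assert root == build(pages)[2]
```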
I use SQLite for a "logging-like" thing a lot; the file is in the ~1.5 GB range and growing, and every minute some data is logged to it. Having to read 1.5 GB from disk every minute just to append a few records, each a timestamp and one 64-bit number, seems pointless.
If you already know the checksum of some huge chunk of the message, you don't need to recompute it to append data and get a new checksum (at least for CRC). On the read side you would want checksums at whatever granularity you want to be able to read, but for a larger combined CRC checksum you never need to reread the existing data to append or prepend to it.
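The append case is easy to demonstrate with `zlib.crc32`, which accepts a starting CRC value: feeding it the new bytes plus the prefix's checksum gives the checksum of the whole message without rereading the prefix. (The data here is just a placeholder.)

```python
import zlib

data = b"hello world, " * 1000
head, tail = data[:100], data[100:]

# Checksum of the whole message, computed in one pass.
full = zlib.crc32(data)

# Streamed instead: start from the already-known checksum of the
# prefix and feed only the appended bytes -- `head` is never reread.
crc = zlib.crc32(head)
crc = zlib.crc32(tail, crc)

assert crc == full
```

The prepend case needs a combine operation (e.g. zlib's `crc32_combine` in the C library), which merges two CRCs given only the length of the second chunk.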