
If there's a valid commit frame in the WAL, the transaction was committed successfully. It simply hasn't been checkpointed yet.

If the WAL has valid data frames and a valid commit frame after a corrupted frame, there's a strong likelihood this was caused by accidental corruption, so an argument could be made that silently deleting such a WAL is not necessarily the best idea.

A lesson to be learned here is that the durability of data in the WAL is reduced, even if you use PRAGMA synchronous=FULL.
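For anyone who wants to check which setting they are actually running with, here is a minimal sketch using Python's standard sqlite3 module. The helper name `open_wal_db` is made up for illustration. It switches a file-backed database to WAL mode, sets the requested synchronous level, and reads both values back so you can see what SQLite accepted.

```python
import sqlite3

# Hypothetical helper, not part of any library.
ALLOWED_LEVELS = ("OFF", "NORMAL", "FULL", "EXTRA")

def open_wal_db(path, synchronous="FULL"):
    """Open `path` in WAL mode and return (connection, journal_mode, sync_level)."""
    if synchronous not in ALLOWED_LEVELS:
        raise ValueError("unknown synchronous level: %r" % (synchronous,))
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode actually in effect ("wal" on success).
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # PRAGMA arguments cannot be bound as parameters, hence the whitelist above.
    conn.execute("PRAGMA synchronous=" + synchronous)
    # Read back the numeric level: 0=OFF, 1=NORMAL, 2=FULL, 3=EXTRA.
    level = conn.execute("PRAGMA synchronous").fetchone()[0]
    return conn, mode, level
```

With FULL the WAL is synced on every commit; with NORMAL it is synced only around checkpoints, so recent commits can be lost on power failure. Neither setting protects against frames that are damaged after they were synced, which is the point above.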



That's true. The protocol aware recovery paper [1] talked about the challenges of disentangling corruption in the middle of the log vs uncommitted data at the end of the log. This is an issue in other log-based systems as well.

The OP made it sound like an oversight rather than an implication of assuming that the filesystem won't return corrupt data that was successfully written and fsync'ed. SQLite is pretty upfront about the tradeoffs: https://www.sqlite.org/howtocorrupt.html#_failure_to_sync

1: https://blog.acolyer.org/2018/02/27/protocol-aware-recovery-...


This is likely to happen in the case that your hardware is telling lies about when it has committed data to disk.

In this case, it makes sense to cut short the WAL application, but you are probably right in saying it should throw an error (or at least a warning).


You're right that write reordering might cause this (an invalid frame followed by a few valid frames).

That shouldn't happen if you use PRAGMA synchronous=FULL, but in WAL mode it's very common to use NORMAL.

Not sure what the better strategy is here, but I'd definitely appreciate a mode that warns me before silently truncating a WAL that has valid frames in it.
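A rough sketch of what such a check could look like as an external diagnostic, in Python. It is not how SQLite's own recovery works (which stops at the first frame that fails validation), and the names `wal_checksum`, `scan_wal` and `valid_commits_after_damage` are hypothetical. The layout follows the documented WAL format: a 32-byte header, then frames with a 24-byte header each. One assumption to call out: each frame is checked using the checksum stored in the previous frame as its seed, so a frame can still be judged when the page data before it is damaged.

```python
import struct

WAL_HEADER_SIZE = 32
FRAME_HEADER_SIZE = 24
WAL_MAGIC_LE = 0x377F0682  # checksums computed over little-endian words
WAL_MAGIC_BE = 0x377F0683  # checksums computed over big-endian words

def wal_checksum(data, s0, s1, big_endian):
    """Continue the WAL checksum (s0, s1) over `data` (length multiple of 8)."""
    fmt = (">" if big_endian else "<") + "%dI" % (len(data) // 4)
    words = struct.unpack(fmt, data)
    for i in range(0, len(words), 2):
        s0 = (s0 + words[i] + s1) & 0xFFFFFFFF
        s1 = (s1 + words[i + 1] + s0) & 0xFFFFFFFF
    return s0, s1

def scan_wal(wal):
    """Return one (ok, is_commit) tuple per complete frame in the WAL bytes."""
    if len(wal) < WAL_HEADER_SIZE:
        raise ValueError("WAL shorter than its header")
    # Header fields are always stored big-endian.
    magic, _ver, page_size, _seq, salt1, salt2, hck0, hck1 = struct.unpack(
        ">8I", wal[:WAL_HEADER_SIZE])
    if magic not in (WAL_MAGIC_LE, WAL_MAGIC_BE):
        raise ValueError("not a WAL file")
    big_endian = magic == WAL_MAGIC_BE
    seed = wal_checksum(wal[:24], 0, 0, big_endian)
    if seed != (hck0, hck1):
        raise ValueError("WAL header checksum mismatch")
    frame_size = FRAME_HEADER_SIZE + page_size
    frames = []
    offset = WAL_HEADER_SIZE
    while offset + frame_size <= len(wal):
        header = wal[offset:offset + FRAME_HEADER_SIZE]
        _pgno, db_size, f_salt1, f_salt2, ck0, ck1 = struct.unpack(">6I", header)
        page = wal[offset + FRAME_HEADER_SIZE:offset + frame_size]
        # Checksum covers the first 8 header bytes plus the page data.
        computed = wal_checksum(header[:8], seed[0], seed[1], big_endian)
        computed = wal_checksum(page, computed[0], computed[1], big_endian)
        ok = (f_salt1, f_salt2) == (salt1, salt2) and computed == (ck0, ck1)
        frames.append((ok, db_size != 0))  # nonzero db size marks a commit frame
        # Assumption: seed the next frame from the *stored* checksum.
        seed = (ck0, ck1)
        offset += frame_size
    return frames

def valid_commits_after_damage(frames):
    """Indexes of valid commit frames that follow the first invalid frame."""
    first_bad = next((i for i, (ok, _) in enumerate(frames) if not ok), None)
    if first_bad is None:
        return []
    return [i for i, (ok, commit) in enumerate(frames)
            if i > first_bad and ok and commit]
```

If `valid_commits_after_damage(scan_wal(data))` is non-empty, the WAL contains commits that default recovery would discard, which seems like a reasonable trigger for a warning. Frames left over from an earlier WAL generation carry different salts and are reported as invalid here, so run it on a copy and treat the result as a hint rather than proof.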



