
Offline high-speed data ingestion of multi-thousand-file, multi-hundred-GB data sets, followed by rapid transfer to permanent online storage (and replication fan-out, etc).

Seems convenient to allow optimization for high-speed sequential reads and random reads/writes at different parts of the life cycle, along with indexing, CRCs, signatures, etc.

One big issue with zip storing the index at the end, of course, is that a truncated file loses most of its content and is generally unrecoverable even in part, which this could also help with from a durability perspective.
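A toy demonstration of that failure mode with Python's stdlib `zipfile` (the archive contents here are made up for illustration): the central directory and end-of-central-directory (EOCD) record live at the tail of the file, so chopping the file in half leaves the data bytes intact but makes the archive unreadable as a whole.

```python
import io
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory signature (0x06054b50, little-endian)

def make_zip() -> bytes:
    """Build a small in-memory zip with two arbitrary sample entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "hello")
        zf.writestr("b.txt", "world")
    return buf.getvalue()

data = make_zip()
print(EOCD_SIG in data)        # the index record sits at the end of the file

# Simulate a truncated download: keep only the first half of the bytes.
truncated = data[: len(data) // 2]
print(EOCD_SIG in truncated)   # the directory and EOCD are gone

try:
    zipfile.ZipFile(io.BytesIO(truncated))
except zipfile.BadZipFile as e:
    # Most of the file data is still physically present, but without the
    # trailing index a standard reader rejects the archive outright.
    print("unreadable:", e)
```

The local file headers in front of each entry do make partial salvage possible with special tooling, but as the comment says, a standard reader treats the truncated file as a total loss.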

Storing it at the beginning (without an end pointer) opens up the possibility that you have a valid-looking archive that is actually truncated and missing a lot of data, and you won't know until you look past the end (or validate total bytes or whatever, which doesn't work well when streaming).

Storing the index at the beginning, with a pointer and file signature at the end, plus all the other format extensions, does solve all of this. Which is convenient.
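For context on why the trailing index is load-bearing in standard zip: per the format spec, the EOCD record may be followed by a comment of up to 65,535 bytes, so readers locate the index by scanning backward from the end of the file for its signature, then following the stored offset to the central directory. A minimal sketch of that lookup:

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"         # end-of-central-directory signature
CENTRAL_DIR_SIG = b"PK\x01\x02"  # central directory file header signature

def find_central_directory(data: bytes):
    """Return (eocd_index, central_directory_offset), or None if no EOCD found."""
    # EOCD is at least 22 bytes and may be followed by a comment of up to
    # 65,535 bytes, so scan backward over at most that window.
    window = min(len(data), 22 + 65535)
    idx = data.rfind(EOCD_SIG, len(data) - window)
    if idx == -1:
        return None
    # The central directory's offset from the start of the file is a
    # little-endian u32 at byte 16 of the EOCD record (ignoring zip64 here).
    (cd_offset,) = struct.unpack_from("<I", data, idx + 16)
    return idx, cd_offset

# Build an arbitrary sample archive in memory to probe.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example.txt", "payload")
data = buf.getvalue()

idx, cd_offset = find_central_directory(data)
print(data[cd_offset : cd_offset + 4] == CENTRAL_DIR_SIG)  # offset lands on the index
```

A front-of-file index with an end-of-file pointer and signature, as described above, gives you the same one-seek lookup while also letting a reader prove the file isn't truncated before trusting the index.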



Neat, let me know if you have any further questions. Would love to make this a more common thing that happens to zip files. As a result of me doing this in Firefox, zip utilities (e.g. 7-Zip) started complaining a lot less about this creative interpretation of the standard :)



