
Thanks for sharing your experience. It's non-trivial and surprising behavior like this that drove me to build a custom system[0] myself.

When I started researching version control tools for large files, I remember feeling like git-annex and Git LFS were awkwardly bolted onto Git; Git simply wasn't designed for large files. Then I found DVC[1], and its approach rang true for me. However, after using DVC for a year or so, I grew tired of its many puzzling behaviors (most of which are outlined in the README at [0]). In the end, I built the tool I wanted for the job -- one that is exceptionally simple and fast.

[0]: https://github.com/kevin-hanselman/dud

[1]: https://dvc.org/



That's an interesting system, but it is pretty "in your face" -- it seems to take over some git operations and other steps.

We went the other way and made an "invisible" system -- there were two scripts: the first would move a data file to the cache, upload it, and create a metadata file that needed to be checked in; the second would take a list of metadata files and make sure each of them was downloaded to the cache. We then modified all of our code to call the second script before trying to read large files.
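
Roughly, the pair of scripts could look like the minimal Python sketch below. The cache location, bucket URL, and function names are assumptions for illustration, not our actual implementation:

    #!/usr/bin/env python3
    # Hypothetical sketch of the two-script scheme described above.
    import hashlib
    import json
    import shutil
    import subprocess
    from pathlib import Path

    CACHE = Path.home() / ".large-file-cache"   # assumed local cache dir
    REMOTE = "s3://example-bucket/large-files"  # assumed central storage

    def sha256(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def add_large_file(path: Path) -> None:
        # Script 1: move the file into the content-addressed cache,
        # upload it, and leave a small metadata file to check in.
        digest = sha256(path)
        CACHE.mkdir(parents=True, exist_ok=True)
        cached = CACHE / digest
        shutil.move(str(path), str(cached))
        subprocess.run(["aws", "s3", "cp", str(cached), f"{REMOTE}/{digest}"],
                       check=True)
        meta = path.with_suffix(path.suffix + ".meta")
        meta.write_text(json.dumps({"sha256": digest,
                                    "size": cached.stat().st_size}))

    def fetch_large_files(meta_paths: list[Path]) -> list[Path]:
        # Script 2: given metadata files, download any blobs missing
        # from the cache and return the cached paths.
        paths = []
        for meta in meta_paths:
            digest = json.loads(meta.read_text())["sha256"]
            cached = CACHE / digest
            if not cached.exists():
                CACHE.mkdir(parents=True, exist_ok=True)
                subprocess.run(["aws", "s3", "cp", f"{REMOTE}/{digest}",
                                str(cached)], check=True)
            paths.append(cached)
        return paths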

The overall experience was that no one had to know anything about our large-file system unless they wanted to add a new file. It was just that, every now and then, you would run a program and it would automatically download a few large data files.
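
For example, with the hypothetical sketch above, consuming code only needs one extra call before opening a file, and the download happens only on a cache miss:

    # In application code: resolve metadata to a cached blob first.
    [weights] = fetch_large_files([Path("models/weights.bin.meta")])
    data = weights.read_bytes()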

(We could get away with such a simple design because (1) we had centralized large-file storage, (2) all of our large files were immutable, and (3) they were only consumed by our own code, which we could modify.)



