
Unfortunately I don't have deep knowledge of our infrastructure, so I can't answer all of these questions myself.

That said, a decent chunk of this info can be found in the discussions on our Infrastructure issue tracker[1].

The last infrastructure update[2] includes some slide decks that contain more data (although it's now a little under 2 months old).

According to our internal Grafana instance, we're using about 1.25 TiB combined on NFS and just under 16 TiB on Ceph. We plan to migrate the data currently hosted on Ceph back to NFS soon[3].

I'll get someone from the infrastructure team to respond with more info.

[1]: https://gitlab.com/gitlab-com/infrastructure/issues

[2]: https://about.gitlab.com/2016/09/26/infrastructure-update/

[3]: https://gitlab.com/gitlab-com/infrastructure/issues/711



> If one of the hosts delays writing to the journal, then the rest of the fleet is waiting for that operation alone, and the whole file system is blocked. When this happens, all of the hosts halt, and you have a locked file system; no one can read or write anything and that basically takes everything down.

I have seen similar issues where a GC pause on one server freezes the entire cluster.

Is this a single monolithic file system? On the service side, could the code be asynchronous, with a separate request queue for each shard? That would keep threads from blocking on one slow shard, so they could keep serving requests for the other shards.
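
To make the per-shard queue idea concrete, here is a minimal sketch in Python's asyncio. It is only an illustration, not anything from the GitLab codebase: the shard count, the `shard_worker`/`serve`/`demo` names, and the `delays` map that simulates a slow storage backend are all hypothetical.

```python
import asyncio

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the demo


async def shard_worker(shard_id, queue, delays, results):
    """Drain one shard's queue; a stall here delays only this shard."""
    while True:
        request = await queue.get()
        # Simulated storage latency (assumption: stands in for a real I/O call).
        await asyncio.sleep(delays.get(shard_id, 0.0))
        results.append((shard_id, request))
        queue.task_done()


async def serve(requests, delays):
    """Route each (shard_id, request) pair to its shard's queue, then wait."""
    results = []
    queues = [asyncio.Queue() for _ in range(NUM_SHARDS)]
    workers = [
        asyncio.create_task(shard_worker(i, q, delays, results))
        for i, q in enumerate(queues)
    ]
    for shard_id, request in requests:
        queues[shard_id].put_nowait(request)
    # Wait until every queue has been fully processed.
    await asyncio.gather(*(q.join() for q in queues))
    # Workers loop forever, so cancel them once the queues are drained.
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


def demo():
    # Shard 0 is slow; shards 1-3 respond immediately.
    requests = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
    return asyncio.run(serve(requests, delays={0: 0.2}))
```

With this layout the request for the slow shard finishes last, while the requests for the other shards complete without waiting on it. Whether this helps in practice depends on the shards really being independent; if they all sit on one shared file system that locks as a whole, separate queues would not remove the stall.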



