
I imagine you need to make and destroy sandboxed environments quite often. How fast does your code create a sandboxed environment?

Do you make the environments on demand or do you make them preemptively so that one is ready to go the moment that it is needed?

If you make them on demand, have you tested ZFS snapshots to see whether environments can be created even faster with zfs clone?



Sorry for the delay in replying!

We actually use gVisor (as stated in the article) and it has a very nifty feature called checkpoint_restore (https://gvisor.dev/docs/user_guide/checkpoint_restore/) which lets us start up sandboxes extremely efficiently. Then the filesystem is just a CoW overlay.
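For anyone curious, the flow with runsc looks roughly like the sketch below. This is a sketch only: the paths, container IDs, and wrapper functions are hypothetical, not our actual setup.

    import subprocess

    IMAGE_DIR = "/var/lib/sandboxes/base-image"  # hypothetical checkpoint directory

    def checkpoint(container_id: str) -> None:
        # Freeze a fully warmed-up sandbox to disk once, ahead of time.
        subprocess.run(
            ["runsc", "checkpoint", f"--image-path={IMAGE_DIR}", container_id],
            check=True,
        )

    def restore(container_id: str, bundle_dir: str) -> None:
        # New sandboxes resume from the saved image instead of
        # cold-booting, which is where the startup savings come from.
        subprocess.run(
            ["runsc", "restore", f"--image-path={IMAGE_DIR}",
             f"--bundle={bundle_dir}", container_id],
            check=True,
        )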


Thanks for the response. I had misread the article's description of gVisor and mistook it for something meant to protect the rest of the system, rather than something that also handled the filesystem part of the sandbox. It is an interesting tool.


What’s ZFS? That doesn’t sound like a Google internal tool I’ve ever heard of.


https://en.wikipedia.org/wiki/ZFS

It's a filesystem, to put it simply.


Oh boy. Get ready for the zealots.


Seconding this. Also curious whether this is done with unikernels (I put Unikraft high on the list of tech I'd use for this kind of problem, or possibly the still-in-beta CodeSandbox SDK; maybe E2B or Fly, but I didn't have as good an experience with those).


I use ZFS, but isn't a sandbox's situation totally different? Why would ZFS be optimal here?


If you are making sandboxes, you need to put the files in place each time. With ZFS clones, you keep referencing the same files repeatedly, so the writes needed to create an environment are minimized. Say the sandbox is 1GB and each clone operation writes less than 1MB; that is a >1000x reduction in the writing needed to make the environment.
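Concretely, the flow would be something like the sketch below, assuming a pre-built golden snapshot. The pool and dataset names are made up, and it assumes the parent dataset for clones already exists.

    import subprocess

    def run(*args: str) -> None:
        subprocess.run(args, check=True)

    # One-time setup: freeze the 1GB base image as a snapshot.
    run("zfs", "snapshot", "tank/sandbox-base@golden")

    def create_sandbox(sandbox_id: str) -> str:
        # The clone writes only a little metadata; all file data keeps
        # referencing the golden snapshot until the sandbox modifies it.
        dataset = f"tank/sandboxes/{sandbox_id}"
        run("zfs", "clone", "tank/sandbox-base@golden", dataset)
        return dataset

    def destroy_sandbox(sandbox_id: str) -> None:
        run("zfs", "destroy", f"tank/sandboxes/{sandbox_id}")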

Furthermore, ZFS ARC should treat each read of the same files as reading the same thing, while a sandbox made the traditional way would treat the files as unique, since they would be full copies of each other rather than references. ZFS, on the other hand, only needs to keep a single cached copy of the files for all environments, which reduces memory requirements dramatically. Unfortunately, the driver double-caches mmap()'ed reads, but the duplication only covers the files actually accessed, and the copies come from memory rather than disk. A modified driver (e.g. OSv style) could eliminate the double caching for mmap()'ed reads, but that is a future enhancement.

In any case, ZFS clones should have clear advantages over the more obvious way of extracting a tarball every time you need to make a new sandbox for a Python execution environment.


It's worth noting that if you go down a layer, LVM snapshots are filesystem-independent.
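For example, something like this snapshots an arbitrary LV regardless of which filesystem it holds (VG/LV names invented):

    import subprocess

    # "--size" preallocates the CoW area that holds changed chunks; the
    # origin LV can contain any filesystem, hence filesystem-independent.
    subprocess.run(
        ["lvcreate", "--snapshot", "--size", "1G",
         "--name", "sandbox-snap", "vg0/sandbox-base"],
        check=True,
    )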


You need to preallocate space on LVM2 for storing changes, and if that space fills up, bad things happen (the snapshot is invalidated). You also get write amplification of 4MB per write by default on LVM2, since it isn't aware of the filesystem structures, while ZFS just writes what is needed. And the cache advantages all disappear if you use LVM2. Correct me if I am wrong.

That said, if you really want to use block devices, you could use zvols to get something similar to LVM2 out of ZFS, though it is not as good as using snapshots on ZFS filesystems. The write amplification would be lower by default (8KB versus 4MB). The page cache would still duplicate data, but the buffer cache duplication should be bypassed, if I recall correctly.
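A rough sketch of that zvol route (names hypothetical):

    import subprocess

    def run(*args: str) -> None:
        subprocess.run(args, check=True)

    # A zvol is a block device backed by ZFS; volblocksize sets its CoW
    # granularity (8KB here, per the default discussed above).
    run("zfs", "create", "-V", "10G", "-o", "volblocksize=8k", "tank/sandbox-vol")

    # zvols snapshot and clone just like ZFS filesystems:
    run("zfs", "snapshot", "tank/sandbox-vol@golden")
    run("zfs", "clone", "tank/sandbox-vol@golden", "tank/sandbox-vol-1")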


I believe they were referring to the use of ZFS snapshots for a copy-on-write-style setup.



