A really powerful server should not cost you anywhere near $40k unless you're renting bare metal in AWS or something like that.
Getting rid of the overhead is possible but hard, unless you're willing to sacrifice things people really want.
1. Docker. It adds a few hundred milliseconds of startup time to containers, plus configuration complexity, daemons, disk caches to manage, repositories... a lot of stuff. In rigorously controlled corporate environments it isn't needed: you can just have a base OS distro that's managed centrally and tell people to target it. If they're building on e.g. the JVM then Docker isn't adding much. I don't use it on my own company's CI cluster, for example; it's just raw TeamCity agents on raw machines. (A rough way to measure that startup overhead is sketched after this list.)
2. VMs. Clouds need them because they don't trust the Linux kernel to isolate customers from each other, and they want to buy the biggest machines possible and then subdivide them; that's how their business model works. You can solve this a few ways. One is something like Firecracker, where they make a super bare-bones VM (a sketch of booting a runner that way follows this list). Another would be to make a super-hardened version of Linux, so hardened that people trust it to provide inter-tenant isolation. Another would be a clean-room kernel designed for security from day one (e.g. written in Rust, Java or C#?).
3. Drives on a distributed network. Honestly I'm not sure why this is needed. For CI runners, entirely ephemeral VMs running off read-only root drive images should be fine; they could swap to local NVMe storage (see the boot-hook sketch after this list). I think the big clouds don't always like to offer this because they have a lot of machines with no local storage whatsoever, as that increases density and allows storage aggregation/bin-packing, which lowers their costs.
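On the Docker point: here's a minimal sketch of how you might eyeball that container startup overhead yourself, just timing a trivial command run bare versus via `docker run`. The exact numbers depend entirely on the host, the storage, and whether the image is already cached; this only shows where the overhead lands, not what it "is".

```python
#!/usr/bin/env python3
"""Rough sketch: compare the wall-clock cost of spawning a trivial process
directly vs. inside `docker run`. Assumes Docker is installed and the
alpine image is already pulled, so we're measuring container setup rather
than network time."""
import subprocess
import time

RUNS = 20

def timed(cmd):
    """Run cmd RUNS times and return the mean wall-clock seconds per run."""
    start = time.monotonic()
    for _ in range(RUNS):
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return (time.monotonic() - start) / RUNS

# Bare process: essentially just fork+exec.
bare = timed(["/bin/true"])

# Same (trivial) workload inside a throwaway container.
docker = timed(["docker", "run", "--rm", "alpine", "true"])

print(f"bare exec : {bare * 1000:7.1f} ms per run")
print(f"docker run: {docker * 1000:7.1f} ms per run")
print(f"overhead  : {(docker - bare) * 1000:7.1f} ms per run")
```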
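On the Firecracker point: a sketch of what booting an ephemeral runner as a microVM with a read-only root image looks like. The endpoint and field names follow the Firecracker getting-started docs as I remember them, so check them against the current API spec; the kernel/rootfs paths and socket location are placeholders, and it assumes a `firecracker` process is already listening on the API socket (e.g. `firecracker --api-sock /tmp/fc.sock`).

```python
#!/usr/bin/env python3
"""Rough sketch: configure and start a Firecracker microVM over its unix
socket API (via curl, to avoid extra dependencies). Paths are placeholders."""
import json
import subprocess

API_SOCK = "/tmp/fc.sock"               # placeholder socket path
KERNEL = "/srv/ci/vmlinux"              # placeholder uncompressed kernel
ROOTFS = "/srv/ci/runner-rootfs.ext4"   # placeholder read-only root image

def api_put(path, body):
    """PUT a JSON body to the Firecracker API over its unix socket."""
    subprocess.run(
        ["curl", "--fail", "--silent", "--unix-socket", API_SOCK,
         "-X", "PUT", f"http://localhost{path}",
         "-H", "Content-Type: application/json",
         "-d", json.dumps(body)],
        check=True,
    )

# Small, fixed machine shape: the point is fast boot, not flexibility.
api_put("/machine-config", {"vcpu_count": 2, "mem_size_mib": 2048})

# Kernel plus minimal boot args: serial console only, no PCI.
api_put("/boot-source", {
    "kernel_image_path": KERNEL,
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
})

# Read-only root drive: every job gets an identical, disposable runner.
api_put("/drives/rootfs", {
    "drive_id": "rootfs",
    "path_on_host": ROOTFS,
    "is_root_device": True,
    "is_read_only": True,
})

# Start the microVM; tear the whole thing down when the job finishes.
api_put("/actions", {"action_type": "InstanceStart"})
```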
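And on the local-NVMe point: a sketch of the kind of boot hook such a runner could use, where the root image stays read-only and a host-local NVMe device is wiped and mounted as the build workspace on every boot. The device path and mount point are assumptions (providers expose scratch disks under different names); nothing on the disk is expected to survive the VM.

```python
#!/usr/bin/env python3
"""Rough sketch of an ephemeral runner boot hook: format a local NVMe
scratch device fresh on every boot and mount it as the build workspace.
SCRATCH_DEV and SCRATCH_MNT are assumptions, not a real provider layout."""
import subprocess

SCRATCH_DEV = "/dev/nvme1n1"   # assumed local NVMe scratch device
SCRATCH_MNT = "/builds"        # where CI jobs do their work

def run(*cmd):
    subprocess.run(cmd, check=True)

# Fresh filesystem every boot: no state carries over between jobs.
run("mkfs.ext4", "-F", SCRATCH_DEV)

run("mkdir", "-p", SCRATCH_MNT)
# noatime is a nice-to-have for build workloads, not a requirement.
run("mount", "-o", "noatime", SCRATCH_DEV, SCRATCH_MNT)

# Point the CI agent's work and temp dirs at the local disk.
run("mkdir", "-p", f"{SCRATCH_MNT}/work", f"{SCRATCH_MNT}/tmp")
print(f"scratch ready on {SCRATCH_MNT}")
```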
Basically, a big driver of the overhead is that people want to be in the big clouds because it avoids the need to do long-term planning or commit capital spend to CI. But the cloud is so popular that providers want to pack everyone in as tightly as possible, which requires strong isolation and pushes them to avoid arbitrary boundaries imposed by physical hardware shapes.