Yes, but usage is not uniform even when you have millions of users. More users smooth out the usage curve, but the gap between peak and trough grows the more users you have. At 3am, usage in the US drops to effectively zero. Maybe you can sell that idle compute to customers in Asia, but then you are competing with local compute that has far better latency.
Then you have seasonal peaks/troughs, such as the school year vs summer.
When you want four nines (99.99%) of uptime and good latency, you either have to overprovision hardware and eat the idling costs, or rent compute and pay the overhead. Both cost a lot.
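For a sense of how tight that uptime target is, here is a rough sketch of the yearly downtime budget. The function name and the 365-day year are my own simplifications, not anything from a specific SLA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines.

    Four nines means 99.99% availability, i.e. an unavailable
    fraction of 10**-4.
    """
    return MINUTES_PER_YEAR * 10 ** -nines


if __name__ == "__main__":
    budget = downtime_budget_minutes(4)
    print(f"Four nines allows about {budget:.1f} minutes of downtime per year")
```

That is under an hour per year, which is why there is so little room to run capacity close to the limit.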
> But most people want output now, not in 10 hours.
At 65 t/s, that's roughly 2.3 million tokens of output.
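A quick back-of-the-envelope check of that figure, assuming a constant 65 tokens per second sustained for the full 10 hours (the function name is just for illustration):

```python
SECONDS_PER_HOUR = 3600


def tokens_generated(tokens_per_second: float, hours: float) -> float:
    """Total tokens produced at a constant generation rate."""
    return tokens_per_second * hours * SECONDS_PER_HOUR


if __name__ == "__main__":
    total = tokens_generated(65, 10)
    print(f"{total:,.0f} tokens")  # 2,340,000 tokens
```

So the product comes out near 2.3 million tokens, in the same ballpark as the figure above.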