
When you combine that with serving millions of users, it also gets amortised across all of them.

> But most people want output now, not in 10 hours.

At 65 t/s, 10 hours is roughly 2.3 million output tokens.
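
A quick check of that arithmetic, as a minimal sketch assuming a constant 65 tokens/s sustained for the full 10 hours (the function name is only illustrative):

```python
# Figures taken from the comments above: 65 tokens/s, 10-hour window.
# Assumes the rate is constant for the whole window.
TOKENS_PER_SECOND = 65
HOURS = 10


def tokens_generated(tokens_per_second: float, hours: float) -> int:
    """Total tokens produced at a constant rate over the given number of hours."""
    return round(tokens_per_second * hours * 3600)


total = tokens_generated(TOKENS_PER_SECOND, HOURS)
print(f"{total:,} tokens")  # 2,340,000 tokens
```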



Yes, but usage is not uniform even when you have millions of users. A larger user base smooths the usage curve, but the peaks and troughs get larger in absolute terms the more users you have. At 3am, usage in the US drops to effectively zero. Maybe you can use the compute for customers in Asia, but then you compete with local compute that has far better latency.

Then you have seasonal peaks/troughs, such as the school year vs summer.

When you want four nines of uptime and good latency, you either have to overprovision hardware and eat the idle costs, or rent compute and pay the overhead. Both cost a lot.
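
For a sense of how tight that uptime target is, here is a rough sketch of the yearly downtime budget. It assumes a 365-day year and counts any unavailability against the budget; the helper name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year, an assumption


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year, in minutes, for an availability of N nines.

    Four nines means 99.99% availability, i.e. 1/10**4 of the year may be down.
    """
    return MINUTES_PER_YEAR / 10 ** nines


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.1f} minutes/year")
```

At four nines that works out to under an hour of downtime per year, which is why there is so little room to run capacity close to the peak.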



