
To borrow a concept from cloud server rentals, there's also the factor of overselling. Most open-source LLM operators probably oversell quite a bit: they don't scale up resources as fast as OpenAI/Anthropic do when requests increase. I notice many OpenRouter providers are noticeably faster during off-hours.

In other words, it's not just model size; it's also concurrent load and how many GPUs you keep running at any given time. I bet the big players' costs are quite a bit higher than the numbers on OpenRouter, even for comparable parameter counts.
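The utilization effect above can be sketched with some back-of-the-envelope arithmetic. All numbers here are made-up assumptions for illustration (hypothetical GPU hourly price, throughput, and utilization figures), not real provider data; the point is only that amortized cost per token rises as average utilization falls.

```python
# Illustrative sketch: amortized serving cost per million tokens as a
# function of average GPU utilization. All inputs are hypothetical.

def cost_per_million_tokens(gpu_hour_usd: float,
                            tokens_per_gpu_second: float,
                            utilization: float) -> float:
    """Cost of a million tokens when a GPU is only busy a fraction of the time."""
    tokens_per_gpu_hour = tokens_per_gpu_second * 3600 * utilization
    return gpu_hour_usd / tokens_per_gpu_hour * 1_000_000

# Hypothetical oversold provider: GPUs kept near saturation.
oversold = cost_per_million_tokens(gpu_hour_usd=2.0,
                                   tokens_per_gpu_second=500,
                                   utilization=0.9)

# Hypothetical big-lab setup: spare capacity held back for peak demand.
headroom = cost_per_million_tokens(gpu_hour_usd=2.0,
                                   tokens_per_gpu_second=500,
                                   utilization=0.4)

print(f"oversold: ${oversold:.2f} / M tokens")   # ~ $1.23
print(f"headroom: ${headroom:.2f} / M tokens")   # ~ $2.78
```

Under these toy numbers, the provider holding idle headroom pays more than twice as much per token as the one running its GPUs hot, which is the gap the comment is pointing at.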


