Depending on how powerful your model is, even a few tokens per second per data center could be extraordinarily valuable. It's not out of the realm of possibility that a next-generation superintelligence could be trained with a couple hundred lines of PyTorch. A program that size is on the order of a few thousand tokens, so even at two tokens per second it would arrive in under an hour. If that's the case, a couple of tokens per second per data center is a steal.