On a CPU I'd estimate a maximum of around 5 tokens per second (a token being a sub-word unit, generally a few characters). On the large model, I suspect it'd be more like 1 token per second without additional optimisation.
Yes, models can be split up; see e.g. Hugging Face Accelerate.
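As a rough sketch, transformers can use Accelerate under the hood to shard a checkpoint across whatever devices you have (GPU, CPU RAM, even disk) via device_map="auto". The model name and offload folder below are just illustrative:

    # Minimal sketch: requires `pip install transformers accelerate`.
    # Accelerate decides per-layer placement across GPU/CPU/disk.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "facebook/opt-1.3b"  # example checkpoint, swap in your own
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",         # let Accelerate place layers per device
        offload_folder="offload",  # spill layers to disk if RAM runs out
    )

    inputs = tokenizer("Hello,", return_tensors="pt")
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))

Layers that don't fit on the GPU land in CPU RAM, and anything beyond that gets offloaded to disk, which is exactly why the tokens-per-second numbers above drop so sharply for big models.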
I'd expect significant performance improvements over the next few months as more people work on this, in the same way that Stable Diffusion is now fairly usable on a CPU. It's always going to be slow on a CPU, but the smaller models might be usable for experimentation at some point.