I'm out of the loop on local models. For my M3 MacBook with 24 GB of RAM, what token throughput can I expect?
Edit: I tried it out. I have no idea in terms of tokens per second, but it was fluid enough for me. A bit slower than using o3 in the browser, but definitely tolerable. I think I will set it up on my GF's machine so she can stop paying for the full subscription (she's a non-tech professional).
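If anyone wants an actual tok/s number rather than a feel, Ollama reports generation stats in its API response. Here's a minimal sketch that hits the local server and works it out; the model tag and prompt are just placeholders, swap in whatever you pulled:

```python
# Rough tokens/sec check against a local Ollama server (default port 11434).
import json
import urllib.request

MODEL = "gpt-oss:20b"  # placeholder: use the model tag you actually pulled

payload = json.dumps({
    "model": MODEL,
    "prompt": "Explain what a token is in one paragraph.",
    "stream": False,
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

# Ollama reports eval_duration in nanoseconds.
tokens = data["eval_count"]
seconds = data["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tok/s")
```

(You can also just run the model with `ollama run <model> --verbose` and read the eval rate it prints at the end.)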
Yeah, it was super quick and easy to set up using Ollama. I had to kill some processes first to avoid memory swap, though (even with 128 GB of memory). So a slightly more quantized version is maybe ideal, for me at least.
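Before dropping to a smaller quant, it can help to see how much memory the loaded model is actually taking. A small sketch against Ollama's local /api/ps endpoint (field names as in the standard Ollama API; sizes assumed to be bytes):

```python
# List what Ollama currently has loaded and its resident size,
# useful for deciding whether a smaller quantized tag is needed.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/ps") as resp:
    info = json.load(resp)

for m in info.get("models", []):
    gb = m["size"] / 1e9  # reported in bytes
    print(f"{m['name']}: {gb:.1f} GB resident")
```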