
I'm out of the loop on local models. For my M3 MacBook with 24GB RAM, what token throughput can I expect?

Edit: I tried it out. I have no idea what the token rate was, but it was fluid enough for me. A bit slower than using o3 in the browser, but definitely tolerable. I think I will set it up on my GF's machine so she can stop paying for the full subscription (she's a non-tech professional).
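For anyone who wants a number rather than a feel: Ollama's local HTTP API reports eval_count (generated tokens) and eval_duration (decode time, in nanoseconds) for each request, so throughput is easy to compute. A minimal sketch, assuming Ollama is serving on its default port and that the gpt-oss:20b tag (used here only as an example, substitute whatever you actually pulled) is available:

    # Rough throughput check against a local Ollama server.
    # Assumes the server is running on the default port and the model tag below is pulled.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "gpt-oss:20b",  # example tag; swap in the model you downloaded
            "prompt": "Explain the difference between prefill and decode in one paragraph.",
            "stream": False,
        },
        timeout=600,
    )
    data = resp.json()

    tokens = data["eval_count"]            # generated tokens
    seconds = data["eval_duration"] / 1e9  # decode time is reported in nanoseconds
    print(f"{tokens} tokens in {seconds:.1f} s -> {tokens / seconds:.1f} tok/s")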



Apple M4 Pro w/ 48GB running the smaller version. I'm getting 43.7 t/s.


Curious if anyone is running this on an AMD Ryzen AI Max+ 395 and knows the t/s.


3-year-old M1 MacBook Pro with 32GB: 42 tokens/sec in LM Studio.

Very much usable


Wondering the same for my M4 Max with 128GB.


It should fly on your machine


Yeah, it was super quick and easy to set up using Ollama. I had to kill some processes first to avoid memory swap though (even with 128GB of memory), so a slightly more quantized version may be ideal, for me at least.

Edit: I'm talking about the 120B model of course
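A rough way to sanity-check whether a given quant will fit before you hit swap: weight memory is roughly parameter count times bits per weight divided by 8, plus headroom for the KV cache and runtime. A back-of-the-envelope sketch; the parameter counts and the bit widths below are illustrative assumptions, not measurements of any particular build:

    # Back-of-the-envelope check: will the weights fit in unified memory?
    # Parameter counts and bit widths below are rough assumptions for illustration.
    def weight_gb(params_billion: float, bits_per_weight: float) -> float:
        """Approximate weight memory in GB for a given quantization."""
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    for name, params_b in [("~20B model", 20), ("~120B model", 120)]:
        for bits in (4, 8, 16):
            gb = weight_gb(params_b, bits)
            print(f"{name} @ {bits}-bit: ~{gb:.0f} GB of weights "
                  f"(plus KV cache and runtime overhead)")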


40 t/s




