Hacker News

Did you run it the best way possible? I'm no expert, but I understand that the format/engine used can affect inference time greatly.


I ran it via Ollama, which I assume uses a reasonable default. Screenshot in my post here: https://bsky.app/profile/pamelafox.bsky.social/post/3lvobol3...

I'm still wondering why my GPU usage was so low... maybe Ollama isn't optimized for running it yet?


Might need to wait on MLX support.
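For comparison, here's a minimal sketch of trying both backends on a Mac. The model names are placeholders, not from the thread; the idea is that Ollama serves GGUF weights through its llama.cpp backend, while the mlx-lm package runs MLX-converted weights natively on Apple Silicon, which can use the GPU more fully for some models.

```shell
# Ollama: pulls GGUF weights and runs them via its llama.cpp backend
# <model> is a placeholder for whatever model you're benchmarking
ollama run <model> "Hello"

# mlx-lm: Apple's MLX framework, native to Apple Silicon
# <mlx-model> is a placeholder for an MLX-converted checkpoint
pip install mlx-lm
mlx_lm.generate --model <mlx-model> --prompt "Hello"
```

Worth timing both on the same prompt; the gap between engines can be significant on the same hardware.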




