For reference: the llama.cpp people are not actually computing at lower precision. Most of those models store the weights quantized, but the arithmetic runs on 32-bit floats, with dequantization happening on the fly.
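Roughly, the idea looks like the sketch below. This is a minimal illustration modeled on a Q4_0-style layout (blocks of 32 four-bit weights sharing one scale), not llama.cpp's actual structs; the names, the float scale (llama.cpp uses fp16 there), and the nibble ordering are all assumptions for readability.

    #include <stdint.h>

    #define BLOCK_SIZE 32

    /* Hypothetical quantized block: 32 weights packed as 4-bit
       nibbles, plus one shared scale. Illustrative only. */
    typedef struct {
        float   d;                    /* per-block scale */
        uint8_t qs[BLOCK_SIZE / 2];   /* two 4-bit weights per byte */
    } block_q4;

    /* Expand one block to 32-bit floats; the matmul that follows
       then runs entirely in fp32. */
    static void dequantize_block(const block_q4 *b, float *out) {
        for (int i = 0; i < BLOCK_SIZE / 2; ++i) {
            const int lo = (b->qs[i] & 0x0F) - 8;  /* low nibble  */
            const int hi = (b->qs[i] >> 4)   - 8;  /* high nibble */
            out[2 * i + 0] = lo * b->d;
            out[2 * i + 1] = hi * b->d;
        }
    }

The point is that the 4-bit format is a storage/bandwidth optimization: weights are expanded back to floats block by block just before use, so the math itself never happens in a smaller number format.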

