For reference: the llama.cpp people are not actually computing at lower precision. Most of those models store the weights quantized, but the arithmetic runs on 32-bit floats, with dequantization happening on the fly.
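Roughly, the idea looks like the sketch below. This is a minimal illustration modeled on a Q4_0-style layout (blocks of 32 four-bit weights sharing one scale), not llama.cpp's actual structs; the names, the float scale (llama.cpp uses fp16 there), and the nibble ordering are all assumptions for readability.

    #include <stdint.h>

    #define BLOCK_SIZE 32

    /* Hypothetical quantized block: 32 weights packed as 4-bit
       nibbles, plus one shared scale. Illustrative only. */
    typedef struct {
        float   d;                    /* per-block scale */
        uint8_t qs[BLOCK_SIZE / 2];   /* two 4-bit weights per byte */
    } block_q4;

    /* Expand one block to 32-bit floats; the matmul that follows
       then runs entirely in fp32. */
    static void dequantize_block(const block_q4 *b, float *out) {
        for (int i = 0; i < BLOCK_SIZE / 2; ++i) {
            const int lo = (b->qs[i] & 0x0F) - 8;  /* low nibble  */
            const int hi = (b->qs[i] >> 4)   - 8;  /* high nibble */
            out[2 * i + 0] = lo * b->d;
            out[2 * i + 1] = hi * b->d;
        }
    }

The point is that the 4-bit format is a storage/bandwidth optimization: weights are expanded back to floats block by block just before use, so the math itself never happens in a smaller number format.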

