I recently wrote a guide on getting: - llama.cpp - OpenCode - Qwen3-Coder-30B-A3...

freeone3000 · 2026-02-28T22:52:21 1772319141

We can also run LM Studio and get it installed with one search and one click, exposed through an OpenAI-compatible API.

kpw94 · 2026-02-28T23:07:01 1772320021

On my 32GB Ryzen desktop (recently upgraded from 16GB before the RAM prices went up another +40%), did the same setup of llama.cpp (with Vulkan extra steps) and also converged on Qwen3-Coder-30B-A3B-Instruct (also Q4_K_M quantization)

On the model choice: I've tried latest gemma, ministral, and a bunch of others. But qwen was definitely the most impressive (and much faster inference thanks to MoE architecture), so can't wait to try Qwen3.5-35B-A3B if it fits.

I've no clue about which quantization to pick though ... I picked Q4_K_M at random, was your choice of quantization more educated?

zargon · 2026-03-01T02:23:03 1772331783

Quant choice depends on your vram, use case, need for speed, etc. For coding I would not go below Q4_K_M (though for Q4, unsloth XL or ik_llama IQ quants are usually better at the same size). Preferably Q5 or even Q6.

robby_w_g · 2026-02-28T22:51:57 1772319117

Does your MBP have 32 GB of ram? I’m waiting on a local model that can run decently on 16 GB

copperx · 2026-02-28T22:24:51 1772317491

How fast does it run on your M1?