
Is there any solution that is 1- fully local, 2- open source, 3- fast on CPU only, and 4- provides reasonably good results for smart autocomplete?

I don't want my work to depend on proprietary or, even worse, online software. We (software engineers) got lucky that all the good tools are free software, and I feel we have a collective interest in making sure it stays that way (unless we want to be like farmers paying a Monsanto tax just to be able to work, because we no longer know how to work any other way).



Fast on CPU only just isn't realistic.

Open source, fast, and good: OpenRouter with open-source models (Qwen, Llama, etc.). It's not local, but there is no vendor lock-in; you can switch to another provider or invest in a GPU.
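For what it's worth, OpenRouter exposes an OpenAI-compatible API, so switching providers is mostly a base-URL change. A minimal sketch (the env var name and model slug here are just examples; check openrouter.ai for the exact model id):

    # pip install openai
    import os
    from openai import OpenAI

    # Point the standard OpenAI client at OpenRouter instead of api.openai.com
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],  # example env var name
    )

    resp = client.chat.completions.create(
        model="qwen/qwen-2.5-coder-32b-instruct",  # example slug; verify on openrouter.ai
        messages=[{"role": "user", "content": "Complete this function: def fib(n):"}],
    )
    print(resp.choices[0].message.content)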


> 3- fast on CPU only

Unless you've got a CPU with AI-specific accelerators and unified memory, I doubt you're going to find that.

I can't imagine any model under 7B parameters being useful, and even with dual-channel DDR5-6400 RAM (which I think is 102 GB/s?) and 8-bit quantization, you could only generate about 15 tokens/sec, and that's assuming your CPU can actually process that fast. Your memory bandwidth could easily be the bottleneck.
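For reference, the back-of-the-envelope math is just memory bandwidth divided by the bytes you have to stream per token (roughly the whole model at a given quantization). A quick sketch using the numbers above:

    # Decode speed is roughly memory-bandwidth-bound: every generated token
    # requires streaming (almost) all of the weights from RAM once.
    bandwidth_gb_s = 102      # dual-channel DDR5-6400, theoretical peak
    params_billion = 7        # 7B-parameter model
    bytes_per_param = 1       # 8-bit quantization
    model_gb = params_billion * bytes_per_param
    print(f"~{bandwidth_gb_s / model_gb:.0f} tokens/sec upper bound")  # ~15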

EDIT: If I have something wrong, I'd rather be corrected than silently downvoted, so that I'm not spreading incorrect information.


deepseek-1b, qwen2.5-coder:1.5b, and starcoder2-3b are all pretty fast on CPU due to their small size. You're not going to be able to have conversations with them or ask them to perform transformations on your code, but autocomplete should work well.
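If anyone wants to kick the tires on one of these locally, here's a minimal sketch against Ollama's HTTP API (assumes `ollama serve` is running and you've done e.g. `ollama pull qwen2.5-coder:1.5b`; the model tag is just an example):

    import requests

    # Ollama's generate endpoint; with "stream": False it returns a single JSON blob
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen2.5-coder:1.5b",
            "prompt": "def quicksort(arr):",
            "stream": False,
        },
    )
    print(resp.json()["response"])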


StarCoder 3B works great on my second-hand RTX 2080. Can’t run 7B, just a hair too little VRAM, but still great completions.


You should definitely be able to run a 7B at Q6_K, and that might be outperformed by a 15B with a sub-4bpw imatrix quant; IQ3_M should fit into your VRAM. (I personally wouldn't bother with sub-4bpw quants on models under ~70B parameters.)

If it all works great for you then there's no reason to mess with it, but if you want to tinker you can absolutely run larger models at smaller quant sizes; Q6_K is basically indistinguishable from FP16, so there's no real downside.
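As a rough sanity check on what fits in the 2080's 8 GB, using approximate llama.cpp bits-per-weight figures (weights only; KV cache and activations add more on top, so treat these as lower bounds):

    # Rough GGUF weight size: params * bits-per-weight / 8
    def weight_gb(params_billion, bpw):
        return params_billion * bpw / 8

    for name, params_billion, bpw in [("7B @ FP16", 7, 16.0),
                                      ("7B @ Q6_K", 7, 6.56),      # approximate bpw
                                      ("15B @ IQ3_M", 15, 3.66)]:  # approximate bpw
        print(f"{name}: ~{weight_gb(params_billion, bpw):.1f} GB of weights")
    # FP16 7B (~14 GB) doesn't fit in 8 GB; both quants do, with some room left for KV cache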


You can set up Cursor with a local LLM, connected via an open source ngrok substitute (can’t remember which ones are good). I'd only recommend doing this on a Mac with a lot of RAM, though, so you can use one of the actually useful coding models (e.g. Qwen2.5-Coder-32B).
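The rough recipe is: serve the model behind a local OpenAI-compatible endpoint, expose it through whatever tunnel you prefer, and point Cursor's OpenAI base URL override at the tunnel URL. A quick way to sanity-check the local endpoint before wiring up the tunnel (this assumes Ollama, which serves an OpenAI-compatible API on port 11434; llama.cpp's llama-server works similarly):

    # pip install openai
    from openai import OpenAI

    # Local OpenAI-compatible endpoint; the api_key is ignored but the client requires one
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="qwen2.5-coder:32b",  # whatever model tag you actually pulled
        messages=[{"role": "user", "content": "Write a binary search in Python."}],
    )
    print(resp.choices[0].message.content)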


Continue VSCode extension.



