Why is ollama so many people’s go-to? Genuinely curious, I’ve tried it but it feels overly stripped down / dumbed down vs nearly everything else I’ve used.
Lately I’ve been playing with Unsloth Studio and think that’s probably a much better “give it to a beginner” default.
Ollama is good enough to dabble with, and getting a model is as easy as `ollama pull <model name>`. The alternative is figuring it out by yourself on Hugging Face and trying to make sense of all the goofy letters and numbers across the forty different variants of each model. You also don't need a Hugging Face account to download.
So you start there and eventually you want to get off the happy path, then you need to learn more about the server and it's all so much more complicated than just using ollama. You just want to try models, not learn the intricacies of hosting LLMs.
To be fair, llama.cpp has gotten much easier to use lately with `llama-server -hf <model name>`. That said, the need to compile it yourself is still a pretty big barrier for most people.
I started with ollama and now I'm using llama.cpp/llama-server's Router Mode that allows you to manage multiple models through a single server instance.
One thing I haven't figured out: Subjectively, it feels like ollama's model loading was nearly instant, while I feel like I'm always waiting for llama.cpp to load models, but that doesn't make sense because it's ultimately the same software. Maybe I should try ollama again to convince myself that I'm not crazy and that ollama's model loading wasn't actually instant.
> That said, the need to compile it yourself is still a pretty big barrier for most people.
My distro (NixOS) has binary packages though...
And there's packages in the AUR (Arch), GURU (Gentoo), and even Debian Unstable. Now, these might be a little behind, but if you care that much you can download binaries from GitHub directly.
Ollama got some first-mover advantage at the time when actually building and git pulling llama.cpp was a bit of a moat. The devs' Docker past probably made them overestimate how much they could lay claim to mindshare. However, no one really could have known how quickly things would evolve... Now I mostly recommend LM Studio to people.
LM Studio has been around longer. I’ve been using it for three years. I’d also agree it was, and still is, generally a better beginner choice.
Unsloth Studio is more featureful (well-integrated tool calling, web search, and code execution being headline features), and comes from the people consistently making some of the best GGUF quants of all popular models. It is also well documented, easy to set up, and has good fine-tuning support.
I run Little Snitch[1] on my Mac, and I haven't seen LM Studio make any calls that I feel like it shouldn't be making.
Point it to a local models folder, and you can firewall the entire app if you feel like it.
Digressing, but the issue with open source software is that most OSS projects don't understand UX. UX requires a strong hand and opinionated decision making on whether or not something belongs front-and-center, and that's something developers struggle with. The only counterexample I can think of is Blender, and it's a rare exception, sadly not the norm.
LM Studio manages the backend well, hides its complexities and serves as a good front-end for downloading/managing models. Since I download the models to a shared common location, if I don't want to deal with the LM Studio UX, I can then easily use the downloaded models with direct llama.cpp, llama-swap and mlx_lm calls.
Ollama's org had people flood various LLM/programming related Reddits and Discords and elsewhere, claiming it was an 'easy frontend for llama.cpp', and tricked people.
Only way to win is to uninstall it and switch to llama.cpp.
Ollama user with the opposite question -- why not? What am I missing out on? I'm using it as the backend for playing with other frontend stuff and it seems to work just fine.
And as someone running a 16 GB card, I'm especially curious whether I'm missing out on better performance?
Ollama has had bad defaults forever (stuck on a default CTX of 2048 for like 2 years) and they are typically late to support the latest models vs llama.cpp. Absolutely no reason to use it in 2026.
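For anyone still on Ollama and bitten by the small default context: a minimal sketch of overriding it per request through the REST API's `options.num_ctx` field. This assumes a server on the default local port; the helper names and the 8192 default are made up for illustration, not recommendations.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, num_ctx: int) -> dict:
    """Build an /api/generate payload that overrides the context window."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON response instead of a stream
        "options": {"num_ctx": num_ctx},
    }


def generate(model: str, prompt: str, num_ctx: int = 8192) -> str:
    """Send the request to the local server and return the generated text."""
    body = json.dumps(build_generate_request(model, prompt, num_ctx)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

A larger `num_ctx` costs more memory, so on a 16 GB card the usable value depends on the model and quant.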
> Ollama user with the opposite question -- why not? What am I missing out on? I'm using it as the backend for playing with other frontend stuff and it seems to work just fine.
Used to be an Ollama user. Everything that you cite as benefits for Ollama is what I was drawn to in the first place as well, before I moved on to using llama.cpp directly. Apart from them being extremely unethical, the issue is that they try to abstract away a bit too much, especially when model quality is highly affected by a bunch of parameters. Hell, you can't tell what quant you're downloading. Can you tell at a glance what size of model is downloaded? Can you tell if it's optimized for your arch? Or what quant?
`ollama pull gemma4`
(Yes, I know you can add parameters etc. but the point stands because this is sold as noob-friendly. If you are going to be adding cli params to tweak this, then just do the same with llama.cpp?)
That became a big issue when DeepSeek R1 came out, because everyone and their mother was making TikToks saying that you can run the full-fat model without explaining that it was a distill, which Ollama had abstracted away. Running `ollama run deepseek-r1` means nothing when the quality ranges from useless to super good.
> And as someone running a 16 GB card, I'm especially curious whether I'm missing out on better performance?
I'd go so far as to say, I can *GUARANTEE* you're missing out on performance if you are using Ollama, no matter the size of your GPU VRAM. You can get a significant improvement if you just run the underlying llama.cpp.
Secondly, it's chock full of dark patterns (like the ones above) and anti-open source behavior. For some examples:
1. It mangles GGUF files so other apps can't use them, and you can't access them either without a bunch of work on your end (had to script a way to unmangle these long sha-hashed file names)
2. Ollama conveniently fails to contribute improvements back to the original codebase (they don't have to technically thanks to MIT), but they didn't bother assisting llama.cpp in developing multimodal capabilities and features such as iSWA.
3. Any innovation they do ship is just piggybacking off of llama.cpp, which they try to pass off as their own without contributing back to upstream. When new models come out they post "WIP" publicly while twiddling their thumbs waiting for llama.cpp to do the actual work.
It operates in this weird "middle layer" where it is kind of user friendly but it’s not as user friendly as LM Studio.
After all this, I just couldn't continue using it. If the benefits it provides you are good, then by all means continue.
IMO just finding the most optimal parameters for a model and aliasing them in your cli would be a much better experience ngl, especially now that we have llama-server, a nice webui and hot reloading built into llama.cpp
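A rough sketch of that "find the parameters once, then alias them" idea: keep per-model presets in one place and turn them into a `llama-server` command line. The preset name and values are hypothetical placeholders, and `<model name>` is left unfilled on purpose; the flags used (`-hf`, `--ctx-size`, `--n-gpu-layers`, `--temp`, `--port`) are standard llama-server options.

```python
import shlex

# Hypothetical presets: names and numbers are placeholders, not tuned values.
PRESETS = {
    "chat": {"hf": "<model name>", "ctx_size": 8192, "n_gpu_layers": 99, "temp": 0.7},
}


def llama_server_command(preset: dict, port: int = 8080) -> list[str]:
    """Turn one preset into an argv list for llama-server."""
    return [
        "llama-server",
        "-hf", preset["hf"],
        "--ctx-size", str(preset["ctx_size"]),
        "--n-gpu-layers", str(preset["n_gpu_layers"]),
        "--temp", str(preset["temp"]),
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into an alias.
    print(shlex.join(llama_server_command(PRESETS["chat"])))
```

The printed line can go straight into a shell alias or a small wrapper script, so the parameters you settled on live in one file instead of your shell history.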
> 1. It mangles GGUF files so other apps can't use them, and you can't access them either without a bunch of work on your end (had to script a way to unmangle these long sha-hashed file names)
This is what pushed me away from Ollama. All I wanted was to scp a model from one machine to another so I didn't have to re-download it and waste bandwidth. But Ollama makes it annoying, so I switched to llama.cpp. I did also find slightly better performance on CPU vs Ollama, likely due to compiling with -march=native.
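For reference, a minimal sketch of the kind of "unmangling" script mentioned above. It assumes Ollama's usual on-disk layout: a JSON manifest per model tag whose `layers` list includes one entry with media type `application/vnd.ollama.image.model`, and blob files named after the digest with `:` swapped for `-`. The function name is made up, and the layout is an assumption that may differ between versions.

```python
import json
from pathlib import Path

# Assumption: this media type marks the layer that holds the GGUF weights.
MODEL_MEDIA_TYPE = "application/vnd.ollama.image.model"


def find_gguf_blob(manifest_path: Path, blobs_dir: Path) -> Path:
    """Return the path of the blob holding the weights named in a manifest."""
    manifest = json.loads(manifest_path.read_text())
    for layer in manifest.get("layers", []):
        if layer.get("mediaType") == MODEL_MEDIA_TYPE:
            # Digests look like "sha256:<hex>"; blob files use "sha256-<hex>".
            return blobs_dir / layer["digest"].replace(":", "-", 1)
    raise LookupError(f"no model layer found in {manifest_path}")
```

With the default install the manifests typically sit under `~/.ollama/models/manifests/` and the blobs under `~/.ollama/models/blobs/`. Once you have the blob path, you can copy it to a sensible `.gguf` filename and scp that instead of re-downloading.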
> (they don't have to technically thanks to MIT)
Minor nit: I'm not aware of any license that requires improvements to be upstreamed. Even GPL just requires that you publish derivative source code under the GPL.
A full-module add-on in this power class is about $7 at 1,000 unit scale [0]. It would be around $3 with your own custom PCB design in terms of BoM addon at scale. That’s power only. Add another dollar or two for 10/100 PHY.
The trick is as others have said in what adding it to your design does in terms of complicating compliance design.
I never felt during this era that the information about these chips was hard to come by as the author claims. Retrospectively I appreciate that’s because I grew up living by a large, well funded library in a tech centric town, so they always had all the latest tech publications.
I started this project after watching Andrej Karpathy's recent interview on No Priors where he explained that he had to hand-write microgpt, a 200-line GPT implementation in Python which distills the essence of all the algorithms behind creating Transformers, because the LLMs he asked weren't able to do it.
I wanted to test if this is still true: whether a "microgpt" in that spirit could be brought into existence with minimal manual intervention, just clear expression of intent to an LLM. This is an experiment not just in producing a tiny GPT artifact, but in seeing how close you can get to the essence of microgpt just through careful prompting, without writing a single line yourself.
The year is 2006 and Netvibes is hosting a huge party in San Francisco after raising in the Web 2.0 craze. They are yet to find out they will become a footnote in history to be rediscovered in 20 years’ time.
This is very similar to a project I created https://github.com/Entrpi/autonomy-golf and have been using as a gamified development process on active projects.
The key insight was to not just handwave or guess at how much is automated, but make evaluation and review part of the continuous development loop. I first implemented it in https://github.com/Entrpi/autoresearch-everywhere where I used it to deliberately automate more, in the spirit of Karpathy's upstream (and to very good effect. I have some of the best autoresearch results anywhere, and the platform is far more robust than it started).
I like these efforts to neatly categorise the extent of AI usage in a project. I do think they need some kind of neutrally worded classification, but this and the original post are fine attempts at this emerging niche. It's important to some of us and I look forward to what ends up being adopted.
I liked this not because it's a good story. It is, but that's beside the point. I liked this because it's my story. Not literally so, but the shape of it is. He's struck a nerve at the heart of growing up eager and curious and seeing a computer as a pathway to your dreams.
This is very much in line with what I found fascinating about optimizing microgpt for speed (0). Or rather, what I was able to do with it after doing so. It's so small and so fast to train, you can really dig deep into the optimization landscape. I've spent all my free time this past week digging into it.
0: https://entrpi.github.io/eemicrogpt/
(The writeup is from a few days ago, and I'm still running experiments before I do a big rewrite. Slowrun is good food for thought.)