
easygenes · 2026-04-03T10:40:31 1775212831

Why is ollama so many people’s go-to? Genuinely curious, I’ve tried it but it feels overly stripped down / dumbed down vs nearly everything else I’ve used.

Lately I’ve been playing with Unsloth Studio and think that’s probably a much better “give it to a beginner” default.

diflartle · 2026-04-03T11:52:57 1775217177

Ollama is good enough to dabble with. Getting a model is as easy as `ollama pull <model name>`, with no Hugging Face account needed, versus figuring it out by yourself on Hugging Face and trying to make sense of all the goofy letters and numbers between the forty different names of models.

So you start there and eventually you want to get off the happy path, then you need to learn more about the server and it's all so much more complicated than just using ollama. You just want to try models, not learn the intricacies of hosting LLMs.

flux3125 · 2026-04-03T14:36:04 1775226964

To be fair, llama.cpp has gotten much easier to use lately with `llama-server -hf <model name>`. That said, the need to compile it yourself is still a pretty big barrier for most people.

ryandrake · 2026-04-03T16:55:38 1775235338

I started with ollama and now I'm using llama.cpp/llama-server's Router Mode that allows you to manage multiple models through a single server instance.

One thing I haven't figured out: Subjectively, it feels like ollama's model loading was nearly instant, while I feel like I'm always waiting for llama.cpp to load models, but that doesn't make sense because it's ultimately the same software. Maybe I should try ollama again to convince myself that I'm not crazy and that ollama's model loading wasn't actually instant.

dTal · 2026-04-03T20:21:37 1775247697

You don't need to compile it yourself though? Unless you want CUDA support on Linux I guess, dunno why you'd need such a silly thing though:

https://github.com/ggml-org/llama.cpp/releases

MarsIronPI · 2026-04-03T21:59:09 1775253549

> That said, the need to compile it yourself is still a pretty big barrier for most people.

My distro (NixOS) has binary packages though...

And there are packages in the AUR (Arch), GURU (Gentoo), and even Debian Unstable. Now, these might be a little behind, but if you care that much you can download binaries from GitHub directly.

polotics · 2026-04-03T11:07:21 1775214441

Ollama got some first-mover advantage at the time when actually building and git pulling llama.cpp was a bit of a moat. The devs' docker past probably made them overestimate how much they could lay claim to mindshare. However, no one really could have known how quickly things would evolve... Now I mostly recommend LM Studio to people.

What does Unsloth Studio bring on top?

easygenes · 2026-04-03T11:14:34 1775214874

LM Studio has been around longer. I’ve used it since three years ago. I’d also agree it is generally a better beginner choice then and now.

Unsloth Studio is more featureful (well integrated tool calling, web search, and code execution being headline features), and comes from the people consistently making some of the best GGUF quants of all popular models. It is also well documented, easy to set up, and has good fine-tuning support.

xenophonf · 2026-04-03T12:50:16 1775220616

LM Studio isn't free/libre/open source software, which misses the point of using open weights and open source LLMs in the first place.

vonneumannstan · 2026-04-03T13:32:28 1775223148

Disagree, there are a lot of reasons to use open source local LLMs that aren't related to free/libre/oss principles. Privacy being a major one.

ekianjo · 2026-04-03T15:22:33 1775229753

If you care about privacy, making sure the closed source software does not call home is a concern...

the_lucifer · 2026-04-03T17:37:42 1775237862

I run Little Snitch[1] on my Mac, and I haven't seen LM Studio make any calls that I feel like it shouldn't be making.

Point it to a local models folder, and you can firewall the entire app if you feel like it.

Digressing, but the issue with open source software is that most OSS projects don't understand UX. UX requires a strong hand and opinionated decision making on whether or not something belongs front-and-center, and that's something developers struggle with. The only counterexample I can think of is Blender, and sadly it's a rare exception, not the norm.

LM Studio manages the backend well, hides its complexities and serves as a good front-end for downloading/managing models. Since I download the models to a shared common location, if I don't want to deal with the LM Studio UX, I can easily use the downloaded models with direct llama.cpp, llama-swap and mlx_lm calls.

[1]: https://obdev.at

DiabloD3 · 2026-04-03T12:42:56 1775220176

Advertising, mostly.

Ollama's org had people flood various LLM/programming related Reddits and Discords and elsewhere, claiming it was an 'easy frontend for llama.cpp', and tricked people.

Only way to win is to uninstall it and switch to llama.cpp.

linolevan · 2026-04-03T16:01:57 1775232117

What I really don't get is why more people don't talk about LM Studio. I switched to it months ago and it seems like a straight upgrade.

alfiedotwtf · 2026-04-03T17:01:24 1775235684

Isn’t LM Studio closed source?

brcmthrowaway · 2026-04-03T16:32:23 1775233943

How does LM Studio compare to Unsloth Studio?

jrm4 · 2026-04-03T14:54:57 1775228097

Ollama user with the opposite question -- why not? What am I missing out on? I'm using it as the backend for playing with other frontend stuff and it seems to work just fine.

And as someone running a 16 GB card, I'm especially curious whether I'm missing out on better performance.

ekianjo · 2026-04-03T15:24:07 1775229847

Ollama has had bad defaults forever (stuck on a default CTX of 2048 for like 2 years) and they are typically late to support the latest models vs llama.cpp. Absolutely no reason to use it in 2026.
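For anyone still on Ollama, the context default can at least be overridden per request. A minimal sketch, assuming a local Ollama server on its usual port (11434) and its `/api/generate` endpoint with an `options.num_ctx` field. The model name and the 8192 value are placeholders, and `build_generate_request` is a hypothetical helper, not part of any library:

```python
import json
import urllib.request

# Assumption: default local Ollama address; adjust if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, num_ctx: int = 8192) -> urllib.request.Request:
    """Build (but do not send) a generate request with an explicit context size."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Override the context window instead of relying on the server default.
        "options": {"num_ctx": num_ctx},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_generate_request("gemma4", "Say hi.")  # placeholder model name
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

The request is only built here, not sent, so the snippet runs without a server. Larger context sizes cost more memory, so the right value depends on your card.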

the_lucifer · 2026-04-03T18:06:50 1775239610

> Ollama user with the opposite question -- why not? What am I missing out on? I'm using it as the backend for playing with other frontend stuff and it seems to work just fine.

Used to be an Ollama user. Everything that you cite as benefits for Ollama is what I was drawn to in the first place as well, but I then moved on to using llama.cpp directly. Apart from it being extremely unethical, the issue is that they try to abstract away a bit too much, especially when LLM model quality is highly affected by a bunch of parameters. Hell, you can't tell what quant you're downloading. Can you tell at a glance what size of model is downloaded? Can you tell if it's optimized for your arch? Or what quant?

`ollama pull gemma4`

(Yes, I know you can add parameters etc. but the point stands because this is sold as noob-friendly. If you are going to be adding cli params to tweak this, then just do the same with llama.cpp?)

That became a big issue when DeepSeek R1 came out because everyone and their mother was making TikToks saying that you can run the full fat model without explaining that it was a distill, which Ollama had abstracted away. Running `ollama run deepseek-r1` means nothing when the quality ranges from useless to super good.

> And as someone running a 16 GB card, I'm especially curious whether I'm missing out on better performance.

I'd go so far as to say I can *GUARANTEE* you're missing out on performance if you are using Ollama, no matter the size of your GPU VRAM. You can get a significant improvement if you just run the underlying llama.cpp.

Secondly, it's chock full of dark patterns (like the ones above) and anti-open source behavior. For some examples:

1. It mangles GGUF files so other apps can't use them, and you can't access them either without a bunch of work on your end (had to script a way to unmangle these long sha-hashed file names)
2. Ollama conveniently fails to contribute improvements back to the original codebase (they don't have to technically thanks to MIT), but they didn't bother assisting llama.cpp in developing multimodal capabilities and features such as iSWA.
3. Any innovation they do is just piggybacking off of llama.cpp, which they try to pass off as their own without contributing back upstream. When new models come out they post "WIP" publicly while twiddling their thumbs waiting for llama.cpp to do the actual work.
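For the curious, the unmangling in point 1 can be sketched roughly like this. This is not the script I actually used; it assumes the on-disk layout Ollama is commonly observed to use (a JSON manifest per model tag whose layers reference `sha256-...` files in a `blobs` folder), which is not an official API and may change. `find_gguf_blob` and `link_as_gguf` are hypothetical names:

```python
import json
from pathlib import Path

# Assumption: Ollama's on-disk layout as commonly observed (not an official API):
#   <root>/manifests/<registry>/<namespace>/<model>/<tag>  -> JSON manifest
#   <root>/blobs/sha256-<hex>                              -> raw layer files
MODEL_MEDIA_TYPE = "application/vnd.ollama.image.model"


def find_gguf_blob(manifest_path: Path, blobs_dir: Path) -> Path:
    """Return the blob path holding the model weights named by a manifest."""
    manifest = json.loads(manifest_path.read_text())
    for layer in manifest.get("layers", []):
        if layer.get("mediaType") == MODEL_MEDIA_TYPE:
            # Digests look like "sha256:<hex>"; blob files use "sha256-<hex>".
            return blobs_dir / layer["digest"].replace(":", "-", 1)
    raise LookupError(f"no model layer in {manifest_path}")


def link_as_gguf(manifest_path: Path, blobs_dir: Path, out_dir: Path) -> Path:
    """Symlink the weights blob to a readable <model>-<tag>.gguf name."""
    blob = find_gguf_blob(manifest_path, blobs_dir)
    # The manifest file is named after the tag; its parent folder is the model.
    name = f"{manifest_path.parent.name}-{manifest_path.name}.gguf"
    out_dir.mkdir(parents=True, exist_ok=True)
    link = out_dir / name
    if not (link.is_symlink() or link.exists()):
        link.symlink_to(blob.resolve())
    return link
```

Symlinking instead of copying avoids duplicating multi-gigabyte files; the resulting `.gguf` path can then be handed to other tools or copied to another machine.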

It operates in this weird "middle layer" where it is kind of user friendly but it’s not as user friendly as LM Studio.

After all this, I just couldn't continue using it. If the benefits it provides you are good, then by all means continue.

IMO just finding the optimal parameters for a model and aliasing them in your CLI would be a much better experience ngl, especially now that we have llama-server, a nice webui and hot reloading built into llama.cpp.
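A rough sketch of the aliasing idea, assuming the `llama-server` flags `-hf`, `-c`, `-ngl` and `--port`. The preset names, model repos and values below are placeholders rather than recommendations, and both helper functions are hypothetical:

```python
import shlex

# Hypothetical presets: repos and values are placeholders, not tuned settings.
PRESETS = {
    "chat": {"hf": "someuser/some-model-GGUF:Q4_K_M", "ctx": 16384, "ngl": 99},
    "code": {"hf": "someuser/some-coder-GGUF:Q5_K_M", "ctx": 32768, "ngl": 99},
}


def llama_server_argv(preset: str, port: int = 8080) -> list[str]:
    """Turn a named preset into a llama-server argument list."""
    p = PRESETS[preset]
    return [
        "llama-server",
        "-hf", p["hf"],           # Hugging Face repo, optionally with :quant
        "-c", str(p["ctx"]),      # context size
        "-ngl", str(p["ngl"]),    # number of layers to offload to the GPU
        "--port", str(port),
    ]


def as_alias(name: str, preset: str) -> str:
    """Render a shell alias line for a preset, suitable for a shell rc file."""
    command = shlex.join(llama_server_argv(preset))
    return f"alias {name}={shlex.quote(command)}"


if __name__ == "__main__":
    for name in PRESETS:
        print(as_alias(f"llm-{name}", name))
```

Running it prints one `alias` line per preset that you can paste into your shell config, so the quant and context size stay visible instead of hidden behind a model tag.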

MarsIronPI · 2026-04-03T22:06:50 1775254010

> 1. It mangles GGUF files so other apps can't use them, and you can't access them either without a bunch of work on your end (had to script a way to unmangle these long sha-hashed file names)

This is what pushed me away from Ollama. All I wanted was to scp a model from one machine to another so I didn't have to re-download it and waste bandwidth. But Ollama makes it annoying, so I switched to llama.cpp. I did also find slightly better performance on CPU vs Ollama, likely due to compiling with -march=native.

> (they don't have to technically thanks to MIT)

Minor nit: I'm not aware of any license that requires improvements to be upstreamed. Even GPL just requires that you publish derivative source code under the GPL.

wolvoleo · 2026-04-03T14:35:14 1775226914

For me it's just the server. I use Open WebUI as the interface. I don't want it all running on the same machine.

easygenes · 2026-04-03T09:28:59 1775208539

A full-module add-on in this power class is about $7 at 1,000-unit scale [0]. With your own custom PCB design, the BoM add-on would be around $3 at scale. That’s power only. Add another dollar or two for 10/100 PHY.

The trick, as others have said, is how adding it to your design complicates compliance.

[0] https://www.digikey.com/en/products/detail/silvertel/AG9705-...

easygenes · 2026-04-01T14:32:58 1775053978

The historic charts are at the bottom of the page, btw.

easygenes · 2026-03-26T15:43:34 1774539814

I never felt during this era that the information about these chips was hard to come by as the author claims. Retrospectively I appreciate that’s because I grew up living by a large, well funded library in a tech centric town, so they always had all the latest tech publications.

easygenes · 2026-03-24T05:13:56 1774329236

I started this project after watching Andrej Karpathy's recent interview on No Priors where he explained that he had to hand-write microgpt, a 200-line GPT implementation in Python which distills the essence of all the algorithms behind creating Transformers, because the LLMs he asked weren't able to do it.

I wanted to test if this is still true: whether a "microgpt" in that spirit could be brought into existence with minimal manual intervention, just clear expression of intent to an LLM. This is an experiment not just in producing a tiny GPT artifact, but in seeing how close you can get to the essence of microgpt just through careful prompting, without writing a single line yourself.

easygenes · 2026-03-22T12:14:28 1774181668

The year is 2006 and Netvibes is hosting a huge party in San Francisco after raising in the Web 2.0 craze. They are yet to find out they will become a footnote in history to be rediscovered in 20 years’ time.

easygenes · 2026-03-16T04:57:20 1773637040

This is very similar to a project I created https://github.com/Entrpi/autonomy-golf and have been using as a gamified development process on active projects.

The key insight was to not just handwave or guess at how much is automated, but make evaluation and review part of the continuous development loop. I first implemented it in https://github.com/Entrpi/autoresearch-everywhere where I used it to deliberately automate more, in the spirit of Karpathy's upstream, and to very good effect: I have some of the best autoresearch results anywhere, and the platform is far more robust than it started.

lsh0 · 2026-03-16T06:53:52 1773644032

I like these efforts to neatly categorise the extent of AI usage in a project. I do think they need some kind of neutrally worded classification, but this and the original post are fine attempts at this emerging niche. It's important to some of us and I look forward to what ends up being adopted.

easygenes · 2026-03-13T04:58:03 1773377883

I liked this not because it's a good story. It is, but that's beside the point. I liked this because it's my story. Not literally so, but the shape of it is. He's struck a nerve at the heart of growing up eager and curious and seeing a computer as a pathway to your dreams.

qnleigh · 2026-03-15T07:01:43 1773558103

100%. Few things on this site have resonated with me so much.

easygenes · 2026-03-11T12:52:15 1773233535

Cool! I’ve been working on adding the same thing for Apple Silicon within my general “make autoresearch a serious tool” project here: https://github.com/Entrpi/autoresearch-everywhere

easygenes · 2026-03-05T20:56:02 1772744162

This is very much in line with what I found fascinating about optimizing microgpt for speed (0). Or rather, what I was able to do with it after doing so. It's so small and so fast to train, you can really dig deep into the optimization landscape. I've spent all my free time this past week digging into it.

0: https://entrpi.github.io/eemicrogpt/ (The writeup is from a few days ago, and I'm still running experiments before I do a big rewrite. Slowrun is good food for thought.)