
> Boilerplate and scaffolding

Have we really reached the limit of how much we can reliably automate these things via good old metaprogramming and/or generator scripts, without resorting to using unreliable and expensive statistical models via imprecise natural language?

> Refusing to use AI out of principle is as irrational as adopting it out of hype.

I'm not sure about this. For some people, holding consistently to a principle may be as satisfying as the dopamine hit of creation mentioned in the article, or even necessary.


I spend a lot of time side by side with other devs, watching them code and providing guidance. A trend I'm starting to sense is that developer velocity is hindered just as much by unfamiliarity with their tools as by wrestling with the core problem they actually want to solve.

When to use your mouse, when to use your keyboard, how to locate a file you want to look at in your terminal or IDE, how to find commands you executed last week, etc. It's all lacking. When devs struggle with these fundamentals, I suspect the desire to bypass all this with a singular "just ask the LLM" interface increases.

So when orgs push "devs should use LLMs more to accelerate", I really wish the focus were instead "find ways to accelerate", which could more reliably mean "get more proficient with your tools".

I think there's a lot of good that can be gained from formalizing conventions with templating engines (another tool worth learning), rather than relying on stochastic template generation.
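As a rough sketch of what that could look like, Python's stdlib `string.Template` is already enough for a deterministic scaffold generator (the CRUD-handler convention and all names here are hypothetical, just to illustrate the shape):

```python
from string import Template

# Hypothetical handler convention: the team's boilerplate lives in one
# reviewed template, and generation is fully deterministic.
HANDLER = Template(
    "def ${verb}_${resource}(db, payload):\n"
    "    validate_${resource}(payload)\n"
    "    return db.${verb}('${resource}', payload)\n"
)

def scaffold(resource, verbs=("create", "update", "delete")):
    """Emit the same code for the same inputs, every time."""
    return "\n".join(HANDLER.substitute(verb=v, resource=resource) for v in verbs)

print(scaffold("invoice"))
```

The same idea scales up to Jinja2, cookiecutter, or language-level codegen; the point is that the output is reproducible and reviewable once, rather than re-sampled per request.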


We had Joel Spolsky saying that users don't read back in 2000, around the same time that Steve Krug published _Don't Make Me Think_: https://www.joelonsoftware.com/2000/04/26/designing-for-peop...

So was the learned helplessness already ingrained by 2000? How far back does it go?


Every little detail matters though. In SQL, do you want your database field to have limited length? If so, pay attention to validation, including cases where the field's content is built up in some other way than just entering text in a free-form text field (e.g. stuffing JSON into a database field). If not, make sure you don't use some generic "string" field type provided by your database abstraction layer that has an implicit limited length. Want to guess why that scenario's on my mind? Yeah, I neglected to pay attention to that detail, and an LLM might too. In CSS, little details affect the accessibility of the UI.
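To make that SQL length pitfall concrete, here's a hedged Python sketch (the column name and 255-character limit are made up): the check has to run on the final serialized value, not just on what a user typed into a form field.

```python
import json

NOTES_MAX_LEN = 255  # hypothetical VARCHAR(255) column

def validate_notes(value):
    """Reject values that would overflow the column, however they were built."""
    if len(value) > NOTES_MAX_LEN:
        raise ValueError(f"notes exceeds {NOTES_MAX_LEN} chars ({len(value)})")
    return value

# Free-form text entered in a field passes the check...
validate_notes("short note")

# ...but JSON stuffed into the same column can silently blow past the limit.
payload = json.dumps({"history": ["edited"] * 40})
try:
    validate_notes(payload)
except ValueError as e:
    print("caught:", e)
```

The validation path that runs on form input is exactly the one that's easy to skip when some other code path assembles the value programmatically.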

So we need to pay attention to every detail that doesn't have a single obviously correct answer, and keep the volume of code we're producing to a manageable enough level that we actually can pay attention to those details. In cases where one really is just literally moving data from here to there, then we should use reliable, deterministic code generation on top of a robust abstraction, e.g. Rust's serde, to take care of that gruntwork. Where that's not possible, there are details that need our attention. We shouldn't use unreliable statistical text generators to try to push past those details.


> So we need to pay attention to every detail that doesn't have a single obviously correct answer

I really, really wish that were the case. But look at the modern web. Look at iOS apps. Look at how long Discord takes to launch on a modern computer. Look how big and slow everything is. Most end user applications released today do not pay attention to those small details. Definitely not in early versions of the software. And they're still successful. At least, successful enough.

I'd love a return to the "good old days" where we count bytes and make tight, fast software with tiny binaries that can perform well even on 20 year old computers. But I've been outvoted. There aren't enough skilled programmers who care about this stuff. So instead our super fast computers from the future run buggy junk.

Does Claude even make worse choices than many of the engineers at these companies? I've worked with several junior engineers who I'd trust a lot less with small details than I trust Claude. And that's Claude in 2026. What about Claude in 2031, or 2036? It's not that far away. Claude is getting better at software much faster than I am.

I don't think the modern software development world will make the sort of software that you and I would like to use. Who knows. Maybe LLMs will be what changes that.


> But look at the modern web. Look at iOS apps. Look at how long Discord takes to launch on a modern computer. Look how big and slow everything is. Most end user applications released today do not pay attention to those small details. Definitely not in early versions of the software. And they're still successful. At least, successful enough.

The main issue is that we have a lot of good tech that gets used incorrectly. Each component is sound, but the whole is complex and ungainly. They are code chimeras: kinda like using a whole web browser to build a code editor, using React as the view layer for a TUI, or adding a dependency just to check if a file is executable.

It's like the recently posted project: a Lisp where every function call spawns a Docker container.


Yep, I think this is broadly true. Though it's still not clear to me if vibe coding is going to make this better or worse.

I disagree on Kubernetes versus ECS. For me, the reasons to use ECS are not having to pay for a control plane, and not having to keep up with the Kubernetes upgrade treadmill.

This. k8s is primarily resume-driven development in most software shops. Hardly any product or service really needs its complexity.

To replace Kubernetes, you inevitably have to reinvent Kubernetes. By the time you build in canaries, blue/green deployments, and rolling updates with precise availability controls, you've just built a bespoke version of k8s. I'll take the industry standard over a homegrown orchestration tool any day.

We've used ECS back when we were on AWS, and now GCE.

We didn't have to invent any homegrown orchestration tool. Our infra is hundreds of VMs across 4 regions.

Can you give an example of what you needed to do?


Really? What deploys your code now? I'm SRE, walk me through high level. How do I roll back?

It used to be Google Deployment Manager, but that's dead soon, so Terraform.

To roll back you tell GCE to use the previous image. It does all the rolling over for you.

Our deployment process looks like this:

- Jenkins: build the code to debian packages hosted on JFrog

- Jenkins: build a machine image with ansible and packer

- Jenkins: deploy the new image either to test or prod.

Test deployments create a new Instance Group that isn't automatically attached to any load balancer. You do that manually once you've confirmed everything has started ok.


ECS deployments. Automatically rolls back on failure. Not sexy but it works reliably.

The number of tools and systems here that work because of k8s is significant. K8s is a control plane and an integration plane.

I wish luck to the (imo) fools chasing the "you may not need it" logic. The vacuum that attitude creates in its wake demands many complex, gnarly home-cooked solutions.

Can you? Sure, absolutely! But you are doing that on your own, gluing it all together every step of the way. There's no other glue layer anywhere remotely as integrative, that can universally bind to so much. The value is astronomical, imho.


This idea is a major theme in this story by Robert Kingett: https://sightlessscribbles.com/the-colonization-of-confidenc...


> the actual object-level question ("is this tool useful for this task")

That's not the only question worth asking though. It could be that the tool is useful, but has high negative externalities. In that case, the question "what kind of person uses/rejects this" is also worth considering. I think that if generative AI does have high negative externalities, then I'd like to be the kind of person that rejects it.


What's the name of this DLL? I assume it's separate from the monster chrome.dll, and that the model is proprietary.


chrome_screen_ai.dll is the name of the dll (libchromescreenai.so on linux) and yes it is proprietary. It isn't included by default, Chrome uses its component service to download it automatically when you open a PDF file that doesn't have pre-existing OCR'd text on it. You can download it separately from here: https://chrome-infra-packages.appspot.com/p/chromium/third_p...


Given that it's a 400B-parameter model, but it's a sparse MoE model with 13B active parameters per token, would it run well on an NVIDIA DGX Spark with 128 GB of unified RAM, or do you practically need to hold the full model in RAM even with sparse MoE?


Even with MoE, holding the model in RAM while individual experts are evaluated in VRAM is a bit of a compromise. Experts can be swapped in and out of VRAM for each token. So RAM <-> VRAM bandwidth becomes important. With a model larger than RAM, that bandwidth bottleneck gets pushed to the SSD interface. At least it's read-only, and not read-write, but even the fastest of SSDs will be significantly slower than RAM.
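A hedged back-of-envelope for why that bandwidth matters: all numbers below are illustrative assumptions (4-bit quantization, zero expert reuse between tokens, nominal link speeds), not measurements.

```python
# Worst-case cost of streaming the *active* expert weights for every token.
active_params = 13e9        # active params per token, per the thread
bytes_per_param = 0.5       # assuming 4-bit quantization
per_token_bytes = active_params * bytes_per_param   # ~6.5 GB/token

# Hypothetical sustained bandwidths for each tier.
links = {"PCIe 4.0 x16 (RAM -> VRAM)": 32e9, "fast NVMe SSD": 7e9}
for name, bw in links.items():
    print(f"{name}: {per_token_bytes / bw:.2f} s/token worst case")
```

In practice expert reuse and caching soften this a lot, but it shows why pushing the bottleneck from RAM down to the SSD hurts so much.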

That said, there are folks out there doing it. https://github.com/lyogavin/airllm is one example.


> Experts can be swapped in and out of VRAM for each token.

I've often wondered how much it happens in practice. What does the per-token distribution of expert selection actually look like during inference? For example, does it act like a uniform random variable, or does it stick with the same 2 or 3 experts for 10 tokens in a row? I haven't been able to find much info on this.

Obviously it depends on what model you're talking about, so some kind of survey would be interesting. I'm sure this must be something that the big inference labs are knowledgeable about.

Although, I guess if you are batching things, then even if a subset of experts is selected for a single query, over the batch the selection may appear completely random, which would destroy any efficiency gains. Perhaps it's possible to intelligently batch queries that are "similar" somehow? It's quite an interesting research problem when you think about it.

Come to think of it, how does it work then for the "prompt ingestion" stage, where it likely runs all experts in parallel to generate the KV cache? I guess that would destroy any efficiency gains due to MoE too, so the prompt ingestion and AR generation stages will have quite different execution profiles.
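One cheap way to build intuition is a toy simulation. This assumes perfectly uniform top-k routing, which real routers only approximate (they're trained toward it via load-balancing losses), so treat the number it prints as a baseline, not a measurement:

```python
import random

random.seed(0)
n_experts, top_k, n_tokens = 64, 2, 10_000

# Each "token" independently picks top_k experts uniformly at random.
picks = [set(random.sample(range(n_experts), top_k)) for _ in range(n_tokens)]

# How often do consecutive tokens share at least one expert?
overlaps = sum(bool(a & b) for a, b in zip(picks, picks[1:]))
print(f"consecutive-token expert overlap: {overlaps / (n_tokens - 1):.1%}")
```

Under uniform routing with 64 experts and top-2, consecutive tokens share an expert only a few percent of the time, so any real-world stickiness you measured above that baseline would be exploitable for caching.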


The model is explicitly trained to produce as uniform a distribution as possible, because it's designed for batched inference with a batch size much larger than the expert count, so that all experts are constantly activated. Latency is then determined by the highest-loaded expert, so you want to distribute the load evenly to maximize utilization.

Prompt ingestion is still fairly similar to that setting, so you can first compute the expert routing for all tokens, load the first set of expert weights and process only those tokens that selected the first expert, then load the second expert and so on.
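That scheduling idea can be sketched in a few lines. This is a toy, not any real inference engine's code: routing decisions are given as plain integers, and "processing" is just grouping positions, standing in for the actual expert matmuls.

```python
from collections import defaultdict

def ingest(token_routes):
    """Route all prompt tokens first, then visit each expert once,
    processing that expert's tokens as a single batch (one weight load
    per expert instead of a potential swap per token)."""
    by_expert = defaultdict(list)
    for pos, expert in enumerate(token_routes):
        by_expert[expert].append(pos)
    return [(expert, by_expert[expert]) for expert in sorted(by_expert)]

# Six prompt tokens routed to three experts -> three weight loads total.
print(ingest([0, 2, 0, 1, 2, 2]))
```

With per-token generation you can't do this, because token n+1's routing isn't known until token n is produced, which is exactly why the single-stream case needs different tricks.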

But if you want to optimize for single-stream token generation, you need a completely different model design. E.g. PowerInfer's SmallThinker moved expert routing to a previous layer, so that the expert weights can be prefetched asynchronously while another layer is still executing: https://arxiv.org/abs/2507.20984


Thanks, really interesting to think about these trade-offs.


I thought paging was so inefficient that it wasn't worth doing vs using CPU inference for the parts of the model that are in system memory. Maybe if you have a good GPU and a turtle of a CPU, but still somehow have the memory bandwidth to make shuffling data in and out of the GPU worthwhile? I'm curious to know who is doing this and why.


With a non-sequential generative approach perhaps the RAM cache misses could be grouped together and swapped on a when available/when needed prioritized bases.


It can run with mmap(), but it's slower. 4-bit quantized, there's a decent ratio between the model size and RAM; with a fast SSD one could try it and see how it works. However, when a model is 4-bit quantized there's often the doubt that it's no better than an 8-bit quantized 200B-parameter model; it depends on the model, on the use case, ... Unfortunately the road to local inference of SOTA models is being blocked by RAM prices and the GPU appetite of the big companies, leaving us with little. Probably today the best bet is to buy Mac Studio systems and run distributed inference (MLX supports this, for instance), or a 512 GB Mac Studio M4 that costs around $13k.


I think 512 GB Mac Studio was M3 Ultra.

Anyways, isn't a new Mac Studio due in a few months? It should be significantly faster as well.

I just hope RAM prices don't ruin this...


Talking about RAM prices, you can still get a framework Max+ 395 with 128GB RAM for ~$2,459 USD. They have not increased the price for it yet.

https://frame.work/products/desktop-diy-amd-aimax300/configu...


Pretty sure those used to be $1999 ... but not entirely sure.


Yep, you're right. Looks like they increased it earlier this month. Bummer!


No.

128 GB gets you enough space for roughly 256B-parameter models (4-bit quantized). But 400B is too big for the DGX Spark, unless you connect 2 of them together and use tensor parallelism.
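The sizing arithmetic behind that, as a rough sketch: weights only, ignoring KV cache and runtime overhead.

```python
def weight_gb(params_b, bits):
    """Approximate weight footprint in GB for a model with
    params_b billion parameters at the given quantization width."""
    return params_b * 1e9 * (bits / 8) / 1e9

for params in (256, 400):
    print(f"{params}B @ 4-bit: {weight_gb(params, 4):.0f} GB")
```

At 4 bits, 256B parameters is 128 GB, exactly the Spark's capacity with nothing left for KV cache, while 400B is 200 GB, hence needing a second unit.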


Impressive work.

I wonder if you've looked into what it would take to implement accessibility while maintaining your no-Rust-dependencies rule. On Windows and macOS, it's straightforward enough to implement UI Automation and the Cocoa NSAccessibility protocols respectively. On Unix/X11, as I see it, your options are:

1. Implement AT-SPI with a new from-scratch D-Bus implementation.

2. Implement AT-SPI with one of the D-Bus C libraries (GLib, libdbus, or sdbus).

3. Use GTK, or maybe Qt.


This is obviously bullshit. If he were really worried about the things he says he is, he'd put the brakes on his company, or would never have started it in the first place.


He's addressed this a multitude of times: he wants to slow down, but the Chinese will not, therefore you cannot cede the frontier to authoritarians. It's a nuclear arms race.


So if someone (actually, practically everyone) who runs an AI company says AI is dangerous, it's bullshit. If someone who is holding NVDA put options says it, they're talking their book. If someone whose job is threatened by AI says it, it's cope. If someone who doesn't use AI says it, it's fear of change. Is there someone in particular you want to hear it from, or are you completely immune to argument?


I actually do believe that AI is dangerous, though for different reasons than the ones he focuses on. But I don't think he really believes it, since if he did, he wouldn't be spending billions to bring it into existence.


> So if someone (actually, practically everyone) who runs an AI company says AI is dangerous, it's bullshit.

My instinct is to take his words as a marketing pitch.

When he says AI is dangerous, it is a roundabout way to say it is powerful and should be taken seriously.


Yes, exactly.

