If just 16 million examples were enough to significantly boost model quality (as Anthropic claims), it turns out that data quality beats quantity
Instead of vacuuming petabytes of trash from Common Crawl, you can just take high-quality distillate from a SOTA model and get comparable results. Bad news for anyone betting solely on massive compute clusters and closed datasets
He had the full source code of a working Linux driver that does exactly the same thing, just in a neighboring kernel dialect. The task was to translate, not invent. Sure, it's still impressive (given the difference in kernel APIs), but it's not the same as writing a driver from scratch using only a PDF datasheet. Now, when an AI takes an undocumented Chinese chip and writes a driver by sniffing the bus with a logic analyzer - then I'll call it "reasoning"
To be fair if you open up driver source code from the vendors themselves, it's often the same hell with magic numbers and lack of checks because "we know what the hardware will return". But you're right on the main point: AI writes C like a very confident junior who skipped memory safety lectures - it copies the style, but not the discipline. It works as long as you're on the "happy path", but debugging a kernel panic in code like that is going to be painful
I was personally surprised when the agent debugged kernel panics caused by its own code (many times by now). It just iterates from the stack traces and crash dumps.
The nice part is that, when you do see that the code smells — you ask the agent to rework it, focusing on specific problems. This is just code, and you don't need to dance around, hoping that AI will spill some "magic" at you.
> The nice part is that, when you do see that the code smells — you ask the agent to rework it, focusing on specific problems.
I think that is the crux of the problem. How do you know code smell if you don't write it, and you don't read it? I'm pretty confident the spdx header isn't correct even.
I wouldn't call this "clean-room". The models were trained on all available open source, including that exact original Linux driver. Splitting sessions saves you from direct copy-paste in the current context window, but the weights themselves remember the internal code structure perfectly well. Lawyers still have to rack their brains over this, but for now, it looks more like license laundering through the neural net's latent space than true reverse engineering
Spot on about keeping that AGENTS.md and logging all decisions. Letting an agent code for a long stretch without pinning down the state is a surefire way to end up with a Frankenstein codebase. Forcing it to document why you ditched LinuxKPI and went native basically saved the project. It's kinda ironic that AI is making us enforce strict project documentation - the exact thing human devs never have time for
The depressing part is that generative slop is a perfect match for the incentive structure: infinite supply, tuned to trigger comments, and you don't need real creators. If your product metric is time-on-site, this is what you get
Instead of vacuuming petabytes of trash from Common Crawl, you can just take high-quality distillate from a SOTA model and get comparable results. Bad news for anyone betting solely on massive compute clusters and closed datasets
reply