I can write a spec for an entirely new endpoint, and Claude figures out all of the middleware plumbing and the database queries. (The catch: this is in Rust and the SQL is raw, without an ORM. It just gets it. I'm reviewing the code, too, and it's mostly excellent.)
I can ask Claude to add new data to the return payloads - it does it, and it can figure out the cache invalidation.
These models are blowing my mind. It's like I have an army of juniors I can actually trust.
I'm not sure I'd call agents an army of juniors. More like a high school summer intern who has infinite time to do deep dives into StackOverflow but doesn't have nearly enough programming experience yet to have developed a "taste" for good code.
In my experience, agentic LLMs tend to write code that is very branchy with cyclomatic complexity. They don't follow DRY principles unless you push them very hard in that direction (and even then not always), and sometimes they do things that just fly in the face of common sense. Example of that last part: I was writing some Ruby tests with Opus 4.6 yesterday, and I got dozens of tests that amounted to this:
x = X.new
assert x.kind_of?(X)
This is, of course, an entirely meaningless check: it passes for any class that can be instantiated, whatever the class actually does. But if you aren't reading the tests and you just run the test job and see hundreds of green check marks and dozens of classes covered, it could give you a false sense of security.
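To make the difference concrete, here's a rough sketch. `Cart` and `BrokenCart` are made-up classes for illustration, not anything from my actual codebase, and the two check methods are just stand-ins for test assertions. The point is that the `kind_of?` style check can't tell a working class from a broken one, while a check on behavior can.

```ruby
# Hypothetical class under test; all names here are invented for illustration.
class Cart
  def initialize
    @prices = []
  end

  # Returns self so calls can be chained.
  def add(price)
    @prices << price
    self
  end

  def total
    @prices.sum
  end
end

# A deliberately broken variant: total ignores everything that was added.
class BrokenCart < Cart
  def total
    0
  end
end

# The kind of check described above: true for any class you can instantiate.
def tautological_check(klass)
  klass.new.kind_of?(klass)
end

# A check that exercises behavior and can actually fail.
def behavioral_check(klass)
  klass.new.add(2).add(3).total == 5
end

[Cart, BrokenCart].each do |klass|
  puts "#{klass}: tautological=#{tautological_check(klass)} behavioral=#{behavioral_check(klass)}"
end
```

The tautological check comes back true for both classes, so it would stay green even if `total` were completely wrong. Only the behavioral check flags `BrokenCart`.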
> In my experience, agentic LLMs tend to write code that is very branchy with cyclomatic complexity
You are missing the forest for the trees. Sure, we can find flaws in the current generation of LLMs. But they'll be fixed. We have a tool that can learn to do anything as well as a human, given sufficient input.
LLMs have been a thing for about three years now, so you can't have been hearing this for very long. In those three years, the rate of progress has been astounding and there is no sign of slowing down.