I read it. I agree this is out of touch. Not because the things it's saying are wrong, but because the things it's saying have been true for almost a year now. They are not "getting worse"; they "have been bad". I am staggered that this article qualifies as "news".
If you're going to write about something that's been true and widely discussed online for a year or more, at least have the awareness/integrity not to brand it as "this new thing is happening".
The models have gotten very good, but I'd rather have an obviously broken pile of crap that I can spot immediately than something deep-fried with RL to always look like it succeeded, but with subtle problems that someone will LGTM :( I guess it's not much different with human-written code, but the models seem to have weirdly inhuman failures: you skim some code because you just can't believe anyone could get it wrong, and it turns out they did.
Well, for some reason it doesn't let me respond to the child comments :(
The problem (which should be obvious) is that with real-valued a and b you can't construct an exhaustive input/output set. A test case can only prove the presence of a bug, not its absence.
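To make that concrete, here's a minimal sketch (the safe_mean function and its test values are made up for illustration, not from the article): a few green unit tests over floats say nothing about the inputs you didn't try.

```python
def safe_mean(a: float, b: float) -> float:
    # Looks correct, and passes the usual hand-picked tests...
    return (a + b) / 2

def test_safe_mean():
    assert safe_mean(1.0, 3.0) == 2.0
    assert safe_mean(-2.0, 2.0) == 0.0

test_safe_mean()  # passes

# ...but (a + b) overflows for large finite floats, so the "mean" is inf
# even though the true mean (1e308) is perfectly representable:
print(safe_mean(1e308, 1e308))  # inf

# One common fix is to rewrite the expression so the intermediate can't overflow:
#   a + (b - a) / 2
```

No finite pile of passing cases rules that kind of thing out; you have to reason about the whole input domain.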
Another category of problems that you can't just test for, and instead have to prove correct, is concurrency problems.
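Rough sketch of what I mean (the counter/increment names are hypothetical, and it's the classic lost-update race): the code below is wrong by the language's rules, yet whether the assertion ever trips depends on the interpreter version and thread scheduling, so any number of green runs proves nothing.

```python
import threading

counter = 0

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        value = counter      # read
        counter = value + 1  # write: another thread may have updated counter in between

def test_counter():
    global counter
    counter = 0
    threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # May pass on every run on your machine and still be a genuine data race;
    # you can't establish its absence by running the test more times.
    assert counter == 400_000, counter

test_counter()
```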
Of course you can. You can write test cases for anything.
Even an add_numbers function can have bugs; e.g., you have to ensure the inputs actually are numbers. Most coding agents would catch this in loosely-typed languages.
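Roughly the kind of thing I mean, as a Python sketch (the checked variant is just one possible way to guard it):

```python
def add_numbers(a, b):
    return a + b

print(add_numbers(2, 3))      # 5
print(add_numbers("2", "3"))  # "23" -- silently concatenates instead of adding

# A defensive version validates its inputs rather than trusting the caller:
def add_numbers_checked(a, b):
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError("add_numbers expects numeric inputs")
    return a + b
```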