Checking the arithmetic in every paper published seems like a good use case for...

ironbound · 2026-01-15T13:17:55 1768483075

LLMs are why we're in this mess; they can't do math or count r's.

gordonhart · 2026-01-15T13:29:00 1768483740

Modern reasoning models are actually pretty good at arithmetic and almost certainly would have caught this error if asked.

Source: we benchmark this sort of stuff at my company and for the past year or so frontier models with a modest reasoning budget typically succeed at arithmetic problems (except for multiplication/division problems with many decimal places, which this isn't).

RobotToaster · 2026-01-15T13:42:49 1768484569

Interesting, how have you found they have been performing at more complex things like calculus and analysis?

speedgoose · 2026-01-15T13:53:39 1768485219

It’s on the front page of HN once in a while.

literalAardvark · 2026-01-15T14:01:33 1768485693

They can't do math?

ChatGPT 5.2 has recently been churning through unsolved Erdős problems.

I think right now one is partially validated by a pro and the other one I know of is "ai-solved" but not verified. As in: we're the ones who can't quite keep up.

https://arxiv.org/abs/2601.07421

And the only reason they can't count Rs is that we don't show them Rs due to a performance optimization.

ironbound · 2026-01-15T15:42:14 1768491734

You can feed it the Hodge Conjecture for all I care. The current algorithms are a joke, and without real breakthroughs you're just generating left-to-right text with billions in hardware.

literalAardvark · 2026-01-15T18:18:15 1768501095

Guess frontier math and programming are just left-to-right text then.

nine_k · 2026-01-15T13:36:54 1768484214

An LLM usually has a powerful digital computer right at its disposal, and could use it as a tool to do precise calculations.
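To make the tool idea concrete, here is a minimal sketch of the kind of exact calculator a model could call instead of doing arithmetic token by token. The function name `calc` and the choice of rational arithmetic are assumptions for illustration, not any vendor's actual tool API.

```python
import ast
import operator
from fractions import Fraction

# Hypothetical tool: supports only +, -, *, / and unary minus.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calc(expression: str) -> Fraction:
    """Evaluate a plain arithmetic expression exactly, using rationals."""
    source = expression.strip()
    tree = ast.parse(source, mode="eval")

    def evaluate(node):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            # Re-read the literal's source text so "0.1" becomes exactly 1/10
            # rather than the nearest binary float.
            return Fraction(ast.get_source_segment(source, node))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")

    return evaluate(tree)


if __name__ == "__main__":
    print(calc("0.1 + 0.2"))   # exact rational result, no float rounding
    print(calc("19.99 * 3"))
```

Anything outside the whitelisted operators raises `ValueError`, so the model cannot smuggle arbitrary code through the tool.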

brookst · 2026-01-15T13:46:15 1768484775

More accurate to say they can’t see r’s. They process language but not letters.
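A toy illustration of that point, assuming a made-up two-entry vocabulary (real tokenizers learn their vocabularies from data and may split words differently): the model receives integer IDs, and the letters inside each piece are not directly visible in them.

```python
# Hypothetical vocabulary; the IDs and the split of "strawberry" are invented.
# IDs start above the Unicode range so they cannot collide with the fallback.
TOY_VOCAB = {"straw": 2_000_001, "berry": 2_000_002}


def toy_tokenize(text: str) -> list[int]:
    """Greedy longest-match tokenizer over TOY_VOCAB."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to the character's code point.
            ids.append(ord(text[i]))
            i += 1
    return ids


if __name__ == "__main__":
    word = "strawberry"
    print(word.count("r"))      # counting letters is trivial on the raw string
    print(toy_tokenize(word))   # the model-side view: two opaque integers
```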

UqWBcuFx6NV4r · 2026-01-15T13:57:37 1768485457

Yes, yes. We’ve all seen the same screenshots. Very funny.

Those of us who don’t base our technical understanding on memes are well aware that the tooling at the disposal of all modern reasoning models gives them the capability to do such things.

Please don’t bring the culture war here.