Checking the arithmetic in every published paper seems like a good use case for LLMs. Has anyone built something better than uploading a PDF to ChatGPT and asking it to check the arithmetic?
Modern reasoning models are actually pretty good at arithmetic and almost certainly would have caught this error if asked.
Source: we benchmark this sort of stuff at my company, and for the past year or so frontier models with a modest reasoning budget have typically succeeded at arithmetic problems (except for multiplication/division problems with many decimal places, which this isn't).
ChatGPT 5.2 has recently been churning through unsolved Erdős problems.
I think right now one is partially validated by a pro and the other one I know of is "ai-solved" but not verified. As in: we're the ones who can't quite keep up.
You can feed it the Hodge Conjecture for all I care,
the current algorithms are a joke, and without real breakthroughs you're just generating left-to-right text with billions in hardware.
Yes, yes. We’ve all seen the same screenshots. Very funny.
Those of us who don’t base our technical understanding on memes are well aware that the tooling at the disposal of all modern reasoning models gives them the capability to do such things.
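To make "tooling" concrete, here is a minimal sketch of the kind of check a model could hand off to a code-execution tool instead of doing the digits in its head. The function name `check_product` and the sample numbers are made up for illustration; this is not any particular vendor's tool, just exact decimal arithmetic from Python's standard library.

```python
from decimal import Decimal, localcontext


def check_product(a: str, b: str, claimed: str) -> bool:
    """Return True if a * b equals the claimed value exactly.

    Inputs are strings so the digits are parsed as exact decimals
    rather than being rounded through binary floating point.
    """
    with localcontext() as ctx:
        # Assumption: 50 significant digits is plenty for this example.
        ctx.prec = 50
        return Decimal(a) * Decimal(b) == Decimal(claimed)


# Hypothetical claim pulled from a paper: is 3.14159 * 2.71828 as stated?
print(check_product("3.14159", "2.71828", "8.5397212652"))
```

Multiplication with many decimal places is exactly the case mentioned upthread as the weak spot, and it is also the case where delegating to a tool like this costs nothing.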