The improvement in LLMs has come in the form of more successful one-shots, more successful bug finding, more efficient code, and less time spent hand-holding the model.

"Problem solving" (which has definitely improved, though perhaps with a spiky, domain-dependent profile) might not be the best metric. You could probably hand-hold the models of 12 months ago to the same "solution" as current models, but you would spend a lot of time doing the hand-holding.