If you mean for Anthropic in particular, I don't think so. But it wouldn't be the first time a major AI lab has published an incremental update of a model that is worse on some benchmarks. I remember that a particular update of Gemini 2.5 Pro improved results on LiveCodeBench but scored lower on most other benchmarks.
Ask it to create an iOS app that runs Gemma natively via LiteRT-LM.
It’s incredibly trivial to find stuff outside their capabilities. In fact, most of the stuff I want AI to do, it just can’t, and the stuff it can do isn’t interesting to me.