You can make the argument that stagnant progress isn't actually a lack of progress when it comes to AI. Kilcher and Karpathy recently had a video where they discussed how some new model (PaLM or DALL-E 2, I forget which) showed zero progress for thousands of training steps and then suddenly rapid progress afterwards. It was as if the model spent thousands of training steps working on the concept and then finally grokked it. It could simply be that as we keep increasing the parameter count and data quality of these models, we will keep seeing progress on the road to AGI as a whole, but only in step changes that each require many training steps.
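For anyone who wants to see what that plateau-then-jump looks like, here is a rough toy sketch (my own illustration, not the setup from the video) of the classic grokking experiment: a small network trained on modular addition with heavy weight decay. The hyperparameters are guesses, and whether or when the jump shows up depends a lot on the weight decay, the train/val split, and the architecture.

```python
# Toy grokking sketch: train a small net on (a + b) mod P with strong weight
# decay and watch validation accuracy sit near chance for a long time before
# jumping. All hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

P = 97
pairs = [(a, b) for a in range(P) for b in range(P)]
torch.manual_seed(0)
perm = torch.randperm(len(pairs))
split = len(pairs) // 2                      # assumption: 50/50 train/val split
train_idx, val_idx = perm[:split], perm[split:]

X = torch.tensor(pairs)                      # shape (P*P, 2)
y = (X[:, 0] + X[:, 1]) % P                  # labels for modular addition

class ToyNet(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.embed = nn.Embedding(P, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, P))
    def forward(self, x):
        e = self.embed(x)                    # (batch, 2, dim)
        return self.mlp(e.flatten(1))

model = ToyNet()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(X[idx]).argmax(-1) == y[idx]).float().mean().item()

for step in range(1, 50_001):                # the jump, if it comes, can come very late
    opt.zero_grad()
    loss = loss_fn(model(X[train_idx]), y[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        # Typically train accuracy saturates early while val accuracy stays
        # flat for a long stretch, then rises sharply ("grokking").
        print(step, round(accuracy(train_idx), 3), round(accuracy(val_idx), 3))
```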
How many more parameters do you need?
PaLM has 530 BILLION parameters and still underperforms XLNet (~300 million) on NLP tasks; as such, very large language models are extreme failures. They do not improve the state of the art once you have proper datasets and do full-shot learning, and I'm not even talking about fine-tuning.
Very large language models hide from the layman that they are the most gigantic failure NLP has ever seen, by showing that they improve the state of the art only in zero- or few-shot learning.
Who cares? This is so cringe. Full-shot learning (training on the full dataset) is what matters most, and even full-shot learning does not yield satisfying accuracy on most NLP tasks (though it comes close).
Therefore the only use of PaLM is to get mediocre (70-80%) accuracy that beats the previous SOTA, and only on tasks that have no good-quality existing datasets.
And 530 billion parameters is close to the maximum we can realistically reach; it already costs ~$10 million in hardware and still underperforms a 300-million-parameter model in full-shot learning (e.g. dependency parsing, word sense disambiguation, coreference resolution, NER, etc.).
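To put a concrete picture on what training on the full dataset means for one of those tasks: ordinary supervised training of a few-hundred-million-parameter encoder on a labeled corpus, roughly like the sketch below using Hugging Face Transformers. The checkpoint, dataset, and hyperparameters are illustrative placeholders, not a reproduction of any benchmark number.

```python
# Rough sketch: supervised training on a full labeled NER dataset with a
# few-hundred-million-parameter encoder. Checkpoint, dataset, and
# hyperparameters are placeholders for illustration only.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification,
                          Trainer, TrainingArguments)

dataset = load_dataset("conll2003")          # assumes this dataset is available on the Hub
label_names = dataset["train"].features["ner_tags"].feature.names

checkpoint = "xlnet-large-cased"             # a few hundred million parameters
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=len(label_names))

def tokenize_and_align(batch):
    # Tokenize pre-split words and copy each word's tag to its first subword;
    # remaining subwords and special tokens get -100, which the loss ignores.
    enc = tokenizer(batch["tokens"], is_split_into_words=True,
                    truncation=True, max_length=128)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        prev, row = None, []
        for wid in enc.word_ids(batch_index=i):
            row.append(-100 if wid is None or wid == prev else tags[wid])
            prev = wid
        enc["labels"].append(row)
    return enc

tokenized = dataset.map(tokenize_and_align, batched=True)

args = TrainingArguments(output_dir="ner-full-dataset",
                         learning_rate=2e-5,
                         per_device_train_batch_size=16,
                         num_train_epochs=3)
trainer = Trainer(model=model, args=args,
                  data_collator=DataCollatorForTokenClassification(tokenizer),
                  train_dataset=tokenized["train"],
                  eval_dataset=tokenized["validation"])
trainer.train()
```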
It's crazy that people don't realize what a gigantic failure this is, but as always, it's because they don't care enough.