I think both of your points are solving different problems from what I was suggesting.
My point is that an interesting scientific question is: "is the huge size of the GPT-3 model intrinsic to the problem of NLP, or is it an artifact of our current algorithms?"
One way to answer that is to apply the same algorithms and methods to mechanics data generated from, let's say, classical mechanics, and compare the resulting model size with the size of the classical mechanics description. If the model ends up needing roughly the same number of parameters as classical mechanics, that would strongly suggest that NLP may intrinsically require a huge model as well. Otherwise, it would leave open the hope that language understanding can be modeled with fewer parameters than GPT-3 requires.
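As a toy illustration of the comparison I have in mind (a sketch only — the real experiment would use the same deep learning algorithms as GPT-3, not a polynomial fit, and the choice of projectile motion here is just an arbitrary example):

```python
import numpy as np

# Generate data from a known classical-mechanics law, fit a generic model
# to it, and compare the fitted model's parameter count with the size of
# the analytic description.

g, v0 = 9.81, 20.0            # the analytic description: 2 parameters
t = np.linspace(0, 3, 200)
y = v0 * t - 0.5 * g * t**2   # projectile height, noise-free

# Generic model: a low-degree polynomial; its "size" is the coefficient count.
coeffs = np.polyfit(t, y, deg=2)
fit_error = np.max(np.abs(np.polyval(coeffs, t) - y))

# A 3-parameter model recovers the 2-parameter law almost exactly.
print(len(coeffs), fit_error)
```

If the learned model's parameter count stays close to the analytic description's, the algorithm is capturing the structure efficiently; if it balloons far beyond it, the size is an artifact of the method rather than the problem.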
Your examples are still in the realm of engineering: trying to apply the black-box model to see what we can get, instead of studying the model itself to understand it and how it maps to the problem it's trying to solve.