> People on GPT-3 are working in a way that is academically valid (able to make results that are meaningful to a community), but from an engineering perspective it is like building a bridge with only one end, or a tall tower that carries no load.
To me it seemed like the opposite. They are essentially working without any hypothesis of how their model actually works, without any model of the way it actually learns or the way it produces the results that it does, and instead placing blind trust in various metrics that are improving.
They are treating this as an engineering problem - how can we make the best human-sounding text generator - and not as a traditional research problem. GPT-3 has not taught us anything about anything except "how to generate text that seems human-like to humans". We have no firm definition of what that means, we have no idea why it works, we have no catalogue of the systematic failures in its model, we know next to nothing about it other than its results on some metrics.
Imagine the same applied to physics - if instead of inventing QM and Relativity or Mechanics, physicists got it in their head to try to feed raw data into a black box and see how well it predicts some observed movements.
In fact, this would be a pretty interesting experiment: how large would a deep learning model need to be to accurately reproduce the predictions of mechanics, given only raw data (object positions, velocities, masses, colors, surface roughness, shape, taste, etc.)? Unfortunately, I don't think anyone has been interested in this type of experiment, because it is not useful from an engineering (or profit) perspective.
> Imagine the same applied to physics - if instead of inventing QM and Relativity or Mechanics, physicists got it in their head to try to feed raw data into a black box and see how well it predicts some observed movements.
> In fact, this would be a pretty interesting experiment: how large would a deep learning model need to be to accurately reproduce the predictions of mechanics, given only raw data (object positions, velocities, masses, colors, surface roughness, shape, taste, etc.)? Unfortunately, I don't think anyone has been interested in this type of experiment, because it is not useful from an engineering (or profit) perspective.
Isn't that pretty much what Google's AlphaFold is doing?
And it seems GPT-3 formed concepts, relating words together without being asked; it's not picking the next best word strictly as a matter of statistical probability. So why wouldn't that apply to physics simulations, chemistry, etc.?
Feed it chemical formulas and equation-balancing exercises from old chem 101 textbooks, and it will fill in the blanks and start teaching itself how those things relate just by being corrected enough; then you can see if it has any predictive value.
I think both of your points are addressing different problems than what I was suggesting.
My point is that an interesting scientific question is: "is the huge size of the GPT-3 model intrinsic to the problem of NLP, or is it an artifact of our current algorithms?"
One way to answer that is to apply the same algorithms and methods to mechanics data generated from, let's say, classical mechanics, and compare the resulting model size with the size of the classical mechanics description. If the model ends up needing roughly the same number of parameters as classical mechanics, that would be a strong suggestion that NLP may intrinsically require a huge model as well. Otherwise, it would leave open the hope that language understanding can be modeled with fewer parameters than GPT-3 requires.
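A toy version of this comparison can be sketched in a few lines (this is my own illustration, not an existing experiment; the function names and numbers are made up): generate "raw data" from a known physical law - 1-D free fall - fit a black-box model to it by ordinary least squares, and compare the fitted model's parameter count with the number of constants in the physical description.

```python
# Toy sketch of the proposed experiment (hypothetical setup): generate raw
# (t, x) observations from classical mechanics and fit a black-box model,
# then compare parameter counts with the physics description (x0, v0, g).

G = 9.81  # m/s^2

def free_fall(x0, v0, t):
    # Ground-truth physics: x(t) = x0 + v0*t - 0.5*g*t^2
    return x0 + v0 * t - 0.5 * G * t * t

# Raw observations; the model is never told the underlying law.
x0, v0 = 100.0, 5.0
data = [(k / 10.0, free_fall(x0, v0, k / 10.0)) for k in range(30)]

# Black-box model: x_hat(t) = w0 + w1*t + w2*t^2, fit by least squares
# (solving the 3x3 normal equations with Gaussian elimination).
feats = [[1.0, t, t * t] for t, _ in data]
ys = [x for _, x in data]
A = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
b = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(3)]

for col in range(3):  # forward elimination with partial pivoting
    piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    b[col], b[piv] = b[piv], b[col]
    for r in range(col + 1, 3):
        m = A[r][col] / A[col][col]
        for c in range(col, 3):
            A[r][c] -= m * A[col][c]
        b[r] -= m * b[col]

w = [0.0, 0.0, 0.0]
for r in range(2, -1, -1):  # back-substitution
    w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, 3))) / A[r][r]

# Three fitted parameters recover the three physical constants:
# w ~= [x0, v0, -0.5*G] = [100.0, 5.0, -4.905]
print(w)
```

Here the black-box model needs exactly as many parameters as the physical law it mimics - that is the kind of "model size vs. theory size" comparison I mean, just scaled down to where the answer is obvious. The open question is whether the same equality holds when the black box is a deep network and the "theory" is language.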
Your examples are still in this realm of engineering - trying to apply the black box model to see what we can get, instead of studying the model itself to try to understand it and how it maps to the problem it's trying to solve.