When I read comments like this--and yes I read the article and understand it was generated by an algorithm--I can't help but think the next AI winter is around the corner.
This does not impress me in the slightest.
Taking billions and billions of input corpora and making some of them _sound like_ something a human would say is not impressive. Even if it's at a high school vocabulary level. It may have underlying correlative structure, but there's nothing interesting about the generated artifacts of these algorithms. If we're looking for a cost-effective way to replace content marketing spam... great! We've succeeded! If not, there's nothing interesting or intelligent in these models.
I'll be impressed the day I can see a program that can 1) only rely on its own limited experiential inputs and not billions of artifacts (from already mature persons), and 2) come up with the funny insights of a 3-year-old.
Little children can say things that sound nonsensical but are intelligent. This sounds intelligent but is nonsensical.
I think you are underestimating what an advance these models are over previous NLP models in terms of quality. Before GPT-2 we didn't even have models that could reliably generate grammatical sentences. Now we have things that generate coherent (if not beautiful) paragraphs. It seems easy in retrospect, but some of the smartest people around have been working on this for decades.
> I think you are underestimating what an advance these models are over previous NLP models in terms of quality.
Yeah I mean, I agree. But in my opinion, it's a case of "doing the wrong thing right" instead of a more useful "doing the right thing wrong."
I grant that these automated models are useful for low-value classification/generation tasks at high-frequency scale. I don't think that in any way is related to intelligence though, and the only reason I think they've been pursued is because of immediate economic usefulness _shrug_.
When high-value, low-frequency tasks begin to be reproduced by software without intervention, I think we'll be closer to intelligence. This is just mimicry. Change the parameters even in the slightest (e.g. have this algorithm try to "learn" the article it created to actually do something in the world) and it all falls down.
This kind of facile moving the goalposts is imho a cheap shot and (not imho, fact) is a recurring phenomenon as we make incremental progress toward AI.
Progress is often made with steps that would have been astonishing a few years ago. And every time the bar is raised higher. Rightly so, but characterizing this as doing the wrong thing is missing the point of what we, and the system, are learning.
Yes it's not intelligence. But then, it's not even clear that we ourselves can define intelligence at all… not all philosophers agree on this. Daniel Dennett (philosopher and computer scientist) for example thinks that consciousness may be just a collection of illusions and tricks a mind plays with itself as it models different facets of and lenses into what it stores and perceives.
> This kind of facile moving the goalposts is imho a cheap shot and (not imho, fact) is a recurring phenomenon as we make incremental progress toward AI.
I think you missed my point. I think we're going in the wrong direction for AI entirely, and these "advances" are fundamentally misguided. OpenAI is explicitly about "intelligence," and so we should question if this is in fact that.
It's clear that humans develop fundamentally better intelligence than all of this stuff with 6 orders of magnitude less input (at least of the same sort of data) on a given problem.
Perhaps it would be better to say, "I think the ML winter is just around the corner" as opposed to "the AI winter is just around the corner." That said, this really is math, and these algos still don't actually do anything resembling true intelligence.
It’s actually about AI, which is distinct from intelligence.
>6 orders of magnitude less input
That is utterly mistaken.
We have the input of millions of generations of evolution which have shaped our brains and given us a lot of instinctive knowledge that we do not need to learn from environmental input that happens during our lifetime.
Instead it was learned over the course of billions of years, during the lifetimes of other organisms that preceded us.
Our brain structure was developed and tuned by all these inputs to have some built-in pretrained models. That’s what instincts are. Billions of years in the making. Millions, at the very least, if you want to restrict it to recent primates, although doing so is nonsensical.
What's absolutely crazy is that we somehow think of our DNA base pairs as more important than the physical context that DNA ends up in (society, humans, talking, etc.)
We have the ability to be intelligent and form thoughts with one-millionth the amount of textual data used in this OpenAI GPT-3 study. Maybe... just maybe... intelligence is far more related to things other than just having more data.
I'll actually expand on this and throw this out there: intelligence is in a way antagonistic to more data.
A more intelligent agent needs less knowledge to make a better decision. It's like a function that can do the same computation with fewer inputs. A less intelligent agent requires a lookup table of previously computed intelligent things instead of figuring it out on its own. I think all these "AI" studies are glorified lookup tables.
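To make the lookup-table analogy concrete, here's a toy sketch (purely illustrative; the agent names and the squaring task are made up and have nothing to do with how GPT-3 actually works): one "agent" memorizes answers it has already seen, the other computes them from a tiny rule with almost no stored knowledge.

```python
# Purely illustrative toy contrast, not a claim about how GPT-3 works:
# a "lookup" agent can only answer inputs it has already memorized,
# while a "rule" agent computes the same answers from one tiny rule.

lookup_table = {n: n * n for n in range(1_000_000)}   # huge stored "knowledge"

def lookup_agent(n):
    return lookup_table.get(n)   # returns None outside what it has seen

def rule_agent(n):
    return n * n                 # no stored table, works for any n

print(lookup_agent(12), rule_agent(12))                 # 144 144
print(lookup_agent(5_000_000), rule_agent(5_000_000))   # None 25000000000000
```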
Throw some novel text prompt/task at it and see what happens. If it was just "glorified lookup tables" then the result should be consistently garbage.
Note in particular that "like a function that can do the same computation with fewer inputs" maps very well to GPT-3 - it can complete many interesting tasks by just having a few samples provided to it, instead of having to fine-tune it with more training.
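For reference, the "few samples" here means examples placed in the prompt itself, in the style of the translation demos from the GPT-3 paper; no gradient updates happen. A rough sketch of what such a few-shot prompt looks like (the wording is an illustration, not copied verbatim from the paper):

```python
# Sketch of a few-shot prompt: the "training" is just a handful of examples
# written into the prompt text; the model weights are never updated.
few_shot_prompt = """Translate English to French:

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""

# The model is then asked to continue the text, ideally producing "fromage".
```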
> Note in particular that "like a function that can do the same computation with fewer inputs" maps very well to GPT-3 - it can complete many interesting tasks by just having a few samples provided to it, instead of having to fine-tune it with more training.
The reason it doesn't need more training is because it's already trained itself with millions of lifetimes of human data and encoded that in the parameters!
Humans aren't born trained with data. The fact that we're throwing more and more data at this problem is crazy. The compression ratio of GPT-3 is worse than GPT-2.
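One rough way to put numbers on the compression point, using widely reported approximate corpus sizes (~40 GB of WebText for GPT-2, very roughly 600 GB of filtered training text for GPT-3) and assuming fp32 parameters; these are order-of-magnitude assumptions, not exact figures from the papers:

```python
# Very rough "compression ratio" comparison: training-text size vs. model size.
# Corpus sizes are approximate public figures; parameters assumed stored as fp32.
gpt2_corpus_gb = 40                  # ~40 GB of WebText (rough)
gpt2_model_gb = 1.5e9 * 4 / 1e9      # 1.5B params -> ~6 GB

gpt3_corpus_gb = 600                 # ~600 GB of filtered training text (rough)
gpt3_model_gb = 175e9 * 4 / 1e9      # 175B params -> ~700 GB

print(gpt2_corpus_gb / gpt2_model_gb)   # ~6.7x: corpus much bigger than model
print(gpt3_corpus_gb / gpt3_model_gb)   # <1x: the model is bigger than the text
```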
> The reason it doesn't need more training is because it's already trained itself with millions of lifetimes of human data and encoded that in the parameters!
You know what else is trained by the experiences of thousands of individual (and billions of collective) human lifetimes of data? And several trillions of non-human ones?
> Humans aren't born trained with data.
That's either very wrong or about to evolve into a no true scotsman regarding what counts as data.
AKA "why is it so hard to swat a fly?" because they literally have a direct linkage betweeen sensing incoming air pressure and jumping. Thats why fly swatters don't make a lot of air pressure.
Why do you yank your hand back when you get burned? It's not a controlled reaction. Where did you learn it? You didn't.
If you think the brain is much more than a chemical computer you are sadly mistaken. I would encourage you (not really but it's funny to say) to go experiment with psychedelics and steroids and you will quickly realize that these substances can take over your own perceived intelligence.
The most fascinating of all of this is articles/documentaries about trans people who have started taking hormones and how their perception of the world -drastically- changed. From "feeling" a fast car all of a sudden, to being able to visualize flavors. It's absolutely amazing.
Humans are exposed to much more input than just the things they read. Think of everything you've ever seen and how much data that represents. All of that is your training data. Is it more or less than GPT-3's?
With that style of argumentation, you can say that NNs have even more input than humans: they also have all of the technical development of the last 50,000 years built into them.
Not really. Our evolution and existence in our current form rely on many things that have happened in the entire universe up to this point. But I’m not saying each of our brains and bodies encodes all that information. We just benefit from it with an intricate physical structure that would have been difficult to create any other way.
But we’re not done yet. Give it time. We can go in plenty of directions. Just because you don’t think the current direction is right, that doesn’t rule out other directions happening. And who’s to say this stuff won’t end up being somehow useful? There’s a great talk on how trying to make progress toward an objective in evolutionary algorithms is not a good way to get there.
I marvel at the jump in BLEU and other measures, but I'll second the sentiment: it alone doesn't show we are making leaps toward what we need. Yes, it's a large gradient step minimizing our error, but is it really in the right direction? However, I will admit that GPT-3 directed by some yet-to-be-invented causal or counterfactual inference model might be the something that defies my expectations.
> A computer that is actually fluent in English — as in, understands the language and can use it context-appropriately.
Did you never do grammar diagrams in grade school? :-)
The "context" and structure of language is a formula. When you have billions of inputs to that formula, it's not surprising you can get a fit or push that fit backwards to generate a data set.
This algorithm does not "understand" the things it's saying. If it did, that wouldn't be the end of the chain. It could, without training, make investment decisions on that advice, because it would understand the context of what it had just come up with. Plenty of other examples abound.
Humans or animals don't get to have their firmware "upgraded" or software "retrained" every time a new hype paper comes out. They have to use a very limited and basically fixed set of inputs + their own personal history for the rest of their lives. And the outputs they create become internalized and used as inputs to other tasks.
We could make 1M models that do little tasks very well, but unless they can be combined in such a way that the models cooperate and have agency over time, this is just a math problem. And I do say "just" in a derogatory way here. Most of this stuff could have been done by the scientific community decades ago if they had the hardware and quantity of ad clicks/emails/events/gifs to do what are basically sophisticated linear algebra tasks.
> I'll be impressed the day I can see a program that can 1) only rely on its own limited experiential inputs
Hasn't the typical human taken in orders of magnitude more data than this example? And the data has been of both direct sensory experience and texts from other people as well.
> Hasn't the typical human taken in orders of magnitude more data than this example?
Have you read GPT-3's 175 billion parameters (words, sentences, papers, I don't care) of anything? Do you know all the words used in that corpus? Nobody has or does.
A child of a small age can listen to a very small set of things and not just come up with words to communicate to mama and papa what they learned, but reuse it. And this I think is key, because the language part of that is at least partially secondary. The little kid understands what they're talking about even if they have a hard time communicating it to an adult. The fact that they take creative leaps to use their extremely limited vocabulary to communicate their knowledge is amazing.
Your post was generated using GPT-3 and 175 billion parameters of pre-existing human writing, contextualized, distilled, and cross-referenced with terminology we've agreed on for centuries. It's a parrot, and I remain unimpressed.
Take the learned knowledge of GPT-3 (because it must be so smart right?) and have it actually do something. Buy stocks, make chemical formulas, build planes. If you are not broke or dead by the end of that exercise, I'll be impressed and believe GPT-3 knows things.
What's unimpressive about a stunningly believable parrot? I think, at the very least, that GPT-3 is knowledgeable enough to answer any trivia you throw at it, and creative enough to write original poetry that a college student could have plausibly written.
Not everything worth doing is as high-stakes as buying stocks, making chemical formulas, or building planes.
Sigh. When DNA becomes human, it doesn't have a priori access to all the world's knowledge and yet it still develops intelligence without it. And that little DNA machine learns and grows over time.
When thousands of scientists and billions of human artifacts and 1000X more compute are put into the philosophical successor of GPT-3, it won't be as impressive as what happens when a 2 year old becomes a 3 year old. (It will probably make GPT-4 even less impressive than GPT-3, because the inputs vis-a-vis outputs will be even that much more removed from what humans already do.)
> That post was generated using GPT-3 and 175 billion parameters of pre-existing human writing, contextualized, distilled, and cross-referenced with terminology we've agreed on for centuries.
DNA is nothing like the training of GPT. DNA does not encode a massive amount of statistics about words and language and how concepts, words, etc., relate to one another.
All DNA does is encode how to grow, build, and maintain a human body. That human body has the potential to learn a language and communicate, but if you put a baby human inside an empty room and drop in food, it will never learn language and never communicate. DNA isn't magic, and "millions of years of evolution" of DNA is nothing like the petabytes of data that GPT-3 needs to operate.
Again, DNA has no knowledge embedded in it; it has no words or data embedded, data in the sense that we imagine Wikipedia stored in JSON files on a hard disk. DNA stores an algorithm for the growth of a human, that's it.
The GPT-3 model is probably > 700GB in size. That is, for GPT to be able to generate text it needs an absolutely massive "memory" of existing text which it can recite verbatim. In contrast, young human children can generate more novel insights with many orders of magnitude less data in "memory" and less training time.
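For what it's worth, that size estimate follows directly from the parameter count, assuming 32-bit floats (a rough sketch; 16-bit storage would halve it):

```python
# GPT-3 size estimate from the parameter count alone.
params = 175e9            # 175 billion parameters
bytes_per_param = 4       # assuming fp32; fp16 storage would be 2
print(params * bytes_per_param / 1e9, "GB")   # 700.0 GB
```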
"Knows things" is kind of vague. I'm pretty sure GPT-3 would obliterate all traditional knowledge bases we have. Even bert could achieve state of the art results when the questions are phrased as the cloze task.
If you mean that anything except full general intelligence is unimpressive, then that seems like a fairly high standard.
I recall a researcher filming their child from the day they were born until they began to speak. They wanted to find how many times a child had to hear a word in order to be able to repeat it back to the parent. The result, I think, was that if the child heard the word 2,000 times in total, they would be able to repeat it. But if they heard the word 600 times in the same place, for instance at the end of the couch, that was enough to repeat it.
The human brain requires less training, but to some extent it is pretrained by our genetic code. The human brain will take on a predictable structure with any sort of training.
I think that it’s the opposite. This algorithm requires many examples of text on the specific topic. Probably more than most humans would require.
> While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples [0]
I don’t know what constitutes an example in this case but let’s assume it means 1 blog article. I don’t know many humans that read thousands or tens of thousands of blog articles on a specific topic. And if I did I’d expect that human to write a much more interesting article.
To me, this and other similar generated texts from OpenAI feel bland / generic.
Take a listen to the generated music from OpenAI - https://openai.com/blog/jukebox/. It’s pretty bad, but in a weird way. It’s technically correct - in key, on beat, etc. And even some of the music it generates is technically hard to do, but it sounds so painfully generic.
> All the impressive achievements of deep learning amount to just curve fitting
Judea Pearl [1]
Given one blog article in a foreign language: Would a human be able to write coherent future articles?
With no teacher or context whatsoever, how many articles would one have to read before they could write something that would 'fool' a native speaker? 1,000? 100,000?
I have no idea how to measure the quantity or quality of the contextual and sensory data we are constantly processing just by existing in the real world. However, it is vital to solving these tasks in a human way, yet it is a dataset that no machine has access to.
I would argue comparing 'like for like' disregards the rich data we swim amongst as humans, making it an unfair comparison.
GPT-3 was trained on half a trillion words (common crawl, webtext, two book corpuses, and wikipedia, IIRC). At about 100 words per minute, that's almost ten thousand years of continuous speech. By my estimate it's probably a few thousand times what people actually hear in a lifetime. We don't experience nearly the volume of language that it did.
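The arithmetic behind that estimate, as a minimal sketch (the 100 words-per-minute speech rate and the lifetime figure are rough assumptions, order-of-magnitude only):

```python
# Back-of-the-envelope: how long would it take to *speak* GPT-3's training corpus?
corpus_words = 500e9          # ~half a trillion words of training text
words_per_minute = 100        # rough conversational speech rate
years = corpus_words / words_per_minute / (60 * 24 * 365)
print(round(years))           # ~9,500 years of continuous speech

# Compare to a very rough guess at words heard in a human lifetime:
lifetime_words = 200e6        # rough assumption, order of magnitude only
print(round(corpus_words / lifetime_words))   # ~2,500 lifetimes
```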
Why then, the continued obsession with building single-media models?
Is focusing on the Turing test and language proficiency bringing us further away from the goals of legitimate intelligence?
I would argue "yes", which was my original comment. At no point in us trying to replicate what an adult sounds like have we actually demonstrated anything remotely like the IQ of a small child. And there's this big gap where it's implied by some that this process goes 1) sound like an adult -> 2) think like an adult, which seems to be missing the boat imo. (There's logically this intermediate step where we have this adult-sounding monster AI child.)
If we could constrain the vocabulary to that a child might be exposed to, the correlative trickery of these models would be more obvious. The (exceptionally good) quality of these curve fits wouldn't trick us with vocabulary and syntax that looks like something we'd say. The dumb things would sound dumb, and the smart things would sound smart. And maybe, probably even, that would require us fusing in all sorts of other experiential models to make that happen.
> Why then, the continued obsession with building single-media models?
I think it's literally just working with available data. With some back of the envelope math, GPT-3's training corpus is thousands of lifetimes of language heard. All else equal, I'm sure the ML community would almost unanimously agree that thousands of lifetimes of other data with many modes of interaction and different media would be better. It would take forever to do and would cost insane amounts of money. But some kinds of labels are relatively cheap, and some data don't need labels at all, like this internet text corpus. I think that explains the obsession with single-media models. There's a lot more work to do and this is, believe it or not, still the low hanging fruit.
> thousands of lifetimes of other data with many modes of interaction and different media would be better.
But why not just 1 lifetime of different kinds of data? Heck, why not an environment of 3 years of multi-media data that a child would experience? That wouldn't cost insane amounts of money (or probably anything even close to what we've spent on deep learning as a species).
A corpus limited to the experiences of a single agent would create a very compelling case for intelligence if at the end of that training there was something that sounded and acted smart. It couldn't "jump the gun" as it were, by a lookup of some very intelligent statement that was made somewhere else. It would imply the agent was creatively generating new models as opposed to finding pre-existing ones. It'd even be generous to plain-ol'-AI as well as deep learning, because it would allow both causal models to explain learned explicit knowledge (symbolic), or interesting tacit behavior (empirical ML).
> But why not just 1 lifetime of different kinds of data? Heck, why not an environment of 3 years of multi-media data that a child would experience? That wouldn't cost insane amounts of money (or probably anything even close to what we've spent on deep learning as a species).
How would you imagine creating such an environment in a way that allows you to train models quickly?
No new technology is impressive when it comes incrementally. A camera that automatically records the latitude and longitude of where each photo was taken would have blown my mind as a child. I couldn't have conceived any way it might have worked. But nearly all cameras do that now and at most it's a curiosity or a privacy worry, not a blown mind.
The article says more about the state of tech blogging than it does GPT-3. I kept thinking "great, another one of these, when are they actually going to show me any results?"
We've been conditioned to accept articles where there's a lot of words and paragraphs and paragraphs of buildup, but nothing actually being said.