Right, I admit that I don't know the first thing about ML, so I tried an experiment.
Consider a language with the tokens "{[()]}" and the following grammar:
S := S S | '{' S '}' | '[' S ']' | '(' S ')' | <empty>
That is, "[()]" and "[]()" are valid sequences, but "[(])" and "))))" aren't. A child would quickly figure out the grammar if presented with some valid sequences.
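For concreteness, the check a pushdown automaton performs here is just a stack; a minimal sketch (my code, not part of the experiment):

```python
# Stack-based membership test for the bracket grammar
# S := S S | '{' S '}' | '[' S ']' | '(' S ')' | <empty>
def is_valid(seq):
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in seq:
        if ch in '([{':
            stack.append(ch)              # remember the open bracket
        elif ch in ')]}':
            if not stack or stack.pop() != pairs[ch]:
                return False              # closer with no matching opener
        else:
            return False                  # token outside the alphabet
    return not stack                      # everything opened must be closed

print(is_valid("[()]"))   # True
print(is_valid("[]()"))   # True
print(is_valid("[(])"))   # False
print(is_valid("))))"))   # False
```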
I generated all 73206 valid sequences with 10 tokens and used them as training data for the RNN text generator code at http://karpathy.github.io/2015/05/21/rnn-effectiveness/. After 500,000 iterations I'm still getting invalid sequences.
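For anyone wanting to reproduce this, a sketch of the enumeration (my code, reading "all valid sequences with 10 tokens" as exactly length 10; that reading gives Catalan(5)·3^5 = 10206 sequences, so the 73206 figure presumably uses a different counting convention):

```python
# Enumerate every balanced bracket sequence of exactly n tokens by either
# opening a new bracket or closing the most recently opened one.
def gen(n, stack, prefix, out):
    if n == 0:
        if not stack:                # fully closed: a valid sequence
            out.append(prefix)
        return
    for opener, closer in (('(', ')'), ('[', ']'), ('{', '}')):
        if len(stack) + 2 <= n:      # leave room to close everything opened
            gen(n - 1, stack + [closer], prefix + opener, out)
    if stack:                        # close the innermost open bracket
        gen(n - 1, stack[:-1], prefix + stack[-1], out)

out = []
gen(10, [], '', out)
print(len(out))  # 10206 = Catalan(5) * 3**5 under the exactly-10-tokens reading
```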
Am I doing something stupid, or is an RNN text generator weaker than a child (or a pushdown automaton)? Is GPT fundamentally more powerful than this?
GPT-3 can generate well-formed programs, so yes, it does things well beyond this complexity.
> After 500,000 iterations I'm still getting invalid sequences.
How frequently? If it's only the occasional invalid sequence, it might be down to the temperature-based sampling that code uses: with some small probability, it will return arbitrarily unlikely outputs.
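A minimal sketch of what temperature-based sampling does (function names are mine, not from the char-rnn code):

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into a sampling distribution; low temperature
    sharpens it toward argmax, high temperature flattens it toward uniform."""
    z = [x / temperature for x in logits]
    m = max(z)                               # subtract max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def sample(logits, temperature=1.0):
    p = softmax_with_temperature(logits, temperature)
    # random.choices never rules a token out entirely, so a token the model
    # considers very unlikely (say, a mismatched bracket) is still emitted
    # once in a while.
    return random.choices(range(len(p)), weights=p)[0]
```

Even if the trained net assigns an ill-matched bracket only a tiny probability, a 10-token sequence takes ten draws, so occasional invalid output is expected at temperature 1; greedy (argmax) decoding or a lower temperature would show whether the network has actually learned the constraint.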
I suspect people overestimate its intelligence because they just can't grasp how much data it has ingested, or don't have a visceral sense of what an ocean of data can contain. There's a saying that "quantity has a quality all its own".
Human: I want to test your creativity. Please invent a new word and give its meaning.
GPT-3: Ok. Um... Tana means to hit someone with the intention to wound them.
Human: Please use the word tana in a sentence.
GPT-3: You are about to tana the man attacking you.
Human: Speak like a dwarf.
GPT-3: I ain't talkin' like a dwarf.
https://www.reddit.com/r/MachineLearning/comments/hvssqn/d_g...