If you want to train a model to have a general understanding of the physical world, one way is to show it videos and ask it to predict what comes next, and then evaluate it on how close it was to what actually came next.
To really do well on this task, the model basically has to understand physics, and human anatomy, and all sorts of cultural things. So you're forcing the model to learn all these things about the world, but it's relatively easy to train because you can just collect a lot of videos and show the model parts of them -- you know what the next frame is, but the model doesn't.
Along the way, this also creates a video generation model - but you can think of this as more of a nice side effect rather than the ultimate goal.
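For the curious, here is a rough sketch of what that next-frame-prediction objective looks like in code. This is a minimal illustration assuming a PyTorch-style setup; the toy model, the MSE loss, and the clip shapes are placeholders for exposition, not anyone's actual training pipeline.

```python
# Minimal sketch of next-frame prediction training (assumed PyTorch-style setup).
# The "model" is a toy convolutional predictor, not a production video architecture.
import torch
import torch.nn as nn

class NextFramePredictor(nn.Module):
    """Toy model: takes a stack of context frames, predicts the next frame."""
    def __init__(self, context_frames=4, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(context_frames * channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, channels, kernel_size=3, padding=1),
        )

    def forward(self, context):
        # context: (batch, context_frames * channels, H, W)
        return self.net(context)

def training_step(model, optimizer, clip):
    """clip: (batch, frames, channels, H, W) -- a short video clip.

    Show the model all frames except the last, then score it on how close
    its prediction is to the frame that actually came next.
    """
    b, f, c, h, w = clip.shape
    context = clip[:, :-1].reshape(b, (f - 1) * c, h, w)  # frames the model sees
    target = clip[:, -1]                                   # the frame it must predict
    pred = model(context)
    loss = nn.functional.mse_loss(pred, target)  # "how close was it to what actually came next"
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random tensors standing in for real video clips.
model = NextFramePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
fake_clip = torch.rand(2, 5, 3, 64, 64)  # batch of 2 clips, 5 frames each
print(training_step(model, optimizer, fake_clip))
```

The data collection really is the easy part: any video gives you supervision for free, because the "label" for each context window is simply the frame that follows it.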
It doesn't have to understand anything; none of these models demonstrates reasoning or understanding.
All these models have just “seen” enough videos of all those things to build a probability distribution to predict the next step.
That isn't bad, nor does it make the model inherently dumb; a major component of human intelligence is built on similar strategies.
I couldn't tell you which grammatical rules are broken in a piece of text, or which physical laws are violated in a photograph, but I can still tell something is wrong using the same kind of pattern matching.
Inference can take you far with large enough data sets, but sooner or later, without reasoning, you will hit a ceiling.
This is true for humans as well: plenty of people go far in life on memorization and replication alone, and do a lot of jobs fairly competently, but not everything.
Reasoning is essential for higher-order functions, and transformers are not the path to it.
That's like saying your brain doesn't understand anything either: it just analyzes the visual data coming in via your eyes and predicts the next step of reality.
The brain also does that. It doesn't do it exclusively, but we do it an awful lot.
We do an extensive amount of pattern matching and drop an enormous amount of sensory input very quickly, because we expect patterns and assume a lot about our surroundings.
Unlearning this is a hard skill to pick up; many kinds of training, from martial arts to meditation, attempt to achieve it.
The point is that this alone is not sufficient; the other core component is reasoning and understanding, and transformers trained on data don't provide it.
Parrots and a few other animals can imitate human speech very well; that doesn't mean they understand the speech or are constructing it.
Don't get me wrong, I am not saying it is not useful; it is. But attributing reasoning and understanding to models that foundationally have no such building block is just being impressed by a speaking parrot.
I think people are just fundamentally unwilling to attribute intelligence to things that can't have conversations. This is why the incredible belief that babies or dogs don't feel pain was possible. Once AI is given some long-term memory, all of these ideas that AI is just a parrot will suddenly be gone, and I personally think it will probably be pretty easy to give robots memories and their own personal motivations. All you have to achieve is training them in real time; the rest is an optimization problem, like making the training sensible so the model doesn't store and believe every single thing it is told.
The corollary also holds: we tend to attribute intelligence to things merely because they can have conversations. That has been the case since the first golden era of AI in the 1960s.
Mimicking more patterns, like emotion and motivation, may make for a better user experience, but it doesn't make the machine any smarter, just a better mime.
Your thesis is that as we mimic reality more and more closely, the differences will stop mattering; that is an idea romanticized by popular media like Blade Runner.
I believe there are classes of applications, particularly if the goal is the singularity or better-than-human superintelligence, where emulating human responses, no matter how sophisticated, won't take you there. Proponents may hand-wave this away as moving the goalposts, but it is only refining the tests to reflect the models of the era.
If the proponents of AI were serious about their claims of intelligence, then they should also be pushing for AI rights. There is no such serious discourse happening, only debates about human data privacy rights: what AI models can be allowed to learn from, and where the models can be allowed to work.
> If the proponents of AI were serious about their claims of intelligence, then they should also be pushing for AI rights. There is no such serious discourse happening
It's beginning to happen. Anthropic hired their first AI welfare researcher from Eleos AI, which is an organization specifically dedicated to investigating this question: https://eleosai.org/