This shows that the model architecture, be it Transformer, Mamba, SSM or RWKV, doesn't really matter when compared to the impact of the training set. We're spending too much time debating models when we should be talking about language data, a reservoir of human experience won at great sacrifice by humanity.
And the same data, when used to train humans, creates modern, capable people. Alone, without society and language, we would be mere shadows of ourselves. What does it say when AI acquires so many capabilities from language data? Maybe intelligence was not centered in the brain. It's a social process.
I agree that, on balance, we should spend more effort on data than on modeling, but it is just not true that modeling doesn't matter. Transformer-2023 is different from Transformer-2020, and the cumulative improvement is significant. https://arxiv.org/abs/2312.00752 ran such a benchmark.
If the choice between Transformer and RWKV doesn't seem to matter to you, the only reason is that while Transformer-2020 evolved into Transformer-2023, RWKV-v1 (which is from 2021) also evolved into RWKV-v5. If you use Transformer-2020 or RWKV-v1 today, you will feel the difference.
They describe Transformer++ as "A Transformer with an improved architecture, namely rotary positional encodings and SwiGLU MLP", with no linear bias terms and RMSNorm instead of LayerNorm.
But modern transformers have many more tricks than that, such as pre-norm, better use of residual layers, sparse attention masks and so on.
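To make two of the Transformer++ changes mentioned above concrete, here is a minimal pure-Python sketch of RMSNorm next to LayerNorm, and of a bias-free SwiGLU MLP. The function names, the `eps` default and the list-of-rows weight layout are my own assumptions for illustration, not code from the paper, and real implementations operate on batched tensors.

```python
import math

def layer_norm(x, eps=1e-6):
    """LayerNorm without learned scale/shift: subtract the mean, divide by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, gain=None, eps=1e-6):
    """RMSNorm: divide by the root mean square; no mean subtraction, no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    gain = gain if gain is not None else [1.0] * len(x)
    return [g * v / rms for g, v in zip(gain, x)]

def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def matvec(w, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU MLP with no bias terms: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)
```

The point of the comparison: RMSNorm skips the mean subtraction and the bias, so it does slightly less work per token than LayerNorm, and SwiGLU replaces the usual two-matrix MLP with three matrices and a multiplicative gate.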
> maybe intelligence was not centered in the brain. It's a social process.
Is that controversial? We stand on the shoulders of the giants before us, and that is why we insist on training younglings for a couple of decades on past learnings before they are believed to be of any use. Even the smartest person wouldn't survive long if dropped in 10000 BC.
It's very much worth discussing architecture since, if Mamba or Based end up working as well as Transformers, a lot of current problems related to quadratic scaling are solved.
Of course it is in the brain; the brain created and evolved language as a very powerful tool. If intelligence were in the language, then other animals would be as intelligent as us.
Other animals don't have our advanced language. In fact, it is the lack of language transmission that keeps them down. What I am arguing is that we're pretty limited individually; only together, and with plenty of time, do we get so smart.
LLMs learning from the same text and gaining human-like capabilities shows just how much of intelligence is crystallized in culture. If it works without brains, then brains were not the essential ingredient.
Humans without culture would need 10,000 years or more to recover, and would have to pay the same price as the first time around. Culture is smarter than us.
Knowledge transfer over generations is a function of the brain.
Other species have much more limited ability to transfer knowledge intergenerationally, and that is because the human brain's capability for symbolic language is much more advanced than other animals', who are not able to encode knowledge nearly as efficiently.
Sure, but that's still only possible for the human brain; other species' brains aren't capable of encoding knowledge and using that to collaborate with other members.