
This shows that the model architecture, be it Transformer, Mamba, SSM or RWKV, doesn't really matter when compared to the impact of the training set. We're spending too much time debating models when we should be talking about language data, a reservoir of human experience won at great sacrifice by humanity.

And the same data, when used to train humans, creates modern, capable people. Alone, without society and language, we would be mere shadows of ourselves. What does it say when AI acquires so many capabilities from language data? Maybe intelligence was not centered in the brain. It's a social process.



I agree that, on balance, we should spend more effort on data than on modeling, but it is just not true that modeling doesn't matter. Transformer-2023 is different from Transformer-2020, and the cumulative improvement is significant. https://arxiv.org/abs/2312.00752 ran such a benchmark.

If the choice between Transformer and RWKV doesn't seem to matter to you, the only reason is that while Transformer-2020 evolved into Transformer-2023, RWKV-v1 (which is from 2021) also evolved into RWKV-v5. If you use Transformer-2020 or RWKV-v1 today, you will feel the difference.


They describe Transformer++ as "A Transformer with an improved architecture, namely rotary positional encodings and SwiGLU MLP", with no linear bias terms and RMSNorm instead of LayerNorm.

But modern transformers have many more tricks than that, such as pre-norm, better use of residual layers, sparse attention masks and so on.
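For anyone unfamiliar with the terms, here is a rough sketch of how RMSNorm differs from LayerNorm, plus the gating step of a SwiGLU MLP. It is a simplified illustration in plain Python, not the paper's code: the function names are made up for this example, and the learned gain/bias parameters and weight matrices are left out.

```python
import math

def layer_norm(x, eps=1e-5):
    # LayerNorm (without learned gain/bias): subtract the mean,
    # then divide by the standard deviation.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-5):
    # RMSNorm (without learned gain): no mean subtraction,
    # just divide by the root mean square of the inputs.
    mean_square = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_square + eps) for v in x]

def silu(v):
    # SiLU / Swish activation: v * sigmoid(v).
    return v / (1.0 + math.exp(-v))

def swiglu_gate(gate, value):
    # Gating step of a SwiGLU MLP: SiLU(gate) * value, elementwise.
    # In a real layer, `gate` and `value` are two separate linear
    # projections of the same input; those projections are omitted here.
    return [silu(g) * v for g, v in zip(gate, value)]

x = [1.0, 2.0, 3.0, 4.0]
print(layer_norm(x))  # centered around zero
print(rms_norm(x))    # rescaled only, signs unchanged
```

The practical difference is that RMSNorm skips the centering step, so it does slightly less work per token.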


> maybe intelligence was not centered in the brain. It's a social process.

Is that controversial? We stand on the shoulders of the giants before us, and that is why we insist on training younglings for a couple of decades on past learnings before they are believed to be of any use. Even the smartest person wouldn't survive long if dropped in 10000 BC.


It's very much worth discussing architecture since, if Mamba or Based end up working as well as Transformers, a lot of current problems related to quadratic scaling are solved.
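To put rough numbers on the quadratic scaling point: full self-attention compares every token with every other token, while a recurrent-style model does one state update per token. A back-of-the-envelope sketch follows. The function names are hypothetical, and the counts ignore constants, heads, hidden size and everything else that matters in practice.

```python
def attention_score_entries(seq_len):
    # Full self-attention builds a seq_len x seq_len score matrix.
    return seq_len * seq_len

def recurrent_state_updates(seq_len):
    # A recurrent/SSM-style model does one state update per token.
    return seq_len

for n in (1024, 2048, 4096):
    print(n, attention_score_entries(n), recurrent_state_updates(n))
```

Doubling the context length quadruples the attention score matrix but only doubles the number of recurrent updates.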


Of course it is in the brain; the brain created and evolved language as a very powerful tool. If intelligence were in the language, then other animals would be as intelligent as us.


Other animals don't have our advanced language. In fact, it is the lack of language transmission that keeps them down. What I am arguing is that we're pretty limited individually; only together, and with plenty of time, do we get so smart.

LLMs learning from the same text and gaining human-like capabilities shows just how much of intelligence is crystallized in culture. If it works without brains, then brains were not the essential ingredient.

Humans without culture would need 10,000 years or more to recover, and have to pay the same price as the first time around. Culture is smarter than us.


Scientific advancement requires both brains and knowledge transfer over generations.

"If I have seen further, it is by standing on the shoulders of giants."


Knowledge transfer over generations is a function of the brain.

Other species have much more limited ability to transfer knowledge intergenerationally, and that is because the human brain's capability for symbolic language is much more advanced than other animals', who are not able to encode knowledge nearly as efficiently.


The point is it's a function of many connected brains, not just one brain.


Sure, but that's still only possible for the human brain; other species' brains aren't capable of encoding knowledge and using it to collaborate with other members.


What if it's all the same electron[0]?

[0]: https://www.nobelprize.org/prizes/physics/1965/feynman/lectu...


Model architecture matters. RWKV takes significantly less energy than transformer models.

It’s better for the environment and much faster.

It’s not _only_ about performance.


> We're spending too much time debating models when we should be talking about language data

Models still matter a lot. There's arguably still an abilities gap between LLMs and general intelligence that can only be bridged by a new model.


Both are problems to solve; architecture is as much of a problem as data.

Data has the problem of being progressively tainted by LLM output, as well as a lack of open high-quality datasets.

Architecture, meanwhile, has the problem of too much focus being placed on a flawed architecture: transformers.


I expect this is why OpenAI was focusing on partnering with people who have high-quality data

https://openai.com/blog/data-partnerships



