This shows the model architecture, be it transformer, Mamba, SSM or RWKV - doesn...

sanxiyn · on Jan 29, 2024

I agree that on balance, we should spend more effort on data than modeling, but it is just not true that modeling doesn't matter. Transformer-2023 is different from Transformer-2020, and the cumulative improvement is significant. https://arxiv.org/abs/2312.00752 ran such a benchmark.

If the choice between Transformer and RWKV doesn't seem to matter to you, the only reason is that while Transformer-2020 evolved to Transformer-2023, RWKV-v1 (which is from 2021) also evolved to RWKV-v5. If you use Transformer-2020 or RWKV-v1 today, you will feel the difference.

thomasahle · on Jan 29, 2024

They describe Transformer++ as "A Transformer with an improved architecture, namely rotary positional encodings and SwiGLU MLP", plus no linear bias terms and RMSNorm instead of LayerNorm.

But modern transformers have many more tricks than that, such as pre-norm, better use of residual layers, sparse attention masks and so on.
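
As a rough illustration of two of the components named above, here is a minimal plain-Python sketch of RMSNorm (next to LayerNorm for comparison) and a bias-free SwiGLU MLP. The function names, the `eps` default, and the omission of learned scale parameters are assumptions made for brevity; this is not taken from the paper's code.

```python
import math

def layer_norm(x, eps=1e-5):
    """LayerNorm without learned scale/shift: subtract the mean, divide by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-5):
    """RMSNorm without learned scale: divide by the root-mean-square, no mean subtraction."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector; no bias term."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU MLP without biases: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)
```

The visible difference between the two norms is that RMSNorm skips the mean subtraction, so it only rescales the vector.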

blackoil · on Jan 29, 2024

> maybe intelligence was not centered in the brain. It's a social process.

Is that controversial? We stand on the shoulders of the giants before us, and that is why we insist on training younglings for a couple of decades on past learnings before they are believed to be of any use. Even the smartest person won't survive long if dropped in 10000 BC.

rafaelero · on Jan 29, 2024

It's very much worth discussing architecture since, if Mamba or Based end up working as well as Transformers, a lot of current problems related to quadratic scaling are solved.
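
To make the quadratic-scaling point concrete, here is a toy sketch that only counts work as a function of sequence length. It ignores hidden dimension and constant factors, and the scalar recurrence is a stand-in for illustration, not Mamba's actual selective scan; all names and the `decay` parameter are hypothetical.

```python
def attention_score_count(seq_len):
    """Full self-attention scores every token against every token: seq_len**2 scores."""
    return seq_len * seq_len

def recurrent_update_count(seq_len):
    """A recurrent scan updates its state once per token: seq_len updates."""
    return seq_len

def linear_recurrence(xs, decay=0.9):
    """Toy scalar recurrence h_t = decay * h_{t-1} + x_t, computed in a single pass."""
    h, out = 0.0, []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out

# Doubling the sequence length quadruples the attention scores
# but only doubles the recurrent updates.
for n in (1024, 2048):
    print(n, attention_score_count(n), recurrent_update_count(n))
```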

jack_pp · on Jan 29, 2024

Of course it is in the brain; the brain created and evolved language as a very powerful tool. If intelligence were in the language, then other animals would be as intelligent as us.

visarga · on Jan 29, 2024

Other animals don't have our advanced language. In fact, it is the lack of language transmission that keeps them down. What I am arguing is that we're pretty limited individually; only together, and with plenty of time, do we get so smart.

LLMs learning from the same text and gaining human-like capabilities shows just how much of intelligence is crystallized in culture. If it works without brains, then brains were not the essential ingredient.

Humans without culture would need 10,000 years or more to recover, and have to pay the same price as the first time around. Culture is smarter than us.

klipt · on Jan 29, 2024

Scientific advancement requires both brains and knowledge transfer over generations.

"If I have seen further, it is by standing on the shoulders of giants."

mediaman · on Jan 29, 2024

Knowledge transfer over generations is a function of the brain.

Other species have much more limited ability to transfer knowledge intergenerationally, and that is because the human brain's capability for symbolic language is much more advanced than other animals', who are not able to encode knowledge nearly as efficiently.

klipt · on Jan 29, 2024

The point is it's a function of many connected brains, not just one brain.

jack_pp · on Jan 29, 2024

Sure, but that's still only possible for the human brain. Other species' brains aren't capable of encoding knowledge and using that to collaborate with other members.

shawn-butler · on Jan 29, 2024

What if it's all the same electron[0]?

[0]: https://www.nobelprize.org/prizes/physics/1965/feynman/lectu...

dartos · on Jan 29, 2024

Model architecture matters. RWKV takes significantly less energy than transformer models.

It’s better for the environment and much faster.

It’s not _only_ about performance

naasking · on Jan 29, 2024

> We're spending too much time debating models when we should be talking about language data

Models still matter a lot. There's arguably still an abilities gap between LLMs and general intelligence that can only be bridged by a new model.

fuu_dev · on Jan 29, 2024

Both are problems to solve, architecture is as much of a problem as data.

Data has the problem of getting successively tainted by LLMs as well as the lack of open high-quality datasets.

Architecture, meanwhile, has the problem of too much focus being placed on a flawed architecture: transformers.

IanCal · on Jan 29, 2024

I expect this is why OpenAI were focusing on partnering with people with high-quality data:

https://openai.com/blog/data-partnerships