5.5 min to train on a PDP/11 you mean to tell me we could have been doing this a...

rahen · 2026-03-28T21:35:57 1774733757

Yes. The Cray supercomputers of the 80s in particular were crazy good matmul machines. The quad-CPU Cray X-MP (1984) could sustain 800 MFLOPS to 1 GFLOPS and, with a 1 GB SSD, had enough compute and bandwidth to train a 7-10M-parameter language model in about six months and run inference at 18-25 tok/sec.
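The comment doesn't show its arithmetic, so here is a rough check. It assumes the common rule of thumb that training costs about 6·N·D FLOPs (N parameters, D tokens) and that generating one token costs about 2·N FLOPs. Both rules and the 30-day month are my assumptions, and the sketch ignores memory bandwidth entirely:

```python
# Back-of-envelope compute budget for the Cray X-MP claim.
# Assumptions (not from the comment): training ~ 6*N*D FLOPs,
# generation ~ 2*N FLOPs per token, 30-day months, memory traffic ignored.

SECONDS_PER_MONTH = 30 * 24 * 3600  # approximate 30-day month


def trainable_tokens(params: float, flops_per_sec: float, months: float) -> float:
    """Tokens D that fit in the budget if training costs ~6 * N * D FLOPs."""
    total_flops = flops_per_sec * months * SECONDS_PER_MONTH
    return total_flops / (6 * params)


def compute_bound_tok_per_sec(params: float, flops_per_sec: float) -> float:
    """Upper bound on generation speed if each token costs ~2 * N FLOPs."""
    return flops_per_sec / (2 * params)


if __name__ == "__main__":
    for params in (7e6, 10e6):
        for flops in (800e6, 1e9):  # the 800 MFLOPS to 1 GFLOPS range quoted above
            d = trainable_tokens(params, flops, months=6)
            tps = compute_bound_tok_per_sec(params, flops)
            print(f"N={params / 1e6:.0f}M, {flops / 1e6:.0f} MFLOPS: "
                  f"~{d / 1e6:.0f}M training tokens in 6 months, "
                  f"<= {tps:.0f} tok/sec")
```

Under these assumptions, six months buys roughly 200-370M training tokens. The compute-only ceiling for generation works out to about 40-70 tok/sec, so the quoted 18-25 tok/sec sits comfortably below it.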

A mid-90s Cray T3E could have handled GPT-2 124M, 24 years before OpenAI released it.

I also had a punch-card computer from 1965 learn XOR with backpropagation.
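The comment doesn't say what network the 1965 machine ran. To show how little XOR-with-backprop actually involves, here is a minimal plain-Python sketch. The 2-4-1 sigmoid architecture, squared-error loss, learning rate, epoch count and the best-of-five restart loop are all my assumptions, not the commenter's setup:

```python
import math
import random

# Hypothetical setup: 2-4-1 sigmoid net, full-batch gradient descent on squared error.
# Truth table for XOR: ((x0, x1), target)
XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_xor(hidden=4, lr=1.0, epochs=10000, seed=0):
    """Backprop on a 2-hidden-1 sigmoid net; returns (predict, per-epoch losses)."""
    rng = random.Random(seed)
    # Each hidden unit: [weight from x0, weight from x1, bias]
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    # Output unit: one weight per hidden unit, then the bias
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w1]
        y = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + w2[hidden])
        return h, y

    losses = []
    for _ in range(epochs):
        g1 = [[0.0] * 3 for _ in range(hidden)]
        g2 = [0.0] * (hidden + 1)
        loss = 0.0
        for x, t in XOR_DATA:
            h, y = forward(x)
            loss += 0.5 * (y - t) ** 2
            # Output error signal: d(loss)/d(pre-activation) for squared error + sigmoid
            d_out = (y - t) * y * (1.0 - y)
            g2[hidden] += d_out
            for j in range(hidden):
                g2[j] += d_out * h[j]
                # Propagate the error back through w2[j] and the hidden sigmoid
                d_h = d_out * w2[j] * h[j] * (1.0 - h[j])
                g1[j][0] += d_h * x[0]
                g1[j][1] += d_h * x[1]
                g1[j][2] += d_h
        losses.append(loss)
        # Gradient-descent step using gradients summed over all four examples
        for j in range(hidden):
            for k in range(3):
                w1[j][k] -= lr * g1[j][k]
        for j in range(hidden + 1):
            w2[j] -= lr * g2[j]

    return (lambda x: forward(x)[1]), losses


def train_with_restarts(seeds=range(5)):
    # Tiny XOR nets occasionally stall in a local minimum, so keep the best of a few runs
    runs = [train_xor(seed=s) for s in seeds]
    return min(runs, key=lambda run: run[1][-1])


if __name__ == "__main__":
    predict, losses = train_with_restarts()
    for x, t in XOR_DATA:
        print(x, "->", round(predict(x), 3), "target", t)
```

Each training step is only a few dozen multiply-adds and sigmoid evaluations per example.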

The hardware was never the bottleneck, the ideas were.

lucasfin000 · 2026-03-29T14:24:06 1774794246

Post-quantum crypto is a good example of this. Lattice-based schemes were proposed in the 90s, but they took decades to actually reach production. The math existed and the hardware existed, but the ideas for making it work were just not there yet.

CamperBob2 · 2026-03-29T04:13:53 1774757633

The hardware was never the bottleneck, the ideas were.

For sure. Minsky and Papert really set us back.

Onavo · 2026-03-29T06:26:57 1774765617

They should have lived to see the results of the bitter lesson.

CamperBob2 · 2026-03-29T15:50:15 1774799415

Minsky came close (d. 2016) -- although he may have had other interests later in life, if the Epstein file dumps are to be believed.