
5.5 min to train on a PDP-11? You mean to tell me we could have been doing this all along???


Yes. The Cray supercomputers of the 80s were crazy good matmul machines. The quad-CPU Cray X-MP (1984) could sustain 800 MFLOPS to 1 GFLOPS, and with a 1 GB SSD it had enough compute and bandwidth to train a 7-10M-parameter language model in about six months and run inference at 18-25 tok/sec.
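A rough back-of-envelope check of those numbers, as a minimal Python sketch. It assumes the common rule of thumb of about 6·N·D FLOPs to train an N-parameter model on D tokens and about 2·N FLOPs per generated token, and it assumes the machine actually sustains its quoted rate with no memory or I/O stalls. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def training_tokens(flops_per_sec: float, days: float, params: float) -> float:
    """Tokens trainable in the given wall-clock time.

    Assumes the ~6 * N * D training-FLOPs rule of thumb and 100% sustained throughput.
    """
    total_flops = flops_per_sec * days * SECONDS_PER_DAY
    return total_flops / (6.0 * params)


def inference_tokens_per_sec(flops_per_sec: float, params: float) -> float:
    """Idealized ceiling on generation speed, assuming ~2 * N FLOPs per token."""
    return flops_per_sec / (2.0 * params)


if __name__ == "__main__":
    rate = 800e6   # low end of the quoted X-MP throughput, in FLOP/s
    n = 10e6       # upper end of the quoted model size, in parameters
    print(f"~{training_tokens(rate, 182, n):.2e} tokens in ~6 months")
    print(f"<= {inference_tokens_per_sec(rate, n):.0f} tok/s inference ceiling")
```

At 800 MFLOPS for ~182 days with N = 10M this works out to roughly 2.1 × 10^8 training tokens and an inference ceiling of 40 tok/s, so the quoted 18-25 tok/sec sits below the idealized bound.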

A mid-90s Cray T3E could have handled GPT-2 124M, roughly 24 years before OpenAI released it.

I also had a punch-card computer from 1965 learn XOR with backpropagation.
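For anyone curious what that exercise involves, here's a minimal sketch in plain Python (no NumPy): a tiny 2-8-1 sigmoid network trained on XOR with online backpropagation on squared error. The layer size, learning rate, epoch count, and seed are my own assumptions, not what ran on the 1965 machine, and an unlucky random init can occasionally stall in a local minimum.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# The four XOR cases: ((x1, x2), target)
XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def train_xor(hidden: int = 8, lr: float = 0.5, epochs: int = 20000, seed: int = 0):
    """Train a 2-hidden-1 sigmoid net on XOR with per-example gradient descent."""
    rng = random.Random(seed)
    w_ih = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]  # input -> hidden
    b_h = [rng.uniform(-1, 1) for _ in range(hidden)]
    w_ho = [rng.uniform(-1, 1) for _ in range(hidden)]                      # hidden -> output
    b_o = rng.uniform(-1, 1)

    for _ in range(epochs):
        for x, target in XOR_DATA:
            # Forward pass
            h = [sigmoid(w_ih[j][0] * x[0] + w_ih[j][1] * x[1] + b_h[j]) for j in range(hidden)]
            y = sigmoid(sum(w_ho[j] * h[j] for j in range(hidden)) + b_o)

            # Backward pass for loss 0.5 * (y - target)^2; deltas use pre-update weights
            delta_o = (y - target) * y * (1.0 - y)
            delta_h = [delta_o * w_ho[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]

            # Gradient-descent weight updates
            for j in range(hidden):
                w_ho[j] -= lr * delta_o * h[j]
                w_ih[j][0] -= lr * delta_h[j] * x[0]
                w_ih[j][1] -= lr * delta_h[j] * x[1]
                b_h[j] -= lr * delta_h[j]
            b_o -= lr * delta_o
    return w_ih, b_h, w_ho, b_o


def predict(params, x) -> float:
    w_ih, b_h, w_ho, b_o = params
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(w_ih, b_h)]
    return sigmoid(sum(w * hj for w, hj in zip(w_ho, h)) + b_o)


if __name__ == "__main__":
    params = train_xor()
    for x, t in XOR_DATA:
        print(x, t, round(predict(params, x), 3))
```

The whole thing is a handful of multiply-adds per weight per example, which is why it fits on very old hardware; the only "idea" needed is the chain rule in the backward pass.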

The hardware was never the bottleneck; the ideas were.


Post-quantum crypto is a good example of this. Lattice-based schemes were theorized in the 90s, but they took decades to reach production. The math existed and the hardware existed, but the ideas for making them practical weren't there yet.


The hardware was never the bottleneck; the ideas were.

For sure. Minsky and Papert really set us back.


They should have lived to see the results of the bitter lesson.


Minsky came close (d. 2016), although he may have had other interests later in life, if the Epstein file dumps are to be believed.



