This is an interesting thought. GPT-3 used 45TB of raw CommonCrawl data (which was filtered down to 570GB prior to training). The Internet Archive has 48PB of raw data.
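A quick back-of-envelope sketch of that comparison: if the Internet Archive's 48 PB were filtered at the same keep rate GPT-3 applied to Common Crawl (45 TB raw down to 570 GB), only a fraction of a petabyte would survive. Decimal units (1 TB = 1000 GB) are assumed; the real keep rate for IA data would differ since so much of it isn't text.

```python
# Back-of-envelope: apply GPT-3's Common Crawl filter ratio
# (45 TB raw -> 570 GB kept) to the Internet Archive's 48 PB.
# Assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB).

raw_gpt3_gb = 45 * 1000                   # 45 TB of raw Common Crawl
kept_gpt3_gb = 570                        # 570 GB after filtering
keep_ratio = kept_gpt3_gb / raw_gpt3_gb   # fraction that survives filtering

ia_raw_gb = 48 * 1000 * 1000              # 48 PB
ia_kept_gb = ia_raw_gb * keep_ratio

print(f"keep ratio: {keep_ratio:.4%}")                    # ~1.27%
print(f"filtered IA corpus: {ia_kept_gb / 1000:.0f} TB")  # ~608 TB
```

So even the Archive's entire 48 PB, filtered at GPT-3's rate, yields on the order of hundreds of terabytes, not petabytes, of training data.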


That 48PB is mostly just old video game ROMs and ISOs, though.



