Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This paper was written by a very small team at Google. It strikes me as similar in that regard to the original transformers paper. If this technique scales well, Google is no doubt already exploiting it for their next generation models -- and I think there are signs that Gemini 2.0 models already exploit this.


Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: