
Do you mean "all variants of the same stacked transformer architecture converge in performance"? Or do you know of tests against some other architecture, such as the diffusion-based LLMs?

