Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> the RoPE embeddings in Code Llama were designed for this.

The RoPE embeddings were not "designed" for that. The original RoPE was not designed with length extrapolation in mind. Subsequent tweaks to extrapolate RoPE (e.g. position interpolation) are post-hoc tweaks (with optional tuning) to an entirely vanilla RoPE implementation.



Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: