
> Add a dedicated "parking spot" like GPT-OSS does and eat the gradient flow tax on that

Not familiar with this topic, but intrigued. Is there anywhere I can read more about it?



I looked for it briefly; I think the best I found is this older discussion:

https://news.ycombinator.com/item?id=44834918


OpenAI have talked about it. The architecture needs to let the model handle the case where there's nothing worth attending to: softmax forces each head's attention weights to sum to 1 across the tokens, so the head has to put that weight somewhere even when no token is relevant.
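To make that concrete, here's a rough sketch of the general idea, not the actual GPT-OSS code (I haven't checked how they implement it). The `softmax_with_sink` helper and the `sink_logit` parameter are names I made up for illustration: you add one extra logit to the softmax denominator, and whatever weight it absorbs simply isn't applied to any token.

```python
import math

def softmax(scores):
    # Standard numerically stable softmax: outputs always sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_with_sink(scores, sink_logit=0.0):
    # Hypothetical "parking spot": append one extra logit, normalize over
    # tokens + sink, then drop the sink's share. The remaining token
    # weights can sum to less than 1. In a real model sink_logit would
    # presumably be a learned parameter; here it is a fixed assumption.
    weights = softmax(scores + [sink_logit])
    return weights[:-1]

# A head where every token scores poorly (nothing worth attending to).
scores = [-4.0, -5.0, -4.5]

plain = softmax(scores)            # forced to spread a full 1.0 over the tokens
sunk = softmax_with_sink(scores)   # most of the mass goes to the sink instead

print("plain total:", sum(plain))
print("with sink total:", sum(sunk))
```

With plain softmax the three irrelevant tokens still split the full weight between them; with the sink, their combined weight drops to a few percent and the rest is parked.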



