
"Secret language" is clickbait, but it seems like systematically exploring how it responds to gibberish might find something interesting?

Also, I'm wondering if there is some way these models could return a sensible error response rather than generating an answer for every input?



It acts like a reverse Rorschach test: hand the subject a nonsensical picture and demand a caption. If you set the task to generate something no matter what, you get something no matter what.

It is trivial to make it reject gibberish prompts: just use a generative model to estimate the probability of the input. Assigning probabilities to text is what language models do by definition.
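A minimal sketch of that scoring idea, using a tiny character-bigram model in place of a full language model. The corpus, threshold, and function names here are all illustrative assumptions, not anything from the thread; the point is just that an input with high per-character surprisal under the model can be flagged as gibberish.

```python
import math
from collections import defaultdict

def train_bigram(corpus):
    """Count character bigrams from a list of training strings."""
    counts = defaultdict(lambda: defaultdict(int))
    for text in corpus:
        for a, b in zip(text, text[1:]):
            counts[a][b] += 1
    return counts

def avg_neg_log_prob(counts, text, alphabet_size=27):
    """Average per-character negative log-probability under the bigram model."""
    total = 0.0
    for a, b in zip(text, text[1:]):
        row = counts[a]
        row_total = sum(row.values())
        # Laplace smoothing so unseen bigrams still get a nonzero probability.
        p = (row[b] + 1) / (row_total + alphabet_size)
        total += -math.log(p)
    return total / max(len(text) - 1, 1)

def looks_like_gibberish(counts, text, threshold=3.5):
    """Reject inputs whose average surprisal exceeds an (arbitrary) threshold."""
    return avg_neg_log_prob(counts, text) > threshold

# Toy training data; a real system would score with the LM itself.
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "language models assign probabilities to text",
    "it is trivial to score an input with a generative model",
]
model = train_bigram(corpus)
english_score = avg_neg_log_prob(model, "the model assigns text a probability")
gibberish_score = avg_neg_log_prob(model, "xqzv jkq wvvx zzqp")
```

In practice you would use the LM's own token log-likelihoods rather than a bigram table, but the rejection rule is the same: score the input, compare against a calibrated threshold.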



