Intriguing and very cunning attack! So obvious in hindsight! It makes me wonder ... | Hacker News


wood_spirit 15 hours ago | on: Doublespeak: In-Context Representation Hijacking

Intriguing and very cunning attack! So obvious in hindsight! It makes me wonder how DeepSeek avoids commenting politically on China. I have heard anecdotes that it will be writing out a long reply, then presumably generates some forbidden phrase, abandons the output, and replaces it all with an error message. So presumably the safeguard could be a separate, trivial, non-LLM-based post-filter, which would make it immune to the Doublespeak attack?
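For illustration only, here is a minimal sketch of the kind of non-LLM post-filter speculated about above: it watches the streamed text and, once a blocked phrase shows up, discards what was shown and swaps in an error message. Everything in it (`BLOCKED_PHRASES`, `ERROR_MESSAGE`, `filter_stream`, `render`) is a hypothetical name made up for this sketch; nothing here is known to match what DeepSeek or any other service actually runs.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical blocklist and error text, assumed for this sketch only.
BLOCKED_PHRASES = ("forbidden phrase",)
ERROR_MESSAGE = "Sorry, I can't help with that."


def filter_stream(
    chunks: Iterable[str],
    blocked: Tuple[str, ...] = BLOCKED_PHRASES,
) -> Iterator[Tuple[str, str]]:
    """Yield ("append", chunk) events until a blocked phrase appears in the
    accumulated text, then yield one ("replace", ERROR_MESSAGE) event and stop."""
    seen = ""
    for chunk in chunks:
        seen += chunk
        # Check the accumulated text so phrases split across chunks are caught.
        if any(phrase in seen.lower() for phrase in blocked):
            yield ("replace", ERROR_MESSAGE)
            return
        yield ("append", chunk)


def render(events: Iterable[Tuple[str, str]]) -> str:
    """Apply the events the way a chat UI might: append text, or wipe and replace it."""
    text = ""
    for kind, payload in events:
        text = payload if kind == "replace" else text + payload
    return text
```

With this sketch, `render(filter_stream(["A long ", "reply with a forbid", "den phrase in it"]))` ends up as the error message even though the first chunks were already shown, which matches the anecdote of a long reply being abandoned partway through. A filter like this only looks at the surface text of the output, not at anything inside the model.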

gunalx 15 hours ago

DeepSeek the model is not that censored; DeepSeek the service is. So presumably, like OpenAI and others, there is an additional model and filtering layer that detects misuse or sensitive topics and filters the output.
