Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Yeah, that's exactly the problem: it's string concatenation, like we used to do with SQL queries.

I called it "prompt injection" to name it after SQL injection - but with hindsight that was a bad choice of name, because SQL injection has an easy fix (escaping text correctly / parameterizing your queries) but that same solution doesn't actually work with prompt injection.

Quite a few LLMs offer a concept of a "system prompt", which looks a bit like your pseudocode there. The OpenAI ones have that, and Anthropic just announced the same feature for their Claude 2.1 model.

The problem is the system prompt is still concatenated together with the rest of the input. It might have special reserved token delimiters to help the model identify which bit is system prompt and which bit isn't, and the models have been trained to pay more attention to instructions in the system prompt, but it's not infallible: you can still put instructions in the regular prompt that outweight the system prompt, if you try hard enough.



Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: