
> There's not much you can do about it, as sibling comment mentions it's a known gap. There is some work [0] in this space on the investigative side to trace the leak's source, but again the only way it would work is if you can obtain a leaked copy post hoc (leaked to press, discovered through some other means, etc.).

Those kinds of watermarks seem like they'd fail against a sophisticated actor. For instance, if that echomark-type of watermark becomes widespread, I suppose groups like the New York Times would update their procedures to not publish leaked documents verbatim, or develop technology to scramble the watermark (e.g. subtly reposition things again and fix kerning issues).

With generative AI, the value of a photograph or document as proof is probably going to go down, so it probably won't be that big of an issue.




> I suppose groups like the New York Times would update their procedures to not publish leaked documents verbatim or develop technology to scramble the watermark

Like knuckleheads, The Intercept provided the NSA a copy of a scanned document they received from a whistleblower, which directly led to Reality Winner's identity being discovered.


You could do really sneaky things like alter the space between words or other formatting tricks.
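To make the word-spacing trick concrete, here is a minimal sketch, assuming plain text and a made-up scheme where one space means 0 and two spaces mean 1. The names `embed_id` and `extract_id` are hypothetical, and a real system would presumably use far less visible tweaks (kerning, line breaks) than doubled spaces.

```python
import re


def embed_id(text: str, recipient_id: int, bits: int = 8) -> str:
    """Hide recipient_id in the first `bits` word gaps: ' ' = 0, '  ' = 1."""
    words = text.split()
    if len(words) - 1 < bits:
        raise ValueError("text has too few word gaps to carry the ID")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        # Most significant bit goes into the first gap.
        bit = (recipient_id >> (bits - 1 - i)) & 1 if i < bits else 0
        out.append(" " * (1 + bit) + word)
    return "".join(out)


def extract_id(marked: str, bits: int = 8) -> int:
    """Read the ID back from the first `bits` gaps of a leaked copy."""
    value = 0
    for gap in re.findall(r" +", marked)[:bits]:
        value = (value << 1) | (1 if len(gap) > 1 else 0)
    return value


memo = "The quarterly review has been moved to the main conference room"
leaked = embed_id(memo, recipient_id=178)
print(extract_id(leaked))
```

Anything that re-flows the text, such as retyping it, erases the mark, since the ID lives only in the gaps.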

Print it out, scan it back in, and OCR that.

Then have an AI or intern paraphrase it.
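For a document that is already plain text, a rough software stand-in for the print/scan/OCR step might look like the sketch below. The helper name `scrub` is hypothetical, and it only removes text-level marks (odd spacing, zero-width characters, compatibility variants of characters). It does nothing about layout tricks in a PDF or image, and nothing about wording, which is why the paraphrase step still matters.

```python
import re
import unicodedata

# Invisible characters sometimes used to mark text; deleted outright.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def scrub(text: str) -> str:
    """Fold compatibility characters, drop zero-width ones, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)  # e.g. no-break space -> space
    text = text.translate(ZERO_WIDTH)
    return re.sub(r"\s+", " ", text).strip()


marked = "Top\u200b  secret\u00a0 plan\u2009today"
print(scrub(marked))
```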


I think that's exactly what will happen.

When a competent journalist gets a leaked document, they'll learn to only summarize it, and not quote it verbatim or duplicate it. That'll circumvent any kind of passive leak-detection system that could reveal their source.

Then the only thing that would reveal the source is if the authority starts telling suspected leakers entirely different things, to see what gets out.


> Then the only thing that would reveal the source is if the authority starts telling suspected leakers entirely different things, to see what gets out.

This is called a canary trap [0], a well-trodden technique in the real world and fiction alike.

0: https://en.wikipedia.org/wiki/Canary_trap
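A toy sketch of the idea, assuming a single memo template and three made-up recipients. Every name and detail here is invented for illustration, and the matching is a naive substring check.

```python
TEMPLATE = "The review will finish on {day} and cover {count} sites."

# Hypothetical recipients, each given a slightly different set of details.
VARIANTS = {
    "alice": {"day": "Tuesday", "count": "14"},
    "bob": {"day": "Wednesday", "count": "15"},
    "carol": {"day": "Thursday", "count": "16"},
}


def make_copies(template: str, variants: dict) -> dict:
    """Produce one copy of the memo per recipient."""
    return {name: template.format(**fields) for name, fields in variants.items()}


def likely_sources(published: str, variants: dict) -> list:
    """Return the recipients whose planted details appear most often in the story."""
    text = published.lower()
    scores = {
        name: sum(value.lower() in text for value in fields.values())
        for name, fields in variants.items()
    }
    best = max(scores.values())
    return sorted(name for name, score in scores.items() if score == best and score > 0)


print(likely_sources("Sources say the review ends Wednesday.", VARIANTS))
print(likely_sources("A review is under way.", VARIANTS))
```

The second call returns an empty list: a story that repeats none of the planted details identifies nobody.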


Then you fix that loophole by subtly altering the phrasing or formatting of what you send everyone.

That's why I said you paraphrase, rather than using the exact phrasing and formatting of the original doc.

Include slightly different details in each version. Then if the paraphrase mentions one of them, you've identified the source.

Yes, I'm aware of that approach.

It's likely tougher than it seems; the big important bits that the news will care about have to match up when checked, and anyone with high-level access to this stuff likely has a sizable staff who also have access to it. Paraphrasing reduces the chance of some minute detail tweak being included in the reporting at all.

You also have to actively expect and plan to do it in advance, which takes a lot of labor and time, and risks people comparing notes and saying "what the fuck, we're being tested". You can't canary trap after the leak.



