I wonder if you could use lower-quality models (or some other non-LLM process) to inject more "noise" into the text between stages. Of course, that wouldn't help preserve the uniqueness of the original source text; it would just add more variation in between.
I’m not convinced that removing RLHF would really make the probability generator give us distributions that can diverge from the mean while remaining useful.
In other words, this might not be a problem that can be overcome with LLMs alone.