I’m no LLM expert, but I think there’s a distinction to be made between an LLM’s training data and the output it gives to the user. What you’re suggesting is that it’s difficult to remove something from the training data, which may be the case. But that doesn’t mean the user will necessarily be able to access it.
My guess is that this is much easier to attack from the user end.
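Something like the following is what I have in mind by handling it at the user end. It's just a rough sketch, and every name in it (generate, BLOCKED_PATTERNS) is made up for illustration, not any real API:

    import re

    # Patterns covering content that was requested to be "removed" (hypothetical examples)
    BLOCKED_PATTERNS = [r"\bjohn doe\b", r"\b555-0123\b"]

    def generate(prompt: str) -> str:
        # Stand-in for an actual LLM call
        return "Model output mentioning John Doe and 555-0123."

    def filtered_generate(prompt: str) -> str:
        """Return the model's output only if it doesn't mention blocked content."""
        output = generate(prompt)
        for pattern in BLOCKED_PATTERNS:
            if re.search(pattern, output, flags=re.IGNORECASE):
                return "Sorry, I can't help with that."
        return output

    print(filtered_generate("Tell me about John Doe."))

Crude, obviously, but it doesn't require touching the model at all.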
Removing something from the dataset requires full retraining from scratch (~$100 million for the base, unaligned GPT-4). You can't, like, edit the database file and keep the AI; the database file _is_ the AI.