It's the usage, not the training, that needs to be policed. The answer there is going to be that Google, OpenAI, or whoever will make bank by creating a fine-tuned model that can detect copyright infringement and offering companies access to it, so they can double-check gen AI outputs for exact or "similar enough" infringements.
https://www.eff.org/deeplinks/2023/04/how-we-think-about-cop...