By that logic I can torrent movies and distribute them all I'd like as long as I call it "Generative Watching" or something like that.
And OpenAI quite literally sells access to their models, and if those models are pushing out verbatim copyrighted works as has been alleged by the NYT, then they are by definition reselling copyrighted works without permission.
> And OpenAI quite literally sells access to their models, and if those models are pushing out verbatim copyrighted works as has been alleged by the NYT, then they are by definition reselling copyrighted works without permission.
This style of argument was made before, during the heyday of piracy, about things like torrenting ("why would you need <x> except for illegal purposes!").
In my opinion, it's the exact same argument: that selling a tool means taking responsibility for how its new owner uses it. You can use a shovel to create something new (plant a tree) or to destroy something (rip up your neighbor's garden).
The problem isn't the tool, the problem is how the end user uses it. These models aren't living, thinking entities that induce infringement or, on their own, infringe copyright or do other illegal things.
They aren't encouraging people to misuse them, and the choice to use them in a way that would cause infringement if the result is used commercially rests solely on the user's shoulders.
> They aren't encouraging people to misuse them, and the choice to use them in a way that would cause infringement if the result is used commercially rests solely on the user's shoulders.
I agree in principle, but the issue, methinks, is that they can do this in the first place, that it can happen accidentally, and, more importantly, that it happens at such a massive scale.
And no one's talking about abolishing the AIs here; we're just talking about wanting M$/OAI to do their due diligence and get access to their training materials fairly. NYT wouldn't have sued if M$/OAI had approached them and struck some sort of deal, but that's not what they did. They took in whatever data they could, from wherever they could, paying no mind at all to where the data came from or what was being done with it.
There's a reason Getty Images managed to strike a deal with Dall-E, and why many of the image generation models now rely solely on data that is verifiably free of copyright (or where deals have been made, as in the case of Getty Images). It's easier to spot a blatant copy in pictures (watermarks, for instance), so it's obvious why Dall-E was the first to encounter this hurdle, but this was inevitable even for the plain text that ChatGPT returns.
You won't get what you want with those sorts of deals.
OK, say every artist gets $100, one time (the exact amount would vary, but it would not be much). Everything's now properly licensed according to you, yet the artists are essentially no better off; meanwhile the models are good enough to create new training data for the future, and the artists never see any more money.
Training AI on AI-generated data doesn't add anything. The AI already has all the weights needed to generate the image, so at best you are just reinforcing the existing weights by weighting them more heavily than others.
The closest thing you could do is, e.g., have a second model do something novel, like creating a 3D model from a 2D image; you then try to animate that model, and a third model verifies the quality of the output. This lets you selectively reinforce the 2D model using information from the 3D model, but that isn't simply generating more training data.
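Roughly, the loop I'm describing looks like this. A minimal sketch; all three "models" here are hypothetical stubs I made up, not real APIs, and only the structure of the loop matters:

```python
# Minimal sketch of the verify-and-reinforce loop described above.
# generate_2d, lift_to_3d, and score_quality are hypothetical stand-ins
# for actual networks.
import random

def generate_2d(prompt: str) -> str:
    """Hypothetical 2D image generator (stub)."""
    return f"image({prompt}, seed={random.randint(0, 999)})"

def lift_to_3d(image: str) -> str:
    """Hypothetical second model: builds a 3D model from the 2D image (stub)."""
    return f"mesh[{image}]"

def score_quality(mesh: str) -> float:
    """Hypothetical third model: rates how well the mesh holds up when animated (stub)."""
    return random.random()

def reinforcement_batch(prompts, threshold=0.8):
    """Keep only (prompt, image) pairs whose 3D lift passes the verifier.
    The survivors carry information the 2D model alone didn't have,
    which is what makes this more than just generating more data."""
    batch = []
    for prompt in prompts:
        image = generate_2d(prompt)
        if score_quality(lift_to_3d(image)) >= threshold:
            batch.append((prompt, image))
    return batch  # fine-tune the 2D model on this filtered batch

print(reinforcement_batch(["a chair", "a teapot"] * 25)[:3])
```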
I honestly can't follow your argument. Doing something silly doesn't make you the underdog.
My point is: say every artist gets some small token payment once, and then what? That's not enough to live on, so we're right back to square one and we've solved nothing.
Incidentally yes, training AI on AI output will work fine, as long as you have a signal of quality. For example, upvotes in a subreddit would work fine. But that's not crucial to my point, which is that what OP is asking for will accomplish exactly nothing.
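Something like this, as a toy sketch. The corpus of (text, upvotes) pairs below is made up; the point is just that the votes are an external signal the model didn't generate itself:

```python
# Toy sketch: filter model-generated posts by an external quality signal
# (subreddit upvotes) before using them as training data.

def select_training_examples(posts, min_upvotes=50):
    """Keep generated posts that humans independently rated well.
    The upvotes inject information the model didn't already have."""
    return [text for text, upvotes in posts if upvotes >= min_upvotes]

corpus = [
    ("a genuinely funny generated joke", 812),
    ("incoherent generated rambling", 3),
    ("a decent generated explanation", 97),
]
print(select_training_examples(corpus))
# ['a genuinely funny generated joke', 'a decent generated explanation']
```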
I'm not an expert in the field, but is feeding the model its own output a good idea? It seems like it would only reinforce patterns already present in the training data, making it harder and harder to break out of them, and in the long run you'd end up with generic output that just converges on itself.
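For what it's worth, you can see this effect even in the crudest possible case: a Gaussian "model" repeatedly refit to a finite sample of its own output loses diversity generation after generation.

```python
# Toy illustration of the collapse worry: the simplest possible "model",
# a Gaussian, repeatedly refit to a finite sample of its own output.
# With n samples per generation, the fitted (MLE) variance shrinks in
# expectation by a factor of (n - 1) / n each step, so diversity decays.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0   # the "model" starts as a standard normal
n = 50                 # samples drawn per generation

for generation in range(200):
    samples = rng.normal(mu, sigma, size=n)    # generate "model output"
    mu, sigma = samples.mean(), samples.std()  # "retrain" on that output

print(f"variance after 200 generations: {sigma**2:.4f}")  # typically << 1.0
```

Real generators and real training pipelines are far more complicated, of course, but the basic pressure toward narrower output is the same.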
Regardless, I'm not saying it's a perfect idea, but it's definitely a start, especially when the current reality is that they're just stealing all the artists' shit and everyone gets $0 instead of $100. As you said, artists are no better off in that universe, but the worst case possible for them is what's happening right this very moment, where they just get fucked over with zero compensation.
I think you misunderstand something here. Torrenting movies and generative AI don't really have anything in common; I'm not sure why you bring that up.
If you sold the output of a true random number generator, eventually you'd also, by that definition, be reselling copyrighted works without permission. The courts wouldn't mindlessly say "no more random numbers," and I doubt they'll say the equivalent for GenAI, especially given the recent decisions headed that way.