1. Equality under the law is important in its own right. Even if a law is wrong, it isn’t right to allow particular corporations to flout it in a way that individuals would go to prison for.
2. GPL does not allow you to take the code, compress it in your latent space, and then sell that to consumers without open sourcing your code.
Sure, that's what the paper says. Most people don't care what it says until there are actual ramifications, e.g. a cease and desist letter. Maybe people should care, but companies were stealing IP from individuals long before the GPL, and they still do.
Whether AI training in general is fair use and whether an AI that spits out a verbatim copy of something from the training data has produced an infringing copy are two different questions.
If there is some copyrighted art in the background in a scene from a movie, maybe that's fair use. If you take a high resolution copy of the movie, extract only the art from the background and want to start distributing that on its own, what do you expect then?
Training seems fine. If I learn how to write something by looking at example code and then write my own program, that's widely accepted to be fair use of the code. The same goes if I learn things from reading encyclopedias and then write an essay.
However, if I memorise that code and write it down verbatim, that's not fair use. If I copy the encyclopedia, that's infringement.
The problem then becomes "how trivial can a line be and still be protected by copyright?"
Fair use is a case-by-case question of fact that depends on many factors. Trial judges often get creative in how they apply them. The courts are not likely to take a categorical approach like that, despite what some professors have written.
> 1. Equality under the law is important in its own right. Even if a law is wrong, it isn’t right to allow particular corporations to flout it in a way that individuals would go to prison for.
We're talking about the users getting copyright-laundered code here. That's a pretty equal playing field. It's about the output of the AI, not the AI itself, and there are many models to choose from.
What does "usable" mean? Today's best open source or open weight model is how many months behind the curve of closed models? Was every LLM unusable for coding at that point in time?
> 2. GPL does not allow you to take the code, compress it in your latent space, and then sell that to consumers without open sourcing your code.