Copyright should be the problem of the person using the works and not the problem of the AI generating it.
Unless Nintendo plans on busting down the doors of every person who tries to draw Mario or preventing little Timmy from making a parody of Coca-Cola, making it so AI cannot generate copyrighted works is insane imo.
Those brands should be proud to be such a big part of the cultural fabric that it is difficult to get away from their branding. Plus it's not infringement, to my knowledge, until you use it for commercial purposes, so as long as no one is creating Lario and Muigi to sell or otherwise use in business, it's no different than drawing it yourself.
If the AI is completely unable to generate non-infringing works even if you are _trying_ to get away from it (which the author very much doesn't seem to be; they are purposefully making and showing prompts that infringe), that's the problem of the AI creator then.
When I cover generative AI in my Ethics in AI lecture, one of the few soapbox opinions I give is that GenAI is doing essentially what people do - copy others. Picasso has a quote, "Good artists copy, great artists steal", which doesn't mean you should try to pass Lario and Muigi off as your own, but rather that great artists are able to take aspects from other works (also called 'inspiration') without being caught. My personality is a combination of elements taken from Jim Carrey, Robin Williams, and King of the Deathmatch Mick Foley. I like making vector graphics based on pictures. I have a folder on my computer called "Website Ideas" that's just screenshots of UIs that I've come across that I really like.
I also point to a YouTube video by Kirby Ferguson, "Everything is a Remix" [1], which talks about how so much of our collective culture stems from copying. It's a great video if you have an hour.
When Little Timmy crayons a copy of Mario, we congratulate him for his creativity. Is it unique, one-of-a-kind art? Well, Timmy made it, but he didn't think up the original idea of a video game plumber. I give this view to GenAI right now: it's not capable of achieving that "next step" in "original design", but it's performing like a novice artist/musician; it's mimicking what it sees.
Rounding up a transaction and taking the leftovers wouldn’t be a crime worthy of the FBI for one transaction but it would be for a million or a billion. Scale matters and impact matters.
If you’re making an ethical argument “it’s okay because it’s already happening to a lesser degree somewhere else” isn’t the flex you think it is.
If you’re talking ethics, talk about impact. Who does it help the most, and who does it hurt the most? Is your argument favoring equality of access or of outcome? Who is the most vulnerable in the situation, and how will it impact them?
I teach it; my background is in my profile, and my research focuses on CS education.
Scale and impact do matter, I wholeheartedly agree. However, I stand by my point that genAI is mirroring how humans learn - repetition of previously observed actions. As part of my dissertation, I argued that humans operate using 'templates', or previously established frameworks / systems. Even in higher cognitive tasks like problem solving, we rely on workflows that we were trained on previously. Soloway referred to problem solving as a mental set of "basic recurring plans" [1] and if you look at the old 1980s Usborne children's books, they required kids to retype code [2]. For creative tasks, depending on the actor's background, Method and Meisner both tell people to draw from previous experiences and observations to develop a character. This behavior is similar in many areas like music, dance, martial arts, cooking, language acquisition, etc.
I am not making an ethical argument that GenAI violating copyright is okay because that's what humans do. I'm arguing that GenAI mirrors how humans learn. We observe a behavior and attempt to recreate that behavior. The difference is that humans can extract a fraction of the behavior and utilize it as part of something larger, while GenAI cannot to the degree humans do. I'm sure GenAI would struggle to recreate "Who Framed Roger Rabbit?" because of the film's two polar-opposite visual styles (cartoon and real life).
In regards to your "If you’re talking ethics, talk about impact" section, it's a bit of a loaded question. One side of the conversation could state that GenAI is helping many people who do not have confidence in their creative ability to produce their ideas, while the other could state it's making it harder for artists.
Yes, it absolutely is hurting artists, and I fully support the recent writers' strike over AI concerns. But I do not believe that diminishes how the mathematical models used in GenAI mirror our own skill acquisition.
I took an AI in ethics course from a state backed school (Georgia Tech) and the answer to questions that weren’t “that’s illegal based on protected status” were “well, it depends.” Which, sure, that’s true, but maybe not helpful.
In my view it encouraged nihilism and apathy instead of developing ethical frameworks. From that lens, I feel teaching a course might be more limiting in the range of heuristics you’re willing to accept or endorse. Though happy to accept your personal experience.
A paper that comes to mind often from HCI is “do artifacts have politics” which looks at the impacts of technologies divorced from creator intent. I feel that’s similar here.
You’re not wrong about the mechanism by which it’s created. But I would argue that’s the least important part, ethically anyway.
Saying “strip mining with heavy industrial machines mimics laborers using shovels” is true to a degree, but perhaps not that important a piece of information.
I’m not saying you’re making that argument. I guess I’m just not totally sure of the outcome you were looking for in sharing your original comment. I hear your comparison, agree with it, and agree that it is interesting to view things through that lens. I wasn’t sure if there was a deeper intent in sharing it.
Apologies for the delayed response, but on the bright side it's faster than I respond to some emails XD. I should preface the course I was referring to was "Intro to AI", not "Ethics in AI". I only have a single lecture dedicated to ethics, but do try to pepper it in as we cover topics. My original comments were more addressing "how humans learn" rather than any higher level ethical concerns. Your last section on "deeper intent" is correct, there wasn't any.
I have a pretty neutral stance on GenAI, mostly due to personality, but it also stems from my background as well as recognizing students' interests. Prior to CS Education, my master's thesis involved computer vision for catching "high-valued targets", but was also funded to help minimize human trafficking. I have students in my classes that are very interested in going to work for defense companies like Lockheed and Raytheon, and I have others that are really interested in using AI for "social good" areas like healthcare and education. I try to have a neutral stance because: A) I hated the professors I had who would use their lecture time to express their political opinions, B) opinions that are opposite to a student's may otherwise discourage them from learning the material, and C) my primary focus is to make sure they learn the material and do it "right".
When I started teaching, I used the analogy that if they go on to write the software for the life support machine I'm hooked up to, it WORKS. If someone wants to go on to use AI to create weapons, I can't stop them any more than I can force them to read a chapter or convince the person beside me on the highway to slow down. I just work to ensure they do it correctly (which includes being mindful of the ethical ramifications of using algorithm X for task Y).
What would an ethical framework for designing AI for a drone even look like? I have no idea, nor is it something I'm interested in delving into. I got out of face recognition for those reasons. Does an ethical framework for GenAI require the same elements, a fraction of them, or a completely different set of guidelines? Who gets to decide them - the 'experts' in AI, the government, society as a whole?
Personally, I've made the comment that the current opinions on regulating AI are like "everyone trying to be AI's parent". We're never going to agree because everyone has a different opinion on the "right" way to handle AI. Plus, human cognition is so unknown and illogical that we may never figure out a way to perfectly replicate human intelligence. I instead try to stay somewhat optimistic and marvel at the math we've used to create "AI".
Do you really see no difference between someone drawing a piece of fan art and trillion dollar corporations stealing other people's works and reselling it for their own profit with no regards to anyone or anything else?
And yes, obviously society cares about many things depending on the scales in question. It's okay if a dude goes out on a lake in his small rowboat and catches a few fish for dinner; it's a completely different story if you're talking about a massive barge indiscriminately catching literally thousands of fish with huge nets. The latter has to adhere to much stricter rules than the former, and I think you'd be hard-pressed to find anyone who thinks these 2 situations should be treated equally (unless you're a commercial fisherman with a barge, I suppose; the quote "It is difficult to get a man to understand something when his salary depends on his not understanding it." comes to mind here).
> Do you really see no difference between someone drawing a piece of fan art
In the history of the world only a single person has ever drawn fan art?
No, I don't think that's the case.
Instead it is widespread. It is everywhere.
> depending on the scales in question
The scale argument supports me, not you.
This type of "infringement" is everywhere.
> reselling it for their own profit with no regards to anyone or anything else?
Even this is common. The online independent artist commissions market is full of people doing commercial fan art commissions.
Thinking about this even more, I am now wondering if "infringing" works might actually be a majority of the online/independent commissions market. Maybe.
> In the history of the world only a single person has ever drawn fan art?
That's a disingenuous take on my comment at best. The equivalent to my scenario is a bunch of unrelated individuals with small boats going out onto whatever lake is nearest to them and fishing. Even if you put all of them together and counted how many fish the hobby fishermen catch, it's still nowhere near the scale of the commercial fisheries, which is why they're treated differently both by society at large and legally.
Same thing with these AI models, Dall-E and all the other ones have probably generated more images than all of humanity has in its entire history so far, and if not quite yet they're definitely gonna get there sooner rather than later. They can generate dozens if not hundreds of images in a split second, whereas a single artist (or even many artists collectively) can't.
> And yet, nobody cares.
I think we've already established that, because scale absolutely matters for most things. If you want to be an absolutist about it, sure, be my guest, but I think in reality the large majority of people are fine when your average Joe Schmoe the artist makes a commission of a random Disney character, whereas they definitely would NOT be okay with a massive conglomerate like Disney stealing Joe Schmoe's original art and repurposing it without compensating Joe, because there's an inherent power imbalance between the two and the consequences of that power disparity matter.
I mean, Disney does have every right to go after Joe for his commissions if they really wanted to, similarly to how Nintendo is hyper aggressive with taking down anything relating to their IPs. It's just not really worth it for most companies, they will absolutely go for another company trying to pull the same shit though, as can be seen with the NYT case.
By that logic I can torrent movies and distribute them all I'd like as long as I call it "Generative Watching" or something like that.
And OpenAI quite literally sells access to their models, and if those models are pushing out verbatim copyrighted works as has been alleged by the NYT, then they are by definition reselling copyrighted works without permission.
> And OpenAI quite literally sells access to their models, and if those models are pushing out verbatim copyrighted works as has been alleged by the NYT, then they are by definition reselling copyrighted works without permission.
This style of argument has been previously made regarding things like torrenting during the heyday of piracy ("why would you need <x> except for illegal purposes!")
In my opinion, it's the exact same argument saying that selling a tool means taking responsibility for how that tool is used by its new owner. You can use a shovel to both create something new (plant a tree) or destroy something (rip up your neighbor's garden).
The problem isn't the tool; the problem is how the end user uses it. These models aren't living, thinking entities that on their own induce or commit copyright infringement / other illegal activities.
They aren't encouraging people to misuse them and it is solely on the user's shoulders for their choice to use them in a way that would cause infringement if the result is used commercially.
> They aren't encouraging people to misuse them and it is solely on the user's shoulders for their choice to use them in a way that would cause infringement if the result is used commercially.
I agree in principle, but the fact that they can in the first place, especially when it happens accidentally, and more importantly at such massive scales, is the issue methinks.
And no one's talking about abolishing the AIs here; we're just talking about wanting M$/OAI to do their due diligence and get access to their training materials fairly. NYT wouldn't have sued if M$/OAI had approached them and struck a deal of some sort with them, but that's not what they did. They took in whatever data they could, from wherever they could, paying no mind at all to where the data came from and what was being done with it.
There's a reason Getty images managed to strike a deal with Dall-E and why many of the image generation models now solely rely on data that is verifiably free of copyright (or where deals have been made in the case of Getty images). It's easier to see in pictures when a blatant copy is made (like watermarks) so it's obvious why Dall-E was the first to encounter this hurdle, but this was inevitable even for plain text that ChatGPT returns.
You won't get what you want with those sorts of deals.
OK, say every artist gets $100, one time (exact amount varies but would not be much). Everything's properly licensed according to you and the artists are essentially no better off, and the models are now good enough to create new training data for the future and artists never see any more money.
Training AI on AI-generated data doesn't add anything. The AI already has all the weights to generate the image, so at best you are just reinforcing the existing weights by weighting them more heavily than others.
The closest thing you could do is e.g. have a second model that does something novel like create a 3D model from a 2D image and then you try to animate the model and a third model verifies the quality of the output. This then allows you to selectively reinforce the 2D model using information from the 3D model but this isn't simply generating more training data.
I honestly can't follow your argument. Doing something silly doesn't make you the underdog.
My point is that say every artist gets some small token payment once, and then what? That's not enough to live on, so we're right back to square one and we've solved nothing.
Incidentally yes, training AI on AI output will work fine, as long as you have a signal of quality. For example, upvotes in a subreddit would work fine. But that's not crucial to my point, which is that what OP is asking for will accomplish exactly nothing.
I'm not an expert in the field, but is feeding the model its own output a good idea? Seems like it would only increase weights that are already present in the training data and make it harder and harder to break out of it, ending up with generic output that matches all of its other output in the long run.
Regardless, I'm not saying it's a perfect idea but it's definitely a start, especially when the current reality is that they're just stealing all the artist's shit and everyone gets $0 instead of $100. As you said, artists are no better off in that universe, but the worst case possible for them is what's happening right this very moment, where they just get fucked over with 0 compensation.
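The worry in the question above can be sketched with a toy simulation (purely illustrative, not a claim about any real model): if each "generation" trains only on samples of the previous generation's output, with no fresh data entering the loop, styles that stop being sampled can never come back, and rare ones tend to vanish first.

```python
import random

def next_generation(data, n):
    # train the next "model" purely on samples of the previous
    # generation's output: no fresh human-made data enters the loop
    return [random.choice(data) for _ in range(n)]

random.seed(0)
# toy "training set": one very common style plus ten rare ones
data = ["common"] * 90 + [f"rare_{i}" for i in range(10)]

diversity = [len(set(data))]
for _ in range(20):
    data = next_generation(data, len(data))
    diversity.append(len(set(data)))

# a style can never reappear once it stops being sampled,
# so the number of distinct styles only ever goes down
assert all(b <= a for a, b in zip(diversity, diversity[1:]))
```

A quality signal (like the upvotes mentioned above) changes which samples survive, but it doesn't by itself reintroduce styles that have already dropped out of the loop.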
I think you misunderstand something here. Torrenting movies and generative AI don't really have anything in common, I'm not sure why you bring that up.
If you sold the output of a true random number generator, eventually you'd also by definition be reselling copyrighted works without permission. The courts wouldn't mindlessly say "no more random numbers", and I doubt that they'll do the same for GenAI, especially given the recent decisions that are headed that way.
True, but still different in the same way as using machines for certain purposes is not the same as a human doing the same thing without a machine. Just because you can walk from A to B does not mean driving from A to B requires no driving license, for example (and the car needs to fulfill a lot of regulations).
Society may be "completely OK" with human artists taking inspiration from each other. It's a big old reach to assume we are "completely OK" with Microsoft and OpenAI doing the same thing with computer software as subscription service they sell.
The entire argument that “LLM must be allowed the right to learn like a human” hinges on LLM being enough like a human in relevant ways in the first place. An LLM is not enough like a human in relevant ways, however; it has no agency, will, freedom, conscience, self-determination; it is a tool.
If this tool “runs on” copyrighted creative works, and $CORP operates this tool for profit, then $CORP is the one to answer to the law, not the tool. (And if $CORP wants to claim that the tool is a sentient being, then presumably it would have to cease the abuse of said being and set it free.)
Yeah, but we don't typically congratulate users of GenAI for their creativity, and neither do we congratulate the code, nor do we think of the coders of GenAI as great artists.
I hope no self-respecting instructor in ethics could with a straight face teach how an LLM is like a human being when it comes to copyright while glossing over the blinding implication that if it truly were so we would then be subjecting that being to unthinkable abuse.
That hypocritical, self-contradictory take is transparently geared to benefit commercial LLM operators (at the expense of individuals who stand to suffer material harm and/or authored the very creative works thanks to which the tool even exists).
I would say it's dependent on the motive. For example, I would imagine most artists hope that their work inspires other artists, but only to a degree outside of direct copying. They might not equate the automation of their style via a model against the work/process of a human, regardless if that human is either inspired by their style or is just performing direct copying.
What’s the FPS of human eyesight? How long did Timmy spend looking at Mario, more generally other cartoons and even more generally human forms? Do the math and you’ll find he’s got a pretty big training set as well, maybe not quite the same size but nothing to sneeze at.
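As a back-of-envelope sketch (every number below is a rough assumption, not a measurement), Timmy's visual "training set" lands on the order of billions of frames:

```python
# back-of-envelope only: all of these numbers are rough assumptions
waking_hours_per_day = 14  # assumed waking time for a young child
effective_fps = 10         # assumed distinct visual "snapshots" per second
years = 6

seconds_awake = waking_hours_per_day * 3600 * 365 * years
frames_seen = seconds_awake * effective_fps

print(f"~{frames_seen:.1e} frames")  # prints ~1.1e+09 frames
```

Even with these crude assumptions, that's around 10^9 frames, which is in the same ballpark as large public image-text training sets (LAION-5B contains roughly five billion image-text pairs).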
Wasn't ChatGPT trained on the entirety of Wikipedia? And probably millions of pieces of scientific literature, and arts, and movies and games and and and...
Perhaps the hyperbole of the entire corpus of human knowledge isn't quite technically right, but it's close enough.
Question: that is a point that would protect GPT models in the abstract, but doesn't it fail to hold for OpenAI and Microsoft, which provide "image generation as a service"? The actual implementation is irrelevant; the service must not be able to provide images that infringe copyrights (just like a designer in an agency cannot use Mario for a print).
So using a model running on my laptop to generate a "Mario like" image would be fine, but it would make monetizing this difficult?
The problem is the AI companies monetizing the work of copyrighted materials.
It's not a problem for me to draw Mickey Mouse. It _is_ a problem when someone pays me to draw an animated mouse and I sell them a picture of Mickey Mouse.
For me, it's not really about the AI at all; it's a problem of undervaluing artists' contribution to these tools. And it's not even fully about copyright, it's about not asking for permission to use their content and then creating an entire business on top of that stolen content.
The AI generating it (Hosted on OpenAI-controlled servers in the case of ChatGPT and DALL-E) is the entity redistributing the work. The end user who asked for the infringing content isn't the entity that is infringing on the copyrights and trademarks.
I'm perfectly free to ask people on the street for a t-shirt with Mario on it, but as soon as someone who isn't Nintendo or licensed by Nintendo sells me that t-shirt, they're the ones infringing on the copyright and trademark. As the consumer I did nothing illegal, and a court would say that I was deceived by the infringing party.
Distribution (seeding, uploading) and facilitating copyright infringement is what gets you in trouble. When you ask DALL-E (a paid, commercial product) for a picture of Italian plumbers and it gives you an obvious picture of Mario 100% recognizable to the layperson as Mario and not a distinctly different image of a similar character, that's blatant trademark and/or copyright infringement on the part of OpenAI.
> If the AI is completely unable to generate non-infringing works even if you are _trying_ to get away from it (which the author very much doesn't seem they are, they are purposefully making and show prompts that infringe), that's the problem of the AI creator then.
I see some parallels to the Napster lawsuit. The fact that the users were the bad people asking for infringing content didn't give Napster the right to facilitate infringement. Napster was ordered to monitor its network and make sure that they were blocking non-legitimate uses. They couldn't logistically comply and went bankrupt.
Which begs the question: Does OpenAI even have the technological ability to block trademark and copyright infringing content generation? Even if they do, how useful will ChatGPT be if all phrases and imagery that closely resemble copyrighted works are blocked from output?
What's even worse for OpenAI compared to Napster is that it wasn't individual users uploading copyrighted content; it was OpenAI itself ingesting the data. Nobody twisted their arm to include copyrighted works in their models.
If I essentially encode knowledge of something then can recall and remix at will, am I redistributing the exact work or the knowledge of it?
Yes, it is capable of producing a close-to-exact replica, if not the exact same input image byte-for-byte, but I find it difficult to say OpenAI is willfully redistributing a copyrighted work wholesale like you would by torrenting a movie or right-click-saving an image from Google, where you are copying the intellectual property 1:1.
Opening this Pandora's box could have large implications for a lot of creative work and, taken to its end conclusion, could make artists unable to work at all: you cannot create any creative work that has a talking mouse if you have knowledge of Mickey Mouse existing, because you have been tainted (similar to clean-room re-creations, but now any sufficiently large copyrighted figure causes a deadlock condition for all derivative or even similar topics).
Is Ratatouille derivative of Mickey Mouse? Ehhh, well, they are both talking rodents. They both have cartoon faces. You can certainly draw parallels between them, but they aren't the same character. Is Mickey with a chef hat infringing on Ratatouille?
Trademark law, to my knowledge, asks whether someone would be tricked or misled into believing you are the other guys. I think that is applicable here: someone drawing a talking mouse isn't infringing as long as it cannot be mistaken for Mickey Mouse, which again would be the fault of the person inducing the creation and not the tool that allowed it to happen.
Where does "inspired by" / derived from the encoded knowledge turn into outright exploitation of copyrighted work? There's certainly _a_ line but I find it difficult to define it at it being encoded into knowledge of it existing.
This “close to exact” thing is actually the Achilles heel of this argument. The example images in this article are so close to exact that they are quite clearly infringement, trademark or copyright. We aren’t talking about Ratatouille mouse versus Mickey Mouse, we are talking about the source picture of Mario versus a slightly altered picture of Mario that every layperson would immediately recognize as Mario composed in the exact same manner as the source image.
Courts have already defined this line over decades of copyright and trademark cases, and the examples in this article definitely cross that line.
> which again would be the fault of the person inducing the creation and not the tool that allowed it to happen.
This is not really true in practice, we can see that in various legal cases against Napster or The Pirate Bay.
Is a man not entitled to the sweat of his brow? 'No!' says the man in Washington, 'It belongs to the poor.' 'No!' says the man in the Vatican, 'It belongs to God.' 'No!' says the man in Moscow, 'It belongs to everyone.'
Are you trying to say that no one is entitled to their own inventions? Cause that is a rapid descent into a capitalist hellhole where only those who can steal ideas the most effectively are able to profit.
> Are you trying to say that no one is entitled to their own inventions?
The subject of this thread is copyright, not patents. Though I do believe all intellectual property is bogus (including trademark, which repealing would limit the influence that would be required for the capitalist hellhole you mention), I feel the most strongly so about copyright, which has nothing to do with inventions.
Further extending the argument: I can potentially ask GenAI "Can you show me what Mario looks like?" since I have never seen him and GenAI is my go-to tool.