Or... things are about to get worse for copyright holders.
I don't see any developed country pressing the brakes on AGI in the near future to protect a few copyright holders from getting "stolen from" in hypothetical scenarios.
Copyright should be the problem of the person using the works and not the problem of the AI generating it.
Unless Nintendo plans on busting down the doors of every person who tries to draw Mario, or on preventing little Timmy from making a parody of Coca-Cola, making it so AI cannot generate copyrighted works is insane imo.
Those brands should be proud to be such a big part of the cultural fabric that it is difficult to get away from their branding. Plus, to my knowledge it's not infringement until you use it for commercial purposes, so as long as no one is creating Lario and Muigi to sell or otherwise use in business, it's no different than drawing it yourself.
If the AI is completely unable to generate non-infringing works even when you are _trying_ to get away from it (which the author very much doesn't seem to be doing; they are purposefully crafting and showing prompts that infringe), then that's the AI creator's problem.
When I cover generative AI in my Ethics in AI lecture, one of the few soapbox opinions I give is that GenAI is doing essentially what people do: copy others. Picasso has a quote, "good artists copy, great artists steal", which doesn't mean you should try to pass Lario and Muigi off as your own, but rather that great artists are able to take aspects from other works (also called 'inspiration') without being caught. My personality is a combination of elements taken from Jim Carrey, Robin Williams, and King of the Deathmatch Mick Foley. I like making vector graphics based on pictures. I have a folder on my computer called "Website Ideas" that's just screenshots of UIs I've come across that I really like.
I also point to a YouTube video by Kirby Ferguson "Everything is a Remix" [1] which talks about how so much of our collective culture stems from copy. It's a great video if you have an hour.
When Little Timmy crayons a copy of Mario, we congratulate him for his creativity. Is it unique, one-of-a-kind art? Well, Timmy made it, but he didn't think up the original idea of a video game plumber. I extend this view to GenAI right now: it's not capable of achieving that "next step" of "original design", but it's performing like a novice artist/musician; it's mimicking what it sees.
Rounding up a transaction and taking the leftovers wouldn’t be a crime worthy of the FBI for one transaction but it would be for a million or a billion. Scale matters and impact matters.
If you’re making an ethical argument “it’s okay because it’s already happening to a lesser degree somewhere else” isn’t the flex you think it is.
If you're talking ethics, talk about impact. Who does it help the most, and who does it hurt the most? Is your argument favoring equality of access or of outcome? Who is the most vulnerable in the situation, and how will it impact them?
I teach it; my background is in my profile, and my research focuses on CS education.
Scale and impact do matter, I wholeheartedly agree. However, I stand by my point that GenAI mirrors how humans learn: repetition of previously observed actions. As part of my dissertation, I argued that humans operate using 'templates', or previously established frameworks / systems. Even in higher cognitive tasks like problem solving, we rely on workflows that we were trained on previously. Soloway referred to problem solving as a mental set of "basic recurring plans" [1], and if you look at the old 1980s Usborne children's books, they required kids to retype code [2]. For creative tasks, Method acting and the Meisner technique both tell actors, depending on their background, to draw from previous experiences and observations to develop a character. This behavior is similar in many areas like music, dance, martial arts, cooking, language acquisition, etc.
I am not making an ethical argument that GenAI violating copyright is okay because that's what humans do. I'm arguing that GenAI mirrors how humans learn. We observe a behavior and attempt to recreate that behavior. The difference is that humans can extract a fraction of the behavior and utilize it as part of something larger, while GenAI cannot to the degree humans do. I'm sure GenAI would struggle to recreate "Who Framed Roger Rabbit?" because of the film's two polar-opposite visual styles (cartoon and live action).
In regards to your "If you're talking ethics, talk about impact" section, it's a bit of a loaded question. One side of the conversation could state that GenAI is helping many people who don't have confidence in their creative ability to produce their ideas, while the other could state it's making it harder for artists.
Yes, it absolutely is hurting artists, and I fully support the recent writers' strike over AI concerns. But I do not believe that diminishes how the mathematical models used in GenAI mirror our own skill acquisition.
I took an ethics in AI course from a state-backed school (Georgia Tech), and the answer to questions that weren't "that's illegal based on protected status" was "well, it depends." Which, sure, that's true, but maybe not helpful.
In my view it encouraged nihilism and apathy instead of developing ethical frameworks. From that lens, I feel teaching a course might be more limiting in the range of heuristics you’re willing to accept or endorse. Though happy to accept your personal experience.
A paper that comes to mind often from HCI is "Do Artifacts Have Politics?", which looks at the impacts of technologies divorced from creator intent. I feel that's similar here.
You're not wrong about the mechanism by which it's created. But I would argue that's the least important part, ethically anyway.
Saying "strip mining with heavy industrial machines mimics laborers using shovels" is true to a degree, but perhaps not the most important piece of information.
I'm not saying you're making that argument. I guess I'm just not totally sure what outcome you were looking for in sharing your original comment. I hear your comparison, agree with it, and find it interesting to view through that lens. I just wasn't sure if there was a deeper intent in sharing it.
Apologies for the delayed response, but on the bright side it's faster than I respond to some emails XD. I should preface the course I was referring to was "Intro to AI", not "Ethics in AI". I only have a single lecture dedicated to ethics, but do try to pepper it in as we cover topics. My original comments were more addressing "how humans learn" rather than any higher level ethical concerns. Your last section on "deeper intent" is correct, there wasn't any.
I have a pretty neutral stance on GenAI, mostly due to personality, but it also stems from my background as well as from recognizing students' interests. Prior to CS education, my master's thesis involved computer vision for catching "high valued targets", but was also funded to help minimize human trafficking. I have students in my classes who are very interested in going to work for defense companies like Lockheed and Raytheon, and I have others who are really interested in using AI for "social good" areas like healthcare and education. I try to have a neutral stance because: A) I hated the professors I took who would use their lecture time to express their political opinions, B) opinions opposite to a student's may otherwise discourage them from learning the material, and C) my primary focus is to make sure they learn the material and do it "right".
When I started teaching, I used the analogy that if they go on to write the software for the life support machine I'm hooked up to, it WORKS. If someone wants to go on to use AI to create weapons, I can't stop them any more than I can force them to read a chapter or convince the person beside me on the highway to slow down. I just work to ensure they do it correctly (which includes being mindful of the ethical ramifications of using algorithm X for task Y).
What would an ethical framework for designing AI for a drone even look like? I have no idea, nor is it something I'm interested in delving into. I got out of face recognition for those reasons. Does an ethical framework for GenAI require the same elements, a fraction of them, or a completely different set of guidelines? Who gets to decide them - the 'experts' in AI, the government, society as a whole?
Personally, I've made the comment that the current opinions on regulating AI are like "everyone trying to be AI's parent". We're never going to agree because everyone has a different opinion on the "right" way to handle AI. Plus, human cognition is so unknown and illogical that we may never figure out a way to perfectly replicate human intelligence. I instead try to stay somewhat optimistic and marvel at the math we've used to create "AI".
Do you really see no difference between someone drawing a piece of fan art and trillion dollar corporations stealing other people's works and reselling it for their own profit with no regards to anyone or anything else?
And yes, obviously society cares about many things depending on the scales in question. It's okay if a dude goes out onto a lake in his small rowboat and catches a few fish for dinner; it's a completely different story if you're talking about a massive barge indiscriminately catching literally thousands of fish with huge nets. The latter has to adhere to much stricter rules than the former, and I think you'd be hard pressed to find anyone who thinks these two situations should be treated equally (unless you're a commercial fisherman with a barge, I suppose; the quote "It is difficult to get a man to understand something when his salary depends on his not understanding it." comes to mind here).
> Do you really see no difference between someone drawing a piece of fan art
In the history of the world only a single person has ever drawn fan art?
No, I don't think that's the case.
Instead it is widespread. It is everywhere.
> depending on the scales in question
The scale argument supports me, not you.
This type of "infringement" is everywhere.
> reselling it for their own profit with no regards to anyone or anything else?
Even this is common. The online independent artist commissions market is full of people doing commercial fan art commissions.
Thinking about this even more, I am now wondering if "infringing" works might actually be a majority of the online/independent commissions market. Maybe.
> In the history of the world only a single person has ever drawn fan art?
That's a disingenuous reading of my comment at best. The equivalent to my scenario is a bunch of unrelated individuals with small boats going out onto whatever lake is nearest to them and fishing. Even if you put all of them together and counted how many fish the hobby fishermen catch, it's still nowhere near the scale of the commercial fisheries, which is why they're treated differently, both by society at large and legally.
Same thing with these AI models, Dall-E and all the other ones have probably generated more images than all of humanity has in its entire history so far, and if not quite yet they're definitely gonna get there sooner rather than later. They can generate dozens if not hundreds of images in a split second, whereas a single artist (or even many artists collectively) can't.
> And yet, nobody cares.
I think we've already established that, because scale absolutely matters for most things. If you want to be an absolutist about it, sure, be my guest, but I think in reality the large majority of people are fine when your average Joe Schmoe the artist makes a commission of a random Disney character, whereas they definitely would NOT be okay with a massive conglomerate like Disney stealing Joe Schmoe's original art and repurposing it without compensating Joe, because there's an inherent power imbalance between the two, and the consequences of that power disparity matter.
I mean, Disney does have every right to go after Joe for his commissions if they really wanted to, similarly to how Nintendo is hyper aggressive with taking down anything relating to their IPs. It's just not really worth it for most companies, they will absolutely go for another company trying to pull the same shit though, as can be seen with the NYT case.
By that logic I can torrent movies and distribute them all I'd like as long as I call it "Generative Watching" or something like that.
And OpenAI quite literally sells access to their models, and if those models are pushing out verbatim copyrighted works as has been alleged by the NYT, then they are by definition reselling copyrighted works without permission.
> And OpenAI quite literally sells access to their models, and if those models are pushing out verbatim copyrighted works as has been alleged by the NYT, then they are by definition reselling copyrighted works without permission.
This style of argument has been previously made regarding things like torrenting during the heyday of piracy ("why would you need <x> except for illegal purposes!")
In my opinion, it's exactly the same as arguing that selling a tool means taking responsibility for how that tool is used by its new owner. You can use a shovel both to create something new (plant a tree) and to destroy something (rip up your neighbor's garden).
The problem isn't the tool, the problem is how the end user uses it. These models aren't living, thinking entities that induce or, on their own, commit copyright infringement / other illegal activities.
They aren't encouraging people to misuse them and it is solely on the user's shoulders for their choice to use them in a way that would cause infringement if the result is used commercially.
> They aren't encouraging people to misuse them and it is solely on the user's shoulders for their choice to use them in a way that would cause infringement if the result is used commercially.
I agree in principle, but the fact that they can in the first place, especially when it happens accidentally, and more importantly at such massive scale, is the issue methinks.
And no one's talking about abolishing the AIs here, we're just talking about wanting M$/OAI to do their due diligence and get access to their training materials fairly. NYT wouldn't have sued if M$/OAI had approached them and struck a deal of some sort, but that's not what they did. They took in whatever data they could, from wherever they could, paying no mind at all to where the data came from and what was being done with it.
There's a reason Getty Images managed to strike a deal with Dall-E, and why many of the image generation models now rely solely on data that is verifiably free of copyright (or where deals have been made, as in the case of Getty Images). It's easier to see in pictures when a blatant copy is made (like watermarks), so it's obvious why Dall-E was the first to encounter this hurdle, but this was inevitable even for the plain text that ChatGPT returns.
You won't get what you want with those sorts of deals.
OK, say every artist gets $100, one time (the exact amount varies but would not be much). Everything's properly licensed according to you, the artists are essentially no better off, and the models are now good enough to create new training data for the future, so the artists never see any more money.
Training AI on AI-generated data doesn't add anything. The AI already has all the weights to generate the image, so at best you are just reinforcing the existing weights by weighting them more heavily than others.
The closest thing you could do is e.g. have a second model that does something novel like create a 3D model from a 2D image and then you try to animate the model and a third model verifies the quality of the output. This then allows you to selectively reinforce the 2D model using information from the 3D model but this isn't simply generating more training data.
I honestly can't follow your argument. Doing something silly doesn't make you the underdog.
My point is that say every artist gets some small token payment once, and then what? That's not enough to live on, so we're right back to square one and we've solved nothing.
Incidentally yes, training AI on AI output will work fine, as long as you have a signal of quality. For example, upvotes in a subreddit would work fine. But that's not crucial to my point, which is that what OP is asking for will accomplish exactly nothing.
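A minimal sketch of what I mean, with hypothetical stand-ins for both the model and the quality signal (neither is a real API; the point is only that the filter, not the generation, is what adds information):

    import random

    def generate(model):
        # stand-in for sampling from a generative model
        return "".join(random.choice("ab") for _ in range(16))

    def quality(sample):
        # stand-in for an external signal such as subreddit upvotes
        return sample.count("a") / len(sample)

    # keep only the self-generated samples the external signal endorses,
    # then fine-tune on those; without the filter you just reinforce
    # whatever the model already does
    finetune_set = [s for s in (generate(None) for _ in range(1000))
                    if quality(s) > 0.7]
    print(len(finetune_set))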
I'm not an expert in the field, but is feeding the model its own output a good idea? Seems like it would only increase weights that are already present in the training data and make it harder and harder to break out of it, ending up with generic output that matches all of its other output in the long run.
Regardless, I'm not saying it's a perfect idea, but it's definitely a start, especially when the current reality is that they're just stealing all the artists' shit and everyone gets $0 instead of $100. As you said, artists are no better off in that universe, but the worst case possible for them is what's happening right this very moment, where they just get fucked over with zero compensation.
I think you misunderstand something here. Torrenting movies and generative AI don't really have anything in common, I'm not sure why you bring that up.
If you sold the output of a true random number generator, eventually you'd also by definition be reselling copyrighted works without permission. The courts wouldn't mindlessly say "no more random numbers", and I doubt that they'll do the same for GenAI, especially given the recent decisions that are headed that way.
True, but still different, in the same way that using machines for certain purposes is not the same as a human doing the same thing without a machine. Just because you can walk from A to B does not mean driving from A to B requires no driving license, for example (and the car needs to fulfill a lot of regulations).
Society may be "completely OK" with human artists taking inspiration from each other. It's a big old reach to assume we are "completely OK" with Microsoft and OpenAI doing the same thing with computer software as subscription service they sell.
The entire argument that “LLM must be allowed the right to learn like a human” hinges on LLM being enough like a human in relevant ways in the first place. An LLM is not enough like a human in relevant ways, however; it has no agency, will, freedom, conscience, self-determination; it is a tool.
If this tool “runs on” copyrighted creative works, and $CORP operates this tool for profit, then $CORP is the one to answer to the law, not the tool. (And if $CORP wants to claim that the tool is a sentient being, then presumably it would have to cease the abuse of said being and set it free.)
Yeah, but we don't typically congratulate users of GenAI for their creativity, and neither do we congratulate the code, nor do we think of the coders of GenAI as great artists.
I hope no self-respecting instructor in ethics could with a straight face teach how an LLM is like a human being when it comes to copyright while glossing over the blinding implication that if it truly were so we would then be subjecting that being to unthinkable abuse.
That hypocritical, self-contradictory take is transparently geared to benefit commercial LLM operators (at the expense of individuals who stand to suffer material harm and/or authored the very creative works thanks to which the tool even exists).
I would say it's dependent on the motive. For example, I would imagine most artists hope that their work inspires other artists, but only to a degree that stops short of direct copying. They might not equate the automation of their style via a model with the work/process of a human, regardless of whether that human is inspired by their style or is just directly copying.
What’s the FPS of human eyesight? How long did Timmy spend looking at Mario, more generally other cartoons and even more generally human forms? Do the math and you’ll find he’s got a pretty big training set as well, maybe not quite the same size but nothing to sneeze at.
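To put rough numbers on it (all assumptions mine: call it ~10 useful "frames" per second and 12 waking hours a day):

    10 frames/s × 3,600 s/h × 12 h/day ≈ 432,000 frames/day
    432,000 × 365 ≈ 158 million frames/year
    × 7 years ≈ 1.1 billion frames by Timmy's seventh birthday

Not web-scale-dataset territory, but as you say, nothing to sneeze at.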
Wasn't ChatGPT trained on the entirety of Wikipedia? And probably millions of pieces of scientific literature, and arts, and movies and games and and and...
Perhaps the hyperbole of the entire corpus of human knowledge isn't quite technically right, but it's close enough.
Question: that is a point that would protect GPT models in the abstract, but doesn't it fail to hold for OpenAI and Microsoft, who provide "image generation as a service"? The actual implementation is irrelevant; it must not be able to produce images that infringe copyrights (just like a designer in an agency cannot use Mario for a print).
So using a model running on my laptop to generate a "Mario like" image would be fine, but it would make monetizing this difficult?
The problem is the AI companies monetizing the work of copyrighted materials.
It's not a problem for me to draw Mickey Mouse. It _is_ a problem when someone pays me to draw an animated mouse and I sell them a picture of Mickey Mouse.
For me, it's not really about the AI at all; it's a problem of undervaluing artists' contributions to these tools. And it's not even fully about copyright, it's about not asking for permission to use their content and then building an entire business on top of that stolen content.
The AI generating it (Hosted on OpenAI-controlled servers in the case of ChatGPT and DALL-E) is the entity redistributing the work. The end user who asked for the infringing content isn't the entity that is infringing on the copyrights and trademarks.
I'm perfectly free to ask people on the street for t-shirt with Mario on it, but as soon as someone who isn't Nintendo or licensed by Nintendo sells me that t-shirt they're the ones infringing on the copyright and trademark. As the consumer I did nothing illegal, and a court would say that I was deceived by the infringing party.
Distribution (seeding, uploading) and facilitating copyright infringement is what gets you in trouble. When you ask DALL-E (a paid, commercial product) for a picture of Italian plumbers and it gives you an obvious picture of Mario 100% recognizable to the layperson as Mario and not a distinctly different image of a similar character, that's blatant trademark and/or copyright infringement on the part of OpenAI.
> If the AI is completely unable to generate non-infringing works even when you are _trying_ to get away from it (which the author very much doesn't seem to be doing; they are purposefully crafting and showing prompts that infringe), then that's the AI creator's problem.
I see some parallels to the Napster lawsuit. The fact that the users were the bad people asking for infringing content didn't give Napster the right to facilitate infringement. Napster was ordered to monitor its network and make sure that they were blocking non-legitimate uses. They couldn't logistically comply and went bankrupt.
Which begs the question: Does OpenAI even have the technological ability to block trademark and copyright infringing content generation? Even if they do, how useful will ChatGPT be if all phrases and imagery that closely resemble copyrighted works are blocked from output?
What's even worse for OpenAI compared to Napster is that it wasn't individual users uploading copyrighted content, it was OpenAI itself ingesting the data. Nobody twisted their arm to include copyrighted works in their models.
If I essentially encode knowledge of something then can recall and remix at will, am I redistributing the exact work or the knowledge of it?
Yes, it is capable of producing a close-to-exact replica, if not the exact input image byte-for-byte, but I find it difficult to say OpenAI is willfully redistributing copyrighted work wholesale the way you would by torrenting a movie or right-click-saving an image from Google, where you are copying the intellectual property 1:1.
Opening this Pandora's box could have large implications for a lot of creative work, and taken to its end conclusion could leave artists unable to work at all: you cannot create any creative work featuring a talking mouse if you have knowledge of Mickey Mouse existing, because you have been tainted (similar to clean-room re-creations, except now any sufficiently famous copyrighted figure causes a deadlock condition for all derivative or even similar topics).
Is Ratatouille derivative of Mickey Mouse? Ehhh, well, they are both talking rodents. They both have cartoon faces. You can certainly draw parallels between them, but they aren't the same character. Is Mickey with a chef hat infringing on Ratatouille?
Trademark law, to my knowledge, asks whether someone would be tricked or misled into believing you are the other guys. I think that is applicable here: someone drawing a talking mouse isn't infringing as long as it cannot be mistaken for Mickey Mouse, which again would be the fault of the person inducing the creation and not the tool that allowed it to happen.
Where does "inspired by" / derived from encoded knowledge turn into outright exploitation of copyrighted work? There's certainly _a_ line, but I find it difficult to place it at merely having encoded knowledge that the work exists.
This “close to exact” thing is actually the Achilles heel of this argument. The example images in this article are so close to exact that they are quite clearly infringement, trademark or copyright. We aren’t talking about Ratatouille mouse versus Mickey Mouse, we are talking about the source picture of Mario versus a slightly altered picture of Mario that every layperson would immediately recognize as Mario composed in the exact same manner as the source image.
Courts have already defined this line over decades of copyright and trademark cases, and the examples in this article definitely cross that line.
> which again would be the fault of the person inducing the creation and not the tool that allowed it to happen.
This is not really true in practice, we can see that in various legal cases against Napster or The Pirate Bay.
Is a man not entitled to the sweat of his brow? 'No!' says the man in Washington, 'It belongs to the poor.' 'No!' says the man in the Vatican, 'It belongs to God.' 'No!' says the man in Moscow, 'It belongs to everyone.'
Are you trying to say that no one is entitled to their own inventions? Cause that is a rapid descent into a capitalist hellhole where only those who can steal ideas the most effectively are able to profit.
> Are you trying to say that no one is entitled to their own inventions?
The subject of this thread is copyright, not patents. Though I do believe all intellectual property is bogus (including trademark, which repealing would limit the influence that would be required for the capitalist hellhole you mention), I feel the most strongly so about copyright, which has nothing to do with inventions.
Further extending the argument: I could plausibly ask GenAI "Can you show me what Mario looks like?" since I have never seen him and GenAI is my go-to tool.
Something that is purely speculative, undefined, and has been promised in the near future for 50+ years.
I don't see copyright holders lying down for someone else's benefit and I don't see governments gutting copyright, contract law, and several other avenues of protection that copyright holders can deploy in the name of something that doesn't exist and may not ever exist.
The examples given are all billion-dollar, decades old characters. The volume of material directly/indirectly referencing those characters in a random internet crawl will be fairly large. Most copyrighted works won't have that issue. If anything it means they only infringe on archetypal works and not the other 99.9%. If I write a story involving robots and spaceships (of which there are many, before and since Star Wars) DALL-E won't infringe me because it will be busy infringing on Star Wars.
I'm opposed to my (fairly minor) copyrighted works being used in GenAI datasets as well. I just have no practical way to stop it, and there aren't clear enough damages to sue. That doesn't make it legal.
OpenAI also plays some ugly games with regards to the difference between training and search. Search requests come from the `ChatGPT-User` user-agent, and I'd like to allow those; training and scraping requests come from `GPTBot`, and I have no interest in those. But as per their own documentation, putting one in robots.txt disables the other.
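For reference, this is roughly what I would want to be able to write (the two user-agent names are from OpenAI's own docs; the problem is precisely that, per those docs, the two aren't honored independently like this):

    User-agent: GPTBot
    Disallow: /

    User-agent: ChatGPT-User
    Allow: /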
The examples were chosen by the author to make a point precisely because they are well known.
But every single copyright holder with their works online (which includes you and me) has the same legal rights as the NYT or Disney. Naturally some copyright holders have more real-world capability to go legal than others, but that does not reduce the legal risk.
> If anything it means they only infringe on archetypal works and not the other 99.9%
How on earth do you get to that conclusion? There's no "popularity" floor to copyright protection. Either a work has been infringed or it hasn't.
I will update my language, which mimicked the original comment. However, it is not a simple discussion. Here is a snippet related to Japan's law, updated ca. 2018 with AI systems in mind and clarified recently. I personally find it totally reasonable and support it.
“The use of copyrighted products or materials to train generative AI models would be prima facie copyright infringement under the Copyright Act, as it is a reproduction (fukusei) or other form of use of the copyrighted work. However, Article 30-4 of the Copyright Act stipulates that the use of copyrighted works by generative AI for learning purposes is allowed in principle.”
And it goes on to say "unless such use of copyrighted works unreasonably prejudices the interests of the copyright owner, in light of the nature or purpose of the work or the circumstances of its exploitation in Japan."
Which suggests that when AI art threatens commercial interests, the protection offered by 30-4 can disappear.
To me it sounds like they tried to please everyone and left the hard decisions about conflicting interests to the courts (in particular the courts will have to decide what "unreasonably" means).
If a child is instructed to read a copyrighted work at school, which later becomes a factor in his own derivative works, he won't be in breach of copyright.
Why should other intelligent entities be prevented from reading copyrighted works and gaining whatever there is to gain from those works the way any human might?
If the child / author then regurgitates entire paragraphs or sections verbatim in his own works and someone notices, you bet there will be a plagiarism lawsuit coming his way.
In that case, the person legally liable for publishing the material is sued for infringement of the work. You don't send someone to jail because they're simply capable of infringing; they have to actually do it, and you have to actually show the specific work whose copyright was infringed upon.
You can also get into the weeds of what's copyright-able (ask Donald Faison about his Poison dance). If you ask for C-3PO and you get C-3PO as he appears in Star Wars promotional material, that seems cut and dry. What if you ask for a "golden robot"? What if you get a robot that looks like C-3PO but with a triangular torso symbol instead of his circular one? What's parody, what's fair use?
Especially true if that child or its mother has a huge market capitalization, large profit margins, highly-paid employees and shareholders eager to reap some more $$.
If the public starts to see LLMs as highly sophisticated copyright laundromats it would most likely hamper further investment & development in that field.
> Especially true if that child or its mother has a huge market capitalization, large profit margins, highly-paid employees and shareholders eager to reap some more $$.
This is the bit I don’t get from the “feed everything to machine” LLM-maximalists. Do they think courts don’t take context into account, do they think all actions happen in a vacuum and that they can just skip along and ignore laws at their pleasure because “tee hee it’s totally definitely fair use bro, I’m totally an academic researcher-pinky promise”.
LLM bros ought to stop and have a think before they poison their own well, assuming they haven’t already done so.
>This is the bit I don’t get from the “feed everything to machine” LLM-maximalists. Do they think courts don’t take context into account, do they think all actions happen in a vacuum and that they can just skip along and ignore laws at their pleasure
An entire generation of unicorn startups believed that (Uber, AirBnB, etc.). We see in the news every day that once you have enough money laws don't apply to you (most things Elon Musk does, the fact that Trump can defy court orders repeatedly and not go to jail, etc.) so yes, this seems entirely plausible.
The 2 darling startups that are now facing increasingly less rosy futures?
Airbnb in particular is facing enough backlash that I’d be surprised if it lasts terribly much longer.
Sure, they get away with it for a while, but not forever.
> We see in the news every day that once you have enough money laws don't apply to you
I agree with you here, but I think this is a much broader conversation about capitalism in general, which would be getting a bit off-topic for this particular thread, except to say: capitalist forces aren't above cauterising a limb if it becomes too annoying or intrudes on the other limbs too much. I think the "AI" limb might be overstating its own importance, and I suspect that if it intruded too much on everyone's interests re: profit, it would, as an industry, very quickly find itself being neutered. Capital interests would love to get rid of pesky human labour, but if the alternative is too annoying, they'll have no objections to going back to grinding people through the system again.
AirBnB will get away with it forever. While short term rentals might get banned in a handful of cities, the service now operates worldwide. The stock might be overvalued but if you examine their financials it's simply not plausible to think that failure is imminent.
Sure. But if the child has that capability, it doesn't automatically make them a walking copyright violation. "Intelligence", even the current version of AI, entails knowing about stuff, including being able to recite. That doesn't mean intelligence's existence violates copyright. If a person used AI to make a copyright violating work, that's a different story, just like if they used their own innate intelligence to do so.
Taken to its conclusion, liability is then on everyone who decides to publish anything that ChatGPT “tells” them, because it might cross the threshold on plagiarism.
Are the OpenAIs of the world ready to shield their customers from that liability?
If it turns out that using ChatGPT to help you write your resumé opens you up to accusations of plagiarism, or DALL·E to create an image for your website opens you to copyright violation, will you use them?
> Taken to its conclusion, liability is then on everyone who decides to publish anything that ChatGPT “tells” them
Yes. Just like reading anything else on the internet. An LLM is no different from typing "popular cola logo" into Google search and claiming you invented it. If I type "cola logo" into DALL-E and get a replica of Coca-Cola... that doesn't mean I created that logo and can exploit it for commercial purposes.
> Are the OpenAIs of the world ready to shield their customers from that liability?
Why would they? We aren't suing pen manufacturers because someone wrote something libelous using their pen. We aren't busting down the doors of Crayola because little Johnny used the crayons to draw Mario.
OpenAI might not want to shield all their customers from liability, but that is exactly what GitHub have done with Copilot. It's not a hypothetical, it's being done today.
I mean: get this great autocomplete; if you use it, your code might be AGPLed for all you know, and you're in violation, because you didn't even add a notice.
In a heartbeat. It's time for the old paradigms to die and new ones to be formed.
If ASI can exist, I don't believe our old methods of intellectual fortification will continue to work in the future, much like castle walls aren't used to protect against guided missiles.
This is an extremely dangerous precedent that I think you are purposefully trying to put forward.
It's a horrendously bad idea, especially for startups, to make apps liable for how users use their platforms. Setting that precedent only benefits entrenched tech companies.
This argument might hold more water when generative models are more than fancy compression algorithms/text completion engines.
A more practical way of looking at this is: who is making money off of these models? How did they get their training data?
I'm not a fan of copyright in general, but we have serious outstanding issues with companies and organizations stealing or plastering work without compensating the original creators of said works. Thus far, LLMs are becoming another method to concentrate wealth to whoever has the resources to train and sell these models at scale.
> I’m not a fan of copyright in general, but we have serious outstanding issues with companies and organizations stealing or plastering work without compensating the original creators of said works.
Would you mind unpacking this one a bit? It sounds like you denigrate copyright (some "general" grievance) but then immediately execute an about-face and begin to extol its virtues. Is copyright not the thing that allows us to share works without fear they'll be stolen?
I think they are expressing a view that we ought to offer less protection / more scrutiny to larger commercial entities, which concentrate disproportionate amounts of wealth and power, compared to smaller entities. I tend to agree.
This is more or less correct. We can have systems to compensate creators that aren't identical to the copyright system we have today. If I were to rephrase my previous statement, I'd clarify instead saying: "I do not like copyright as it exists today."
As a society we want to incentivize innovation and reward things that advance society. One of the ways we do that today is copyright. It doesn't need to be the only way, or be done in the ways we do it now.
> This argument might hold more water when generative models are more than fancy compression algorithms/text completion engines.
I doubt that part of the argument would change even if we perfected brain uploads.
Now, if you gave the current LLMs a robot body with a cute face, that'll probably change minds faster, regardless of the underlying architecture.
> who is making money off of these models?
When the models are open source, or at least may be downloaded and used locally for no cost, that would be the users of the models.
And back to the biological comparison: I learned to read (and also to code) in part from the Commodore 64 user manual, should I owe the shareholders anything for my lifetime earnings? As I got to the end of that sentence, a thought struck me: taxes do that. And in the UK the question of if university should be funded by taxes or by the students themselves followed the same lines.
> When the models are open source, or at least may be downloaded and used locally for no cost, that would be the users of the models.
I think there's a bit more nuance to this. The profits go to those with the ability to run these models and the infrastructure (or capital) to do so. I'm hoping this will change and we'll see lower barriers to entry as LLMs are made more accessible over time.
> And back to the biological comparison: I learned to read (and also to code) in part from the Commodore 64 user manual, should I owe the shareholders anything for my lifetime earnings?
This is more a philosophical question than anything else. I don't think there's a right or wrong answer, but in my opinion the answers we arrive at should provide as much benefit to as many people as possible.
> As I got to the end of that sentence, a thought struck me: taxes do that. And in the UK the question of if university should be funded by taxes or by the students themselves followed the same lines.
I agree with your assessment and this model lines up well with my own opinions on reasonable ways to ensure equitable benefit from AI (be it ML, LLMs, or some theoretical general AI in the future).
> I’m not a fan of copyright in general, but we have serious outstanding issues with companies and organizations stealing or plastering work without compensating the original creators of said works
Copyright is meant to give the original creator a monopoly over their creation (so that others don't profit off of their work). Are you not a fan of copyright in its current scope / implementation? Because it sounds like you do agree with its goal.
> Copyright is meant to give the original creator a monopoly over their creation (so that others don't profit off of their work).
Correct me if I'm wrong, but my understanding is that the goal of copyright is to incentivize innovation (specifically of art and culture) and to provide innovators a way to recoup (and profit from) innovation they've made public. I view it as similar to how patents work, in that it's an incentive for people to publicize and share their works more broadly.
> Are you not a fan of copyright in its current scope / implementation? Because it sounds like you do agree with its goal.
I have a differing understanding of the goal of copyright based off of what you've said, but I think our understandings are similar in that the copyright holder benefits from copyright/patents of their works.
I dislike the ways our current implementations of copyright are abused. I think the concept of fair use makes copyright as it is today workable. I also think our current copyright laws (at least in the US) have a lot of failure modes that subvert what I believe the purpose of copyright should be: to advance art and culture with legal and economic incentive.
If llms are intelligent entities legally equivalent to a human child, then they incur an even more serious legal problem, as we are all in violation of the 13th amendment.
150 years ago, society existed by and for men specifically (as in: not women) in most nations; 220 years ago, US society was by and for rich white (specifically white) landowners.
I don't know when AI will count as people in law, or even if they ever will; we may well pass laws prohibiting the creation of any mind in danger of coming close to this threshold.
But be wary, for AI acting enough like people is different to being anything like a person on the inside, and that means being wrong in either direction can have horrifying consequences. To appear but not to be conscious, leads to a worthless future. To be but not to appear conscious, leads to a fate worse than the history of slavery, for the slaves were eventually freed.
A child isn't a computer program, and no amount of anthropomorphizing will ever make them so.
Especially ChatGPT and other LLMs, they're not even close to being AGI or an "intelligent entity" as you put it, despite what all the AI-bro hype and marketing would like everyone else to believe.
Only because all three letters of the initialism mean different things to different people.
Existing LLMs won't do everything, but bluntly: good. We're not ready for a world where there is an AI that can do everything for $1-60/million tokens[0], and we need to get ready for that world before we find ourselves living in it.
ChatGPT-3.5 has a lot of weaknesses, but it can still do a better job of coding than a few of my coworkers demonstrated over the last 20 years. I'm listening to a German language learning podcast, and the hosts mentioned using it to help summarise a long email from one of their listeners. My sister has work anecdotes about it helping, and she's not in tech. Influencers, teachers, lawyers, Hollywood writers… well, "moral panic" doesn't tell you much… the game Doom was 30 years ago, and that had a moral panic that looks quaint given how much FPS games' graphics improved with each subsequent release, and I suspect ChatGPT-3.5 was to conversational AI what Doom was to 3D realtime gaming: the point at which people take note, followed by a decade of every new release being (wrongly) called "photorealistic".
[0] current pricing for gpt-3.5-turbo-1106 ($0.0010 / 1K tokens) and gpt-4-32k ($0.06 / 1K tokens) pricing: https://openai.com/pricing
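The range works out as: $0.0010/1K tokens × 1,000K tokens = $1/million for gpt-3.5-turbo, and $0.06/1K tokens × 1,000K tokens = $60/million for gpt-4-32k.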
> ChatGPT-3.5 has a lot of weaknesses, but it can still do a better job of coding than a few of my coworkers demonstrated over the last 20 years.
Whenever people say stuff like this I can't help but wonder what on earth kind of projects they work on. Even GPT4, while useful for things like reformatting or generating boilerplate code and stuff like that, it's still a far cry from any decent dev I've ever worked with, especially if you're not using a popular language like JS or Python.
My usual PRs at work are pretty big, complex pieces of code that all have to actually work when integrated with the larger system around it, no AI tool I've tried so far has come even close to acceptable here, other than for generating some boilerplate code that I would've written myself anyway. But even with the innocent-looking boilerplate there's always a weird gotcha that isn't obvious until you really analyze the code closely. It ends up saving nothing more than a few keystrokes, if that, yet people say all the time that they're generating entire pieces of software by gluing together code it spits out, which I find absolutely insane given my anecdotal attempts at it.
This can be circumvented with more elaborate, in-depth prompts, but at that point are you really saving on effort compared to the alternative? Is it really more efficient? By the time I have a prompt complex enough for it to spit out something good, I could've already bashed out the code myself anyway.
That's not even mentioning all the legacy shit you have to keep in mind for any one line of code, plus whatever conventions and standards your team uses and has etc.
I mean it works great for a function or whatever, but is that seriously what most people are working on? Simple, one-off independent function calls that don't interact in any way with anything within a larger system? Even simple CRUD apps aren't so well isolated.
Don't even get me started on the actual difficult part which is the whole preamble to creating the ticket in JIRA or whatever task management software you use where you're talking with stakeholders and planning out the work ahead, you're telling me you're paying 'Open'AI to do that whole rigamarole for you, and you're doing it successfully?
> Whenever people say stuff like this I can't help but wonder what on earth kind of projects they work on.
Terrifyingly, one of the bad human examples was doing C++. That person didn't know, or care to learn about, the standard template library; and they also duplicated entire files rather than changing access specifiers from private to public so they could subclass; and one feature they worked on was to support a change from storing data as a custom file format to a database, and the transition could take 20 minutes on some inputs even though neither loading before nor after this transition took more than milliseconds, and they insisted during one of the standups the code couldn't possibly be improved… the next day I looked at it for a bit, removed an unnecessary O(n^2) operation, and the transition code went back down to milliseconds. Oh, and a thousand(!) line long block for an if statement that always evaluated true.
The whole codebase was several times too big to fit into the context window for any version of any GPT model thanks to both this duplication and to keeping old versions of functions around "for reference" (their words), but if it had been rewritten to be more sensible it might just about fit into the biggest.
(My other examples were either still at, or fresh out of, university; but this person should have known better).
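For the curious, it was the classic accidentally-quadratic shape. Not their code, but a generic Python illustration of the pattern and the fix:

    rows = ["a", "b", "a", "c"] * 1000

    # accidentally O(n^2): a linear scan of `seen` on every iteration
    seen, out = [], []
    for row in rows:
        if row not in seen:
            seen.append(row)
            out.append(row)

    # O(n): a set makes the membership test O(1) on average
    seen, out = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)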
> Don't even get me started on the actual difficult part which is the whole preamble to creating the ticket in JIRA or whatever task management software you use where you're talking with stakeholders and planning out the work ahead, you're telling me you're paying 'Open'AI to do that whole rigamarole for you, and you're doing it successfully?
If it was all-round good, none of us would have jobs any more.
> Whenever people say stuff like this I can't help but wonder what on earth kind of projects they work on. Even GPT4, while useful for things like reformatting or generating boilerplate code and stuff like that, it's still a far cry from any decent dev I've ever worked with, especially if you're not using a popular language like JS or Python.
I mean this not overly sarcastically, but ... have you seen https://thedailywtf.com ? Between my own experiences, and that of some colleagues, I could probably put together at least a half-a-dozen WTF stories that would rival some of the best that site has to offer. There's enough really incompetent people in positions they shouldn't be in to the point that chatgpt - at this point - could realistically provide better output than more than a few of them.
ChatGPT is not an intelligent entity? What’s been comprehending and rewriting all my crappy code for several months? An auto-complete? There’s obviously emergent behavior there that is actually defined by the maker and most users as “intelligence.”
I would paraphrase one of Clarke's laws and say that "Any sufficiently advanced text generator is indistinguishable from an intelligent entity."
Just because a computer program's output is remarkably good does not mean there is any emergent intelligence, any more than a technology we don't understand means there is magic.
The reverse can also be true, with John Keats' agreeing with Charles Lamb that Newton "had destroyed all the poetry of the rainbow, by reducing it to the prismatic colours": https://en.wikipedia.org/wiki/Lamia_(poem)
If we should ever fully understand how our own minds work, will we hold machines in higher esteem, or ourselves in lower?
Any biology or physics that suggests humans are just pattern recognizers will be discarded, since being a conscious being is the only thing every human knows to be 100% true.
So, all of biology and physics then. If souls exist, they have no mass, and have a weird way of being repeatably disrupted in consistent ways by damage to certain parts of the brain or specific chemicals.
Just because consciousness is a mystery today, doesn't mean we get to stop and say it will be so forever more.
Heck, the problem still fundamentally exists regardless of if you're atheist, monotheist, polytheist, or pantheist.
--
“We’re not listening to you! You’re not even really alive!” said a priest.
Dorfl nodded. “This Is Fundamentally True,” he said.
“See? He admits it!”
“I Suggest You Take Me And Smash Me And Grind The Bits Into Fragments And Pound The Fragments Into Powder And Mill Them Again To The Finest Dust There Can Be, And I Believe You Will Not Find A Single Atom Of Life–”
“True! Let’s do it!”
“However, In Order To Test This Fully, One Of You Must Volunteer To Undergo The Same Process.”
There was silence.
“That’s not fair,” said a priest, after a while. “All anyone has to do is bake up your dust again and you’ll be alive…”
You missed the entire point. Physics and biology exist to help humans understand the material universe. Anything supposing that humans aren’t actually intelligent or conscious or whatever, or lack agency, is wrong since all of physics and biology are an offshoot of that agency meant to enrich it.
I'm not missing the point, I'm saying you're wrong. There's a difference.
Also:
> Anything supposing that humans aren’t actually intelligent or conscious or whatever
Doesn't really match what I was writing about: if it turns out that a thing which is "just a pattern recognizer" can in fact be "intelligent or conscious or whatever", it's up to us if we see intelligence or consciousness or whatever in the pattern recognisers that we build, or if we ourselves descend into solipsism and/or nihilism.
Or if we take the traditional path of sticking our fingers in our ears and go "la la la I'm not listening" by way of managing cognitive dissonance. This is a very popular response which should not be underestimated.
But the laws of physics are quite clear, that a whole bunch of linear equations (quantum field theory) gets us chemistry, which gets us biology, etc., and the only place in all this for the feeling of existence that we have is emergent properties. Those emergent properties may, or may not, be present in other systems, but we don't know because we're really bad at characterising how emergent properties… emerge.
It’s not a human and so the entire argument comparing it to one is moot. It’s a program on a machine and doesn’t have rights, this anti-human way of thinking is seriously fucking scary.
The post you're responding to didn't call them human. Nor alive. Just "intelligent", and just as intelligence isn't required of life so I have no reason to think intelligence itself requires life.
These things are indeed "a program on a machine and doesn’t have rights", but what I find scary is that rights aren't part of the rules of the universe, they're merely laws, created and enforced (to the extent that they are at all) by humans.
As Max Tegmark said in his recent interview with Lex Fridman, a lot of the technology being developed now, and how it's being talked about and how it's used, is anti-life.
This line of logic is more frightening to me than actual AI. LLMs are really useful in a lot of scenarios but it takes 5 minutes playing with one to see that it isn't intelligent.
But since you are the type of person who is seemingly using LLM-"written" code in production, your ability to accurately assess anything is suspect at best.
"Any technology, sufficiently advanced, is indistinguishable from magic".
No, an LLM is not intelligent. I do not understand why people will go through mental gymnastics to conclude they are.
cue all the typical arguments supporting them being intelligent and demanding I give reasons for them not being
This is kind of a weird take... if you said your dog isn't intelligent because it can't do calculus, most people would look at you funny. You don't have to see your pet as intelligent, but don't expect everyone else to blindly follow your thinking.
The only thing that I would say is "clear" is that LLMs are big collections of statistical data on how we use language. That does not cross my threshold for "intelligence".
> That does not cross my threshold for "intelligence".
You have your own individual threshold for what "is" intelligence? Holy cow, imagine if each other agent had their own also, but spoke as if they had a common one...that sure wouldn't be a very intelligent way to run a simulation, imagine the unrealized confusion and delusion that could result if that became a cultural convention!
> Something that is purely speculative, undefined, and has been promised in the near future for 50+ years.
"Undefined", although not literally, in practice definitely: each letter of that initialism means a different thing to different people. To that extent, I'll even grant "speculative" despite many of those meanings being demonstrably met by us humans.
But as someone who (unfortunately) has just turned 40: who was it that was promising AGI "in the near future" for more than my entire lifetime? Including the second AI winter? Because even the biggest timeline-optimists I can remember (Kurzweil and Yudkowsky), who very few cared to listen to, put things more than 20 years ahead of when they were writing. (And yes, Yudkowsky was definitely wrong about a singularity in 2021, though as you say AGI is undefined I think if someone in 1996 had seen ChatGPT they'd have said "yes, this is AGI" despite its flaws).
> I don't see copyright holders lying down for someone else's benefit and I don't see governments gutting copyright, contract law, and several other avenues of protection that copyright holders can deploy in the name of something that doesn't exist and may not ever exist.
I tend to agree. Although I don't accept that contract law has much of anything to do with this discussion, to the extent that it does have implications, it isn't going anywhere.
But at the same time, Google exists by reading the entire public internet, indexing it, and presenting clips of it to its users. This has in fact resulted in copyright disputes, and I was surprised how long it took for that to happen. Likewise, while copyright holders must fight for their survival, mere LLMs even as they exist right now are economically relevant, so this isn't going to be a one-sided fight by just copyright holders.
> But as someone who (unfortunately) has just turned 40: who was it that was promising AGI "in the near future" for more than my entire lifetime? Including the second AI winter? Because even the biggest timeline-optimists I can remember (Kurzweil and Yudkowsky), who very few cared to listen to, put things more than 20 years ahead of when they were writing. (And yes, Yudkowsky was definitely wrong about a singularity in 2021, though as you say AGI is undefined I think if someone in 1996 had seen ChatGPT they'd have said "yes, this is AGI" despite its flaws).
> The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence. Later perceptrons will be able to recognize people and call out their names and instantly translate speech in one language to speech and writing in another language, it was predicted.
They were talking about the very first perceptron, a hardware implementation funded by the Navy and built by a team led by Frank Rosenblatt, one of the earliest evangelists of neural nets, in 1957. The term "AGI" hadn't come into use yet, as far as I'm aware; back then "AI" by itself meant the same thing. We only had to invent "AGI" as the stronger concept once inferior, more limited software capabilities were being called "AI" for marketing purposes. I'm guessing they expected it to happen sooner than 75 years later, though.
I'm curious: Have you seen indications that major militaries and politicians believe AGI, rather than special purpose ML for military purposes, is important for national security? I'm really not sure whether this is true, or whether military and political leaders think it's true.
There is a difference between sharing the tech hype and risk management. Why would our political and military leadership not be interested in this sort of tech in the modern world? If it doesn't work out, then it doesn't work out, but if it does, then they'll want in on it. Aside from that, there is the mass surveillance angle. We recently had a nice scandal of sorts here in Denmark where the chief of our secret military service, or whatever you'd call it, was arrested by our secret police because he may or may not have shared secrets about how the US spies on us. It even included charges against our former minister of defence for possibly leaking things, something which could have seen him face twelve years in prison. Luckily our courts saw it as a political matter and refused to let it run in closed proceedings, which led to the charges being dropped.
The matter of the leaks was very "Snowdeny" in that it's possible that parts of our own government and our secret police share all Danish internet traffic with the NSA, who then in turn share information back with our secret police. Which meant that our secret police could do surveillance on us as citizens through a legal loophole, as they aren't allowed to do it directly, but are allowed to receive surveillance information from the NSA. Part of this information comes from the giant American tech companies as well, despite their promises not to share the data they keep on you. I know it sounds sort of crackpot, but between ECHELON, Snowden and the ridiculous number of scandals, I think it's safe to assume that the American military wants in on LLMs and wants to monitor all the inputs people put into ChatGPT and similar. So for that reason alone they'd want in on things.
Then there is how the war in Ukraine has shown that cheap drones are vital in modern warfare, and right now they need to be manually controlled. But what if they didn't? Maybe that's not attainable, but maybe it is.
Then there are all the other reasons you and I can't think of. So even if they don't believe it's eventually going to lead to an AGI, or whatever else the hype wants, they're still going to be interested in technology that's already used by so many people and organisations around the globe.
I'm sure they're interested in it, but I'm uncertain that they view it as a promising and critical enough capability to push for a higher priority when weighed against other interests.
For instance, neither of your examples - surveillance or automated drones - has anything to do with AGI. They don't need LLMs to do mass digital surveillance; they already do that and were doing it for decades before LLMs were a twinkle in anyone's eye. Sure, they'll try to tap into the user data generated by ChatGPT etc. (and likely succeed), but that's not a different capability from what they're already doing. And automated drones - which, by the way, are not future technology as you seem to imply; they're here today - are special-purpose ML systems that maybe benefit from incorporating an LLM somewhere, but certainly aren't pinging the ChatGPT API!
But sure, you're exactly right at the end, I have no idea whether they see other angles on this that are new and promising. That's why I asked the question, I'm very curious whether there are any real indications thus far that militaries think the big public LLM models will be useful enough to them that they'll want to put a thumb on the scale to favor the companies running them over the companies that make their bucks on copyrighted content.
Wiretapping vast amounts of data on the internet is quite cool, but actually sifting through all that data is the really difficult part. Right now intelligence services are probably looking at lots of false positives and lots of dots they can't connect because the evidence is just too dispersed for a human or a current-generation system to make sense of. LLMs could enable them to make the analysis more targeted and effective.
But for all we know intelligence services could be using LLMs for years now, since they are usually a few years ahead of everybody else in many regards :-)
This is not the new capability that LLMs have pioneered. It's true that it is difficult to sift out signal from the noise of a vast data trove, but it is difficult in a way that people have been getting extremely good at since the late 90s. What you're describing is a Google-level capability, and that's truly a very complex thing not to be downplayed. But it's a capability that we've had and been honing for decades now.
I'm sure language models and transformer techniques will be (or more likely: already are) an important part of the contemporary systems that do this stuff. But I'm skeptical that they care much about GPT-4 itself (or other general models).
I'm not skeptical about whether they think it is useful and an important capability to incorporate ML techniques into their systems, I'm unsure how much utility they see in general (the "G" in AGI) models.
> LLMs could enable them to make the analysis more targeted and effective.
How? I'm not trying to be combative, I genuinely am curious if you have an idea how these things could be usefully applied to that problem. In my experience working in the information security space, approximate techniques (neural nets, etc.) haven't gotten much traction. Deterministic detection rules are how we approach the problem of finding the needle in the hay pile. So if you have a concrete idea here, it could represent a real advancement in this field.
I guess my next question is how many needles do you find and how sharp are they? Detection rules would filter out most of the noise, then something like an LLM would do a post filter for intent analysis to rank relative risks for human intelligence to look at.
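To make the shape of that concrete, here is a minimal two-stage triage sketch in Python. Everything in it is hypothetical - the rule patterns, the threshold, and especially llm_risk_score, which stands in for whatever LLM call a real system would make:

```python
import re

# Stage 1: deterministic detection rules - cheap, auditable, and the part
# operators already know how to tune.
RULES = [
    re.compile(r"wire\s+transfer", re.IGNORECASE),
    re.compile(r"credential\s+dump", re.IGNORECASE),
]

def matches_rules(message: str) -> bool:
    return any(rule.search(message) for rule in RULES)

def llm_risk_score(message: str) -> float:
    """Hypothetical stand-in for the LLM 'intent analysis' step.

    A real system would send the message to a language model and parse a
    risk score out of its reply; a constant keeps this sketch runnable.
    """
    return 0.5

def triage(messages: list[str], threshold: float = 0.4) -> list[str]:
    # Stage 2: only rule hits reach the expensive LLM ranking step, and only
    # high-scoring hits reach a human analyst, ranked by relative risk.
    hits = [m for m in messages if matches_rules(m)]
    flagged = [m for m in hits if llm_risk_score(m) >= threshold]
    return sorted(flagged, key=llm_risk_score, reverse=True)

if __name__ == "__main__":
    print(triage(["please confirm the wire transfer", "lunch at noon?"]))
```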
I suspect this would disincentivize operators from taking care in how they write their detection rules, and the nondeterminism of the LLM would then result in false negatives. So the rate of growth of the needle set would increase, and the analysts would be getting lower-quality information mediated by the LLM.
In a world where false negatives--i.e. failing to detect a sharp needle--are the worst possible failure mode, approximations need to be handled with exceeding care.
I'd guess leaders are thinking more in terms of national capacity to create more advanced technologies than geopolitical adversaries. If US policy shakes out in a way that protects copyright holders at the expense of AI innovation, I think it's apparent that the end result will be that our rivals will both violate copyright and beat us to building widespread expertise.
> I'd guess leaders are thinking more in terms of national capacity to create more advanced technologies than geopolitical adversaries.
I think there's a strong argument that they should be thinking in those terms, but I'm a lot less convinced that they do usually think in that way.
Or more charitably, they have the responsibility to balance current interests against future interests. And this isn't just a tricky thing for democracies, dictators also have to strike this same balance, just with different trade offs.
But in this case, for the US, it honestly isn't clear to me that policy makers should favor the AI side of this tussle. I think culture has been among the most important exports of the US for nearly a century - if not the single most important - and I think favorable copyright treatment has been at least part of the story there.
Maybe that whole landscape is different now in a way that makes that whole model obsolete, but I think it's an open question at least.
What it seems to me from the milieu of everything I've read and heard (that is: I can't cite examples, this is an aggregate developed from hundreds of articles and podcasts etc.) is that there is already an "AI" arms race underway, but that it has more to do with specialized ML systems than with consumer LLMs.
But I'm not really in the loop, and maybe OpenAI really is more important to the US DoD than Disney (as a stand-in for big copyright-based businesses generally) is to the politicians they donate to. But I dunno! That's why I asked the question :)
I would be more intrigued by the national security angle of this if copyright holders were going after, say, Palantir. But I just don't know how important they see these language models as being, or how interested they are in OpenAI's mission to discover AGI.
It mostly doesn't matter if the military wants specialist systems, in the long run generalist systems tend to win in power and adaptability.
Some of this may be a misunderstanding of what modern militaries do, if they are shooting guns there's already been some level of failure. Massive amounts of war gaming, sentiment analysis, and propagandizing occur, see the RAND Corporation for more details on the military development of algorithms and artificial intelligence.
Yeah this makes sense. Maybe RAND publications will indeed give me some insight into my question.
But I also buy that there is a lot of overlap between military work and any other kind of white collar work, which LLMs are definitely useful (but not revolutionary) for.
US media is a huge cultural influence. It takes up a ridiculous amount of mindspace globally. However, with YouTube and TikTok this seems to be changing - the most important influence is not Hollywood but "random" YouTubers. So "content" is waning in influence for sure, unlike hardcore national-security things like the US dollar, the carrier fleet or ballistic nukes. Or AI.
On YouTube, creators who are native to the anglosphere still have a big advantage. TikTok is really the big equalizer. With AI voices being the norm, nobody cares about your accent.
Guess it depends on what you mean by "advantage"... English-language channels are a dime a decillion. But if you started posting content in Tagalog you'd find yourself gaining traction with an audience that doesn't have as many alternatives.
(1) Do you think "developing AGI" a realistic, achievable goal? If so, what evidence do you see that we're making progress on the problem of "general" intelligence? Specifically, what does any of that have to do with Large Language Models?
(2) Are there any "national security" applications of Large Language Models that you're aware of?
It seems to me that it would be a very difficult case to make that the national security impact from allowing the rule of law to erode would be somehow outmatched by the (speculative) wager that somehow LLMs have some relevance to national security. It would be an even harder case to make that any of this has something to do with "general" intelligence.
If you manage to put a bunch of listening devices at a place you're moderately interested in, a cafeteria at an enemy base for example, you might end up with literally hundreds of hours of conversations, most of them completely uninteresting, but a few that might possibly contain nuggets of information of the utmost importance. Listening to all these conversations requires resources. This is even more difficult if the people there speak in jargon, in their own language, and nobody but an expert in the subject can determine which conversation snippets are significant.
If you have good LLMs, you can run all your recordings through extremely high-quality speech recognition and then use something like ChatGPT for summarization, classification, finding all mentions of the nuclear reactor in <place>, etc. Same goes for satellite image analysis.
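A minimal sketch of what that pipeline could look like, assuming the open-source whisper package for transcription and the OpenAI Python client for the analysis step; the model names, prompt, and paths are illustrative, not anything an agency is known to run:

```python
import glob

import whisper             # open-source speech-to-text (openai-whisper)
from openai import OpenAI  # assumes an API key in the environment

stt = whisper.load_model("base")  # illustrative model size
client = OpenAI()

for path in glob.glob("recordings/*.wav"):
    # High-quality speech recognition over each recording...
    transcript = stt.transcribe(path)["text"]
    # ...then an LLM pass for summarization and mention-finding.
    reply = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[{
            "role": "user",
            "content": "Summarize this conversation and list any mentions "
                       "of the facility of interest:\n" + transcript,
        }],
    )
    print(path, "->", reply.choices[0].message.content)
```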
I think we'd need to see these things get a lot more reliable for them to be viable in this use case. This seems like a "leaky net" as opposed to some more deterministic strategy (e.g. grepping large lists of keywords, or parallelizing the task over thousands of human analysts). When you're looking for a needle in a haystack you need to inspect every leaf and stalk.
So should we put copyright through the shredder on the wager that somehow generative techniques will find applications for mass surveillance?
As for 1, pass an image to a multimodal LLM and simply ask 'what is going on in this image'. Robots built on LLMs are already turning this into actionable data with which they can interact with the world. As in, you can send a robot into a room it has not been in before, tell it "bring back a sock, a blue one not a red one", and get an actionable response with a high degree of success. This takes some degree of general intelligence (though maybe not human-level).
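That image query really is a one-call affair today. A sketch using the OpenAI Python client, with the model name and image URL as placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key in the environment
response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model; name is illustrative
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is going on in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/room.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```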
Well the real test of all this stuff is "what can I use it for?". And I can sort my own socks, so that's not super compelling ;). More seriously, the real world is complex.
Let's say I want to replace the forklift operator at my local lumberyard with a robot forklift that can ostensibly outperform a human employee. Even if there is some magical AI program which could theoretically drive the forklift around, identify boards by their dimensions, species, dryness, location, etc., there's a whole bunch of sensory problems that a human body solves easily that are super hard to solve in the environment of a lumber yard. There's dust, rain, snow, mud--so if you're relying on cameras how will you keep them clean? You can't visually determine how dry a board is, you have to put a moisture meter on it and read the result. My point is, even if you have a "brain" capable of driving the forklift you still have a massively complex robotics problem to solve in order to automate just the forklift. And we haven't even begun to replace the other things the operator does in addition to driving the forklift. He can climb out of the forklift and adjust the forks, move boards by hand, effect repairs on equipment, communicate with other equipment operators, customers, etc.
Good luck replacing him in a cost-effective manner.
This is an issue of 'mechanical intelligence' being hundreds of millions of years old and 'higher intelligence' being pretty new on the evolutionary spectrum.
And the AGI will keep you around as a dexterous 'robot' while supervising your thoughts to make sure you're keeping in line, I guess, while day after day cranking out more capable robots with which to replace you eventually.
How do you unplug the power on your iPhone? You can't even take the batteries out. But ya, if you assume it will take massive amounts of power to run an AI in the future, it's easy to see your logical error here.
Developing AGI, as an abstract idea, is a matter of national security. That doesn't mean people are willing to accept the real-world consequences of it. Especially when it could affect them financially.
Additionally, I'm not even sure the US is capable of having national priorities at the moment. The Congress has become incapable of making decisions. While the executive and the judiciary branches have stepped up to compensate, they tend to handle each issue separately without any general direction.
Apple could buy most of the NYT, RIAA and MPAA companies combined with petty cash. The big ones are Disney and Sony, with a combined market cap of about $250B. Microsoft alone is worth over 10 times that.
MPAA and RIAA are not joint-stock companies; they are specialised trade associations that enforce intellectual property. They have no shares to acquire. If you mean acquiring the individual members, that would invite enforcement from your favourite antitrust commission.
Honestly I've always wondered what would happen (and how much the entertainment world would change) if a company like Apple, Google, Microsoft, etc. did just that. Or heck, if it turns out you need the rights to train LLMs and it's easier to do that with public-domain stuff, they could just flat out buy half the entertainment industry and assign everything to the public domain. Every Disney work ever, for example.
No-one is going to buy a major media company and then throw the rights into the public domain. What they would do is buy the rights and then sue all competitors in the GenAI space.
In the US, this isn't possible. There is no legal mechanism for putting things into the public domain outside of the expiration of the term of copyright. The best you can do is to promise not to enforce your copyright.
Maybe there is a middle ground that can be navigated, keeping filters on. Interestingly, AWS is offering defense against copyright claims under its Service Terms, although with some conditions.
See items 50.10 and 50.10.1, which I reproduce here:
"50.10. Defense of Claims and Indemnity for Indemnified Generative AI Services. AWS Services may incorporate generative AI features and provide Generative AI Output to you. “Generative AI Output” means output generated by a generative artificial intelligence model in response to inputs or other data provided by you. “Indemnified Generative AI Services” means, collectively, generally available features of Amazon CodeWhisperer Professional, Amazon Titan Text Express, Amazon Titan Text Lite, Amazon Titan Text Embeddings, Amazon Titan Multimodal Embeddings, AWS HealthScribe, Amazon Personalize, Amazon Connect Contact Lens, and Amazon Lex. The following terms apply to the Indemnified Generative AI Services:
50.10.1. Subject to the limitations in this Section 50.10, AWS will defend you and your employees, officers, and directors against any third-party claim alleging that the Generative AI Output generated by an Indemnified Generative AI Service infringes or misappropriates that third party’s intellectual property rights, and will pay the amount of any adverse final judgment or settlement."
Quite the contrary: the NYT has long been adjacent to the levers of power, and “Big Tech” is unpopular with both parties. The public is generally wary of job destruction and other harms from AI, and doesn’t grok even the present value.
It’s politically 100% viable to kneecap AI with copyright restrictions. This will go to the Supreme Court and it’s far from clear whether fair use applies to every case here.
And supporters of AI aren't really making a case that's likely to persuade skeptics; they're just regurgitating "It learns like a human" and "It doesn't store the info, just the recipe for making it", and completely failing to address that we've decided it's not OK for someone to regurgitate protected works with 100% accuracy, and that artists don't want people to train AI on their works without permission.
There's a way to sell this to the public, but AI proponents don't want to have to sell it, because they feel that they shouldn't have to, and there's an underlying theme of "The benefit of AI is so overwhelming, and eventually it will replace most commodified creative work anyway so why bother litigating this now, let's just skip this messy step and get to that part" and that's super not going to work to convince skeptics.
Ah right, capitalism hasn't ever gotten in the way of the steady march of technology. This is the reason why we don't have monopolies controlling energy generation. Nor are we limited to a couple of choices of OSes or phones... etc., and books, art, movies, music consumption and creation are perfectly aligned... Right? /s
IMO, what's most likely is some sort of licensing model between the AI companies and the 'big content providers' (remember, most content on the web these days is not owned by the person who created it; it wasn't always like that). The smaller companies would then be forced to live with either being scraped or ending up 'invisible'.
Oh yeah, I forgot artists are spiritual creatures who don't have to eat. It certainly isn't the only reason to create, but it is a necessary condition to actually be a professional artist, no?
Why don't you just ask for an increase in the allowance from your family trust fund? People have become so lazy nowadays, they can't even be bothered to have a hard talk about their financial estate with their rich grand-papá anymore.
Also, patronage is garbage, in my opinion. It ensures artists are exclusively either already wealthy, or well connected. It also helps ensure that the wealthy are most often represented in the art created; for some reason this seems like a bad idea to me.
copyright based scarcity is effectively dead for anyone with an Internet connection anyway
honestly I think a gratuity model may become dominant with or without any legal changes at this point
you'll often see on YouTube Patreon revenue equaling or dwarfing ads
the reliance of the music industry on merch seems similar too*
I think people are more willing than you'd think to pay for art simply because they understand it won't exist without money.
*(if that sounds like a stretch, consider whether, in a world devoid of copyright, a cheap Walmart-printed band shirt would be equivalent for most purchasers to the same shirt sold by the actual artist)
What exactly am I stealing if I don't take the deal, walk away and then enjoy an AI-generated artwork that just so happens to resemble the thing closely instead? I'd think that stealing requires taking something away from someone, regardless of how hard certain industries try to gaslight me into expanding the definition to protect their business model.
Stop trying to gaslight yourself into thinking what you are doing isn't morally wrong.
If you do not agree with their business model, don't get involved with their business, at all. Your disagreement doesn't give you the right to exploit flaws in their methods to protect their business. Just like the fact you don't want to pay for something doesn't grant you the right to exploit the fact that the laws of physics allow you to just grab something you didn't pay for with your hand and run away with it.
Creation should happen for whatever reason inspires its creator. The only absolute I can think of is that no one should actually categorize motives as worthy or unworthy.
The only invalid reason is that you need to feed yourself, and the fact that we need to do that - that we need to pay artists and everyone else just so they can survive - shows our failure as a broader society.
Most people who create for a living aren't motivated purely by money, but are driven by the necessities of capitalism to do so. You're presenting a false dichotomy, pretending to care about the quality of art, but really like everyone, you just want other people's work for free.
Great art - especially in modern times when that art involves expensive education (which if you're American must be paid for with interest) and the incorporation of technology and equipment - takes time and effort. If that time and effort cannot be paid for, then no matter how passionate an artist may be, unless they have sufficient personal wealth, that art must suffer.
Even the great artists of old needed patrons, because they needed to eat like anyone else. Michelangelo didn't paint the Sistine Chapel ceiling for the love of the game, nor would he have.
I guarantee you that the working artists who have already lost commissions and work due to AI care about their craft.
Artists are "rightsholders" and their ilk. You didn't even separate the two in your former comment, so you clearly weren't talking about corporate owners of IP like Sony and Disney, exclusively.
Maybe you believe no artist who works for a corporation has any motivation but money, as opposed to purely "indie" artists, I don't know where the line in your head is drawn, but you do seem willing to throw most artists under the bus for some arbitrary standard of purity.
AI is harming working artists right now, and will likely never harm corporate rightsholders. They'll simply run their own AIs and fire as many people as they can get away with. The end result will not be that only the "true" artists survive but simply less art of any kind, everywhere. So I stand by my comment.
With rightsholders I mean exactly those big corporations who do nothing else but buy up copyrights to successful art.
I, for example, have never benefited from copyright, nor from GEMA (the German collecting society for musicians) - 99% of payouts go to the rich and successful mainstream artists, while „indie“ artists get nothing but are forced by law to pay in if they want to perform in public.
So yea I have little sympathy for artists who only work for corporations or are rich enough to afford lawyers to enforce their copyright.
The way I see it, there are 3 ways to make a living as an artist now:
- be a rich trust-fund kid and not care about money
- be a „purist“ and just live from selling your art, constantly on the brink of starvation
- get a „money“ job and produce art in your spare time
Apparently there exists a huge population of artists who can make a living from working for corporations - but I have yet to meet one in real life. They are always brought up in these HN discussions but in my experience they don’t exist.
Chess, at some point, and after you move beyond the opening, is creation.
People didn't stop painting because photography exists; they created new forms of painting. People didn't stop writing music or using new / unique instruments when synths and programs came along.
I genuinely believe that people will keep creating, it's in our nature, and we also like things made by other humans, because we can relate to them.
Imho your argument is faulty at its base. The objective of chess competition isn't to produce a reasonably good game for the lowest possible cost (blunders and comebacks are actually pretty valuable parts of the spectacle). It also isn't the reason why chess players get paid. Yes, running still was a thing even after the invention of the bicycle. This is just invalid logic in my opinion.
Chess hustlers in central park don't play for money, or for a competition, they play for the fun of it, for the sake of chess itself, for the sake of exploring the game, the thrill of finding a solution.
It has nothing to do with whatever "value" the capitalist system assigns to the act as a side-effect.
Chess hustlers are a particular niche case and I think many of them would disagree with you (on the money part). Making arguments in such an absolute manner and speaking on behalf of many people (with whom, I assume, you mostly share very little) is guaranteed to be wrong, I think.
> Or... things are about to get worse for copyright holders.
If that's so, things are about to get worse for everyone, too. With little to no protection against AI, no one will be incentivized to create new IPs, whether they're books, drawings, songs. Or even films and games, when AI is able to also generate those in the (possibly near) future.
This is not about copyright. Think about it. Would you ever actually use generative AI to pirate something when you could just torrent it? While there may be an argument that generative AI is infringing copyright, it is not really a very good tool for it. And there is a worldwide piracy industry already causing much more financial damage due to infringement.
This is really about replacement. The copyright holders in the content industry aren't really afraid of LLMs infringing on past copyright, but are terrified of it replacing them on future work, and there is absolutely no legal protection from this. The lawsuit might officially be about copyright, but that's just because it is their only available legal angle of attack.
> Would you ever actually use generative AI to pirate something when you could just torrent it? While there may be an argument that generative AI is infringing copyright, it is not really a very good tool for it.
How do you square this with literally the first image in the OP showing, side by side, GPT reproducing copyrighted work? IMO a good modern-art project would be someone making a website that “archives” NYT articles by laundering them through GPT rather than using the archive link that everyone posts to get around the paywall. Even HN guidelines bend over backwards to permit these paywall-bypassing links.
Please show me a prompt that reproduces it. Also to pass this test, it has to be just as easy as right clicking "download image"
The images in the article are done in reverse. They find a prompt that shows a copyrighted character and then search for the matching image. That's not how piracy is done.
They are also being deceptive, in my opinion. They should show their entire chat, because "animated sponge" alone does not generate SpongeBob. The author almost certainly further prodded and guided DALL-E to generate those images.
The author, I believe, is being purposefully deceptive and hoping people who don't use DALL-E see "animated sponge" generating a SpongeBob look-alike and think they should be burned.
Woah, this is really moving the goalposts and is pretty disingenuous. When I responded to your claim about GPT being bad at reproducing copyrighted material with a counterexample where it appears to in fact be good at it, you tell me that I must reproduce a specific image as easily as “clicking download image.”
That's not what I was arguing, and you're not going to win many arguments with anyone who is paying attention by coming out of left field with only tangentially related demands.
Because nobody wants generic random pieces of copyrighted material. They want a specific piece of copyrighted material and generative ai is terrible at producing that. It's you who is being intentionally obtuse in pretending not to know the actual goal of copyright infringement.
Even if this is right, it's a shitty consolation. These LLMs aren't ever going to be an agent of greater democratic, every-man content creation or whatever; it's just going to be the transfer of capital from one type of huge company to another. Not much of a future, even if it feels cool for a bit.
In simpler terms, I think we're looking at the dip after the hype. This is the peak for this generation of proto-AGI and there's not much to lose from "over"-regulation.
Eh? With a gun to my head I'd say the CCP cares more about censorship than the NYT does about plagiarism, but it's not an easy call. The problems are the same ("training set contains lots of stuff I don't want the LLM to say").
One interesting nuance that might come to play is that while the US nearly always makes products for their own market and expects the rest of the world to adopt it, China is willing to clearly differentiate products for their own market and for export.
As a consequence, an AI meant to topple Western soft power around the world might be held to much looser standards than one used domestically. Who cares that in rare circumstances the AI mentions the Tiananmen Square Massacre to Spaniards if asked about it, as long as it is good enough at spreading Chinese culture.
Yeah, I really don’t see everyone else giving up here because “funny magic parrot box” can write some mid-tier high school essays.
LLM people are really starting to veer into crypto-bro territory with the evangelising about how they’re the best thing since sliced bread and transistors.
> “funny magic parrot box” can write some mid-tier high school essays
That's your take on LLMs?
Ask it how it is possible for a photon to travel across the universe, arriving at the same time it departed, resulting in the journey taking zero time (in its reference frame).
Ask what the implications are if certain viral amino-acid sequences result in messenger RNA translocating to the host cell nucleus, potentially with the entire genome.
Ask if aircraft fly due to Bernoulli's Principle or Newton's Third Law and physical impact.
And it parroting answers back at you from textbooks and papers that are most definitely in its training data, probably with the identical wording you're using, is proof to the contrary of it being a "magic parrot box" as the other person put it? Or do you genuinely believe ChatGPT, a LLM, actually "came up" with these answers on its own?
The "crypto-bro" behavior I see is a whole bunch of people burning a ton of calories wildly casting about for industrial applications of what amounts to nothing more than a neat (albeit eye-wateringly expensive) toy. These LLMs seem like a solution in search of a problem in just the same way that blockchains are. Please prove me wrong, I'd really love to be wrong about this!
Language models have completely overhauled the NLP space. If you have a problem involving natural language data, you can prototype a working pipeline in an afternoon. Often this prototype is very close in performance to a 'proper' solution.
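As one illustration of that speed, here is a zero-shot text classifier in a handful of lines, assuming the Hugging Face transformers package; the labels and input text are placeholders:

```python
from transformers import pipeline

# Downloads a pretrained model on first run; no task-specific training needed.
classifier = pipeline("zero-shot-classification")

result = classifier(
    "The shipment arrived two weeks late and the box was crushed.",
    candidate_labels=["complaint", "praise", "question"],
)
print(result["labels"][0])  # highest-scoring label, e.g. "complaint"
```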
> If you have a problem involving natural language data
That's a big "if", isn't it? We're seeing claims like "The future is an LLM at the front of just about everything: “Human” is the new programming language"[1] but so far that's not panning out, and it seems really dubious. Natural language seems like an absolutely atrocious user interface. As a machine operator, I'm going to use levers, wheels, and buttons to control the machine. As a computer programmer I'm going to use programming languages to control the machine. I'm not going to speak English to it.
So, ok, this marks an advance in NLP. How do we get from there to "omg it's gonna change everything!!!1111oneeleven"
It seems like they've accelerated our capabilities - previously tiresome and difficult-to-automate things are easier - but have done very little for our fundamental understanding. We have a tool, but cannot dissect it and explain how it fits together. LLMs themselves don't appear (happy to be wrong here) to actually have improved our understanding of NLP and associated theory. Yeah, they can parse a sentence and bang out some JSON/SQL/mid-tier essay, but these models (so far) aren't helping us figure out how and why, and I think that understanding is critical to progress further. Anthropic seems to be trying to push a bit further on that front at least, but for all we know, they might just turn into another scummy OpenAI on us.
I think in order for something to properly be a tool it needs to behave deterministically. I don't need to understand every particular of how it works internally, but as the user I need to be able to rely on consistent, predictable results. Otherwise it's worse than useless. Hand tools, machine tools, programming languages, vehicles, CAD/CAM/CAE tools are all like this. You may have to do some learning to become proficient in the tool, but once you're proficient in its use it's very unlikely to ever truly surprise you. Generally those "surprising" experiences are pretty traumatic--hopefully only emotionally (if you've ever experienced chainsaw kickback you know what I mean).
So I'm not sure how I could use an LLM as a tool, but maybe I'm just not a sufficiently proficient user? It seems like they're just too full of "surprises".
The EU and many other countries already exempted training from copyright restrictions. The only condition EU added was opt-out, and even then it can be ignored if you're doing research. [1]
Good. It's time to abolish copyright. Society must create distributed, open and uncensorable AI models that can synthesize humanity's knowledge so that it can be used by anyone.
Sorry if your 40 hours of work won't pay you $10 a month forever. That's the case for most of the rest of us: we produce for 40 hours, we get paid for those 40 hours, regardless of what we do.