Can you elaborate on how it's not comparable? It seems obvious to me that it is -- they both learn and then create -- so what's the difference?

If I can hire an employee who draws on knowledge they learned from copyrighted textbooks, why can't I hire an AI which draws on knowledge it learned from copyrighted textbooks? What makes that argument "wacky" in your eyes?



you're asking why you have to treat people differently than you treat tools and machines.


Well obviously not in general. But when it comes to copyright law specifically, yes absolutely. That is the question I'm asking.


It has never been argued that copyright law should apply to information that people learn, whether from reading books or newspapers, watching television, or appreciating art like paintings or photographs.

Unlike a person, a large language model is a product built and sold by a company. While I am not a lawyer, I believe much of the copyright argument around LLM training revolves around the idea that copyrighted content should be licensed by the company training the LLM. In much the same way that people are not allowed to scrape the content of the New York Times website and then pass it off as their own, OpenAI should be barred from scraping the New York Times website to train ChatGPT and then selling the service without sending some dollars back to the New York Times.


You're not going to get an answer you find agreeable, because you're hoping for an answer that allows you to continue to treat the tool as chattel, without conferring on it the excess baggage of being an individuated entity/laborer.

You're either going to get: it's a technological, infinitely scalable process, and the training data should be considered what it is, namely intellectual property that should be licensed before being used.

...or: it actually is the same as human learning, and it's time we started loading these things up with the other baggage attached to persons, if we're going to accept that it's possible for a machine to learn like a human.

There isn't a reasonable middle ground, due to the magnitude of social disruption a chattel, quasi-human technological replacement for human labor would cause.


Hi. I like this post. There are some careful thoughts here.

Can you help me to understand the term "chattel" as you used it? I never heard the term before I read your post, and I needed to Google for it: <<

(in general use) a personal possession.

(in law) an item of property other than freehold land, including tangible goods (chattels personal) and leasehold interests (chattels real). >>


Chattel, as I'm using it, refers to the usage that distinguishes an "ownable piece of property" from an "employee".

Namely: a magic, technologically reproducible box that can be applied almost as effectively as a human hireling, but isn't a human hireling, is nearly infinitely more desirable in a capitalist system, since the black box is chattel and the hired human is not. Chattel has no natural rights, no claim to self-sovereignty, and is an asset that exists legally by virtue of the fact that it is owned by its owner.

Chattel flexible enough to do the same job without the legal burdens incurred by hiring a human will naturally be converged upon, due to the capitalist optimization function of minimizing unit input cost for output, measured in dollars and in potential dollars of legal exposure.

Imagine you had two human-like populations. One is made of plastic and is considered not human but property, i.e. chattel. The other is a bunch of people, with all the baggage that comes with being people.

Hiring/employing people is hard, particularly in the U.S. and other jurisdictions where a great deal of the responsibility for actually implementing regulation/taxation/immigration and such is tacked onto being an employer.

As the capability gap between the chattel population and the human population closes, it makes more and more economic and workload sense for the system to improve the chattel population under our current optimization strategy (given no pre-emptive work to cut off externality dumping). Humans are messy and complicated to work with, often unpredictable. Chattel are easy to account for, especially when combined with "technical restraints". You fundamentally have to negotiate with another human being to get them on board with working for you. You buy the chattel, and that's that. The chattel has no grounds to refuse service. Socially speaking, we don't even recognize its outputs as carrying any social weight, or its resistance as anything but malfunction.

Economics is the science of using access to resources as a means to get other people to work with you. Being chattel means all of that complexity can be cut out entirely. You are a resource, not people.

Unironically, we need an answer to whether we are going to consider a sufficiently complex function-imitator as something that requires a classification above "chattel", or controls around how we apply it, in order not to self-destruct the economic equilibria in which we purport to exist. Because all it takes is removing, or sufficiently obstructing, the flow of value down from the individuals who accrete the most of these wunder-chattel to render things so top-heavy that most of the constraints/invariants of our socioeconomic systems as we know them become invalidated.

That does not bode well for anyone.


Aren't animals a current example of a middle ground? They are incapable of authoring copyrightable works under current US law.


No, you’re missing the point of copyright. The point of copyright is to protect an exclusive right to copy, not the right to produce original works influenced by previous works. If an LLM produces original works that are influenced by the training data, that is not a violation of copyright. If it reproduces the training data verbatim, it is.


    > The point of copyright is to protect an exclusive right to copy, not the right to produce original works influenced by previous works.
As I understand it, the definition of "the right to produce original works influenced by previous works" has been a slowly moving target in my lifetime. Think about the effects of the album Paul's Boutique by the Beastie Boys. They went wild with sampling and paid very little (zero?) to license those samples. Then there were a bunch of court cases in the US that decided future samplers needed to license the samples from the original authors. However, the ability to create legal, derivative works is usually carefully defined in copyright law. Can you comment on this matter vis-à-vis LLMs?

    > If an LLM produces original works that are influenced by the training data, that is not a violation of copyright.
I'm pretty sure that if an LLM creates Paul's Boutique 2.0 in 2025 using an incredible number of samples, then someone cannot sell it (or use it in a YouTube video) without first licensing those samples. I doubt very much someone could just "hide behind" an LLM and claim, "Oh, it is an original, but derivative, work, created by an LLM." I doubt courts would allow that.


> I'm pretty sure that if an LLM creates Paul's Boutique 2.0 in 2025 using an incredible number of samples, then someone cannot sell it (or use it in a YouTube video) without first licensing those samples. I doubt very much someone could just "hide behind" an LLM and claim, "Oh, it is an original, but derivative, work, created by an LLM." I doubt courts would allow that.

This isn’t how LLMs work, though. Samples are just that: literal samples copied from one work to another verbatim. LLMs use training data to construct a predictive model of which tokens follow each other. You probably could get an LLM to use samples deliberately if you wanted to, but this isn’t how they typically work.

Regardless, at that point you’re just evaluating the claim of copyright infringement based on the nature of the work itself, which is exactly what I’m advocating, versus presuming that all LLM output is necessarily copyright infringement if any copyrighted material was used in training.
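To make the "predictive model of which tokens follow each other" point concrete, here is a toy sketch — not how any production LLM actually works, just a bigram counter (all names and the corpus are made up for illustration). The point is that the model stores statistics about token successions, not verbatim copies of its training documents, and greedy generation can recombine those statistics into sequences that never appeared in the training text:

```python
from collections import Counter, defaultdict

# Toy stand-in for "a predictive model of which tokens follow each other":
# count, for every token, which tokens follow it in the training text.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1  # statistics, not a verbatim copy of the text

def generate(token, n=8):
    """Greedily emit the most likely next token, n times."""
    out = [token]
    for _ in range(n):
        token = counts[token].most_common(1)[0][0]
        out.append(token)
    return " ".join(out)

print(generate("the"))
```

Even this trivial model illustrates the distinction being argued: its output is built from learned successor statistics, so it can wander into word sequences that exist nowhere in the corpus, which is quite different from pasting in a literal sample.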


i weirdly agree with you, but also want to point out that “influenced by the training data” is doing some very heavy lifting there.

exactly how the new work is created is important when it comes to derivative works.

does it use a copy of the original work to create it, or a vague idea/memory of the original work’s composition?

when i make music it’s usually vague memories. i’d argue that LLMs have an encoded representation of the original work in their weights (along with all the other stuff).

but that’s the legal grey area bit. is the “mush” of model weights an encoded representation of works, or vague memories?


I don’t really think it matters, because you can just compare the output to the input and apply the same standard, treating the process between the two as a black box.


did you just call me a black box? :/

not sure how i feel about being reduced down to that as a human being.


As far as I’m concerned you are a black box. Just as I’m a black box from your perspective. In principle I could come over and vivisect your brain if you’d like, but I doubt you’d be interested, and I wouldn’t really want to incur the legal liability even if you were.

Besides, “black box” just means that your internal mental life and cognitive mechanism is opaque to me. It’s not like I’m calling you a p-zombie.


Also, even if an LLM generates an original work, the weights it used may still be a derivative work.


One is a collection of highly dithered data, generated by machines and paid for by a business, in order to profit financially from the copyrighted works and to replace any future need for copyrighted textbooks.

The other is a person learning from a copyrighted textbook in the legally protected manner, and for whose use the textbook was written.


I don't think this question really makes any sense... In my opinion, it's kind of mish-mashing several things together.

"Can you elaborate on how it's not comparable?"

The process of individual people interacting with their culture is a vastly different process than that used to train large language models. In what ways do you think these processes have anything in common?

"It seems obvious to me that it is -- they both learn and then create -- so what's the difference?"

This doesn't seem obvious to me (obviously)! Maybe you can argue that an LLM "learns" during training, but that ceases once training is complete. For sure, there are work-arounds that meet certain goals (RAG, fine-tuning); maybe your already vague definition of "learning" could be stretched to include these? Still, comparing this to how people learn is pretty far-fetched. AFAICT, there's no literature supporting the view that there's any commonality here; if you have some I would be very interested to read it. :-)

Do they both create? I suspect not; an LLM is parroting back data from its training set. We've seen many studies showing that tested LLMs perform poorly on novel problem sets. This article was posted just this week:

https://news.ycombinator.com/item?id=42565606

The jury is still out on the copyright issue; from the perspective of US law, we'll have to wait on this one. Still, it's clear that an LLM can't "create" in any meaningful way.

And so on and so forth. How is hiring an employee at all similar to subscribing to an OpenAI ChatGPT plan? Wacky indeed!


Obviously, on the inside, the process that a person goes through in learning and creating, and the process that an LLM goes through in learning and creating, is very different. Nobody will dispute that.

But if they're learning from the same kinds of materials, and producing the same kind of output, then obviously the comparison can be made. And your idea that LLMs don't create seems obviously false.

So I have to conclude the two seem comparable, and someone would have to show why different legal principles around copyright ought to apply, when it's a simple question of input/output. Why should it matter if it's a human or algorithm doing the processing, from a copyright perspective? Nothing "wacky" about the question at all.


Unless you are making an argument for personhood, one is a machine, the other is a human. Different standards apply, end of discussion.


most probably your employee actually 'paid' for their textbook.



