You should read the ruling in more detail; the judge explains the reasoning behind why he found the way that he did. For example:
This isn't "oligarch interests and demands"; this is affirming a right to learn, and that copyright doesn't allow its holder to prohibit people from analyzing the things that they read.
Yeah, but the issue is they didn’t buy a legal copy of the book. Once you own the book, you can read it as many times as you want. They didn’t legally own the books.
Right, and that's the "but faces trial over damages for millions of pirated works" part that's still up in the air.
But AFAIK they actually didn't acquire the legal rights even to read the stuff they trained from. There were definitely cases of pirated books used to train models.
Yes, and that part of the case is going to trial. This was a preliminary judgment specifically about the training itself.
It's two issues being ruled on.
Yes, as you mention, the act of training an LLM was ruled to be fair use, assuming that the digital training data was legally obtained.
The other part of the ruling, which I think is really, really important for everyone, not just AI/LLM companies or developers, is that it is legal to buy printed books and digitize them into a central library with indexed metadata. Anthropic has to go to trial on the pirated books they just downloaded from the internet, but has fully won the portion of the case about the physical books they bought and digitized.
LLMs don’t learn, and they’re not people. Applying the same logic doesn’t make much sense.
The judge isn't saying that they learn or that they're people. He's saying that training falls into the same legal classification as learning.
Which doesn’t make any sense.
Argue it to the judge, I guess. That's how the legal system works.
Isn't part of the issue here that they're defaulting to LLMs being people, and having the same rights as people? I appreciate the "right to read" aspect, but it would be nice if this were more explicitly about people. Forgoing copyright law because there's too much data is also insane, if that's what's happening. Claude should be required to provide citations "each time they recall it from memory".
Does Citizens United apply here? Are corporations people, and so LLMs are, too? If so, then IMO we should be writing legal documents with stipulations like "as per Citizens United," so that eventually, when they overturn that insanity in my dreams, all of this new legal precedent doesn't suddenly collapse like a house of cards. IANAL.
Not even slightly, the judge didn't rule anything like that. I'd suggest taking a read through his ruling; his conclusions start on page 9 and they're not that complicated. In a nutshell, it's just saying that the training of an AI doesn't violate the copyright of the training material.
How Anthropic got the training material is a separate matter; that part is going to an actual trial. This was a preliminary judgment on just the training part.
That's not what's happening. And Citizens United has nothing to do with this. It's about the question of whether training an AI is something that can violate copyright.
Except learning in this context is building a probability map that reinforces the exact text of the book. Given the right prompt, no new generative concepts come out, just the verbatim text of the book it was trained on.
So I suppose it depends on the model, and whether it enforces generative answers and blocks verbatim recitation.
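To make "probability map" concrete, here's a toy sketch in Python. A bigram counter is a drastic simplification of a real LLM (the texts and function names here are made up for illustration), but it shows the claim in miniature: trained on one text alone, the map can only replay fragments of that text; trained on several, it recombines them:

```python
# Toy "probability map": count which token follows which, then sample.
# This is NOT how production LLMs work; it's a minimal sketch of the idea.
import random
from collections import defaultdict

def train_bigram(tokens):
    """Build a next-token frequency map (the 'probability map')."""
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts

def generate(counts, start, length=10):
    """Sample tokens proportionally to the learned counts."""
    out = [start]
    for _ in range(length):
        nxt = counts.get(out[-1])
        if not nxt:
            break
        choices, weights = zip(*nxt.items())
        out.append(random.choices(choices, weights=weights)[0])
    return " ".join(out)

# Trained on a single "book", the map can only walk paths seen in that book.
book = "the quick brown fox jumps over the lazy dog".split()
model = train_bigram(book)
print(generate(model, "the"))   # only reproduces fragments of the source

# Add a second text to the same map: outputs now recombine both sources.
other = "the slow green turtle naps under the warm sun".split()
model2 = train_bigram(book)
for cur, nxt in zip(other, other[1:]):
    model2[cur][nxt] += 1
print(generate(model2, "the"))  # mixes fragments, diverging from either text
```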
Again, you should read the ruling. The judge explicitly addresses this. The Authors claim that this is how LLMs work, and the judge says "okay, let's assume that their claim is true."
Even on that basis he still finds that it's not violating copyright to train an LLM.
And I don't think the Authors' claim would hold up if challenged, for that matter. Anthropic chose not to challenge it because it didn't make a difference to their case, but in actuality an LLM doesn't store the training data verbatim within itself. It's physically impossible to compress text that much.
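A rough back-of-the-envelope check supports that; the parameter count and corpus size below are illustrative assumptions, not Anthropic's actual figures:

```python
# Could a model's weights hold its training text verbatim? Rough estimate.
# All figures are illustrative assumptions, not any vendor's real numbers.
params = 70e9            # assumed model size: 70 billion parameters
bits_per_param = 16      # fp16 weights
training_tokens = 10e12  # assumed training corpus: 10 trillion tokens

capacity_bits = params * bits_per_param
bits_per_token = capacity_bits / training_tokens
print(f"{bits_per_token:.2f} bits of weight capacity per training token")
# -> ~0.11 bits per token. Good lossless compressors still need roughly
# 1 bit per character (~4 bits per token) on English text, so storing the
# whole corpus verbatim would exceed the weights' capacity by well over
# an order of magnitude.
```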
I will admit this is not a simple case. That being said, if you've lived in the US (and are aware of local mores) but you're not American, you will have a different perspective on the US judicial system.
How is right to learn even relevant here? An LLM by definition cannot learn.
Where did I say analyzing a text should be restricted?
I literally quoted a relevant part of the judge's decision:
I am not a lawyer. I am talking about reality.
What does an LLM application (or training processes associated with an LLM application) have to do with the concept of learning? Where is the learning happening? Who is doing the learning?
Who is stopping the individuals at the LLM company from learning or analysing a given book?
From my experience living in the US, this is pretty standard American-style corruption. Lots of pomp and bombast and roleplay of sorts, but the outcome is no different from any other country in deep need of judicial and anti-corruption reform.
No, you're framing the issue incorrectly.
The law concerns itself with copying. When humans learn, they inevitably copy things. They may memorize portions of copyrighted material, and then retrieve those memories when doing something new with them, or just when recreating the original.
If the argument is that the mere act of copying for training an LLM is illegal copying, then what would we say about the use of copyrighted text for teaching children? They will memorize portions of what they read. They will later write some of them down. And if there is a person who memorizes an entire poem (or song) and then writes it down for someone else, that's actually a copyright violation. But if they memorize that poem or song and reuse it in creating something new and different, but with links and connections to that previous copyrighted work, then that kind of copying and processing is generally allowed.
The judge here is analyzing what exact types of copying are permitted under the law, and for that, the copyright holders' argument would sweep too broadly and prohibit all sorts of methods that humans use to learn.
Well, I'm talking about the reality of the law. The judge equated training with learning and stated that there is nothing in copyright that can prohibit it. Go ahead and read the judge's ruling; it's linked in the article. His conclusions start on page 9.
People. An ML AI is not a human. It's a machine. Why do you want to give it human rights?
Do you think AIs spontaneously generate? They are a tool that people use. I don't want to give the AIs rights, it's about the people who build and use them.
Sounds like natural personhood for AI is coming
"No officer, you can't shoot me. I have a LLM in my pocket. Without me, it'll stop learning"