this post was submitted on 24 Jun 2025
635 points (98.9% liked)

Technology

72829 readers
3109 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] Alphane_Moon@lemmy.world 123 points 3 weeks ago* (last edited 3 weeks ago) (31 children)

And this is how you know that the American legal system should not be trusted.

Mind you I am not saying this an easy case, it's not. But the framing that piracy is wrong but ML training for profit is not wrong is clearly based on oligarch interests and demands.

[–] FaceDeer@fedia.io 24 points 3 weeks ago (22 children)

You should read the ruling in more detail, the judge explains the reasoning behind why he found the way that he did. For example:

Authors argue that using works to train Claude’s underlying LLMs was like using works to train any person to read and write, so Authors should be able to exclude Anthropic from this use (Opp. 16). But Authors cannot rightly exclude anyone from using their works for training or learning as such. Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable.

This isn't "oligarch interests and demands," this is affirming a right to learn and that copyright doesn't allow its holder to prohibit people from analyzing the things that they read.

[–] Lemming6969@lemmy.world 1 points 3 weeks ago (1 children)

Except learning in this context is building a probability map reinforcing the exact text of the book. Given the right prompt, no new generative concepts come out, just the verbatim book text trained on.

So it depends on the model I suppose and if the model enforces generative answers and blocks verbatim recitation.

[–] FaceDeer@fedia.io 2 points 3 weeks ago

Again, you should read the ruling. The judge explicitly addresses this. The Authors claim that this is how LLMs work, and the judge says "okay, let's assume that their claim is true."

Fourth, each fully trained LLM itself retained “compressed” copies of the works it had trained upon, or so Authors contend and this order takes for granted.

Even on that basis he still finds that it's not violating copyright to train an LLM.

And I don't think the Authors' claim would hold up if challenged, for that matter. Anthropic chose not to challenge it because it didn't make a difference to their case, but in actuality an LLM doesn't store the training data verbatim within itself. It's physically impossible to compress text that much.

load more comments (20 replies)
load more comments (28 replies)