this post was submitted on 28 Dec 2023
305 points (97.8% liked)

Technology

59377 readers
5843 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


founded 1 year ago
MODERATORS
 

The New York Times sues OpenAI and Microsoft for copyright infringement::The New York Times has sued OpenAI and Microsoft for copyright infringement, alleging that the companies’ artificial intelligence technology illegally copied millions of Times articles to train ChatGPT and other services to provide people with information – technology that now competes with the Times.

top 30 comments
[–] phoneymouse@lemmy.world 56 points 10 months ago (4 children)

There is something wrong when search and AI companies extract all of the value produced by journalism for themselves. Sites like Reddit and Lemmy also have this issue. I’m not sure what the solution is. I don’t like the idea of a web full of paywalls, but I also don’t like the idea of all the profit going to the ones who didn’t create the product.

[–] Kecessa@sh.itjust.works 13 points 10 months ago* (last edited 10 months ago)

The solution is to make these companies responsible for tracking their profit per media source, then tax them and redistribute that money based on the tracking info. They're able to track every page you visit; it's complete bullshit when they say they don't know how much they make from each place their ads are displayed.

[–] AllonzeeLV@lemmy.world 10 points 10 months ago (1 children)

but I also don’t like the idea of all the profit going to the ones who didn’t create the product.

Should... should we tell him?

[–] kilgore_trout@feddit.it 10 points 10 months ago

Tell them instead of mocking them.

Yes, "that's how the world works." But that doesn't mean we should stop trying to change it.

[–] DogWater@lemmy.world 2 points 10 months ago

AI isn't creating the product. It consumed it.

[–] LainOfTheWired@lemy.lol 22 points 10 months ago (6 children)

My question is: how is an AI reading a bunch of articles any different from a human doing it? By that logic, no one would legally be able to write an article, since they're drawing on bits of other people's work that they read in order to learn how to write a good article in the first place.

Both are making money with parts of other people's work.

[–] hansl@lemmy.world 19 points 10 months ago

It was thought that the LLM wouldn’t keep the actual data internally verbatim. If you can memorize an article, and recite it to everyone free of charge, technically it’s plagiarism. Same if you sing a song to a crowd when you don’t have the rights.

The Google research (and other discovery) proved that you can actually extract verbatim training data from an LLM, which has a lot of implications for copyright.
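To make the plagiarism point concrete: one way researchers check whether a model has regurgitated training data is to scan its output for long spans that match a source text verbatim. Here's a toy sketch of that check using a classic longest-common-substring scan (the strings and function name are illustrative, not from the actual research, which works by prompting deployed models at scale):

```python
def longest_verbatim_overlap(source: str, output: str) -> str:
    """Return the longest substring shared verbatim by source and output."""
    # Dynamic-programming longest-common-substring scan, row by row.
    best_len, best_end = 0, 0
    prev = [0] * (len(output) + 1)
    for a in source:
        curr = [0] * (len(output) + 1)
        for j, b in enumerate(output, 1):
            if a == b:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], j
        prev = curr
    return output[best_end - best_len:best_end]

article = "The quick brown fox jumps over the lazy dog."
generation = "My summary: the fox jumps over the lazy cat today."
print(longest_verbatim_overlap(article, generation))
# prints " fox jumps over the lazy "
```

A short shared phrase like this is unremarkable; the extraction papers matter because they recovered spans hundreds of tokens long, which is hard to explain as anything but memorization.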

[–] MirthfulAlembic@lemmy.world 16 points 10 months ago

The physical limitations are an important difference. A human can only read and remember so much material. With AI, you can scale that exponentially with more compute resources. Frankly, IP law was not written with this possibility in mind and needs to be updated to find a balance.

[–] JonEFive@midwest.social 14 points 10 months ago (2 children)

Let me ask you this: when have you ever seen ChatGPT cite its sources and give appropriate credit to the original author?

If I were to just read the NYT and make money by simply summarizing articles and posting those summaries on my own website without adding anything to it like my own commentary and without giving credit to the author, that would rightfully be considered plagiarism.

This is a really interesting conundrum though. I would argue that AI isn't capable of original thought the way that humans are and therefore AI creators must provide due compensation to the authors and artists whose data they used.

AI is only giving back some amalgamation of words and concepts that it has been trained on. You might say that humans do the same, but that isn't exactly true. The human brain is a funny thing. It can forget, it can misremember. It can manipulate. It can exaggerate. It can plan. It can have irrational or emotional responses. AI can't really do those things on its own. It's just mimicking human behavior at best.

Most importantly to me though, AI is not capable of spontaneous thought. It is only capable of providing information that it has been trained on and only when prompted.

[–] thru_dangers_untold@lemm.ee 2 points 10 months ago* (last edited 10 months ago) (1 children)

There is evidence to suggest some LLMs have the ability to produce original outputs, such as DeepMind's solution to the cap set problem.

https://www.nature.com/articles/s41586-023-06924-6

On the other hand, LLMs have some incredible text-compression abilities:

https://arxiv.org/abs/2308.07633
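The intuition behind that compression result: a language model is a predictor, and by Shannon's source-coding bound, a better predictor of the next symbol means fewer bits needed per symbol. A toy unigram version of that accounting (my own illustration, not the paper's method, which uses the LLM's next-token probabilities with arithmetic coding):

```python
import math
from collections import Counter

def ideal_code_length_bits(text: str) -> float:
    """Ideal total code length for text under a unigram model learned from
    the text itself -- the Shannon bound an entropy coder approaches."""
    counts = Counter(text)
    total = len(text)
    return -sum(c * math.log2(c / total) for c in counts.values())

text = "the theory that better prediction means better compression"
model_bits = ideal_code_length_bits(text)
uniform_bits = len(text) * 8  # naive fixed-width encoding: 8 bits per char
print(model_bits < uniform_bits)  # prints True: a better model needs fewer bits
```

Swap the unigram model for an LLM's context-aware next-token distribution and the bound gets dramatically tighter, which is exactly why memorized articles compress (and regurgitate) so well.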

I'm pretty sure there is copyright infringement going on by the letter of the law. But I also think the world would be better off if copyright laws were a bit more loose. Not wild-west anything-goes libertarianism, but more open than the current state.

[–] JonEFive@midwest.social 2 points 10 months ago

I tend to agree with your last point, especially because of the way the system has been bastardized over the years. What started out as well intentioned legislation to ensure that authors and artists maintain control over their work has become a contentious and litigious minefield that barely protects creators.

[–] General_Effort@lemmy.world 1 points 10 months ago (1 children)

Let me ask you this: when have you ever seen ChatGPT cite its sources and give appropriate credit to the original author?

Bing Chat now does that by default. Otherwise you have to prompt for it manually.

If I were to just read the NYT and make money by simply summarizing articles and posting those summaries on my own website without adding anything to it like my own commentary and without giving credit to the author, that would rightfully be considered plagiarism.

No. It would be considered journalism. If you read the news a bit, you will find that they reference the output of other news corporations quite a bit. If your preferred news source does not do that, then they simply don't cite their sources.

[–] JonEFive@midwest.social 1 points 10 months ago (1 children)

Prompting for a source wouldn't satisfy me until I could trust that the AI wasn't hallucinating. After all, if GPT can make up facts about things like legal precedent or well documented events, why would I trust that its citations are legitimate?

And if the suggestion is that the person asking for the information double check the cited sources, maybe that's reasonable to request, but it somewhat defeats the original purpose.

Bing might be doing things differently though, so you might be right in your assessment on that front. I haven't played with their AI yet.

[–] General_Effort@lemmy.world 1 points 10 months ago

You did ask if ChatGPT had ever cited sources. Bing uses it, and besides, you can ask for that manually.

Whether it defeats the purpose depends on your original purpose.

[–] BURN@lemmy.world 6 points 10 months ago

An AI does not learn like a human does. Therefore the same laws and principles can’t be applied to computer “learning” as can be to human learning.

They’re fundamentally different uses of the material.

[–] topinambour_rex@lemmy.world 1 points 10 months ago* (last edited 10 months ago)

The main difference is the volume. An example I like is how Google trained its gaming AI for StarCraft 2. That AI was able to beat high-ranked professional players, and it was trained by watching a century's worth of games.

ChatGPT didn't read a few articles; it read years of them, maybe a couple of decades' worth.

[–] KingThrillgore@lemmy.ml -1 points 10 months ago

If this lawsuit causes it to be ILLEGAL to read anything you buy because you could plagiarize it, Bradbury is gonna spin in his fucking grave.

[–] burliman@lemmy.today 4 points 10 months ago (1 children)

Reminds me of Nokia suing Apple (two waves), Blockbuster suing Netflix, and Yahoo suing Facebook. A threatened, declining company suing a disruptor is what we can expect will always happen, I guess. It will be nice to see this stuff finally tested in court, though.

[–] jacksilver@lemmy.world 13 points 10 months ago

Except the news still needs to come from somewhere. While GPT can "create" things, it's not a journalist. It's just the next step in aggregation, skimming money from the actual sources.

[–] maegul@lemmy.ml 3 points 10 months ago

Interesting take on mastodon on this in this thread: https://hachyderm.io/@Impossible_PhD/111654403989681220