this post was submitted on 24 Aug 2023
216 points (95.0% liked)


Stephen King: My Books Were Used to Train AI

One prominent author responds to the revelation that his writing is being used to coach artificial intelligence.

top 39 comments
[–] CitizenKong@lemmy.world 42 points 1 year ago (2 children)

The AI in black fled into the desert and the wordslinger followed.

[–] discomatic@lemmy.ca 1 points 1 year ago

This is my favourite comment on Lemmy so far.

[–] afraid_of_zombies@lemmy.world 1 points 1 year ago

Don't worry, a later AI will republish it and it will suck.

The Gunslinger was one of my favorites, before King decided to George Lucas it.

[–] TheFrogThatFlies@lemmy.world 23 points 1 year ago (3 children)

We need an AI with all human knowledge, or several with different specializations. But those AIs must not be in the hands of companies.

[–] BloodForTheBloodGod@lemmy.ca 3 points 1 year ago

AI doesn't currently have any knowledge of facts. It just knows patterns.

[–] darth_helmet@sh.itjust.works 2 points 1 year ago

If you think companies are going to misuse and abuse AI, just wait until we find out how the man is using them.

[–] Steeve@lemmy.ca 1 points 1 year ago* (last edited 1 year ago) (1 children)

Problem is, how are you gonna run it? Meta has already open-sourced an LLM that rivals GPT-4 with only 65B parameters, but you can't even come close to running it on a top-of-the-line GPU.
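
(For a rough sense of why that is, here's a back-of-the-envelope sketch in Python. It assumes fp16 weights at 2 bytes per parameter and takes 24 GB as a top-of-the-line consumer card's VRAM; both figures are illustrative assumptions, not specs from any particular release.)

```python
# Back-of-the-envelope memory estimate for a 65B-parameter LLM.
params = 65e9           # 65 billion parameters
bytes_per_param = 2     # assuming fp16 weights (2 bytes each)

weights_gb = params * bytes_per_param / 1e9
consumer_gpu_gb = 24    # assumed VRAM of a top-end consumer card

print(f"Weights alone: {weights_gb:.0f} GB")                       # ~130 GB
print(f"Cards needed just to hold them: {weights_gb / consumer_gpu_gb:.1f}")
# Activations and the KV cache need even more memory on top of this.
```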

[–] afraid_of_zombies@lemmy.world 1 points 1 year ago (1 children)

Maybe the Wikipedia Federation can do it.

[–] Steeve@lemmy.ca 1 points 1 year ago

They could! It is open source.

[–] ashok36@lemmy.world 11 points 1 year ago (1 children)

I mean, yeah, duh. Just ask any of them to write a paragraph "in the style of INSERT AUTHOR".

If it can, then it was trained on that author. I'm not sure how that's a problem though.

[–] ForgotAboutDre@lemmy.world 14 points 1 year ago (10 children)

We don't have the legal framework for this type of thing. So people are going to disagree with how using training data for a commercial AI product should work.

I imagine Stephen King would argue they didn't have licenses or permission to use his books to train their AI, so he should be compensated or the AI deleted/retrained. He would argue that buying a copy of the book only lets it be used for humans to read, similar to how buying a CD doesn't allow you to put that song in your advert.

[–] Turun@feddit.de 8 points 1 year ago (1 children)

Yes, and all of modern fantasy is heavily influenced by Tolkien's writing, who in turn took inspiration from old legends like Beowulf.

As if human artists and writers are blind to anything ever created.

[–] sab@lemmy.world 0 points 1 year ago (2 children)

Humans with imperfect memories being influenced by a work ≠ AI language models being trained on a work.

[–] p03locke@lemmy.dbzer0.com 5 points 1 year ago (1 children)

You seem to imply that AI has perfect memory. It doesn't.

Stable Diffusion is a 4 GB file of weights. ChatGPT's model is of a similar size. It is mathematically impossible for it to store the entire internet in a few GB of data, just like it is physically impossible for one human brain to store the entire internet in its neural network.
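
(A rough order-of-magnitude check of that claim; the dataset size and average image size below are loose illustrative assumptions, not figures from the actual training set.)

```python
# Compare a ~4 GB model against a billions-of-images training set.
model_size_gb = 4        # approximate Stable Diffusion checkpoint size
training_images = 2e9    # assumed LAION-scale image count
avg_image_kb = 100       # assumed average compressed image size

training_data_gb = training_images * avg_image_kb / 1e6
bytes_per_image = model_size_gb * 1e9 / training_images

print(f"Training data: ~{training_data_gb:,.0f} GB")    # ~200,000 GB
print(f"Model capacity per image: {bytes_per_image:.1f} bytes")
# ~2 bytes per image: far too little to have memorized the inputs.
```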

[–] sab@lemmy.world 0 points 1 year ago* (last edited 1 year ago) (1 children)

But you can easily fit all of King's work in a 4 GB model. Just because it isn't done in the most popular models doesn't make it ethical to do in the first place.

In my opinion, you should only be able to use a work to train an AI model if the work is public domain or if you have explicit permission from the license holder. Especially if you then use that model for profit or charge others to use it.

[–] p03locke@lemmy.dbzer0.com 3 points 1 year ago (1 children)

But you can easily fit all of King's work in a 4 GB model.

But, uhhhh, they didn't. They didn't copy everything, word for word, and put it into a model. That's not how AI models work.

[–] sab@lemmy.world -2 points 1 year ago* (last edited 1 year ago) (1 children)

I didn't claim it was.

We can discuss technicalities all day long, but that's beside the point. Thread OP claimed that creating an LLM based on a copyrighted work is okay because humans are influenced by other works as well. But a human can't crank out hundreds of Stephen King-like chapters per hour, or hundreds of Dali-like paintings per minute.

If King or Dali had given permission for their works to be used in this way, it might have been a different story, but as it is, AI models are being trained on (and profit from) huge amounts of data that they did not have permission for.

Edit: nevermind, I think trying to discuss AI ethics with you is pointless. Have a nice weekend!

[–] commie@lemmy.dbzer0.com 4 points 1 year ago

But a human can't crank out hundreds of Stephen King-like chapters per hour, or hundreds of Dali-like paintings per minute.

so?

[–] Turun@feddit.de 2 points 1 year ago (1 children)

Sure, if you want to see it like that. But if you try out Stable Diffusion etc., you will notice that "imperfect memory" describes the AI as well. You can ask it for famous paintings and it will get the objects and colors generally correct, but only about as well as a human artist would. The details will be severely lacking. And that's the best-case scenario for the AI, because famous paintings will be overrepresented in the training data.

[–] sab@lemmy.world -3 points 1 year ago* (last edited 1 year ago) (3 children)

Nah.

By default an AI will draw from its entire memory, and so will have lots of different influences. But by tuning your prompt (or restricting your input dataset) you can make it so specific that it's basically creating near-perfect clones. And unlike a human, it can then produce similar works hundreds of times per minute.

But even that is beside the point. Those works were sold under the presumption that people would read them, not that they would be ingested into an LLM or text-to-image model. And now companies like OpenAI and others profit from the models they trained without permission from the original authors. That's just wrong.

Edit: As several people mentioned, I exaggerated when I said near-perfect clones, I'll admit that. But just because it doesn't violate copyright (IANAL) doesn't mean it's ethical to take a work and make derivatives of it on an unlimited scale.

[–] stephen01king@lemmy.zip 3 points 1 year ago

If you wanna make the claim that AI can make perfect clones, you gotta provide more proof than just your own words. I personally have never managed to make that happen.

[–] AEsheron@lemmy.world 3 points 1 year ago

Have you used Stable Diffusion? I defy you to make a perfect clone of any image. Take a whole week to try to refine it if you want. It is basically impossible by definition, unless you only trained it on that one image.

[–] BetaDoggo_@lemmy.world 2 points 1 year ago

Obviously restricting the input will cause the model to overfit, but that's not an issue for most models, where billions of samples are used. In the case of Stable Diffusion, this paper found a ~0.03% success rate extracting training data after 500 attempts on each image (~6.23E-5% per generation). And that was on a targeted set with the highest number of duplicates in the dataset.

The reason they were sold doesn't matter; as long as the material isn't being redistributed, copyright isn't being violated.
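
(The two extraction rates quoted above are consistent with each other; a quick check, treating the 500 generations per image as independent attempts.)

```python
# Verify that ~6.23E-5% per generation compounds to ~0.03% per image.
p_per_generation = 6.23e-7   # 6.23E-5 percent, expressed as a probability
attempts = 500               # generations attempted per target image

p_per_image = 1 - (1 - p_per_generation) ** attempts
print(f"Extraction chance per image: {p_per_image:.2%}")  # ~0.03%
```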

[–] InvertedParallax@lemm.ee 7 points 1 year ago (1 children)

That doesn't bode well for humanity.

[–] afraid_of_zombies@lemmy.world 1 points 1 year ago

And? If we are that pathetic maybe we deserve extinction.

How does having a new tool leave you worse off?

[–] Treczoks@lemmy.world 6 points 1 year ago

Now that might give an AI scary ideas...

[–] Kolanaki@yiffit.net 4 points 1 year ago

Is that why when I ask ChatGPT to tell me a scary story, it invariably contains a sex scene between minors?

[–] afraid_of_zombies@lemmy.world 3 points 1 year ago

So if your AI responses are biased towards car crashes, now you know why.

Take a Stephen King book you have never read. Open a random page and point to a random paragraph. Do this 3x. You will find a car crash, a memory of a car crash, someone talking about a car crash, or someone concluding X happened because of a car crash.

[–] 5BC2E7@lemmy.world 2 points 1 year ago

Does he also have a problem with people who were changed by his books?