this post was submitted on 09 Jan 2024

‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says (www.theguardian.com)

submitted 2 years ago by L4s@lemmy.world to c/technology@lemmy.world

‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says::Pressure grows on artificial intelligence firms over the content used to train their products

[–] S410@lemmy.ml 18 points 2 years ago (2 children)

They're not wrong, though?

Almost all information that currently exists has been created in the last century or so. Only a fraction of all that information is available to be legally acquired for use and only a fraction of that already small fraction has been explicitly licensed using permissive licenses.

Things that we don't even think about as "protected works" are in fact just that. Doesn't matter what it is: napkin doodles, writings on bathroom stall walls, letters written to friends and family. All of those things are protected, unless stated otherwise. And, I don't know about you, but I've never seen a license notice attached to a napkin doodle.

Now, imagine trying to raise a child while avoiding every piece of information like that; information that you aren't licensed to use. You wouldn't end up with a person well suited to exist in the world. They'd lack education regarding science, technology, they'd lack understanding of pop-culture, they'd know no brand names, etc.

Machine learning models are similar. You can train them that way, sure, but they'd be basically useless for real-world applications.

[–] AntY@lemmy.world 48 points 2 years ago (5 children)

The main difference between the two in your analogy, that has great bearing on this particular problem, is that the machine learning model is a product that is to be monetized.

[–] deweydecibel@lemmy.world 8 points 2 years ago (2 children)

And ultimately replace the humans it learned from.

[–] Zoboomafoo@slrpnk.net 0 points 2 years ago

Good, I want AI to do all my work for me

[–] afraid_of_zombies@lemmy.world -1 points 2 years ago

Yes clearly 90 years plus death of artist is acceptable

[–] LWD@lemm.ee 2 points 2 years ago* (last edited 2 years ago)

deleted

[–] BURN@lemmy.world 2 points 2 years ago (1 children)

Also an “AI” is not human, and should not be regulated as such

[–] afraid_of_zombies@lemmy.world -1 points 2 years ago (1 children)

Neither is a corporation and yet they claim first amendment rights.

[–] BURN@lemmy.world 2 points 2 years ago (1 children)

That’s an entirely separate problem, but is certainly a problem

[–] afraid_of_zombies@lemmy.world -1 points 2 years ago (1 children)

I don't think it is. We have all this non-human stuff that we are awarding more rights than we have ourselves. You can't put a corporation in jail but you can put me in jail. I don't have freedom from religion but a corporation does.

[–] BURN@lemmy.world 2 points 2 years ago (1 children)

Corporations are not people, and should not be treated as such.

If a company does something illegal, the penalty should be spread to the board. It’d make them think twice about breaking the law.

We should not be awarding human rights to non-human, non-sentient creations. LLMs and any kind of Generative AI are not human and should not in any case be treated as such.

[–] afraid_of_zombies@lemmy.world -1 points 2 years ago (1 children)

> Corporations are not people, and should not be treated as such.

Understand. Please tell Disney that they no longer own Mickey Mouse.

[–] BURN@lemmy.world 1 points 2 years ago (1 children)

Again, I literally already said that it’s a problem.

IP law is also different than granting rights to corporations. Corporations SHOULD be allowed to own IP, provided they’ve compensated the creator.

[–] afraid_of_zombies@lemmy.world -2 points 2 years ago (1 children)

For 90 years after the creator's death?

[–] BURN@lemmy.world 1 points 2 years ago (1 children)

Honestly, yes. I’m ok with that. People are not entitled to be able to do anything they want with someone else’s IP. 90 years is almost reasonable. Cut it in half and I’d also consider it fairly reasonable.

I’m all for expanding copyright for individuals and small companies (small media companies, photographers who are incorporated, artists who make money based on commissions, etc) and reducing it for mega corps, but there’s an extremely fine line around that.

[–] afraid_of_zombies@lemmy.world -2 points 2 years ago

Well I am not. If the goal is to promote artistic creation it should not follow inheritance. Heck it shouldn't even be 45 years. No one at Disney was alive when Mickey was made therefore it should be public domain.

Once you fix that let me know.

[–] testfactor@lemmy.world -1 points 2 years ago

And real children aren't in a capitalist society?

[–] Exatron@lemmy.world 11 points 2 years ago (1 children)

The difference here is that a child can't absorb and suddenly use massive amounts of data.

[–] S410@lemmy.ml 5 points 2 years ago* (last edited 2 years ago) (2 children)

The act of learning is absorbing and using massive amounts of data. Almost any child can, for example, re-create copyrighted cartoon characters in their drawing or whistle copyrighted tunes.

If you look at, pretty much, any and all human created works, you will be able to trace elements of those works to many different sources. We, usually, call that "sources of inspiration". Of course, in case of human created works, it's not a big deal. Generally, it's considered transformative and a fair use.

[–] hellothere@sh.itjust.works 4 points 2 years ago

It's a question of scale. A single child cannot replace literally all artists, for example.

[–] Exatron@lemmy.world 2 points 2 years ago (1 children)

The problem is that a human doesn’t absorb exact copies of what it learns from, and fair use doesn't include taking entire works, shoving them in a box, and shaking it until something you want comes out.

[–] S410@lemmy.ml -1 points 2 years ago (2 children)

Except for all the cases when humans do exactly that.

A lot of learning is, really, little more than memorization: spelling of words, mathematical formulas, physical constants, etc. But, of course, those are pretty small, so they don't count?

Then there's things like sayings, which are entire phrases that only really work if they're repeated verbatim. You sure can deliver the same idea using different words, but it's not the same saying at that point.

To make a cover of a song, for example, you have to memorize the lyrics and melody of the original, exactly, to be able to re-create it. If you want to make that cover in the style of some other artist, you, obviously, have to learn their style: that is, analyze and memorize what makes that style unique. (e.g. C418 - Haggstrom, but it's composed by John Williams)

Sometimes the artists don't even realize they're doing exactly that, so we end up with "subconscious plagiarism" cases, e.g. Bright Tunes Music v. Harrisongs Music.

Some people, like Stephen Wiltshire, are very good at memorizing and replicating certain things; way better than you, I, or even current machine learning systems. And for that they're praised.

[–] Exatron@lemmy.world 2 points 2 years ago (1 children)

Except they literally don't. Human memory doesn't retain an exact copy of things. Very good isn't the same as exactly. And human beings can't grab everything they see and instantly use it.

[–] S410@lemmy.ml 0 points 2 years ago* (last edited 2 years ago)

Machine learning doesn't retain an exact copy either. Just how on earth do you think a model trained on terabytes of data can be only a few gigabytes in size, yet contain "exact copies" of everything? If "AI" could function as a compression algorithm, it'd definitely be used as one. But it can't, so it isn't.
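To put rough numbers on that size argument, here's a back-of-the-envelope sketch. The 5 TB and 4 GB figures are purely illustrative assumptions standing in for "terabytes" and "a few gigabytes"; they are not measurements of any particular model or dataset, and the function name is made up for this example.

```python
# Back-of-the-envelope: how much model capacity exists per byte of training data?
# Both sizes below are hypothetical, chosen only to illustrate the argument.

TB = 10**12  # bytes in a (decimal) terabyte
GB = 10**9   # bytes in a (decimal) gigabyte


def data_to_model_ratio(training_bytes: int, model_bytes: int) -> float:
    """Return how many bytes of training data there are per byte of model."""
    return training_bytes / model_bytes


training_bytes = 5 * TB  # assumed training set size
model_bytes = 4 * GB     # assumed model size on disk

ratio = data_to_model_ratio(training_bytes, model_bytes)

# Model bits available, on average, per byte of training data.
bits_per_training_byte = 8 * model_bytes / training_bytes

print(f"Training data is {ratio:.0f}x larger than the model")
print(f"Average capacity: {bits_per_training_byte:.4f} bits per training byte")
```

Under these assumed sizes the model has well under one bit of capacity per byte of training data on average, which is why storing everything verbatim isn't possible. It's only an average, though: it says nothing about individual items that show up many times in the training set.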

Machine learning can definitely re-create certain things really closely, but to do it well, it generally requires a lot of repeats in the training set. Which, granted, is a big problem that exists right now, and which people are trying to solve. But even right now, if you want an "exact" re-creation of something, cherry picking is almost always necessary, since (unsurprisingly) ML systems have a tendency to create things that have not been seen before.

Here's an image from an article claiming that machine learning image generators plagiarize things.

However, if you take a second to look at the image, you'll see that the prompters literally ask for screencaps of specific movies with specific actors, etc. and even then the resulting images aren't one-to-one copies. It doesn't take long to spot differences, like different lighting, slightly different poses, different backgrounds, etc.

If you got ahold of a human artist specializing in photoreal drawings and asked them to re-create a specific part of a movie they've seen a couple dozen or hundred times, they'd most likely produce something remarkably similar in accuracy. Very similar to what machine learning image generators are capable of at the moment.
