Technology


A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.


This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 4 years ago



OpenAI says it’s “impossible” to create useful AI models without copyrighted material (arstechnica.com)

submitted 2 years ago by sculd@beehaw.org to c/technology@beehaw.org


Apparently, stealing other people's work to create a product for money is now "fair use" according to OpenAI, because they are "innovating" (stealing). Yeah. Move fast and break things, huh?

"Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission.

OpenAI claimed that the authors in that lawsuit "misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence."


[–] Critical_Insight@feddit.uk 7 points 2 years ago (2 children)

There's not a musician who hasn't heard other songs before. Not a painter who hasn't seen other paintings. No comedian who hasn't heard jokes. No writer who hasn't read books.

AI haters are not applying the same standards to humans that they do to generative AI. Obviously this is not to say that AI can't plagiarize. If it's spitting out sentences that are direct quotes from an article someone wrote before and doesn't disclose the source then yeah that is an issue. There is, however, a limit beyond which the output differs enough from the input that you can't claim it's stealing, even if it perfectly mimics someone else's style.

Just because DALL-E creates pictures that have a Getty Images watermark on them doesn't mean the picture itself is a direct copy from their database. If anything, it's the use of the logo that's the issue, not the picture.

[–] sculd@beehaw.org 8 points 2 years ago (1 children)

I said this in another thread, but I will repeat it here. AIs are not humans. AIs' creative and learning processes are also different.

AIs are being used to make profit for executives while creators suffer.

[–] Critical_Insight@feddit.uk 3 points 2 years ago (1 children)

That sucks for the creators, of course, but if AI creates better content, that's where people will go. That's a big if, though, especially in the near future.

[–] sour@kbin.social 4 points 2 years ago

better

[–] BraveSirZaphod@kbin.social 7 points 2 years ago

AI haters are not applying the same standards to humans that they do to generative AI

I don't think it should go unquestioned that the same standards should apply. No human is able to look at billions of creative works and then create a million new works in an hour. There's a meaningfully different level of scale here, and so it's not necessarily obvious that the same standards should apply.

If it’s spitting out sentences that are direct quotes from an article someone wrote before and doesn’t disclose the source then yeah that is an issue.

A fundamental issue is that LLMs simply cannot do this. They can query a webpage, find a relevant chunk, and spit that back at you with a citation, but it is simply impossible for them to actually generate a response to a query, realize that they've generated a meaningful amount of copyrighted material, and disclose its source, because they literally do not know the source. This is not a fixable issue unless the fundamental approach to these models changes.