this post was submitted on 17 Feb 2024
1088 points (98.7% liked)

Technology

[–] prex@aussie.zone 60 points 9 months ago (3 children)

I assume AI is training off the content here for free.

[–] Bishma@discuss.tchncs.de 38 points 9 months ago (2 children)

Yes, but there's no contract to give them legal cover if anyone ever does anything about all the content they steal.

[–] deweydecibel@lemmy.world 27 points 9 months ago* (last edited 9 months ago) (3 children)

And ya know what? Frankly, if AI is going to harvest all this shit, I'd rather fuckers like spez couldn't get rich off it in the process. Granted, I'm not happy the tech bros running these AI companies are getting rich off these fucking things, but I can at least take solace that, for Lemmy at least, there isn't some asshole middleman making bank off the work and words of users they never paid a dime to.

Genuinely, why do spez and Reddit deserve to make money off anything we posted? Why does any social media site? They make the site, pay for the servers, maintain the apps, sure, and they can get compensated for that; I don't see a problem there. But why does any social media company deserve to get rich when the only thing that makes their platform valuable is the people who post to it? Reddit didn't even have paid mods; the community did all the work on the content of that site. Why in the fuck do we tolerate these assholes making profit off it like this?

[–] General_Effort@lemmy.world 2 points 9 months ago

This is sad to read because I agree with all of it (except the casual sexism).

why in the fuck do we tolerate these assholes making profit off it like this?

Look at this thread. People delete their posts on Reddit, which means they can no longer be scraped for free, which means they're now exclusively available in Reddit's archive. It's not that people tolerate it; it's that the first instinct of people who don't tolerate it is to make it worse. What can you do?

[–] prex@aussie.zone 1 points 9 months ago
[–] Quadhammer@lemmy.world 0 points 9 months ago

Intellectual property theft

[–] Buddahriffic@lemmy.world 5 points 9 months ago (1 children)

What do you mean? What legal cover do they need against what actions?

[–] Bishma@discuss.tchncs.de 6 points 9 months ago

If the EU (or any other government) decides that AI companies can't legally train their models on information they don't own or license (I don't know how that would work legally, but they talk about it), then the company Reddit has sold access to could argue to lawmakers that it has a license for all the content on Reddit. I don't know that it would hold up, but I suspect it's part of the company's perceived value in this Reddit deal.

[–] OmanMkII@aussie.zone 14 points 9 months ago (3 children)

I was curious whether a robots.txt equivalent exists for AI training data, and there were some solid points here:

If I go to your writing, I read it & learn from it. Your writing influences my future writing. We've been okay with this as long as it's not a blatant forgery.

If a computer goes to your writing, it reads it & learns from it. Your writing influences its future writing. It seems we are not okay with this, even if it isn't blatant forgery.

[AI at the moment is] different because the company is re-using your material to create a product they are going to sell. I'm not sure if I believe that is so different than a human employee doing the same thing.

https://news.ycombinator.com/item?id=34324208

I still think we should have the ability to opt out, like we do with search engines and web crawlers, but if the algorithm works ideally and learns without recycling content, is it truly any different from a factory of workers pumping out clones of popular series on Amazon? I honestly don't know the answer to that.
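For what it's worth, an opt-out of sorts does exist: some AI crawlers publish user-agent tokens (OpenAI's GPTBot and Common Crawl's CCBot, for example) that can be targeted in an ordinary robots.txt. A minimal sketch; whether a given crawler actually honors it is entirely voluntary:

```
# Opt out of OpenAI's and Common Crawl's bots (compliance is voluntary)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else may still crawl
User-agent: *
Allow: /
```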

[–] deweydecibel@lemmy.world 9 points 9 months ago* (last edited 9 months ago)

The problem is not the technology, the problem is the businesses and the people behind them.

These tools were made with the explicit purpose of taking content they did not create, repurposing it, and creating a product. Throw all these conversations about intelligence and learning out the fucking window; what matters is what the thing does, and why it was created to do that thing.

Until we reach a point where there is some sort of AI out there that has any semblance of free will, that can choose not to learn if fed certain information, and can choose not to respond to input without being programmed not to respond, we are not talking about intelligence; we are talking about a tool. No matter how they dress it up.

Stop arguing about this on their terms, because they're gaslighting the fuck out of you.

[–] Appoxo@lemmy.dbzer0.com 6 points 9 months ago (1 children)

Afaik the OpenAI bot may choose to ignore it? At least that's what another user claimed it does.

[–] JohnEdwa@sopuli.xyz 12 points 9 months ago

Robots.txt has always been ignored by some bots; it's just a guideline, originally meant to prevent excessive bandwidth usage from search-indexing bots, and compliance is entirely voluntary.

The archive.org bot, for example, has completely ignored it since 2017.
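The voluntary part is visible in how robots.txt compliance actually works: the crawler itself parses the file and decides whether to fetch each URL, and nothing on the server enforces the answer. A small sketch using Python's standard `urllib.robotparser` (the hostnames and bot names here are just illustrative):

```python
from urllib import robotparser

# A well-behaved crawler parses the site's robots.txt and checks each
# URL against it before fetching; a misbehaving one simply skips this.
rules = robotparser.RobotFileParser()
rules.parse([
    "User-agent: GPTBot",
    "Disallow: /",
    "",
    "User-agent: *",
    "Allow: /",
])

# The targeted bot is told "no", everyone else "yes" -- but honoring
# the answer is entirely up to the crawler's own code.
blocked = rules.can_fetch("GPTBot", "https://example.com/post/123")       # False
allowed = rules.can_fetch("SomeOtherBot", "https://example.com/post/123")  # True
```

In other words, robots.txt is a request, not an access control: the `can_fetch` check only matters if the bot bothers to run it.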

[–] MossyFeathers@pawb.social 1 points 9 months ago

This is kinda my take on it. However, the way I see it is that the AI isn't intelligent enough yet to truly create something original. As such, right now AI is closer to being a tool than a being. Because of that, it somewhat bothers me that I'm being used to teach a tool. If I thought that companies like OpenAI were truly trying to create beings and not tools, then I'd feel differently.

It's kinda nuanced, but a being can voluntarily determine whether or not something is copyright-infringing, understand why that might be an issue, and then decide whether or not to continue writing based on that. A tool can't really do that. You can try to add filters to a tool to keep it from writing copyrighted text, but those will have flaws and holes. A being that understands what it's writing, and what makes it plagiarism vs. reference vs. homage/inspiration/whatever, is less likely to have those issues.

[–] rar@discuss.online 3 points 9 months ago

It's all federated, so it would be strange if the bots didn't scrape anything off it.