this post was submitted on 17 May 2024

92 points (96.9% liked)

OpenAI and Reddit Partnership (openai.com)

submitted 2 years ago* (last edited 2 years ago) by governorkeagan@lemdro.id to c/privacy@lemmy.ml

It has finally happened...not surprised though.

[–] MaximilianKohler@lemmy.world 30 points 2 years ago (1 children)

This is horrible news. Reddit is a horrible website and only getting worse. OpenAI promoting them and using their garbage content to train their AI systems is alarming. This is so dystopian.

And of course it always leads back to money:

Sam Altman is a shareholder in Reddit

[–] archchan@lemmy.ml 7 points 2 years ago (1 children)

I don't use ChatGPT and host my own models ✅️ I don't use Reddit, I use Lemmy ✅️

I agree with the dystopian part. It's not a warm and fuzzy feeling, this feeling. A feeling that's too common for my liking these days.

[–] squidspinachfootball@lemm.ee 1 points 2 years ago

Curious to know more about how you host your own models? Does it require high end hardware?

[–] Madiator2011@lm.madiator.cloud 25 points 2 years ago

To be honest, everything we post online can be used for training. Reddit is just made for money :P I'm kinda using Lemmy more for posting now, and Reddit just for browsing, like an archive.

[–] swooosh@lemmy.world 16 points 2 years ago (1 children)

It's crazy that reddit doesn't have to ask everyone if they want to contribute. This shows who owns and controls your posts.

[–] Deckweiss@lemmy.world 10 points 2 years ago* (last edited 2 years ago) (3 children)

The actual crazy thing is:

Imagine if somebody ran a Lemmy instance and just subscribed to every sublemmy and scraped all the data without asking. And nobody would even notice.

Reddit owns the content posted on their platform. But when you post on lemmy, everybody owns it, including every data company large and small.

But hey, at least we are feeling good about our social media platform choice, cause it's federated and open source or whatever, right?

[–] swooosh@lemmy.world 5 points 2 years ago* (last edited 2 years ago)

Like Facebook's Threads?

Everyone can use it. With reddit's posts, only reddit can do it.

[–] ExtremeDullard@lemmy.sdf.org 7 points 2 years ago

Ain't you glad you gave Reddit content for free and they're reselling it for millions?

[–] Gsus4@mander.xyz 7 points 2 years ago* (last edited 2 years ago)

This is not so bad. Reddit is crawling with bot spam, and that will increase as users leave the platform every time it pulls a stunt to pump the stock price. The ratio of real to fake content will decrease, and that will poison the training pipelines. It's a great experiment to test model collapse in real time, really.

[–] Scolding0513@sh.itjust.works 3 points 2 years ago

this was announced a while ago

[–] Wanangwa_Bamidele@thelemmy.club 3 points 2 years ago (1 children)

How bad could this be? Enlighten me, please.

[–] ResoluteCatnap@lemmy.ml 2 points 2 years ago

It's not any different from how it already was. Initially, the GenAI models were all trained on masses of unlicensed data, including data from Reddit. The problem is that some companies, like the New York Times, are suing over LLMs being trained on their data. So in response, companies like OpenAI are now trying to reach partnerships that basically license the use of the data (that they already had). This also means they will continue to have access to that data as long as the partnership is in place, whereas some companies without a partnership could start to ban scraping activity or update their terms to forbid training AI on their data.

Overall these partnerships are a good thing. Licensed training data is good. But from a privacy standpoint, the AI models were already trained on Reddit data. This is just formalizing the relationship.

[–] Tomkoid@lemmy.ml 3 points 2 years ago

I like how they monetized their API and data because they don't want it to be used to train AI models, and now they are selling user data for millions to OpenAI.