Reddit has a new AI training deal to sell user content (www.theverge.com)

submitted 2 years ago by L4s@lemmy.world to c/technology@lemmy.world

Reddit has reportedly made a deal with an unnamed AI company to allow access to its platform’s content for the purposes of AI model training.

[–] Lmaydev@programming.dev 26 points 2 years ago (3 children)

I'd be very surprised if people weren't already scraping Reddit for this.

[–] NoRodent@lemmy.world 20 points 2 years ago* (last edited 2 years ago) (1 children)

I mean, there's /r/SubSimulatorGPT2 that's been running for years... Although that one was at least hilarious to read because at that stage the AI was in the sweet spot of being simultaneously coherent while making total lapses in logic.

[–] TexasDrunk@lemmy.world 6 points 2 years ago (1 children)

Don't forget incredibly racist on multiple occasions.

[–] bbkpr@lemmy.world 2 points 2 years ago

The AI is what was fed into it 😂

[–] Verserk@lemmy.dbzer0.com 8 points 2 years ago (1 children)

That was the real reason for the API changes last year, apps just got caught in the crossfire.

[–] fuckwit_mcbumcrumble@lemmy.world 3 points 2 years ago

Yeah, I thought that was pretty well the established consensus on the thing. People questioning it confuses me, honestly.

[–] NeatNit@discuss.tchncs.de 6 points 2 years ago* (last edited 2 years ago) (1 children)

It's all but guaranteed. Reminds me of this Computerphile video: https://youtu.be/WO2X3oZEJOA?t=874 TL;DW: there were "glitch tokens" in GPT (and therefore ChatGPT) which undeniably came from Reddit usernames.

Note, there's no proof that these Reddit usernames were in the training data (and there are even reasons to assume that they weren't, watch the video for context), but there's no doubt that OpenAI had already scraped Reddit data at some point prior to training, probably mixed in with all the rest of their text data. I see no reason to assume they completely removed all Reddit text before training. The video suggests reasons and evidence that they removed certain subreddits, not all of Reddit.
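
To make the distinction above a bit more concrete, here is a toy sketch of how a token can sit in a tokenizer's vocabulary without ever showing up in the text the model is trained on. Everything in it is an assumption for illustration: the corpora, the username `user_alpha123`, and the whole-word "tokenizer" are made up, and real tokenizers work on subword units, so this only mirrors the general idea and not how any actual model was built.

```python
from collections import Counter

# Hypothetical text used to build the tokenizer vocabulary (made-up data).
tokenizer_corpus = [
    "user_alpha123 posted in the counting thread",
    "user_alpha123 posted again",
    "the model reads the text",
]

# Hypothetical text used to train the model, after a filter dropped some sources.
training_corpus = [
    "the model reads the text",
    "the text was posted again",
]


def build_vocab(corpus):
    """Build a whole-word vocabulary (a simplification of subword tokenizers)."""
    return {word for line in corpus for word in line.split()}


def find_untrained_tokens(vocab, corpus):
    """Return vocabulary entries that never occur in the training corpus."""
    counts = Counter(word for line in corpus for word in line.split())
    return sorted(tok for tok in vocab if counts[tok] == 0)


vocab = build_vocab(tokenizer_corpus)
untrained = find_untrained_tokens(vocab, training_corpus)
print(untrained)  # tokens the model has a slot for but never saw in training
```

In this toy setup the username ends up in the vocabulary because the tokenizer saw it, but the model never gets an example of it, which is roughly the situation described for the glitch tokens.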
