this post was submitted on 15 Feb 2024

Cable can't compete with 5G home internet, so it's cheating (www.spacebar.news)

submitted 2 years ago by corbin@infosec.pub to c/technology@beehaw.org

[–] onlinepersona@programming.dev 18 points 2 years ago (2 children)

The US, wow... what a place to live in as the 99%.

CC BY-NC-SA 4.0

[–] acastcandream@beehaw.org 32 points 2 years ago (2 children)

CC BY-NC-SA 4.0

Why are you putting a CC license on your comments?

[–] BotCheese@beehaw.org 21 points 2 years ago (5 children)

From what I understand, it is something aimed at AI: either to stop models from harvesting the comments or to poison the training data, since a string that is repeated often is more likely to show up in a model's output.

[–] beefcat@beehaw.org 59 points 2 years ago (1 children)

Sounds an awful lot like that thing boomers used to do on Facebook where they would post a message on their wall rescinding Facebook's rights to the content they post there. I'm sure it's equally effective.

[–] Bene7rddso@feddit.de 4 points 2 years ago (1 children)

Sure, the fun begins when it starts spitting out copyright notices

[–] t3rmit3@beehaw.org 2 points 2 years ago

That would require a significant number of people to be doing it, to 'poison' the input pool, as it were.

[–] corbin@infosec.pub 41 points 2 years ago

It seems pretty well established at this point that AI training models don't respect copyright.

[–] mozz@mbin.grits.dev 20 points 2 years ago* (last edited 2 years ago) (1 children)

I would be extremely extremely surprised if the AI model did anything different with "this comment is protected by CC license so I don't have the legal right to it" as compared with its normal "this comment is copyright by its owner so I don't have the legal right to it hahaha sike snork snork snork I absorb" processing mode.

[–] Max_P@lemmy.max-p.me 13 points 2 years ago (1 children)

No, but if they forget to strip those before training the models, it's gonna start spitting out licenses everywhere, making it annoying for AI companies.

It's so easily fixed with a simple regex, though, that it's not that useful. But poisoning the data is theoretically possible.
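
For illustration, here is a minimal sketch of the kind of regex cleanup meant here. It assumes the license tag always sits on a line of its own, as it does in this thread; the names `LICENSE_LINE` and `strip_license_lines` are made up for the example and do not come from any real scraping pipeline.

```python
import re

# Hypothetical pattern: a line containing nothing but a Creative Commons
# tag such as "CC BY-NC-SA 4.0" or "CC-BY-SA 3.0".
LICENSE_LINE = re.compile(
    r"^[ \t]*CC[ -]BY(?:-(?:NC|SA|ND))*[ \t]+\d\.\d[ \t]*$\n?",
    re.MULTILINE,
)


def strip_license_lines(text: str) -> str:
    """Remove standalone license lines, then trim trailing whitespace."""
    return LICENSE_LINE.sub("", text).rstrip()


comment = "The US, wow... what a place to live in as the 99%.\n\nCC BY-NC-SA 4.0"
print(strip_license_lines(comment))
```

A tag mentioned in the middle of a sentence is left alone, because the pattern only matches lines that consist of the tag and nothing else.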

[–] t3rmit3@beehaw.org 1 points 2 years ago

Only if enough people were doing this to constitute an algorithmically-reducible behavior.

If you could get everyone who mentions a specific word or subject to put a CC license in their comment, then an ML model trained on those comments would likely output the license name when that subject was mentioned. But models don't just randomly insert strings they've seen without context.
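
A toy way to picture the context point: the sketch below is only a conditional frequency count over made-up comments, not an actual ML model. It shows that if the tag appears only alongside one subject, its rate is high given that subject and zero otherwise. The function name `license_rate_by_topic` and the sample comments are assumptions for the example.

```python
from collections import Counter

LICENSE = "CC BY-NC-SA 4.0"  # the tag used in this thread


def license_rate_by_topic(comments, topic):
    """Return (rate with topic, rate without topic) of comments ending in LICENSE."""
    tagged = Counter()  # comments ending in the tag, keyed by has_topic
    total = Counter()   # all comments, keyed by has_topic
    for text in comments:
        has_topic = topic.lower() in text.lower()
        total[has_topic] += 1
        if text.rstrip().endswith(LICENSE):
            tagged[has_topic] += 1

    def rate(key):
        # Avoid dividing by zero when no comment falls in this group.
        return tagged[key] / total[key] if total[key] else 0.0

    return rate(True), rate(False)


# Made-up sample: only the comments about cable carry the tag.
comments = [
    "Cable pricing is absurd.\n\nCC BY-NC-SA 4.0",
    "Cable is cheating.\n\nCC BY-NC-SA 4.0",
    "5G works fine for me.",
    "I like my Linux distro.",
]
with_topic, without_topic = license_rate_by_topic(comments, "cable")
print(with_topic, without_topic)
```

A real model learns something far richer than this count, but the dependence on context is the same idea: the tag is only likely where it was seen.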

[–] peter@feddit.uk 19 points 2 years ago

That seems stupid

[–] acastcandream@beehaw.org 12 points 2 years ago

Interesting. Feels like that thing people used to add to FB comments back in the day that did nothing, but in the case of AI I could see it maybe doing something. I'll be looking into it. Thanks!

[–] conciselyverbose@kbin.social 19 points 2 years ago* (last edited 2 years ago)

To turn every comment, no matter how on topic, into obnoxious spam.

[–] Danterious@lemmy.dbzer0.com 17 points 2 years ago* (last edited 2 years ago) (2 children)

You know, if you want to do something more effective than just putting a copyright notice at the end of your comments, you could try creating an adversarial suffix using this technique. It makes any LLM reading your comment begin its response with whatever output you specify (such as outing itself as a language model or calling itself a chicken).

It gives you the code necessary to be able to create it.

There are also other data poisoning techniques you could use just to make your data worthless to the AI, but this is the one I thought would be the funniest if any LLMs were lurking on Lemmy (I have already seen a few).

[–] dubyakay@lemmy.ca 5 points 2 years ago

Thanks for the link. This was a good read.

[–] onlinepersona@programming.dev 2 points 2 years ago

That's a neat idea and I've considered it, but would need time to research and test. Time I don't have, so this is the easiest thing I came up with. If there were a bot, plugin, browser extension, or something that did the necessary modifications and kept up to date with new developments in AI, I'd use it.

CC BY-NC-SA 4.0