Lemmy

Everything about Lemmy; bugs, gripes, praises, and advocacy.

For discussion about the lemmy.ml instance, go to !meta@lemmy.ml.

Protecting user content and data on Lemmy (programming.dev)

submitted 9 months ago* (last edited 9 months ago) by silas@programming.dev to c/lemmy@lemmy.ml

I see talk here and there about how any company or individual can easily use anything we post on Lemmy however they want. This could include AI training, behavior analysis, or user profiling. With the recent news of Reddit data being sold and licensed for AI training, I thought this would be a great time to preemptively discuss how we feel about this topic and brainstorm ways to discourage unwanted use of the content we post.

I’ve seen some users add a license to the end of each of their comments. One idea might be this: add a feature to Lemmy where each user can choose a content license that applies to everything they post. For example, one user might choose to reserve no rights for their content (like CC0) because they don’t care how their data is used. Another user might not want companies profiting off their posts, so they’d choose a more restrictive license.

I’m eager to hear everyone’s thoughts on the whole topic, so to kick things off:

Do you care how your public data and posted content is used? Why or why not?
What do you think of choosing a content license for your Lemmy account? Does this contradict the FOSS model?
Should Lemmy have features to protect user data/content in this way, or should that be left up to the user to figure out on their own?

Data is becoming an increasingly valuable commodity in the digital world. Hopefully these big-picture conversations can help us see what we value as a community and be more prepared for the future.

[–] scrubbles@poptalk.scrubbles.tech 26 points 9 months ago* (last edited 9 months ago) (8 children)

I'm sorry, but that's just impossible here. I'm sorry to tell you, but it is.

ActivityPub is a protocol which takes your content and blasts it out to anyone who listens. That's the design of it, that we all listen on our own servers and we can then treat our servers as we want. There is no profit motive on our servers because anyone could just jump to a new server.
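To make the "blast it out" part concrete, here is a rough sketch of the fan-out. It is a toy model, not Lemmy's actual code: `Inbox`, `publish`, and the `.example` domains are all made up for illustration, and nothing is sent over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Inbox:
    """Hypothetical stand-in for a remote server's inbox."""
    domain: str
    received: list = field(default_factory=list)


def publish(content, author, follower_inboxes):
    """Wrap the post in a Create activity and hand a copy to every listener."""
    activity = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": author,
        "object": {"type": "Note", "attributedTo": author, "content": content},
    }
    for inbox in follower_inboxes:
        # The sender only sees a domain; it cannot tell who operates it.
        inbox.received.append(activity)
    return activity


friendly = Inbox("friendly.example")
unknown = Inbox("weird-domain.example")
sent = publish("hello fediverse", "https://home.example/u/alice", [friendly, unknown])
```

Both inboxes end up holding the same full payload, which is the point: delivery does not depend on who is behind the domain.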

However, this means there is literally no opt-out in the protocol. Anyone can start a server, which means anyone can start a server. Governments, corporations, the jerk down the street, anyone. The only way to turn that off is by saying "Defederate from this server", but of course, given the anonymous nature of it all, we don't have to know who they are.

Of course we can defederate from other servers, but since anyone can spin up a server on any domain, how do you know that Meta doesn't have a server right now at some weird domain? OpenAI could be listening right now and training. In fact, I'd be surprised if the site formerly known as Twitter didn't have a Mastodon server up so they could keep tabs on it.

Even deleting a message is another blast out to all other servers. "Hey, this user requests you delete this message". So what happens if someone modifies their code to just ignore that?
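A minimal sketch of that scenario, assuming a toy in-memory `Server` class (hypothetical, not real Lemmy or ActivityPub library code). The activities are trimmed down to the fields that matter here, and the post id is made up.

```python
class Server:
    """Toy stand-in for a federated instance."""

    def __init__(self, honors_deletes):
        self.honors_deletes = honors_deletes
        self.posts = {}  # object id -> content

    def receive(self, activity):
        if activity["type"] == "Create":
            obj = activity["object"]
            self.posts[obj["id"]] = obj["content"]
        elif activity["type"] == "Delete" and self.honors_deletes:
            # In this simplified model, Delete carries the id of the object.
            self.posts.pop(activity["object"], None)
        # A modified server falls through here and keeps its copy.


def broadcast(activity, servers):
    """Send the same activity to every server in the list."""
    for server in servers:
        server.receive(activity)


POST_ID = "https://home.example/post/1"  # made-up id
create = {"type": "Create", "object": {"id": POST_ID, "type": "Note", "content": "oops"}}
delete = {"type": "Delete", "object": POST_ID}

polite = Server(honors_deletes=True)
modified = Server(honors_deletes=False)
for activity in (create, delete):
    broadcast(activity, [polite, modified])
```

After both activities go out, `polite.posts` is empty while `modified.posts` still holds the post. The sender gets no signal about which kind of server it was talking to.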

I guess what I'm trying to say is that the fediverse is open and free - and the downside of being open and free is that it's open and free - to everyone. There is no permanent delete. There is no way to license it, because by clicking post you are saying "Blast this out to everyone who is listening". Once it's on their server, it's their data. You gave it to them. There is no way to protect data because the protocol quite literally does the opposite.

[–] Die4Ever@programming.dev 4 points 9 months ago (1 children)

cloning data in that way isn't legally different than what The Wayback Machine does for other websites, it doesn't mean a company can just ignore the legal license of the content just because they can get a copy of it

if the only concern was getting a copy of the data, then Reddit wouldn't be able to sell access to the data for $75mil or whatever, the AI company would just scrape the pages or pay the API fees directly, and then they could even store the data and serve it to other people as a mirror and make some money off of the content with ads too!

same thing with licenses on Git repos, you can't just clone it and do whatever you want with it, there are laws

[–] scrubbles@poptalk.scrubbles.tech 4 points 9 months ago (1 children)

The problem is: does another server have to listen to the license? You're on programming.dev. Say they obey the license that you put there. Well, say my server explicitly says "Do not send me things if you want it licensed. By sending me your data you waive all rights to your data and waive all licenses". I can put this in my legal area too. So, who wins then? That's different from git, where if I clone it I'm pulling your data. Here, you willingly pushed it to my server, where I said what I would do with it.

ActivityPub sent it to me automatically, it's on my server, and on my server I say anything you give me has no license. To me, that's like the people who say FB has no right to my data in a FB post.

The difference between Lemmy and Reddit is that it was Reddit's servers, they owned the data, and there was an agreement by signing up on who owned it - Reddit. Lemmy has no such agreement, and the data is not on a "Lemmy" server, it's stored on everyone's servers.

[–] Die4Ever@programming.dev 4 points 9 months ago* (last edited 8 months ago)

you make a good point about push vs pull, although things are only pushed if someone is subscribed (opted in)

I think the proposal is for licenses to become part of the ActivityPub protocol, so all applications would retain the original license of the content, license would be a first class citizen
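roughly what that could look like, as a sketch only: as far as I know the ActivityStreams vocabulary doesn't define a license property, so the `license` key here is a hypothetical extension, and `make_note`, `relay`, and `strip_license` are made-up helpers

```python
import copy

# Real Creative Commons URL, used here only as an example value.
CC_BY_NC = "https://creativecommons.org/licenses/by-nc/4.0/"


def make_note(author, content, license_url):
    """Build a Note that carries its license along with the content."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Note",
        "attributedTo": author,
        "content": content,
        "license": license_url,  # hypothetical field, not part of the spec
    }


def relay(note):
    """What a cooperating server would do: pass the note on unchanged."""
    return copy.deepcopy(note)


def strip_license(note):
    """What a non-cooperating server could do: drop the field on the way out."""
    forwarded = copy.deepcopy(note)
    forwarded.pop("license", None)
    return forwarded


original = make_note("https://home.example/u/alice", "my post", CC_BY_NC)
kept = relay(original)
stripped = strip_license(original)
```

the field survives only as long as every hop chooses to keep it, so this covers the "retain the license" part but says nothing about enforcement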

although without licenses this is functionally the same as email, I wonder how the laws work for that, for example I don't think you can just plagiarize something that someone wrote, quoted, or copy-pasted to you in an email if it's actually copyrighted content like from a book (aka content that had a license)
