this post was submitted on 14 Dec 2023
326 points (96.0% liked)

Fediverse

28251 readers
580 users here now

A community to talk about the Fediverse and all its related services using ActivityPub (Mastodon, Lemmy, KBin, etc).

If you wanted to get help with moderating your own community then head over to !moderators@lemmy.world!

[–] jeffhykin@lemm.ee 15 points 10 months ago* (last edited 10 months ago) (3 children)

I think we can give Facebook/Threads the bad end of the bargain if we have data protections.

You know how powerful copyleft was for open source? I think we can do the same for Lemmy servers. We can have users agree (formally) that the data on a particular server cannot be used for training LLMs, advertisements, marketing profiles, etc., and make it legally binding.

Even if we don't federate with them, Meta can still harvest the data so we should add these protections regardless. Maybe there is already something like this and I'm just unaware of it.

If we do add these protections and we ensure that the largest instance (e.g. Lemmy.world) is community controlled, I think it could work well for bringing more content to Lemmy.

[–] pennomi@lemmy.world 9 points 10 months ago

Yep, on a public forum like this we lose very little on privacy by federating with them. What we do stand to lose is comment and post quality, but that’s trivial to fix by simply blocking Threads on a personal level.

[–] AustralianSimon@lemmy.world 4 points 10 months ago (1 children)

You can scrape Lemmy instances for training data without even running an instance.

[–] jeffhykin@lemm.ee 0 points 10 months ago* (last edited 10 months ago) (1 children)

Yeah, sorry if I'm not great at communicating. That's exactly what I'm trying to point out when I said:

Even if we don't federate with them, Meta can still harvest the data so we should add these protections regardless.

[–] AustralianSimon@lemmy.world 1 points 10 months ago (1 children)

That's the thing, anything public is fair game. This is why Reddit is ruining their API.

[–] jeffhykin@lemm.ee 0 points 10 months ago (1 children)

It's not fair game for for-profit businesses training LLMs. That's part of why Reddit made the move: so that companies would need to pay Reddit for access to the data for legally training models.

[–] AustralianSimon@lemmy.world 1 points 10 months ago (1 children)

They changed the terms and made the API paid for high-volume use. People using it to train models have already pillaged what they need, and you can get the data from before APIgeddon elsewhere.

[–] jeffhykin@lemm.ee 0 points 10 months ago* (last edited 10 months ago) (1 children)

Sure, but it's still true that there are legal protections we can add that make it not fair game for Lemmy. At best it would be unfair-game (illegal scraping of Lemmy)

[–] AustralianSimon@lemmy.world 1 points 10 months ago (1 children)

A rule for one Lemmy instance, or even the Lemmy app, doesn't mean the same rule applies across the ActivityPub federation. If your data federated to my instance, it's mine too.

[–] jeffhykin@lemm.ee 1 points 10 months ago* (last edited 10 months ago) (1 children)

It can apply across all of them; for example, that's how copyleft works.

[–] AustralianSimon@lemmy.world 1 points 10 months ago (1 children)
[–] jeffhykin@lemm.ee 1 points 10 months ago* (last edited 10 months ago) (1 children)

What? I'm saying every federated copy must legally carry the usage restrictions. Just because it's copied doesn't mean it can go into a for-profit LLM.

[–] AustralianSimon@lemmy.world 1 points 10 months ago (1 children)

There is no licensing in the protocol so anything you put out there is free.

https://www.w3.org/TR/2018/REC-activitypub-20180123/

[–] jeffhykin@lemm.ee 1 points 10 months ago* (last edited 10 months ago)

If we serve licensed content over SSH or HTTPS, it's still licensed. Protocols don't change the legal requirements of the data. Warner Bros will still sue if one of their movies is hosted on a server using the ActivityPub protocol.
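To make the "the protocol doesn't strip the license" point concrete, here is a rough sketch of a post carrying its own license pointer as it gets copied. It is an assumption, not something ActivityPub or Lemmy defines: the `license` field is an extension term mapped to schema.org's `license` property, the helper names and example URLs are made up, and nothing here says whether any server would honor it or whether it would hold up legally.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
# Assumption: "license" is an extension term, mapped here to schema.org's
# license property. It is not part of the ActivityStreams vocabulary.
LICENSE_TERM = {"license": "http://schema.org/license"}


def make_licensed_note(actor: str, content: str, license_url: str) -> dict:
    """Build an ActivityStreams Note with a license URL as an extension property."""
    return {
        "@context": [AS_CONTEXT, LICENSE_TERM],
        "type": "Note",
        "attributedTo": actor,
        "content": content,
        "license": license_url,
    }


def federate_copy(note: dict) -> dict:
    """Simulate a remote instance storing a copy: serialize to JSON, parse it back."""
    return json.loads(json.dumps(note))


note = make_licensed_note(
    actor="https://example.social/u/alice",  # hypothetical actor
    content="Hello, Fediverse",
    license_url="https://example.social/terms/no-llm-training",  # hypothetical terms
)
copy = federate_copy(note)
print(copy["license"])  # the copy still points at the original terms
```

The copy is a separate object on the "remote" side, but the license pointer rides along with the content, the same way a LICENSE file rides along with a cloned repo.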

[–] Masimatutu@lemm.ee 3 points 10 months ago (1 children)

What does lemmy.world being the biggest have to do with any of this?

[–] jeffhykin@lemm.ee 0 points 10 months ago* (last edited 10 months ago) (1 children)

As opposed to a facebook-controlled server being the top search result for Lemmy.

I see why that's confusing so I edited my comment just now

[–] Masimatutu@lemm.ee 2 points 10 months ago

I think this is the wrong take. If we want Lemmy to be truly community-controlled, we need many small servers, as opposed to the current situation of one server controlling half the userbase. Also, which server is Facebook-controlled? Lemmy.world is in the minority by federating with Threads.