this post was submitted on 01 Jul 2023
107 points (97.3% liked)

Selfhosted


A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don't duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.


I'd be really keen to host a Lemmy instance, but with GDPR and everything, is there anything else to consider beyond the technical setup and provisioning of hardware?

Lemmy stores users' data, so is there any requirement to do anything GDPR-wise?

Hope this is the right place for this. I've seen a lot of posts from people interested in hosting their own Lemmy instance, and this is an extension of that.

[–] Max_P@lemmy.max-p.me 39 points 1 year ago (7 children)

I'd put a legal blurb in the Legal section clearly outlining the nature of the fediverse and making it clear to users that really deleting stuff from Lemmy is near impossible, because every federated instance has a copy of it. You'll happily comply and purge the user's data upon request, but it will still be cached on every other server.

I'd be interested to see what lawyers have to say about it. Technically the data sharing is absolutely required by the protocol, so it might be okay under the GDPR, but it's also possible that, as the GDPR is worded, a federated service can't possibly be compliant. The GDPR was designed with big companies like Google, Meta and big advertisers in mind, and didn't really account for decentralized services like the fediverse...

[–] danieljackson@lemmy.world 32 points 1 year ago (1 children)

The GDPR doesn't apply only to services hosted in the EU, but to any service handling the data of an EU citizen.

This is why some news outlets in the US just decided to block EU users altogether, out of laziness.

IANAL, but the GDPR doesn't cover pseudonymous data. Actually, the GDPR encourages data processors (= services) to use pseudonymization.

Personally identifiable information includes IPs, email addresses, street addresses, names, dates of birth, ... Lemmy only collects IPs and email addresses, and these are not shared between instances.

Whether the service is hosted in the EU or not, as long as it serves EU users, Lemmy should provide a self-service way to delete email and IP information (maybe by deleting the account). In the meantime, instance admins have to fulfil requests from EU citizens to delete their emails/IPs from the database.

[–] b3nsn0w@pricefield.org 3 points 1 year ago

I'm gonna preface this: IANAL either.

There are also different legal bases for different kinds of data processing. For example, I'm pretty sure ensuring your site's security counts as legitimate interest, and it's pretty common for IP addresses to be stored and processed on that basis. You don't need to remove someone's IP from your access logs just because they asked for it, because your interest in keeping your site secure for both yourself and everyone else outweighs their interest in the privacy of their data. Legitimate interest is the fuzziest of the six legal bases, and it doesn't help that advertisers have started trying to pass off their BS as "legitimate interest", especially in consent forms. (If they need your consent, it's not legitimate interest, it's user consent, and they really should stop lying.) Still, it exists to keep things viable.

As a rule of thumb, if you're storing data to provide a service, you need to export or delete that data upon request, and if you're doing anything beyond what's strictly necessary for providing your service, you need to ask the user about it. And you're right, this applies to anyone whose instance is used by EU citizens.

Also, pseudonymous data still counts as personal data as long as the pseudonym can be linked back to personally identifiable information. You need to sever this link to comply with a deletion request.

[–] chiisana@lemmy.chiisana.net 13 points 1 year ago

I am not a lawyer and definitely not anyone’s lawyer providing legal advice, but I’ve done a little bit of work implementing GDPR compliance at my jobby job. My understanding is that you must inform users when you’re sending their data out to third-party processors, and they, too, must be GDPR compliant.

So if your instance is sending information covered under the GDPR out to other instances, you must call out those instances as data processors and ensure they’re compliant before you add them. When you add one, I think you’re also supposed to inform users that you’re adding a new data processor via some form of notice addressed to them. Furthermore, at the time of deletion, you’d also need to inform your data processors of the request, so that their compliance workflow can be followed.

In my mind, strictly speaking, what Lemmy is doing could work if the “cluster” of GDPR-compliant instances doesn’t federate out to the broader non-compliant instances. That means lots of manual maintenance of the allowed federation instances, and each time you add a new instance, you’d need to inform your users. Once you receive a deletion request, you’d need to use the ban-with-purge option to purge everything on your instance and pass that on to all federated instances. The key part is ensuring your federated instances honour your purge request, which is hard to verify.

The end result is that you’d essentially be creating your own bubble of the fediverse isolated from the rest of the fediverse… which is not an ideal outcome but that’s what happens when you let regulators decide what to do on things they don’t understand…
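The allowlist workflow in this comment is mostly bookkeeping, and can be sketched that way. This is a minimal illustration under stated assumptions: `FederationPolicy`, `add_instance`, `may_federate` and `purge_requests` are hypothetical names, not part of Lemmy or ActivityPub, and the domains are placeholders. It only models the three steps (allowlist, user notice on add, purge fan-out); it sends nothing over the network and cannot verify that a remote instance honours a purge.

```python
from dataclasses import dataclass, field


@dataclass
class FederationPolicy:
    """Hypothetical model of a GDPR-minded federation allowlist.

    Not a real Lemmy API: it only records which instances are allowed,
    which notices users should receive, and who gets a purge request.
    """
    allowed: set[str] = field(default_factory=set)
    notices: list[str] = field(default_factory=list)

    def add_instance(self, domain: str) -> None:
        # Adding a new data recipient also queues a notice for users.
        if domain not in self.allowed:
            self.allowed.add(domain)
            self.notices.append(f"Your data may now be shared with {domain}")

    def may_federate(self, domain: str) -> bool:
        # Federate only with instances on the allowlist.
        return domain in self.allowed

    def purge_requests(self, user: str) -> list[tuple[str, str]]:
        # One (instance, user) purge request per allowlisted instance.
        # Checking that each instance actually complies is out of scope.
        return [(domain, user) for domain in sorted(self.allowed)]
```

Adding the same instance twice queues only one notice, and a purge request fans out to every instance currently on the list, which is exactly the part the comment notes is hard to verify in practice.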

[–] Max_P@lemmy.max-p.me 7 points 1 year ago* (last edited 1 year ago) (3 children)

Actually, I wonder if the end result would essentially be that you can only federate with other GDPR-compliant instances that you trust to respect the GDPR and honor federated data deletion requests.

The core of the issue is that just by the virtue of running, an instance collects a stupid amount of data. I was baffled at how many user accounts my instance had discovered mere hours after starting it up.

Edit: row counts after just a week of running my private instance with only 3 users:

The profiling potential is scary, so users should be really careful with basically every interaction on the Fediverse, including votes. I bet the feds are having a field day monitoring what's going on on exploding-heads and lemmygrad.

[–] poVoq@slrpnk.net 7 points 1 year ago (2 children)

IANAL but no, as instances do not share "personal data". There is a misconception that GDPR deletion requests apply to all data created by a user, but to my understanding it only applies to "personal data" as defined here: https://commission.europa.eu/law/law-topic/data-protection/reform/what-personal-data_en

[–] Max_P@lemmy.max-p.me 2 points 1 year ago (2 children)

Under the GDPR, any piece of potentially identifying information is considered personal data. I had GDPR training at work. Under the GDPR you can't even count unique visitors to your website without getting consent, because you'd have to keep track of some identifier, even if it's just the IP address and User-Agent, and even if it's done entirely client-side.

Community subscriptions alone are plenty of data to build a rather comprehensive profile of the user's interests, and if you throw in votes it quickly becomes scary.

This is everything you upvoted:

[–] poVoq@slrpnk.net 5 points 1 year ago* (last edited 1 year ago) (1 children)

Obviously IP addresses are personal data, but those are not shared to other instances.

You could probably argue that the federated ID is personal data, but I am not sure as it might also count as only an internal identifier required for operation. IANAL but I don't think votes can be considered personal data under the GDPR.

[–] chiisana@lemmy.chiisana.net 1 points 1 year ago (2 children)

The question boils down to where the boundary is. Is an alias of your choosing, which uniquely identifies you across the fediverse, personally identifiable? I think we’d all say yes. Then do the actions linked to that alias count as personally identifiable? Well, even without correlating the ID to a real identity, it is still technically possible to map out who this user is and what their interests and preferences are, so maybe yes? That’s a hard grey area to determine IMO.

[–] poVoq@slrpnk.net 2 points 1 year ago

Indeed, but I think email addresses for email providers (but not everyone else) are handled differently by the GDPR as they are necessary for providing the email service. I think this is similar to how functional cookies do not require consent under the GDPR if they are only used to keep you logged into the site etc.

[–] tk338@lemmy.one 1 points 1 year ago (1 children)

I think, as @danieljackson@lemmy.world commented slightly higher up, this might be considered pseudonymised data? The link he provided suggested it was considered personally identifying information. I'm (as per my question) definitely no expert in this, though.

[–] danieljackson@lemmy.world 4 points 1 year ago (1 children)

The link I provided says that pseudonymization can be used to protect personal data:

“If you are a DPO, you can see the appeal and benefits of pseudonymization. It makes data identifiable if needed, but inaccessible to unauthorized users and allows data processors and data controllers to lower the risk of a potential data breach and safeguard personal data.

GDPR requires you to take all appropriate technical and organizational measures to protect personal data, and pseudonymization can be an appropriate method of choice if you want to keep the data utility.”

The owner of lemmy.one can map tk338@lemmy.one to an IP and/or email address, so for them it now becomes personally identifiable data. But other instance owners can't map it to any personal data, so for them it is basically "anonymized data".

You just have to provide a way to either:

  • delete the personally identifiable data, or
  • unlink the personally identifiable data from the pseudonymized data on your local instance.

Disclaimer: IANAL, YMMV, yadda, yadda...
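The pseudonymization argument above can be sketched in a few lines. This assumes, as the comment does, that the home instance stores email and IP next to the pseudonymous actor ID while remote instances only ever receive the ID. `LocalAccount`, `federated_view`, `unlink_personal_data` and `is_linkable` are hypothetical names, not Lemmy's actual schema, and the sketch says nothing about whether unlinking legally satisfies an erasure request; it only shows the mechanics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalAccount:
    # Hypothetical record on the home instance.
    actor_id: str             # pseudonym other instances see, e.g. "tk338@lemmy.one"
    email: Optional[str]      # personal data that stays local
    last_ip: Optional[str]    # personal data that stays local


def federated_view(account: LocalAccount) -> dict[str, str]:
    """What a remote instance receives: the pseudonym only, no email or IP."""
    return {"actor_id": account.actor_id}


def unlink_personal_data(account: LocalAccount) -> None:
    """Sever the link between the pseudonym and the identifying data."""
    account.email = None
    account.last_ip = None


def is_linkable(account: LocalAccount) -> bool:
    """True while the home instance can still map the pseudonym to a person."""
    return account.email is not None or account.last_ip is not None
```

After `unlink_personal_data`, the actor ID and anything federated under it remain, but the home instance no longer holds the data that made it identifiable, which corresponds to the second option in the list.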

[–] tk338@lemmy.one 1 points 1 year ago

Understood, I missed that subtlety. The fact that emails aren't actually shared makes it very GDPR "friendly".

[–] vegetaaaaaaa@lemmy.world 1 points 1 year ago (1 children)

This is everything you upvoted:

How does that work? As the admin of lemmy.max-p.me, you have access to your server's DB, which contains a replica of the DBs of all servers you receive federation from, including detailed per-user upvotes/downvotes. Correct?

[–] Max_P@lemmy.max-p.me 2 points 1 year ago* (last edited 1 year ago) (1 children)

Yeah, pretty much, although not entirely. I only get pushed copies of the intersection between the communities my instance tracks and the victim's, and only from the time my server started federating those. I guess I could make a bot account that subscribes to every possible Lemmy community so that I do get a copy. I could also patch the backend to ignore any deletion requests, stash everyone's deleted posts, and even fetch linked images and store them forever.

It's not really a secret though. Some users in another thread were shocked to learn that kbin does publicly display that information. For example, picking the first post on kbin.social: https://kbin.social/m/tech/t/124303/Bluesky-temporarily-halts-sign-ups-because-so-many-people-are-joining/votes/up

Essentially, it's extremely public, so one's gotta be careful about every single interaction on here.

I only did this for example's sake, I respect people's privacy and have no intention of running a hostile instance. But point being, anyone can rather easily.

[–] vegetaaaaaaa@lemmy.world 1 points 1 year ago

Interesting - I had the feeling this was how the federation mechanism worked, I don't see how it could work without sacrificing privacy.

So a "bad" actor could just spin up their own instance, federate with a huge amount of other instances (I don't think other instances have a say in this, except if they explicitly, manually blacklist the "bad" instance?), and start profiling users based on their votes.

The potential for global surveillance is enormous. But I can also see it being useful to detect and fight bot farms, spam, brigading and other bad stuff that has plagued Reddit for quite some time.

Lemmy could do a better job at informing users that basically everything you do here is public (including votes). On Kbin the /votes/up page makes it clear at least (I like that even comments have a /votes/up page).

[–] Thorosofbeer@lemmy.world 3 points 1 year ago

I believe this is probably what will happen if this ever becomes a big issue. GDPR was never intended to be navigable for anything except giant proprietary blob tech companies.

[–] tk338@lemmy.one 2 points 1 year ago

That definitely seems like it might be along the right lines. Though GDPR was (rightly) designed to leave the power in the hands of the user/customer, you're right that it doesn't account for things like the fediverse. I wonder if something else put in the Legal section would actually cover it.

[–] tr00st@lemmy.tr00st.co.uk 2 points 1 year ago

The protocol would seem unlikely to satisfy the concept of "necessary". It's entirely possible that the protocol simply can't be implemented in a GDPR-compliant way. That might require the development of something more sharded, with data pulled in real time, etc.

[–] Thorosofbeer@lemmy.world -1 points 1 year ago (2 children)

I believe this is probably what will happen if this ever becomes a big issue. GDPR was never intended to be navigable for anything except giant proprietary blob tech companies.

[–] danieljackson@lemmy.world 6 points 1 year ago (1 children)

As I said in another comment, the GDPR protects people. And the GDPR only applies to personally identifiable data (IPs, email addresses, street addresses, legal names, dates of birth...). Lemmy only collects emails and IPs, and doesn't share them between instances. So it's very easy to comply with the GDPR as long as you don't do anything shady.

The EU has a marketing issue. They tried to pass legislation to prevent companies from collecting data. But instead, companies displayed a popup, kept collecting data, and blamed it on the EU. Every time I see a popup, I blame ruthless data collection.

Actually, Lemmy is most likely violating the California Consumer Privacy Act, which, unlike the GDPR, gives users the right to update/delete any data they generate, not only personally identifiable information.

[–] bilb@lem.monster 1 points 1 year ago (1 children)

You don't see a lot of chatter about the CCPA, I wonder why.

[–] Revan343@lemmy.ca 1 points 1 year ago

Probably because it's wholly unrealistic