this post was submitted on 17 Jan 2024
75 points (96.3% liked)

Programming.dev Meta


Welcome to the Programming.Dev meta community!

This is a community for discussing things about programming.dev itself. Things like announcements, site help posts, site questions, etc. are all welcome here.


Do you keep access logs? How long do you keep them?

I imagine that if you ever got a legal request, you'd understandably just give in and hand over the data. Have you thought of a warrant canary?

Thanks for all your work!

top 7 comments
[–] snowe@programming.dev 78 points 10 months ago* (last edited 10 months ago) (3 children)

I'm not in the business of collecting user data and don't really want to be. As for logs, we restart our containers every 6 hours and the logs are wiped at that time, so the oldest logs I can actually find in our system right now are from an hour ago.

And nah, I wouldn't give in. There's no real reason to request that information, as accessing a url means absolutely nothing. I did so just now to verify things, and any real user could argue the same (oh, I clicked on the link and didn't know where it led). I very much doubt the past 6 hours of logs would be useful anyway, as by the time I got the request the logs wouldn't matter anymore.

But, I'm still going to see if I can turn off logging for requests. I do not think we need them at all, and if we do, we can simply turn it on for a few minutes to get the info we need.
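For illustration only, here is a minimal Python sketch of the retention idea described above: anything older than a fixed window is simply gone. This is not how programming.dev does it (the instance relies on container restarts wiping the logs). The `prune` function, the `RETENTION` constant, and the `(timestamp, line)` entry shape are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical retention window, borrowed from the 6-hour restart interval.
RETENTION = timedelta(hours=6)


def prune(entries, now, retention=RETENTION):
    """Return only the (timestamp, line) entries newer than the cutoff."""
    cutoff = now - retention
    return [(ts, line) for ts, line in entries if ts > cutoff]


if __name__ == "__main__":
    now = datetime(2024, 1, 17, 12, 0)
    entries = [
        (now - timedelta(hours=7), "GET /old"),     # outside the window
        (now - timedelta(hours=1), "GET /recent"),  # inside the window
    ]
    for ts, line in prune(entries, now):
        print(ts.isoformat(), line)
```

Not writing request logs at all, as suggested above, goes one step further: there is nothing left to prune.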

[–] onlinepersona@programming.dev 18 points 10 months ago (1 children)

Thanks for the clear answer and thank you for maintaining this instance 💖

CC BY-NC-SA 4.0

[–] snowe@programming.dev 12 points 10 months ago

Thank you for the good post. And at this point, most of the work is being done by Ategon and the other admins. I am mostly here for infrastructure support and general direction of the instance.

[–] aport@programming.dev 6 points 10 months ago

Based Lemmy admin

[–] Die4Ever@programming.dev 6 points 10 months ago* (last edited 10 months ago)

> as accessing a url means absolutely nothing.

Hmmm yes I see Google's search indexer has accessed this URL, they should get sued!

[–] IphtashuFitz@lemmy.world 13 points 10 months ago (1 children)

The company I work for probably doesn’t see as much traffic as Reddit, but we provide services via the web in the US and roughly 15 other countries. We make use of Akamai for CDN, security, etc. and one of the things they do is provide us with raw logs of every request made to our sites. That generates a lot of data that we feed into Splunk for analysis, debugging, etc.

One of the nicer things Akamai does in their logs is to classify if they believe the request came from a bot, and if so then what bot it was. They are able to identify over 1000 individual bots, and can also detect traffic from new/unknown bots. There is a LOT of bot activity on the internet these days, and many originate from cloud providers like AWS, where it’s clear it’s a machine making the request and not a human.

If we had a legal request for logs I’d have to look at the data to see how to respond. If Akamai showed a lot of bot activity from consumer ISP IPs then I’d likely include that data in an effort to show that end users may be victims of botnets. But if bot activity was mostly originating from cloud providers etc. then I probably wouldn’t include it. Let the lawyers try to figure out from the raw data what traffic originated from humans vs bots.
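As a rough sketch of the kind of breakdown described above, the Python below counts bot-flagged requests by where they originated. The record fields (`is_bot`, `network_type`) and the function name are made up for illustration. They are not Akamai's or Splunk's actual log schema.

```python
from collections import Counter


def summarize_bot_traffic(records):
    """Count bot-flagged requests per network type; other requests are skipped."""
    counts = Counter()
    for rec in records:
        if rec.get("is_bot"):
            counts[rec.get("network_type", "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records; a real log line would carry many more fields.
    sample = [
        {"is_bot": True, "network_type": "consumer_isp"},
        {"is_bot": True, "network_type": "cloud"},
        {"is_bot": False, "network_type": "consumer_isp"},
        {"is_bot": True, "network_type": "consumer_isp"},
    ]
    print(summarize_bot_traffic(sample))
```

A high share of bot traffic from consumer ISP addresses would be the signal worth including in a response, per the reasoning above.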

[–] onlinepersona@programming.dev 3 points 10 months ago

Dunno if fediverse instances would be willing to get a lawyer to fight such requests. IMO, the best way to counter it is to not collect such data in the first place, but you make a good point about bots. Honestly, I'm just curious what the maintainers will say. I might just start interacting with the fediverse over Tor and be done with it.

CC BY-NC-SA 4.0