this post was submitted on 09 Jan 2025
56 points (93.8% liked)

Selfhosted

59923 readers
650 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam.

  3. Posts here are to be centered around self-hosting. Please ensure it is clear in your post how it relates to self-hosting.

  4. Don't duplicate the full text of your blog or git here. Just post the link for folks to click.

  5. Submission headline should match the article title.

  6. No trolling.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 3 years ago
MODERATORS
 

Now that we know AI bots will ignore robots.txt and churn residential IP addresses to scrape websites, does anyone know of a method to block them that doesn't entail handing over your website to Cloudflare?

you are viewing a single comment's thread
view the rest of the comments
[โ€“] r00ty@kbin.life 3 points 1 year ago (1 children)

Didn't have the link to hand. But a search turned this one up: https://reggiodigital.com/blog/nginx-rule-blocking-bad-bots/ it looks to be the same list, and you can see the ones I've added to the end of that list.

[โ€“] Mojeek@lemmy.ml 2 points 1 year ago

thanks a lot for providing this ๐Ÿ™