this post was submitted on 09 Jan 2025

Selfhosted


Now that we know AI bots will ignore robots.txt and churn residential IP addresses to scrape websites, does anyone know of a method to block them that doesn't entail handing over your website to Cloudflare?

[–] Mojeek@lemmy.ml 5 points 1 year ago (1 children)

Why MojeekBot? We're a search engine.

[–] r00ty@kbin.life 2 points 1 year ago (1 children)

Hmm, I took an existing list and added to it. Do you have a website I can check? If so, I'll happily remove it. I don't mind slow web crawlers at all.

[–] Mojeek@lemmy.ml 4 points 1 year ago (1 children)

If you recall where the list came from, that would also be useful to us. Here's our bot page: https://www.mojeek.com/bot.html and some external info: https://en.wikipedia.org/wiki/Mojeek

[–] r00ty@kbin.life 3 points 1 year ago (1 children)

I didn't have the link to hand, but a search turned up this one (https://reggiodigital.com/blog/nginx-rule-blocking-bad-bots/). It looks to be the same list, and you can see the ones I've added at the end of it.
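
For anyone following along, here is a rough sketch of how this kind of user-agent blocklist with an exception for a search crawler could work. It is written in Python for illustration and is not the nginx rule from the linked post. The entries in `BLOCKED_AGENTS`, the `ALLOWED_AGENTS` list, and the `is_blocked` helper are all assumptions made up for this example, and it only catches bots that announce themselves honestly in the User-Agent header.

```python
import re

# Hypothetical blocklist entries, for illustration only.
# This is NOT the list from the linked post.
BLOCKED_AGENTS = ["GPTBot", "CCBot", "Bytespider"]

# Crawlers to let through even if an imported list happens to include them.
ALLOWED_AGENTS = ["MojeekBot"]


def _compile(names):
    """Build one case-insensitive pattern matching any of the given names."""
    return re.compile("|".join(re.escape(name) for name in names), re.IGNORECASE)


BLOCKED_RE = _compile(BLOCKED_AGENTS)
ALLOWED_RE = _compile(ALLOWED_AGENTS)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent string matches the blocklist.

    The allowlist is checked first, so a crawler that was added to the
    blocklist by mistake can be exempted without editing the imported list.
    """
    if ALLOWED_RE.search(user_agent):
        return False
    return bool(BLOCKED_RE.search(user_agent))
```

Checking the allowlist first means an upstream list can be pulled in unchanged while exceptions live in one small, separate place.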

[–] Mojeek@lemmy.ml 2 points 1 year ago

thanks a lot for providing this 🙏