this post was submitted on 21 Dec 2023
Fediverse
Reading a post and having a bot thrash a server while indexing everything are two different things. If a user used the site like that, they would be throttled and, if it happened again, banned. Reading and interacting with a site is also different because it adds value to the site as a whole. A bot that just mass-hits links and catalogs everything is only a strain on the server that an admin has to support, with no upside for the instance: it's a bot ingesting content, and no real interaction takes place.
This is a completely separate argument and one that we already have mechanisms for. Servers can use status codes and headers to warn about rate limits and block offenders.
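As a rough illustration of the status-code-and-header approach, here is a minimal sketch of a fixed-window rate limiter that answers with HTTP 429 (Too Many Requests) and a Retry-After header once a client goes over its quota. The `RateLimiter` class, its default limits, and the client identifiers are all hypothetical; real instances would more likely do this at the reverse proxy.

```python
import math
import time


class RateLimiter:
    """Fixed-window limiter: at most `limit` requests per `window` seconds per client."""

    def __init__(self, limit=60, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.windows = {}   # client id -> (window start time, request count)

    def check(self, client):
        """Return (status_code, headers) for one incoming request from `client`."""
        now = self.clock()
        start, count = self.windows.get(client, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # previous window expired, start a fresh one
        count += 1
        self.windows[client] = (start, count)
        if count > self.limit:
            # Tell the client how many seconds remain until the window resets.
            retry_after = max(1, math.ceil(start + self.window - now))
            return 429, {"Retry-After": str(retry_after)}
        return 200, {}
```

A well-behaved crawler is expected to back off for the number of seconds given in Retry-After; one that keeps hammering after repeated 429 responses is a reasonable candidate for a block.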
A search index adds value as well; that's why this keeps coming up. And, again, there are existing mechanisms to handle this. A robots.txt file can indicate you don't want to be crawled, and offenders can be IP blocked.
Should a dedicated search not use/index ActivityPub instead of the HTML interface?
If so, instances can simply defederate from search engine instances. So the point you are trying to make still holds.
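For the robots.txt mechanism mentioned earlier in the thread, here is a minimal sketch of how a polite crawler could check the rules before fetching a page, using Python's standard `urllib.robotparser`. The bot name `ExampleSearchBot`, the host `example.social`, and the rules themselves are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is shut out entirely,
# everyone else is only kept out of /admin/.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `user_agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Note that robots.txt is only a request: it works for crawlers that choose to honor it, which is why IP blocking remains the fallback for offenders.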