this post was submitted on 24 Sep 2025

During some work with Tess, I noticed that my test instance was running horribly slow. The CPU was spiking, and Postgres was not happy, using pretty much all the available compute.

Investigating, I found the culprit to be some crawler or possibly malicious actor sending a massive number of unscoped requests to /api/v3/comment/list. By "unscoped" I mean requests that aren't limited to a post ID. I'm not sure if this is a bug in Lemmy or if there's a legit use for fetching comments outside of a post, but I digress as that's another discussion.

After disallowing unscoped requests to the comment list endpoint (see mitigation further down), the issue went away.

The kicker seemed to be that this bot / jackass was sorting by "Old" and was requesting pages thousands deep.

Requests looked like this: GET /api/v3/comment/list?limit=50&sort=Old&page=16413

Since I shut down Dubvee officially, I'm not keeping logs as long as I used to. I did see other page numbers in the access log, and they were all above 10,000. From the logs I have available, the requests seem to be coming from these 3 IP addresses, but I have insufficient data to confirm this is all of them (it probably isn't).

  • 134.19.178.167
  • 213.152.162.5
  • 134.19.179.211

Log Excerpt

Note that I log the query string as well as the URI. I've run a custom Nginx setup for so long, I actually don't recall if the query string is logged by default or not. If you're not logging the query string, you can still look for the 3 (known) IPs above making requests to /api/v3/comment/list and see if entries similar to these show up.

2025-09-21T14:31:59-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:00-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:01-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:01-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:12-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:13-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:13-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:13-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
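
On the question of whether the query string is logged by default: Nginx's built-in `combined` format logs `$request`, which is the full request line including the query string, so stock setups should already show it. To get it as its own field like in the excerpt above, a custom format along these lines should work. This is only a sketch: the format name `with_query` and the log path are placeholders, and the field list is a guess at a useful subset, not the format used for the excerpt.

```nginx
# Goes in the http context. "with_query" is a placeholder name.
# $uri is the normalized path; $args is the raw query string.
log_format with_query '$time_iso8601 $host, $scheme, $remote_addr, '
                      '$ssl_protocol, $ssl_cipher, '
                      '"$request_method", "$uri", "$args"';

server {
  # Placeholder path; point this at wherever your access log lives
  access_log /var/log/nginx/access.log with_query;
}
```

With the path and query string in separate fields, searching the log for "/api/v3/comment/list" entries that lack post_id becomes a simple text match.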

Mitigation:

First, I blocked the IPs making these requests, but they would come back from a different one. Finally, I implemented a more robust solution.

My final mitigation was to simply reject requests to /api/v3/comment/list that did not have a post ID in the query parameters. I did this by creating a dedicated location block in Nginx that is an exact match for /api/v3/comment/list and doing the checks there.

I could probably add another check to see if the page number is beyond a reasonable number, but since I'm not sure what, if any, clients utilize this, I'm content just blocking unscoped comment list requests entirely. If you have more info / better suggestion, leave it in the comments.

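# Basically an OR: the request must have post_id, saved_only, or both.
# Note: despite the variable's name, 1 means the request is allowed and 0 means it gets rejected.
# The map block belongs in the http context, outside of the server block.
map $has_post_id:$has_saved_only $comment_list_invalid {
  "1:0" 1;
  "0:1" 1;
  "1:1" 1;
  default 0;
}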

server {

...

location = /api/v3/comment/list {

  # You'll need the standard proxy_pass headers such as Host, etc. I load those from an include file.
  include conf.d/includes/http/server/location/proxy.conf;

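  # Create variables to hold a 0/1 state
  # (both must be initialized, otherwise the map key never matches "1:0")
  set $has_post_id 0;
  set $has_saved_only 0;

  # If the query string has a non-empty 'post_id' or 'saved_only', set the matching variable to 1
  # (Nginx treats an empty value or "0" as false)
  if ($arg_post_id) {
    set $has_post_id 1;
  }
  if ($arg_saved_only) {
    set $has_saved_only 1;
  }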

  # If the comment_list_invalid map resolves to 0, "send" a 444 response
  # 444 is an Nginx-specific return code that immediately closes the connection
  # and wastes no further resources on the request
  if ($comment_list_invalid = 0) {
    return 444;
  }

  # Otherwise, proxy pass to the API as normal
  # (replace this with whatever your upstream name is for the Lemmy API).
  # No trailing slash: a URI part on proxy_pass would replace the matched path with "/"
  proxy_pass "http://lemmy-be";
}

}
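
The page-number check floated earlier could be layered on top of the post ID check. A rough sketch, assuming a cutoff of 10,000 pages (picked only because the abusive requests in the logs were all above that, not a tested value) and a hypothetical variable name `$comment_page_too_deep`:

```nginx
# Goes in the http context.
# Flags page numbers of 10000 or more (five or more digits, ignoring leading zeros).
# The regex is quoted because it contains curly braces.
map $arg_page $comment_page_too_deep {
  default                 0;
  "~^0*[1-9][0-9]{4,}$"   1;
}

server {
  location = /api/v3/comment/list {
    # Close the connection on absurdly deep pagination
    if ($comment_page_too_deep) {
      return 444;
    }

    # ... post_id / saved_only checks and proxy_pass as in the config above ...
  }
}
```

Whether any legitimate client pages that deep is an open question, so treat the cutoff as something to tune rather than a recommendation.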
ademir@lemmy.eco.br 9 points 1 month ago* (last edited 1 month ago)

For Cloudflare users:
Security Rules:

(http.request.uri.path eq "/api/v3/comment/list" and not http.request.uri.query contains "post_id")

For Caddy users:

  # >>> Specific handler for /api/v3/comment/list with post_id check
  # Uses handle rather than handle_path: handle_path strips the matched path,
  # so the upstream would receive "/" instead of the original URI
  handle /api/v3/comment/list {
    # Matches when the 'post_id' query parameter is present
    @hasPostId {
      query post_id=*
    }
    # Proxy to the Lemmy API if the parameter is present
    handle @hasPostId {
      reverse_proxy http://localhost:8536
    }
    # Everything that did not match @hasPostId: abort the connection
    handle {
      abort
    }
  }