this post was submitted on 21 Jul 2023

Selfhosted

I have read several statements about closing registrations to manage a server's load. For example:

"If my server gets too big I will just close registrations"

"Server X got too big, so they closed registrations to manage the load"

I understand that this can help small servers that don't have many external users. But how does it help big, popular servers? Don't they still have to serve requests from external users with their own resources? For example, I might self-host a server just for my account but read all my content from lemmy.world. Am I not using their bandwidth and resources anyway?

Bonus question: Does federating with other servers increase the resource usage of my server? What kind of metadata/data do I have to store from each server I federate with?

Thanks!

Max_P@lemmy.max-p.me 9 points 2 years ago

It reduces the load simply because your instance handles most of the traffic, particularly the compute and database work. Media currently still goes to the poster's instance, but there is talk of proxying and caching it locally as well, and CDNs such as CloudFront or Cloudflare can help a lot there.

So let's say we have servers A and B, each with a thousand users, for 2,000 users in total. For the most part, A and B only have to handle their own thousand local users, plus some extra traffic between them for federation. And assuming users use communities on both instances roughly equally, the load of hosting pictures is also spread out between the two instances.

Federating with other servers does add some load (and on theirs as well), because your instance is effectively ingesting all the remote communities' data that your users have subscribed to. But ingesting that once is still much less demanding than thousands of users all requesting the same data. Your instance acts as a cache layer.
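To put rough numbers on the "cache layer" point, here is a back-of-the-envelope model of how many requests reach the instance hosting a post. It is a sketch under simplifying assumptions that are not from the thread: each page view is one request, any instance with at least one reader is assumed to be subscribed, and only the initial delivery of the post is counted (later comments and votes would add further pushes). The instance names and reader counts are hypothetical.

```python
# Back-of-the-envelope model (illustrative assumptions, not measurements):
# - each page view of a post is one request,
# - an instance with at least one reader is assumed to be subscribed,
# - only the initial delivery of the post is counted (no later updates).


def host_requests_without_federation(readers_per_instance: dict[str, int]) -> int:
    """Every reader elsewhere fetches the post from the hosting instance directly."""
    return sum(readers_per_instance.values())


def host_requests_with_federation(readers_per_instance: dict[str, int]) -> int:
    """The host sends one copy to each subscribed instance, which then
    serves all of its own readers from that local copy."""
    return sum(1 for readers in readers_per_instance.values() if readers > 0)


readers = {"A": 1_000, "C": 3_000, "D": 0}  # hypothetical instances and reader counts
print(host_requests_without_federation(readers))  # 4000
print(host_requests_with_federation(readers))     # 2
```

The local instances still do the work of serving their own 1,000 and 3,000 readers; the point is that the host's share no longer grows with the number of remote readers.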

ActivityPub is also a push model: remote instances push content to your instance; you don't pull it from them.
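A minimal sketch of the push model, assuming in-memory objects instead of a network. In real ActivityPub server-to-server federation, delivery is an HTTP POST to each recipient's inbox; here that is replaced by a method call, and the `Instance`/`Community` classes and the activity dict shape are hypothetical simplifications (`"Create"` is a real ActivityStreams activity type, but everything else about the payload is trimmed down).

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A subscribing instance; in this sketch its inbox is just a list."""
    name: str
    inbox: list[dict] = field(default_factory=list)

    def receive(self, activity: dict) -> None:
        # Activities arrive here because the host pushed them; nothing is polled.
        self.inbox.append(activity)


@dataclass
class Community:
    """A community living on its home instance (the 'host')."""
    host: str
    subscribers: dict[str, Instance] = field(default_factory=dict)

    def subscribe(self, instance: Instance) -> None:
        # Keyed by name, so subscribing twice has no extra effect.
        self.subscribers[instance.name] = instance

    def publish(self, activity: dict) -> None:
        # Push model: one delivery per subscribed instance, however many
        # users on that instance will later read the activity.
        for instance in self.subscribers.values():
            instance.receive(activity)


a, c = Instance("A"), Instance("C")
community = Community(host="B")
community.subscribe(a)
community.subscribe(c)
community.publish({"type": "Create", "object": "post-1"})
print(len(a.inbox), len(c.inbox))  # 1 1
```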

aztlantic@lemmy.world 5 points 2 years ago

Does this mean that if user 1 on server A requests a post from server B, server A caches that post, and if user 2 on server A then wants to see the same post, they get the cached version instead of the remote instance pushing it to server A? Is this cache eternal (i.e., is it never deleted from server A), or is that something the spec doesn't address, leaving it up to each server owner?

zanyhog33@lemmy.jcaks.net 5 points 2 years ago

It works a little differently. When someone posts on server B, that post and its comments get blasted out to all subscribed servers. So server A will already have the post cached if anyone on A is subscribed to that community, and that copy on server A is updated whenever activity happens on server B.

Max_P@lemmy.max-p.me 3 points 2 years ago (last edited 2 years ago)

Yes, it's eternal, unless the admin manually purges it.

I also said "cache" for the sake of simplicity, but technically it's not a cache. Every instance gets activity pushed to it pretty much in real time and stores a copy of everything: posts, comments, votes, even moderation actions. So it's more like a massively distributed, multi-primary, eventually-maybe-consistent database than a cache.
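To illustrate "stores a copy of everything", here is a hypothetical sketch of one instance's local replica. Each pushed activity updates local state, and reads are answered only from that state. The activity "kinds" (`post`, `comment`, `vote`, `remove`) are made-up labels for this sketch, not real ActivityPub types, and a real server would persist this in a database rather than in dicts.

```python
from collections import defaultdict
from typing import Optional


class LocalReplica:
    """One instance's own copy of federated content (hypothetical, simplified)."""

    def __init__(self) -> None:
        self.posts: dict[str, str] = {}       # post id -> text
        self.comments = defaultdict(list)     # post id -> list of comment texts
        self.scores = defaultdict(int)        # post id -> net votes
        self.removed: set[str] = set()        # post ids removed by moderators

    def apply(self, activity: dict) -> None:
        """Update the local copy from one pushed activity."""
        kind, post_id = activity["kind"], activity["post"]
        if kind == "post":
            self.posts[post_id] = activity["text"]
        elif kind == "comment":
            self.comments[post_id].append(activity["text"])
        elif kind == "vote":
            self.scores[post_id] += activity["value"]
        elif kind == "remove":
            self.removed.add(post_id)
        else:
            raise ValueError(f"unknown activity kind: {kind!r}")

    def view(self, post_id: str) -> Optional[dict]:
        """Answer a read purely from local data; no remote instance is contacted."""
        if post_id in self.removed or post_id not in self.posts:
            return None
        return {
            "text": self.posts[post_id],
            "comments": list(self.comments[post_id]),
            "score": self.scores[post_id],
        }


replica = LocalReplica()
for act in [
    {"kind": "post", "post": "p1", "text": "Hello"},
    {"kind": "comment", "post": "p1", "text": "Nice"},
    {"kind": "vote", "post": "p1", "value": 1},
    {"kind": "vote", "post": "p1", "value": 1},
]:
    replica.apply(act)
print(replica.view("p1"))  # {'text': 'Hello', 'comments': ['Nice'], 'score': 2}
```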

Apart from the initial preview, which fetches the last 20 posts and no comments, everything is populated purely through ActivityPub messages pushed to every subscribed instance in near real time.
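The one-time preview can be sketched as a small backfill step. This is a hypothetical illustration: it assumes the host's posts are available as a plain list of dicts with `id`, `title`, and `published` fields, which stands in for whatever the real fetch returns. Only the limit of 20 posts and the "no comments" rule come from the comment above.

```python
BACKFILL_LIMIT = 20  # "the last 20 posts", per the comment above


def backfill(host_posts: list[dict], local_posts: dict[str, dict]) -> int:
    """Copy the newest posts from the host into the local store, without comments.

    In this sketch, after this one-time fetch the local store is kept current
    only by activities the host pushes. Returns the number of posts copied.
    """
    newest_first = sorted(host_posts, key=lambda p: p["published"], reverse=True)
    recent = newest_first[:BACKFILL_LIMIT]
    for post in recent:
        # Existing comments are not fetched; only later ones arrive via pushes.
        local_posts[post["id"]] = {"title": post["title"], "comments": []}
    return len(recent)


# Hypothetical host with 50 posts, where a higher "published" value means newer.
host = [{"id": f"p{i}", "title": f"Post {i}", "published": i} for i in range(50)]
local: dict[str, dict] = {}
print(backfill(host, local))  # 20
```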

So users 1 and 2 never ask A to go get a post from B. They simply request a post that's already on A: a copy that was pushed by B and may have been written by a user on C. B is only involved if a user on A comments on the post; A then pushes that comment to B, which pushes it on to C, D, and the others.
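The comment path described above (A to B, then B to C, D, and the rest) can be modeled as a small relay. This is a sketch with hypothetical classes and in-memory calls in place of HTTP delivery; it assumes the host skips forwarding back to the origin instance, since the origin already has the comment.

```python
class Instance:
    """An instance holding its own copy of one post's comments (simplified)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.comments: list[str] = []

    def receive(self, comment: str) -> None:
        self.comments.append(comment)


class CommunityHost(Instance):
    """The instance that owns the community (B in the comment above)."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.subscribers: list[Instance] = []

    def accept(self, origin: Instance, comment: str) -> None:
        # Store the comment, then forward it to every subscribed instance
        # except the one it came from, which already has it.
        self.receive(comment)
        for instance in self.subscribers:
            if instance is not origin:
                instance.receive(comment)


def comment_from(origin: Instance, host: CommunityHost, comment: str) -> None:
    origin.receive(comment)       # local readers on the origin see it immediately
    host.accept(origin, comment)  # one delivery to the host, which fans it out


a, c, d = Instance("A"), Instance("C"), Instance("D")
b = CommunityHost("B")
b.subscribers = [a, c, d]
comment_from(a, b, "Great post!")
print([i.comments for i in (a, b, c, d)])
# [['Great post!'], ['Great post!'], ['Great post!'], ['Great post!']]
```

A sends exactly one message regardless of how many instances subscribe; the fan-out cost sits with B, the community's home instance.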

So 10,000 users viewing a post on A are handled entirely by A, and 20,000 users on C viewing the same post are handled entirely by C. B could have zero users and it would still work perfectly. Similarly, A could host zero communities and rely entirely on B to manage them. B would have very little work to do despite a total of 30,000 users viewing its posts. In fact, B could even go down and A and C would still serve the post and even accept comments and votes. Those would be synchronized once B comes back up; until then, A and C would temporarily have slightly different views of the same post.
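A minimal sketch of why an outage on B doesn't stop A: A stores a new comment locally at once and keeps it in an outgoing queue until B is reachable again. The classes, the `online` flag, and the manual `flush` call are hypothetical simplifications; real servers typically retry failed deliveries automatically with some backoff, which this sketch does not model. Once B does receive the comment, it would forward it to other subscribers such as C; until then, A and C briefly disagree about the post.

```python
from collections import deque


class Host:
    """The community's home instance (B); it can be taken offline."""

    def __init__(self) -> None:
        self.online = True
        self.comments: list[str] = []

    def deliver(self, comment: str) -> None:
        if not self.online:
            raise ConnectionError("host is unreachable")
        self.comments.append(comment)


class Instance:
    """A subscribed instance (A or C) with a queue of undelivered comments."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.comments: list[str] = []
        self.outbox: deque = deque()

    def add_local_comment(self, host: Host, comment: str) -> None:
        self.comments.append(comment)  # served to local readers right away
        self.outbox.append(comment)
        self.flush(host)

    def flush(self, host: Host) -> None:
        """Try to deliver queued comments in order; stop at the first failure."""
        while self.outbox:
            try:
                host.deliver(self.outbox[0])
            except ConnectionError:
                return  # leave the rest queued and retry later
            self.outbox.popleft()


b = Host()
a = Instance("A")
b.online = False
a.add_local_comment(b, "Posted while B was down")
print(a.comments, b.comments)  # ['Posted while B was down'] []
b.online = True
a.flush(b)
print(b.comments)  # ['Posted while B was down']
```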

So the more instances there are, the more distributed everything is. That's why instances that become too large can simply close registrations or even kick their users out: such an instance could become B in this example.