this post was submitted on 15 Oct 2025
65 points (100.0% liked)

Programming.dev Meta


Hi all,

First off, I want to apologize for all the server instability. We long ago outgrew our instance size, but I couldn't afford a larger node on our provider, Vultr. We were maxing out every part of the server whenever there was even a slightly significant amount of activity on the fediverse.

I've finally found the time to migrate us to a new provider, which allows us to step up to a much more powerful configuration. That migration has now been completed. I actually intended to post about the downtime on this community this morning before beginning, but when I went to do so, the server was already down and struggling to come back up. So I went ahead with the migration.

Server before: 4 CPU / 16 GB RAM / 400 GB NVMe
Server after: 8 CPU / 64 GB RAM / 1 TB NVMe

Please update this thread if you are seeing any issues around any part of the site. This means duplicate threads, things that aren't federating, inability to load profiles, etc.

There is still database tuning that needs to occur, so you should expect some downtime here and there, but otherwise the instance should be much more stable from now on.

During this process I also improved several other aspects of operating the server, so any 'actual' downtime should be accompanied by proper maintenance pages (which hopefully won't get wiped by Ansible anymore); those pages will also be a good indicator of legitimate maintenance.

Once again, I really apologize for all of the downtime. It's very frustrating to use a server that operates like this, I understand.

snowe

[–] snowe@programming.dev 12 points 2 days ago (1 children)

programming.dev is the 9th largest lemmy server. https://join-lemmy.org/instances

That stat was probably that low because the server was down for roughly 90% of the last two weeks. If you look now it's at 220, and it will continue to go up.

On top of that, every action on every federated server is relayed to every instance. So all of lemmy.world's activity is still relayed to us and we have to handle it. Same for the other servers.
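To illustrate the point about push-based federation (with made-up activity numbers, not real stats from any of these instances): every instance's inbound load tracks the whole network's activity, regardless of how small the instance itself is.

```python
# Hypothetical illustration of push-based federation load: every federated
# action is delivered to every subscribed instance, so a small server's
# inbound traffic scales with total network activity, not its own user count.

# Made-up per-day activity counts for a few instances (not real figures).
activities_per_day = {
    "lemmy.world": 50_000,
    "lemmy.ml": 12_000,
    "programming.dev": 3_000,
}

def inbound_load(instance: str) -> int:
    # Push model: an instance must ingest everyone's activity, including
    # its own, so the instance name doesn't change the total it receives.
    return sum(activities_per_day.values())

# The smallest instance receives the same firehose as the largest one.
assert inbound_load("programming.dev") == inbound_load("lemmy.world") == 65_000
```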

On top of that we also operate many other services:

  • bytes.programming.dev
  • git.programming.dev
  • blocks.programming.dev
  • etc (there's a lot)

But really it was mostly just Postgres thrashing on all the requests. Here's a look at our Cloudflare dashboard for number of requests:

[Cloudflare dashboard screenshot: request counts]

Yes, this should be handleable by a server that small (think actor paradigm), but I was unable to tune Postgres to get it to that point, as I'm not great at database work; I'm sure a DBA would have done a better job. I will note that some of the queries in the Lemmy code are very badly optimized and were taking 20+ seconds per run, locking up the instance. With that on top of some other badly optimized selects for things like reading comments (around 7 s mean), there wasn't much I could do.
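For anyone facing similar Postgres tuning on a box of this size, the usual starting knobs look something like the sketch below. These are generic rules of thumb for an 8-core / 64 GB NVMe server, not the settings this instance actually uses:

```ini
# Illustrative postgresql.conf starting points only (common rules of thumb),
# not programming.dev's real configuration.
shared_buffers = 16GB                 # ~25% of RAM
effective_cache_size = 48GB           # ~75% of RAM; planner hint, not an allocation
work_mem = 64MB                       # per sort/hash node; multiply by concurrency
maintenance_work_mem = 2GB            # vacuum / index builds
max_parallel_workers_per_gather = 4
random_page_cost = 1.1                # NVMe: random reads are nearly as cheap as sequential
```

From there, `EXPLAIN (ANALYZE, BUFFERS)` on the worst queries (like the 20+ second ones mentioned above) is the standard way to find missing indexes or bad plans.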

With the cost difference, it was well worth it to just move to a server that is both cheaper and better all around.

[–] fuzzzerd@programming.dev 4 points 1 day ago (1 children)

For all of the attention in the early days about Lemmy being Rust-based and thus focused on performance, the database seems to be the main bottleneck, and from anecdotal monitoring of other admins' complaints I'd say that seems true.

Seems like some design decisions led to heavy database usage, and it's going to be really hard to optimize away from that.

I don't really have a better idea; I'm just acknowledging that even a small instance has to scale disproportionately to its own size when the rest of the network grows, and that load lands on the database specifically.

[–] BB_C@programming.dev 4 points 1 day ago

The push-based ActivityPub (apub) federation itself is bad design anyway. Something pull-based with aggregation and well-defined synchronisation would have been much better.
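A toy sketch of what pull-based federation with aggregation could look like (all names and the protocol shape here are hypothetical, not any existing proposal): instead of being pushed every individual event, an instance periodically asks a peer for everything past its last-seen cursor, batched into one response.

```python
# Hypothetical pull-based federation sketch: the follower polls on its own
# schedule and receives an aggregated batch, rather than one delivery per event.
from dataclasses import dataclass, field

@dataclass
class Peer:
    # Append-only event log of (sequence_number, payload) pairs.
    events: list = field(default_factory=list)

    def publish(self, payload: str) -> None:
        self.events.append((len(self.events) + 1, payload))

    def changes_since(self, cursor: int):
        """Return (new_cursor, batch) of all payloads after `cursor`."""
        batch = [payload for seq, payload in self.events if seq > cursor]
        return len(self.events), batch

@dataclass
class Follower:
    cursor: int = 0
    inbox: list = field(default_factory=list)

    def sync(self, peer: Peer) -> None:
        # One request per sync interval, however many events occurred.
        self.cursor, batch = peer.changes_since(self.cursor)
        self.inbox.extend(batch)

peer, follower = Peer(), Follower()
for i in range(5):
    peer.publish(f"post-{i}")
follower.sync(peer)  # a single pull fetches all five events at once
assert follower.inbox == ["post-0", "post-1", "post-2", "post-3", "post-4"]
```

The cursor gives the well-defined synchronisation point: a follower that was offline simply catches up on its next pull, and the peer's outbound work is one response per poll instead of one delivery per event per subscriber.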

There are ideas beyond that. For example, complete separation between content and moderation. But that would diverge from the decentralized family of protocols apub belongs to, and may not attract a lot of users and traffic. And those who care and don't mind smaller networks prefer fully distributed solutions anyway.