this post was submitted on 15 Oct 2025

65 points (100.0% liked)

Programming.dev Meta

2691 readers

35 users here now

Welcome to the Programming.Dev meta community!

This is a community for discussing things about programming.dev itself. Things like announcements, site help posts, site questions, etc. are all welcome here.

Credits

Icon base by Lorc under CC BY 3.0 with modifications to add a gradient

founded 2 years ago

MODERATORS

snowe@programming.dev

Ategon@programming.dev

Server migration has been completed (mastodon.social)

submitted 1 day ago by snowe@programming.dev to c/meta@programming.dev


Hi all,

First off, I want to apologize for all the server instability. We long ago outgrew our instance size, but I was unable to afford a larger node on our provider, Vultr. We were maxing out every part of the server whenever any even slightly significant number of users were on the fediverse.

I've finally found the time to migrate us to a new provider, which allows us to step up to a much more powerful configuration. That migration has now been completed. I actually intended to post about the downtime on this community this morning before beginning, but when I went to do so, the server was already down and struggling to come back up. So I went ahead with the migration.

Server before: 4 CPU / 16GB / 400GB NVMe
Server after: 8 CPU / 64GB / 1TB NVMe

Please update this thread if you are seeing any issues around any part of the site. This means duplicate threads, things that aren't federating, inability to load profiles, etc.

There is still database tuning that needs to occur, so you should expect some downtime here and there, but otherwise the instance should be much more stable from now on.

During this process I also improved several other aspects of operating the server. Any 'actual' downtime should now be accompanied by proper maintenance pages (that hopefully don't get wiped by Ansible anymore), which will also be a good indicator of legitimate maintenance.

Once again, I really apologize for all of the downtime. It's very frustrating to use a server that operates like this, I understand.

snowe

top 21 comments


[–] kogasa@programming.dev 6 points 1 day ago

Hetzner is a good provider. Congrats on the migration and good luck.

[–] Kissaki@programming.dev 5 points 1 day ago* (last edited 1 day ago) (1 children)

When programming.dev became practically unusable most of the time over the last several days, I was considering moving elsewhere. The lack of any post or announcement here in meta even acknowledging the issue didn't make me hopeful that the issues would subside at all. (I've experienced instance death before on feddit.de.)

Great to see this development.

Thank you for your continued work and efforts! 👍

[–] fuzzzerd@programming.dev 2 points 1 day ago

Felt the same way, and when I had time and thoughts to post the server was down. Glad to see it back up with an upgrade.

[–] tatterdemalion@programming.dev 11 points 1 day ago (1 children)

Very happy to hear this. I was noticing frequent slowness recently. I never really got to the point of considering leaving because I'm not on here that much anyway. But I do sponsor your GitHub in a small way. I hope the sponsorships cover at least the hosting cost.

You might want to update your GitHub sponsor page as it sounds like some of that info is out of date after this migration.

[–] snowe@programming.dev 8 points 1 day ago

it is helping, thank you for the sponsorship. I should have migrated a long time ago because the costs really were adding up. I'll update my sponsor page after I have a fresh month of data for the bucket costs (which are still on Vultr) and the new server costs (which hopefully should be static). Thanks for the suggestion!

[–] irelephant@programming.dev 1 points 1 day ago

Great!

[–] ruffsl@programming.dev 9 points 1 day ago (1 children)

Thank you for all the work that goes into maintaining this instance!

[–] snowe@programming.dev 6 points 1 day ago

and thank you for being a great contributor to the community! this site would be nothing without all of you!

[–] mark@programming.dev 6 points 1 day ago

Thanks, boss. You are a good man. 🕺

[–] Scoopta@programming.dev 7 points 1 day ago (1 children)

Out of curiosity, what provider did you guys move to that gives better bang for the buck than Vultr?

[–] snowe@programming.dev 14 points 1 day ago

Hetzner. Honestly every provider was cheaper. I literally didn't find a single provider that was even close to as expensive as Vultr. You can look at Vultr's deploy page here (might need to be logged in for that). For 16GB of RAM on any product, the minimum cost is $80 a month. We were paying $120+.

It's honestly crazy how expensive Vultr is. The servers might have better processors, didn't really check that, but all our performance depends on RAM and cores, so none of that really matters.

Also was able to get 64GB of ECC RAM on Hetzner. No clue if Vultr provided that, but they don't list it anywhere.

Providers I looked at:

Scaleway
Hetzner
Contabo
Netcup
OVH
Space Hosting
I think one more, but can't remember it right now.

[–] targetx@programming.dev 5 points 1 day ago

Thanks, it seems a lot faster!

[–] popcar2@programming.dev 5 points 1 day ago

Fantastic, thanks! I might be spending more time on Lemmy after all.

[–] bitcrafter@programming.dev 4 points 1 day ago (1 children)

Oh, cool, to be honest I was actually in the process of transitioning to another instance due to the instability, but it sounds like I may not need to!

Thanks a lot!

[–] snowe@programming.dev 5 points 1 day ago

sorry for all the trouble. I would understand it if you still do, as I haven't been the best operator.

[–] ISO@lemmy.zip 4 points 1 day ago (1 children)

Good news. Thank you.

And it's good to hear that it wasn't something nefarious messing with the instance.

[–] snowe@programming.dev 5 points 1 day ago

Oh we are getting attacked constantly, that definitely didn't help, but a large portion of it was just thrashing from postgres not getting enough memory.
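
For anyone wondering what "postgres not getting enough memory" means in sizing terms, here is a rough sketch. It is not the configuration used on programming.dev. `shared_buffers` and `effective_cache_size` are real PostgreSQL settings, but the helper `suggest_postgres_memory` is hypothetical. The 25% figure is the commonly cited starting point for a dedicated database server, and the 75% figure is a popular heuristic. A box that also runs Lemmy and other services would likely want less.

```python
# Rule-of-thumb starting points only; assumptions, not the instance's real settings.
def suggest_postgres_memory(total_ram_gb: int) -> dict[str, str]:
    """Return rough starting values for two PostgreSQL memory settings."""
    # ~25% of RAM for PostgreSQL's own buffer cache.
    shared_buffers_gb = max(1, total_ram_gb // 4)
    # ~75% of RAM as the planner's estimate of total cache available (heuristic).
    effective_cache_size_gb = max(1, total_ram_gb * 3 // 4)
    return {
        "shared_buffers": f"{shared_buffers_gb}GB",
        "effective_cache_size": f"{effective_cache_size_gb}GB",
    }


# Compare the old (16GB) and new (64GB) server sizes mentioned in the post.
for ram_gb in (16, 64):
    print(ram_gb, suggest_postgres_memory(ram_gb))
```

The point is only that the old 16GB box left a few GB at most for the database cache under these heuristics, while 64GB leaves far more headroom before the working set spills to disk.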

[–] entwine@programming.dev 3 points 1 day ago* (last edited 1 day ago) (1 children)

Sidebar stats say this instance gets 23 users/day, which seems absolutely tiny and within the capabilities of a 4c/16GB cloud instance.

"We were maxing out every part of the server whenever any even slightly significant number of users were on the fediverse."

Idk anything about how lemmy/fediverse works, but does that mean tiny instances like this get hit when the rest of the network is experiencing high load? Seems problematic.

EDIT: btw, thanks for the free service and the effort you put in to keep it running!

[–] snowe@programming.dev 12 points 1 day ago (1 children)

programming.dev is the 9th largest lemmy server. https://join-lemmy.org/instances

That stat was probably that low due to the server being down for around 90% of the last two weeks. If you look now it's at 220 and it will continue to go up.

On top of that, every action on every server that is federated is relayed to every instance. So all of lemmy.world's activity is still relayed to us and we have to handle it. Same for the other servers.
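
To make the relay point concrete, here is a minimal sketch under the simplified model described above, where every federated action reaches us. The function name `inbound_activities` and all the numbers are hypothetical and purely illustrative. They are not measurements from any real instance.

```python
def inbound_activities(local_actions: int, remote_actions: dict[str, int]) -> int:
    """Activities our database must process in the simplified relay-everything model."""
    # Load is local activity plus everything relayed from federated instances.
    return local_actions + sum(remote_actions.values())


# Hypothetical daily action counts, for illustration only.
remote = {"lemmy.world": 50_000, "other-instance.example": 10_000}
total = inbound_activities(local_actions=500, remote_actions=remote)
print(total)  # 60500
```

In this toy example the local users account for under 1% of the work, which is why a low local user count says little about the load on the server.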

On top of that we also operate many other services:

bytes.programming.dev
git.programming.dev
blocks.programming.dev
etc (there's a lot)

But really it was mostly just postgres thrashing on all the requests. Here's a look at our Cloudflare dashboard for number of requests:

Yes, this should be handleable by a server that small (think actor paradigm), but I was unable to tune postgres to get it to that point as I'm not great at database stuff. I'm sure a DBA would have done a better job. I will note that some of the queries used in the lemmy code are very badly optimized and were taking 20+ seconds to run each time, locking up the instance. With that on top of some other badly optimized selects for things like reading comments (which took around 7s on average), there wasn't much I could do.

With the cost difference, it was well worth it to just upgrade to a cheaper, better server all around.

[–] fuzzzerd@programming.dev 4 points 1 day ago (1 children)

For all of the attention in the early days about Lemmy being Rust-based and thus focused on performance, the database seems to be the main bottleneck, and from anecdotal monitoring of the other admins' complaints I'd say that seems true.

Seems like some design issues lead to heavy database usage, and it's going to be really hard to optimize away from that.

I don't really have a better idea, just acknowledging that even a small instance has to scale disproportionately to its size when the rest of the network grows, and that's heavy on the database specifically.

[–] BB_C@programming.dev 4 points 1 day ago

The push-based ActivityPub (apub) federation itself is bad design anyway. Something pull-based with aggregation and well-defined synchronisation would have been much better.
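
As a rough illustration of the push versus pull-with-aggregation idea, here is a toy count of inbound requests. The comment above doesn't specify a protocol, so the batching model, the function names, and the numbers are all assumptions for the sake of the sketch. They don't describe ActivityPub or any existing alternative.

```python
import math


def push_requests(activities: int) -> int:
    """Push model (simplified): one inbound delivery request per activity."""
    return activities


def pull_requests(activities: int, batch_size: int) -> int:
    """Pull model with aggregation (assumed): one fetch per batch of activities."""
    return math.ceil(activities / batch_size)


# Hypothetical numbers: 10,000 remote activities, fetched 500 at a time.
print(push_requests(10_000))       # 10000
print(pull_requests(10_000, 500))  # 20
```

This only counts HTTP requests. The receiving database still has to apply every activity either way, so batching would cut request overhead but not the underlying write volume.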

There are ideas beyond that. For example, complete separation between content and moderation. But that would diverge from the decentralized family of protocols apub belongs to, and may not attract a lot of users and traffic. And those who care and don't mind smaller networks prefer fully distributed solutions anyway.
