Hi there, I just wanted to quickly share an issue that we worked through to do with database corruption. Perhaps there are some learnings or improvements that can be made to the backend handling of database connections/transactions to stop this happening in the future?
Problem
The activity table in lemmy db had 1 duplicated record, which we noticed when trying to reindex the db.
Cause
We suspect that the cause was due to duplicating the docker stack that contained our lemmy services. We are running these as docker swarm services. I think there was a short moment in time where two lemmy backend services existed, and were on the same network as the database, and therefore they both wrote to the activity table at the same time. afaik, it should not be possible to do this, if transactions are being used.
The second potential cause was stopping the old stack, but we're pretty sure that the services are meant to be stopped gracefully, but this is worth checking.
Fix
we copied out the activity table to a new table, and then copied all of the records to another table that did not have a unique key constraint. this allows you to search for the duplicate keys and remove them. verify that the duplicate is removed, then we performed the operation on the activity table and reindexed it. this resolved the issue.
tldr
be careful when stopping your stack, and also when duplicating it. definitely set up regular backups of your db, and I also feel that some regular backups should be built into the lemmy postgres image.