this post was submitted on 23 Nov 2023
Data Hoarder

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time (tm) ). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

[–] dr100@alien.top 1 points 11 months ago (4 children)

This is why you ALWAYS need INDEPENDENT backups. You can think all day long about detecting bitrot and how well you're protected against X drive failures, but then something comes from the side and messes up your data in a different way than you've foreseen.

[–] imakesawdust@alien.top 1 points 11 months ago (1 children)

The problem here is that those independent backups would also be corrupted. As I understand the GitHub discussion, the issue may be a bug that causes ZFS not to recognize when a page is dirty and needs to be flushed, and it is somehow triggered when copying files with a new-ish optimization that has been implemented in the Linux and *BSD kernels. If you trigger the bug while copying a file, the original stays kosher but the new file has swaths of bad data. Any backup made after that point would contain both the (good) original and the (corrupted) copy.
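If the optimization in question is the kind of in-kernel copy offload that newer cp implementations can use via copy_file_range(2), the copy that ends up with bad data goes through a path roughly like this sketch (assuming Linux and Python 3.8+; the paths and function name are made up):

```python
import os

def offloaded_copy(src_path: str, dst_path: str) -> None:
    """Copy a file via copy_file_range(2): the kernel/filesystem moves the data
    (possibly as a reflink/clone) instead of the application read/write looping."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        while remaining > 0:
            # Returns the number of bytes actually copied; may be short, so loop.
            copied = os.copy_file_range(src.fileno(), dst.fileno(), remaining)
            if copied == 0:
                break  # unexpected end of source
            remaining -= copied

# Hypothetical usage:
# offloaded_copy("collection/photos.tar", "reorganized/photos.tar")
```

The application never sees the bytes, which is why a copy that went wrong inside the filesystem can only be caught by reading the new file back afterwards.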

[–] dr100@alien.top 1 points 11 months ago

The point is that with independent backups you'd still have the originals, which you might in the meantime have removed locally (for example, if you were reorganizing a huge collection, started by working on the reflinked copy, and in the end removed the original as the natural cleanup step; not many would think to check the results after a nearly instant reflinked copy, let alone foresee that if there's some bitrot it'll come from THAT).

Sure, in this case snapshots would have worked just as well, but of course there are other cases in which they wouldn't have. Independent backups cover everything, assuming you keep enough history, which is another discussion (after removing some old important file by mistake I considered literally keeping history forever, but it becomes too daunting, and too tempting to prune files that were deleted one, two, three years ago).
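For what it's worth, that "check the results" step can be as cheap as a full read-back comparison before the cleanup. A minimal sketch (Python; the helper name and paths are hypothetical):

```python
import filecmp
from pathlib import Path

def cleanup_original(original: Path, reflinked_copy: Path) -> None:
    """Remove the original only after the copy has been read back and matches byte for byte."""
    # shallow=False forces a full content comparison instead of trusting stat() metadata.
    if filecmp.cmp(original, reflinked_copy, shallow=False):
        original.unlink()
    else:
        raise RuntimeError(f"copy of {original} does not match, keeping the original")

# Hypothetical usage in the reorganization workflow described above:
# cleanup_original(Path("old-layout/disc.iso"), Path("new-layout/disc.iso"))
```

A checksum-based compare works just as well; the point is only that the read-back happens before the original goes away.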

[–] quint21@alien.top 1 points 11 months ago

> something comes from the side and messes up your data in a different way than you've foreseen.

This happened to me years ago. I naïvely thought SnapRAID protected me against drive failure. I wasn't prepared for two drives failing simultaneously because a power supply failed catastrophically (smoke, sparks) and fried the drives as it died.

It was an expensive lesson: I had to send one drive off for data recovery, and after I got it back I used SnapRAID to restore the remaining drive. Independent backups (and multiple parity drives) are the way.

[–] katbyte@alien.top 1 points 11 months ago

Also, have an independent way to verify files. I run cfv on everything before a big move and then again afterwards to check.
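For anyone without cfv handy, the same before-and-after check can be approximated with a small checksum manifest. A rough sketch (Python; the function names, paths, and manifest name are made up):

```python
import hashlib
import sys
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Stream the file through SHA-256 so large files don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def make_manifest(root: Path, manifest: Path) -> None:
    """Record a checksum for every file under root (run before the big move)."""
    with manifest.open("w") as out:
        for p in sorted(root.rglob("*")):
            if p.is_file():
                out.write(f"{file_sha256(p)}  {p.relative_to(root)}\n")

def verify_manifest(root: Path, manifest: Path) -> bool:
    """Re-hash everything after the move and compare against the recorded manifest."""
    ok = True
    for line in manifest.read_text().splitlines():
        digest, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file() or file_sha256(target) != digest:
            print(f"MISMATCH: {rel}", file=sys.stderr)
            ok = False
    return ok

# Hypothetical usage:
# make_manifest(Path("/mnt/old_pool/media"), Path("media.sha256"))
# ...do the move...
# verify_manifest(Path("/mnt/new_pool/media"), Path("media.sha256"))
```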

[–] henry_tennenbaum@alien.top 1 points 11 months ago

Wait. Are you trying to say that RAID is not a backup?