this post was submitted on 24 Sep 2025
Linux
You're asking the right questions, and there have been some great answers on here already.
I work at the crossover between IT and digital preservation in a large GLAM institution, so I'd like to offer my perspective. Sorry if there are any peculiarities in my comment, English is my 2nd language.
First of all (and as you've correctly realized), compression is an antipattern in DigiPres and adds risk that you should only accept if you know what you're doing. Some formats do offer integrity information (MKV/FFV1 for video comes to mind, or the BagIt archival information package structure), including formats that use lossless compression, and these should be preferred.
You might want to check this to find a suitable format here: https://en.wikipedia.org/wiki/List_of_archive_formats -> Containers and compression
Depending on your file formats, it might not even be beneficial to use a compressed container, e.g. if you're archiving photos/videos that already exist in compressed formats (JPEG/JFIF, h.264, ...).
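If you want to check this for your own material before deciding, here is a minimal sketch. It assumes that compressing a sample from the start of a file with zlib is a good enough indicator; the function name `compression_ratio` and the 4 MiB sample size are just picked for illustration.

```python
import zlib
from pathlib import Path


def compression_ratio(path: Path, sample_bytes: int = 4 * 1024 * 1024) -> float:
    """Return compressed size / original size for the first sample_bytes of a file.

    A value close to 1.0 means the data barely shrinks, which is typical
    for payloads that are already compressed (JPEG, h.264, ...).
    """
    with path.open("rb") as f:
        sample = f.read(sample_bytes)
    if not sample:
        # Empty file: nothing to gain, report "no reduction".
        return 1.0
    return len(zlib.compress(sample, 6)) / len(sample)
```

If the ratio stays near 1.0 for most of your files, the compressed container is probably adding risk without saving much space.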
You can make your data more resilient by choosing appropriate formats not only for the compressed container but also for the payload itself. Find the significant properties of your data and pick formats accordingly, not the other way round. Convert before archival if necessary (the term is normalization).
You might also want to consider reducing the risk of losing your entire archive by compressing each file individually. Bit rot is a real threat, and you probably want to limit the impact of flipped bits. Error rates for spinning HDDs are well studied and understood, and even relatively small archives tend to be within the size range for bit flips. I can't seem to find the sources just now, but iirc, it was something like 1 bit in 1.5 TB for disks at write time.
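To illustrate the per-file idea, here is a rough sketch using Python's standard `lzma` module. The choice of xz is only an example and not a format recommendation, and `compress_each_file` is a made-up helper name. Each input file becomes its own `.xz` file, so a damaged file does not take its neighbours down with it.

```python
import lzma
import shutil
from pathlib import Path


def compress_each_file(src_dir: Path, dst_dir: Path) -> list[Path]:
    """Write one .xz file per input file, mirroring the directory layout.

    Assumes dst_dir lies outside src_dir.
    """
    written = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        dst = dst_dir / src.relative_to(src_dir)
        dst = dst.with_name(dst.name + ".xz")
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Stream the data so large files do not have to fit into memory.
        with src.open("rb") as fin, lzma.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(dst)
    return written
```

Python's `lzma` module writes the xz container with a CRC64 integrity check by default, so corruption is at least detected when you decompress. It will not repair anything, though.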
Also, there's only so much you can do against bit rot on the format side, so consider using a filesystem that allows you to run regular scrubs, and then actually run them; ZFS or Btrfs come to mind. If you use a more "traditional" filesystem like ext4, you could at least add checksum files for all of your archival data that you can then use as a baseline for more manual checks, but these won't help you repair damaged payload files. You can also create BagIt bags for your archive contents, because bags come with fixity mechanisms included. See RFC 8493 (https://datatracker.ietf.org/doc/html/rfc8493). There are even libraries and software that help you verify the integrity of bags, so that may be helpful.
The disk hardware itself is a risk as well; having your disk lying around for prolonged periods of time might have an adverse effect on bearings etc. You don't have to keep it running every day, but regular scrubs might help to detect early signs of hardware degradation. Enable SMART if possible. Don't skimp on disk quality. If at all possible, purchase two disks (different make & model) to store the information.
DigiPres is first and foremost a game of risk reduction and an organizational process, even if we tend to prioritize the technical aspects of it. Keep that in mind at all times.
And finally, I want to leave you with some reading material on DigiPres and personal archiving in general.
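For the checksum file approach, something along these lines would do as a starting point. It's a minimal sketch that assumes SHA-256 and a single manifest file per archive directory; the helper names (`sha256_of`, `write_manifest`, `verify_manifest`) are made up for this example. As mentioned, it only detects damage, it can't repair it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, manifest: Path) -> None:
    """Record one '<sha256>  <relative path>' line per file below root."""
    lines = []
    for p in sorted(q for q in root.rglob("*") if q.is_file()):
        if p.resolve() == manifest.resolve():
            continue  # do not checksum the manifest itself
        lines.append(f"{sha256_of(p)}  {p.relative_to(root).as_posix()}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(root: Path, manifest: Path) -> list[str]:
    """Return the relative paths that are missing or whose checksum changed."""
    damaged = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel)
    return damaged
```

Run `write_manifest` once after ingest and `verify_manifest` on a schedule; any path it returns should be restored from your second copy.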
I've probably forgotten a few things (it's late...), but if you have any further questions, feel free to ask.
EDIT: I replied in a similar thread a few months ago, see https://sh.itjust.works/comment/13922388
Thanks for your post! Even though I didn't ask the question, your post was super informative and I really learned something :) Especially the perspective on how your field approaches the topic is very valuable for laypeople who want to get a feel for the important aspects! (and I'm guessing from the username that you can speak German haha)
I'll stick with English anyway so it's understood in the English thread.
ENGLISH: Yeah, you're right, I wasn't particularly on-topic there. :D I tried to address your underlying assumptions as well as the actual file format question, and it kinda derailed from there.
Sooo, file format... I think you're restricting yourself too much if you just use the formats that are included in binutils. Also, you have conflicting goals there: it's compression (make the most of your storage) vs. resilience (have a format that is stable in the long term). Someone here recommended `lzip`, which is definitely a good answer if you're after a good compression ratio. The Wikipedia article I linked features a table that compares compressed archive formats, so that might be a good starting point to find resilient formats. Look out for formats with at least Integrity Check and possibly Recovery Record, as these seem to be more important than compression ratio. When you have settled on a format, run some tests to find the best compression algorithm for your material. You might also want to measure throughput/time while you're at it to find variants that offer a reasonable compromise between compression and performance. If you're so inclined, try to read a few format specs to find suitable candidates. You're generally looking for formats that:
You might want to read up on more technical information on how an actual archive handles these challenges at https://slubarchiv.slub-dresden.de/technische-standards-fuer-die-ablieferung-von-digitalen-dokumenten and the PDF files with specifications linked there (all in German).
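As for the "run some tests" part, a quick sketch of what such a test could look like is below. It assumes you feed it a representative sample of your own material and only compares the three codecs that ship with Python's standard library (zlib, bz2, lzma) as stand-ins; `CODECS` and `benchmark` are names invented for this example, and lzip itself is not covered.

```python
import bz2
import lzma
import time
import zlib

# Codecs from the Python standard library, each at a high compression setting.
CODECS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data),
}


def benchmark(data: bytes) -> dict[str, tuple[float, float]]:
    """Map codec name to (compressed/original size ratio, seconds taken).

    Expects non-empty data.
    """
    results = {}
    for name, compress in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = (len(packed) / len(data), elapsed)
    return results
```

Timings from a single run are noisy, so repeat the measurement a few times and test on files that actually resemble your archive before drawing conclusions.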
Just note that @RiverRabbits@lemmy.blahaj.zone wasn't the one who opened the thread, that's why they said they didn't ask the question (I get the feeling there might have been some confusion here :P ).
Still, very informative comment.
Haha, yeah I'm not the OP! But the way my German is phrased here and how the replier interpreted it would read as super passive aggressive (think "I didn't ask that question but thanks"), and for that I apologize 😭 I just meant I'm not the OP😌
Oh yeah, there really was, thank you. :)