this post was submitted on 29 Jun 2025
Linux Gaming
According to Google, burned CDs and DVDs retain data for 5-10 years.
SSDs last anywhere from a few years to a few decades, depending on the age, type and quality of the SSD. The same goes for USB sticks.
HDDs last between 10 and 20 years.
Tapes last 30+ years.
I have CD-R demo discs from the early 2000s that still play fine. Also, according to Wikipedia: "On July 3, 1991, the first recording of a concert directly to CD was made using a Yamaha YPDR 601. The concert was performed by Claudio Baglioni at the Stadio Flaminio in Rome, Italy. At that time, it was generally anticipated that recordable CDs would have a lifetime of no more than 10 years. However, as of July 2020 the CD from this live recording still plays back with no uncorrectable errors."
Edit: Yes, a tape drive would be ideal, but I'm poor af.
It's always a game of statistics.
You might have some 20-year-old discs that play fine, but there are enough 10-year-old discs that don't. Also, especially with audio discs, some data loss won't be noticeable. You could probably lose up to 10% of the data on the CD without hearing much of a difference.
Things are very different for data storage, though. There, losing a single bit (e.g. of an encrypted or compressed file) might make the whole file unreadable. And if it's a critical file, that might make the whole disk useless.
The audio CD is a very low-density format. A lot of the data on there doesn't really matter (which is why an MP3 CD can easily hold 6 times as much audio as a regular, uncompressed audio CD). This low data density creates redundancy.
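For a sense of scale: Red Book CD audio is 44.1 kHz, 16-bit, stereo PCM. The sketch below computes that bitrate and how many times more playing time fits on the same disc at a few common MP3 bitrates. The MP3 bitrates are illustrative picks, not what any particular MP3 CD uses, and the ratios ignore filesystem and container overhead.

```python
# Red Book audio CD: 44.1 kHz sample rate, 16 bits per sample, 2 channels (stereo).
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Uncompressed PCM bitrate of an audio CD: 1,411,200 bit/s.
cd_bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def capacity_ratio(mp3_kbps: int) -> float:
    """How many times more playing time fits on the disc at this MP3 bitrate.

    Illustrative only: ignores filesystem and container overhead.
    """
    return cd_bitrate_bps / (mp3_kbps * 1000)


# Example bitrates chosen for illustration, not taken from the thread.
for kbps in (128, 192, 256, 320):
    print(f"{kbps} kbps MP3: ~{capacity_ratio(kbps):.1f}x the audio of a Red Book CD")
```

By this arithmetic, "6 times as much" corresponds to an average MP3 bitrate of roughly 235 kbps.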
The data retention values above aren't about "After X years all of the data disappears" but about "This is how long the data will be fully retained without a single bit of data loss".
I also have HDDs from ~2000 that still work fine. Probably the oldest piece of tech I own is a Game Boy, which has its BIOS in a ROM, and it still works fine even though it's more than 30 years old now. But for one, I don't own enough Game Boys to know whether mine is an outlier, and I don't have the means to check whether every single bit in that ROM is still identical to the original.
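For your own backups, a practical way to check "not a single bit lost" is to record a cryptographic hash when you write the data and recompute it later. A minimal sketch using Python's standard `hashlib`; the helper names are made up for illustration, and this only detects damage, it can't repair it.

```python
import hashlib
from pathlib import Path


def sha256_of_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of an in-memory buffer."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large disc images don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_intact(path: Path, recorded_hex: str) -> bool:
    """True only if every bit of the file still matches the digest recorded earlier."""
    return sha256_of_file(path) == recorded_hex


if __name__ == "__main__":
    original = bytes(range(256)) * 4
    flipped = bytearray(original)
    flipped[500] ^= 0x01  # flip a single bit
    print(sha256_of_bytes(original) == sha256_of_bytes(bytes(flipped)))  # False
```

Typical use would be to store the output of `sha256_of_file` next to a backup image when you burn it (the file name is up to you), then call `is_intact` on the copy you read back years later.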
Sorry for mostly focusing on paragraph 3, but I have to: MP3 CDs sound way worse than a Red Book audio CD. You can losslessly compress PCM by about 50% with a codec like FLAC or ALAC, but there is data loss if you use a lossy format like MP3. You could compress 20 vacation photos taken with an iPhone 16 to fit on a 1.44 MB floppy disk and you'd have something resembling the original data, but I think you'll agree it's worse. Back to my original point: a CD-R is much more likely to retain data for 5 years than an SSD is, unless the SSD is periodically powered on, of course. I actually have an HDD from 2008 in my PC. I'm often impressed by how long they can last.
Sure, lossy compression is lossy, but that wasn't my point. My point was that data corruption in information-dense formats is more critical than in low-density formats.
To take your example of the vacation photos: If you have a 100 megapixel HDR photo and you lose 100 bytes of data, you will lose a few pixels and you won't even notice the change unless you zoom in quite far.
If you compress these pictures down to fit on the floppy from your example (that would be ~73 KB per photo), losing 100 bytes of data will be very noticeable in the picture, since you just lost ~0.1% of the whole file. Ignoring the specifics of compression algorithms, that's roughly 1 in every 1000 pixels, which is a lot.
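The arithmetic behind those two cases, as a quick sketch. It assumes 3 bytes per uncompressed pixel (8-bit RGB) for the 100-megapixel photo and a standard "1.44 MB" floppy of 1440 KiB; both are illustrative assumptions, not figures from the thread.

```python
# Assumptions (illustrative): 8-bit RGB = 3 bytes per uncompressed pixel,
# and a "1.44 MB" floppy holding 1440 KiB = 1,474,560 bytes.
BYTES_PER_PIXEL = 3
FLOPPY_BYTES = 1440 * 1024
LOST_BYTES = 100

# Case 1: one uncompressed 100-megapixel photo. 100 lost bytes touch ~33 pixels.
big_photo_bytes = 100_000_000 * BYTES_PER_PIXEL
pixels_hit = LOST_BYTES / BYTES_PER_PIXEL
big_fraction = LOST_BYTES / big_photo_bytes  # a tiny share of the file

# Case 2: twenty photos squeezed onto one floppy, so each gets 73,728 bytes.
per_photo_bytes = FLOPPY_BYTES // 20
small_fraction = LOST_BYTES / per_photo_bytes  # roughly 0.14% of the file

print(f"100 MP photo: ~{pixels_hit:.0f} pixels hit, {big_fraction:.7%} of the file")
print(f"floppy photo: {per_photo_bytes} bytes each, {small_fraction:.2%} of the file lost")
print(f"the same 100 bytes hurt the small file ~{small_fraction / big_fraction:.0f}x more")
```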
High-resolution, low-information-density formats allow for quite a lot of damage before it becomes critical.
High information density formats on the other hand are quite vulnerable to critical data loss.
To show what I mean, take this image:
I saved it as BMP and then ran a script over it that replaces 1% of all bytes with a random byte. This is the result:
(I had to convert the result back to jpg to be able to upload it here.)
So even with a total of 99865 bytes replaced with random values, the image of an apple is clearly visible. There are a few small noise spots here and there, but the overall picture is still fine and if you print it as a photo, it's likely that these spots won't even be visible.
For comparison, I then saved the original image as JPEG and corrupted 1% of all bytes the same way. GIMP and many other image viewers can't open the file at all anymore. Chrome can open it, and it looks like this:
The same happens with audio CDs. Audio CDs store uncompressed "direct" data, just like BMP, so corruption only affects the data at the point where it occurs. That means if one bit is unreadable, you probably won't notice at all, and even if 1% of all data on the CD is corrupt, you will likely only notice a slightly elevated noise level, even though 1% data loss is an enormous amount.
If you instead use compressed formats (even FLAC), or if it's actual data and not media, a single illegible bit might destroy the whole file, because each bit of data can depend on information earlier in the file: if one bit is corrupted, everything after it might become unreadable.
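To see how one bad value can take everything after it down, here is a toy model. Many compressors store values relative to what came before, and plain delta coding is about the simplest version of that idea, used here as a stand-in. It's an illustration under that assumption, not FLAC's actual format: real codecs such as FLAC restart prediction in every frame, so how far damage spreads in a real file depends on the format and the decoder.

```python
def delta_encode(samples: list[int]) -> list[int]:
    """Store each sample as its difference from the previous one (toy predictor)."""
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out


def delta_decode(residuals: list[int]) -> list[int]:
    """Rebuild samples with a running sum, so each output depends on all earlier residuals."""
    total, out = 0, []
    for r in residuals:
        total += r
        out.append(total)
    return out


def damaged_samples(original: list[int], decoded: list[int]) -> int:
    """Count positions where the decoded signal no longer matches the original."""
    return sum(a != b for a, b in zip(original, decoded))


samples = [((i * 37) % 200) - 100 for i in range(1000)]  # arbitrary test signal

raw = list(samples)
raw[100] += 64  # corrupt one stored value in the raw (PCM/BMP-like) copy

encoded = delta_encode(samples)
encoded[100] += 64  # corrupt the same position in the delta-coded copy
decoded = delta_decode(encoded)

print("raw copy, damaged samples:", damaged_samples(samples, raw))  # 1
print("delta copy, damaged samples:", damaged_samples(samples, decoded))  # 900
```

In the raw copy the error stays put; in the delta-coded copy every sample from index 100 onward is shifted by the same error.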
That's why your audio CD is still legible far beyond its expiry date, but a CD-R containing your backup data might not.
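The BMP-vs-JPEG experiment can be loosely reproduced without an image library by corrupting 1% of the bytes of a plain-text buffer and of its zlib-compressed form. This sketch assumes zlib (DEFLATE plus an Adler-32 checksum) as a stand-in for JPEG's compression; the word list, seed and helper names are arbitrary choices for illustration.

```python
import random
import zlib


def corrupt(buf: bytes, fraction: float, rng: random.Random, skip: int = 0) -> bytes:
    """Return a copy of buf with `fraction` of its bytes (after `skip`) changed.

    Positions are drawn without replacement, and each chosen byte is XORed with
    a non-zero value, so exactly that many bytes end up different.
    """
    out = bytearray(buf)
    candidates = range(skip, len(buf))
    count = max(1, int(len(candidates) * fraction))
    for pos in rng.sample(candidates, count):
        out[pos] ^= rng.randint(1, 255)
    return bytes(out)


def make_text(rng: random.Random, n_words: int = 20_000) -> bytes:
    """Compressible but not trivial test data built from a small, arbitrary vocabulary."""
    words = ["apple", "disc", "tape", "sector", "bit", "backup", "drive", "laser"]
    return " ".join(rng.choice(words) for _ in range(n_words)).encode()


def survives_compressed(data: bytes, fraction: float, rng: random.Random) -> bool:
    """Compress, corrupt (sparing the 2-byte zlib header), and check for a perfect round trip."""
    packed = zlib.compress(data)
    damaged = corrupt(packed, fraction, rng, skip=2)
    try:
        return zlib.decompress(damaged) == data
    except zlib.error:  # bad stream or checksum mismatch: the whole file is unreadable
        return False


rng = random.Random(42)
text = make_text(rng)
raw_damaged = corrupt(text, 0.01, rng)
changed = sum(a != b for a, b in zip(text, raw_damaged))
print(f"raw: {changed} of {len(text)} bytes differ, everything else is untouched")
print("compressed copy readable:", survives_compressed(text, 0.01, rng))
```

The raw buffer keeps 99% of its bytes intact and readable, while the compressed one is simply rejected as a whole.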
Again, these data retention time spans don't mean that after that time all data on the device disappears at once, but that until that time every single bit of data on your device is preserved. After that you might start to experience data loss, usually in the form of single bits or bytes failing.
Edit: Just for fun, this is what the BMP looks like with 95% corruption:
Even with this massive amount of damage, the image is still recognizable.
Edit 2: Due to a mistake in the script, this image is actually 61.3% corrupted, not 95%, but that's still a massive amount of corruption and the image is still clearly recognizable.
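The 61.3% figure is what you'd expect if positions are picked at random with replacement: after drawing 0.95·N positions out of N, the expected fraction of distinct positions hit is 1 - (1 - 1/N)^(0.95·N) ≈ 1 - e^(-0.95) ≈ 0.613 (ignoring the 1-in-256 chance that a random byte equals the one it replaces). A short sketch with a simulation as a cross-check; the file size N and the seed are arbitrary assumptions.

```python
import math
import random


def expected_hit_fraction(draw_fraction: float, n: int) -> float:
    """Expected share of n positions hit at least once after draw_fraction*n draws with replacement."""
    draws = round(draw_fraction * n)
    return 1 - (1 - 1 / n) ** draws


def simulated_hit_fraction(draw_fraction: float, n: int, seed: int = 0) -> float:
    """Monte Carlo check: draw positions with replacement and count the distinct ones."""
    rng = random.Random(seed)
    draws = round(draw_fraction * n)
    hit = {rng.randrange(n) for _ in range(draws)}
    return len(hit) / n


N = 200_000  # arbitrary stand-in for the number of bytes eligible for corruption
print(f"expected:          {expected_hit_fraction(0.95, N):.1%}")
print(f"limit 1 - e^-0.95: {1 - math.exp(-0.95):.1%}")
print(f"simulated:         {simulated_hit_fraction(0.95, N):.1%}")
```

At 1% the double hits barely matter (1 - e^(-0.01) is still about 1%), which is why the mistake only showed up at the 95% setting.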
Fair enough, I misunderstood your argument. I appreciate your demonstration. Any chance you'd be willing to share your script? I have a few ideas on how to play with it.
Edit: I forgot: I actually had an HDD fail on me. Luckily, I was able to recover some of the data. Many .flac files on it were completely corrupted and unreadable past a certain point. The .aiff files I had were perfectly readable, though I suspect they were at least partially corrupted. Luckily, I was able to redownload all of the affected files, so no data was actually lost.
If you run it, the first argument is the input file, the second one is the output file and the third is the percentage of corrupted bytes to inject.
I did spare the first 2000 bytes in the file to get clear of the file header (corruption on a BMP file header can still cause the whole image to be illegible, and this demonstration was about uncompressed vs compressed data, not about resilience of file headers).
I also just noticed when pasting the script that I don't check for double-corrupting the same bytes. At lower damage rates that's not an issue, but for the 95% example, it's actually 61.3% actual corruption.
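The script itself isn't reproduced here, but going by the description above (input file, output file, percentage; first 2000 bytes spared), a minimal Python sketch could look like this. It's a hypothetical reconstruction, not the poster's code, and it deliberately differs in two ways: it picks positions without replacement via `random.sample`, so 95% really means 95%, and it XORs each chosen byte with a non-zero value so a "random" byte can never match the original. It also assumes the percentage applies to the bytes after the spared header.

```python
import random
import sys

# Hypothetical reconstruction: bytes left untouched so the file header survives.
HEADER_BYTES = 2000


def corrupt_bytes(data: bytes, percent: float, rng: random.Random,
                  skip: int = HEADER_BYTES) -> bytes:
    """Return a copy of data with `percent`% of the bytes after `skip` changed.

    Positions are sampled without replacement, so no byte is hit twice, and each
    chosen byte is XORed with a non-zero value, so every hit really changes it.
    """
    out = bytearray(data)
    positions = range(min(skip, len(data)), len(data))
    count = round(len(positions) * percent / 100)
    for pos in rng.sample(positions, count):
        out[pos] ^= rng.randint(1, 255)
    return bytes(out)


def main(argv: list[str]) -> int:
    if len(argv) != 4:
        print(f"usage: {argv[0]} INPUT OUTPUT PERCENT", file=sys.stderr)
        return 2
    src, dst, percent = argv[1], argv[2], float(argv[3])
    if not 0 <= percent <= 100:
        print("PERCENT must be between 0 and 100", file=sys.stderr)
        return 2
    with open(src, "rb") as f:
        data = f.read()
    damaged = corrupt_bytes(data, percent, random.Random())
    with open(dst, "wb") as f:
        f.write(damaged)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Saved as, say, `corrupt.py`, it would be run as `python corrupt.py input.bmp output.bmp 1` (file names are placeholders).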
Thanks, I'll make good use of it. I gotta learn to write scripts like this.
I am not OP, but thanks a lot for a great educational post! Incredible how you can lose 95% of pixels from BMP and it still somewhat works.