3-2-1-backup

joined 1 year ago
3-2-1-backup@alien.top 1 points 11 months ago

Well, two steps forward, one step back. The scrub I ran yesterday at least showed some errors, but I'm having trouble identifying what the actual problem is. I think I'll sleep on it and form a new plan in the morning.

Controller failure? RAM failure? dmesg shows absolutely nothing, no panics or anything, so I don't think it's RAM. Hmmmm... maybe I'll run memtest after I get some sleep.

3-2-1-backup@BackupServer:~$ sudo zpool status -vx
pool: data_pool3
state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
 see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
 scan: scrub repaired 40K in 07:07:07 with 4 errors on Tue Nov 28 22:39:33 2023
config:

    NAME                        STATE     READ WRITE CKSUM
    data_pool3                  ONLINE       0     0     0
      raidz2-0                  ONLINE       0     0     0
        wwn-0x5000ccax1         ONLINE       0     0     8
        wwn-0x5000ccax2         ONLINE       0     0    10
        wwn-0x5000ccax3         ONLINE       0     0     8
        wwn-0x5000ccax4         ONLINE       0     0     8
        wwn-0x5000ccax5         ONLINE       0     0     8
        wwn-0x5000ccax6         ONLINE       0     0     8
        wwn-0x5000ccax7         ONLINE       0     0     8
        wwn-0x5000ccax8         ONLINE       0     0     8

errors: Permanent errors have been detected in the following files:

    data_pool3/(redacted)/downloads@backup_script-2023-11-28-0901:/(redacted).mkv
    data_pool3/(redacted)@backup_script-2023-11-28-2001:/ISOs/Ubuntu/23.10/ubuntu-23.10.1-desktop-amd64.iso
    data_pool3/(redacted)@backup_script-2023-11-07-0901:/(redacted).mkv

Hey wow, even though my problem is getting worse (maybe), an actual honest-to-god ISO showed up in the problem file list!
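
For anyone who wants to eyeball the CKSUM column without squinting, here's a rough Python sketch that pulls the per-row checksum counts out of pasted `zpool status` text. The function names and the `wwn-` prefix filter are my own assumptions for this pool's naming, not anything ZFS provides. Errors on every disk at once is sometimes taken as a hint that something shared (controller, RAM, cabling) is involved rather than one bad drive, but treat that as a lead to check, not a diagnosis.

```python
def checksum_errors(status_text):
    """Map each row of the config table in `zpool status` output to its CKSUM count."""
    counts = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if not in_table:
            # The table starts right after the NAME ... CKSUM header row.
            in_table = fields[:1] == ["NAME"] and fields[-1:] == ["CKSUM"]
            continue
        if not fields:
            break  # a blank line ends the table
        # Rows look like: NAME STATE READ WRITE CKSUM.
        # Abbreviated counts such as "1.2K" are skipped by this simple check.
        if len(fields) >= 5 and fields[4].isdigit():
            counts[fields[0]] = int(fields[4])
    return counts


def affected_disks(counts, prefix="wwn-"):
    """Return (disks with nonzero CKSUM, True if every disk is affected).

    `prefix` is an assumption: it matches the wwn-* device names in this pool.
    """
    disks = {name: n for name, n in counts.items() if name.startswith(prefix)}
    bad = {name: n for name, n in disks.items() if n > 0}
    return bad, bool(disks) and len(bad) == len(disks)


# Trimmed copy of the table above, used only as sample input.
SAMPLE = """\
config:

    NAME                        STATE     READ WRITE CKSUM
    data_pool3                  ONLINE       0     0     0
      raidz2-0                  ONLINE       0     0     0
        wwn-0x5000ccax1         ONLINE       0     0     8
        wwn-0x5000ccax2         ONLINE       0     0    10

errors: Permanent errors have been detected in the following files:
"""

bad, all_affected = affected_disks(checksum_errors(SAMPLE))
```

Feed it the full output (for example, saved from `sudo zpool status -v`) and compare the counts between scrubs to see whether they keep climbing.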

3-2-1-backup@alien.top 1 points 11 months ago (1 children)

This is my backup server, so no. Primary does.

 

So I'm just getting around to checking the logs on my backup server, and it says that I have a permanently damaged file that's unrepairable.

How is this even possible on a raidz2 volume where each member shows zero problems and there are no dead drives? Isn't that the whole point of raidz2, so that if one (er, two) drives have a problem the data is still recoverable? How can I figure out why this happened and why it was unrecoverable, and most importantly, how do I prevent it in the future?

It's only my backup server and the original file is still A-OK, but I'm really concerned here!

zpool status -v:

3-2-1-backup@BackupServer:~$ sudo zpool status -v
pool: data_pool3
state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
 see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 06:59:59 with 1 errors on Sun Nov 12 07:24:00 2023
config:

    NAME                        STATE     READ WRITE CKSUM
    data_pool3                  ONLINE       0     0     0
      raidz2-0                  ONLINE       0     0     0
        wwn-0x5000ccaxxxxxxxx1  ONLINE       0     0     0
        wwn-0x5000ccaxxxxxxxx2  ONLINE       0     0     0
        wwn-0x5000ccaxxxxxxxx3  ONLINE       0     0     0
        wwn-0x5000ccaxxxxxxxx4  ONLINE       0     0     0
        wwn-0x5000ccaxxxxxxxx5  ONLINE       0     0     0
        wwn-0x5000ccaxxxxxxxx6  ONLINE       0     0     0
        wwn-0x5000ccaxxxxxxxx7  ONLINE       0     0     0
        wwn-0x5000ccaxxxxxxxx8  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

    data_pool3/(redacted)/(redacted)@backup_script:/Documentaries/(redacted)
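
Side note: the damaged entry lives in a snapshot (`dataset@snapshot:/path`), not in the live filesystem. A small Python sketch for splitting those entries apart, in case the list grows, is below. The function name and the tuple layout are my own invention, and it only assumes the `dataset@snapshot:/path` shape seen in this output; anything else is passed through untouched.

```python
def damaged_files(status_text):
    """Return (dataset, snapshot, path) tuples from the 'Permanent errors' list.

    Entries that are not in dataset@snapshot:/path form are returned as
    (None, None, raw_line) instead of being guessed at.
    """
    entries = []
    in_errors = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("errors: Permanent errors"):
            in_errors = True
            continue
        if not in_errors:
            continue
        if not stripped:
            if entries:
                break  # blank line after the list ends it
            continue   # blank line between the header and the list
        target, sep, path = stripped.partition(":")
        if sep and "@" in target:
            dataset, _, snapshot = target.partition("@")
            entries.append((dataset, snapshot, path))
        else:
            entries.append((None, None, stripped))
    return entries


# The single entry from the output above, used only as sample input.
SAMPLE = """\
errors: Permanent errors have been detected in the following files:

    data_pool3/(redacted)/(redacted)@backup_script:/Documentaries/(redacted)
"""

files = damaged_files(SAMPLE)
```

Grouping the result by snapshot name makes it easier to see which backup runs captured the bad blocks.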

3-2-1-backup@alien.top 1 points 11 months ago (2 children)

zfs --version also does the trick.

3-2-1-backup@alien.top 1 points 11 months ago

Makes me really glad I almost never bother to upgrade my pool flags!

(I mean seriously, I can't think of the last time I had a use for new flags!)

3-2-1-backup@alien.top 1 points 11 months ago (1 children)

Kind of want someone to re-run the numbers on this with updated storage densities.

3-2-1-backup@alien.top 1 points 11 months ago

I'm going to disagree with you there; it's completely situation-dependent. I tried running a Wi-Fi point-to-point link from my house to my detached garage, and it ran like hot garbage. I replaced the link with powerline, which was much more stable and faster.

Right tool for the right job. Well really the right tool is to bury something (preferably fiber) between the buildings, but I'm not made out of money and the power line was already buried!

3-2-1-backup@alien.top 1 points 1 year ago

*POOF*

OK who rubbed the lamp? Of course I follow the rule religiously!

3-2-1-backup@alien.top 1 points 1 year ago

This isn't about power monitoring, it's about eco-shifting. Those are very different things!

I use power monitoring to figure out when things that aren't smart are on/off. IDGAF about being more green; when I need to use my appliances I need to use them, period!

3-2-1-backup@alien.top 1 points 1 year ago

Following in case anyone has any good ideas. (I got nothing, sorry!)

3-2-1-backup@alien.top 1 points 1 year ago

Eh, PoE is meant to power devices. It doesn't care much whether that's a phone, a camera, or whatever. Worst case it'll just be slow depending on what flavor of PoE we're talking about.

3-2-1-backup@alien.top 1 points 1 year ago

I don’t want the switch to stop sending power to the outlet.

Make it so, then! Inside the switch electrical box, disconnect the outlet from the switch, and connect it to constant power instead. Easy-peasy, works with everything!
