Help! How do I troubleshoot NAS becoming unavailable, requiring hard reboot? ZFS pools in Debian 12 shared via SMB/NFS, LGA1366 Intel X5670 18 GB RAM. (alien.top)

submitted 2 years ago by Asinafuthimanahahfoo@alien.top to c/homelab@selfhosted.forum


What should I monitor or log, and how, to determine why my headless NAS keeps becoming unavailable?

The problem:

Another machine that depends on the NAS routinely has its services unavailable because the NFS mounts are no longer mounted.
When that happens, sometimes a sudo mount -a recovers them.
Other times, the NAS is not pingable, so I go to the physical host, plug in monitor/keyboard and find that I can't log in. The login screen is frozen, requiring hard reboot.
Often when I leave a monitor attached (VGA), I come back to a screen that says:

critical medium error, dev sda, sector 163776752 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 2
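
A minimal sketch of how lines like the one above could be collected from the kernel log after a hard reboot, so they do not depend on a monitor being attached. The function name `medium_errors` is hypothetical, and the usage comment assumes journald kept the log of the previous boot, which may not be the case on every install.

```shell
# Hypothetical helper (not from the original post): read kernel log text on
# stdin and print "device sector" for every "critical medium error" line.
medium_errors() {
  sed -n 's/.*critical medium error, dev \([a-z0-9]*\), sector \([0-9]*\).*/\1 \2/p'
}

# Example use, assuming the journal of the boot that froze was persisted:
#   sudo journalctl -k -b -1 | medium_errors | sort | uniq -c
```

Counting the repeats per device and sector gives a rough idea of whether the errors cluster on one disk or one region of it.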

I started a sudo smartctl -t long /dev/sda a few hours ago, and at some point since then the server that depends on the NAS lost its NFS mounts again. This time a simple sudo mount -a restored them.
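
While the long self-test runs, the SMART attribute table can also be checked for signs of bad sectors. The sketch below is an assumption-laden example: `bad_sector_attrs` is a hypothetical name, and it assumes sda reports the usual ATA attribute table through the HBA, with the raw value in the 10th column. A SAS drive prints a different format and would need a different filter.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# attributes most associated with bad sectors when their raw value is non-zero.
# Assumes the ATA attribute table layout, where RAW_VALUE is the 10th column.
bad_sector_attrs() {
  awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 { print $2, $10 }'
}

# Example use:
#   sudo smartctl -A /dev/sda | bad_sector_attrs
#   sudo smartctl -l selftest /dev/sda   # result of the long self-test
```

No output from the filter means none of the three counters is above zero. It does not by itself prove the drive is healthy.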

What the server was also doing when it had a network blip:

rclone was backing up to Backblaze B2
Acting as NFS server for Plex/*arr media server
Acting as NFS storage for Proxmox machine (but no VMs or CTs running)

Pasted some zpool output below. Details about the machine:

Repurposed old hardware, just built this Debian 12 NAS a couple months ago
Operates as backup destination for other machines
Operates as media location for my Plex machine - other server that mounts the NAS via NFS.
P6X58D-E LGA 1366 motherboard, Intel X5670 CPU, 18 GB (3x4GB, 3x2GB triple channel)
8 hard drives connected to LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)

10GbE to managed TP-Link switch through one port on Mellanox ConnectX-3 MCX312A-XCBT EN

➜ sudo zpool list

      NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
      nvr   5.45T  3.35T  2.10T        -         -     2%    61%  1.00x    ONLINE  -
      tank  70.9T  34.4T  36.5T        -         -     0%    48%  1.00x    ONLINE  -

➜ sudo zpool status -v

pool: nvr
state: ONLINE
scan: scrub repaired 0B in 08:49:40 with 0 errors on Sun Nov 12 09:13:41 2023
config:

      NAME            STATE     READ WRITE CKSUM
      nvr             ONLINE       0     0     0
        mirror-0      ONLINE       0     0     0
          6T-75LN0J4  ONLINE       0     0     0
          6T-95A2PNV  ONLINE       0     0     0

errors: No known data errors

pool: tank
state: ONLINE
scan: scrub repaired 1M in 16:44:16 with 0 errors on Sun Nov 12 17:08:27 2023
config:

      NAME              STATE     READ WRITE CKSUM
      tank              ONLINE       0     0     0
        raidz1-0        ONLINE       0     0     0
          12T-5PGJ4A0D  ONLINE       0     0     0
          12T-Z2J26EBT  ONLINE       0     0     0
          12T-5PGHSZJC  ONLINE       0     0     0
        raidz1-1        ONLINE       0     0     0
          14T-9KG38U5L  ONLINE       0     0     0
          14T-9KG81HRL  ONLINE       0     0     0
          14T-9RGG5ZDC  ONLINE       0     0     0

errors: No known data errors
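
Both pools show ONLINE with zero READ/WRITE/CKSUM counts, but the last scrub on tank repaired 1M, which is easy to overlook. A rough sketch of a filter that flags pools whose last scrub repaired anything; `scrub_repairs` is a hypothetical name, and it assumes the `pool:` and `scan: scrub repaired ...` line formats shown in the output above.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print
# "pool amount" for each pool whose last scrub repaired more than 0B.
scrub_repairs() {
  awk '
    /^ *pool:/ { pool = $2 }
    /scrub repaired/ {
      for (i = 1; i < NF; i++)
        if ($i == "repaired" && $(i + 1) != "0B") print pool, $(i + 1)
    }
  '
}

# Example use:
#   sudo zpool status | scrub_repairs
```

Run from cron or a systemd timer after each scrub, this could be logged alongside the kernel and SMART output to see whether repairs keep recurring.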


[–] merkuron@alien.top 1 points 2 years ago (1 children)

I’ve had drive failures bring down entire systems. Replace sda and see if the problems continue.

[–] Asinafuthimanahahfoo@alien.top 1 points 2 years ago

Fair enough! Going to start with memtest, per another comment, and narrow things down one at a time - probably by removing sda next.