harsh3466

joined 9 months ago
[–] harsh3466@lemmy.ml 6 points 4 days ago

We're totally fucked.

 

EDIT 2: After learning that aliases aren't really suited for regex, and trying the script, I thought maybe reloading the .bashrc file wasn't enough to refresh the aliases, so I closed my terminal and after reopening the terminal and trying the script again it works just fine.

Okay, I've tried searching for help on this and I can't find anything, and I'm banging my head on my desk trying to figure out how to get this to work.

I routinely have to capitalize the first letter in a series of files that are passed to me. So I'll get:

file01
file02

And so on. I use perl rename (I'm using Fedora) with the following command and regex, and from within the directory it works as expected:

prename 's/(^[a-z]?)/\U$1/' *

I do this a lot. At least once a day, which calls for an alias or script.

I tried adding it as an alias to my .bash_aliases like so:

alias cap="prename 's/(^[a-z]?)/\U$1/' *"

And when I do, instead of capitalizing the first letter of the filenames it removes them. Searching got me nothing, in part because I probably am not asking the right question.

So then I thought I'd write a dead simple bash script named cap (after removing the alias and reloading .bashrc):

#! /bin/bash

prename 's/(^[a-z]?)/\U$1/' *

And when I use cap in the directory, the script also cuts off the first letter instead of capitalizing it.

I suspect it's the $1 variable in the regex that's causing the problem, but I can't figure out how to address it so it works correctly in the alias or the script.

EDIT: I just tried some more searching and found that regex won't work in aliases, so it explains that, but I still can't figure out how to get it to work in the script.
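For what it's worth, the missing first letter looks like a quoting problem rather than a regex limitation. Inside a double-quoted alias the shell expands $1 (empty in an interactive shell) when the alias is defined, so prename receives \U with nothing after it and replaces the first letter with an empty string. A minimal sketch, assuming a POSIX shell; the cap function is a hypothetical pure-shell stand-in for the prename call, not the command from the post:

```sh
#!/bin/sh
# Show what the shell actually stores for each quoting style.
set --   # no positional parameters, as in a fresh interactive shell

double="prename 's/(^[a-z]?)/\U$1/' *"          # $1 expands to nothing here
single='prename '\''s/(^[a-z]?)/\U$1/'\'' *'    # $1 survives for perl to use

printf 'double-quoted: %s\n' "$double"
printf 'single-quoted: %s\n' "$single"

# Hypothetical replacement for the alias: capitalize the first letter of
# every file in the current directory without relying on prename.
cap() (
    for f in *; do
        [ -e "$f" ] || continue                  # skip the unmatched-glob case
        first=$(printf '%s' "$f" | cut -c1 | tr '[:lower:]' '[:upper:]')
        new="${first}${f#?}"                     # uppercased first char + rest
        [ "$f" = "$new" ] || mv -n -- "$f" "$new"
    done
)
```

If cap still misbehaves after the alias file has been edited, `type cap` shows whether the shell is still resolving it to an old alias, and `unalias cap` (or a new terminal, as in EDIT 2) clears it.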

[–] harsh3466@lemmy.ml 10 points 1 week ago

I know it's a privacy focused browser, and I've used it on my iPad. It's a decent enough browser. The best feature is that on iOS it actually supports plugins like ublock.

[–] harsh3466@lemmy.ml 38 points 2 weeks ago* (last edited 2 weeks ago)

This isn't snark. The answer is simply greed. The rich want to be richer. They want it all. The mentality is, "I don't care about anyone else, I want it all."

Edit: removed a redundant sentence

[–] harsh3466@lemmy.ml 14 points 2 weeks ago (1 children)

:shocked pikachu:

[–] harsh3466@lemmy.ml 7 points 3 weeks ago (1 children)

So people will get something like .25 USD?

[–] harsh3466@lemmy.ml 1 points 4 weeks ago (1 children)

Ah. Yeah. I think then you'll want to look into Cloudflare Tunnels. I believe that should get you through the CGNAT and deal with the dynamic IP all in one go.

[–] harsh3466@lemmy.ml 5 points 1 month ago (4 children)

You can deal with the non-static IP by using duckdns.org

[–] harsh3466@lemmy.ml 1 points 1 month ago

Another copy. Would have been crazy if it was the exact copy I had.

[–] harsh3466@lemmy.ml 3 points 1 month ago (2 children)

I was at a used bookshop the other day and found the same Caldera Open Linux 2.2 book and cd that I used to install my first linux distro on a pc. Man that was exciting!

[–] harsh3466@lemmy.ml 6 points 1 month ago

I get it. But I still hate it.

[–] harsh3466@lemmy.ml 36 points 1 month ago (5 children)

I despise the “flashback to a thing that literally happened five minutes ago to make sure you connect that with whatever just happened/is about to happen.”

Total fucking turnoff. I’m here watching the show and I’m not an idiot. Flashback to something last season or a number of episodes ago? Fine. Some people need a reminder. Within the same episode? GTFO of here with that shit.

 

Hello! I’m looking for book recommendations for learning programming fundamentals.

To be clear, I’m not necessarily looking for a book on learning language(s), but rather on programming theory, I guess you might call it?

For example, I’ve been playing around a lot in my terminal writing bash scripts, and I just implemented my first function. Another example, I know the phrase “Object Oriented programming”, but have no idea what it means.

I learn well by doing, and I’ve learned a lot just writing scripts and reading about bash scripting, but I also realize there’s a lot about programming at a higher level that I know nothing about.

 

Am I crazy in thinking that the shop I was in that has CentOS 3 running their self checkouts should have a more up to date and currently supported OS? These are brand new self checkouts (the shop has had them for about a year now, but you get my point.)

It’s a genuine question. Am I wrong in thinking that using this OS on a self checkout is a terrible idea? (FWIW this shop is an international retailer)

I have no stake in the shop or anything. I just happened to be there when they had to reboot a self checkout and I noticed the OS version as I was going by.

 

First, before this rather large infodump, I want to thank anyone that takes the time to read through this to offer any information or advice on trying to resolve this issue.

Here’s the issue I’ve been struggling with.

I keep getting this error on my server:

INFO: task txg_sync:1615 blocked for more than 241 seconds.
Tainted: P           O      5.15.0-112-generic #122-Ubuntu

Background:

I’ve got a homeserver running Ubuntu server 22.04 (no DE) with an Intel Core i7-6700K CPU @ 4.00GHz, 32GB RAM, and two ZFS pools. The OS is installed on its own 128GB SSD, and the two ZFS pools consist of a 128GB SSD in its own pool for the server’s cache, and a 4x8TB HDD RAIDZ1 pool that is my main data/server storage (much more detailed system info below).

I have a bunch of services running in Docker containers, and overall everything is great, except for when that error rears up.

The error seems random. It occurs most often, though not consistently, when I’m writing larger media files to the RAIDZ1 pool. I am aware that this is an IOPS issue, but so far I have not been able to diagnose it.

Back in February, I was carrying some boxes down to the basement where the server rack is, and I accidentally kicked a stool into the server which knocked the shit out of it (it’s a tower pc that I built back in 2016 or 2017 and repurposed in 2020ish to server use). When I turned on the monitor it was in total panic mode. The screen was gibberish colors and flickering madness.

I had to force shutdown with the power button. I waited a good couple of minutes, and on reboot, everything seemed fine. Until I noticed the new txg_sync error a week or so later when I went down to add some media to the server.

After a lot of searching and reading that didn’t turn up pertinent info, I ran across a comment that said this error is almost always hardware related. Like a loose connection or a failing disk or something. With me having knocked the shit out of the server, I realized I should have opened the case up and checked it all out. I shut it down, opened it up, and found a loose connector on the motherboard. I reseated it, checked everything else (though not thoroughly enough, which we’ll get to), and rebooted hoping I had found the problem.

It seemed fine for a bit, but no luck. The error returned.

More searching with no luck, and then about a month ago, a friend suggested I check all the SATA connectors by disconnecting and reconnecting each one to ensure a good, solid connection. I had previously checked that they were seated when I opened the case, but didn’t disconnect and reconnect them. While doing this, I found a SATA cable with a busted clip and thought again I had found the problem. I replaced the cable, and went about a week before the error resurfaced.

It continues to occur, as I mentioned, inconsistently. Most reliably, but not always, when writing data to the RAIDZ1 pool.

I have run a thorough memtest, and there were no errors or issues with the RAM, and as far as I can tell, there are no errors/failures with the HDDs.

Below is a lot of system info, and an example of what I find in dmesg for the error.

System Info

OS & Kernel

Ubuntu 22.04.4 LTS
Linux [redacted user] 5.15.0-113-generic #123-Ubuntu SMP Mon Jun 10 08:16:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

CPU

Architecture:           x86_64
  CPU op-mode(s):       32-bit, 64-bit
  Address sizes:        39 bits physical, 48 bits virtual
  Byte Order:           Little Endian
CPU(s):                 8
  On-line CPU(s) list:  0-7
Vendor ID:              GenuineIntel
  Model name:           Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
    CPU family:         6
    Model:              94
    Thread(s) per core: 2
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           3
    CPU max MHz:        4200.0000
    CPU min MHz:        800.0000
    BogoMIPS:           7999.96
    Flags:              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
Caches (sum of all):    
  L1d: 128 KiB (4 instances)
  L1i: 128 KiB (4 instances)
  L2:  1 MiB (4 instances)
  L3:  8 MiB (1 instance)
NUMA:                   
  NUMA node(s):      1
  NUMA node0 CPU(s): 0-7
Vulnerabilities:        
  Gather data sampling: Vulnerable: No microcode
  Itlb multihit:        KVM: Mitigation: VMX unsupported
  L1tf:                 Mitigation; PTE Inversion
  Mds:                  Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:             Mitigation; PTI
  Mmio stale data:      Mitigation; Clear CPU buffers; SMT vulnerable
  Retbleed:             Mitigation; IBRS
  Spec rstack overflow: Not affected
  Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:           Mitigation; IBRS; IBPB conditional; STIBP conditional; RSB filling; PBRSB-eIBRS Not
                         affected; BHI Not affected
  Srbds:                Mitigation; Microcode
  Tsx async abort:      Mitigation; TSX disabled

RAM

Memory Device
        Array Handle: 0x004A
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 8 GB
        Form Factor: DIMM
        Set: None
        Locator: DIMM_A2
        Bank Locator: BANK 1
        Type: DDR4
        Type Detail: Synchronous
        Speed: 2133 MT/s
        Manufacturer: Corsair
        Serial Number: 00000000
        Asset Tag: 9876543210
        Part Number: CMK16GX4M2A2400C16  
        Rank: 2
        Configured Memory Speed: 2133 MT/s
        Minimum Voltage: Unknown
        Maximum Voltage: Unknown
        Configured Voltage: 1.2 V
Handle 0x004D, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x004A
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 8 GB
        Form Factor: DIMM
        Set: None
        Locator: DIMM_B1
        Bank Locator: BANK 2
        Type: DDR4
        Type Detail: Synchronous
        Speed: 2133 MT/s
        Manufacturer: Corsair
        Serial Number: 00000000
        Asset Tag: 9876543210
        Part Number: CMK16GX4M2A2400C16  
        Rank: 1
        Configured Memory Speed: 2133 MT/s
        Minimum Voltage: Unknown
        Maximum Voltage: Unknown
        Configured Voltage: 1.2 V
Handle 0x004E, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x004A
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 8 GB
        Form Factor: DIMM
        Set: None
        Locator: DIMM_B2
        Bank Locator: BANK 3
        Type: DDR4
        Type Detail: Synchronous
        Speed: 2133 MT/s
        Manufacturer: Corsair
        Serial Number: 00000000
        Asset

Disks

NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0                       7:0    0  63.9M  1 loop /snap/core20/2264
loop1                       7:1    0  63.9M  1 loop /snap/core20/2318
loop2                       7:2    0    87M  1 loop /snap/lxd/27948
loop3                       7:3    0    87M  1 loop /snap/lxd/28373
loop4                       7:4    0  38.7M  1 loop /snap/snapd/21465
loop5                       7:5    0  38.8M  1 loop /snap/snapd/21759
sda                         8:0    0 223.6G  0 disk 
├─sda1                      8:1    0     1M  0 part 
├─sda2                      8:2    0     2G  0 part /boot
└─sda3                      8:3    0 221.6G  0 part 
  └─ubuntu--vg-ubuntu--lv 253:0    0   100G  0 lvm  /
sdb                         8:16   0 223.6G  0 disk 
├─sdb1                      8:17   0 223.6G  0 part 
└─sdb9                      8:25   0     8M  0 part 
sdc                         8:32   0   7.3T  0 disk 
├─sdc1                      8:33   0   7.3T  0 part 
└─sdc9                      8:41   0     8M  0 part 
sdd                         8:48   0   7.3T  0 disk 
├─sdd1                      8:49   0   7.3T  0 part 
└─sdd9                      8:57   0     8M  0 part 
sde                         8:64   0   7.3T  0 disk 
├─sde1                      8:65   0   7.3T  0 part 
└─sde9                      8:73   0     8M  0 part 
sdf                         8:80   0   7.3T  0 disk 
├─sdf1                      8:81   0   7.3T  0 part 
└─sdf9                      8:89   0     8M  0 part 

ZFS

version
zfs-2.1.5-1ubuntu6~22.04.4
zfs-kmod-2.1.5-1ubuntu6~22.04.3
list
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
srvrcache   222G  13.8G   208G        -         -    17%     6%  1.00x    ONLINE  -
srvrpool   29.1T  17.5T  11.6T        -         -     1%    60%  1.00x    ONLINE  -
status
  pool: srvrcache
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:00:35 with 0 errors on Sun Jun  9 00:24:36 2024
config:

        NAME        STATE     READ WRITE CKSUM
        srvrcache   ONLINE       0     0     0
          sdb       ONLINE       0     0     0

errors: No known data errors

  pool: srvrpool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 11:41:27 with 0 errors on Sun Jun  9 12:05:31 2024
config:

        NAME        STATE     READ WRITE CKSUM
        srvrpool    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

errors: No known data errors
iostat
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
srvrcache   13.8G   208G      0     22  48.0K   291K
  sdb       13.8G   208G      0     22  48.0K   291K
----------  -----  -----  -----  -----  -----  -----
srvrpool    17.5T  11.6T     33     59  2.14M  1.03M
  raidz1-0  17.5T  11.6T     33     59  2.14M  1.03M
    sdd         -      -      8     15   554K   275K
    sde         -      -      7     13   536K   252K
    sdf         -      -      8     15   562K   275K
    sdc         -      -      8     14   539K   252K
----------  -----  -----  -----  -----  -----  -----

PCI

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
00:17.0 SATA controller: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] (rev 31)
00:1b.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17 (rev f1)
00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 (rev f1)
00:1d.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Z170 Chipset LPC/eSPI Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
00:1f.3 Audio device: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller (rev 31)
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V (rev 31)
01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
03:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller

dmesg

[321540.060243] INFO: task txg_sync:1615 blocked for more than 120 seconds.
[321540.060330]       Tainted: P           O      5.15.0-112-generic #122-Ubuntu
[321540.060408] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[321540.060495] task:txg_sync        state:D stack:    0 pid: 1615 ppid:     2 flags:0x00004000
[321540.060498] Call Trace:
[321540.060500]  <TASK>
[321540.060502]  __schedule+0x24e/0x590
[321540.060509]  schedule+0x69/0x110
[321540.060512]  schedule_timeout+0x87/0x140
[321540.060515]  ? zio_issue_async+0x12/0x20 [zfs]
[321540.060653]  ? __bpf_trace_tick_stop+0x20/0x20
[321540.060657]  io_schedule_timeout+0x51/0x80
[321540.060661]  __cv_timedwait_common+0x12c/0x170 [spl]
[321540.060669]  ? wait_woken+0x70/0x70
[321540.060672]  __cv_timedwait_io+0x19/0x20 [spl]
[321540.060679]  zio_wait+0x116/0x220 [zfs]
[321540.060799]  dsl_pool_sync+0xb6/0x400 [zfs]
[321540.060890]  ? __mod_timer+0x214/0x400
[321540.060894]  spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[321540.060997]  spa_sync+0x2dc/0x5b0 [zfs]
[321540.061098]  txg_sync_thread+0x266/0x2f0 [zfs]
[321540.061206]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[321540.061314]  thread_generic_wrapper+0x61/0x80 [spl]
[321540.061324]  ? __thread_exit+0x20/0x20 [spl]
[321540.061332]  kthread+0x127/0x150
[321540.061336]  ? set_kthread_struct+0x50/0x50
[321540.061339]  ret_from_fork+0x1f/0x30
[321540.061344]  </TASK>
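
When the warning shows up this irregularly, it can help to pull just the hung-task lines out of the kernel log to see how often they occur and which task is blocked. A minimal sketch, assuming log text in the format of the trace above arrives on stdin; the function name hung_tasks is made up for illustration:

```sh
#!/bin/sh
# Print one summary line per hung-task warning found on stdin.
hung_tasks() {
    sed -n 's/.*INFO: task \([^:]*\):\([0-9]*\) blocked for more than \([0-9]*\) seconds.*/task=\1 pid=\2 blocked_for_more_than=\3s/p'
}

# Example using the first line of the trace above.
printf '%s\n' '[321540.060243] INFO: task txg_sync:1615 blocked for more than 120 seconds.' | hung_tasks
```

On a live system the input could come from `dmesg | hung_tasks` or `journalctl -k | hung_tasks`.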

Thank you again to anyone who takes the time to offer any info or advice on resolving this.

 

Most of the switching posts are from frustrated Windows users making the jump. I already run Linux on my server (Ubuntu for now, going Debian at some point), on a 2014 iMac for tinkering/testing (KDE Neon), and on a couple of Raspberry Pis (Raspberry Pi OS, headless), but our main household computer is an M1 Mac mini that my wife and I both use.

Lately I’ve been super frustrated with macOS.

  • First, macOS just refuses to mount my USB 3 drives. I have a 1TB Seagate SSD and a 3TB WD HDD (both exFAT), and macOS flat out refuses to see them. The same drives are visible and mount just fine on my server and the KDE iMac. On macOS, they’re invisible. They don’t auto mount, and they don’t show up in Disk Utility (GUI or shell), which is really fucking annoying when I’m trying to move large files between machines.
  • I use Cryptomator to encrypt data on macOS, and because of their whole walled garden shtick and how they continue to lock out system extensions, macFUSE routinely breaks, rendering it impossible to access my data on macOS. Again, on the KDE iMac, everything just works as it should. On the Mac it’ll throw me the enable-the-extension warning, so I enable it. Then it tells me I have to reboot to actually use the extension. I reboot, and it throws the enable-extension warning again. Fucking infuriating.

I hadn’t already pulled the trigger on Asahi because my wife uses the m1 more than I do, and I didn’t want to break anything she does. However today was the last straw as a task that should have taken me maybe 15 minutes took two hours of fighting with macOS. After talking with her she gave me the go ahead to install Asahi. It helps that she does most everything in the browser and that the install is a dual boot setup with macOS still available.

I used to love macOS. It felt so intuitive and while it was never flawless, it mostly just got the fuck out of my way so I could do the things I wanted and needed to do. I still love a lot about Apple hardware, but fuck that shit os. I’m happy to be running Linux on all of the computers in the house.

Now I just have to learn the Fedora differences, having used Debian derivatives up until this point.

 

Not as much time to tinker this week, but I still had some fun and learned some things!

How to run a memory test using memtest86+

My error message is back, which means my nuke and pave approach didn’t solve the problem. So, yay to having a record of the error message?

Here’s the error:

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task txg_sync:3557 blocked for more than 241 seconds.
Tainted: P           O      5.15.0-94-generic #104-Ubuntu

I decided to do a little more searching and found that txg_sync is a zfs task. I know zfs uses a lot of RAM as part of its processing. As a starting point, I decided to do a memory test to see if I messed up any of my RAM modules when I knocked the shit out of my server.

Running a memory test was really easy. I downloaded the latest memtest86+ ISO, used balena etcher to flash it to a usb stick, booted from that stick, and let the test run.

I let it run for two full passes and got no errors.

So as of right now I know that the error is being caused by zfs writes/activity, but I don’t know why the error is happening, other than that I fucked something up when I knocked the shit out of my server.

How to set up a wireguard tunnel

This also has been on my list of things to figure out for quite awhile, and, turns out, with the wg-easy project, it is exactly as easy as the name implies. I found out about wg-easy through the Awesome Open Source YouTube channel. I’ve learned a lot from the guy that runs that channel, so I always check there when I want to learn more about something. Timing was fortuitous, since he had just dropped his video on wg-easy.

I’ve got wg-easy running on my vps, and I’m planning to connect my playground server to it so I can ssh in and play around during my breaks at work.

Grav CMS is pretty nice

I’ve kind of idly been looking for an alternative to Wordpress. I found out about Grav while bouncing around Linux YouTube looking for things to learn about/try. I’ve already tried 11ty and Hugo ssgs and neither worked for me.

Grav, on the other hand, was easy to get up and running, is easy to theme (theming was the problem I kept running into with both 11ty and Hugo), can be managed through the cli or webui, and can have content added through the webui or, more importantly for me, from markdown files on the server.

Whether or not I’ll actually use it to deploy a site remains to be seen. I’ll continue to tinker with it while I decide if I want to migrate my wordpress site over to it.

submitted 9 months ago* (last edited 9 months ago) by harsh3466@lemmy.ml to c/linux@lemmy.ml
 

Edit: I've made an account here on lemmy.ml as I routinely can't comment or post from my account on lemmy.world.

Bit of a week! As usual, had a lot of fun tinkering. Here are my takeaways from this past week(ish).

I finally learned how to set up a cron job with elevated privileges

This is something I've had on my "I should really get this figured out" list for about two years now, but instead I have been inconsistently typing my rsync commands by hand (since I've also been too lazy to set up aliases for these commands).

I spent a couple of days rebuilding my server from the OS up (for reasons which I will explain momentarily), and since I'm up on a fresh OS with all my containers and services up and running, I figured it was time I figure out this cron job thing.

The approach I took was to write a simple bash script for my backup. The script is four lines: three are sudo rsync ... commands, and the last is a curl -d ... command.

The rsync commands are to incrementally back up my server data, cache, and docker volumes.

The curl command triggers a notification through my ntfy instance, (link is to ntfy, not to my instance), to let me know the backups have successfully completed.
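
The backup script described above could look roughly like the sketch below. The source and destination paths, the ntfy URL and topic, and the DRY_RUN switch are all assumptions for illustration (the post does not give them). With DRY_RUN=1, the default here, the commands are only printed.

```sh
#!/bin/sh
# Sketch of a four-command backup script. All paths and the ntfy URL are
# hypothetical placeholders, not the ones used in the post.
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = actually run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# rsync -a preserves permissions and timestamps and only copies what changed.
# The && chain means the notification is sent only if all three rsync
# commands succeeded.
backup() {
    run sudo rsync -a /srv/data/ /mnt/backup/data/ &&
    run sudo rsync -a /srv/cache/ /mnt/backup/cache/ &&
    run sudo rsync -a /var/lib/docker/volumes/ /mnt/backup/docker-volumes/ &&
    run curl -d "Backups completed" https://ntfy.example.com/backups
}

backup
```

Chaining with && is a design choice: a failed rsync stops the script before the curl line, so a notification always means a complete backup.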

In order for that to run properly, I also had to learn....

How to update sudoers privileges

After reading about crontab and privileges, I know I could have just edited /etc/crontab and run my script as root, but what would be the fun in that when I could also learn about changing privileges through sudoers! So I learned how to modify sudo privileges by creating a new file in sudoers.d with the command:

sudo visudo -f /etc/sudoers.d/name-of-my-sudoers-file

And why that and not just editing /etc/sudoers directly with nano or vim or emacs? That was my first question when I saw that command and thought, "Oh, shit, I'm going to have to brush up on my vi/m."

Turns out, if you just rando edit sudoers (or add a file to /etc/sudoers.d/) with any old editor, you can fuck up the syntax, and if you fuck up the syntax, you can fuck up your ability to use sudo, and then you can't do anything requiring sudo on your machine without going through tremendous headache to fix it.

However, if you use sudo visudo ..., you get syntax verification to prevent you from breaking sudo.

And, on Ubuntu server, visudo uses nano by default, which meant I didn't have to worry about vim just yet (vim is on my roadmap of things to learn)

(Also, you can change the default editor visudo uses, but I don't remember the command because I won't be changing it until I get a grip on vim and can make a decision about which editor I want to use.)

With all that being said, I created a file in /etc/sudoers.d and added a line to allow my backup script to run with elevated privileges without requiring a password with this syntax:

username ALL=(root) NOPASSWD: /path/to/my/script
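
One gotcha with drop-in files that is easy to miss: per the sudoers manual, sudo skips files in /etc/sudoers.d whose names end in ~ or contain a . character, so a file named something like backup.conf would be silently ignored. A small sketch of that naming check; the function name and the example file names are made up for illustration:

```sh
#!/bin/sh
# Return 0 if sudo would read a drop-in with this name, 1 if it would skip it.
sudoers_name_ok() {
    case $1 in
        '' | *.* | *"~") return 1 ;;   # empty, contains a dot, or ends in ~
        *) return 0 ;;
    esac
}

for name in backup-script backup.conf backup~; do
    if sudoers_name_ok "$name"; then
        echo "$name: would be read"
    else
        echo "$name: would be skipped"
    fi
done
```

Once the file exists, `sudo visudo -c -f /etc/sudoers.d/name-of-my-sudoers-file` re-checks its syntax without opening an editor.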

Good documentation/notes will save you like good backups will save you

This isn’t something that’s new to me, or to linux (Arch wiki ftw) but it’s something that 100% made rebuilding my server from the OS up a pretty worry free breeze.

So why did I rebuild my server again a little over a month after rebuilding it from the OS up? Turns out when you accidentally kick a stool while carrying a heavy box and that stool knocks the fuck out of your server, your OS can get fucked up.

This happened a few weeks ago, and boy was I panicked when I first kicked that stool into the server. After putting down the boxes I turned on the monitor and the screen was freaking out. It looked like a scrambled Max Headroom. I held the power button to force a shutdown, and after rebooting the server everything came back up and I thought, “Holy shit, I dodged a bullet!”

(Bonus lesson, I learned to not leave the stool in front of the server rack!)

But, all was not well. My server data and cache are on zfs pools, and every time I tried to bulk add some of the shows or movies I prepared to the data pool, I would get this procsys kernel panic error. I had repeatedly been checking my zpool status, and everything was good there. So I was furiously searching trying to figure out what the error meant, and I kept finding folks with the same or similar errors who talked about checking logs, but whenever I checked logs I couldn’t find anything to indicate what was actually going on.

Finally, after a few more days of searching, I ran across a comment on a thread that said this particular error (I neglected to save the error, I wish I had) was usually a hardware related issue, like a loose connector, and I thought, “Holy shit, that makes perfect sense after knocking the shit out of my server!”

So I shut it down, opened it up, and sure enough there was a loose cable on the motherboard. I reseated it, checked the rest, rebooted, and over the course of the next week, it seemed all was well.

But I kept getting these weird errors. Not actual error messages, no more kernel panics, and data wrote to the zpool just fine. It was little things not working as expected. Commands that typically ran very speedily (like ls) were lagging, opening a file in nano took multiple seconds instead of being near instant, stuff like that.

I decided to go for a nuke and pave approach, rebuilding from the OS up again, which is where the documentation comes in. Since I started messing about with self hosting 2-3 years ago, I’ve kept meticulous notes on everything I have done and learned so that if I had to re-do it, I could open up Joplin, search for whatever I needed, and proceed. This has saved my ass multiple times over the years as I tinker, break shit, and fix it using my notes.

So yeah, in addition to having a good backup system, you should also keep good documentation for yourself.

edit: removed extra 4 from post title
