this post was submitted on 07 Mar 2024
35 points (88.9% liked)

My laptop is working just fine. It's from 2018 and it has an NVMe drive.

It has an EFI boot partition and another partition with LUKS and LVM on top of that.

Since this week I have been seeing these log entries from time to time:

Mar 07 17:31:14 almendra kernel: pcieport 0000:00:1d.6: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 07 17:31:14 almendra kernel: pcieport 0000:00:1d.6:   device [8086:34b6] error status/mask=00000001/00002000
Mar 07 17:31:14 almendra kernel: pcieport 0000:00:1d.6:    [ 0] RxErr                  (First)
Mar 07 17:31:14 almendra kernel: pcieport 0000:00:1d.6: AER:   Error of this Agent is reported first
Mar 07 17:31:14 almendra kernel: nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 07 17:31:14 almendra kernel: nvme 0000:02:00.0:   device [8086:0975] error status/mask=00000001/00002000
Mar 07 17:31:14 almendra kernel: nvme 0000:02:00.0:    [ 0] RxErr                  (First)

The devices are:

$ lspci -vv | grep 1d.6
00:1d.6 PCI bridge: Intel Corporation Device 34b6 (rev 30) (prog-if 00 [Normal decode])

$ lspci -vv | grep 02:00.0
02:00.0 Non-Volatile memory controller: Intel Corporation Optane NVME SSD H10 with Solid State Storage [Teton Glacier] (prog-if 02 [NVM Express])

The laptop works as usual, but I have the impression that the NVMe drive is warning me about something.

It happens from time to time:

$ journalctl --since yesterday | grep -c "nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical"
9
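To see whether the count is growing, and which device it is attached to, the same grep can be turned into a small per-device tally. This is only a rough sketch: the function name `count_aer_errors` and the regular expression are made up for illustration and are matched against the log format shown above, nothing more.

```python
import re
from collections import Counter

# Matches AER summary lines such as:
#   "... kernel: nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)"
# The pattern is an assumption based on the log lines quoted in this post.
AER_LINE = re.compile(
    r"kernel: (?P<driver>\S+) "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<type>[^,]+)"
)


def count_aer_errors(lines):
    """Count AER summary lines per (driver, device address, severity, type)."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            key = (match["driver"], match["bdf"], match["severity"], match["type"])
            counts[key] += 1
    return counts
```

It takes any iterable of log lines, for example the lines of a file saved with `journalctl -k --since yesterday > aer.log`, and returns a `Counter` keyed by device, so a jump in the nvme entry from one day to the next is easy to spot.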

Do you know what it means?

[–] rotopenguin@infosec.pub 7 points 8 months ago* (last edited 8 months ago) (3 children)

Given that it's just an interface error, you could try turning it all off, taking the drive out and hitting its contacts with electronics contact cleaner (I guess CRC brand is as good as any). Work it a little bit, then let it dry before putting it all back together.
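If you want to check the "interface error" reading yourself, the `error status/mask=00000001/00002000` field in the log is a pair of bitmasks. Here is a minimal sketch that decodes it, assuming the bit layout of the PCIe AER Correctable Error Status register and the short names the kernel prints as far as I remember them (the log itself confirms `RxErr` at bit 0). The helper names are made up for illustration.

```python
import re

# Bit positions in the PCIe AER Correctable Error Status register.
# Short names follow what the Linux kernel prints (e.g. "RxErr"); treat the
# exact spellings other than RxErr as an assumption.
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error (physical layer)
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM rollover
    12: "Timeout",      # Replay timer timeout
    13: "NonFatalErr",  # Advisory non-fatal error
    14: "CorrIntErr",   # Corrected internal error
    15: "HeaderOF",     # Header log overflow
}

STATUS_MASK = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def decode_bits(value):
    """Return the names of all known bits set in value, lowest bit first."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items()) if value & (1 << bit)]


def decode_status_mask(line):
    """Decode the status/mask pair of an AER log line, or return None if absent."""
    match = STATUS_MASK.search(line)
    if match is None:
        return None
    status, mask = (int(group, 16) for group in match.groups())
    return {"reported": decode_bits(status), "masked": decode_bits(mask)}
```

For the lines in the post, the only status bit set is bit 0, a receiver error on the link, which is why this looks like a signalling problem between the drive and the slot rather than a fault in the flash itself.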

Another possibility is that power management is being naughty. Fiddle with ASPM (PCIe link power management) or APST (the NVMe drive's autonomous power state transitions).
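Before changing anything, it helps to know what the power settings currently are. A small sketch that reads them from sysfs, assuming the usual locations of the `pcie_aspm` policy and the `nvme_core` APST latency parameter (either file may be missing on a given kernel, in which case the value comes back as `None`). The function names are hypothetical.

```python
from pathlib import Path

# Assumed sysfs locations; they may not exist on every kernel or configuration.
ASPM_POLICY = Path("/sys/module/pcie_aspm/parameters/policy")
APST_MAX_LATENCY = Path("/sys/module/nvme_core/parameters/default_ps_max_latency_us")


def active_aspm_policy(text):
    """Return the bracketed entry from e.g. 'default performance [powersave] powersupersave'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_or_none(path):
    """Read a small sysfs file, returning None if it is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def power_settings():
    """Collect the current ASPM policy and APST latency limit, if available."""
    policy = read_or_none(ASPM_POLICY)
    return {
        "aspm_policy": active_aspm_policy(policy) if policy else None,
        "apst_max_latency_us": read_or_none(APST_MAX_LATENCY),
    }
```

If the errors stop after booting once with `pcie_aspm=off` or `nvme_core.default_ps_max_latency_us=0` on the kernel command line, power management is the likely culprit. Treat that as a test rather than a fix, since both options cost battery life.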

Oh and do a btrfs/zfs scrub to check that your data is correct.

[–] possiblylinux127@lemmy.zip 2 points 8 months ago* (last edited 8 months ago) (1 children)

Doing a scrub on bad hardware will make corruption worse in many cases. When you have faulty hardware, freeze everything.

This person has had the same device for 6 years. If the drive was used heavily, it probably just failed due to age.

[–] rotopenguin@infosec.pub 1 points 8 months ago

Yeah, you're probably right. I'm thinking in terms of "not a raid, no redundant copies available" scrub, where the main output would be a sanity check of data checksums.

[–] vsis@feddit.cl 1 points 8 months ago* (last edited 8 months ago)

I used a hand dust blower intended for photography gear. I opened the laptop, blew out the dust, disconnected the SSD and blew out the socket and its surroundings.

Now I will monitor the logs and see if it helps.

Thanks.

[–] mvirts@lemmy.world 1 points 8 months ago

Don't forget to blow on it