this post was submitted on 18 Mar 2024
Off to a fairly rough start, unfortunately :/
Spent seven hours today trying and failing to get Docker to work with our Jenkins deployment at work, and on top of that, my brand new GPU keeps “falling off the bus” (Ubuntu, 4070 Ti Super; the screen randomly freezes and needs a reboot to fix, but the PC keeps running so I can SSH in & check dmesg and whatnot). Sometimes it’s every 12 hours or so, or even longer, but sometimes (today, for instance), it feels like it’s every ten minutes. Which … sucks.
Side note… if anybody knows how the heck to fix a GPU falling off the bus… please let me know, lol. It only happens when I’m actively using the PC (as in, if it’s on but the mouse ain’t moving, it doesn’t seem to happen), and I’m running the latest & greatest NVIDIA 550 drivers on Ubuntu 22.04. I've reseated the GPU, I'm running a 1000W EVGA PSU, and the Kill-a-watt attached to it never goes above 450 W or so. And the crash never seems to happen when it’s under a huge amount of load, like doing AI stuff… it only ever seems to happen when I’m browsing files and such. Anyone ever run into this before?? All of the Google answers seem to say it’s a bad PSU or similar, but the PSU has been working just fine & dandy in other PCs, and this system wasn’t doing this at all with my old NVIDIA GPU (swapped last week)…
In my experience, what you describe is typically a power issue. It could be a power-down or sleep problem, the GPU not getting the power it needs when it needs it, or the PSU simply being overtaxed. I'd really want to see dmesg logs and more hardware info (do you have crash dumps?). Try disabling any power-down or low-power states (C-states and the like) for the GPU, so it can't go to sleep.
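If it helps with gathering those logs, here is a minimal Python sketch for pulling the relevant lines out of dmesg output. It assumes the NVIDIA driver logs its faults as `NVRM: Xid (...)` lines and uses the "fallen off the bus" wording from the post. The exact line format varies between driver versions, so treat the regex as an assumption, and the function names (`find_gpu_faults`, `read_dmesg`) are made up for illustration.

```python
import re
import subprocess

# Assumed shape of an NVIDIA Xid kernel message: "NVRM: Xid (<device>): <code>, ..."
# The real format can differ by driver version, so this pattern is a best guess.
XID_RE = re.compile(r"NVRM: Xid \((?P<dev>[^)]*)\): (?P<code>\d+)")

# Wording quoted in the post above.
BUS_DROP = "fallen off the bus"


def find_gpu_faults(log_text):
    """Return (line_number, xid_code_or_None, line) for each suspicious line."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = XID_RE.search(line)
        if match or BUS_DROP in line:
            code = int(match.group("code")) if match else None
            hits.append((number, code, line.strip()))
    return hits


def read_dmesg():
    """Capture the kernel log as text; on many systems this needs root."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout
```

Over SSH after a freeze, something like `for hit in find_gpu_faults(read_dmesg()): print(hit)` should list just the GPU-related lines, which is usually what people want to see pasted in a thread like this.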
Appreciate the response! After many, many hours of research, I came to the same conclusion. I tried a whole bunch of fixes that worked for others and none of them seemed to work - except for a weird hacky “solution” of just permanently setting the power state of the GPU to max. Unfortunately, that means it consumes ~50 watts at idle instead of the 5-10 it managed beforehand… but the fact that it fixed the system lockups made it worth it. I think the issue had something to do with the GPU not properly waking up from lower power modes - so I super appreciate the advice :)
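For anyone landing here with the same problem: the post doesn't say exactly how the power state was pinned, so the following is only a sketch of one commonly cited approach, not necessarily what was done above. It assumes the proprietary driver's `nvidia-smi` and `nvidia-settings` tools are installed, that persistence mode plus PowerMizer's "Prefer Maximum Performance" mode (value 1) is enough to keep the card out of its low-power states, and that the GPU is index 0. The helper names are hypothetical, and the script only builds the commands so you can review them before running anything.

```python
import shlex


def max_performance_commands(gpu_index=0):
    """Build (but do not run) commands that ask the driver to stay at full power.

    Assumption: persistence mode and PowerMizer mode 1 are sufficient for this
    card. Other setups may need different settings.
    """
    return [
        # Keep the driver loaded so its state is not torn down when idle (needs root).
        ["nvidia-smi", "-i", str(gpu_index), "-pm", "1"],
        # PowerMizer mode 1 is "Prefer Maximum Performance" (needs a running X session).
        ["nvidia-settings", "-a", f"[gpu:{gpu_index}]/GPUPowerMizerMode=1"],
    ]


def status_query_command(gpu_index=0):
    """Build a command that reports the current performance state and power draw."""
    return [
        "nvidia-smi", "-i", str(gpu_index),
        "--query-gpu=pstate,power.draw",
        "--format=csv,noheader",
    ]


def render(commands):
    """Turn argument lists into shell-safe strings for copy and paste."""
    return [shlex.join(command) for command in commands]
```

Printing `render(max_performance_commands())` gives lines you can paste into a terminal, and running the status query before and after is a quick way to check whether the idle draw has gone up the way it did in the post.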