Hi guys! I have a rather beefy machine: AMD Ryzen 7700, 32GB DDR5, a 7800XT 16GB GPU, and several NVMe drives for the OS, general data, and games. And yet… after a while it becomes completely unresponsive. The mouse freezes, the keyboard doesn’t register anything, and the screen is completely frozen. Meanwhile the disk LED shows nearly constant activity, almost solid red. So… while this might be crazy paging slowing the system to a crawl (I have an 8GB swapfile), I want to be able to determine what’s going on. Is there a log I can check, or some kind of logging I can enable, that would tell me what happened in the seconds before it became completely unresponsive? What is taking all my memory??

Normal situations where this happens:

Firefox open, multiple windows, lots of tabs. Maybe ~5-8GB of RAM.

virt-manager running a Windows VM, which runs a remote desktop session for work… 4GB of RAM

Steam…1GB of RAM

Thunderbird, Deluge, Telegram, WhatsApp… Not much more really.

This shouldn’t even come close to the RAM capacity of this machine. And yet… it really looks like it is being starved of memory. How can I check for issues?
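One way to see what was using memory in the seconds before a freeze is to leave a small logger running that writes a snapshot to disk every few seconds. The following is a minimal sketch, not a polished tool: it assumes a Linux `/proc` filesystem, and the file name `memwatch.log`, the 5-second interval, and the function names are all made up for illustration.

```python
import os
import time


def parse_kv(text):
    """Split 'Key: value' lines (the /proc/meminfo and /proc/<pid>/status layout)."""
    pairs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            pairs[key.strip()] = value.strip()
    return pairs


def kb(value):
    """Turn a value such as '1234 kB' into an int; 0 if it is not numeric."""
    fields = value.split()
    return int(fields[0]) if fields and fields[0].isdigit() else 0


def summarize(meminfo_text):
    """Pick the few /proc/meminfo fields that matter most for a memory squeeze."""
    info = parse_kv(meminfo_text)
    return {name: kb(info.get(name, "")) for name in ("MemAvailable", "SwapFree", "Dirty")}


def top_rss(limit=5, proc_root="/proc"):
    """Return the largest processes as (rss_kb, pid, name), biggest first."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            path = os.path.join(proc_root, entry, "status")
            with open(path, errors="replace") as fh:
                status = parse_kv(fh.read())
        except OSError:
            continue  # the process exited while we were looking
        rss = kb(status.get("VmRSS", ""))  # kernel threads have no VmRSS
        if rss:
            procs.append((rss, int(entry), status.get("Name", "?")))
    return sorted(procs, reverse=True)[:limit]


def watch(log_path="memwatch.log", interval=5, rounds=None):
    """Append one snapshot per interval; fsync so it survives a hard reset."""
    done = 0
    with open(log_path, "a") as log:
        while rounds is None or done < rounds:
            with open("/proc/meminfo") as fh:
                summary = summarize(fh.read())
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} {summary} top={top_rss()}\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
            done += 1
```

Calling `watch()` from a terminal and leaving it running means the last lines of the log after a forced reboot show how low `MemAvailable` and `SwapFree` got and which processes had the largest resident set at that point. Page cache and GPU memory are not attributed to any process here, so the per-process numbers will not add up to total usage.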

  • AllHailTheSheep@sh.itjust.works
    1 day ago

    smartctl would be what you’re looking for, even for SSDs. SSDs can fail quickly enough that by the time smartctl catches something it may already be too late, but smartd allows for scheduled tests, and I’ve definitely saved data off of SSDs because I had daily SMART tests running that caught early failure.

    I do, however, strongly disagree that this is a hardware issue. There is no indication that this is hardware (honestly, hardware accounts for VERY few issues like this; RAM failure still happens but is 98% a thing of the past). Diagnosing without any logs is a bit of a lost cause, since we simply don’t have enough info. Hopefully OP updates the post with the output of journalctl from the last boot.
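For anyone wanting to pull that journalctl output after a forced reboot, here is a rough sketch that reads the kernel log of the previous boot and keeps only the lines that look like OOM-killer activity. It assumes the journal is persistent (otherwise the previous boot is not kept), and the marker strings and function names are my own picks, not an official list. After a hard freeze the last few seconds may never have reached the disk, so an empty result does not rule out memory exhaustion.

```python
import subprocess

# Substrings that commonly appear in kernel OOM messages (an assumed, partial list).
OOM_MARKERS = ("out of memory", "invoked oom-killer", "oom-kill", "oom_reaper")


def oom_lines(log_text):
    """Return the log lines that mention the kernel OOM killer."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line.lower() for marker in OOM_MARKERS)
    ]


def previous_boot_kernel_log():
    """Fetch kernel messages from the boot before the current one."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,  # a missing previous boot just yields empty output
    )
    return result.stdout
```

Something like `print("\n".join(oom_lines(previous_boot_kernel_log())))` then shows whether the kernel killed anything before the machine locked up.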

    • Oinks@lemmy.blahaj.zone
      1 day ago

      Bad RAM is still a thing (even on regular PCs); there’s a reason ECC memory has a market (true ECC, not the on-die ECC that DDR5 has built in). But I agree that it’s likely just an OOM/thrashing situation. Linux famously doesn’t handle those very well, and the behavior OP is seeing is very much consistent with that.
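A thrashing situation like this can be spotted through the kernel's pressure stall information in `/proc/pressure/memory`, which exists on kernels built with PSI support. The sketch below parses that file; the 20% threshold and the function names are arbitrary assumptions for illustration, not recommended values.

```python
def parse_pressure(text):
    """Parse PSI text into {'some': {...}, 'full': {...}} with float values."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Each line looks like: "full avg10=1.23 avg60=0.50 avg300=0.10 total=12345"
        result[fields[0]] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields[1:])
        }
    return result


def is_thrashing(text, threshold=20.0):
    """True if all non-idle tasks were stalled on memory for >= threshold % of the last 10 s."""
    full = parse_pressure(text).get("full", {})
    return full.get("avg10", 0.0) >= threshold


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read the live PSI file; raises OSError if the kernel does not provide it."""
    with open(path) as fh:
        return fh.read()
```

A high `full avg10` while the disk LED is solid red points at swap thrashing as opposed to a hardware fault, since it means tasks spent that share of time waiting on memory.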

      • AllHailTheSheep@sh.itjust.works
        24 hours ago

        Dead RAM definitely still happens, yes, but it’s exceedingly rare. I fix hundreds of PCs a year, and I get maybe one or two a year where the root cause is actually bad RAM. More often it’s configuration issues or hardware implementation issues; for example, the Gigabyte X870 boards really don’t like XMP for some reason.

        ECC doesn’t really have anything to do with whether a RAM stick fails or not. It can help with misbehaving sticks, but if a stick is dead it’s dead, and ECC can’t help a dead region.