Hi guys! I have a rather beefy machine: AMD Ryzen 7700, 32GB DDR5, a 7800XT GPU with 16GB, and several NVMe drives for the OS, general data, and games. And yet…after a while it becomes completely unresponsive. The mouse freezes, the keyboard doesn’t register anything, and the screen freezes completely. Meanwhile the disk LED shows near-constant activity, almost solid red. So…while this might be heavy paging slowing the system to a crawl (I have an 8GB swapfile), I want to be able to determine what’s going on. Is there a log I can check, or any kind of logging I can enable, that would tell me what happened in the seconds before it became completely unresponsive? What is taking all my memory??

Normal situations where this happens:

Firefox open, multiple windows, lots of tabs. Maybe ~5-8GB of RAM.

virt-manager running a Windows VM with a work remote desktop…4GB of RAM

Steam…1GB of RAM

Thunderbird, Deluge, Telegram, Whatsapp…Not much more really.

This shouldn’t even come close to the RAM capacity of this machine. And yet…it really looks like it’s being starved of memory. How can I check for issues?

  • JayArr@lemmy.today
    8 hours ago

    I seem to remember having similar issues on Neon, back shortly after it came out. I chalked it up to it being bleeding edge’ish, went back to Kubuntu and then Debian.

  • glitching@lemmy.ml
    11 hours ago

    you’re nowhere close to RAM exhaustion. I had similar mishaps on an all-AMD system a few gens back; they showed up as micro-stutters that occasionally grew into full freezes like this. I think I remember it was fixed via a combination of kernel switches and gradually better performance as new versions of the kernel and modules/drivers came out.

    no idea what KDE Neon is based on (Ubuntu LTS?), but I’m guessing you’re running a pretty old kernel on relatively modern hardware, which is a pain. also you don’t need a swapfile, use zram. or just switch to fedora or some such that takes care of all them things for you.

  • Eugenia@lemmy.ml
    1 day ago

    This looks like a driver issue or, more likely, a hardware issue. Either your NVMe or your RAM is faulty. Run memtest86+ (a bootable tool that checks whether your RAM is OK), and I’m sure there are tests for SSDs too.

    • AllHailTheSheep@sh.itjust.works
      1 day ago

      smartctl would be what you’re looking for, even for SSDs. SSDs fail quickly enough that if smartctl catches something there’s a chance it’s already too late, but smartd allows for scheduled tests, and I’ve definitely saved data off of SSDs because I had daily SMART tests running that caught early failure.
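      As a rough illustration of the smartctl suggestion, here is a minimal sketch that asks smartctl for its overall health verdict and parses the answer. The helper names and the /dev/nvme0 device path are assumptions made up for this example, and smartctl normally needs root.

```python
import subprocess


def parse_smart_health(output):
    """Return True/False for a PASSED/other verdict, or None if no verdict line."""
    for line in output.splitlines():
        if "overall-health self-assessment test result" in line:
            return line.rsplit(":", 1)[1].strip() == "PASSED"
    return None


def check_device(device):
    """Run `smartctl -H` on one device (usually requires root)."""
    try:
        result = subprocess.run(
            ["smartctl", "-H", device],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return None  # smartmontools is not installed
    return parse_smart_health(result.stdout)


if __name__ == "__main__":
    # /dev/nvme0 is an assumed device name; list yours with `smartctl --scan`.
    print(check_device("/dev/nvme0"))
```

      A verdict of None only means no result line was found (for example when run without root), not that the drive is healthy.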

      I however strongly disagree with the hardware diagnosis. there is no indication that this is hardware (honestly hardware accounts for VERY few issues like this, and RAM failing still happens but is 98% a thing of the past). diagnosing without any logs is a bit of a lost cause; we simply don’t have enough info. hopefully OP updates the post with the output of journalctl from the last boot.

      • Oinks@lemmy.blahaj.zone
        1 day ago

        Bad RAM is still a thing (even on regular PCs), there’s a reason ECC memory has a market (true ECC, not the stuff that DDR5 has built-in). But I agree that it’s likely just an OOM/Thrashing situation. Linux famously doesn’t handle them very well, and the behavior OP is seeing is very much consistent with that.

        • AllHailTheSheep@sh.itjust.works
          24 hours ago

          dead RAM definitely still happens, yes, but it’s exceedingly rare. I fix hundreds of PCs a year, and I maybe get one or two a year where the root cause is actually bad RAM. more often it’s configuration issues or hardware implementation issues, for example the Gigabyte X870 boards really don’t like XMP for some reason.

          ecc doesn’t really have anything to do with whether a ram stick fails or not, it can help with misbehaving sticks but if a stick is dead it’s dead and ecc can’t help a dead region.

  • Oinks@lemmy.blahaj.zone
    1 day ago

    If your DE/Launcher uses systemd scopes properly you might be able to see something in the journal. As an example somewhere in my logs I can see this:

    Jan 17 17:52:50 sky systemd[2171]: app-niri-steam-40213.scope: Failed with result 'oom-kill'.
    Jan 17 17:52:50 sky systemd[2171]: app-niri-steam-40213.scope: Consumed 6h 32min 39.773s CPU time, 9.4G memory peak, 6.2G memory swap peak.
    

    That’s pretty clearly severe thrashing and an eventual OOM event caused by a game. If you’re not familiar, the command journalctl -e -b -1 jumps to the end of the log from the previous boot. Use d and u to navigate the pager and q to quit. This will only work if the launcher you are using sets up transient systemd scopes and doesn’t just fork-exec into the application (Fuzzel does the wrong thing by default, as do many others).
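    If scrolling through the pager gets tedious, the same search can be scripted. This is only a sketch: the function names are made up for illustration, the pattern is based on the log lines quoted above plus the kernel’s usual "Out of memory" message, and it assumes the journal is persistent (otherwise there is no previous boot to read).

```python
import re
import subprocess

# Matches systemd's "Failed with result 'oom-kill'" lines (format taken from
# the excerpt above) and kernel lines such as "invoked oom-killer" or
# "Out of memory: Killed process ...".
OOM_PATTERN = re.compile(r"oom-kill|Out of memory", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the log lines that mention an OOM kill, in their original order."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def previous_boot_journal():
    """Read the previous boot's journal as a list of lines (empty on failure)."""
    try:
        result = subprocess.run(
            ["journalctl", "-b", "-1", "--no-pager"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return []  # journalctl is not available on this system
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in find_oom_lines(previous_boot_journal()):
        print(line)
```

    An empty result does not prove there was no memory problem: a machine that hard-freezes may never get the chance to log the kill.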

    I’ve also seen large Steam downloads causing such issues, so capping your download speed might help. As could enabling ZRAM.

    Edit: Also, this is most likely completely unrelated but do note that Neon is basically abandoned. You should very much consider switching to a maintained distribution, whether that’s another Ubuntu spin or Fedora or something else entirely.

    • iturnedintoanewt@lemmy.worldOP
      15 hours ago

      Thanks for the journalctl command, I think I was looking for hints like this. I’ll be reviewing my journal next time I get a crash. Regarding Steam, since it’s using NVMe for both the OS and the games disk, it downloads at rather crazy speeds without slowing down the OS (as long as I’m not also doing something else equally demanding, of course…but I can continue browsing and watching videos just fine).

      “Also, this is most likely completely unrelated but do note that Neon is basically abandoned. You should very much consider switching to a maintained distribution, whether that’s another Ubuntu spin or Fedora or something else entirely.”

      Thanks! Yeah, I might consider a whole system wipe. I’ve briefly tried Fedora before, and Nobara for a few years, but I think I’d prefer something Ubuntu-based with KDE. Something that’s not Kubuntu, that is. I don’t want snap crap.

  • Hiro8811@lemmy.world
    1 day ago

    Ok so I kinda had a similar problem. The difference is that I was using Arch and full disk encryption. The system would freeze up if I tried writing big files, and the disk light would start blinking. It might not be that, though, so maybe run “journalctl -b -1” on the next boot after your system freezes and check towards the bottom of the log for errors, which are usually shown in red. Another way is to keep btop running in the background and, when the system gives any sign that it’s about to freeze, switch to btop and check what’s going on. Edit: something that came to me is to try switching to another TTY using Ctrl+Alt+F<number>. I’m not sure how Neon is set up, so try F2, F3 or F4.

  • ik5pvx@lemmy.world
    1 day ago

    In addition to all the good suggestions already here, consider installing earlyoom and configuring it to kill the stuff you care less about, maybe one of those heavy Electron-based clients.

    • anon5621@lemmy.ml
      12 hours ago

      Better to use systemd-oomd. It already comes with systemd on Arch and works pretty well.
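      For context, systemd-oomd bases its decisions on the kernel’s memory pressure figures (PSI) together with swap usage. You can read the same pressure numbers yourself to see whether the machine is stalling on memory. A minimal sketch, assuming a kernel with PSI enabled so that /proc/pressure/memory exists; the function names are made up for this example.

```python
def parse_pressure(text):
    """Parse /proc/pressure/* content into {"some": {...}, "full": {...}}."""
    parsed = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        # Each pair looks like "avg10=0.00" or "total=12345".
        parsed[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return parsed


def read_memory_pressure(path="/proc/pressure/memory"):
    """Return parsed memory pressure, or {} if PSI is unavailable."""
    try:
        with open(path) as handle:
            return parse_pressure(handle.read())
    except OSError:
        return {}


if __name__ == "__main__":
    full = read_memory_pressure().get("full", {})
    print("percent of the last 10s fully stalled on memory:", full.get("avg10", "n/a"))
```

      The "full" avg10 value is the share of the last ten seconds in which all non-idle tasks were stalled waiting for memory, so a value that stays high right before a freeze would point at thrashing.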

  • Magister@lemmy.world
    1 day ago

    When this happens, can you switch console (Ctrl-Alt-F1), restart X (Ctrl-Alt-Backspace), or ssh in from another PC? You can also keep journalctl or something similar tailing the logs in a window and see if anything shows up there.

  • folekaule@lemmy.world
    1 day ago

    It may not be the raw RAM usage.

    My first suspect is the Windows VM, especially if it’s running enterprise security software. 4GB is probably not enough for modern Windows, and it could be trying to use its page file, thrashing your disk in the process.

    Are you able to collect some data from a system monitor on paging and disk activity? That could help you narrow it down. You can use btop for a quick terminal option if your GUI is unresponsive (assuming you can switch to a console). vmstat is another option that you can run in the background to collect stats over time, but it’s not user friendly.

    • iturnedintoanewt@lemmy.worldOP
      15 hours ago

      Nothing much enterprise…It’s running “Windows App”, just a glorified RDP client with extra authentication settings for SSO etc. That’s why I gave it only 4GB. It’s not just the GUI that’s unresponsive, everything is. It’s a full freeze, and I can’t get to the text consoles either. The most I can hope for, I think, is to gather data from right before the freeze happens…and check it after I reset the computer.

      • folekaule@lemmy.world
        14 hours ago

        I see. My concern was with security scanning tools often put on computers by enterprise IT departments but it sounds like that’s not the case here.

        In your situation, assuming you’re not finding what you seek with journalctl, I think I would use a tool like vmstat or sar to collect periodic snapshots of CPU, memory, and io. You can tell it to collect data every X seconds and tee that to a file. After you reboot you can see what happened leading up to the crash. You should be able to import the data into a spreadsheet or something for analysis, but it’s not very intuitive and you’ll need to consult man pages for the options and how to interpret them.
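        If vmstat or sar feel too unwieldy, the same idea can be sketched in a few lines: sample memory and swap counters every few seconds and force each row to disk so the last samples survive a hard reset. Everything here is an assumption for illustration (the function names, the choice of fields, the CSV layout); pswpin and pswpout are cumulative page counts since boot, so look at how fast they grow between rows.

```python
import os
import time

MEMINFO_KEYS = ("MemAvailable", "SwapFree")
VMSTAT_KEYS = ("pswpin", "pswpout")


def parse_meminfo(text):
    """Map /proc/meminfo field names to integer values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name] = int(parts[0])
    return values


def parse_vmstat(text):
    """Map /proc/vmstat counter names to integer values."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            values[parts[0]] = int(parts[1])
    return values


def format_sample(timestamp, meminfo, vmstat):
    """Build one CSV row: time, MemAvailable, SwapFree, pswpin, pswpout."""
    fields = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))]
    fields += [str(meminfo.get(key, "")) for key in MEMINFO_KEYS]
    fields += [str(vmstat.get(key, "")) for key in VMSTAT_KEYS]
    return ",".join(fields)


def log_forever(path, interval=5):
    """Append one sample every `interval` seconds, syncing each row to disk."""
    with open(path, "a") as out:
        while True:
            with open("/proc/meminfo") as handle:
                meminfo = parse_meminfo(handle.read())
            with open("/proc/vmstat") as handle:
                vmstat = parse_vmstat(handle.read())
            out.write(format_sample(time.time(), meminfo, vmstat) + "\n")
            out.flush()
            os.fsync(out.fileno())  # push the row to disk before sleeping
            time.sleep(interval)
```

        To use it, call something like log_forever("/var/tmp/memlog.csv") from a terminal or a background service (the path is just an example) and read the tail of the file after the next reset.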

        There are a lot of good suggestions in this thread. I would lean towards a hardware or driver issue, maybe bad RAM. Unfortunately these things take a lot of trial and error to figure out.

  • doodoo_wizard@lemmy.ml
    1 day ago

    Neon doesn’t force you to actually update the ubuntu it’s built on unless you manually do it iirc. Update your shit and report back.

    Once you decide not to try that, top, btop, atop or htop can tell you how much RAM you’re using. Most of them will also tell you how your disk writes are doing.
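    If you’d rather have a plain list than an interactive tool, the per-process numbers those monitors show come from /proc. A minimal sketch that prints the largest processes by resident memory (the function names are made up for this example):

```python
import os


def parse_status(text):
    """Extract (name, rss_kb) from /proc/<pid>/status content.

    Kernel threads have no VmRSS line; they are reported with rss_kb = 0.
    """
    name, rss_kb = "?", 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])
    return name, rss_kb


def top_by_rss(limit=10, proc_root="/proc"):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest first."""
    rows = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return rows
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path, errors="replace") as handle:
                name, rss_kb = parse_status(handle.read())
        except OSError:
            continue  # process exited while we were scanning
        rows.append((rss_kb, int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for rss_kb, pid, name in top_by_rss():
        print(f"{rss_kb / 1024:10.1f} MiB  {pid:>7}  {name}")
```

    Resident size counts shared pages once per process, so the column will add up to more than what is really in use. Treat it as a ranking rather than an exact account.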

    It doesn’t sound like you have a ram issue, it sounds like you have a disk issue. First and foremost, once you’ve verified that you have plenty of memory available using a tool described above, expand your windows vm to 8gb. Windows would aggressively page if it had only 4gb and windows in a vm will also aggressively page when it only has 4gb, except it has to go through kvm to access those qcows.

    It sounds like you have way too many tabs open. Close some and see if that helps you out. You can highlight a bunch of them by selecting one and ctrl-shift clicking on another one to get every tab in between. Right click and add to bookmarks then close them.

    Next, use SpinRite with I think a level 3 scan on all your NVMe drives. It shaves a write cycle off the top (consumer flash is typically rated for thousands) but in return makes everything fast again. Flash memory becomes less responsive as read cycles on a block pile up until it’s rewritten.