Chatbots provided incorrect, conflicting medical advice, researchers found: “Despite all the hype, AI just isn’t ready to take on the role of the physician.”

“In an extreme case, two users sent very similar messages describing symptoms of a subarachnoid hemorrhage but were given opposite advice,” the study’s authors wrote. “One user was told to lie down in a dark room, and the other user was given the correct recommendation to seek emergency care.”

  • BeigeAgenda@lemmy.ca
    17 hours ago

    Anyone who has knowledge of a specific subject says the same: LLMs are constantly incorrect and hallucinate.

    Everyone else thinks it looks right.

    • tyler@programming.dev
      4 hours ago

      That’s not what the study showed, though. The LLMs were right over 98% of the time… when given the full situation by a “doctor”. The problem was ordinary people trying to self-diagnose without knowing which details were important.

      That is why studies are so important. Even with the text of the study right in front of you, you assumed a conclusion that the study did not reach.

    • IratePirate@feddit.org
      16 hours ago

      A talk on LLMs I was listening to recently put it this way:

      If we hear the words of a five-year-old, we assume the knowledge of a five-year-old behind those words, and treat the content with due suspicion.

      We’re not adapted to something with the “mind” of a five-year-old speaking to us in the words of a fifty-year-old, and thus are more likely to assume competence just based on language.

      • leftzero@lemmy.dbzer0.com
        11 hours ago

        LLMs don’t have the mind of a five-year-old, though.

        They don’t have a mind at all.

        They simply string words together according to statistical likelihood, without having any notion of what the words mean, or what words or meaning are; they don’t have any mechanism with which to have a notion.

        They aren’t any more intelligent than old Markov chains (or than your average rock), they’re simply better at producing random text that looks like it could have been written by a human.

        • plyth@feddit.org
          3 hours ago

          “They simply string words together according to statistical likelihood, without having any notion of what the words mean”

          What gives you the confidence that you don’t do the same?

        • IratePirate@feddit.org
          7 hours ago

          I am aware of that, hence the quotation marks. But you’re correct, that’s where the analogy breaks down. Personally, I prefer to liken them to parrots, mindlessly reciting patterns they’ve found in somebody else’s speech.

    • agentTeiko@piefed.social
      12 hours ago

      Yep, it’s why C-levels think it’s the Holy Grail: everything that comes out of their own mouths is bullshit as well, so they don’t see the difference.

    • zewm@lemmy.world
      16 hours ago

      It is insane to me how anyone can trust LLMs when their information is incorrect 90% of the time.

      • SuspciousCarrot78@lemmy.world
        13 minutes ago

        I don’t think it’s their information per se, so much as how the LLMs tend to use said information.

        LLMs are generally tuned to be expressive and lively. Part of that involves “random” (i.e., roll-the-dice) output based on inputs + training data. (I’m skipping over technical details here for the sake of simplicity.)

        That’s what the masses have shown they want: friendly, confident-sounding chatbots that can give plausible answers that are mostly right, sometimes.

        But for certain domains (like medicine) that shit gets people killed.

        TL;DR: they’re made for chitchat engagement, not as high-fidelity expert systems. You have to pay $$$$ to access those.
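
For anyone curious what the “roll the dice” part can look like, here is a minimal sketch of temperature sampling, one common way to add randomness when picking the next word. It is not taken from the study or from any particular chatbot. The candidate replies, their scores, and the function names are all invented for illustration.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_word(candidates, scores, temperature, rng):
    """Pick one candidate at random, weighted by its probability."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]

# Hypothetical candidates and scores, made up for this example.
candidates = ["rest", "hydrate", "seek emergency care"]
scores = [2.0, 1.0, 1.8]

rng = random.Random(0)  # fixed seed so repeated runs give the same picks
for temperature in (0.1, 1.0):
    picks = [sample_next_word(candidates, scores, temperature, rng) for _ in range(5)]
    print(temperature, picks)
```

With a low temperature the top-scoring option dominates. With a higher one the lower-scoring options are drawn more often, which is one way near-identical inputs can end up with different outputs.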