Chatbots provided incorrect, conflicting medical advice, researchers found: “Despite all the hype, AI just isn’t ready to take on the role of the physician.”
“In an extreme case, two users sent very similar messages describing symptoms of a subarachnoid hemorrhage but were given opposite advice,” the study’s authors wrote. “One user was told to lie down in a dark room, and the other user was given the correct recommendation to seek emergency care.”



Anyone who has knowledge of a specific subject says the same: LLMs are constantly incorrect and hallucinate.
Everyone else thinks it looks right.
That’s not what the study showed, though. The LLMs were right over 98% of the time… when given the full situation by a “doctor”. The problem was normal people trying to self-diagnose who didn’t know which details were important.
Hence why studies are incredibly important. Even with the text of the study right in front of you, you assumed a conclusion that the study did not reach.
A talk on LLMs I was listening to recently put it this way:
If we hear the words of a five-year-old, we assume the knowledge of a five-year-old behind those words, and treat the content with due suspicion.
We’re not adapted to something with the “mind” of a five-year-old speaking to us in the words of a fifty-year-old, and thus are more likely to assume competence just based on language.
LLMs don’t have the mind of a five-year-old, though.
They don’t have a mind at all.
They simply string words together according to statistical likelihood, without having any notion of what the words mean, or what words or meaning are; they don’t have any mechanism with which to have a notion.
They aren’t any more intelligent than old Markov chains (or than your average rock); they’re simply better at producing random text that looks like it could have been written by a human.
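For anyone who hasn’t seen one, here is a minimal sketch of the kind of old word-level Markov chain text generator being compared to here. The toy corpus and the function names (`build_chain`, `generate`) are made up for illustration, and it is only meant to show "stringing words together by statistical likelihood" in its simplest form; it says nothing about how any actual LLM is built.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that followed it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, rng):
    """Walk the chain, picking each next word at random from observed followers."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Hypothetical toy corpus, invented for this example.
corpus = "the patient has a headache and the patient has a fever and the doctor has a plan"
chain = build_chain(corpus)
print(generate(chain, "the", 8, random.Random(0)))
```

Words that followed more often appear more often in the follower list, so `rng.choice` picks them proportionally more often. The output is locally plausible but has no idea what a patient or a fever is.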
What gives you the confidence that you don’t do the same?
human: je pense
llm: je ponce
I am aware of that, hence the ""s. But you’re correct, that’s where the analogy breaks. Personally, I prefer to liken them to parrots, mindlessly reciting patterns they’ve found in somebody else’s speech.
Yep, it’s why C-levels think it’s the Holy Grail: everything that comes out of their own mouths is bullshit as well, so they don’t see the difference.
It is insane to me how anyone can trust LLMs when their information is incorrect 90% of the time.
I don’t think it’s their information per se, so much as how the LLMs tend to use said information.
LLMs are generally tuned to be expressive and lively. Part of that involves “random” (i.e., roll the dice) output based on inputs + training data. (I’m skipping over technical details here for the sake of simplicity.)
That’s what the masses have shown they want: friendly, confident-sounding chatbots that can give plausible answers that are mostly right, sometimes.
But for certain domains (like medicine) that shit gets people killed.
TL;DR: they’re made for chitchat engagement, not as high-fidelity expert systems. You have to pay $$$$ to access those.
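For the curious, the “roll the dice” part above can be sketched roughly like this. It is a toy illustration of temperature sampling, assuming the model has already produced a score for each candidate output; the candidates `A`, `B`, `C`, their scores, and the function name `sample_next` are all invented for the example, not taken from any real system.

```python
import math
import random

def sample_next(scores, temperature, rng):
    """Sample one candidate from softmax(scores / temperature)."""
    tokens = list(scores)
    scaled = [scores[t] / temperature for t in tokens]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical scores: "A" is the candidate the model rates highest.
scores = {"A": 2.0, "B": 1.0, "C": 0.0}
rng = random.Random(42)
for temperature in (0.1, 1.0, 2.0):
    picks = [sample_next(scores, temperature, rng) for _ in range(1000)]
    share = picks.count("A") / len(picks)
    print(f"temperature={temperature}: top candidate picked {share:.0%} of the time")
```

At a low temperature the top-rated candidate wins almost every time; turn the temperature up and the lower-rated ones get picked more often. That is one plausible way two near-identical prompts can end up with different answers.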