For one month beginning on October 5, I ran an experiment: Every day, I asked ChatGPT 5 (more precisely, its “Extended Thinking” version) to find an error in “Today’s featured article”. In 28 of these 31 featured articles (90%), ChatGPT identified what I considered a valid error, often several. I have so far corrected 35 such errors.
Finding inconsistencies is not so hard. Pointing them out might be a *little* useful. But resolving them based on trustworthy sources can be a *lot* harder. Most science papers sit behind paywalls. Many news stories may be grounded in old, mistaken histories … if not in outright guesses, distortions or even lies. (The older the history, the worse.)
And since LLMs are usually incapable of citing sources for their own (often batshit) claims anyway – where will ‘the right answers’ come from? I’ve seen LLMs, when questioned a second time, apologize and admit that their previous answers were wrong.
Which LLMs are incapable of citing sources?
All of them. If you’re seeing sources cited, it means it’s RAG (retrieval-augmented generation – an LLM with extra bits). The extra bits make a big difference: the response is constrained to a select few retrieved references, rather than drawing freely on everything the model absorbed in training.
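To make the “extra bits” concrete, here is a minimal sketch of the retrieval step that lets a RAG system cite sources at all. The toy corpus, the word-overlap scoring, and the prompt format are all illustrative assumptions – real systems use embedding search and far larger indexes – but the key point holds: only the retrieved passages reach the model, so only they can be cited.

```python
# Sketch of RAG retrieval: the citations come from this step, not the LLM.
# Corpus, scoring function, and prompt wording are hypothetical examples.

def retrieve(query, corpus, k=2):
    """Rank documents by naive word overlap with the query; return top-k ids."""
    q_words = set(query.lower().split())
    scored = []
    for doc_id, text in corpus.items():
        overlap = len(q_words & set(text.lower().split()))
        scored.append((overlap, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

def build_prompt(query, corpus, k=2):
    """Bundle only the retrieved passages into the prompt sent to the LLM."""
    sources = retrieve(query, corpus, k)
    context = "\n".join(f"[{d}] {corpus[d]}" for d in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

corpus = {
    "doc1": "The treaty was signed in 1848 after the war ended.",
    "doc2": "Olive oil production rose sharply in the 19th century.",
}
print(build_prompt("When was the treaty signed?", corpus, k=1))
```

The design consequence is exactly the trade-off described above: the answer can carry source tags like `[doc1]`, but it is limited to whatever the retriever surfaced, not “all known knowledge on a subject matter.”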