For one month beginning on October 5, I ran an experiment: Every day, I asked ChatGPT 5 (more precisely, its “Extended Thinking” version) to find an error in “Today’s featured article”. In 28 of these 31 featured articles (90%), ChatGPT identified what I considered a valid error, often several. I have so far corrected 35 such errors.
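For anyone who wants to try something similar, here is a minimal sketch of the daily loop in Python. It assumes the Wikipedia REST "featured content" feed and the TextExtracts API for fetching the article, plus the OpenAI Python SDK for the model call; the model name, prompt wording, and script structure are my own placeholders, not the exact setup used in the experiment.

```python
"""Sketch: ask an LLM to look for one error in Today's featured article.

Assumptions (not from the original post): the endpoints below, the prompt,
and the model name "gpt-5" are illustrative placeholders.
"""
from datetime import datetime, timezone

import requests                 # pip install requests
from openai import OpenAI      # pip install openai

FEED = "https://en.wikipedia.org/api/rest_v1/feed/featured/{y:04d}/{m:02d}/{d:02d}"
UA = {"User-Agent": "tfa-error-check-sketch/0.1"}


def todays_featured_article() -> tuple[str, str]:
    """Return (title, plain-text extract) of Today's featured article."""
    today = datetime.now(timezone.utc).date()
    feed = requests.get(
        FEED.format(y=today.year, m=today.month, d=today.day),
        headers=UA, timeout=30,
    ).json()
    title = feed["tfa"]["titles"]["normalized"]
    # Full plain text of the article via the TextExtracts API.
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "prop": "extracts", "explaintext": 1,
                "titles": title, "format": "json", "formatversion": 2},
        headers=UA, timeout=30,
    ).json()
    return title, resp["query"]["pages"][0]["extract"]


def find_an_error(title: str, text: str) -> str:
    """Ask the model for the single most clear-cut error it can find."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = (
        f'The following is the Wikipedia article "{title}". '
        "Identify the single most clear-cut factual, numerical, or "
        "internal-consistency error you can find, quote the offending "
        "sentence, and briefly explain why it is wrong. "
        "If you find nothing, say so."
    )
    reply = client.chat.completions.create(
        model="gpt-5",  # placeholder; use whichever reasoning model you have access to
        messages=[{"role": "user", "content": prompt + "\n\n" + text[:200_000]}],
    )
    return reply.choices[0].message.content


if __name__ == "__main__":
    title, text = todays_featured_article()
    print(f"== {title} ==")
    print(find_an_error(title, text))
```

The point of the single-error prompt is to keep the output checkable: one concrete claim per day that a human editor can verify against sources before touching the article.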


Yes and no. I enjoyed reading about this approach, but it seems like a slippery slope from this to “vibe knowledge”, where LLMs are used to actually add or infer information.
The issue is that some people are lazy cheaters no matter what you do. Banning every tool because of those people isn’t helpful to the rest of humanity.
Don’t discard a good technique just because it can be implemented poorly.