For one month beginning on October 5, I ran an experiment: Every day, I asked ChatGPT 5 (more precisely, its “Extended Thinking” version) to find an error in “Today’s featured article”. In 28 of these 31 featured articles (90%), ChatGPT identified what I considered a valid error, often several. I have so far corrected 35 such errors.

  • Echo Dot@feddit.uk · 9 hours ago

    But we don’t know what the false positive rate is either. How many submissions were flagged that shouldn’t have been? It seems like you don’t even have a way to find that metric out unless somebody complains about it.
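For reference, the metric the commenter is asking about is the false positive rate: of all the benign submissions, what fraction got wrongly flagged? A minimal sketch, using hypothetical counts (the thread gives no real numbers):

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """FPR = FP / (FP + TN): the fraction of benign items that were flagged."""
    benign_total = false_positives + true_negatives
    if benign_total == 0:
        raise ValueError("no benign submissions to measure against")
    return false_positives / benign_total

# Hypothetical example counts, not figures from this thread:
# 5 benign posts wrongly flagged out of 100 benign posts total.
print(false_positive_rate(5, 95))  # 0.05
```

The commenter's point is that both FP and TN require ground-truth labels: without someone auditing the unflagged (and flagged) submissions, neither count is known, so the rate can't be computed.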

    • Ace@feddit.uk · edited · 9 hours ago

      I can then check through anything it flags and manually moderate.

      It isn’t doing anything automatically; it isn’t moderating for me. It’s just flagging submissions for human review: “hey, maybe have a look at this one.” So if it falsely flags something, which is common, I simply ignore it. And as I said, that false-positive rate is moderate; I haven’t measured it precisely, but the flagging is still accurate enough to be quite useful.
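The workflow described above is the "flag, don't act" pattern: the model only produces a review queue, and a human makes every moderation decision. A minimal sketch with a hypothetical `classifier` callable (nothing here comes from the actual setup in the thread):

```python
from typing import Callable, List

def review_queue(submissions: List[str], classifier: Callable[[str], bool]) -> List[str]:
    """Return only the submissions the classifier flags.

    Nothing is removed or blocked here: unflagged posts pass through untouched,
    and a human moderator decides what to do with each flagged one.
    """
    return [s for s in submissions if classifier(s)]

# Hypothetical toy classifier for illustration only.
flagged = review_queue(["ok post", "spam spam"], lambda s: "spam" in s)
print(flagged)  # ['spam spam']
```

Because a false flag only costs the moderator a glance, this design tolerates a much higher false positive rate than an automated blocker would.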