• Hotzilla@sopuli.xyz
    1 day ago

    GPT-5 without “thinking” mode got the answer wrong.

    GPT-5 with thinking answered:

    Here are the 21 US states with the letter “R” in their name:

    Arizona, Arkansas, California, Colorado, Delaware, Florida, Georgia, Maryland, Missouri, Nebraska, New Hampshire, New Jersey, New York, North Carolina, North Dakota, Oregon, Rhode Island, South Carolina, Vermont, Virginia, West Virginia.

    It wrote a script that verified the answer while doing the “thinking” (feeding the hallucinations back to the LLM); something like the sketch below.
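    Not the actual script GPT-5 ran (that wasn’t shared), just a minimal Python sketch of what that kind of self-check could look like:

    ```python
    # Count US states whose name contains the letter "r" and list them,
    # as a sanity check on the model's claimed total of 21.
    STATES = [
        "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
        "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
        "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
        "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
        "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
        "New Hampshire", "New Jersey", "New Mexico", "New York",
        "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
        "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
        "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
        "West Virginia", "Wisconsin", "Wyoming",
    ]

    with_r = [state for state in STATES if "r" in state.lower()]
    print(len(with_r))          # 21
    print(", ".join(with_r))    # matches the list quoted above
    ```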

    • skuzz@discuss.tchncs.de
      20 hours ago

      “Thinking” mode is just sending wave upon wave of GPUs at the problem until the killbots hit their pre-set kill count. One could roughly simulate it without thinking mode by feeding the question and the previous answer back to the LLM repeatedly until it eventually lands on an answer that might be “right” (roughly the loop sketched below). These companies have hit a technological wall with LLMs and will do anything to look like they still have forward momentum.
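      A crude sketch of that “just feed it back to itself” idea; `ask` here is a hypothetical callable standing in for whatever chat API you actually use, not any vendor’s real interface:

      ```python
      # Re-feed the question plus the previous answer until the model stops
      # changing its answer or the round budget runs out.
      # `ask` is a hypothetical callable: prompt string -> answer string.
      def iterate_answer(question, ask, max_rounds=5):
          answer = ask(question)
          for _ in range(max_rounds - 1):
              followup = (
                  f"Question: {question}\n"
                  f"Previous answer: {answer}\n"
                  "Check the previous answer for mistakes and give a corrected answer."
              )
              new_answer = ask(followup)
              if new_answer == answer:  # stopped changing; "done", not necessarily "right"
                  break
              answer = new_answer
          return answer
      ```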

      • jj4211@lemmy.world
        19 hours ago

        Well, not quite, because they don’t have criteria for ‘right’.

        They do basically say ‘generate 10x more content than usual, then dispose of 90% of it’, and that surprisingly does tend to improve results, but at no point is anything ‘grading’ the result (see the sketch at the end of this comment).

        Some people have bothered to post ‘chain of thought’ examples, and even when the output is largely ‘correct’ you may see a middle step get utterly flubbed in a way that should have fouled the whole thing, yet the error stays oddly isolated and doesn’t carry forward into the subsequent content, as it would in actual ‘reasoning’.
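        For the “generate a lot, keep a little” point above, a loose sketch of one published approximation (self-consistency, i.e. majority voting over final answers) rather than anything a particular vendor is confirmed to do; note that nothing in it knows what ‘right’ means, it only measures agreement. `sample_completion` is a hypothetical callable:

        ```python
        from collections import Counter

        # Draw n completions and keep the final answer that appears most often.
        # No correctness check anywhere; only agreement between samples.
        # `sample_completion` is hypothetical: prompt string -> final answer string.
        def keep_the_popular_answer(question, sample_completion, n=10):
            answers = [sample_completion(question) for _ in range(n)]
            winner, _count = Counter(answers).most_common(1)[0]
            return winner
        ```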