Model Evaluation and Threat Research is an AI research charity that looks into the threat of AI agents! That sounds a bit AI doomsday cult, and they take funding from the AI doomsday cult organisat…
Are you using Claude web chat or Claude Code? My experience with them is vastly different, even when they use the same underlying model. Claude Code isn’t perfect and gets stuff wrong, but in many cases it can run the project, check the output, realize its mistake and fix it. It doesn’t fix logic flaws, but it can fix hallucinated library methods that don’t exist.