Model Evaluation and Threat Research is an AI research charity that looks into the threat of AI agents! That sounds a bit AI doomsday cult, and they take funding from the AI doomsday cult organisat…
Reading the paper, AI did a lot better than I would expect. It showed experienced devs working on a familiar code base got 19% slower.
It’s telling that they thought they had been more productive, but the result was not that bad tbh.
I wish we had similar research for experienced devs on unfamiliar code bases, or for inexperienced devs, but those would probably be much harder to measure.
I don’t understand your point. How is it good that the developers thought they were faster? Does that imply anything at all in LLMs’ favour? IMO that makes the situation worse because we’re not only fighting inefficiency, but delusion.
20% slower is substantial. Imagine the effect on the economy if 20% of all output was discarded (or more accurately, spent using electricity).
1% slowdown is pretty bad. You’d still do better just not using it. 19% is huge!
Yes, it suggests lower cognitive load.
I’m not saying it’s good, I’m saying I expected it to be even worse.