Model Evaluation and Threat Research is an AI research charity that looks into the threat of AI agents! That sounds a bit AI doomsday cult, and they take funding from the AI doomsday cult organisat…
Same experience here: performance is mediocre at best on an established code base. Recall tends to drop sharply as the context expands, leading to a lot of errors.
I’ve found coding agents to be great at bootstrapping projects on popular stacks, but once a project reaches a certain size it’s better to either have the agent work on isolated files, or code manually and rely on autocomplete.
So far I’ve only found it useful for describing bite-sized tasks to get suggestions on which functions from the library/API I’m using would help, and only when those functions have documentation available on the Internet.