Just want to clarify, this is not my Substack, I’m just sharing this because I found it insightful.

The author describes himself as a “fractional CTO”(no clue what that means, don’t ask me) and advisor. His clients asked him how they could leverage AI. He decided to experience it for himself. From the author(emphasis mine):

I forced myself to use Claude Code exclusively to build a product. Three months. Not a single line of code written by me. I wanted to experience what my clients were considering—100% AI adoption. I needed to know firsthand why that 95% failure rate exists.

I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.

Now when clients ask me about AI adoption, I can tell them exactly what 100% looks like: it looks like failure. Not immediate failure—that’s the trap. Initial metrics look great. You ship faster. You feel productive. Then three months later, you realize nobody actually understands what you’ve built.

  • jj4211@lemmy.world
    link
    fedilink
    English
    arrow-up
    16
    ·
    15 hours ago

    An LLM can generate code like an intern getting ahead of their skis. If you let it generate enough code, it will do some gnarly stuff.

    Another facet is the nature of mistakes it makes. After years of reviewing human code, I have this tendency to take some things for granted, certain sorts of things a human would just obviously get right and I tend not to think about it. AI mistakes are frequently in areas my brain has learned to gloss over and take on faith that the developer probably didn’t screw that part up.

    AI generally generates the same sorts of code that I hate to encounter when humans write, and debugging it is a slog. Lots of repeated code, not well factored. You would assume of the same exact thing is fine in many places, you’d have a common function with common behavior, but no, AI repeated itself and didn’t always get consistent behavior out of identical requirements.

    His statement is perhaps an over simplification, but I get it. Fixing code like that is sometimes more trouble than just doing it yourself from the onset.

    Now I can see the value in generating code in digestible pieces, discarding when the LLM gets oddly verbose for simple function, or when it gets it wrong, or if you can tell by looking you’d hate to debug that code. But the code generation can just be a huge mess and if you did a large project exclusively through prompting, I could see the end result being just a hopeless mess.v frankly surprised he could even declare an initial “success”, but it was probably “tutorial ware” which would be ripe fodder for the code generators.