
AI: Amazing Helper, Terrible Leader
Why Complex Software Still Needs Humans
A few weeks ago, one of our customers generated an entire codebase with AI. Frontend, backend, smart contracts. The whole thing. Took him less than a day. He sent it over and said, "You told me this would take months."
I understood where he was coming from. What he built looked real. The UI was polished, the folder structure was clean, everything seemed to be wired up properly. We opened it and tried to run it, but nothing worked. The frontend called endpoints the backend didn't have. The contracts had logic that would have lost real money if they'd ever touched mainnet. The data models across the stack contradicted each other on almost every screen. For him, the project was 90% done. For us, it hadn't started yet.
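To make that mismatch concrete, here's a toy sketch of the kind of contract drift we kept finding. The endpoint names are hypothetical, not the customer's actual routes; the point is that the generated frontend depends on routes the generated backend never defines, and nothing catches it until runtime.

```python
# Endpoints the generated backend actually exposes
backend_routes = {"/api/users", "/api/orders"}

# Endpoints the generated frontend tries to call
frontend_calls = {"/api/users", "/api/orders/history", "/api/wallet/balance"}

# Routes the frontend depends on that don't exist server-side
missing = sorted(frontend_calls - backend_routes)
print(missing)  # ['/api/orders/history', '/api/wallet/balance']
```

Each module looks fine in isolation; the break only shows up when you check them against each other, which is exactly the step the one-day build skipped.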
I don't bring this up to embarrass the guy. He's sharp, and he was using the tools available to him. But this exact scenario keeps playing out everywhere right now, and I think we need to talk about it more honestly than most people are.
AI is great, but that's not the issue
I should get this out of the way: I'm not an AI skeptic. I use it constantly. My whole team does. It's genuinely changed how we work. The boring stuff, the repetitive stuff, the "I need a utility function that does X and I don't want to write it for the 400th time" stuff. AI eats that for breakfast. It's faster than any junior dev, doesn't get tired, and on a good day, surprises you with solutions you wouldn't have reached on your own.
But there's this thing that happens when you work with AI long enough. You start to notice where it falls apart, and it's always in the same place. It can execute tasks, sometimes beautifully. What it cannot do is decide which tasks matter.
I don't have a clean way to articulate this, so I'll just say it plainly: AI has no gut. It doesn't know what it doesn't know, and it can't feel when something is off. I've been building software for over a decade, and half the important decisions I've made in that time came down to a feeling. A sense that the architecture would bite us later. An instinct that the product manager's feature request, while perfectly logical, would wreck the user experience for the 80% of people who'd never use it. The ability to sit in a room and think "we're building the wrong thing" even when every document and every metric says otherwise.
You can't prompt for that. I've tried.
The expectation gap is real
There's a meme going around:
Who are we? CEOs. What do we want? AI. AI to do what? We don't know. When do we want it? Right now.

It's a joke, but it tracks with what I'm actually seeing. Non-technical stakeholders watch AI demo videos where someone builds a full app in 20 minutes, and the takeaway is that engineering timelines are inflated, that those months of work are somehow optional now, and that "just use AI" is a legitimate project plan. I get it. The demos are genuinely compelling. If I hadn't spent years building and shipping products, I'd probably believe them too.
What the demos don't show you is the debugging. The integration work. The moment when two AI-generated modules meet each other and immediately disagree about how data should flow. The security review. The part where you discover the AI used a dependency that was deprecated two years ago, or hallucinated an API method that doesn't exist in any version of the library.
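The hallucinated-method failure mode is easy to reproduce. One classic pattern (my own illustration, not something from the customer's repo) is an assistant applying JavaScript's `JSON.parse` idiom to Python's standard library, which has no such method:

```python
import json

# A common hallucination pattern: the JavaScript API applied to Python's stdlib.
# JSON.parse exists in JS; Python's json module has loads() instead.
try:
    data = json.parse('{"ok": true}')  # AttributeError: no such attribute
except AttributeError:
    data = json.loads('{"ok": true}')  # the call that actually exists
print(data)  # {'ok': True}
```

The generated code reads plausibly, type-checks nowhere, and fails only when someone actually runs it, which is the part the demo videos cut.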
AI is great at the low-level work, but for high-level decisions, you still need someone who actually knows what they're doing. That sounds obvious when you write it down, but based on the conversations I keep having, it isn't obvious to many people making expensive decisions right now.
What does "done" mean?
This is maybe the clearest example of what I'm talking about.
You've probably seen it online: someone uses AI to build a chat app in 20 minutes and announces they've just replaced Slack or Discord. And the prototype is usually impressive! Messages work, the UI looks clean, and you can actually send and receive in real time. It's cool. No question.
But here's what that person doesn't know yet. They don't know what a distributed system is. They don't know what database replication means, or how WebSocket connections behave when you go from 2 users to 50,000. They've never dealt with message ordering across time zones, or presence detection at scale, or search indexing across billions of messages, or what happens to your file storage when ten thousand people upload screenshots in the same hour. These aren't obscure concerns. Slack has engineers making $300k+ who've spent a decade on exactly these problems, and they're still working on them.
An app running on localhost with two browser tabs open is not the same as a product. It's a prototype, and prototypes are great, but the prototype is maybe half a percent of what makes the real thing work. The other 99.5% is infrastructure, reliability, edge cases, compliance, and years of iteration on problems that only surface when real people start using it at scale. Calling that a finished product is like pouring a foundation and saying the skyscraper is basically done.
And the part that concerns me most isn't the prototype itself. It's what happens after. The confidence that follows. "It's not perfect but AI one-shotted it, just need to adjust a few things and deploy." The "few things" you need to adjust are the product. The whole product. And when someone starts building a business on top of an AI-generated codebase they don't fully understand, three months in they need a new feature and the code fights them on every change. Six months in, someone finally does a proper review and discovers duplicated logic everywhere, inconsistent patterns across files, and security practices that wouldn't survive an audit. The rewrite that follows costs more than building it right would have in the first place, because now you're not building. You're untangling.
That's what happened with our customer's repo. The code wasn't garbage. Some of it was actually decent. But it was a first draft in costume as a final product, and the costume fooled everyone who wasn't an engineer.
The Block layoffs
Somewhat related: Block recently announced layoffs, and the stock went up. This is a signal worth paying attention to. In the middle of the biggest AI hype cycle any of us have lived through, a company cut headcount and investors rewarded them.
I think what's starting to happen is that markets are separating companies that use AI to do specific things more efficiently from companies that have adopted "AI" as a vibe. There are a lot of companies right now that have announced AI, hired for AI, invested in AI, but haven't figured out how any of it actually improves their economics. The announcement came before the business case. And that gap between the press release and the bottom line is where the pain will show up.
AI is a lever, not a strategy. You need to know exactly where to place it. Hiring fifty people to "do AI" without knowing what that means doesn't add value, it adds cost. And hype has a way of scaling your expenses well before it scales your results.
AI is an amplifier
This is where I think most of the discourse goes sideways.
People keep lumping together two very different situations: a non-technical person generating code with AI and assuming it's ready to ship, versus a skilled engineer using AI to work dramatically faster while understanding every line of what comes out. These are not the same thing. The tool is the same. The outcomes are completely different.
AI is an amplifier. In the hands of someone who knows what they're building, it's transformative. They ship faster, they stay focused on the parts that actually require thought, and they catch the AI's mistakes because they understand the system well enough to spot what's wrong. In the hands of someone who's never shipped production software, it just lets them produce more code, faster, with no way to evaluate whether what came out is any good.
The developers on our team who are most effective with AI barely write code by hand anymore. But they're not accepting whatever the AI gives them. They read every line. They push back when the approach is wrong. They know when the AI has taken a shortcut that'll cause problems later. They've shifted from typing to thinking, and that's actually the most exciting thing about AI as a tool: it strips away the mechanical work and leaves you with the interesting work. But that only functions if you have people who can do the interesting work.
Small scope, real results
So what actually works? Honestly, the same thing that has always worked in engineering: small, focused pieces of work with clear boundaries.
We've had this in our contributing guidelines for years, well before AI was part of anyone's workflow: small pull requests, three to four hours of focused work. Break big problems into small independent changes. Ship incrementally. Review everything. These aren't arbitrary rules. They exist because small scope is easier to review, test, and fix when something breaks. Which it will.
It turns out this is also exactly the right way to use AI. You give it a contained task: write this function, refactor this component, generate a migration for this schema change. It does its thing; you review the output, check whether it makes sense in the larger context, and ship it. Then you do the next piece. The feedback loop is tight, the human stays in control, and the quality holds.
The problems show up when people skip this entirely. When someone dumps a giant vague prompt into a coding assistant and expects a finished product to come out the other end. Something will come out. It'll look like it works. And then the bugs compound, the inconsistencies multiply, and by the time you notice the real scope of the problems, you're not debugging. You're starting over.
Where this goes
AI is getting better fast. The models improve in ways that surprise even the people building them. I don't doubt the trajectory, and I'm not trying to argue that things will stay the same.
But I keep coming back to this: building software was never a typing problem. It's a thinking problem. It's a people problem. You have to figure out what users actually need, which half the time is not what they asked for. And you have to make calls where the elegant solution and the right solution aren't the same thing, and nobody can tell you which is which until it's too late.
I think the teams that do well over the next few years will be the ones that use AI aggressively for the parts of development that were always tedious and mechanical, while keeping experienced people firmly in charge of the decisions that require judgment and context. The teams that struggle will be the ones that mistake AI's speed for understanding.
That customer who sent us his AI-generated repo wasn't wrong to use AI. The code wasn't a failure. It was a legitimate first draft. The problem was that everyone involved thought it was the final one. And I think that's the mistake a lot of us are still making, in different ways, at different scales. The sooner we get clear on where the head start ends and the real work begins, the sooner AI becomes what everyone already believes it is.
