Develop with AI: Balancing speed and confidence without becoming a bottleneck

9 min read

Prasenjit Sarkar

Solutions Marketing Manager

Viktor Vorona

Staff Fullstack Developer

A few months ago, my team finished an experiment: build a complex production application entirely with AI coding assistance. The app shipped. It works. But the lessons from getting there changed how I think about software development more than any tool I've used in the last decade.

This is what we learned.

From novelty to necessity

My relationship with AI coding agents has evolved in roughly three phases, and I suspect many engineers have lived through the same arc.

In 2024, ChatGPT was a curiosity. Great for explaining concepts, drafting emails, occasionally rubber-ducking a problem. As an actual coding tool? It felt like a toy.

In early 2025, the wave of IDE-integrated assistants arrived: Cursor, Windsurf, and the new Copilot. These were interesting: better autocomplete (just keep pressing Tab) and better in-context suggestions. But fundamentally, the way we developed software hadn't changed. We were still writing code, just slightly faster.

The second half of 2025 was when something genuinely shifted. Claude Code arrived. And after a few weeks of using it seriously, the realization landed: this is real. We are in a new era.

Today, in 2026, I'd ask you the same question that I keep asking myself: what is the most important tool in software development?

A year ago, the answer was obvious—the IDE. That's where the code was produced. Today, for me, the IDE has become a heavy text viewer with Git integration. I manually change maybe 1% of the code that ships. The actual software development has moved somewhere else.

How does AI-assisted development create a code review bottleneck?

Speed creates its own problem, and the whole industry is starting to feel it.

Gartner put it cleanly: "AI-assisted development has created a critical review bottleneck as engineering teams generate dramatically more new code without increasing code review capacity."

The math is brutal. The AI writes code in minutes. Reviewing it well takes hours. And so engineers are left staring at what looks like a false choice:

  • Review everything, and you become the bottleneck. The speed advantage you bought with AI? Gone—burned in your PR queue.
  • Review nothing, and you ship fast, but bugs accumulate, vulnerabilities slip through, tech debt piles up, and confidence in the codebase quietly collapses.

It is a lose-lose situation, so we tried something different.

What we were actually building

The application we were building is a desktop app that complements CLI agents like Claude Code—bringing UI clarity to their power and adding tools to turn casual users into power users. Think of it as an IDE flipped on its head: the focus is reviewing AI-generated code instead of writing code yourself.

This is not a small app. It's an Electron application with React, multiple backend services, a built-in terminal, and embedded SonarQube for IDE in Connected Mode, and it has to work on Windows, macOS, and Linux. This was real production complexity.

We shipped extremely fast. Features that would have taken days landed in hours. The speed was exhilarating.

But the questions wouldn't go away.

Should we even review the code if the app is working? If yes—how deeply? If not—where does the confidence come from that this won't fail in production?

And then, gradually, the speed itself started to wobble. The more complex the app became, the harder it was to ship without breaking something else. More iterations to land a feature correctly. More time debugging code the AI had written the day before.

That's when the fundamental thing clicked.

Every session is a new software developer

Every session, the AI is a different “person”. It sees your codebase for the first time. It doesn't remember what it did last time. It doesn't remember why something was built a certain way. It has no scar tissue, no shared context, no muscle memory for your patterns.

When you onboard a new engineer to a team, you accept that they'll need weeks—sometimes months—to internalize the codebase. With AI, you're onboarding a new engineer every single session. And you can't sit with them at lunch and explain the history of that one weird module.

Once we saw the problem in those terms, we stopped. We did a big refactor. We restructured. We rethought the architecture from the ground up.

And through that process, the real question crystallized: how do we bring consistency to code produced by a different person every single time?

The answer is not better prompts

This part surprised me.

The answer isn't a polished CLAUDE.md. It isn't a perfect skills file. It isn't a smarter system prompt. Those things help—they're table stakes—but they aren't where the leverage is.

The leverage is in the codebase itself.

If you think of the AI as a very talented software developer encountering your code for the first time every session—and that is literally what's happening—then you start to understand how critical the codebase becomes as a teaching artifact. It's not just code. It's documentation. It's the only context the AI reliably has.

A human teammate adapts to your team's quirks over time. The AI doesn't get that runway. Every session starts from zero. So your codebase has to do the work that, with humans, gets done by tribal knowledge.

Three rules that actually worked

Here's what made the biggest difference for us.

1. A readable architecture and file structure

Before AI, you structured files so you and your teammates could find things. You had internal conventions, and people learned them. Now, the AI starts from scratch every time. Your structure has to be as readable as a book: obvious enough that someone with no context can navigate it cold.

2. Strict size limits

We enforced a hard rule: no file over 500 lines. This came directly from a SonarQube rule that we'd been ignoring on other projects for years. We stopped negotiating with it. We stopped saying "600 isn't that much worse than 500."

The difference in code quality was immediately noticeable. Smaller files mean smaller blast radius for changes, more focused context windows, and clearer responsibilities. The AI gets the file, understands the file, edits the file—instead of getting lost in a thousand-line god class.
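As a rough illustration of the size limit, here is a minimal TypeScript sketch of the check itself. It is not how SonarQube implements its rule: the names `countLines` and `findOversizedFiles` and the in-memory `{ path, source }` input shape are assumptions made for this example, and a real setup would read files from disk or simply rely on the SonarQube rule.

```typescript
// Hypothetical helper: names and input shape are illustrative assumptions.
const MAX_LINES = 500; // the limit quoted in the article

interface SourceFile {
  path: string;
  source: string;
}

interface OversizedFile {
  path: string;
  lines: number;
}

// Count lines in a file's text; a trailing newline does not add an extra line.
function countLines(source: string): number {
  if (source.length === 0) return 0;
  const parts = source.split("\n");
  return source.charAt(source.length - 1) === "\n" ? parts.length - 1 : parts.length;
}

// Return every file whose line count exceeds the limit, largest first.
function findOversizedFiles(files: SourceFile[], maxLines: number = MAX_LINES): OversizedFile[] {
  return files
    .map((file) => ({ path: file.path, lines: countLines(file.source) }))
    .filter((file) => file.lines > maxLines)
    .sort((a, b) => b.lines - a.lines);
}
```

Running a check like this in CI, and failing the build when the result is non-empty, is one way to stop negotiating with the limit.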

3. Absolute consistency

If one component is built one way and another is built a different way, the AI will struggle to identify the right pattern. One of our concrete pain points: we used react-query in some places and raw fetch calls in others. When your patterns are inconsistent, the AI picks one at random or, worse, copies the worst one. When patterns are consistent, the AI follows them naturally.
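One way to keep a pattern like this from drifting is a small automated check. The sketch below is a hypothetical example, not something from our project: it flags raw `fetch(` calls outside an allow-list of files, on the assumption that everything else should go through react-query. The function name, the input shape, and the allow-list are all made up for illustration, and the regular expression is a crude textual heuristic rather than a real parser.

```typescript
// Hypothetical consistency check: names, input shape, and allow-list are assumptions.
interface SourceFile {
  path: string;
  source: string;
}

interface RawFetchHit {
  path: string;
  line: number; // 1-based line number of the match
}

// Matches `fetch(` as a standalone word. It does not match `refetch(` or
// `fetchQuery(`, but it can be fooled by comments and string literals.
const RAW_FETCH = /\bfetch\s*\(/;

function findRawFetchCalls(files: SourceFile[], allowedPaths: string[] = []): RawFetchHit[] {
  const hits: RawFetchHit[] = [];
  for (const file of files) {
    // Skip files that are allowed to call fetch directly, e.g. a single API client module.
    if (allowedPaths.indexOf(file.path) !== -1) continue;
    file.source.split("\n").forEach((text, index) => {
      if (RAW_FETCH.test(text)) hits.push({ path: file.path, line: index + 1 });
    });
  }
  return hits;
}
```

A lint rule would be more robust than a regular expression, but even a check this simple makes the intended pattern explicit for a reader who has no prior context.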

We built our application from scratch and still had to refactor mid-flight. If you have a large existing codebase, the good news is that refactoring with AI is cheaper than it's ever been. Pick a part. Clean it up. Use it as a reference for everything that follows. See if it moves the needle. Adapt and refactor more.

And to be clear, we're not talking about deep optimizations or grand architectural shifts. We're talking about file structure, file names, and extracting functions. The basics. The boring things. The things that, it turns out, matter enormously.

What is the right way to review code when AI is writing most of it?

Once the architecture is in shape—and this is an ongoing practice, not a one-time event—the next question is how to keep it that way. Code review is the obvious answer. But what does code review look like when most of the code is being written by an AI?

Pull up any PR. What's the most important piece of information on that screen?

A year ago, I'd have said the diff itself. The actual code. I read every line.

Today, it's the list of changed files.

I know my architecture. I know what I asked the AI to do. So when I look at the result, the file list tells me almost everything I need to know:

  • If the AI changed files I didn't expect, I need to understand why.
  • If my request was simple but the AI changed 100 lines, that's suspicious. Something went sideways.
  • If a file that was 200 lines is now 1,000, the structure is decaying, not improving. Something needs to be split out.
  • If it touched something sensitive—authentication, payments, data access, state management, whatever matters in your domain—that's where I slow down and review carefully. Not everything. Just the parts that matter.

I don't need to read every line to catch these problems. The shape of the change tells me whether the AI understood the intent and whether the architecture is still clean after the change landed.

At some point, you need a consistent way to verify whether the change is safe and aligned with your standards.
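The file-list checks described above can be sketched as a small triage function. This is a minimal sketch under stated assumptions, not a tool we shipped: the `ChangedFile` shape, the flag names, and every threshold (`maxGrowthFactor`, `expectedMaxChangedLines`) are hypothetical, and the sensitive path prefixes would be whatever matters in your domain.

```typescript
// Hypothetical types and thresholds: illustrative assumptions only.
interface ChangedFile {
  path: string;
  linesBefore: number;  // file length before the change
  linesAfter: number;   // file length after the change
  changedLines: number; // lines added plus lines removed
}

interface TriageOptions {
  expectedPaths: string[];         // files you expected the request to touch
  sensitivePrefixes: string[];     // e.g. "src/auth/", chosen per project
  maxLines: number;                // file size limit, 500 in the article
  maxGrowthFactor: number;         // assumed threshold for "the structure is decaying"
  expectedMaxChangedLines: number; // assumed budget for the size of the request
}

type Flag = "unexpected-file" | "oversized" | "rapid-growth" | "sensitive";

interface TriageReport {
  perFile: Record<string, Flag[]>; // only files with at least one flag
  largerThanExpected: boolean;     // total change exceeds the expected budget
}

function triageChange(files: ChangedFile[], options: TriageOptions): TriageReport {
  const perFile: Record<string, Flag[]> = {};
  let totalChanged = 0;
  for (const file of files) {
    totalChanged += file.changedLines;
    const flags: Flag[] = [];
    if (options.expectedPaths.indexOf(file.path) === -1) flags.push("unexpected-file");
    if (file.linesAfter > options.maxLines) flags.push("oversized");
    if (file.linesBefore > 0 && file.linesAfter >= file.linesBefore * options.maxGrowthFactor) {
      flags.push("rapid-growth");
    }
    if (options.sensitivePrefixes.some((prefix) => file.path.indexOf(prefix) === 0)) {
      flags.push("sensitive");
    }
    if (flags.length > 0) perFile[file.path] = flags;
  }
  return { perFile, largerThanExpected: totalChanged > options.expectedMaxChangedLines };
}
```

Files that come back with no flags are candidates for a lighter review; anything flagged `sensitive` is where a careful line-by-line read still belongs.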

The verification layer

For everything else—duplication, missing coverage, code smells, the long tail of small problems—I delegate to SonarQube as my verification layer.

Will it catch everything? No.

Will it catch enough that I'm confident shipping? Yes.

To be clear, this doesn't mean I never read the code. The code is still my responsibility. But it's also my responsibility to know what's sensitive and needs a closer look, and what's good enough if SonarQube says it meets quality standards. The job of the reviewer has shifted from line-by-line scrutiny to triage—knowing where to spend attention and where to trust the safety net.

How do speed and code quality work together when using AI?

When this clicks, you get a feedback loop that compounds in your favor.

Clean architecture guides the AI to follow patterns and write better code. SonarQube catches what slips through. You review decisions, not details. You ship with confidence at speed. And because the codebase remains healthy with every change, the AI continues writing good code.

What scaled for us wasn’t just generating faster. It was creating a tighter loop: guide the AI with a cleaner codebase, verify what changed, and fix issues before they compound.

Speed and confidence stop being in tension. With the right setup, they live together.

Three things to think about starting Monday

If you take nothing else from this, take these three.

1. Look at your code differently. What if you saw your codebase for the first time, every single day? How easy is it to navigate? How obvious is it where each piece of the application belongs? If the answer isn't "very," your AI is paying that tax on every change.

2. The LLM is not your junior developer. Stop thinking of it that way. It's a new, very talented person every time you start a session. It doesn't remember yesterday. It doesn't carry over context. Your codebase has to teach it.

3. Code quality is not optional. You might be tempted to think the AI will figure it out anyway. From my experience, that's wishful thinking. The quality of AI-generated code depends directly on the quality of your existing codebase—and on every single change applied to it. Better quality in, better results out.

The bottleneck is real. But it's not a choice between speed and confidence. It's a choice about where you invest your engineering effort: in writing more code, or in shaping a codebase that lets AI write good code on your behalf.

I know which one I'm betting on.
