AI Coding Is Creating a Code Review Crisis

AI is fundamentally changing how software developers work. We are rapidly moving from a world where engineers primarily write code to one where they primarily review it. I’m increasingly interested in what happens next.

Role Reversal

In traditional software development, engineers write code and (at least in mature organizations) that code is subsequently reviewed by other engineers who are ideally domain experts, or at least passingly familiar with the task at hand.

In AI-assisted development, large blocks of code are instantly generated by LLMs. Entire features, modules, and refactors appear in seconds. Big changes that used to take days can now be conjured into being by a single prompt.

Although we can argue that the “author” is the engineer who wrote the prompts, in practice that engineer’s role increasingly resembles that of the secondary reviewer in traditional development. Namely, they must:

  • Validate

  • Sanity-check

  • Approve

  • Merge

(At least ideally that’s what they do, but more on that in a second.)

On the surface, all of this sounds efficient. The tedious act of typing in code has been optimized away and replaced with the responsibility of reading – reviewing – the code.

Unfortunately, reviewing is a different cognitive task from authoring. And very few software developers have been trained or learned to do it well.

Verification Debt

A recent article in IT Pro noted that even though developers don’t fully trust AI-generated code, fewer than half rigorously review it before committing. If you think that sounds risky, you’re obviously right. If the volume of code output increases dramatically but code review rigor doesn’t increase in proportion, your teams will accumulate verification debt.

Verification debt is worse than messy code. Developers (or even an AI) can easily identify messy code, provided they bother to look at it. But verification debt erodes your confidence that the system behaves as intended.

If you follow this through to its logical conclusion, the current state of AI-assisted software development turbocharges the rate at which you lose trust in the state of your system.

Research Warns Us

I’m looking forward to seeing what I’m sure will be a flood of research into the quality of AI-generated code. But old-school research already gives us a lot of insight into what to expect.

Software engineering researchers have studied code review effectiveness for a long time. And the conclusions are very consistent:

  • Smaller changes are easier to review.

  • Large diffs reduce defect detection rates.

  • Reviewer expertise and code familiarity matter a lot.

  • Cognitive load limits how much a reviewer can effectively evaluate, particularly in a single sitting.

A lot of this we have known since the 1970s. None of it is controversial. Yet as an industry, we’re plowing forward directly against it.

AI Workflows Increase Cognitive Load

Agent-based and “vibe coding” approaches often produce:

  • Huge multi-file patches.

  • Refactors bundled with new functionality.

  • Architectural changes hidden among verbose generated code.

  • Code that looks clean on a micro-level but masks hidden complexity.

These days, instead of reviewing five small, intentional changes, software developers are handed a huge AI-generated artifact and asked to determine whether it’s correct. Or, if the survey IT Pro cited is accurate, to largely abdicate that responsibility.

This flies in the face of what review research says leads to software quality.

In just a couple of years, the AI coding ecosystem has produced workflows that increase output while degrading the conditions under which humans are effective reviewers. This is a huge systems failure, and the incentive structures around AI-assisted development often seem designed to reward it.

Most Developers and Organizations Aren’t Great at Review in the First Place

Anyone who has worked at Google and subsequently worked at not-Google can tell you that review discipline varies significantly among organizations. At Google, for example, review culture is formalized and heavily reinforced:

  • Small changes are encouraged (and, really, socially enforced).

  • Review is mandatory.

  • Tooling reinforces the process.

  • Doing reviews is a core responsibility for all engineers.

Most organizations don’t operate that way, so most software developers today are more familiar with environments where review is:

  • A time-pressured approval.

  • A quick scan during a coffee break.

  • A rubber stamp on somebody else’s change.

So when you combine:

  • A weak review culture

  • Developers inexperienced with the review discipline

  • AI that creates

    • Dramatically more code volume

    • Bigger diffs

    • Overconfidence among many engineers

The result is a very predictable failure mode.

This Is an Executive Management Problem

Recently I’ve seen more software developers on the cutting edge of AI, and especially agent-driven development, lamenting cognitive overload, stress, and burnout. This isn’t simply a developer complaint. It’s a management and governance issue.

Really, it’s simple math. If AI doubles code output while review capacity remains flat (or worse, shrinks because perceived productivity gains lead to layoffs), the percentage of code receiving deep scrutiny inevitably declines.
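
To make that math concrete, here’s a back-of-envelope sketch. Every number in it is an illustrative assumption, not a measurement:

```python
# Back-of-envelope model of deep-review coverage.
# All quantities below are illustrative assumptions.

deep_review_capacity = 5_000   # lines/week the team can rigorously review
output_before_ai = 5_000       # lines/week authored before AI assistance
output_with_ai = 10_000        # lines/week once AI doubles output

coverage_before = min(1.0, deep_review_capacity / output_before_ai)  # 100%
coverage_after = min(1.0, deep_review_capacity / output_with_ai)     # 50%
debt_per_week = max(0, output_with_ai - deep_review_capacity)        # 5,000

print(f"Deep-review coverage: {coverage_before:.0%} -> {coverage_after:.0%}")
print(f"Unverified lines accruing per week: {debt_per_week:,}")
```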

This tradeoff won’t show up in sprint velocity metrics. It will show up downstream as:

  • Security incidents

  • Regulatory exposure

  • Failed audits

  • Subtle bugs in critical systems

  • Architectural chaos

This is a classic example of taking a short-term win at the cost of long-term pain.

Breaking the Cycle

Organizations that treat AI-assisted coding seriously and intentionally will:

  1. Enforce diff size limits, including for AI output. Large AI-generated patches should be broken up into atomically reviewable units (a minimal sketch of such a gate follows this list).

  2. Separate experimentation from production. Merges into production should remain incremental and independently verified.

  3. Invest in review capability. It’s time to take review training seriously. Academic curricula should include it as a first-class subject. In the meantime, organizations should bring in experienced practitioners to establish review standards and train teams in disciplined evaluation.

  4. Track defect escapes, and see whether AI-generated code shows up as a root cause. If so, that’s a clear signal.
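
To illustrate point 1, here’s a minimal sketch of a pre-merge diff-size gate. It isn’t a drop-in tool: the 400-line threshold, the base branch, and counting added plus deleted lines are all assumptions a team would tune.

```python
#!/usr/bin/env python3
"""Minimal sketch of a pre-merge diff-size gate (illustrative only)."""

import subprocess
import sys

MAX_CHANGED_LINES = 400      # assumed reviewability budget per patch
BASE_BRANCH = "origin/main"  # assumed merge target

def changed_lines(base: str) -> int:
    """Sum added + deleted lines between the merge base and HEAD."""
    # `git diff --numstat` prints "added<TAB>deleted<TAB>path" per file.
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added != "-":  # binary files report "-" counts; skip them
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    n = changed_lines(BASE_BRANCH)
    if n > MAX_CHANGED_LINES:
        print(f"Patch touches {n} lines (limit {MAX_CHANGED_LINES}). "
              "Split it into independently reviewable units.")
        sys.exit(1)
    print(f"Patch size OK: {n} lines.")
```

Wired into CI, a gate like this nudges both humans and coding agents toward the small, independently reviewable changes the research favors.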

AI doesn’t reduce the need for engineering discipline. It increases it.

Organizations that recognize this will compound their AI-driven advantage. Those that don’t will accumulate verification debt until the system breaks down.

Let’s Connect

If your organization is adopting AI-assisted development, review discipline can’t be an afterthought. I work with engineering leaders to strengthen review processes before verification debt becomes a risk event. If this resonates, let’s talk!
