The Bot Tax
Who's measuring the cost of our AI-assisted loops?
It’s 2026. A developer opens a ticket, types a prompt, and walks away to grab a coffee. An agent vibecodes the feature, opens a merge request, and pings a reviewer. The reviewer’s bot reads the diff, leaves a dozen suggestions, and waits. The author’s bot responds to the bot, rewrites half the patch, and pushes again. Somewhere in the background, CI spins up, a security bot chimes in, and an architecture bot asks whether we’ve considered the implications for the service boundary.
The merge request eventually lands. Everyone is happy. Nobody asks the obvious question: how much did this actually cost?
The invisible meter
Every step in that pipeline burns three things: time, tokens, and attention. Time is the easy one to talk about and the hardest one to measure, because most of it happens while humans are doing something else. Tokens have a price tag, but it’s spread across half a dozen vendor invoices and rarely attributed back to the change that triggered them. Attention is the one nobody logs at all.
We’ve built a system where the marginal cost of kicking off another round of review feels like zero. It isn’t. It’s just been moved off the human’s calendar and onto a GPU somewhere, which is exactly the kind of cost that doesn’t show up until the quarterly bill lands.
Three ways to ship the same change
Let’s imagine the same small feature, delivered three different ways.
1. The human-driven version
One engineer writes the code. Another engineer reviews it. They have a short conversation in the PR, maybe a call if it gets spicy. The change lands.
Total cost: two humans, a couple of hours each, maybe a day of elapsed time.
Artifacts produced: the code, a short review thread, a shared mental model of the change in two heads.
2. The pair-programming version
Two engineers sit together (or on a call) and write the code as one. Review is continuous. The change lands with no formal PR review, or a rubber-stamp one.
Total cost: two humans, roughly the same hours, but fully overlapping.
Artifacts produced: the code, two heads that deeply understand it, and a transfer of skill in both directions.
3. The bot-mediated version
An agent writes the code. A review bot reviews it. The author’s bot responds to the review bot. A human skims the final result and clicks merge.
Total cost: minimal human time, substantial machine time, a long trail of generated artifacts nobody will ever read again.
Artifacts produced: the code, a transcript of a conversation between machines, and zero heads that deeply understand what just shipped.
What we’re not measuring
The uncomfortable part isn’t that option 3 is slower or more expensive in raw terms. It might not be. The uncomfortable part is that we don’t know, because almost no one is measuring:
Wall-clock time from ticket to merge, across all three modes.
Total compute and token spend per merged change.
Rework rate: how often does a bot-mediated change come back as a bug or a follow-up?
Knowledge retention: six weeks later, does anyone on the team know how this code works?
Review quality: are the bot’s comments catching things humans would have caught, or are they noise that trains humans to rubber-stamp?
Without these numbers, every debate about AI-assisted development is vibes versus vibes.
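If you want to get past vibes, the first step is small: record the same few numbers for every merged change, tagged by delivery mode, and compare averages. A minimal sketch of what that record and comparison could look like, assuming hypothetical field names (`wall_clock_hours`, `token_spend_usd`, and so on) rather than any real tool’s schema:

```python
from dataclasses import dataclass

# Hypothetical per-change record; the fields are illustrative,
# not the schema of any existing cost-tracking product.
@dataclass
class ChangeCost:
    mode: str                 # "solo", "pair", or "bot"
    wall_clock_hours: float   # ticket-to-merge elapsed time
    token_spend_usd: float    # LLM/compute cost attributed to this change
    review_rounds: int        # back-and-forth cycles before merge
    rework_bugs: int = 0      # bugs later filed against this change

def cost_summary(changes: list[ChangeCost]) -> dict[str, dict[str, float]]:
    """Average each metric per delivery mode so the modes can be compared."""
    by_mode: dict[str, list[ChangeCost]] = {}
    for c in changes:
        by_mode.setdefault(c.mode, []).append(c)
    return {
        mode: {
            "avg_hours": sum(c.wall_clock_hours for c in cs) / len(cs),
            "avg_tokens_usd": sum(c.token_spend_usd for c in cs) / len(cs),
            "avg_rounds": sum(c.review_rounds for c in cs) / len(cs),
            "rework_rate": sum(c.rework_bugs > 0 for c in cs) / len(cs),
        }
        for mode, cs in by_mode.items()
    }
```

Nothing here is sophisticated, and that’s the point: the blocker isn’t analysis, it’s that nobody is writing the rows down.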
The back-and-forth problem
There’s a specific failure mode worth naming. When two bots talk to each other across a PR, the conversation can go on for a surprisingly long time. Each suggestion triggers a rewrite. Each rewrite triggers new suggestions. The diff churns. The humans, who were supposed to be the circuit breakers, have long since tuned out because “the bots are handling it.”
In a human review, social cost caps the loop. Nobody wants to be the reviewer who left seventeen nits, so they don’t. Bots have no such governor. They will happily optimize a function eight times in a row, each time producing a “better” version by some metric that nobody chose.
This is the bot tax: the cost of a loop that has no natural stopping condition.
What you can actually do
I don’t think the answer is “turn off the bots.” The answer is to treat them like any other expensive system: instrument them, attribute their costs, and put a human in the loop at the points that matter.
A few things worth trying:
Measure per-PR cost. Time, tokens, and review rounds. Make it visible on the PR itself.
Cap the loop. After N bot-to-bot rounds, a human has to step in. Not as a rubber stamp but as a decision-maker.
Compare modes. Pick a sample of changes and deliver them three ways (solo, pair, bot-mediated). Measure outcomes at one week, one month, and one quarter.
Track rework. A change that ships fast and comes back as three bugs was not cheap.
Protect pairing. It’s the mode most at risk of being quietly eliminated, and the one that produces the most durable outcome: humans who understand the system.
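The loop cap is the easiest of these to mechanize. A sketch of the circuit breaker, assuming a hypothetical webhook handler fires on every bot-to-bot reply; the `max_rounds` value and the method names are illustrative, not a feature of any real CI system:

```python
# Illustrative circuit breaker for bot-to-bot review loops.
# Assumed wiring: on_bot_comment() is called by some webhook each
# time a bot replies to another bot on the same PR.
class ReviewLoop:
    def __init__(self, max_rounds: int = 3):
        self.max_rounds = max_rounds
        self.bot_rounds = 0
        self.halted = False

    def on_bot_comment(self) -> str:
        """Return the next action for this bot-to-bot round."""
        if self.halted:
            return "blocked"            # no more bot replies until a human acts
        self.bot_rounds += 1
        if self.bot_rounds >= self.max_rounds:
            self.halted = True
            return "escalate_to_human"  # human decides: merge, rewrite, or drop
        return "allow"

    def human_override(self) -> None:
        """A human reviewed the state; reset the counter and reopen the loop."""
        self.bot_rounds = 0
        self.halted = False
```

The important design choice is that the human step is an escalation with a decision attached, not a notification. A ping that can be ignored is just another bot comment.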
Conclusion
The promise of AI-assisted development was that it would free humans from toil. The risk is that we’ve built a toil generator that runs on someone else’s bill and on our team’s understanding of its own code.
Before we declare the bot-mediated workflow a win, somebody has to do the math. Not the vendor. Not the bot. Us.

