There’s a general trend of advice telling engineers to “carefully review” agent output. This advice is useless in the same way post-incident “be more careful” mandates are useless. Not because review isn’t important, but because it misunderstands the fundamental trade-off at play.
Lorin Hochstein frames this perfectly in his article on the ETTO principle, which looks at trade-offs through the lens of resilience engineering:
One of the big ideas from resilience engineering is the efficiency-thoroughness trade-off… The idea is that there’s a fundamental trade-off between how quickly we can complete tasks, and how thorough we can be when working on each individual task.
This isn’t new with AI. Software engineers have always navigated this trade-off. What changed is how much work agents can produce relative to the cognitive load required to review it. Agents shift work from writing to reviewing, and reviewing is cognitively expensive:
As any human software engineer will tell you, reviewing code is hard. It takes effort to understand code that you didn’t write. And larger changes are harder to review, which means that the more work that the agent does, the more work the human in the loop has to do to verify it.
The faster the agent works, the more review burden you carry. But the incentives push toward speed; speed is, after all, why everyone in software is jumping on the AI bandwagon. Amazon figured this out years ago with Jeff Bezos’s principle: “Good intentions don’t work, mechanisms do.” Their culture explicitly rejects the post-incident playbook:
No company can rely on good intentions like ‘We must try harder!’ or ‘Next time remember to…’ to improve a process, solve a problem, or fix a mistake.
This is why Amazon writes an ungodly number of COEs (Correction of Error reports): to understand both how systems (or mechanisms) are going wrong and which actions will improve them. I’ve watched many services across AWS go from flaky to foundational, the kind of services that some of the world’s most critical infrastructure relies on. The same logic applies to AI agent accountability. “Carefully review the output” is the 2025 version of “We must try harder!”, right alongside “use AI responsibly,” “validate model outputs,” and “keep a human in the loop”. These phrases all assume the problem is attention rather than structure.
Telling people to “be thorough” doesn’t change the incentive structure or reduce the cognitive load of verification. What does change it is infrastructure. Simon Willison’s excellent vibe engineering post catalogs what actually makes thoroughness scalable:
It’s also become clear to me that LLMs actively reward existing top tier software engineering practices… If your project has a robust, comprehensive and stable test suite agentic coding tools can fly with it. Without tests? Your agent might claim something works without having actually tested it at all.
Tests shift verification from manual review to automated checks. The same goes for good version control habits, comprehensive documentation, and continuous integration. These practices don’t eliminate the ETTO trade-off; they change its parameters. You get to move faster and catch more problems because the infrastructure does some of the thoroughness work. The architect/implementer pattern some engineers are using follows the same logic: one agent designs, another implements, the first reviews. You’ve artificially created checkpoints that prevent the natural slide toward pure efficiency. It’s more overhead than letting one agent run end-to-end, but it catches problems before they compound.
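To make the checkpoint idea concrete, here’s a minimal sketch of that architect/implementer loop. It assumes a hypothetical `call_agent(role, prompt)` function standing in for whatever LLM client you use; none of this comes from Hochstein’s or Willison’s posts. The point is that the review gate lives in the control flow rather than in a reminder:

```python
# A minimal sketch of the architect/implementer pattern, not a real framework.
# `call_agent` is a hypothetical stand-in for whatever LLM client you use.

from dataclasses import dataclass
from typing import Callable

# (role, prompt) -> agent response text
AgentFn = Callable[[str, str], str]


@dataclass
class ReviewResult:
    approved: bool
    notes: str


def review_against_design(call_agent: AgentFn, design: str, change: str) -> ReviewResult:
    """Ask the architect agent to review the change against its own design."""
    verdict = call_agent(
        "architect",
        "Review this change against the design. Reply APPROVED or list the problems.\n"
        f"Design:\n{design}\n\nChange:\n{change}",
    )
    return ReviewResult(approved=verdict.strip().upper().startswith("APPROVED"), notes=verdict)


def run_task(call_agent: AgentFn, task: str, max_revisions: int = 3) -> str:
    # Checkpoint 1: the architect produces a plan before any code exists.
    design = call_agent("architect", f"Write an implementation plan for: {task}")

    # Checkpoint 2: the implementer works only from that plan.
    change = call_agent("implementer", f"Implement this plan:\n{design}")

    # Checkpoint 3: nothing ships until the architect approves, within a budget.
    for _ in range(max_revisions):
        review = review_against_design(call_agent, design, change)
        if review.approved:
            return change
        change = call_agent("implementer", f"Revise the change to address:\n{review.notes}")

    raise RuntimeError("Change did not pass review within the revision budget")
```

The overhead is real, three agent calls at minimum instead of one, but the slide toward pure efficiency now requires changing the loop, not just skipping a step.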
I’ve found that command compression follows the same pattern. Early on you’re explicit and verbose because you haven’t calibrated the trade-off yet. As you learn the agent’s failure modes, you compress, spending your thoroughness only at the spots where it actually matters.
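As a purely hypothetical illustration of what that compression looks like (these commands are invented, not taken from any particular tool), an early reusable instruction spells out every guardrail, while the calibrated version shrinks to the spots that still need explicit thoroughness:

```python
# Hypothetical reusable agent commands, for illustration only.

# Early on: verbose, because you don't yet know where the agent fails.
EARLY_COMMAND = """
Before committing:
1. Explain your plan and wait for my confirmation.
2. Run the full test suite and paste the output.
3. Keep the diff under 200 lines; split larger changes.
4. Do not touch files outside src/ without asking.
5. Summarize every file you changed and why.
"""

# Later: compressed, because tests and CI now carry most of the thoroughness,
# and only the genuinely risky areas still need explicit attention.
LATER_COMMAND = """
Plan first. Tests must pass (CI enforces this).
Flag anything touching auth, billing, or migrations before changing it.
"""
```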
The challenge is that efficiency gains are immediate and visible while thoroughness costs are delayed and diffuse. An agent that ships a feature in an hour feels productive. The subtle bug that surfaces three weeks later doesn’t obviously trace back to insufficient review. This is organizational, not individual. You can remind engineers to review carefully, but if the environment rewards velocity over correctness, the trade-off gets made at the system level:
The ETTO principle tells us there’s a trade-off here: the incentives push software engineers towards completing our development tasks more quickly, which is why we’re all adopting AI in the first place.
Hochstein’s conclusion gets it right:
In the wake of this incident, the software engineers will be reminded that AI agents can make mistakes, and that they need to carefully review the generated code. But, as always, such reminders will do nothing to improve reliability. Because, while AI agents change the way that software developers work, they don’t eliminate the efficiency-thoroughness trade-off.
I’m curious whether organizations that explicitly acknowledge the ETTO trade-off end up with different AI adoption patterns. Do they invest more heavily in automated verification infrastructure? Do they resist the “let AI do everything” narrative because they’ve calculated the review burden? And I wonder if the ETTO framing helps explain why incremental collaboration works so well with agents. It’s not just about better context or clearer prompts; it’s about distributing the thoroughness burden across multiple smaller checkpoints instead of one massive end-of-task review.
The trade-off doesn’t go away with better models or better tools. It’s structural. The question is whether we’re building infrastructure that manages it, or just hoping people will “be more careful.”