Walk into any engineering org right now and the Slack threads around PRs have shifted. The argument about whether AI PRs are noisy is mostly worn out. Everyone begrudgingly accepts the queue is heavier than it used to be. The argument that’s left is messier. There’s a bad PR sitting in your queue. What do you actually do with it?

Sending it back with comments feels like the path of least resistance. But what if the author is an AI agent that won’t remember the feedback on the next PR, or a developer who doesn’t understand the underlying code well enough to internalize your advice? Rejecting the PR feels harsh. Fixing it yourself doesn’t scale. You can’t just leave your long, thoughtful comments anymore.

Well, sending it back was an illusion. It feels productive because you could steer someone to create the right code with less work. In reality, it forces the author into a loop of guessing, typing, and (these days) prompting, which eventually pulls you back in to re-verify their AI’s second attempt. The back-and-forth is the most expensive possible outcome.

From apenwarr.ca:

The job of a code reviewer isn’t to review code. It’s to figure out how to obsolete their code review comment, that whole class of comment, in all future cases, until you don’t need their reviews at all anymore.

(Think of the people who first created “go fmt” and how many stupid code review comments about whitespace are gone forever. Now that’s engineering.)

By the time your review catches a mistake, the mistake has already been made. The root cause happened already. You’re too late.

When you build a system that guarantees you never have to leave the same review comment again, that effect compounds even while you sleep. Take this case study from Roblox. They mined years of pull-request history, clustered the recurring feedback, and built a pipeline that promotes patterns worth keeping into a knowledge base their agents consult before they write code.

Previously, an experienced engineer might spend hours every week reviewing PRs, repeatedly flagging the use of a blocking FetchData call inside high-frequency loops. If the expert is out of town or misses an error, their knowledge may not be applied, and an anti-pattern could slip into production.

Once you encode the rule, the expert doesn’t have to be in the room. In my case, encoding rules allowed us to eliminate missing tests for input validations, remove stale feature flags, and catch intentional quirks around data-modeling that an internet-trained AI would treat as a bug.
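To make “encode the rule” concrete, here is a minimal sketch of a scriptable gate for the blocking-call example above. It is not the Roblox pipeline or anything from my own setup. It assumes a Python codebase, and the forbidden call names in `BLOCKING_CALLS` are hypothetical stand-ins for whatever your reviewers keep flagging.

```python
import ast

# Hypothetical names of the blocking call the rule forbids inside loops.
BLOCKING_CALLS = {"FetchData", "fetch_data"}


def _call_name(node: ast.Call) -> str:
    """Return the bare function or method name of a call, or '' if unknown."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""


def find_blocking_calls_in_loops(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, name) pairs for forbidden calls in loop bodies.

    Only loop bodies are inspected, so a call in the loop header
    (which may run just once) is not reported.
    """
    tree = ast.parse(source)
    hits = set()  # a set, so nested loops do not report the same call twice
    for loop in ast.walk(tree):
        if not isinstance(loop, (ast.For, ast.AsyncFor, ast.While)):
            continue
        for stmt in loop.body:
            for node in ast.walk(stmt):
                if isinstance(node, ast.Call) and _call_name(node) in BLOCKING_CALLS:
                    hits.add((node.lineno, _call_name(node)))
    return sorted(hits)
```

Run over the changed files in CI, a non-empty result fails the build with the line numbers attached, which is the same comment the expert would have left, minus the expert.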

And by the way, you don’t need a massive data-mining operation like Roblox or Meta. A handful of AI agent sessions to analyze patterns and then come up with scriptable gates is quite helpful. There’s plenty of prior art on how to reliably enforce gates, and you can use AI agents to script a decent analysis pipeline within days in the background. For reference, see agent-audit, which I wrote in a few days on my phone (for personal use) and continue to value on a weekly basis.

There’s one caveat with the AGENTS.md file. When a reviewer catches a new pattern on an AI PR, the temptation is to dump a new rule into a global AGENTS.md file so the agent never repeats it. I learned the hard way how badly this fails. At one point, our AGENTS.md ballooned past 1,000 lines. The agents became horribly slow and incompetent, even at trivial changes, because they were trying to satisfy generic guidelines that didn’t matter for the task at hand. We cut the file down to 80 essential lines and then further down, and iteration speed skyrocketed. I say iteration speed because the code commit rate was the highest it had ever been in the codebase’s five-year commit history. The fix was progressive disclosure: loading rules only when the context calls for them, rather than relying on a single dumping ground.
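One way to picture progressive disclosure is a small index that maps file patterns to rule files, so an agent only sees the rules relevant to the paths it is touching. This is a sketch under assumptions, not how our setup or any particular agent tool works: the `RULE_INDEX` patterns and the `rules/*.md` file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical index: glob pattern -> rule file to load for matching paths.
RULE_INDEX = {
    "*": "rules/core.md",                      # short, always-on essentials
    "api/*": "rules/input-validation.md",
    "migrations/*": "rules/data-modeling.md",
    "*/flags.py": "rules/feature-flags.md",
}


def rules_for(changed_paths: list[str]) -> list[str]:
    """Return the rule files whose pattern matches at least one changed path."""
    selected = []
    for pattern, rule_file in RULE_INDEX.items():
        if any(fnmatchcase(path, pattern) for path in changed_paths):
            selected.append(rule_file)
    return selected
```

For a change that only touches `api/users/handler.py`, this selects the core rules and the input-validation rules and leaves the data-modeling and feature-flag rules out of the agent’s context.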

What about the PR that does reach a human?

Even with the greatest of gates, PRs should require 5 minutes (and no more) of human review for many reasons. If the PR is already mostly there, leaving a small number (three to four) of well-targeted comments with a bias toward approval works best. But if the PR is fundamentally bad and you feel the urge to write a long critique, the PR is the wrong venue. Jump on a quick call or pair-program the final mergeable output to save days of back-and-forth.

Another tool I rely on is pushing minor fixes directly to the author’s branch when that’s faster. A one-line nudge committed and pushed is often the cheapest way to unstick a near-ready PR. If that feels like crossing a red line, leave the suggestion in a comment. GitHub’s inline suggest-changes feature is the less intrusive version of this idea, and there is some evidence that it increases PR merge rates.

If the same class of issue keeps coming back, whether from the same author or the same agent, you’re better off assuming that a rule is missing from CI. You can then find the patterns in recurring review comments, as I alluded to before, and encode them as team standards. Coaching through small incremental comments is cheaper in the moment but, in the long run, it’s the wrong fix.
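Finding those patterns doesn’t have to start with anything sophisticated. The sketch below is a deliberately crude first pass, assuming you have already exported review comments as plain strings (the export step is not shown). It masks identifiers and numbers so that comments differing only in specifics collapse to the same key, then reports anything seen at least `min_count` times. A real pipeline would likely need fuzzier matching than this.

```python
import re
from collections import Counter


def normalize(comment: str) -> str:
    """Collapse a review comment to a rough pattern key."""
    text = comment.lower()
    text = re.sub(r"`[^`]*`", "<code>", text)   # mask inline code identifiers
    text = re.sub(r"\d+", "<n>", text)          # mask specific numbers
    text = re.sub(r"[^a-z<>\s]", " ", text)     # drop punctuation
    return " ".join(text.split())               # squeeze whitespace


def recurring_patterns(comments: list[str], min_count: int = 3) -> list[tuple[str, int]]:
    """Return (pattern, count) pairs seen at least min_count times, most common first."""
    counts = Counter(normalize(c) for c in comments)
    return [(pattern, n) for pattern, n in counts.most_common() if n >= min_count]
```

Every pattern this surfaces is a candidate for a CI check or a scoped rule file instead of a fourth copy of the same comment.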

Take open source projects, for example. They often work with random, unexpected PR contributions, so a lot of back-and-forth happens through comments. This creates churn. Famously, curl shut down its bug bounty in January under the weight of unverified AI slop. But soon after, they layered specialized analyzers like ZeroPath, AISLE, and Mythos on top of their existing deterministic checks and are now projecting roughly 50 curl vulnerabilities in 2026. Their security report volume has doubled year over year, and almost every report now uses AI to clear “a long backlog” of bugs the existing tools missed.

The lesson is that raw AI output carelessly aimed at humans is the problem, but well-filtered AI output in the hands of rational experts yields gold.

In the internal corporate setup, you have the advantage of controlling the pre-submit environment. You can place a filter directly in your commit hooks and CI checks, right next to the rules you’ve already encoded. And on every dimension you don’t explicitly constrain, you create cognitive and intent debt because the model regresses toward its training corpus. And most of that public code doesn’t represent your codebase.

Take my “learning experience” from the past few weeks. An agent-authored endpoint quietly accepted oversized payloads in production for over a week. We had a payload-size checker, but it had an undercounting bug because we missed internal framework-specific headers. The trend finally surfaced during a weekly review when the latency curve was visibly spiking. My encoded rule to “validate input” failed, but my new CI check works fantastically on every build. It asserts the size-checker’s count against a synthetic payload of known length.
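The shape of that check is roughly the following. This is a sketch, not our actual checker: `payload_size`, the header names, and the byte-counting convention (each header as `name: value\r\n` plus the body) are all assumptions chosen for illustration. The point is that the expected length comes from the raw bytes, independently of the code under test.

```python
def payload_size(headers: dict[str, str], body: bytes) -> int:
    """Hypothetical size checker: bytes of every header line plus the body."""
    header_bytes = sum(
        len(f"{name}: {value}\r\n".encode("utf-8"))
        for name, value in headers.items()
    )
    return header_bytes + len(body)


def test_size_checker_counts_every_byte() -> None:
    # Synthetic request, including a made-up framework-internal header
    # of the kind an undercounting checker would skip.
    headers = {"Content-Type": "text/plain", "X-Internal-Trace": "abc123"}
    body = b"x" * 1000

    # Known length, computed from the raw bytes rather than from the checker.
    raw = (
        b"Content-Type: text/plain\r\n"
        b"X-Internal-Trace: abc123\r\n"
    ) + body

    assert payload_size(headers, body) == len(raw)
```

If someone later changes the checker to skip a class of headers, the counts diverge and the build fails, instead of a latency curve telling you a week later.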

Of course, I could have just relied on AI code reviews to leave a well-targeted comment about the missing headers. But that introduces automation bias: reviewers drop their guard around AI code simply because the AI could plausibly catch the issue, even though it just as plausibly might not. Deterministic options are the lesser evil and quite cheap to create, especially compared to the cost of a PR. You’d not only be detecting and fixing issues earlier in the coding workflow, but you also wouldn’t have to depend on reviewers being perfectly vigilant.

So, stop playing tennis with your PRs and bouncing them back and forth. For the bad PR in front of you right now: add the rule, don’t write a wall of comments. If you can’t add the rule yet, push the one-line fix to their branch. If you can’t push the fix, pair on it. The long review comment might feel cathartic or impressive the first time, but you’ll dread it eventually. The real work is figuring out which rule, spec, or structure change would have prevented the PR in the first place. Never write the same comment thrice.

Reviewing code is about finding gaps in your coding, and now in your coding automation. Before, fixing those gaps holistically was extremely expensive outside a few edge cases. But whenever the industry has had the chance, it has grabbed it with both hands, because these gates have an amplifying effect. The machines can write the code now. It’s time we communicated to them, in ways they’d understand, what good code actually looks like.


This post titled Stop Bouncing PRs first appeared on rishi.baldawa.com. Email Rishi for any comments, feedback, or general banter. If you liked it, send it to someone who’d grumble about bad PRs.