Writing as Diagnostic Mirrors for Agents

Agents do not just write code. They select libraries, split modules, add retries, create schemas, reshape data, and decide whether existing boundaries should stay.

Many of those decisions are made during implementation. No one’s going to create a separate review called “Should this path now depend on persistent storage?” The diff just shows code. The tests will pass, but the solution hides architecture decisions.

Architectural choices have always outlived the conversations that created them. What is different now is speed and opacity. An agent can make three different versions of the same feature, each with a different dependency graph, state model, or failure mode. A recent paper calls this “vibe architecting”, which is cute enough to be annoying and accurate enough to keep using.

Clearly, we should enforce architectural choices more strictly, and there’s growing effort around it.

Put the ADRs in the repo. Feed them to the agent. Point the AI reviewer at the decision log. Turn the stable rules into checks.

This has utility, and people are already building it, yet I suggest a less ambitious course. Before you splatter agent-written ADRs across your codebases, use the draft to check whether there was an actual decision at all.

A Mirror

Architecture Decision Records are somewhat cliché, yet they serve a useful purpose in documenting the rationale behind decisions made long ago. RFCs, design documents, and similar artifacts are different flavours of the same idea, and the argument that follows applies to them as well. In an agent workflow, an ADR has another use: as a mirror into the LLM.

Whether an agent adds a queue, changes a boundary, introduces a cache, or alters a public contract, you can ask it to draft the decision it “thinks” it just made. Not because you want the “load-bearing” document. Not yet. You ask because the draft tells you what the agent thinks the trade-off is.

This is different from documentation or prompting. If you ask a good agent directly, it can make a weak decision sound plausible. Force it into a mould where it has to present trade-offs, alternatives, and consequences, and have a person sit down and read that artifact as they would any other design doc, and you are playing a whole different ballgame.
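Here is a minimal sketch of what "forcing it into the mould" could look like, assuming you hand the agent a short summary of the change. The section names, the `build_adr_prompt` helper, and the prompt wording are all my own illustration, not a standard ADR template or any tool's API.

```python
# Hypothetical section list: adjust to whatever ADR template your team uses.
ADR_SECTIONS = (
    "Context",
    "Decision",
    "Alternatives considered",
    "Trade-offs",
    "Consequences",
)


def build_adr_prompt(change_summary: str) -> str:
    """Build a prompt asking the agent to describe the decision it believes it made."""
    headings = "\n".join(f"## {name}" for name in ADR_SECTIONS)
    return (
        "Draft an Architecture Decision Record for the change below.\n"
        "Describe the decision you believe you just made.\n"
        "If a section has nothing real to say, write 'none' instead of filling it in.\n\n"
        f"Change summary:\n{change_summary}\n\n"
        f"Use exactly these sections:\n{headings}\n"
    )


prompt = build_adr_prompt("Added a queue between the API handler and the email sender.")
```

The "write 'none'" instruction is the part I care about: an empty Alternatives section is a finding in itself.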

I noticed this at work, and again during routine work on quamina-rs, a Rust port of its namesake fast rule-matching engine. When I synced with the upstream source recently, the agent had to carry over a change in the memory management surface. This led to replacing an old runtime memory-budget API with a read-only stats API. Functionally, the budgeting behaviour stayed roughly the same, but the public contract changed, and the agent treated that as an implementation detail.

The code looked mechanically reasonable. The 1000+ tests were all green too. But when I asked the agent to write its decisions down, there was a mismatch. The draft covered only the implementation port and performance record keeping. The API contract change was nowhere to be found until I brought it up, and then the agent gave me the familiar “you’re absolutely right!”

This was the moment worth slowing down for. I have seen this exact failure mode across other sessions before.

I don’t think the agent made a bad change, but it did make an architecture-shaped change without treating it like one. Forcing the change into an ADR format gave me a structured surface to read against. My judgment caught the gap that the green tests and clean diff hid.

Do Not Promote Every Mirror

If the trend holds, the field is going to overbuild on the current hype. Don’t think of it as my prediction; think of it as a Tuesday.

When agents can generate ADRs, compliance notes, migration plans, review comments, and policy files on demand, the temptation is to keep all of it because “you never know”. But that leads to repositories packed with authoritative-seeming documents that no one can make sense of. They become the equivalent of the terms and conditions that you definitely should be reading, but don’t bother reading because there’s too much to even know where to start.

In fact, most agent-written decision notes should die. We pay too much attention to what agents type out as it is already.

Instead, keep the ones where the decision is expensive to reverse, likely to be rediscovered, or important enough that future engineers should understand the rejected options. If the decision is local to one task, easy to change, or mostly execution detail, let it live in the PR and end there.

And to reiterate, the document is not valuable because markdown was produced. It is valuable only if someone with context accepts the tradeoff and is willing to own the consequence.

This is why I get nervous about the rising trend of autonomous architecture enforcement. The jump from “the agent writes the ADR” to “the agent decides the rules and skips the ADR” will be quick. That’s not the way it should work. The draft should be distrusted first. It’s only after a person has accepted or rewritten the decision that the team should decide what to do next.

Choose The Future Encounter

Once the decision is real, choose the cheapest future encounter that still works.

If the rule is mechanical, make the enforcement mechanical too. If auto-generated code must not be edited by hand, block those edits with a CI check. If UI code must not call the database, make it a dependency exclusion rule.
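As a rough sketch of a dependency exclusion rule, assume a Python codebase where UI modules must not import the persistence layer. The forbidden prefixes (`app.db`, `sqlalchemy`) and the `forbidden_imports` helper are hypothetical names for illustration. A real project would more likely reach for an existing import linter than hand-roll this.

```python
import ast

# Hypothetical rule: UI code may not import anything under these module prefixes.
FORBIDDEN_PREFIXES = ("app.db", "sqlalchemy")


def forbidden_imports(source: str, forbidden=FORBIDDEN_PREFIXES) -> list[str]:
    """Return the forbidden modules imported by one source file."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x import y" only; relative imports are not resolved here.
            names = [node.module]
        else:
            continue
        for name in names:
            # Match the prefix itself or any submodule of it, but not "app.dbx".
            if any(name == p or name.startswith(p + ".") for p in forbidden):
                hits.append(name)
    return hits


ui_source = "import os\nfrom app.db.models import User\n"
violations = forbidden_imports(ui_source)
```

In CI, you would run this over every file in the UI package and exit non-zero if any file returns a non-empty list. Nobody has to re-argue the rule in review.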

If the signal is suggestive, make the enforcement suggestive too. An LLM can spot when a PR crosses a known boundary and ask for the reasoning behind it. You can use an LLM for loosely defined contextual checks, but these shouldn’t block PRs. Put them instead in a review prompt or scoped repo instructions.

The simple split is to use a hard rule when the criteria are deterministic, a looser warning when the signal is suggestive, and human review when things get contextual. The point is to stop re-litigating settled constraints while keeping people responsible for decisions.
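The split is small enough to write down as a lookup table. This is only a sketch: the `Kind`, `Action`, and `Finding` names and the example rules are made up for illustration. The one property worth keeping is that only deterministic findings can block a merge.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    DETERMINISTIC = "deterministic"  # a script can decide it
    SUGGESTIVE = "suggestive"        # a pattern worth a question
    CONTEXTUAL = "contextual"        # needs someone who knows the history


class Action(Enum):
    BLOCK = "fail the CI check"
    WARN = "post a non-blocking comment"
    HUMAN_REVIEW = "request a human reviewer"


ROUTES = {
    Kind.DETERMINISTIC: Action.BLOCK,
    Kind.SUGGESTIVE: Action.WARN,
    Kind.CONTEXTUAL: Action.HUMAN_REVIEW,
}


@dataclass(frozen=True)
class Finding:
    rule: str
    kind: Kind


def route(finding: Finding) -> Action:
    """Map a finding to how the team should encounter it."""
    return ROUTES[finding.kind]


def merge_blocked(findings: list[Finding]) -> bool:
    """Only deterministic findings are allowed to block a merge."""
    return any(route(f) is Action.BLOCK for f in findings)


findings = [
    Finding("PR crosses the billing boundary", Kind.SUGGESTIVE),
    Finding("public contract changed", Kind.CONTEXTUAL),
]
```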

Route With Taste

Another easy mistake is to feed every ADR, design doc, and team convention to the agent. It feels safe. It usually creates a different problem.

Agents are not helped by a junk drawer of context. They will satisfy rules that do not matter, miss the one that does, or spend half the session summarizing stale material. Adding more context is like overwatering your plants. It actually makes everything worse. And expensive.

Instead, start with basic routing that’s optimized for human comprehension. When it comes to tagging decisions, think in terms of package, path, dependency, service, and decision type.

Try using CODEOWNERS, README, CONTRIBUTOR.md, import graphs, migration folders, generated client paths and grep before you go for vector embeddings and RAG pipelines. And for the love of everything that’s nice, test responsibly. If a change affects db/, show the persistence decisions. If it touches API compatibility, show compatibility decisions. If it changes copy, leave the architecture library alone. This is called progressive disclosure in the LLM world; elsewhere, it’s simply tasteful discretion.
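A path-based router can be as dull as a list of glob patterns. The sketch below assumes decisions are tagged by type and indexed by hand. The patterns, the tags, and the ADR filenames are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: path pattern -> decision tag.
PATH_TAGS = [
    ("db/*", "persistence"),
    ("migrations/*", "persistence"),
    ("api/*", "api-compatibility"),
]

# Hypothetical decision index: tag -> decision records worth showing.
DECISIONS = {
    "persistence": ["adr/0003-system-of-record.md"],
    "api-compatibility": ["adr/0007-public-api-versioning.md"],
}


def decisions_for(changed_paths: list[str]) -> list[str]:
    """Return only the decision records relevant to the paths a change touches."""
    tags = []
    for path in changed_paths:
        for pattern, tag in PATH_TAGS:
            # fnmatchcase's "*" also matches "/", so "db/*" covers nested files.
            if fnmatchcase(path, pattern) and tag not in tags:
                tags.append(tag)
    return [doc for tag in tags for doc in DECISIONS.get(tag, [])]


db_change = decisions_for(["db/models/user.py", "migrations/0042_add_index.sql"])
copy_change = decisions_for(["content/copy/home.md"])
```

A copy change routes to nothing, which is the point. The table is also readable by whoever has to debug why the agent was shown a particular decision.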

Not everyone in the company needs to understand the router. What matters is that anyone looking for answers can easily find out how and why decisions were made. Some thrilling mystery solving vanishes in the process, but at least you’re safer from the hallucinating maniac that treats speculative internet content as divine gospel.

Mirror First

Teams should still break the old rules. Good architecture changes. What made sense three years prior may no longer be logical. Such shifts in logic just require clear expression.

That’s my useful version of ADRs when working with agents. It uses the agent-written draft as a mirror first. If the mirror shows there is no real decision, I delete it. If it shows a weak decision, I reject it. If it shows a meaningful trade-off, I rewrite it in human language and decide how the next human or agent should encounter it.

The machines can generate code now. Make them show the architecture they think they are changing before you let that architecture become the system.

This post titled Writing as Diagnostic Mirrors for Agents first appeared on rishi.baldawa.com. Share it around if you think it’ll help someone. Email Rishi for any comments, feedback, or general banter.