The traditional specification has a well-known failure mode: you write it, reference it during initial development, then watch it slowly diverge from what the code actually does. Six months later, the spec describes a system that no longer exists. GitHub’s spec-driven development work and AWS’s Kiro IDE invert this relationship entirely. The specification becomes the source code and automation is the compiler.

Marc Brooker frames the current approach as “vibe coding”: prompting AI agents step by step without a bigger picture of what you’re building. Each prompt solves a local problem, but there’s no coherent vision holding the pieces together. The power is real but limited, because the agent never gets the full context of what the program should do and why.

The GitHub team working on Spec Kit makes a related but distinct observation:

“We treat coding agents like search engines when we should be treating them more like literal-minded pair programmers. They excel at pattern recognition but still need unambiguous instructions.”

I’ve noticed this pattern in my own work. When I give an agent a vague prompt, it generates something that looks right but doesn’t quite work. The agent isn’t searching for answers. It’s executing instructions (and hence often gets the “overconfident intern” moniker). The mismatch happens because I assume it understands my intent, when I should be treating it like a compiler that needs precise syntax.

The Spec Kit approach makes this explicit. You write specifications that describe user journeys, experiences, and success criteria. Then you provide technical constraints: stack, architecture, compliance requirements. The agent generates a plan, breaks it into tasks, and implements each one. But here’s what’s different from traditional development:

Instead of reviewing thousand-line code dumps, you, the developer, review focused changes that solve specific problems. The coding agent knows what it’s supposed to build because the specification told it. It knows how to build it because the plan told it. And it knows exactly what to work on because the task told it.
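
To make that concrete, here’s a minimal sketch of what such a specification might contain. The section names and the feature itself are illustrative, not actual Spec Kit output:

```markdown
# Spec: Saved searches

## User journey
A signed-in user runs a search, clicks "Save", and can re-run it later
from a "Saved searches" menu on their dashboard.

## Success criteria
- Saving a search takes one click and no page reload
- Saved searches persist across sessions and devices
- A user can keep up to 50 saved searches; the oldest is evicted first

## Technical constraints
- Stack: TypeScript, React front end, PostgreSQL
- Reuse the existing search service; no new microservice
- All new endpoints go through the standard auth middleware
```

Each generated change can then be reviewed against the success criteria rather than against your memory of what you prompted.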

This connects directly to incremental collaboration patterns. The spec provides the breakdown structure that makes incremental work possible. Without it, you’re back to one-shot prompts with 38% success rates. With it, you get the 83% success rate from sequential sub-tasks with clear verification points.
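
A hypothetical slice of the task list that might fall out of the spec sketched above shows what those verification points look like (the tasks themselves are invented):

```markdown
## Tasks
1. Add a saved_searches table and migration
   - Verify: migration applies and rolls back cleanly against a local database
2. Add POST /saved-searches and GET /saved-searches endpoints
   - Verify: integration tests cover auth, the 50-item limit, and eviction order
3. Add the "Save" button and the dashboard menu
   - Verify: saving and re-running a search works end to end in a dev build
```

Each task is small enough to review on its own, and each has a check you can run before moving to the next.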

The documentation sync problem gets solved as a side effect. A GitHub engineer’s markdown-as-programming-language experiment embeds the user-facing docs directly in the spec:

“[T]he user-facing documentation from README.md is embedded in the specification. This keeps documentation and implementation in sync. If I want to add an alias for the -o argument, I just update README.md with no extra steps required.”

The spec is main.md. The implementation is main.go. When you change the spec, the agent “recompiles” the code. This is closer to how reliable verifiers work. The spec defines what “correct” means before any code exists. You can’t verify against a vague prompt, but you can verify against a structured specification.
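
As a sketch of how that can look, imagine a fragment of main.md where the embedded usage docs double as the spec for the CLI surface. Everything here, including the tool name, is invented for illustration; the post only says the README docs live in the spec and mentions the -o argument:

```markdown
## Usage (embedded from README.md)

    mytool -o report.txt input.csv

- `-o FILE`, `--output FILE`: write results to FILE instead of stdout

## Behavior
- If `--output` is omitted, results are written to stdout.
- Adding a new flag or alias means editing this section; the agent
  regenerates main.go, so the parser and the docs cannot drift apart.
```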

The durability benefit is real as well. Brooker describes specifications as “version controlled, human-readable super prompts” that become “long-lived, durable, and official.” When you add a requirement to the spec instead of just prompting for it, the agent doesn’t forget about it as you make other changes. The spec becomes a North Star that guides agent work across multiple iterations.

Large companies struggle with where to put security policies, compliance rules, and design-system constraints. Usually these live in someone’s head or are buried in wikis.

With Spec Kit, all of that stuff goes in the specification and the plan, where the AI can actually use it.
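
As a sketch, a constraints section in the spec or plan might read like this; the specific policies and the @acme/ui package name are invented for illustration:

```markdown
## Constraints
- Security: every endpoint sits behind SSO; personal data is encrypted at rest
- Compliance: every read or write of customer records goes to the audit log,
  retained for seven years
- Design system: UI components come from the internal @acme/ui package only;
  no ad-hoc CSS
```

Because these live next to the requirements, every plan and task the agent produces is generated with them in scope, instead of relying on someone remembering to mention them.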

That’s the interface layer between human intent and AI execution. Not prompts, which are too ephemeral. Not code comments, which drift. Specifications as first-class artifacts that generate code the same way source files generate binaries. Amazon has been writing specifications in various forms for years, from working-backwards documents to design docs to formal specs in TLA+, because it helps teams move faster by making it more likely they’re writing the right code to solve the right customer problem.

The failure modes are predictable, though. Specs can get messy like any code; even the GitHub post acknowledges you need to ask the agent to clean up main.md periodically. And there’s no free lunch on the “describe what you want” problem. Clarity is still the hard part of software development. The spec-driven pattern doesn’t eliminate that difficulty; it just moves the cost upfront, where it’s explicit, rather than letting you discover it later when the generated code doesn’t work.

I’m curious how this pattern scales beyond the greenfield examples I’ve seen. Large brownfield codebases have implicit assumptions and inconsistent patterns that don’t fit neatly into specifications. What happens when you need to integrate with existing systems that weren’t spec-driven? Does the pattern break down, or do you end up with hybrid approaches where some components are spec-driven and others traditionally written? And at what codebase size does maintaining the spec become as expensive as maintaining documentation the old way?