Picking a markup language for a side project led me to the “Norway Problem”. You write [GB, IE, FR, DE, NO]. YAML returns [GB, IE, FR, DE, False]. Norway vanishes, and it isn’t a bug (though I can’t call it a feature either).

YAML 1.1 defines NO as boolean false, along with no, No, n, N, and off (yes, Yes, y, and on go the other way, and there are a dozen more). The parser did what the spec said. You knew NO was a string. You just never said so.
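To make the mechanism concrete, here is a minimal sketch of implicit boolean resolution using the spellings from the YAML 1.1 bool type. It is a toy, not a YAML parser: `resolve_plain_scalar` and `parse_flow_list` are hypothetical helper names, and real libraries may implement only part of this list.

```python
# Boolean spellings listed by the YAML 1.1 bool type.
YAML11_TRUE = set("y Y yes Yes YES true True TRUE on On ON".split())
YAML11_FALSE = set("n N no No NO false False FALSE off Off OFF".split())


def resolve_plain_scalar(token):
    """Resolve an unquoted scalar the way a YAML 1.1 style resolver would."""
    if token in YAML11_TRUE:
        return True
    if token in YAML11_FALSE:
        return False
    return token  # anything else stays a string in this toy


def parse_flow_list(text):
    """Toy parser for a one-line list like "[GB, IE, NO]" (no quoting, no nesting)."""
    inner = text.strip()[1:-1]
    return [resolve_plain_scalar(item.strip()) for item in inner.split(",")]


countries = parse_flow_list("[GB, IE, FR, DE, NO]")
```

Here `countries` ends with the boolean `False` rather than the string `"NO"`, because nothing in the input said the last item was text.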

The “fix” is quoting "NO", but that’s a workaround, not a solution. YAML 1.2 dropped this behavior back in 2009, but it was too late. By then YAML 1.1 was widespread, and many implementations still default to 1.1. The inevitable inertia had kicked in. Specifications may iterate and improve, but the world keeps running on old assumptions, much like IPv4 and IPv6. Specifications are hard.
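A rough sketch of why both the quoting workaround and the 1.2 change help, assuming the 1.2 core schema only treats the spellings of true and false as booleans. The `resolve` function and its `version` parameter are made up for illustration and do not correspond to any library's API.

```python
# Spellings that count as booleans under each version (toy model).
BOOLS = {
    "1.1": {
        True: "y Y yes Yes YES true True TRUE on On ON".split(),
        False: "n N no No NO false False FALSE off Off OFF".split(),
    },
    "1.2": {
        True: "true True TRUE".split(),
        False: "false False FALSE".split(),
    },
}


def resolve(token, version="1.1"):
    """Resolve one scalar token; quoted tokens are always strings."""
    quoted = len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'"
    if quoted:
        return token[1:-1]  # explicit string: the resolver never guesses
    for value, spellings in BOOLS[version].items():
        if token in spellings:
            return value
    return token


norway_11 = resolve("NO", version="1.1")         # implicit typing kicks in
norway_12 = resolve("NO", version="1.2")         # stays a string
norway_quoted = resolve('"NO"', version="1.1")   # the workaround
```

Quoting works under either version because it states the type explicitly. The version switch only helps if the parser you actually run honors it.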

A higher-stakes version was NASA’s infamous Mars Climate Orbiter. The interface spec said metric, but Lockheed’s software was reporting impulse in pound-force seconds while NASA’s navigation expected newton-seconds. So $327 million (or more than half a billion in today’s money) burned up in the Martian atmosphere. Both teams followed the spec as they read it. The spec just didn’t enforce a minor but crucial requirement with enough clarity. Specifications are hard.
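One way to make a unit requirement enforceable rather than merely stated is to put the unit in the type. The sketch below is purely illustrative and has nothing to do with the actual flight software: `Impulse` and `apply_trajectory_correction` are hypothetical names.

```python
from dataclasses import dataclass

# One pound-force expressed in newtons.
LBF_TO_NEWTON = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse stored in newton-seconds, whatever unit it arrived in."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value):
        return cls(float(value))

    @classmethod
    def from_pound_force_seconds(cls, value):
        return cls(float(value) * LBF_TO_NEWTON)


def apply_trajectory_correction(impulse):
    """Accept only an Impulse, so a bare number with no unit is rejected."""
    if not isinstance(impulse, Impulse):
        raise TypeError("expected Impulse, got a bare value with no unit")
    return impulse.newton_seconds


reported = Impulse.from_pound_force_seconds(10.0)
used = apply_trajectory_correction(reported)  # about 44.48 newton-seconds
```

Passing a plain `10.0` raises a `TypeError` instead of being silently read as newton-seconds. The interface now carries the requirement that the prose spec only described.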

I have sinned as well. Event-ruler needed a syntax to express “condition A or condition B”, but there were existing rules with or as a key that I didn’t want to break. So I went with $or, which confuses everyone because ruler’s AND matching works differently and is hard to make sense of once $or exists as well. “Where’s the $and?” they ask. On top of that, the primitive $or only works as an OR condition if you have 1/ an array with 2+ objects, 2/ no reserved keywords, and 3/ the right structure. Miss one and $or quietly becomes a literal field name for backward compatibility, which also means a typo or over-the-wire corruption leads to mismatches. In event-driven systems, that can mean a risk of data loss. Meaning that unless you read the specification carefully, your implementation can have silent or grey failures. Specifications are hard.
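The silent fallback can be sketched as a small classifier. This is not Event Ruler’s implementation. `classify_or` is a hypothetical function, the `RESERVED_KEYWORDS` set is an illustrative placeholder rather than the real list, and the third condition (“right structure”) is left out because the rules for it are not spelled out here.

```python
# Placeholder set for illustration only; not the library's actual list.
RESERVED_KEYWORDS = {"prefix", "suffix", "numeric", "exists", "anything-but"}


def classify_or(value):
    """Decide how a "$or" key would be read under the conditions above.

    Returns "operator" when the value looks like an OR condition and
    "literal-field" otherwise. No error is raised on the fallback path.
    """
    # 1/ must be an array with two or more objects
    if not isinstance(value, list) or len(value) < 2:
        return "literal-field"
    if not all(isinstance(branch, dict) for branch in value):
        return "literal-field"
    # 2/ no reserved keywords as keys inside the branches
    if any(key in RESERVED_KEYWORDS for branch in value for key in branch):
        return "literal-field"
    return "operator"


intended = classify_or([{"source": ["a"]}, {"detail": ["b"]}])
one_branch = classify_or([{"source": ["a"]}])
```

`intended` is `"operator"`, while `one_branch` comes back as `"literal-field"`. A rule that lost a branch to a typo still loads, and it simply matches something else.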

Sure, the gaps eventually got filled. We worked around YAML, upgraded our libraries, even landed on Mars many times over, and rewrote rule engines while shying away from past mistakes; but sometimes the fill doesn’t match the intent. Sometimes Norway still becomes False. Sometimes spacecraft still blow up. Specifications are hard.

In the world of hand-crafted software this is more fixable because someone has gone too deep for their own good, but then there’s the ongoing drive to have specs as compiler targets for automation. When the spec generates code, ambiguity brings the house down, because there’s too much room for interpretation and lossy compression.

Put more bluntly, LLM prompting has the same problem, which is probably why it stays hard despite better models. It’s why /compact is useless. So are “Make it better”, “Fix the bug”, and “ultrathink”, as they’re all underspecified. A model will read your prompt the way YAML reads NO, filling gaps from training on mostly unverified internet speak. So no, the models didn’t need to improve. The math behind them was sound, but the data was a mess and the spec was incomplete. That’s what everyone is fixing in this current rat race. But they sometimes forget that specifications are hard.

On the flip side, specifications have always been hard. We’ve just gotten better at working around them. The old way was to ship, wait months, and learn the spec was wrong. Now we iterate faster and correct sooner, in software for sure but also in many other domains. But faster iteration isn’t better specification. We’re just failing quicker and recovering quicker. And so Norway still becomes False when someone forgets to add the extra workaround. Specifications are hard.