In 1974, Frank Drake and Carl Sagan sent a message to aliens from the Arecibo telescope in Puerto Rico. The message compressed humanity’s greatest hits into 1,679 binary pulses: the structure of DNA, our population count, the layout of the solar system, even the dimensions of the telescope itself. The number 1,679 was deliberately chosen. It’s a semi-prime, the product of exactly two prime numbers (23 and 73). The assumption was that any sufficiently advanced civilization would recognize this and know to arrange the bits as a 23×73 grid to reveal the hidden image.

Except when Drake gave the message to his colleagues to decode, none of them could figure it out. Not even the people who understood binary, prime numbers, and the context that this was supposed to be a message from humans. The semi-prime structure was too subtle. The binary numbers at the top used an undocumented bullet point system. The human figure was so pixelated that it looked like noise. In an earlier experiment at a 1961 SETI meeting, Drake had sent a 551-bit message to colleagues, and only one person cracked it.

It’s easy to say “A for effort, F for execution,” but frankly the problem wasn’t over-engineering. The problem was that compression requires a shared decompression algorithm.

Drake and Sagan encoded the message with assumptions about what aliens might understand from prime numbers, binary, DNA helices, solar systems, and even how they’d arrange the pixels in a 23×73 grid (and not 73×23). Each assumption was a piece of the decompression key. But they had no way to verify whether aliens (or even humans without context) possessed that key. They packed everything they thought was important into 1,679 bits, hoping something would be recognizable enough to bootstrap understanding. It was a reasonable gamble, given they had no idea what alien mathematics, biology, or communication patterns might look like.
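To make the decoder’s very first hurdle concrete, here’s a small TypeScript sketch (purely illustrative, not part of the original analysis): even if a recipient thinks to factor the bit count, they end up with two equally plausible grids, and nothing in the message says which orientation is right.

```ts
// Factor a number into its prime factors by trial division.
function primeFactors(n: number): number[] {
  const factors: number[] = [];
  let remaining = n;
  for (let p = 2; p * p <= remaining; p++) {
    while (remaining % p === 0) {
      factors.push(p);
      remaining /= p;
    }
  }
  if (remaining > 1) factors.push(remaining);
  return factors;
}

// For a semi-prime bit count there are exactly two candidate grids,
// and the message itself never tells you which one to use.
function candidateGrids(bitCount: number): Array<[number, number]> {
  const [a, b] = primeFactors(bitCount);
  return [[a, b], [b, a]]; // rows x columns, both orientations
}

console.log(primeFactors(1679));   // [23, 73]
console.log(candidateGrids(1679)); // [[23, 73], [73, 23]]
```

And that’s before any of the harder guesses: which end is the top, which bit value is “on,” what the pixels mean once you see them.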

It’s a common information asymmetry problem. Encoding is easy when you have all the context. Decoding without that context is nearly impossible.

There’s an identical pattern in software abstractions. You build const memoizedFn = pipe(curry(fn), cache) and think “elegant functional composition!” You name variables ctx or svc because “everyone knows what those mean.” But you’ve compressed your intent into patterns that require your mental model to decompress. The compression saved you typing. It costs the next reader understanding. That semi-prime encoding in the Arecibo message? It’s like choosing Protocol Buffers over JSON because it’s “more efficient”. Technically true, but now everyone needs the schema to decode anything.
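Here’s what that looks like in code. A purely illustrative TypeScript sketch (the memoize helper and slowFib are hypothetical, not from any particular library): both versions behave the same, but only one puts its assumptions on the page.

```ts
// The generic helper the "clever" one-liner depends on.
function memoize<T>(fn: (n: number) => T): (n: number) => T {
  const seen = new Map<number, T>();
  return (n) => {
    if (!seen.has(n)) seen.set(n, fn(n));
    return seen.get(n)!;
  };
}

function slowFib(n: number): number {
  return n < 2 ? n : slowFib(n - 1) + slowFib(n - 2);
}

// Compressed: one line, but the reader must already know memoize's contract
// (single numeric argument, cache never evicts) and spot the subtlety that
// the recursive calls inside slowFib bypass the cache entirely.
const fastFib = memoize(slowFib);

// Decompressed: the same intent, with the cache and its behavior visible.
const fibCache = new Map<number, number>();
function fastFibExplicit(n: number): number {
  const cached = fibCache.get(n);
  if (cached !== undefined) return cached;
  const result = n < 2 ? n : fastFibExplicit(n - 1) + fastFibExplicit(n - 2);
  fibCache.set(n, result);
  return result;
}

console.log(fastFib(30), fastFibExplicit(30)); // 832040 832040
```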

The DNA base pair count Drake transmitted was about 4.3 billion, the figure scientists believed in 1974, but it turned out to be wrong. The actual number is closer to 3.2 billion. The population figure was also about 4.3 billion, roughly accurate then but wildly outdated now. The message showed Pluto as a planet. It was aimed at the M13 star cluster, 25,000 light-years away, but by the time it arrives, those stars will have drifted. Even the parts that could theoretically be decoded contain information that’s no longer accurate.

The same thing happens with APIs all the time. You encode domain knowledge into endpoint names, parameter structures, error codes. All perfectly logical if you understand the system’s internals. But the person trying to integrate it doesn’t have your context. They’re staring at your 1,679 bits wondering what assumptions they’re missing.
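A hypothetical illustration (the field names, error codes, and endpoint below are invented, not from any real API): the same failure, encoded with and without its decompression key.

```ts
// Compressed: perfectly logical if you already know the system's internals.
const opaqueError = {
  code: "ERR_4012",
  detail: "svc ctx invalid",
};

// Decompressed: the response carries its own decompression key.
const explicitError = {
  error: "invalid_billing_context",
  message: "The billing account referenced by accountId has been closed.",
  remediation: "Fetch a current accountId from /v1/accounts and retry.",
};
```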

The uncomfortable part is that Drake and Sagan weren’t amateurs. Drake formulated the Drake equation. Sagan popularized science for a generation. These were brilliant people making a considered bet about what might be universally comprehensible. They couldn’t test whether their encoding was decodable because they didn’t have access to beings without their context. As Scientific American noted, the message “was meant more as a demonstration of human technological achievement than a serious attempt to enter into a conversation.” But even as a demonstration, it reveals that you can’t verify your compression scheme works until someone without your context tries to decompress it.

Back in the software world: when you’re building internal tools, you’re encoding for future developers who won’t have your current context. When you’re designing APIs, you’re encoding for consumers you’ll never meet. Even code review is your best approximation of that decoder test, reviewers struggling to understand your change without you there to explain it. That’s a compression problem, not a review problem. So while it may feel cognitively rewarding to build clever abstractions, the question is usually whether your ideas are decodable without you standing there explaining them.

Sometimes the complexity is genuinely needed; databases compress SQL into execution plans for good reason. But can someone verify your compression is correct without already knowing the answer?

For Arecibo, I keep coming back to Drake’s colleagues staring at the binary string, unable to crack it. It’s the curse of knowledge: you can’t unsee how to decode something once you already understand it. And since you already know, how do you test whether your encoding is decodable? If you’re not sure, there’s a solid chance that all your clever encoding is just 1,679 bits of noise.