An epidemic of “All developers will use LLMs” top-down mandates has spread across the software industry. I’m not sure such pushes are ever necessary.

At least when it comes to AI, most folks really just need access. Once that is taken care of, I’ve seen half the engineering org pick up the good stuff within months. For example, by late 2024, folks had figured out that the models and tools were good enough for validating tests, writing scripts, and exploring greenfield work: lower-risk areas where a bad generation won’t cause a production incident or developer pain. The smart folks had the judgment to leave critical codepaths alone. Nobody had to tell them to do this.

But those not in the weeds had no way to know any of this because… well, they aren’t in the weeds. So they feel compelled to close their information gap with a policy hammer.

To be fair, I get why the mandates happen. There’s real pressure to perform. Other companies are hyping up massive productivity gains, and doing nothing feels worse than doing something wrong. But we’re still in pretty early stages. Nobody really knows what AI-native development actually looks like a few years down the line. Anyone saying otherwise is selling something or being naive. We might be looking at the future of software engineering already, but we might also be having a Concorde moment where impressive technology doesn’t survive contact with economics at scale.

But the mandate lands anyway, and what happens next is what always happens next.

The AI usage numbers spike, mostly in the areas where engineers had already been making good calls without being told, because now the metrics are actually being tracked. Then leadership shows some nice graphs at the next all-hands and pushes for the numbers to go even higher. This in turn gets translated down the org chart as “make the numbers go up, even if it means move fast and break things.” And so quality starts slipping. PRs take forever to review. Tickets, discussions, and milestones take twice as long if you’re lucky. As usage numbers go up and productivity doesn’t materialize, the mandate gets louder. Gastowns get built, more training gets pushed, stricter targets get set, and AI gets woven into performance reviews. The usage keeps growing. The costs keep growing. Productivity, though, remains a mystery.

Eventually leaders get fed up and start taking stabs in the dark, maybe even pull back (“let’s get back to basics” or whatever), and the whole cycle starts over.

If you look past these closed-loop oscillations at the top and pay closer attention to what is happening on the ground, you’ll see a different reality. Take the AI usage patterns that come up, often on the same team, sometimes even within the same person.

It’s a nuanced topic, but broadly, across conversations I’ve had, I see three types of users:

  1. The explorers run everything through AI. They’ll use a lot of tokens and tools, but they’ll also hit a lot of dead ends, because most experiments fail. That’s just what exploration looks like. Their value lies in the map they are drawing for everyone else: which tools worked where, which prompts produced garbage, which codebases the models couldn’t handle, and why.
  2. Then there are the effective adopters. They’re usually busy being productive, so they come late and often get less credit because they aren’t flashy about their AI usage. They pick up the tools the explorers say work and then walk the golden path. Because they are targeted and efficient, they make terrible all-hands slide material for usage graphs and are hard to spot in the data. But they will be the ones shipping, now more than ever.
  3. And then there are the noise makers. I don’t think we talk about the noise makers enough. They’re the ones whose numbers show up in the all-hands graphs, or who are the loudest about their use of the tech. This isn’t always their fault; sometimes it’s a capable engineer who got the wrong signal about what “good adoption” looks like, because it’s incredibly hard to scale out nuance. But the behavior is pretty typical: making time for AI because it’s the thing to be seen doing. High token counts, impressive demos, yet nothing that could survive a week in production. I watched one person spend three weekends building an agent workflow that automated a task they could have done with crontab and a curl command (the first result if you look on Stack Overflow). The usage numbers, naturally, looked great even though the automation never worked reliably enough to ship.

In every case that I’ve heard through the grapevine, the engineers closest to the work just learned to tune out the policy-driven noise, because the mandates didn’t reflect what was needed on the ground. (There are also late adopters and naysayers worth listening to, as they’re a great source of what’s unaddressed technologically or culturally, but sweeping mandates just turn them into noise makers too.)

So if you are looking to go all in on AI, beware of how you position your dashboards and measurements. The metrics that are easiest to track are the ones most likely to mislead you. Just as rewarding lines of code gets you impressively oversized PRs, tracking token usage gets you lots of counterintuitive changes. Measure PR counts and suddenly single tickets sprout lots of smaller PRs, which has some benefits but gamifies just as easily. You can also try to track backlog counts, time to resolution, and more, but these can push a team to pick up less work to make the numbers look happy. I know of at least one startup where PR counts dropped because engineers were busy chasing AI targets and missed a rather important milestone. I don’t fault them. This happens every time an individual’s usage becomes a performance signal. It doesn’t help that people still genuinely believe they’re faster.

So What Does Work?

Herein lies a twist. Rather than focusing on individual numbers or single-person performances, the same data are pretty useful as bottleneck detectors at the organizational level or higher aggregates. Once you detect the bottlenecks, you go after them. Could be that your repositories have ownership problems and you need to get everything baselined. Could be that your PRs aren’t getting looked at within a day or two because teams treat outside changes as less important, or lack resourcing. Could be many other things, but once you start addressing them, you get better ROI than asking everyone to “use AI”. If AI is good for something, trust that the smart folks will figure out where today’s tech is useful, like they’ve done in the past. It’s not like everyone had to be told to move away from [REDACTED CAMEL TECHNOLOGY]. Speaking of trust, you can look at the data at the team or individual level to double-check that it’s accurate, but 1/ communicate very clearly that the intent is primarily to detect gaps, and 2/ commit to addressing those gaps in some meaningful way. If you don’t, you lose trust and then find more noise makers again.
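As a concrete sketch of what aggregate-level bottleneck detection can look like, here is a minimal example that flags repositories whose median time-to-first-review is slow. The data shape, field names, and 48-hour threshold are all my assumptions for illustration, not anything from a particular tool:

```python
from datetime import datetime
from statistics import median

def review_latency_by_repo(prs):
    """Median time-to-first-review per repo, in hours.

    `prs` is a list of (repo, opened_at, first_review_at) tuples —
    a hypothetical shape; real data would come from your code host's API.
    """
    by_repo = {}
    for repo, opened, first_review in prs:
        hours = (first_review - opened).total_seconds() / 3600
        by_repo.setdefault(repo, []).append(hours)
    return {repo: median(waits) for repo, waits in by_repo.items()}

def flag_bottlenecks(latencies, threshold_hours=48):
    """Repos whose median review wait exceeds the (illustrative) threshold."""
    return sorted(repo for repo, h in latencies.items() if h > threshold_hours)

# Toy data: two slow reviews in "payments", one fast one in "web".
prs = [
    ("payments", datetime(2025, 1, 6, 9), datetime(2025, 1, 9, 9)),    # 72h
    ("payments", datetime(2025, 1, 7, 9), datetime(2025, 1, 10, 15)),  # 78h
    ("web",      datetime(2025, 1, 6, 9), datetime(2025, 1, 6, 13)),   # 4h
]
latencies = review_latency_by_repo(prs)
print(flag_bottlenecks(latencies))  # ['payments']
```

The point is the aggregation level: the output names a repo with a review bottleneck, not a person with a low token count, which keeps the measurement pointed at gaps rather than at individuals.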

If you’re in very early stages, knowledge sessions and all-hands demos can help create buzz and win more early adopters, but do so when the business has the space to absorb the losses from experiments. On-the-ground adoption spreads differently: someone shows their team a way to save real time on a real problem. Could be in a pair-programming session, a mob-programming session, a jaw-dropping write-up, or a “hey, look what this did” on a Friday afternoon. Then the sister teams start to take notice. If your neighboring team told the org “AI cleared our entire sprint backlog,” you’d treat that very differently than hearing “some team at another company cleared their sprint backlog with AI.”

As I said before, none of this is revolutionary, and that’s sort of the point. AI is a “mirror and multiplier”: it intensifies whatever was already happening. Teams with rigor find ways to automate rigor; teams with brittle processes and noise-making habits get to amplify them. The tools never change the trajectory; people and their decisions do. Which means the only thing worth mandating is your vanilla engineering health. You don’t need to dress that up as an AI initiative.

Fine, But What Actually Worked For You?

What worked for us was aggressively boring, but boring works. Across our mid-size engineering org, we ran pilot programs that spanned 6–8 weeks: a handful of people trying promising tools as part of their actual project work. They were expected to hit problems and figure their way out. The results were always mixed. You listen to what problems surface, make targeted changes where they help, and then spread whatever stuck. Sometimes you hit a dead end (cough JetBrains cough) and start fresh with a brand-new solution. Lots of small changes that didn’t make for a good all-hands slide, but it was working.

For example, within a month of a pilot we had data to recommend Claude Code as the default over Cursor and WindSurf, and within two weeks of its release we were able to strongly recommend that the whole org default to Opus 4.6. We found that modules within mono-repos needed their own agent configurations, that ownership of modules needed baselining policies, and that AI-assisted code review needed its own tuning based on what reviewers were actually ignoring or repeatedly flagging. We also held some things back from the org, like a HumanLayer-inspired approval process that didn’t make sense as the default for small-scale changes. And we added a lot of verbose AGENTS.md files to our codebase that were grossly unhelpful, before reverting to smaller files two months later. The less I say about my original efforts to track AI usage, the better.

Having a pilot process that creates an experimentation flywheel also meant we could address the “too many tools” problem. Anything promising went onto a manually ordered trial list that pilot folks could pick up as capacity became available. Sometimes they needed to be reminded that the goal is to answer “is this ready for org-wide adoption?”, but overall it was a promising pattern. Our learnings also typically transferred to other organizations with minor tweaks.

The most interesting thing that came out of it was convergence. The investments that help AI agents work better turned out to be the same developer-productivity investments engineers had been requesting for years: better documentation, faster local setup, improved test coverage, deployment safety nets, and so on. But this is a sample of one, and it comes with an extensive amount of caveats that don’t quite fit into this already lengthy post.

For what it’s worth, I haven’t seen companies get hurt yet because they adopted AI too slowly. What I’ve seen hurt companies are 1/ mandates that go too hard, 2/ tolerance of bad outcomes for too long because they were chasing trends, and 3/ burning through the trust of the people (be it customers, employees, or investors).

The ones that I’ve seen improve were better at paying attention to ground reality and acting on what they were hearing before it became a crisis. Not blindly giving people what they asked for, but acting with intent. Without the acting part, they’d just be back at unnecessary mandates with extra steps.