The Velocity Trap
There’s an AI tool for everything now. One to write your code. One to comment on your code. One to summarize your pull request. Another to explain why your tests failed. There’s even one to “summarize your sprint goals.” This, if we’re being honest, usually means “turn panic into bullet points.”
Every week seems to bring a new Copilot, Whisperer, Genie, or Assistant. The developer workstation is starting to look like a crowded airplane cockpit where every AI insists it knows the best way to fly or land the plane, sometimes both at the same time.
The result? We’re still measuring the wrong things, only faster. Teams can now produce twice as many story points, commit three times as many lines of code, and close bugs faster than they can be reported, all without improving a single user’s experience. It’s like bragging that your factory doubled production, only to discover the boxes are full of boxes. AI doesn’t just accelerate development. It accelerates whatever you’re already optimizing for. If your metrics reward activity, you’ll get more activity. If they reward customer value, you’ll get better products.
With every developer now armed with a Copilot, Whisperer, or Genie, the dashboards have never looked better. More commits, more story points, more “productivity.” It’s pretty, but none of it proves we’re building better software. AI hasn’t fixed bad measurement; it’s just made it faster.
These are not the metrics you’re looking for
Story Points per Sprint
Ah, story points, the illusion of progress. Once upon a time, in a startup far away, they helped teams estimate effort. Now they’re a high-stakes currency in the Game of Sprints, where everyone’s trying to hit their “velocity target” like it’s an Olympic sport. Developers learn to inflate estimates (“That’s definitely a 13-pointer!”) or slice tickets thinner than deli meat just to keep the charts trending up. Add AI into the mix, and suddenly the same work “magically” burns down twice as fast. Congratulations! Your team has achieved hyperspeed, right into the wall of meaningless productivity.
Lines of Code (LOC)
Measuring productivity by lines of code is like judging an author by word count. Charles Dickens wins, and Hemingway cries quietly into his whiskey. With AI, you get three Dickens for the price of one Hemingway. More lines don’t mean more value; they just mean your repo now looks impressively large while your future self silently weeps at the merge conflict ahead.
Commits and Merge Requests
At first glance, lots of commits look great, until you realize most of them read like “fix typo,” “really fixed typo,” and “final_final_fix_for_real_this_time.” AI makes this even more entertaining: developers can now summon boilerplate in seconds and commit with abandon, flooding your repo with micro-changes that crash your Grafana dashboard. Grab the popcorn: it’s productivity theater, where a flurry of commit activity impresses executives but confuses the poor sucker doing the next code review.
Tickets Closed or Bugs Resolved
Closing tickets is like a dopamine hit wrapped in a Jira notification. In many organizations, the race to “burn down” the backlog becomes the entire sport. Developers grab the low-hanging fruit (“fixed typo in README”) while the nasty architectural gremlins continue to nest in the shadows. AI helps grow the nests into log cabins: automated suggestions and code fixes let teams close tickets faster than ever. The problem? The gremlins keep coming back, only now they’re better documented.
What should you be measuring instead?
At some point, every leadership team realizes that all those colorful dashboards full of story points, commits, and lines of code don’t actually explain why customers keep leaving. It’s like tracking how many treadmills your gym members use without noticing nobody’s losing weight. AI has only supercharged our ability to measure what we have done. So instead of asking “how much did we do,” it’s time to ask “did what we did matter?” The shift from volume to value isn’t just semantic; it’s cultural. It means moving from measuring how hard people are working to measuring whether their work is moving the needle. AI can help here, too, but only if you point it at the right questions. The following are my favorite metrics. They tell you whether you’re satisfying the customer and providing real value. AI, as before, just helps you do it faster.
Cycle Time to Value
Forget velocity charts — cycle time to value is the grown-up metric that tracks how long it takes for an idea to become something customers can actually use. It measures the time from when work starts (e.g., the Jira ticket moves from “To Do”) until it’s deployed and delivering impact. The data already lives in your CI/CD system, Git logs, and ticketing tools — Jenkins, GitLab, and Jira know exactly how long every piece of code has been “growing mold in the cupboard.” Add AI and things get fun: you can have automated assistants summarize slowdowns (“this pull request sat unreviewed for 4 days because everyone was at offsite trivia night”) or recommend which repetitive steps to automate next. It’s the difference between celebrating how many pizzas you made and knowing whether anyone actually ordered them.
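If you want to see the arithmetic in one place, here’s a minimal sketch, assuming you’ve already exported start and deploy timestamps from Jira and your CI/CD pipeline; the field names and sample data are hypothetical:

```python
from datetime import datetime
from statistics import median

# Hypothetical export: when work started (Jira) and when it shipped (CI/CD).
work_items = [
    {"key": "APP-101", "started": "2024-03-01T09:00", "deployed": "2024-03-08T16:30"},
    {"key": "APP-102", "started": "2024-03-04T10:15", "deployed": "2024-03-05T11:00"},
    {"key": "APP-103", "started": "2024-03-02T13:00", "deployed": "2024-03-20T09:45"},
]

def cycle_time_days(item):
    started = datetime.fromisoformat(item["started"])
    deployed = datetime.fromisoformat(item["deployed"])
    return (deployed - started).total_seconds() / 86400  # seconds per day

med = median(cycle_time_days(i) for i in work_items)
print(f"median cycle time to value: {med:.1f} days")

# The outliers are where the interesting conversations live.
for item in work_items:
    days = cycle_time_days(item)
    if days > 2 * med:
        print(f"{item['key']} took {days:.1f} days; go find the bottleneck")
```

The median, not the mean, is the honest headline number here: one ticket that moldered for three weeks shouldn’t let the dashboard pretend everything ships in a day.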
Defect Density
If velocity is the illusion of progress, defect density is its sobering hangover. This metric counts bugs per unit of code, typically per thousand lines (KLOC) or per release, telling you whether AI is generating clean, maintainable logic or just optimistic nonsense at scale. The data hides in your issue tracker (like Jira), testing and scanning reports (SonarQube, Checkmarx, or Snyk), and post-release incidents. AI can automatically tag which commits included generated code and compare bug frequency between human-only and AI-assisted changes. If you see fewer bugs in AI code, congratulations, you’ve achieved the dream. If it’s higher, well, at least you have empirical proof that your “copilot” may be headed to the wrong airport.
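A back-of-the-napkin version is easy to sketch, assuming you can attribute bugs to the changes that introduced them and tag changes as AI-assisted (commit trailers or PR labels work); all the numbers below are made up for illustration:

```python
# Each entry is a change set: how big it was, whether AI helped write it,
# and how many bugs were later traced back to it. Hypothetical data.
changes = [
    {"id": "a1f", "ai_assisted": True,  "lines": 1200, "bugs": 4},
    {"id": "b2c", "ai_assisted": False, "lines": 900,  "bugs": 2},
    {"id": "c3d", "ai_assisted": True,  "lines": 3000, "bugs": 5},
    {"id": "d4e", "ai_assisted": False, "lines": 2100, "bugs": 6},
]

def density_per_kloc(subset):
    lines = sum(c["lines"] for c in subset)
    bugs = sum(c["bugs"] for c in subset)
    return 1000 * bugs / lines if lines else 0.0

ai = [c for c in changes if c["ai_assisted"]]
human = [c for c in changes if not c["ai_assisted"]]
print(f"AI-assisted: {density_per_kloc(ai):.2f} bugs/KLOC")
print(f"Human-only:  {density_per_kloc(human):.2f} bugs/KLOC")
```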
AI Rework Ratio
This one measures how much AI-generated code survives human review without needing to be rewritten — essentially, the “trust but verify” score of your automation strategy. You can calculate it by comparing the volume of AI-suggested code (captured in telemetry from tools like GitHub Copilot or Tabnine) against what ends up merged after review. The data lives inside your IDE analytics, Git diffs, and pull request comments. If your rework ratio is high, it means your developers are treating AI suggestions like dubious Tinder matches — entertaining, but not relationship material. Low rework ratios? That means your AI is finally writing code that humans don’t roll their eyes at.
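Here’s one way that comparison could look, assuming your telemetry reports how many lines each tool suggested and your diffs tell you how many survived to merge unchanged; the PR numbers and counts are invented:

```python
# Hypothetical per-PR telemetry: AI lines offered vs. AI lines that made it
# through review untouched.
pull_requests = [
    {"pr": 481, "ai_lines_suggested": 220, "ai_lines_surviving": 160},
    {"pr": 482, "ai_lines_suggested": 75,  "ai_lines_surviving": 70},
    {"pr": 483, "ai_lines_suggested": 510, "ai_lines_surviving": 230},
]

suggested = sum(p["ai_lines_suggested"] for p in pull_requests)
surviving = sum(p["ai_lines_surviving"] for p in pull_requests)
rework_ratio = 1 - surviving / suggested  # fraction rewritten or dropped in review

print(f"{suggested} AI lines suggested, {surviving} merged untouched")
print(f"rework ratio: {rework_ratio:.0%}")
```

Track the trend, not the absolute number: a ratio drifting down over quarters means developers are trusting the suggestions more, or reviewing them less, and it’s worth knowing which.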
Customer Outcome Score
At the end of the day, no metric matters more than customer satisfaction: are users happier, faster, safer, or less likely to rage-quit your app? This one pulls from analytics platforms (Google Analytics, Datadog), customer NPS surveys, and support tickets. AI can correlate release data with adoption spikes or churn dips, giving leadership something better than “we shipped 47% more features.” You can instead report “we shipped something people actually like.” When AI helps link your commits to customer smiles, you’ve escaped the Velocity Trap completely.
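Even a crude version is revealing. Here’s a sketch that compares active users just before and just after each release, assuming weekly snapshots from your analytics platform; the dates and numbers are made up:

```python
# Hypothetical weekly active-user snapshots and release dates.
weekly_active_users = {
    "2024-03-01": 10200, "2024-03-08": 10350, "2024-03-15": 11900,
    "2024-03-22": 11850, "2024-03-29": 11400,
}
releases = ["2024-03-08", "2024-03-22"]

dates = sorted(weekly_active_users)  # ISO dates sort chronologically
for release in releases:
    idx = dates.index(release)
    before = weekly_active_users[dates[idx - 1]]
    after = weekly_active_users[dates[idx + 1]]
    change = (after - before) / before
    print(f"release {release}: {change:+.1%} active users in the surrounding window")
```

Correlation isn’t causation, of course, and a real version would control for seasonality and marketing pushes; but even this crude before-and-after framing starts the right conversation.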
Conclusion
This is not meant to disparage Copilot, CodeWhisperer, or any of the other digital sidekicks now living rent-free in our IDEs. Many people love them and use them daily. They’re brilliant tools that save time, reduce friction, and occasionally (with enough prompting) produce elegant code. That said, if you measure the wrong thing, even the best AI will happily optimize you into a dead end. Granted, you’ll get to your dead end faster, cleaner, and with better syntax highlighting.
The real challenge isn’t adopting AI. The real challenge, as it has been since the dawn of software development (the 1970s, for you whippersnappers), is measuring outcomes that reflect customer value, software quality, and team well-being. AI is the amplifier, not the orchestra. It plays whatever tune you conduct. Define success around impact, and your Copilot (and the half-dozen other copilots) will help your teams build something that actually moves the business forward.