Tokenomics: Why Capping AI Token Spend Backfires

If you only have a minute, here is what you need to know.

When the AI bill jumps, the reflex is to cap it. That usually costs you more than it saves, because a cap measures what you spent, not what you got for it.
The math does not support it. A fully loaded developer costs about $83 an hour. An hour of heavy AI coding costs $10 to $30, even on standard API pricing. Capping the cheap input and paying the expensive worker to make up the difference is backwards.
Caps do not reduce AI use. They push it into personal accounts, off your security perimeter. In a 2025 Microsoft survey of UK employees, 28% of workers using unapproved AI said their company gave them no approved option.
A cap set on today's prices goes out of date fast. The cost of reaching the same AI capability is falling at a median rate of about 50x per year.
The fix is to manage tokens like a budget for results, not a ceiling on spend. Track cost per outcome, fund the work that makes each dollar go further, and use soft alerts instead of hard cutoffs.
This is a known discipline. The FinOps Foundation already formalized it for AI.

In my last piece, The Token Tax, I walked through a scene that is playing out at enterprises right now. A leader rolls out an AI coding tool to 200 developers. Three months later, finance flags a line item that has gone from $5,000 a month to $85,000. Nobody can explain it. Nobody was watching.

I argued then that the waste is real, that it comes from untrained developers, and that you should set budgets as a forcing function. I stand by that. But I have watched too many teams read "set a budget" as "set a hard cap," panic at the invoice, and swing straight past discipline into rationing.

The token tax is real. A spend cap is not the cure. It is a more expensive version of the same mistake: managing tokens without looking at what they produce.

The reflex that backfires

When the bill spikes, capping it feels like control. Finance sets a hard ceiling per team, the system starts blocking requests when a project hits its limit, and the scary line on the chart finally goes flat.

A flat line is not control. It only means you stopped measuring the one thing that mattered.

A cap only sees what you spent. It says nothing about what you got back. A team that spends its whole budget shipping forty features and a team that spends nothing and ships nothing both look compliant to a cap. One is the best investment you made all year. The other quietly went back to doing everything by hand. The cap cannot tell them apart.

It also hits the wrong people. The developer who burns the most tokens is often your best one, pointing AI at the hardest problems, where it saves the most human hours. A cap throttles that person exactly as hard as it throttles someone pasting a 2,000-line file into a chat window to ask what is wrong with it.

Now run the actual numbers. The median US software developer earns $133,080 a year, per the Bureau of Labor Statistics. Fully loaded with benefits, taxes, and overhead, that is roughly $173,000, or about $83 an hour. An hour of heavy AI coding, even on standard metered API pricing with a top model, runs somewhere between $10 and $30.

A fully loaded developer hour: about $83. An hour of heavy AI coding: $10 to $30 in tokens.

When a cap forces that developer to drop the AI and work by hand, you spend the $83 to save the $20.
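The hourly figure is easy to reproduce. Here is a back-of-the-envelope sketch. The BLS median is from the text; the 1.3x load factor and 2,080 paid hours a year are my assumptions, chosen because they reproduce the roughly $173,000 and $83 figures above. Swap in your own loaded cost if finance has one.

```python
# Back-of-the-envelope check of the hourly figures above.
# Assumptions (not BLS figures): a 1.3x load factor for benefits, taxes, and
# overhead, and 2,080 paid hours a year (40 hours x 52 weeks).

MEDIAN_SALARY = 133_080   # BLS median annual wage, software developers, May 2024
LOAD_FACTOR = 1.3         # assumed multiplier; reproduces the ~$173,000 figure
HOURS_PER_YEAR = 2_080    # assumed: 40 hours/week x 52 weeks


def loaded_hourly_rate(salary: float,
                       load_factor: float = LOAD_FACTOR,
                       hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Fully loaded cost of one developer hour."""
    return salary * load_factor / hours_per_year


def labor_to_token_ratio(token_cost_per_hour: float,
                         salary: float = MEDIAN_SALARY) -> float:
    """Loaded labor cost per hour divided by token cost per hour."""
    return loaded_hourly_rate(salary) / token_cost_per_hour


rate = loaded_hourly_rate(MEDIAN_SALARY)
print(f"Loaded annual cost: ${MEDIAN_SALARY * LOAD_FACTOR:,.0f}")
print(f"Loaded hourly rate: ${rate:.2f}")
for tokens in (10, 20, 30):
    print(f"${tokens}/hour in tokens vs ${rate:.2f}/hour in labor "
          f"({labor_to_token_ratio(tokens):.1f}x)")
```

Even at the top of the $10 to $30 range, an hour of labor costs several times the tokens that would have supported it.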

A GitHub study found that developers using GitHub Copilot finished a coding task 55% faster. So the cap does not just save tokens. It trades expensive human time to conserve a cheap input, thousands of times a month, automatically. No CFO would sign off on that trade if you wrote it on a whiteboard.
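To put a rough number on one blocked hour, here is a minimal sketch. It assumes "55% faster" means the AI-assisted task took 55% less time, so the same work by hand takes about 2.2x as long. That was one controlled task, not a law for every task, so treat `time_reduction` as a parameter to replace with your own data. The $83 rate and $20 token cost come from the figures above.

```python
# Sketch: the net cost when a cap pushes one AI-assisted hour of work back to hand.
# Assumption: "55% faster" is read as 55% less time, so by hand the same
# work takes 1 / (1 - 0.55), roughly 2.2 hours.

def cost_of_blocked_hour(loaded_rate: float, token_cost: float,
                         time_reduction: float = 0.55) -> float:
    """Extra labor cost minus the tokens the cap avoided. Positive means the cap lost money."""
    if not 0 <= time_reduction < 1:
        raise ValueError("time_reduction must be in [0, 1)")
    manual_hours = 1 / (1 - time_reduction)       # hours needed by hand for the same work
    extra_labor = (manual_hours - 1) * loaded_rate
    return extra_labor - token_cost


net = cost_of_blocked_hour(loaded_rate=83.0, token_cost=20.0)
print(f"Net cost of one blocked hour: ${net:.2f}")
```

With no speedup at all (`time_reduction=0`), the cap saves exactly the token cost. Any real speedup quickly flips the sign.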

The money you "save" just hides

Here is the part that should end the argument.

A cap does not stop people from using AI. It moves them. They open a personal ChatGPT or Claude account, paste in the same company code they were working on, and keep going. The work continues, just outside your logging, your data governance, and your security perimeter.

This is not hypothetical. A 2025 Microsoft survey of UK employees found that among workers using unapproved AI, 28% said their company gave them no approved option. Restriction does not curb shadow AI. It is one of the things that causes it. And the cost you pushed off the books comes back worse: IBM found that organizations with high levels of shadow AI saw about $670,000 more in breach costs than those with little or no shadow AI. You turned a visible, governable bill into an invisible liability.

One more problem: the price you capped is already falling. Epoch AI found that the cost of reaching the same AI capability has dropped at a median rate of about 50x per year, though the pace varies widely by task. A ceiling set this quarter is out of date the next. You are managing a moving target with a fixed number, and the number is always wrong.

Tokenomics is unit economics

If a cap is the wrong tool, what is the right one?

Stop treating tokens as a cost to cut and start treating them as money you spend to get output. The question is never "how do we spend less?" It is "what did each dollar produce, and how do we get more?"

You do not have to invent this. The FinOps Foundation, the same group that gave enterprises a way to manage cloud cost a decade ago, has made FinOps for AI an official discipline. It does not tell you to cap spend. It tells you to tie cost to business outcome. Its headline metric, Time to Achieve Business Value, compares the cost of doing the work with AI against the cost of doing it some other way, meaning people. The standard already says it: measure value, not spend.

Six levers of enterprise tokenomics: visibility first, cost per outcome, soft budgets, fund efficiency, right-size models and plans, and showback accountability

Six levers that govern spend without cutting developers off

Here is how to run tokenomics without rationing your team into the shadows.

1. Visibility before ceilings

You cannot manage what you cannot see, and almost nobody can see token spend by developer, by project, or by outcome. I wrote the full playbook for this in The Token Tax: deploy an open-source tool like Langfuse, LiteLLM, or Helicone, and within a week you have per-team cost attribution without buying a vendor platform. Start there, before you touch a single limit. You will usually find your top 5% of users drive 40 to 50% of spend, and the data tells you whether they are your best engineers or your most wasteful ones. That answer changes everything you do next.
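Once usage is logged, attribution is a group-by. This sketch assumes you can export per-request records with a user, a team, and a cost. The field names are hypothetical and do not match any particular tool's export schema, and the data is made up to show the calculation.

```python
import math
from collections import defaultdict

# Hypothetical per-request usage records, e.g. exported from an LLM gateway or
# tracing tool. These field names are illustrative, not any tool's real schema.
records = [{"user": "u00", "team": "platform", "cost_usd": 900.0}]
records += [
    {"user": f"u{i:02d}", "team": "platform" if i % 2 else "mobile", "cost_usd": 60.0}
    for i in range(1, 20)
]


def spend_by(rows, key):
    """Total cost grouped by any field: 'team', 'user', 'project', ..."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["cost_usd"]
    return dict(totals)


def top_user_share(rows, top_fraction=0.05):
    """Fraction of total spend driven by the heaviest `top_fraction` of users."""
    per_user = sorted(spend_by(rows, "user").values(), reverse=True)
    if not per_user:
        return 0.0
    n_top = max(1, math.ceil(len(per_user) * top_fraction))
    total = sum(per_user)
    return sum(per_user[:n_top]) / total if total else 0.0


print(spend_by(records, "team"))    # per-team attribution
print(f"Top 5% of users: {top_user_share(records):.0%} of spend")
```

The share alone does not say whether the heavy user is productive or wasteful. It tells you whose outcomes to look at first.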

2. Measure cost per outcome, not cost

Pick a unit that matters to the business: cost per merged pull request, per shipped feature, per closed ticket. Divide token spend by that unit. Now the number means something. A team whose cost per merged PR is falling is getting more efficient even if its total bill is rising, because it is shipping more. Total spend going up is not a problem. Cost per outcome going up is.
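Here is a minimal sketch of judging a team by unit cost instead of total spend, using merged PRs as the outcome. The monthly numbers are illustrative, not real data: total spend rises every month, yet each month is more efficient than the last.

```python
from dataclasses import dataclass


@dataclass
class Month:
    label: str
    token_spend_usd: float
    merged_prs: int

    @property
    def cost_per_pr(self) -> float:
        # Spend with nothing shipped is the worst possible unit cost.
        if self.merged_prs == 0:
            return float("inf")
        return self.token_spend_usd / self.merged_prs


def assess(history: list[Month]) -> list[str]:
    """Judge each month against the previous one by unit cost, not total spend."""
    verdicts = []
    for prev, cur in zip(history, history[1:]):
        if cur.cost_per_pr < prev.cost_per_pr:
            verdicts.append("more efficient")
        else:
            verdicts.append("investigate")
    return verdicts


# Illustrative numbers: spend rises, but merged PRs rise faster.
history = [Month("Jan", 4_000, 80), Month("Feb", 6_000, 150), Month("Mar", 7_500, 250)]
for m, verdict in zip(history[1:], assess(history)):
    print(f"{m.label}: ${m.token_spend_usd:,.0f} spent, "
          f"${m.cost_per_pr:.2f} per merged PR -> {verdict}")
```

A spend cap would have flagged February and March. The unit cost says both months were improvements.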

3. Soft budgets, not hard cutoffs

Budgets still matter. The difference is between a budget that creates awareness and a cap that creates a wall. Alert the team lead at 70 and 90 percent of the budget. Trigger a conversation, not a block. What you never do is cut off a developer in the middle of a task, because that is the exact moment they reach for a personal account. A budget should make people think. It should not make them stop.
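A soft budget differs from a cap in one place: what happens at the threshold. This is a minimal sketch, assuming thresholds of 70% and 90% as above. The class name and message text are hypothetical, and the point of the design is that `record` has no refusal path. Spending past 100% is logged, never blocked.

```python
from dataclasses import dataclass, field


@dataclass
class SoftBudget:
    """Monthly token budget that alerts at thresholds and never blocks a request."""
    monthly_usd: float
    thresholds: tuple[float, ...] = (0.70, 0.90)
    spent_usd: float = 0.0
    _fired: set = field(default_factory=set, init=False, repr=False)

    def record(self, cost_usd: float) -> list[str]:
        """Add spend and return any newly crossed alerts. There is no refusal path."""
        self.spent_usd += cost_usd
        alerts = []
        for t in self.thresholds:
            if t not in self._fired and self.spent_usd >= t * self.monthly_usd:
                self._fired.add(t)    # alert once per threshold per budget period
                alerts.append(f"{t:.0%} of ${self.monthly_usd:,.0f} budget used "
                              f"(${self.spent_usd:,.0f}): notify the team lead")
        return alerts


budget = SoftBudget(monthly_usd=1_000)
for cost in (500, 250, 300, 100):
    for alert in budget.record(cost):
        print(alert)
```

A production version would reset `spent_usd` and the fired thresholds each month and route alerts to chat or email rather than printing them.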

4. Fund efficiency so each dollar buys more

This is the lever caps make people forget. Instead of lowering the spend, raise the output per dollar. Progressive disclosure of tool definitions produced an 85 to 100x token reduction in my benchmarks while improving accuracy. A good agent rules file costs 3,000 tokens and saves 15,000 to 20,000 per session. The TOON pattern cut structured-data tokens by 85%. A cap lowers the ceiling on what your team can do. Efficiency raises the floor on what each dollar produces.
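Progressive disclosure is the least intuitive item on that list, so here is a minimal sketch of the idea. The agent sees only tool names and one-line summaries up front and loads a full definition only for the tool it picks. The catalog is synthetic, and the 4-characters-per-token estimate is a rough heuristic, not a real tokenizer. The ratio depends entirely on catalog size and schema verbosity, so this will not reproduce my benchmark numbers.

```python
import json


def make_schema(name: str, n_params: int = 12) -> dict:
    # Stand-in for a verbose tool definition; real schemas often carry long descriptions.
    return {
        "name": name,
        "parameters": {
            f"param_{i}": {"type": "string",
                           "description": f"Detailed description of parameter {i} of {name}."}
            for i in range(n_params)
        },
    }


# Hypothetical catalog of 20 tools, each with a one-line summary and a full schema.
TOOLS = {
    f"tool_{k:02d}": {"summary": f"One-line summary of tool {k}",
                      "schema": make_schema(f"tool_{k:02d}")}
    for k in range(20)
}


def estimate_tokens(text: str) -> int:
    """Rough heuristic of ~4 characters per token; a real tokenizer will differ."""
    return max(1, len(text) // 4)


def eager_context(tools: dict) -> str:
    """Load every full schema into every prompt."""
    return json.dumps({name: t["schema"] for name, t in tools.items()})


def index_context(tools: dict) -> str:
    """Progressive disclosure, step 1: names and one-line summaries only."""
    return "\n".join(f"{name}: {t['summary']}" for name, t in tools.items())


def load_tool(tools: dict, name: str) -> str:
    """Step 2: fetch one full schema only after the model picks that tool."""
    return json.dumps(tools[name]["schema"])


eager = estimate_tokens(eager_context(TOOLS))
progressive = estimate_tokens(index_context(TOOLS)) + estimate_tokens(load_tool(TOOLS, "tool_03"))
print(f"eager: ~{eager} tokens, progressive: ~{progressive} tokens "
      f"({eager / progressive:.1f}x reduction)")
```

The savings grow with every tool added to the catalog, because the eager approach pays for every schema on every request.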

5. Right-size the model and the plan

Not every task needs your most expensive model, and not every developer needs metered pricing. Route simple work to cheaper, faster models and save the frontier models for what needs them. Then match the pricing to the usage. A predictable, heavy daily user is often far cheaper on a flat-rate subscription than on per-token billing. I build production enterprise software on a $200 flat subscription for exactly that reason. Moving your power users from metered API access to flat plans can cut their effective cost sharply with no change in behavior.
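The plan decision is a break-even calculation. This sketch assumes $20 per heavy hour (the middle of the $10 to $30 range above), 21 working days a month, and a $200 flat plan. It also assumes the flat plan's usage limits actually cover the user's workload. Flat plans typically cap or throttle usage, so check the terms before moving anyone.

```python
WORKING_DAYS = 21  # assumed working days per month


def monthly_metered_cost(hours_per_day: float, cost_per_hour: float,
                         working_days: int = WORKING_DAYS) -> float:
    """Per-token billing, expressed as dollars per month for a steady user."""
    return hours_per_day * cost_per_hour * working_days


def break_even_hours(cost_per_hour: float, flat_monthly: float = 200.0) -> float:
    """Heavy-use hours per month above which the flat plan is cheaper."""
    return flat_monthly / cost_per_hour


def cheaper_plan(metered_monthly: float, flat_monthly: float = 200.0) -> str:
    """Pick the cheaper billing model for one user's expected monthly spend."""
    return "flat" if flat_monthly < metered_monthly else "metered"


metered = monthly_metered_cost(hours_per_day=3, cost_per_hour=20)
print(f"Metered: ${metered:,.0f}/month -> choose {cheaper_plan(metered)}")
print(f"Break-even: {break_even_hours(20):.0f} heavy hours per month")
```

Under these assumptions, anyone doing more than about ten heavy hours a month belongs on the flat plan. Light, occasional users are usually cheaper on metered access.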

6. Showback, not punishment

Show each team its own consumption and its own cost per outcome, and let the numbers create the pressure. When an outlier shows up, treat it as a coaching signal, not a violation. Maybe they are your most productive engineer. Maybe they never learned structured prompting and need an afternoon of training. The data tells you which. Cost awareness built into the culture beats cost ceilings imposed on it. Cloud taught us that ten years ago.
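A showback report can be this simple. The sketch below computes each team's cost per merged PR and marks anything above twice the median as a coaching signal. The team numbers are illustrative, and the 2x threshold is an arbitrary assumption to tune. Nothing in it blocks anyone.

```python
from statistics import median

# Illustrative monthly numbers, not real data.
teams = {
    "payments": {"spend_usd": 6_000, "merged_prs": 200},
    "mobile":   {"spend_usd": 3_000, "merged_prs": 120},
    "platform": {"spend_usd": 9_000, "merged_prs": 90},
    "data":     {"spend_usd": 2_000, "merged_prs": 80},
}


def showback(teams: dict, outlier_multiple: float = 2.0):
    """Cost per merged PR for every team, with outliers marked for coaching, not blocking."""
    # Teams with zero merged PRs are skipped here; in practice, list them separately.
    unit = {name: t["spend_usd"] / t["merged_prs"]
            for name, t in teams.items() if t["merged_prs"] > 0}
    typical = median(list(unit.values()))
    rows = []
    for name, cost in sorted(unit.items(), key=lambda kv: kv[1]):
        note = "coaching signal" if cost > outlier_multiple * typical else ""
        rows.append((name, cost, note))
    return typical, rows


typical, rows = showback(teams)
print(f"Median cost per merged PR: ${typical:.2f}")
for name, cost, note in rows:
    print(f"{name:<10} ${cost:>7.2f}  {note}")
```

The flag starts a conversation. The follow-up question is whether that team is doing harder work or needs training.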

What to do this week

Compute your cost per outcome. Take last month's token spend for one team and divide it by something real: merged PRs, shipped features, closed tickets. That single number moves the conversation off gross spend and onto value.

Pressure-test your caps. For any hard cap you run, ask one question: when it bites, what does the developer do instead? If the answer is "work by hand" or "use a personal account," you are spending $83-an-hour labor or six-figure breach risk to save twenty dollars of tokens. Convert those hard cutoffs to soft alerts.

Find your shadow AI. Ask your developers, anonymously, what tools they actually use. The gap between your approved list and their real behavior is the spend your cap relocated instead of removed.

Fund one efficiency investment. Pick your highest-cost project and ship one lever: a shared rules file, progressive context loading, or a model-routing layer. Measure cost per outcome before and after. The return shows up within a week, and it compounds in a way no cap ever will.

The cap will always be tempting, because it is fast and it makes the scary line go flat. But it governs the cheapest input in your business while ignoring the most expensive one, and it does its work by pushing your best people to do more by hand, or to do it somewhere you cannot see. The teams that win the next phase will not be the ones that spent the least on tokens. They will be the ones that got the most out of every one they spent.

Matthew Kruczek is Managing Director at EY, leading Microsoft domain initiatives within Digital Engineering. Connect with Matthew on LinkedIn to discuss token economics, AI cost governance, and developer enablement for your organization.

References

U.S. Bureau of Labor Statistics. "Software Developers, Occupational Outlook Handbook." Median annual wage $133,080 (May 2024). bls.gov
GitHub. "Research: Quantifying GitHub Copilot's Impact on Developer Productivity." Randomized controlled trial, 55% faster task completion. github.blog
Microsoft. UK workplace AI survey (Censuswide), October 2025. 28% of unapproved-AI users cite no approved alternative. ukstories.microsoft.com
IBM. "Cost of a Data Breach Report 2025." Shadow-AI breach premium about $670,000. ibm.com
FinOps Foundation. "FinOps for AI Overview." Official scope; Time to Achieve Business Value KPI; cost tied to business outcome. finops.org
Epoch AI. "LLM inference prices have fallen rapidly but unequally across tasks." Median 50x/year decline. epoch.ai
Kruczek, M. "The Token Tax: Why Untrained Developers Are Your Most Expensive AI Problem." matthewkruczek.ai
