Last week, a CTO showed me his Azure OpenAI logs and pointed to something odd: "We're hitting rate limits, but we're not even sending that much data."
We dug in together. His team was sending product catalogs to GPT-4 for analysis, straightforward categorization and insights. But here's what we found: for every 100 products he sent, only about 60% of the content was actual information. The rest? Formatting overhead.
Curly braces. Quotation marks. Colons. Commas. Repeated field names. Over and over again.
That's when I showed him TOON.
The Efficiency Problem Nobody Talks About
When you send data to an AI model, every character matters. Not just for what you pay, but for how fast the AI can process it, how much context window you consume, and how quickly you hit rate limits.
And JSON, the format we all use by default, is incredibly verbose.
Think about it like this: imagine explaining something to a colleague, but you had to follow a strict format where you repeated their name before every single sentence. "Hey John, I need you to know this. Hey John, here's another thing. Hey John, one more point." You'd lose patience pretty quickly, right?
That's essentially what's happening with AI right now. We're making it read the same field names hundreds or thousands of times in a single request.
What TOON Actually Does
TOON (Token-Oriented Object Notation) strips away the repetition. It's not revolutionary, it's just applying common sense to a new problem.
Here's a real example. Let's say you're sending employee data to an AI for analysis:
The traditional way (JSON):
{
  "employees": [
    {"id": 1, "name": "Sarah", "department": "Sales", "salary": 75000},
    {"id": 2, "name": "Mike", "department": "Engineering", "salary": 95000},
    {"id": 3, "name": "Jessica", "department": "Sales", "salary": 72000}
  ]
}
The TOON way:
employees[3]{id,name,department,salary}:
  1,Sarah,Sales,75000
  2,Mike,Engineering,95000
  3,Jessica,Sales,72000
Same information. Way less noise.
The AI doesn't need all those braces and quotes to understand your data. TOON is a lossless serialization of the same objects, arrays, and primitives as JSON, just in a syntax that minimizes tokens. We've only been including that punctuation because that's how JSON works, and JSON was never designed for talking to AI.
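If you want a feel for how little machinery this takes, here's a minimal Python sketch of the tabular conversion shown above. It's an illustration, not the official TOON tooling, which also handles quoting, escaping, nested structures, and delimiter options:

```python
def to_toon_table(name, records):
    """Serialize a uniform list of dicts into TOON's tabular form.

    Simplified sketch: assumes every record has the same flat fields
    and that no value needs quoting or escaping.
    """
    fields = list(records[0])
    header = f"{name}[{len(records)}]{{{','.join(fields)}}}:"
    rows = ["  " + ",".join(str(rec[f]) for f in fields) for rec in records]
    return "\n".join([header] + rows)
```

The field names are declared once in the header, and each record collapses to a single comma-separated row.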
Does It Actually Work Better?
I'm naturally skeptical of "revolutionary new formats." Usually they're solving problems nobody has, or they create new issues in the process.
But the testing here tells an interesting story. Independent benchmarks of LLM comprehension across input formats, using 209 data-retrieval questions on four models, show TOON reaching 73.9% accuracy versus JSON's 69.7%, while using 39.6% fewer tokens.
That's the surprising part: the AI doesn't just tolerate the simpler format, it actually understands it better. Less clutter means clearer signal.
When This Actually Matters
TOON isn't for everything. Here's how I explain it to engineering teams:
TOON makes sense when:
- You're sending lists of similar records (products, users, transactions, events)
- You're bumping against context window limits
- You're hitting rate limits more often than you'd like
- Your data is mostly structured tables and lists, not deeply nested hierarchies
- You want faster response times from your AI
Stick with JSON when:
- Your data structure changes dramatically between records
- You have complex nested objects within objects within objects
- You're only making occasional AI calls
- Your systems and tools are deeply integrated around JSON
Think of TOON as a specialized format for a specific job. You wouldn't use a race car for grocery shopping, but you also wouldn't use a minivan on a Formula 1 track.
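Those rules of thumb can be mechanized. Here's a rough Python heuristic for deciding whether a batch of records is TOON-friendly; the thresholds and checks are illustrative assumptions, not part of any spec:

```python
def looks_toon_friendly(records, min_rows=3):
    """Rough heuristic: tabular TOON pays off when many records
    share the same flat set of fields. Thresholds are illustrative."""
    if len(records) < min_rows:
        return False
    keys = set(records[0])
    for rec in records:
        if set(rec) != keys:
            return False  # schema varies between records: stick with JSON
        if any(isinstance(v, (dict, list)) for v in rec.values()):
            return False  # nested values break the flat table: stick with JSON
    return True
```

Ragged schemas and nested values fall straight into the "stick with JSON" bucket from the list above.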
Real-World Impact
Back to that CTO I mentioned. After we implemented TOON for his product analysis pipeline:
- His token usage dropped by about 45%
- He stopped hitting rate limits during peak hours
- His team could send larger datasets in a single API call
- Response times improved because there was less data to transmit and parse
- The AI made fewer errors in data extraction
He didn't change his AI model. Didn't rewrite his application. Just changed the format he used when sending data to the AI.
Why This Matters Beyond Efficiency
We're still early in figuring out how to build AI systems that scale. TOON, though in its infancy, is getting a lot of attention from the developer community.
But here's the deeper insight: efficient AI systems aren't just about squeezing out performance gains. They're about being able to do things that weren't previously possible.
When you cut your token usage by 40-50%, you don't just get better performance. You can:
- Send more context in a single request
- Process larger datasets without breaking them into chunks
- Keep more conversation history in memory
- Build features that would have been impractical before
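A quick back-of-envelope calculation makes the point concrete. All numbers here are illustrative assumptions, not measurements:

```python
# How many records fit in the same context window after a ~40% token cut?
context_budget = 100_000        # tokens available for data (illustrative)
json_tokens_per_record = 50     # assumed average cost of one JSON record
toon_tokens_per_record = 30     # the same record at ~40% fewer tokens

json_capacity = context_budget // json_tokens_per_record
toon_capacity = context_budget // toon_tokens_per_record
print(f"JSON fits {json_capacity:,} records; TOON fits {toon_capacity:,}")
# → JSON fits 2,000 records; TOON fits 3,333
```

Same budget, same model, roughly two-thirds more records per request.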
One client was building a customer service agent but kept hitting the context limit when trying to include the customer's full interaction history. After switching to TOON for the historical data, they could include twice as much context. The AI's responses became noticeably more relevant because it had the full picture.
The Bigger Pattern
TOON is interesting not just as a format, but as an example of a pattern we're going to see more of: reimagining how we interact with AI systems.
For decades, we've been designing data formats for machine-to-machine communication. JSON, XML, protocol buffers—they're all optimized for computers parsing data efficiently.
But AI is different. Language models process information more like humans do—they're looking for meaning and structure, not just parsing syntax. TOON combines YAML's indentation-based structure for nested objects with a CSV-style tabular layout for uniform arrays, creating something that's both machine-readable and closer to how language models naturally process information.
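Based on that description, a record with both nested fields and a uniform array might look something like this in TOON (a sketch from my reading of the format; check the TOON spec for the exact syntax rules):

```
order:
  id: 9001
  customer: Sarah
  items[2]{sku,qty,price}:
    A1,2,19.99
    B7,1,5.49
```

The nesting reads like YAML, while the items array collapses into a CSV-style table with its field names declared once.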
This is just the beginning. We'll see more innovations that optimize for AI consumption rather than traditional parsing.
Getting Started
If you want to explore TOON:
- Pick one workflow — Don't try to convert everything at once. Find one place where you're sending repeated, structured data to AI.
- Measure your baseline — Track your current token usage and response times for a week. You need to know where you're starting from.
- Run a parallel test — Send the same data in both formats and compare. Look at token counts, response times, and output quality.
- Evaluate the impact — If you see meaningful improvements in the metrics that matter to your application, expand. If not, you learned something valuable about your specific use case.
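The parallel-test step can start as simply as comparing rough token counts for the same payload in both formats. This sketch uses a crude four-characters-per-token approximation; swap in your model's real tokenizer before trusting the numbers:

```python
def rough_tokens(text):
    """Crude stand-in for a real tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)

# The same three employee records from earlier, in both formats.
json_text = """{"employees": [
  {"id": 1, "name": "Sarah", "department": "Sales", "salary": 75000},
  {"id": 2, "name": "Mike", "department": "Engineering", "salary": 95000},
  {"id": 3, "name": "Jessica", "department": "Sales", "salary": 72000}
]}"""

toon_text = """employees[3]{id,name,department,salary}:
  1,Sarah,Sales,75000
  2,Mike,Engineering,95000
  3,Jessica,Sales,72000"""

print(rough_tokens(json_text), rough_tokens(toon_text))
```

Once the approximation shows a gap worth chasing, rerun the comparison with your model's own tokenizer and with real response-quality checks.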
For most enterprise applications dealing with structured data at scale, the improvements are significant. But don't take my word for it—test it with your own data and workflows.
The companies that will win with AI aren't just the ones using the best models. They're the ones who understand that how you talk to AI matters as much as what you're asking it to do.
Connect with me on LinkedIn to discuss practical AI optimization strategies, or reach out to our Microsoft Digital Engineering team at EY if you're looking to scale AI implementations more effectively.
