If you only have a minute, here's what you need to know.
- Agent harnesses, the infrastructure wrapping AI agents for production use, are becoming the defining architecture pattern of 2026. But most enterprises are building them wrong.
- Evidence from Vercel, Manus, and OpenAI's Codex shows that stripping tools and complexity consistently outperforms adding more scaffolding. Vercel cut 15 specialized tools down to 2 and saw accuracy jump from 80% to 100%.
- Microsoft's Agent Framework takes a disciplined approach: approval workflows, context compaction strategies, and dual-language support in Python and .NET, without the bloat that plagues custom harness implementations.
- The enterprises that will succeed with agent infrastructure aren't the ones with the most sophisticated orchestration layers. They're the ones with the fewest.
- If your agent harness has more abstraction layers than your agent has tools, you've already lost the plot.
The AI industry spent 2025 building agents. 2026 is the year we figure out how to control them.
A new term is circulating in enterprise architecture circles: agent harness. It refers to the infrastructure layer that sits between an AI model and the real world, managing tool access, approval workflows, context windows, error recovery, and state persistence. Think of the model as the engine and the harness as the car. Without the car, the engine is impressive but useless. Without a good car, the engine destroys itself.
The concept isn't new. Anyone running agents in production has been building some version of a harness for the past year. What's new is that the industry is now treating harness engineering as a first-class discipline, with dedicated frameworks, design patterns, and an emerging consensus that the harness, not the model, determines whether agents succeed or fail in production.
I agree with that premise. But I'm watching enterprises draw exactly the wrong conclusion from it.
The complexity trap
The natural instinct, once you recognize that the harness matters, is to build more of it. More abstraction layers. More specialized tools. More governance checkpoints. More orchestration logic. Enterprise architects see the agent harness and think: finally, something I can over-engineer.
The evidence points in the opposite direction.
Vercel's agent team ran an experiment that should be required reading for every enterprise architect. They had 15 specialized tools powering their AI coding agents. They removed 13 of them, keeping only two: bash execution and SQL queries. The result? Accuracy jumped from 80% to 100%. Token usage dropped 37%. Speed improved 3.5x. Fewer tools, dramatically better outcomes.
Manus, the autonomous agent framework, tells the same story. The team rebuilt their agent system four times. The greatest performance gains came not from adding capabilities but from removing complexity. They implemented filesystem-as-memory, aggressive context compaction (100:1 input-to-output ratio), and KV-cache optimization. The result was a 10x cost reduction through pure infrastructure simplification.
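Filesystem-as-memory is the easiest of those techniques to illustrate. Here's a minimal sketch in plain Python, not Manus's actual implementation: bulky intermediate results go to disk, and the agent's context holds only a one-line reference, so a 100 KB tool output costs the context window a few dozen tokens instead of tens of thousands.

```python
import tempfile
from pathlib import Path

class FileMemory:
    """Store bulky tool outputs on disk; keep only short references in context.
    Illustrative sketch of the filesystem-as-memory pattern."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self._counter = 0

    def store(self, content: str, label: str) -> str:
        """Persist content and return a one-line reference for the context."""
        self._counter += 1
        path = self.root / f"{self._counter:04d}-{label}.txt"
        path.write_text(content)
        # The agent's context only ever sees this reference, not the payload.
        return f"[stored {len(content)} chars at {path}]"

    def recall(self, reference: str) -> str:
        """Re-read the full content only when the agent actually needs it."""
        path = reference.rsplit(" at ", 1)[1].rstrip("]")
        return Path(path).read_text()

# Usage: a 100 KB tool result collapses to a short reference.
memory = FileMemory(tempfile.mkdtemp())
big_output = "x" * 100_000
ref = memory.store(big_output, "crawl-result")
assert len(ref) < 150
assert memory.recall(ref) == big_output
```

The compaction ratio falls out naturally: the context pays for the reference, the filesystem pays for the payload.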
OpenAI's internal Codex agents converged on identical principles independently. Minimal, general-purpose tools. External state persistence through git and files. Structured error retention. Strict context discipline.
Three separate teams, three different organizations, one conclusion: the best agent harness is the one with the least in it.
Why less works better
This isn't counterintuitive once you understand the mechanics.
Every tool you add to an agent's context window competes for the model's attention. I wrote about this in my piece on progressive disclosure for MCP servers: 400 tools can consume 400,000+ tokens, exceeding even the largest context windows. But the problem isn't just token count. It's decision fatigue. Models, like humans, make worse choices when presented with too many options.
Specialized tools also create routing problems. When you give an agent 15 ways to accomplish similar tasks, it spends reasoning cycles figuring out which tool to use instead of solving the actual problem. Strip it down to bash and a database connection, and the model focuses its reasoning on what matters: accomplishing the objective.
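To make the contrast concrete, here is what a stripped-down tool registry looks like. This is an illustrative sketch, not Vercel's actual code; the tool names and signatures are assumptions, using subprocess for shell execution and sqlite3 for queries.

```python
import sqlite3
import subprocess

# A deliberately minimal tool set: two general-purpose tools instead of
# fifteen specialized ones. Names and schemas are illustrative.
def run_bash(command: str, timeout: int = 30) -> str:
    """Execute a shell command and return combined stdout and stderr."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr

def run_sql(db_path: str, query: str) -> list[tuple]:
    """Run a SQL query against a SQLite database and return all rows."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query).fetchall()

# The entire tool schema handed to the model: two entries, not fifteen.
TOOLS = {
    "bash": {"fn": run_bash, "description": "Run a shell command."},
    "sql": {"fn": run_sql, "description": "Query a database."},
}
```

Everything a dedicated "list files" tool, "grep" tool, or "count rows" tool would have done is expressible through these two, which is exactly why the routing problem disappears.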
Richard Sutton's bitter lesson from machine learning applies directly here. General methods that use computation beat specialized methods that try to encode human knowledge about the domain. Complex scaffolding becomes obsolete as models improve. The harness should simplify with model upgrades, not accumulate complexity.
What Microsoft gets right
This is the lens through which I've been evaluating Microsoft's Agent Framework, specifically its recently published agent harness patterns.
The framework focuses on three building blocks: local shell execution with approval gates, hosted shell in managed environments, and context compaction. That's it. Three patterns, not thirty.
The approval workflow is particularly well-designed. In Python, you decorate a tool with @tool(approval_mode="always_require") and the framework handles the rest. In .NET, you wrap tools with ApprovalRequiredAIFunction. The pattern is explicit and minimal. There's no sprawling governance layer, just a clear gate at the point where the agent wants to do something irreversible.
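The shape of that gate can be sketched framework-free in a few lines. This is not the Agent Framework's implementation, just the pattern it encodes: a decorator that intercepts the call and demands an explicit yes from an approver before anything irreversible runs. All names here are illustrative.

```python
from functools import wraps
from typing import Callable

class ApprovalDenied(Exception):
    """Raised when an approver blocks a gated tool call."""

def require_approval(approver: Callable[[str, dict], bool]):
    """Gate a tool behind an explicit approval check. The approver sees the
    tool name and its arguments and returns True to allow the call."""
    def decorate(tool):
        @wraps(tool)
        def gated(**kwargs):
            if not approver(tool.__name__, kwargs):
                raise ApprovalDenied(f"{tool.__name__} blocked by approver")
            return tool(**kwargs)
        return gated
    return decorate

# Usage: a policy that auto-denies anything touching production.
def deny_production(name: str, args: dict) -> bool:
    return "prod" not in str(args.get("target", ""))

@require_approval(deny_production)
def delete_database(target: str) -> str:
    return f"deleted {target}"

assert delete_database(target="staging-db") == "deleted staging-db"
try:
    delete_database(target="prod-db")
except ApprovalDenied:
    pass  # the irreversible action never executed
```

The approver callback is the whole governance surface: swap in a human-in-the-loop prompt, a policy engine, or an allowlist without touching the tools themselves.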
Context compaction is where the real discipline shows. Long-running agent sessions inevitably exceed context windows. Microsoft's approach offers composable strategies: sliding window, tool result compaction, and truncation, combined through a pipeline. The .NET implementation chains ToolResultCompactionStrategy, SlidingWindowCompactionStrategy, and TruncationCompactionStrategy into a single PipelineCompactionStrategy. Configurable. Composable. Not over-abstracted.
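The composability is easier to see in code. A hedged sketch, not the Agent Framework API: if each strategy is a function from a message list to a message list, a pipeline is just their composition, which mirrors the .NET chaining described above.

```python
# Illustrative sketch of composable context compaction. A "message" is a
# (role, content) pair; each strategy maps a message list to a smaller one.
Message = tuple[str, str]

def sliding_window(max_messages: int):
    """Keep only the most recent messages."""
    def apply(msgs: list[Message]) -> list[Message]:
        return msgs[-max_messages:]
    return apply

def compact_tool_results(max_chars: int):
    """Truncate oversized tool outputs, leaving a marker of what was cut."""
    def apply(msgs: list[Message]) -> list[Message]:
        return [
            (role, content[:max_chars] + "...[compacted]")
            if role == "tool" and len(content) > max_chars
            else (role, content)
            for role, content in msgs
        ]
    return apply

def pipeline(*strategies):
    """Chain strategies; each one sees the previous one's output."""
    def apply(msgs: list[Message]) -> list[Message]:
        for strategy in strategies:
            msgs = strategy(msgs)
        return msgs
    return apply

# Compact tool results first, then apply the sliding window.
compact = pipeline(compact_tool_results(max_chars=200),
                   sliding_window(max_messages=50))
```

Adding a new strategy means writing one function, not subclassing an abstraction hierarchy, which is the discipline the framework's pipeline design rewards.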
And the dual-language support in Python and .NET matters more than it might seem. Enterprise teams aren't monolingual. The same harness patterns working identically in both ecosystems means your .NET backend team and your Python data science team can build agent infrastructure using shared concepts. That's practical enterprise architecture, not marketing.
The governance question
I can already hear the objection: "But we need governance. Compliance. Audit trails. We can't just give agents bash access and hope for the best."
Fair. But governance doesn't require complexity. The CNCF's framework for autonomous enterprise governance identifies four pillars: golden paths (pre-approved configurations), guardrails (policy enforcement), safety nets (automated recovery), and manual review gates. Notice what's missing: there's no pillar for "add seventeen orchestration layers."
The most effective governance I've seen in production agent systems follows a simple rule: intervene only when the model can't self-correct. That means approval gates for irreversible actions, sandboxing for execution environments, audit logging for everything, and nothing else. Every additional governance mechanism is a tax on agent performance that needs to justify its existence with a specific risk it mitigates.
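The "audit logging for everything" half of that rule is similarly small. A sketch under assumed names, not a prescribed implementation: a decorator that appends one JSON line per tool invocation, success or failure, to an append-only log.

```python
import json
import os
import tempfile
import time
from functools import wraps

def audited(log_path: str):
    """Append a JSON line for every tool invocation, including failures.
    Illustrative sketch of minimal audit logging."""
    def decorate(tool):
        @wraps(tool)
        def logged(**kwargs):
            entry = {"ts": time.time(), "tool": tool.__name__, "args": kwargs}
            try:
                result = tool(**kwargs)
                entry["ok"] = True
                return result
            except Exception as exc:
                entry["ok"] = False
                entry["error"] = repr(exc)
                raise
            finally:
                # One line per call; the log is the complete audit trail.
                with open(log_path, "a") as f:
                    f.write(json.dumps(entry) + "\n")
        return logged
    return decorate

# Usage: every call leaves a JSON line behind, whether it succeeds or not.
LOG = os.path.join(tempfile.mkdtemp(), "audit.jsonl")

@audited(LOG)
def greet(name: str) -> str:
    return f"hi {name}"

greet(name="ada")
```

Stack this under the approval gate and a sandboxed executor and you have all three mechanisms the rule calls for, in well under a hundred lines.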
What to do this week
If you're building or evaluating agent harness infrastructure, here's my recommendation:
Audit your tool count. If your agents have access to more than 5-7 tools, run the Vercel experiment. Strip down to the minimum general-purpose set and measure the difference. You may be surprised.
Adopt a framework, don't build from scratch. Microsoft's Agent Framework, LangGraph, or similar production-tested frameworks have already solved the foundational problems. Your engineering effort should go into your specific approval workflows and domain logic, not reinventing context management.
Measure harness complexity as a cost. Every abstraction layer, every custom tool, every governance checkpoint has a performance cost in tokens, latency, and error surface. Track it. If a layer doesn't measurably improve outcomes, remove it.
Design for deletion. As models improve, your harness should get simpler, not more complex. Build infrastructure that's easy to remove. The scaffolding you need today for GPT-4o may be unnecessary for whatever ships next quarter.
The enterprises that will win the agent infrastructure race aren't building the most sophisticated harnesses. They're building the most disciplined ones. And discipline, in this context, means knowing what to leave out.
References
- Microsoft Agent Framework. "Agent Harness in Agent Framework." March 12, 2026. devblogs.microsoft.com
- Pappas, E. "The Agent Harness Is the Architecture." DEV Community, 2026. dev.to
- HTEKDev. "Agent Harnesses: Why 2026 Isn't About More Agents." DEV Community, 2026. dev.to
- OpenAI. "Harness Engineering: Codex Agents." InfoQ, February 2026. infoq.com
- Gupta, A. "2025 Was Agents. 2026 Is Agent Harnesses." Medium, 2026. medium.com
- Kruczek, M. "Progressive Disclosure for MCP Servers." matthewkruczek.ai
- CNCF. "The Autonomous Enterprise and the Four Pillars of Platform Control." January 23, 2026. cncf.io
- Sutton, R. "The Bitter Lesson." 2019. incompleteideas.net
This article is part of "The Agent-First Enterprise" series exploring how organizations can transform their operations around AI agent capabilities. Connect with me on LinkedIn or Substack to discuss agent harness architecture and production AI infrastructure for your organization.
