Microsoft Just Made AI Agents an Engineering Discipline

Executive read

If you only have a minute, here's what you need to know.

Microsoft's 2026 Frontier Stack is a complete reference architecture for treating AI agents as production engineering artifacts, not prototypes.
The stack covers six layers: capabilities, workflows, behavioral constraints, intelligence, infrastructure, and monitoring. No layer is optional.
Agent 365, GA on May 1, 2026, gives IT teams a unified control plane to observe, govern, and secure agents across any framework, not just Microsoft's.
The central thesis, "you get exactly what you measure," means teams that skip evaluation risk shipping agents that feel right but perform worse and cost over 20% more to run.
Three mandates define production-grade agents: Build with open protocols (MCP, A2A, AG-UI). Measure with evals per component. Govern with identity and policy at fleet scale.

The stack nobody asked for that everybody needs

IDC projects 1.3 billion AI agents in production by 2028. Eighty percent of Fortune 500 companies are already using Microsoft's agent platform. And yet most enterprise teams I talk to are still building agents the way they built chatbots in 2023: string together an LLM, connect a few tools, run it a handful of times, and ship it when it "feels right."

That approach worked for demos. It does not work when an agent has access to your SAP connectors, your Dynamics instance, and your customers' data.

Microsoft's 2026 Frontier Stack is the first time a major platform vendor has published a complete, opinionated reference architecture for AI agent engineering. Not a product announcement. Not a feature list. An architecture that says: here is what a production agent system requires, layer by layer, and here is how the pieces connect.

I mapped the full stack into a single visual reference to make it easier to see the whole picture. This post is a companion to that diagram, not a rehash of it. If you want the details, open the image. What follows is my take on what this means for teams building agents today.

Build: the tooling layer finally has a spine

The first thing that stands out is how Microsoft organized capabilities into three distinct layers: tools, workflows, and behavioral constraints. That separation matters more than the individual products.

Tools are the atomic actions an agent can take. Logic Apps with 1,400+ enterprise connectors. A cloud-hosted MCP server at mcp.ai.azure.com. Computer Use for screen-level interaction (GA in .NET, preview in Python and TypeScript). Code Interpreter and Azure Functions for compute. Bing Grounding for real-time web data.

Workflows sit on top. The Microsoft Agent Framework merges Semantic Kernel and AutoGen into a unified SDK. It supports MCP, A2A, and AG-UI protocols natively. Hosted Agents provide a managed runtime with zero infrastructure ops. Multi-agent workflows get both a visual designer and a code API.
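For a sense of what native MCP support means on the wire, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool. The `tools/call` method and the `name`/`arguments` parameters follow the MCP specification; the tool name `lookup_order` and its argument are hypothetical, and nothing here is sent to a real server.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so responses can be matched to requests.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Hypothetical tool and argument, for illustration only.
    print(build_tool_call("lookup_order", {"order_id": "A-100"}))
```

The point of an open protocol is that this message looks the same regardless of which framework produced it or which server receives it.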

Behavioral constraints are the layer most teams skip. AGENTS.md files define role boundaries and constraints. Copilot Tuning lets you fine-tune without data scientists. Zero Trust for AI enforces behavioral policy at the platform level. This is where Microsoft makes a clear statement: agents need guardrails baked in, not bolted on.

The separation into three layers is the architecture decision. You can swap out tools without rewriting workflows. You can change orchestration patterns without touching behavioral policies. That's the difference between a product catalog and a reference architecture.
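To make the separation concrete, here is a minimal sketch in plain Python. It is not the Microsoft Agent Framework API; every name in it (`TOOLS`, `Policy`, `Step`, `run_workflow`, and the two example tools) is hypothetical. It only shows the shape of the idea: tools, workflow steps, and behavioral policy live in separate objects, so each can be swapped without editing the others.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# Layer 1: tools are atomic actions, looked up by name.
Tool = Callable[[str], str]


def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"


def issue_refund(order_id: str) -> str:
    return f"refund issued for order {order_id}"


TOOLS: Dict[str, Tool] = {"lookup_order": lookup_order, "issue_refund": issue_refund}


# Layer 2: a workflow is an ordered list of steps that name tools, nothing more.
@dataclass(frozen=True)
class Step:
    tool: str
    argument: str


# Layer 3: behavioral constraints are data, kept apart from tools and workflows.
@dataclass(frozen=True)
class Policy:
    role: str
    allowed_tools: FrozenSet[str]

    def permits(self, tool_name: str) -> bool:
        return tool_name in self.allowed_tools


def run_workflow(steps: List[Step], tools: Dict[str, Tool], policy: Policy) -> List[str]:
    """Run each step, consulting the policy before every tool call."""
    results = []
    for step in steps:
        if not policy.permits(step.tool):
            results.append(f"blocked: {step.tool} not permitted for role {policy.role}")
            continue
        results.append(tools[step.tool](step.argument))
    return results


if __name__ == "__main__":
    readonly = Policy(role="support-readonly", allowed_tools=frozenset({"lookup_order"}))
    steps = [Step("lookup_order", "A-100"), Step("issue_refund", "A-100")]
    for line in run_workflow(steps, TOOLS, readonly):
        print(line)
```

Passing a different `Policy` changes what the same workflow is allowed to do, and passing a different `tools` dictionary changes how a step is carried out, with no edits to `run_workflow` in either case.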

Measure: the eval gate that most teams will ignore

Here is where the stack gets uncomfortable. Microsoft's own AGENTS.md research found that auto-generated context files reduced task success rates while raising inference costs by more than 20%. The files that were supposed to help agents perform better made them slower and less accurate.

The lesson is direct: every tool, every skill, every context file is an engineering decision. Not a default. Not something you add because the template included it. Simpler configurations with rigorous evaluation consistently beat elaborate setups that were never tested against real work.

The Frontier Stack operationalizes this with the Foundry Evaluation SDK. The intended workflow: add a new component, define evaluations specific to that component, then measure five metrics: task success, inference cost, safety score, latency, and context quality. If the metrics don't improve, you revert. You don't ship on instinct.
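The keep-or-revert decision can be sketched in a few lines of plain Python. This does not use the Foundry Evaluation SDK; `EvalResult`, `regressions`, `improvements`, `decide`, and the 2% tolerance are all assumptions made for illustration. The rule encoded here is one reasonable reading of "if the metrics don't improve, you revert": keep a component only if nothing regresses beyond the tolerance and at least one metric gets better.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvalResult:
    task_success: float      # fraction of tasks completed, higher is better
    cost_per_run_usd: float  # inference cost per run, lower is better
    safety_score: float      # higher is better
    latency_s: float         # seconds per run, lower is better
    context_quality: float   # higher is better


HIGHER_IS_BETTER = ("task_success", "safety_score", "context_quality")
LOWER_IS_BETTER = ("cost_per_run_usd", "latency_s")


def regressions(baseline: EvalResult, candidate: EvalResult, tolerance: float) -> List[str]:
    """Metrics where the candidate is worse than baseline by more than `tolerance` (relative)."""
    worse = []
    for name in HIGHER_IS_BETTER:
        if getattr(candidate, name) < getattr(baseline, name) * (1 - tolerance):
            worse.append(name)
    for name in LOWER_IS_BETTER:
        if getattr(candidate, name) > getattr(baseline, name) * (1 + tolerance):
            worse.append(name)
    return worse


def improvements(baseline: EvalResult, candidate: EvalResult) -> List[str]:
    """Metrics where the candidate is strictly better than baseline."""
    better = [n for n in HIGHER_IS_BETTER if getattr(candidate, n) > getattr(baseline, n)]
    better += [n for n in LOWER_IS_BETTER if getattr(candidate, n) < getattr(baseline, n)]
    return better


def decide(baseline: EvalResult, candidate: EvalResult, tolerance: float = 0.02) -> str:
    """Return "keep" only if nothing regressed and something improved, else "revert"."""
    if regressions(baseline, candidate, tolerance):
        return "revert"
    if not improvements(baseline, candidate):
        return "revert"
    return "keep"


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the source.
    baseline = EvalResult(0.80, 0.10, 0.95, 4.0, 0.70)
    with_context_file = EvalResult(0.76, 0.123, 0.95, 4.6, 0.70)
    print(decide(baseline, with_context_file), regressions(baseline, with_context_file, 0.02))
```

Defaulting to "revert" when nothing measurably improves is the design choice that matters: the burden of proof sits with the new component, not with the baseline.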

Alongside evals, the monitoring layer includes OpenTelemetry-native distributed tracing through Azure Monitor, continuous red teaming built into Foundry, and Copilot Metrics for usage analytics. This is not optional instrumentation. It's the feedback loop that separates a prototype from a production system.

"You get exactly what you measure." That's the thesis of the entire architecture. If you're not measuring, you're guessing. Most teams are guessing.

Govern: Agent 365 and the fleet problem

Agent 365 goes GA on May 1, 2026, at $15 per user per month. It's a unified control plane that gives IT, security, and business teams visibility across every agent in the organization, whether those agents are built on Foundry, Copilot Studio, AutoGen, LangGraph, or third-party platforms.

Four capabilities: Observe behavior and risk signals across all agents. Govern with policy enforcement and RBAC through Microsoft Entra. Secure with Defender, Purview, and Zero Trust for AI. And maintain a registry, because Microsoft is already tracking over 500,000 agents internally.

The fleet problem is something most organizations haven't confronted yet. When you have five agents, you can manage them manually. When you have fifty, you need tooling. When you have five hundred, you need a control plane. Agent 365 is Microsoft betting that the fleet scenario arrives faster than most IT teams expect.

Underneath the control plane, Entra Agent Identity gives each agent its own enterprise identity. Purview labels are enforced at query time, not as an afterthought. Azure AI Content Safety provides input/output guardrails. This is the governance layer that compliance and security teams will care about, and it's the layer that most agent prototypes are missing entirely.
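What "enforced at query time" looks like in principle can be shown with a small sketch. This is not the Purview or Entra API; the label names, `AgentIdentity`, `Document`, and `query` are hypothetical. The idea it illustrates is that the agent's identity carries a clearance, and every query checks each document's label against it before anything is returned.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sensitivity labels, ordered from least to most restricted.
LABEL_RANK = {"public": 0, "internal": 1, "confidential": 2}


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    max_label: str  # highest label this agent is cleared to read


@dataclass(frozen=True)
class Document:
    title: str
    label: str
    text: str


def query(agent: AgentIdentity, documents: List[Document], term: str) -> List[str]:
    """Return titles of matching documents, checking labels on every call."""
    ceiling = LABEL_RANK[agent.max_label]
    return [
        doc.title
        for doc in documents
        if LABEL_RANK[doc.label] <= ceiling and term.lower() in doc.text.lower()
    ]


if __name__ == "__main__":
    docs = [
        Document("Pricing FAQ", "public", "Pricing tiers for all plans"),
        Document("Pricing strategy", "confidential", "Pricing margins by region"),
    ]
    helpdesk = AgentIdentity("helpdesk-bot", max_label="internal")
    print(query(helpdesk, docs, "pricing"))
```

Because the check runs inside the query path, an agent with a lower clearance never sees the restricted document at all, rather than relying on a downstream filter to catch it.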

What this means for your team

The three mandates at the bottom of the Frontier Stack are the simplest summary of what changed: Build with the Agent Framework, Foundry, and open protocols. Measure with evals, tracing, and red teaming. Govern with Agent 365, Purview, and Zero Trust.

If your current agent work only covers the Build column, you're building demos. Production requires all three.

Here's what I'd do this week:

Audit your eval coverage. Pick your most important agent workflow. Can you measure its task success rate? Its inference cost per run? Its safety score? If you can't answer those questions with numbers, you're shipping on instinct. Start with the Foundry Evaluation SDK or build your own, but start.
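If you go the build-your-own route, the first version can be very small. The sketch below assumes you already log one record per agent run; `RunRecord`, `summarize`, the per-1,000-token prices, and the definition of safety score as the share of runs not flagged unsafe are all assumptions for illustration, not part of any Microsoft SDK.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class RunRecord:
    succeeded: bool       # did the run complete its task?
    input_tokens: int
    output_tokens: int
    flagged_unsafe: bool  # did a safety check flag this run?


def summarize(
    runs: List[RunRecord],
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> Dict[str, float]:
    """Turn raw run logs into the three numbers the audit asks for."""
    if not runs:
        raise ValueError("no runs to summarize")
    costs = [
        r.input_tokens / 1000 * input_price_per_1k
        + r.output_tokens / 1000 * output_price_per_1k
        for r in runs
    ]
    return {
        "task_success_rate": sum(r.succeeded for r in runs) / len(runs),
        "cost_per_run_usd": mean(costs),
        # Assumed definition: share of runs that were not flagged unsafe.
        "safety_score": 1 - sum(r.flagged_unsafe for r in runs) / len(runs),
    }


if __name__ == "__main__":
    # Hypothetical log entries and prices.
    runs = [
        RunRecord(True, 2000, 500, False),
        RunRecord(True, 1000, 500, False),
        RunRecord(False, 3000, 1000, True),
        RunRecord(True, 2000, 1000, False),
    ]
    print(summarize(runs, input_price_per_1k=0.005, output_price_per_1k=0.015))
```

Once these three numbers exist for one workflow, they become the baseline that every later change is compared against.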

Separate your layers. Look at how your agents are built today. Are tool definitions, orchestration logic, and behavioral constraints tangled together in one codebase? Separating them makes each layer independently testable and independently upgradable. The Frontier Stack's three-layer model is a good target state.

Plan for the fleet. If you have agents in production, start documenting them in a registry. If you're on Microsoft 365, evaluate Agent 365 when it goes GA on May 1. If you're not, build the equivalent: a catalog of what agents exist, what they can access, and who is responsible for them. The fleet problem will arrive before you're ready for it.
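A homegrown catalog does not need to be elaborate to be useful. The sketch below is a hypothetical in-memory registry, not Agent 365; `AgentRecord`, `AgentRegistry`, and the example agents and resources are invented for illustration. It records the three things named above: what agents exist, what they can access, and who is responsible for them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AgentRecord:
    name: str
    owner: str                  # who is responsible for this agent
    framework: str              # what it is built on
    resources: Tuple[str, ...]  # systems and data it can access


class AgentRegistry:
    def __init__(self) -> None:
        self._agents: Dict[str, AgentRecord] = {}

    def register(self, record: AgentRecord) -> None:
        """Add an agent; refuse duplicates and agents with no owner."""
        if not record.owner:
            raise ValueError(f"agent {record.name!r} has no owner")
        if record.name in self._agents:
            raise ValueError(f"agent {record.name!r} is already registered")
        self._agents[record.name] = record

    def with_access_to(self, resource: str) -> List[str]:
        """Names of all agents that can reach a given resource."""
        return sorted(a.name for a in self._agents.values() if resource in a.resources)

    def owned_by(self, owner: str) -> List[str]:
        """Names of all agents a given person or team is responsible for."""
        return sorted(a.name for a in self._agents.values() if a.owner == owner)


if __name__ == "__main__":
    registry = AgentRegistry()
    registry.register(AgentRecord("invoice-bot", "finance-team", "LangGraph", ("SAP", "email")))
    registry.register(AgentRecord("helpdesk-bot", "it-team", "Copilot Studio", ("Dynamics",)))
    print(registry.with_access_to("SAP"))
```

The two lookup methods answer the questions that come up first in an incident: which agents can touch this system, and who do I call about them.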

Microsoft declared 2026 the Year of the Agent. The Frontier Stack is the architecture that backs up that claim. It's not perfect, and not every organization will use every layer. But it's the most complete reference architecture for agent engineering that any major vendor has shipped to date.

The standard is here. The question is whether your team is building to it.

References

IDC. "Worldwide AI Agent Forecast." 2025. Projecting 1.3 billion agents by 2028.
Microsoft. "AI Agents in Azure AI Foundry." 2026. learn.microsoft.com
Microsoft. "Agent 365 Overview." 2026. learn.microsoft.com
Microsoft. "AGENTS.md Specification." 2026. github.com
Microsoft. "Microsoft Agent Framework." 2026. github.com
Kruczek, M. "The Agent-First Enterprise: Why Skills Are the Missing Link." matthewkruczek.ai

This article is part of "The Agent-First Enterprise" series exploring how organizations can build production-grade AI agent systems. Connect with me on LinkedIn or Substack to discuss agent architecture and engineering practices for your organization.

Matthew Kruczek

Managing Director at EY

Matthew leads EY's Microsoft domain within Digital Engineering, overseeing enterprise-scale AI and cloud-native software initiatives. A member of Microsoft's Inner Circle and Pluralsight author with 18 courses reaching 17M+ learners.
