A developer asked me yesterday what they should learn next in AI. They expected me to say "pick an agent framework." Microsoft Agent Framework, LangGraph, CrewAI. The hot stuff.
I told them to learn context window management instead.
They looked at me like I'd suggested they learn to type.
Executive Read
If you only have a minute, here's what you need to know.
- Context window management is the foundational skill for AI development. Before you touch an agent framework, you need to understand how to control what goes into a model's context, how to structure it, and how to recognize when it's degrading. This is the skill that separates developers who get real value from AI from those who don't.
- Every agent is just a context window with a job. When you create an agent in Microsoft Agent Framework, you're defining a context window. The instructions parameter is your system prompt. The tools you attach determine available actions. Quality depends on context design, not framework configuration.
- Skills are context engineering made portable. Agent Skills package instructions, constraints, and references into self-contained context bundles that solve the progressive disclosure problem: loading the right knowledge at the right time instead of cramming everything in.
- Agent framework problems are almost always context problems. Poor agent coordination, quality degradation over long runs, duplicated work. These are symptoms of poorly managed context windows, not framework limitations.
- The practical path: localhost before production. Master single-context interactions first. Package repeatable work into skills. Then wire them into Microsoft Agent Framework workflows. The orchestration is the last step, not the first.
The Sequencing Problem
Here's a number that should feel familiar: 70% of enterprise AI initiatives fail to move beyond pilot phases [1]. I've cited that statistic before in this series, and I keep coming back to it because the root cause keeps showing up in different forms. This time, the form is a sequencing problem.
Developers are jumping straight to multi-agent orchestration, the most complex thing you can build, without understanding the single most important constraint their agents operate under: the context window.
It's like trying to build distributed microservices when you've never gotten an app running on localhost. You can learn Kubernetes all you want, but if you can't get the app running on your own machine, the orchestration layer isn't going to save you.
Context windows are your localhost. Agent frameworks are your production deployment. One comes before the other, and skipping ahead just means you'll debug harder problems later with less understanding of why things broke.
Anthropic published research in February 2026 analyzing millions of human-agent interactions across Claude Code and their public API [2]. One finding stood out: experienced users grant agents more autonomy over time, but they also interrupt more frequently. That's not a contradiction. It means experienced users have developed an intuition for when the agent's context is working well and when it isn't. They've learned to read the context window, even when they can't see it directly.
That intuition is the skill I'm talking about. And you can't develop it by starting at the framework level.
What Context Window Management Actually Means
When I say "context window management," I don't mean knowing that Claude has 200K tokens or GPT-4o has 128K. That's trivia.
I mean understanding how to make every token count.
What goes in and what stays out. Your context window is not a dumping ground. Every piece of information you put in competes with every other piece for the model's attention. I've watched developers stuff entire codebases into context and wonder why the model gives vague answers. You wouldn't hand someone a 500-page binder and ask them a question about page 247 without at least a bookmark.
Structured context beats raw context. There's a big difference between pasting a file and providing structured agent instructions that tell the model what the project is, what conventions to follow, and where to find things. This is exactly what AGENTS.md files do in Claude Code, and what system prompts do in Microsoft Agent Framework. The model doesn't just need information. It needs information organized the way it can use it.
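To make "structured" concrete, here's a minimal sketch of what such an instructions file might contain. The project details are entirely hypothetical, invented for illustration, not drawn from any real repository:

```
# AGENTS.md (illustrative, hypothetical project)

## Project
Invoice-processing service: .NET 8 API plus Python ETL jobs.

## Conventions
- C# follows the repo .editorconfig; run `dotnet format` before committing.
- New endpoints require integration tests under tests/Api.

## Where things live
- API controllers: src/Api/Controllers
- ETL jobs: src/Etl
```

A few dozen lines like this often outperform pasting thousands of lines of raw source, because the model gets orientation instead of volume.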
Knowing when context is degrading. Long conversations drift. The model starts losing track of early instructions. Important details from 50 messages ago might as well not exist. Recognizing when this is happening, and knowing how to reset or restructure, is a practical skill that most developers have never thought about.
Understanding attention patterns. Not all tokens in the context window receive equal attention. Research on the "lost in the middle" phenomenon has demonstrated that information placed at the beginning and end of the context window has more influence than content buried in the middle [3]. In practice, this means the placement of your instructions matters as much as the content itself. Critical constraints belong at the top of your system prompt, not halfway down a wall of text. This is a design decision that most developers never consider, and it shows up directly in output quality.
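The placement principle can be sketched as a small prompt assembler: critical constraints first, bulk reference material in the low-attention middle, and the task restated last for recency. This is an illustrative sketch in plain Python; the function and section names are my own, not part of any framework:

```python
# Illustrative sketch: order prompt sections so high-attention positions
# (start and end) carry the critical content, per the "lost in the
# middle" finding. All names here are hypothetical.
def build_system_prompt(constraints, reference_docs, task):
    parts = ["CRITICAL CONSTRAINTS (always apply):"]
    parts.extend(f"- {c}" for c in constraints)      # top: high attention
    parts.append("\nREFERENCE MATERIAL:")
    parts.extend(reference_docs)                     # middle: bulk content
    parts.append("\nTASK (restated last for recency):")
    parts.append(task)                               # end: high attention
    return "\n".join(parts)

prompt = build_system_prompt(
    constraints=["Never log customer PII", "Output valid JSON only"],
    reference_docs=["...style guide excerpt...", "...API notes..."],
    task="Summarize the incident report below as JSON.",
)
```

The interesting design decision is restating the task at the end: the constraint block anchors the top, and the recap anchors the bottom, so neither gets buried.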
These principles apply at every level, from a single Copilot prompt to a multi-agent pipeline running on Azure [4].
The Localhost Analogy
Think about what good localhost development teaches you. You understand the full request lifecycle. You see every input, every transformation, every output. Nothing is hidden behind a load balancer or message queue. You can debug anything because you have full visibility into one running process. You learn what actually matters: which data the app needs, which it doesn't, how state flows through the system.
I've said before that your full-stack education doesn't stop at localhost [5]. You need to understand deployment too. The same applies here, but in reverse. You need to master localhost before production makes any sense.
Context window management is the same discipline applied to AI. Working with a single model in a single session forces you to understand what information the model actually needs, how to structure it so the model uses it effectively, when the model is losing the plot and needs a reset, and how to decompose a big task into pieces that fit within attention limits.
These aren't theoretical skills. They show up every time you interact with an LLM, whether it's a prompt in GitHub Copilot, a conversation in Claude, or an API call inside a Microsoft Agent Framework pipeline.
The parallel goes further. On localhost, you learn to profile performance, manage memory, and understand resource constraints before you think about horizontal scaling. Context window management is the profiling and memory management of AI development. You need to understand the resource constraints of a single model before you start distributing work across multiple agents.
Skills as Context Engineering
Skills are the mechanism that makes agent-first organizations practical [6]. But there's a more fundamental way to think about them: skills are context engineering made portable.
A skill is a self-contained context package. Instructions, examples, constraints, and reference documents bundled together so the model has exactly what it needs for a specific type of task. When I set up a skill for processing incident reports, it includes the triage criteria, the escalation workflow, the formatting standards, the compliance requirements. The model doesn't have to figure any of that out from scratch. It just executes with precision.
The key insight worth restating here is progressive disclosure. Research from late 2024 found that LLM decision-making degrades when presented with more than 20-25 tools simultaneously [7]. Skills solve this through progressive disclosure: metadata at startup (roughly 100 tokens per skill), full instructions when a matching request arrives, resource files only on demand. The knowledge a skill can contain is effectively unlimited while context consumption stays bounded.
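The three-tier loading model can be sketched in a few lines. This is a hypothetical structure for illustration, not the actual Agent Skills file format or loader:

```python
# Illustrative sketch of progressive disclosure: only lightweight
# metadata enters the context at startup; full instructions load when a
# request matches; resources load only on demand. Hypothetical names.
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    description: str                    # ~100 tokens, always in context
    instructions_path: str = ""         # loaded when the skill matches
    resources: dict = field(default_factory=dict)  # loaded on demand

def startup_context(skills):
    """Only name + description enter the context window at startup."""
    return "\n".join(f"{s.name}: {s.description}" for s in skills)

def activate(skill, load_file):
    """Full instructions enter context only for a matching request."""
    return load_file(skill.instructions_path)

skills = [
    Skill("incident-triage", "Classify and escalate incident reports."),
    Skill("release-notes", "Draft release notes from merged PRs."),
]
```

With ten skills registered, startup cost stays around a thousand tokens, while each skill's full instructions and reference files can be arbitrarily large.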
That's context window management in action. Not cramming everything in. Loading the right knowledge at the right time.
A well-designed skill is scoped: exactly the context needed for one type of work, nothing more. It's structured, with instructions and references organized so the model can find what it needs. It's self-contained, so the model doesn't need background from a previous conversation. And it's testable: you can evaluate whether the skill produces good output on its own, before plugging it into anything larger.
Hold those four properties in mind. They're about to become important.
How This Maps to Microsoft Agent Framework
Here's the thing nobody tells you about agent frameworks: every agent is just a context window with a job.
When you create an agent in Microsoft Agent Framework, you're defining a context window. The instructions parameter is your system prompt. The tools you attach determine what actions are available. The thread manages conversational state. The quality of that agent's output depends almost entirely on how well you've engineered that context.
Microsoft Agent Framework supports six core orchestration patterns [8]. Every single one of them succeeds or fails based on context design, not framework configuration. Let me connect the dots.
Sequential pipelines depend on context handoff. In a pipeline where a researcher agent passes output to a writer agent and then to a reviewer, the quality of the final output is determined by what each agent receives in its context. If your researcher dumps raw findings without structure, the writer has to spend its reasoning capacity organizing instead of writing. If you've practiced structuring context for a single model, you already know how to design clean handoffs between pipeline stages. The skill you built for structuring research output? That's your pipeline's handoff protocol.
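A handoff protocol like this can be as simple as a structured contract between stages. The sketch below is illustrative Python with hypothetical field names, not an Agent Framework API:

```python
# Illustrative sketch: the researcher stage structures its output before
# handing it to the writer stage, so the writer spends reasoning on
# writing rather than organizing. Field names are hypothetical.
import json

def researcher_handoff(raw_findings):
    """Convert raw research into a bounded, structured handoff payload."""
    return json.dumps({
        "summary": raw_findings["summary"],       # one-paragraph overview
        "key_facts": raw_findings["facts"][:5],   # capped, ranked facts
        "sources": raw_findings["sources"],       # kept for the reviewer
        "open_questions": raw_findings.get("gaps", []),
    }, indent=2)

handoff = researcher_handoff({
    "summary": "Context limits constrain multi-agent designs.",
    "facts": ["Attention favors prompt edges", "Tool overload degrades choice"],
    "sources": ["liu2023lost"],
})
```

Capping `key_facts` is the context-budgeting move: the writer gets the five strongest facts, not an unbounded dump.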
Parallel fan-out depends on context scoping. When you spin up multiple agents simultaneously, each one needs a tightly scoped context that covers its specific responsibility without bleeding into the others. This is exactly what skills teach you. A skill scoped to "security review" doesn't include instructions for "technical writing." The same principle applies when you create specialized agents in a parallel workflow. Remember those four properties of a well-designed skill? Scoped, structured, self-contained, testable. Those are the same four properties of a well-designed agent.
Routing depends on context classification. The orchestrator agent that decides which specialist handles a request is making a context-dependent decision. It needs enough information to classify the request, but not so much that it gets confused. If you've never thought about what minimum context a model needs to make a good decision, your router agent is going to misclassify requests. I've seen teams debug routing failures for days before realizing the issue was that the router's system prompt was overloaded and the classification criteria were getting lost in the noise.
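A minimal-context router prompt might look like this. The route names and criteria are invented for illustration; the point is what is left out, namely the specialists' full instructions:

```python
# Illustrative sketch: the router's prompt carries only short
# classification criteria, never the specialists' full instructions,
# so the classification signal isn't buried. Hypothetical route names.
ROUTES = {
    "security-review": "code changes touching auth, secrets, or crypto",
    "technical-writing": "docs, release notes, or README updates",
    "incident-triage": "outage reports or production alerts",
}

def router_prompt(request):
    criteria = "\n".join(f"- {name}: {desc}" for name, desc in ROUTES.items())
    return (
        "Pick exactly one route for the request below.\n"
        f"Routes:\n{criteria}\n"
        f"Request: {request}\n"
        "Answer with the route name only."
    )
```

Three one-line criteria are usually enough for a classification decision; every extra paragraph in the router's prompt is noise that competes with them.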
Hierarchical orchestration depends on context summarization. When a manager agent coordinates sub-agents and synthesizes their outputs, it needs to compress multiple context windows into a coherent result. This is the same skill as managing a long conversation: knowing what to keep, what to summarize, and what to discard. If you can't manage context in a single thread, you won't be able to manage the aggregation of multiple threads running through Azure OpenAI.
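The aggregation step can be sketched as a per-agent token budget applied before synthesis. This is illustrative: the truncation stands in for a real summarization call, and the 4-characters-per-token estimate is a rough assumption:

```python
# Illustrative sketch: a manager agent bounds each sub-agent's
# contribution before synthesizing, so the aggregate context stays
# coherent. Truncation here stands in for a real summarization call.
def compress(text, max_tokens=200):
    max_chars = max_tokens * 4          # rough chars-per-token estimate
    return text if len(text) <= max_chars else text[:max_chars] + " [truncated]"

def manager_context(sub_outputs, budget_per_agent=200):
    """Aggregate sub-agent results into one bounded synthesis context."""
    return "\n\n".join(
        f"## {agent}\n{compress(output, budget_per_agent)}"
        for agent, output in sub_outputs.items()
    )
```

In a production system you would summarize rather than truncate, but the budgeting discipline is the same: each thread gets a fixed share of the manager's attention.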
The problems developers bring to me about agent frameworks confirm this pattern. "My agents aren't coordinating well" means the context being passed between agents carries too much noise or too little signal. "Output quality degrades over long runs" means accumulated context is getting stale. "My agents keep duplicating work" means each agent's context doesn't include awareness of what others have done.
These are context problems wearing framework clothes.
The Practical Path
Here's what I'd recommend if you want to build effective AI agent systems on the Microsoft stack. This follows the crawl-walk-run approach [9].
Crawl: Master the single context window. Use GitHub Copilot in VS Code, Claude Code, or the Azure OpenAI API directly. Get good at writing system prompts that produce consistent results. Build AGENTS.md files for your projects. Pay attention to when the model is sharp versus when it's drifting. Learn to recognize the symptoms of bloated or poorly structured context. Experiment with prompt placement, token budgeting, and structured instructions. This phase is your localhost work. Don't skip it.
Walk: Build skills. Take repeatable tasks in your organization and package them. Clear instructions, structured inputs, expected outputs, reference documents. Test them in isolation. Refine them. Notice how the quality improves when you're deliberate about what goes into context versus when you're improvising. Microsoft has adopted Agent Skills in VS Code and GitHub Copilot [10], so this isn't abstract theory. It's the direction the tooling is heading.
Run: Connect skills into agent workflows. This is where Microsoft Agent Framework becomes powerful. But by now you understand what each agent needs in its context, how to design clean information handoffs, and how to debug when something goes wrong. You've done the localhost work. The framework is infrastructure. You already know what runs on it.
I built my own multi-agent systems this way. Not by starting with an orchestration framework and hoping for the best, but by spending months getting good at single-context interactions, packaging repeatable work into skills, and then wiring them together. The agent orchestration was the last step, not the first.
The Uncomfortable Question
If you can't make a single LLM interaction work well, if your prompts are vague, your context is unstructured, and you're relying on the model to figure out what you want, why do you think adding more agents will help?
More agents means more context windows to manage. More handoffs between them. More places for information to get lost or garbled. If you don't have the fundamentals, an agent framework just multiplies your problems.
Learn to build on localhost first. The production deployment will make a lot more sense afterward.
Matthew Kruczek is Managing Director at EY, leading Microsoft domain initiatives within Digital Engineering. Connect with Matthew on LinkedIn to discuss context engineering and agent architecture for your organization.
References
- Enterprise AI adoption studies, various sources including McKinsey and Gartner, 2024-2025
- Anthropic. "Measuring AI Agent Autonomy in Practice." February 2026. anthropic.com
- Liu et al. "Lost in the Middle: How Language Models Use Long Contexts." Stanford/UC Berkeley, 2023. arxiv.org
- Kruczek, M. "From Prompt to Performance: Context Engineering for Enterprise." matthewkruczek.ai, 2025
- Kruczek, M. "Expert Advice for Developers in the AI Era." matthewkruczek.ai, 2025
- Kruczek, M. "The Agent-First Enterprise: Why Skills Are the Missing Link." matthewkruczek.ai, 2025
- Anthropic. "Introducing Agent Skills." Open standard at agentskills.io, December 2025
- Kruczek, M. "Implementing Multi-Agent Systems in Microsoft Stack." matthewkruczek.ai, January 2026
- Kruczek, M. "The Agent-First Enterprise: How Do I Get Started?" matthewkruczek.ai, 2025
- Microsoft. "Agent Skills Support in VS Code and GitHub Copilot." 2025
