If you only have a minute, here's what you need to know.
- Your developers got faster. Your company did not. The hard part used to be writing code. Now it is checking and combining all the code the agents produce, and most org charts are still built for the old hard part.
- I would not start with an org chart. I would start with the smallest team that can ship something on its own: four people working with a large, flexible set of agents.
- The scarce skill is no longer writing code. It is being clear about what to build, directing the agents, and judging whether what comes back is actually right.
- Checking the work becomes a real job with its own headcount and its own number to hit, not a quick review at the end. AI-coauthored pull requests carry roughly 1.7x more review findings than human-only ones.
- You grow this org by adding small teams, not management layers. The pyramid gets flatter, the shared platform underneath gets deeper, and you start counting cost per result instead of cost per engineer.
- The career ladder and the performance reviews break first. If you change what people do but still grade them the old way, the new structure quietly slides back to the old one.
The problem hiding in every engineering org right now
AI coding tools make individual developers faster. They do not make the company faster. Developers ship more code, more quickly, and the business as a whole barely moves. That gap is the most important fact in engineering today, and almost no one has redesigned their teams around it.
By 2025, 84% of developers were using or planning to use AI coding tools, and about half were using them every day. People finish tasks quicker, pull request cycle time has dropped by roughly a third in production studies, and developers save a few hours a week. Every small measure looks great. The whole system stays flat.
The reason is simple once you say it plainly. The hard part moved, and the org chart did not move with it.
For thirty years we built engineering organizations on one belief: writing correct code is the slow, expensive step. Team sizes, manager ratios, review steps, hiring, the career ladder, all of it was tuned to get more good code out of people. That belief is now wrong. Code is cheap and there is plenty of it. What is scarce is deciding what to build, saying it clearly, and confirming the result is right. When you flood a structure built to ration code with an endless supply of cheap code, you get exactly what the numbers show: fast in the small, stuck in the large.
Developers using AI merge far more pull requests. Company-level delivery metrics show no matching gain. And AI-coauthored pull requests average about 1.7x more review findings than human-only ones.
— Faros AI, The AI Productivity Paradox (2025); CodeRabbit, AI vs Human Code Generation (2025)
If I were building an engineering organization from scratch today, knowing the people in it would work with agents rather than code by hand, I would not redecorate the old structure. I would build a different shape. Here is the shape.
Start with the small team, not the org chart
Most leaders design from the top down: how many VPs, how many directors, who reports to whom. I would design from the bottom up, starting with the smallest team that can ship value on its own. Get that team right and the rest of the org is just copies of it arranged sensibly.
That team is a pod: four people working with a large, flexible set of agents.
Anatomy of an agentic pod: four people, a flexible set of agents, one result.
Four people, and only four, because the limit is no longer how many hands are on keyboards. It is how many streams of work a small group can direct and check before quality slips. In the old model, adding a fifth and sixth person added output. Here it mostly adds coordination. The agents are the output. The people are the judgment.
The roles inside the pod are not the roles you hire for today, and the difference is real.
The Pod Lead directs the work. They take the request, break it into pieces the agents can work on at the same time, and own the result. They are not the best coder. They are the best at turning a vague ask into a clear plan and keeping the whole picture in their head while a dozen agents work under them. In the old org this person was a tech lead who still wrote a third of the code. Here they write almost none and are responsible for all of it.
Two engineers do what used to be called development. The work is now design, writing clear specs, directing agents, and combining the results. Each one runs several agents at once on different pieces and stitches the output into something that holds together. The skill that matters is system design and knowing the difference between an answer that looks right and one that is right. That second skill is the whole job, and it is the one the tools cannot do for you.
One quality engineer owns checking the work as a real job, not a step at the end. They build and maintain the tests and automatic checks that every piece of agent output has to pass before it counts as done. More and more shipped code is machine-written, and that code carries more defects: the share of code reverted or reworked soon after it lands has roughly doubled since 2021. This is the role that keeps speed from turning into a mess. Most companies do not have this role today. They have an underfunded QA team and a code review step everyone rushes through. In an agentic org, checking the work is not a checkbox. It is a person with a number to hit.
The four roles, what they own, and what they are measured on.
The pod works from clear, testable specs, not long how-to documents. The request comes in as a plain statement of what to build and how to prove it works. People and agents use the same document. This is the real change underneath everything else: the thinking moves from doing the work to defining the work, from how to do it to what "done" means.
How a feature actually moves through the pod
Structure is abstract until you watch work move through it, so follow one feature.
A request comes in: the catalog team needs to support product bundles that share inventory. In the old org, an engineer picks up the ticket, writes the code over a few days, opens a pull request, waits for review, fixes the comments, and merges. The critical path is one person writing one thing.
In the pod, the Pod Lead first turns the request into a spec: the data change, the API, the edge cases, and most importantly the acceptance tests that do not exist yet. That spec is the unit of work. The two engineers then split it up. One directs agents on the database and the migration while the other has agents drafting the API and the client changes, all at the same time. What used to happen in order now happens at once, because running another agent costs almost nothing.
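One way to picture a spec that people and agents can both read is as a small data structure: a statement of what to build plus the acceptance criteria that define "done". This is a minimal sketch, not a format the pod model prescribes. The `Spec` and `AcceptanceCriterion` names, their fields, and the two example criteria are hypothetical and invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AcceptanceCriterion:
    """One provable statement of what 'done' means (hypothetical shape)."""
    description: str
    passed: bool = False


@dataclass
class Spec:
    """A plain statement of what to build and how to prove it works."""
    goal: str
    criteria: list[AcceptanceCriterion] = field(default_factory=list)

    def is_done(self) -> bool:
        # A spec with no criteria is never done: "done" has to be provable.
        return bool(self.criteria) and all(c.passed for c in self.criteria)


# Invented example criteria for the bundle request; real ones would come
# from the Pod Lead and the quality engineer.
spec = Spec(
    goal="Support product bundles that share inventory",
    criteria=[
        AcceptanceCriterion("Selling a bundle decrements shared stock once"),
        AcceptanceCriterion("A bundle cannot be sold when stock is zero"),
    ],
)
```

The design choice worth noticing is that `is_done()` depends only on the criteria, never on how much code was produced.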
Then the work hits the gate. The quality engineer has already turned the acceptance tests into automatic checks, so agent output is tested against real rules and the quality bar before a person spends any attention on it. This is the part that makes the model work: agents produce a lot in parallel, the checking is automatic and constant, and people spend their limited attention only on the things that pass the machine and still need a human call.
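The gate described above can be sketched in a few lines: every piece of agent output runs through automatic checks first, and only what passes is queued for a person. This is an illustration under stated assumptions, not the harness I use. The `run_gate` function, the `GateResult` type, and the two toy checks are hypothetical stand-ins for real test suites and linters.

```python
from dataclasses import dataclass
from typing import Callable

# A check takes a piece of agent output and returns True if it passes.
Check = Callable[[str], bool]


@dataclass
class GateResult:
    output: str
    failed_checks: list[str]

    @property
    def needs_human(self) -> bool:
        # Only output that passes every automatic check reaches a person.
        return not self.failed_checks


def run_gate(outputs: list[str], checks: dict[str, Check]) -> list[GateResult]:
    """Run every automatic check against every piece of agent output."""
    results = []
    for output in outputs:
        failed = [name for name, check in checks.items() if not check(output)]
        results.append(GateResult(output, failed))
    return results


# Hypothetical toy checks standing in for real acceptance tests.
checks = {
    "non_empty": lambda out: bool(out.strip()),
    "no_todo_left": lambda out: "TODO" not in out,
}

results = run_gate(["def reserve(): ...", "TODO: write migration", ""], checks)
human_queue = [r.output for r in results if r.needs_human]
```

Of the three sample outputs, only the first lands in `human_queue`. The other two are rejected by the machine before anyone spends attention on them.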
I run my own work this way through the agentic harness I built. Agents produce work in the background and it shows up in an approval feed, where a person approves, rejects, or edits each item before it moves on. That feed is the whole org idea in miniature: people are not watching every keystroke, they sit at the gate where their judgment is the value. A pod is that same idea with four people instead of one.
Build the org in three layers, not a pyramid
Once the pod is the building block, the org is just how you arrange pods. I would build it in three layers.
The agentic engineering org in three layers: direction, stream-aligned pods, and platform.
The top layer is direction, and it stays small on purpose. Engineering leadership, architecture, risk levels, and the quality bar live here. Their job is to set the standard, not run the work. They decide which work is high-risk and needs a human sign-off on every change, and which is low-risk and can run with lighter checks. They define the quality bar the quality engineers enforce. In the old org this layer grows because getting teams to work together is expensive. When pods ship their own work on their own, you need much less of it, which is why the top of this org is thin where old orgs are heavy.
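The risk decision the direction layer makes can be written down as a simple routing rule. The sketch below is hypothetical: the `RiskTier` enum, the area names, and the tier assigned to each area are placeholders chosen for illustration, and defaulting unknown areas to high risk is an assumption of the sketch rather than a rule from this article.

```python
from enum import Enum


class RiskTier(Enum):
    HIGH = "high"  # human sign-off on every change
    LOW = "low"    # lighter checks are acceptable


# Hypothetical mapping from product area to tier, owned by the direction layer.
RISK_BY_AREA = {
    "payments": RiskTier.HIGH,
    "identity": RiskTier.HIGH,
    "catalog": RiskTier.LOW,
}


def requires_human_signoff(area: str) -> bool:
    """Return True when every change in this area needs a human sign-off.

    Unknown areas default to HIGH so new work is never under-checked
    by accident (an assumption of this sketch).
    """
    return RISK_BY_AREA.get(area, RiskTier.HIGH) is RiskTier.HIGH
```

Keeping the mapping in one place is the point: the direction layer sets the standard once, and every pod's gate reads from it.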
The middle layer is the pods, and this is where value ships. Each pod owns a part of the product from end to end: payments, identity, catalog, whatever fits your business. There are no separate component teams that everyone else has to wait on, because the shared platform and the agents cover most of what those teams used to do. You add capacity by adding pods, not by adding managers between the work and the people setting direction.
The bottom layer is the shared platform, and it runs deep. This is the part most companies underbuild, and it decides whether the whole thing works. It holds the agent platform, the shared skills and tools, the connections to your systems, the testing setup, the data, and the guardrails that keep agents inside their limits. Every bit of extra speed a pod gets comes from this layer. A skill written once and shared makes every pod faster on that task forever. A testing setup built once gives every quality engineer a head start. Build this as a real product with its own team and roadmap, or every pod will build a worse version of it on its own and you will pay for it four times over.
This is Team Topologies adapted for agents, not thrown out. The patterns still hold. What changes is team size. SAFe guidance, built on years of human coordination, puts an agile team at five to eleven people. With agents carrying the volume, I would run pods of four that own the scope that used to take a team of nine, and put the headcount I saved into the platform layer instead. That last part is my bet, not settled practice: there is not yet hard data on the right size for an AI-augmented team.
The career ladder breaks, and you have to rebuild it
This is the failure I would worry about most, because you cannot see it until it has already undone the structure. You can change what people do and still grade them on the old things, and if you do, the org slides back.
The old ladder rewards writing code and owning more code. Senior means you write the hard parts. Staff means you write the hard parts and review everyone else's. None of that fits a pod. If your promotion rules still reward how much code someone writes, your best people will keep writing code by hand to look busy, and they will skip the directing and checking skills the org actually needs.
So the ladder has to change with the structure. People should move up for taking bigger, vaguer problems and turning them into specs a set of agents can build correctly. Reward the judgment to catch the answer that only looks right, the design that keeps a product area stable, and for quality engineers, the bug that never shipped because the check caught it. The role chart in this article is not just an org diagram. It is the start of a pay conversation, because what you measure is what you get.
The money is why this is a leadership decision
This is not a productivity story. It is a cost story, and that is what moves it from the CTO's tool budget to how the business runs.
In the old model, cost rises roughly in step with headcount. More output means more engineers, more managers, more coordination, a pyramid that grows in both directions. The agentic pod breaks that line. Each of the four people in a pod can run more than twenty agents. You add capacity by adding agents to a fixed core of people, and you add scope by standing up another pod, not another layer of management. Cost stops rising in step with output.
Engineering has always needed tight manager ratios of 1:4 to 1:10 because supervising hard technical work is hard. Agents do not let one manager oversee more people. They let a small group direct far more work.
That difference matters. You are not asking a manager to watch more people. You are giving a small group control over a much larger amount of work through agents. The number you manage to stops being cost per engineer and becomes cost per result shipped at the quality bar you set. The quality bar is the key part of that sentence, and it is why the quality engineer is not optional: the moment you start counting cost per result, the gate is what keeps bugs from quietly running that cost back up.
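To make "cost per result shipped at the quality bar" concrete, here is one possible way to compute it. This is my reading of the metric as a sketch, not a standard formula: the `cost_per_result` function is hypothetical, and the dollar amounts and counts in the example are invented.

```python
def cost_per_result(total_cost: float, shipped: int, failed_quality_bar: int) -> float:
    """Total cost divided by the results that actually met the quality bar."""
    if not 0 <= failed_quality_bar <= shipped:
        raise ValueError("failed_quality_bar must be between 0 and shipped")
    good = shipped - failed_quality_bar
    if good == 0:
        raise ValueError("no results met the quality bar")
    return total_cost / good


# Invented numbers: same spend, same volume shipped, different defect counts.
with_gate = cost_per_result(100_000, shipped=50, failed_quality_bar=0)
without_gate = cost_per_result(100_000, shipped=50, failed_quality_bar=10)
```

With these invented inputs the cost per good result rises from 2,000 to 2,500 when ten of the fifty results miss the bar, which is the effect the gate exists to prevent.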
Run the math on a made-up mid-size product group. Sixty engineers in the old shape, split into nine teams with the managers and coordination that implies. The same scope in the new shape might be eight pods of four, so thirty-two people in the product areas, a thin direction layer, and a real platform team of eight. Fewer people, more output, and a share of the saved headcount put back into the platform that makes every pod faster. The exact numbers are not the point. The point is that the shape changes what a given headcount can produce.
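The arithmetic behind that made-up example can be laid out directly. The size of the direction layer is not given above, so the value of four below is an assumption, and the sketch counts only the sixty engineers in the old shape, not their managers.

```python
OLD_ENGINEERS = 60
PODS = 8
POD_SIZE = 4
PLATFORM_TEAM = 8
DIRECTION_LAYER = 4  # assumption: "thin" is not quantified in the text

pod_people = PODS * POD_SIZE                                   # 32
new_headcount = pod_people + PLATFORM_TEAM + DIRECTION_LAYER   # 44
freed_from_product_areas = OLD_ENGINEERS - pod_people          # 28
net_reduction = OLD_ENGINEERS - new_headcount                  # 16

print(f"People in pods:            {pod_people}")
print(f"New total headcount:       {new_headcount}")
print(f"Freed from product areas:  {freed_from_product_areas}")
print(f"Net reduction:             {net_reduction}")
```

Change `DIRECTION_LAYER` or `PLATFORM_TEAM` and the totals move, but the pattern holds: fewer people in product areas, with some of the difference reinvested in the platform.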
Where this breaks
I am describing the goal, not pretending the path is clean. The honest failure modes are worth naming.
Skipping the checks. If you scale up agent output before the testing setup and the quality engineer are real, you ship the wave of bugs the data warns about and spend your gains fixing them. Build the gate before you open the floodgates.
Treating the platform as an afterthought. Starve the platform layer and pods will each build their own, the shared speed disappears, and you get the worst of both worlds: agents everywhere and no shared benefit.
The judgment gap. This model needs people who can define the work clearly and judge the output. That skill is genuinely scarce today, and you cannot hire your way out of it overnight. What slows you down is not agent licenses. It is the shortage of people who can direct and judge, which is the same shortage the productivity numbers keep showing.
Regulated and legacy reality. Not every part of the business can become a clean pod tomorrow. High-risk and heavily regulated work needs more human sign-off, and old systems carry coordination costs that a fresh start does not. The three-layer shape still applies. The ratios and the checks change.
What I would do this quarter
You may not be able to rebuild your org from scratch. You can still run the experiment that proves the shape.
Stand up one real pod, not a pilot. Pick a contained part of the product with clear success criteria. Staff it with four people in the roles above and give it a real set of agents. Have it ship something that matters, then study what actually broke.
Hire or name your first quality engineer now. Before you scale up agent output, build the role that checks it. If checking stays a step instead of a job, the wave of bugs will eat your speed, and the data already shows it happening across the industry.
Fund the platform on purpose. Name an owner for the shared agent platform, skills, and testing setup, and treat it as a product with a roadmap. If every pod builds its own, you have no shared speed, just scattered cost.
Rewrite one rung of the ladder. Change the promotion rules for one level to reward clear specs, directing agents, and checking work instead of code volume. Watch where your best people put their effort once the reward moves.
Change one number. Stop reporting velocity and output as wins. Start measuring cost per result shipped at your quality bar. The first reading of that number will be uncomfortable. That discomfort is the gap between being busy and creating value, and closing it is the whole job.
The technology to build this org exists today. What is scarce is the willingness to stop optimizing the old hard part. Your developers already got faster. Whether your company does is an org design question now, not a tooling one.
Matthew Kruczek is Managing Director at EY, leading Microsoft domain initiatives within Digital Engineering. This article is part of "The Agent-First Enterprise" series. Connect with Matthew on LinkedIn to discuss restructuring engineering for the agent era.
References
- Faros AI. "The AI Productivity Paradox." 2025. faros.ai
- CodeRabbit. "State of AI vs Human Code Generation." 2025. coderabbit.ai
- GitClear. "AI Assistant Code Quality: 2025 Research." gitclear.com
- Stack Overflow. "2025 Developer Survey: AI." survey.stackoverflow.co
- "Intuition to Evidence: Measuring AI's True Impact on Developer Productivity." arXiv:2509.19708, 2025. arxiv.org
- Scaled Agile Framework. "Agile Teams." framework.scaledagile.com
- McKinsey. "How to Identify the Right Spans of Control for Your Organization." mckinsey.com
- Skelton, Matthew. "Team Topologies as the Infrastructure for Agency with Humans and AI." QCon London 2026. infoq.com