When one agent beats your multi-agent system

You've got a job for an AI agent, and in 2026 the reflex is to reach for a team of them. Split the work, give each agent a role, wire them together. It feels like more agents should mean more capability.

In April, a Stanford team — Dat Tran and Douwe Kiela — tested that reflex, and the result is worth sitting with. Give a single agent the same compute budget as a whole team of agents, and the single agent matched or beat the team — and spent less doing it (arXiv 2604.02460, April 2026). Most of the extra agents were just moving the same information around.

That doesn't mean multi-agent is wrong. It means "more agents" is a design decision with a cost, not a free upgrade. This issue is about how to make that decision — when one agent is enough, when a team earns its keep, and what a "deep agent" actually is. The Reels this week give you the 60-second version; here's the depth.

Three ways to set up agents:

Deciding whether to split a job across agents is a lot like deciding whether to break one function into several. Sometimes splitting clarifies. Sometimes it just adds call overhead and gives bugs new places to hide at the boundaries.

There are three setups, and it helps to hold them as three categories of the same thing:

Single agent. One agent does the whole job, start to finish.
Multi-agent. More than one agent — and this comes in two flavors that get lumped together but behave very differently. Parallel: independent jobs run at the same time, and you combine the results. Chain: one job's output is passed from agent to agent in sequence, each one building on the last.
Deep agent. One agent stays in charge. It plans the job, hands pieces to sub-agents (helper agents that each work in their own clean context and report back), and keeps track of the whole thing.

The one question that sorts almost every case: is this one connected chain of reasoning, or independent jobs? A connected chain is work where each step needs the answer from the step before — find the user's plan, use that to find their limit, use that to explain the charge. Independent jobs are pieces that don't need each other at all.
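
One rough way to make that question concrete is to write down which step needs which other step's answer. The sketch below is illustrative only: the step names and the `needs` mapping are hypothetical, and the rule it applies (any dependency at all means a connected chain) is a simplification of the question above, not a method from the Stanford paper.

```python
def classify(needs: dict[str, list[str]]) -> str:
    """Return 'independent jobs' if no step needs another step's answer,
    otherwise 'connected chain'."""
    has_dependency = any(deps for deps in needs.values())
    return "connected chain" if has_dependency else "independent jobs"


# Hypothetical version of the billing example: each step needs the one before.
billing = {
    "find_plan": [],
    "find_limit": ["find_plan"],
    "explain_charge": ["find_limit"],
}

# Hypothetical set of checks where no step needs another's answer.
checks = {"duplicates": [], "security": [], "formatting": []}

print(classify(billing))
print(classify(checks))
```

Real jobs are usually mixed, so in practice you would apply the same test to each group of steps rather than to the whole job at once.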

Walk it through: an AI code reviewer:

Say you want an agent to review a pull request. It checks three things: duplicate code, security holes, and messy formatting.

As a single agent, one agent runs all three checks itself. Simple. It works — it's just splitting its attention across three concerns.

As parallel multi-agent, you notice those three checks don't depend on each other. So you run three agents at once — one for security, one for formatting, one for duplicate code — and combine what they find. This is a good split. The jobs are independent, each agent gets a focused context, and they finish in the time of the slowest one rather than all three added up.
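
Here is a minimal sketch of that parallel split, assuming each "agent" is just an async function. The three checkers are hypothetical stand-ins that use simple string tests where a real agent would call a model; the part that matters is that they share no state and are combined only at the end.

```python
import asyncio


async def check_security(diff: str) -> list[str]:
    # Stand-in for a model call; a real agent would send `diff` to an LLM.
    await asyncio.sleep(0.01)
    return ["security: eval() on user input"] if "eval(" in diff else []


async def check_formatting(diff: str) -> list[str]:
    await asyncio.sleep(0.01)
    return ["formatting: tab indentation"] if "\t" in diff else []


async def check_duplicates(diff: str) -> list[str]:
    await asyncio.sleep(0.01)
    lines = [ln.strip() for ln in diff.splitlines() if ln.strip()]
    repeated = sorted({ln for ln in lines if lines.count(ln) > 1})
    return [f"duplicate: {ln}" for ln in repeated]


async def review(diff: str) -> list[str]:
    # The three checks do not depend on each other, so run them concurrently.
    results = await asyncio.gather(
        check_security(diff), check_formatting(diff), check_duplicates(diff)
    )
    # Combine step: flatten the per-agent findings into one report.
    return [finding for agent_findings in results for finding in agent_findings]


diff = "x = eval(user_input)\n\ty = 1\nz = compute()\nz = compute()\n"
findings = asyncio.run(review(diff))
print(findings)
```

Because `asyncio.gather` waits for all three at once, the total wait is roughly the slowest check, not the sum of the three.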

As a chain, you take one of those checks — fixing the duplicate code — and split its steps across agents: one finds the copies, passes them to one that picks which to keep, passes that to one that rewrites, passes that to one that re-tests. Here's the problem. That's a single connected job, and every pass between agents can only lose information, never add it. The agent that rewrites never sees the full reasoning the finder had; it sees a summary. Detail leaks at every step.

This is exactly what Stanford measured. Across five different multi-agent setups and three model families, once they held the compute budget equal, the single agent matched or beat the chain — because the chain spent its budget on passing information around instead of reasoning.

And it's not just an efficiency story; it's a reliability one. Berkeley's MAST study read 1,600-plus failed runs across seven multi-agent frameworks and sorted why they broke (arXiv 2503.13657). About a third of failures (32.3%) came from what they call inter-agent misalignment: context lost in a handoff, mismatched formats, agents talking past each other. The single largest bucket, 44.2%, was poor design and decomposition: splitting the work badly in the first place. Put those together, 76.5% of the failures, and the lesson is blunt: most multi-agent pain lives in the wiring, not the agents.
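
The detail leak in the chain walkthrough can be pictured with a toy model. This is a sketch under loud assumptions: the notes are invented, the `keep` limits are arbitrary, and a real handoff summarizes rather than truncates. It models only the loss at each pass, not the work each agent does, and it is not how either study measured anything.

```python
def handoff(notes: list[str], keep: int) -> list[str]:
    """Toy handoff: the next agent receives only a summary of `keep` items.
    A summary can drop details but cannot restore ones already dropped."""
    return notes[:keep]


# Hypothetical notes the finder agent built up while locating duplicates.
finder_notes = [
    "parse_date() is copied in utils.py and api.py",
    "the api.py copy handles timezones, the utils.py copy does not",
    "three callers depend on the timezone behavior",
    "tests only cover the utils.py copy",
]

picker_view = handoff(finder_notes, keep=3)    # finder -> picker
rewriter_view = handoff(picker_view, keep=2)   # picker -> rewriter
tester_view = handoff(rewriter_view, keep=1)   # rewriter -> tester

for name, view in [
    ("picker", picker_view),
    ("rewriter", rewriter_view),
    ("tester", tester_view),
]:
    print(f"{name} sees {len(view)} of {len(finder_notes)} facts")
```

In this toy run the agent that re-tests never learns that the tests cover only one copy, which is the fact it needed most.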

When multi-agent actually wins:

So when is a team worth it? Three clear cases — and they're the mirror image of what breaks above.

When the work is genuinely independent. The three code-review checks are the model here. Independent jobs parallelize cleanly: you get specialization (each agent tuned to one concern), and you get wall-clock speed because they run at once. No chain, no handoff, nothing to lose between steps.

When one agent should plan and direct — a deep agent. For a big, open-ended job, a deep agent plans the work, spins up sub-agents for pieces, and keeps the thread. The quiet advantage is context isolation: each sub-agent digs through its own pile of detail in its own context window and returns only the part that matters, so the main agent's context doesn't fill up with noise. That's the opposite of a chain — the sub-agents do independent digging, they don't relay one growing conversation down a line.
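
Context isolation is easier to see in code. The sketch below assumes hypothetical `DeepAgent` and `SubAgent` classes and replaces model calls with a plain substring match; it does not follow any particular framework's API. What it shows is the shape: each sub-agent fills its own context, and only a one-line result reaches the main agent.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    name: str
    context: list[str] = field(default_factory=list)  # private working notes

    def run(self, task: str, documents: list[str]) -> str:
        # Stand-in for model work: the sub-agent reads everything it is given...
        self.context.extend(documents)
        relevant = [d for d in documents if task in d]
        # ...but reports back only a short result, not its whole context.
        return f"{self.name}: {len(relevant)} of {len(documents)} documents mention '{task}'"


@dataclass
class DeepAgent:
    context: list[str] = field(default_factory=list)

    def run(self, job: dict[str, list[str]]) -> list[str]:
        # Plan: in this sketch the plan is simply one sub-agent per piece.
        for task, documents in job.items():
            sub = SubAgent(name=f"sub[{task}]")  # fresh, clean context each time
            self.context.append(sub.run(task, documents))
        return self.context


job = {
    "auth": ["auth flow notes", "billing notes", "auth token expiry"],
    "billing": ["billing notes", "invoice schema"],
}
agent = DeepAgent()
report = agent.run(job)
print(report)
```

The main agent ends up holding two short strings, while the five documents stay in the sub-agents' contexts and are discarded with them.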

When a single agent's context degrades. If one agent has to hold so much that it starts losing the thread, splitting the context across focused agents can recover quality. Stanford's own authors name this as the case where multi-agent earns its place.

The cost is real and worth stating plainly. A multi-agent setup commonly runs 3 to 10 times the tokens of a single agent for the same task, and coordination overhead climbs fast past three or four agents. If you do need a team, a hierarchical setup helps: pairing cheap, fast models for the sub-agents with one strong model as the planner recovers most of the accuracy at a fraction of the cost. The point isn't "never use a team." It's "make the team carry its weight."
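
To put the 3x to 10x range in concrete terms, here is the arithmetic for a made-up baseline. The 20,000-token figure is hypothetical; substitute the token count from one of your own single-agent runs.

```python
def multi_agent_token_range(single_agent_tokens: int) -> tuple[int, int]:
    """Apply the 3x to 10x range quoted above to a single-agent token count."""
    return 3 * single_agent_tokens, 10 * single_agent_tokens


baseline = 20_000  # hypothetical single-agent token count for one task
low, high = multi_agent_token_range(baseline)
print(f"single agent: {baseline:,} tokens; multi-agent: {low:,} to {high:,} tokens")
```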

What to do this week:

Before you add a second agent to anything, run the one question: is this one connected chain, or independent jobs?

Then audit one agent system you already have. Find any part you've called "multi-agent" and ask whether it's really one connected job you've chopped into a chain of handoffs. If it is, collapse it back into a single agent and measure — same task, equal budget — the way Stanford did. Keep the agents you split off for the work that's actually independent, or for a deep agent that needs to plan and direct.
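
A minimal harness for that comparison might look like the sketch below. Everything in it is an assumption for illustration: `TokenBudget`, `compare`, and the two runner functions are hypothetical names, the token counts are invented, and the scores are placeholders set to 0.0 rather than measurements. You would swap in your own agents and your own scoring.

```python
from typing import Callable


class TokenBudget:
    """Shared cap so every setup is measured against the same spend."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def spend(self, tokens: int) -> bool:
        # Refuse the call if it would exceed the cap.
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True


Runner = Callable[[str, TokenBudget], float]


def compare(task: str, setups: dict[str, Runner], limit: int) -> dict[str, dict[str, float]]:
    results = {}
    for name, run in setups.items():
        budget = TokenBudget(limit)  # fresh but equal budget per setup
        score = run(task, budget)
        results[name] = {"score": score, "tokens_used": budget.used}
    return results


def single_agent(task: str, budget: TokenBudget) -> float:
    # Placeholder: replace with your real single-agent run and scoring.
    budget.spend(4_000)
    return 0.0  # placeholder score, not a measurement


def chain(task: str, budget: TokenBudget) -> float:
    # Placeholder: replace with your real chain; count handoff tokens too.
    for _ in range(4):
        budget.spend(1_000)
    return 0.0  # placeholder score, not a measurement


setups = {"single agent": single_agent, "chain": chain}
results = compare("dedupe parse_date", setups, limit=4_000)
print(results)
```

The design choice that matters is the shared cap: if the chain is allowed to spend more than the single agent, a better score from the chain tells you nothing about the architecture.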

You'll often find the team you built was solving a problem you didn't have, while quietly adding the one you now do.

Until next time - Teja Derangula,
Create while it’s easy
