
Claude Code subagents are isolated sub-sessions you spawn with the Task tool. Each one runs in its own context window, gets its own tool budget, and returns a single summarized result to the parent. You use them for three reasons: parallel work that would block a single thread, context-window protection on large codebases, and specialized roles (research, planning, review) that benefit from a clean slate.
That is the answer. The rest of this post is when subagents pay off, when they actively hurt you, and the patterns that the strongest Claude Code users have settled into through 2026.
In Claude Code, a subagent is a fresh Claude session launched from the parent session via the Task tool with a subagent_type (usually general-purpose, sometimes a specialized type like Explore or Plan if your harness exposes them). The parent hands the subagent a prompt and a set of tool permissions. The subagent runs autonomously, may call tools, may spawn its own children, and eventually returns one message of text.
The parent sees only that final message. It does not see the subagent's tool calls, its scratch reasoning, or the 40 files it grepped through. This is the entire point. The parent's context stays clean; the subagent does the messy work.
Three things to internalize:
That third point is why subagents are not free. They are a tool, not a default.
There are exactly three situations where a subagent earns its keep.
The parent is single-threaded. If you ask Claude to run three independent investigations in series, you wait for all three end to end. Spawn three subagents in one tool batch and you wait for the slowest one.
Real examples:
Rule of thumb: if the work decomposes cleanly into 2 to 5 independent chunks, parallel subagents are the right tool. Above 5, you start hitting rate limits and the orchestration overhead eats the wins.
The parent session is your durable workspace. Once the parent's context fills up, you lose the ability to keep planning, to remember decisions made an hour ago, and to track the diff you have been building. Burning that context on a one-shot investigation is wasteful.
When the parent needs to know one thing ("does this codebase already have a retry helper?") but answering it requires reading 30 files, hand the question to a subagent. The subagent burns its own context, returns "yes, see lib/retry.ts," and the parent stays lean.
This is what the Anthropic team calls the agent-as-context-sandbox pattern. The subagent is a disposable scratchpad. The parent is the long-running plan. We will come back to this pattern below; it is the most underused move in Claude Code.
Sometimes you want the model to forget what it just did. A code reviewer subagent works better when it has no idea what the author was trying to do. A planner subagent works better when it does not see the half-built implementation. A red-team subagent works better when it has not been told the design is good.
The clean-slate effect is real. Subagents pull the model back to first principles because they have no prior commitment to defend.
This is where most people go wrong. Subagents are over-prescribed because they feel sophisticated. They are not free.
Skip the subagent when:
Honest version: roughly 70% of the subagent calls we see in real Claude Code transcripts could have been a direct tool call from the parent. The other 30% are load-bearing. Learn to tell which is which.
| Strategy | Use when | Avoids | Cost vs direct |
|---|---|---|---|
| Single parent, no subagents | Task is linear, fits in context, one engineer-equivalent | Token doubling, orchestration overhead | 1.0x baseline |
| Fan-out research (parallel) | 2 to 5 independent investigations | Wall-clock waste of serial reads | 1.5 to 2.5x tokens, 0.4x wall time |
| Context-sandbox lookup | Parent needs an answer to a costly question, not the artifact | Polluting the parent's working memory | 1.3x tokens, but parent context stays small |
| Specialized role (review, plan, red-team) | You want a fresh-slate opinion | Author bias polluting the critique | 1.4x tokens, often catches what parent missed |
| Recursive subagents (children spawn children) | Massive decomposition (codemod across a large monorepo) | Single-thread serialization on huge work | 3 to 6x tokens, high risk of drift |
Recursive subagents are a real foot-gun. The deeper the tree, the more lossy the summaries become. By depth 3, the root has almost no insight into what the leaves actually did. We use depth 1 by default, depth 2 only when the parent's plan explicitly partitions the work.
These are the workflows where subagents pay for themselves.
The parent's job is to ship the change. Before shipping, it needs facts: what does the codebase already have, what does the API actually return, what is the convention for this kind of file.
Spawn 2 to 3 research subagents in parallel. Each one investigates one question. The parent reads the three summaries, builds a plan, then implements directly. The implementation step belongs to the parent because the parent will iterate.
This is the workhorse pattern. It maps cleanly to the Explore then Plan then Implement loop that Anthropic recommends in the Claude Code docs.
Same as above, but the trigger is different. You are deep into a long session. The parent's context is at 60%. A small question comes up that would cost 20 to 30 file reads to answer.
Hand it to a subagent with a tight prompt: "Find every place in this repo that calls stripe.checkout.sessions.create and return the file paths, the surrounding 5 lines, and a one-sentence summary of what each call is doing." The subagent returns 200 tokens of useful summary. The parent paid 50 tokens for the dispatch and got a clean answer.
The 30 file reads happened in someone else's context window. This is the entire game.
The parent writes the implementation. Before declaring it done, the parent spawns a general-purpose subagent and hands it the spec plus the diff. The subagent's only job is to find what is wrong.
The subagent has not been emotionally invested in any of the decisions, so it is a sharper critic. The parent then triages the critique. Most things get fixed; some get explained away. We have caught real bugs this way that the implementing session was blind to.
This pairs naturally with the multi-step prompt-ladder discipline covered in our AI-assisted refactoring playbook. Refactors are exactly where a clean-slate reviewer earns its tokens, because the implementer is too close to the change.
Subagents are one of three tools Claude Code engineers reach for to extend the parent. The others are MCP servers (long-running tool integrations) and slash commands (canned prompts).
Pick by job to be done:
Mixing them up is the most common mistake. A subagent that needs to talk to Linear should be granted MCP access, not given a one-off REST integration in its prompt. A slash command that always launches the same subagent should still be a slash command (because it captures the shortcut), with the subagent as the action.
The harnesses that work in production look like Cursor for inline edits, Claude Code for multi-file work with subagents and MCP, and Codex or Devin for fully autonomous long-running runs. Cadence engineers tend to live in Claude Code for serious work and switch to Cursor's agent mode for production refactors when the change spans many files at once.
A bad subagent prompt is the single biggest source of wasted tokens in a Claude Code session. Three rules.
path:line, one-sentence summary." Free-form returns are 5x longer and worse to parse.These three lines at the top of the prompt cut subagent token use roughly in half in our internal tests.
If you have never used subagents, start with the fan-out research pattern on a real task. Pick a question with 3 independent sub-questions. Spawn 3 subagents in one Task batch. Compare wall-clock against doing it serially in the parent.
If you have been using them everywhere, audit your last 5 sessions. For each Task call, ask: did the parent need to do this directly? In most sessions, 1 in 3 subagent calls was a habit, not a need.
If you are evaluating an engineer's Claude Code chops in a hiring loop, the strongest signal is not whether they use subagents. It is whether they can explain why they did not use one on a task where it would have been the obvious choice. That is the discipline we screen for in the AI-native engineering evaluation rubric every Cadence engineer passes.
Every engineer on Cadence is AI-native by default, vetted on Cursor, Claude Code, and Copilot fluency before they unlock bookings. Subagent discipline (when to spawn, when to skip, how to write the prompt) is part of the rubric. If you want a Mid at $1,000/week or Senior at $1,500/week who is already fluent in subagent orchestration, the decide tool will tell you in 2 minutes whether to build, buy, or book.
If you are deciding between hiring full-time for AI tooling fluency or booking on-demand, run the numbers once. A Senior at $1,500/week with subagent and MCP discipline often outpaces a $180k full-time hire who learned Cursor last quarter. Cadence gives you a 48-hour free trial to verify before you commit to a single week.
Use a subagent when the work parallelizes cleanly into 2 to 5 chunks, when answering a question would burn 20+ file reads in the parent's context, or when you want a fresh-slate reviewer or planner that has no prior commitment to defend.
No. Subagents are short-lived sub-sessions you spawn with the Task tool; they get one prompt and return one result. MCP servers are long-running tool integrations the parent calls repeatedly across a session. Subagents are for work decomposition; MCP is for tool access.
In practice, 2 to 5 in one Task batch. Above 5, orchestration overhead and rate limits start eating the parallelism win. Recursive subagents (children spawning grandchildren) work for depth 1 to 2; deeper trees lose too much context in the summary handoffs.
No. Every subagent starts with a blank context window. You must include any state the subagent needs in the prompt. This is also why subagents protect the parent's context: the parent sees only the subagent's final summary, not its work.
It is the practice of using a subagent purely to keep an expensive lookup out of the parent's context window. The parent dispatches a question, the subagent does the 30 file reads, and the parent gets a clean 200-token summary back. The parent's working memory stays available for the actual plan.
Senior frontend developer at withRemote. Writes on React, Next.js, performance budgets, and modern web tooling.