Claude Code subagents: when and how to use them

Claude Code subagents are isolated sub-sessions you spawn with the Task tool. Each one runs in its own context window, gets its own tool budget, and returns a single summarized result to the parent. You use them for three reasons: parallel work that would block a single thread, context-window protection on large codebases, and specialized roles (research, planning, review) that benefit from a clean slate.

That is the answer. The rest of this post is when subagents pay off, when they actively hurt you, and the patterns that the strongest Claude Code users have settled into through 2026.

What a subagent actually is

In Claude Code, a subagent is a fresh Claude session launched from the parent session via the Task tool with a subagent_type (usually general-purpose, sometimes a specialized type like Explore or Plan if your harness exposes them). The parent hands the subagent a prompt and a set of tool permissions. The subagent runs autonomously, may call tools, may spawn its own children, and eventually returns one message of text.

The parent sees only that final message. It does not see the subagent's tool calls, its scratch reasoning, or the 40 files it grepped through. This is the entire point. The parent's context stays clean; the subagent does the messy work.

Three things to internalize:

Subagents do not share memory with the parent. Anything the parent knows, you must put in the prompt.
Subagents cannot ask the user questions mid-run. They get one prompt in, one result out.
Subagents charge tokens twice. You pay for the subagent's full transcript plus the parent's summary of it.

That third point is why subagents are not free. They are a tool, not a default.

When to reach for a subagent

There are exactly three situations where a subagent earns its keep.

1. Parallel work

The parent is single-threaded. If you ask Claude to run three independent investigations in series, you wait for all three end to end. Spawn three subagents in one tool batch and you wait for the slowest one.

Real examples:

Auditing a monorepo for three different concerns (security, dead code, type drift) where each investigation reads different files.
Researching three competing libraries to pick one. Each subagent gets one library, returns a recommendation with pros and cons.
Backfilling test coverage across five unrelated modules. One subagent per module.

Rule of thumb: if the work decomposes cleanly into 2 to 5 independent chunks, parallel subagents are the right tool. Above 5, you start hitting rate limits and the orchestration overhead eats the wins.

2. Context-window protection

The parent session is your durable workspace. Once the parent's context fills up, you lose the ability to keep planning, to remember decisions made an hour ago, and to track the diff you have been building. Burning that context on a one-shot investigation is wasteful.

When the parent needs to know one thing ("does this codebase already have a retry helper?") but answering it requires reading 30 files, hand the question to a subagent. The subagent burns its own context, returns "yes, see lib/retry.ts," and the parent stays lean.

This is what the Anthropic team calls the agent-as-context-sandbox pattern. The subagent is a disposable scratchpad. The parent is the long-running plan. We will come back to this pattern below; it is the most underused move in Claude Code.

3. Specialized roles or fresh-slate prompting

Sometimes you want the model to forget what it just did. A code reviewer subagent works better when it has no idea what the author was trying to do. A planner subagent works better when it does not see the half-built implementation. A red-team subagent works better when it has not been told the design is good.

The clean-slate effect is real. Subagents pull the model back to first principles because they have no prior commitment to defend.

When NOT to use a subagent

This is where most people go wrong. Subagents are over-prescribed because they feel sophisticated. They are not free.

Skip the subagent when:

The task is linear and small. "Add a console.log to this function" does not need a subagent. The parent can do it in one tool call.
You need to ask follow-up questions. Subagents are one-shot. If the answer might prompt "and what about the edge case?", do the work in the parent.
The result is the work. If the goal is to write code that ends up in the diff, the parent should do it. Subagents are bad at handing back working code because the parent then has to re-read it, re-validate it, and often re-write it. The double-token problem becomes a triple-token problem.
State matters across steps. A debugging session where each test result changes the next hypothesis must stay in the parent. A subagent will not remember what it learned in step 2 by the time it gets to step 5 (because each subagent is a single shot).
You have not written the prompt carefully. A vague subagent prompt produces a vague summary. The parent then has less information than it would have gotten by doing the work directly.

Honest version: roughly 70% of the subagent calls we see in real Claude Code transcripts could have been a direct tool call from the parent. The other 30% are load-bearing. Learn to tell which is which.

Comparison: subagent strategies side by side

Strategy	Use when	Avoids	Cost vs direct
Single parent, no subagents	Task is linear, fits in context, one engineer-equivalent	Token doubling, orchestration overhead	1.0x baseline
Fan-out research (parallel)	2 to 5 independent investigations	Wall-clock waste of serial reads	1.5 to 2.5x tokens, 0.4x wall time
Context-sandbox lookup	Parent needs an answer to a costly question, not the artifact	Polluting the parent's working memory	1.3x tokens, but parent context stays small
Specialized role (review, plan, red-team)	You want a fresh-slate opinion	Author bias polluting the critique	1.4x tokens, often catches what parent missed
Recursive subagents (children spawn children)	Massive decomposition (codemod across a large monorepo)	Single-thread serialization on huge work	3 to 6x tokens, high risk of drift

Recursive subagents are a real foot-gun. The deeper the tree, the more lossy the summaries become. By depth 3, the root has almost no insight into what the leaves actually did. We use depth 1 by default, depth 2 only when the parent's plan explicitly partitions the work.

Three patterns that earn their tokens

These are the workflows where subagents pay for themselves.

Research, then synthesize, then implement

The parent's job is to ship the change. Before shipping, it needs facts: what does the codebase already have, what does the API actually return, what is the convention for this kind of file.

Spawn 2 to 3 research subagents in parallel. Each one investigates one question. The parent reads the three summaries, builds a plan, then implements directly. The implementation step belongs to the parent because the parent will iterate.

This is the workhorse pattern. It maps cleanly to the Explore then Plan then Implement loop that Anthropic recommends in the Claude Code docs.

Agent-as-context-sandbox

Same as above, but the trigger is different. You are deep into a long session. The parent's context is at 60%. A small question comes up that would cost 20 to 30 file reads to answer.

Hand it to a subagent with a tight prompt: "Find every place in this repo that calls stripe.checkout.sessions.create and return the file paths, the surrounding 5 lines, and a one-sentence summary of what each call is doing." The subagent returns 200 tokens of useful summary. The parent paid 50 tokens for the dispatch and got a clean answer.

The 30 file reads happened in someone else's context window. This is the entire game.

Spec-driven implementation with a review subagent

The parent writes the implementation. Before declaring it done, the parent spawns a general-purpose subagent and hands it the spec plus the diff. The subagent's only job is to find what is wrong.

The subagent has not been emotionally invested in any of the decisions, so it is a sharper critic. The parent then triages the critique. Most things get fixed; some get explained away. We have caught real bugs this way that the implementing session was blind to.

This pairs naturally with the multi-step prompt-ladder discipline covered in our AI-assisted refactoring playbook. Refactors are exactly where a clean-slate reviewer earns its tokens, because the implementer is too close to the change.

How this fits into a working Claude Code stack

Subagents are one of three tools Claude Code engineers reach for to extend the parent. The others are MCP servers (long-running tool integrations) and slash commands (canned prompts).

Pick by job to be done:

Subagents for parallel work, context protection, or fresh-slate roles.
MCP servers for tools the parent uses repeatedly across a session: database queries, Linear tickets, Sentry errors, Figma exports. See our deep dive on Claude MCP servers and how to wire them for the full setup.
Slash commands for prompts you find yourself typing over and over: "/review", "/ship", "/cleanup". A slash command is a saved prompt, not a subagent (though a slash command can launch a subagent if you want).

Mixing them up is the most common mistake. A subagent that needs to talk to Linear should be granted MCP access, not given a one-off REST integration in its prompt. A slash command that always launches the same subagent should still be a slash command (because it captures the shortcut), with the subagent as the action.

The harnesses that work in production look like Cursor for inline edits, Claude Code for multi-file work with subagents and MCP, and Codex or Devin for fully autonomous long-running runs. Cadence engineers tend to live in Claude Code for serious work and switch to Cursor's agent mode for production refactors when the change spans many files at once.

How to write a subagent prompt that does not waste tokens

A bad subagent prompt is the single biggest source of wasted tokens in a Claude Code session. Three rules.

State the deliverable in one sentence at the top. "Return the file path and exact line number of every Stripe webhook handler in this repo." Not "investigate Stripe."
Constrain the tool budget. If the answer is in 10 files, say "read at most 15 files." Subagents are happy to run forever; you have to give them a stop condition.
Specify the return format. "Return a markdown list. Each item: path:line, one-sentence summary." Free-form returns are 5x longer and worse to parse.

These three lines at the top of the prompt cut subagent token use roughly in half in our internal tests.

What to do next

If you have never used subagents, start with the fan-out research pattern on a real task. Pick a question with 3 independent sub-questions. Spawn 3 subagents in one Task batch. Compare wall-clock against doing it serially in the parent.

If you have been using them everywhere, audit your last 5 sessions. For each Task call, ask: did the parent need to do this directly? In most sessions, 1 in 3 subagent calls was a habit, not a need.

If you are evaluating an engineer's Claude Code chops in a hiring loop, the strongest signal is not whether they use subagents. It is whether they can explain why they did not use one on a task where it would have been the obvious choice. That is the discipline we screen for in the AI-native engineering evaluation rubric every Cadence engineer passes.

Every engineer on Cadence is AI-native by default, vetted on Cursor, Claude Code, and Copilot fluency before they unlock bookings. Subagent discipline (when to spawn, when to skip, how to write the prompt) is part of the rubric. If you want a Mid at $1,000/week or Senior at $1,500/week who is already fluent in subagent orchestration, the decide tool will tell you in 2 minutes whether to build, buy, or book.

If you are deciding between hiring full-time for AI tooling fluency or booking on-demand, run the numbers once. A Senior at $1,500/week with subagent and MCP discipline often outpaces a $180k full-time hire who learned Cursor last quarter. Cadence gives you a 48-hour free trial to verify before you commit to a single week.

FAQ

When should I use a Claude Code subagent instead of just asking the parent?

Use a subagent when the work parallelizes cleanly into 2 to 5 chunks, when answering a question would burn 20+ file reads in the parent's context, or when you want a fresh-slate reviewer or planner that has no prior commitment to defend.

Are Claude Code subagents the same as MCP servers?

No. Subagents are short-lived sub-sessions you spawn with the Task tool; they get one prompt and return one result. MCP servers are long-running tool integrations the parent calls repeatedly across a session. Subagents are for work decomposition; MCP is for tool access.

How many subagents should I spawn at once?

In practice, 2 to 5 in one Task batch. Above 5, orchestration overhead and rate limits start eating the parallelism win. Recursive subagents (children spawning grandchildren) work for depth 1 to 2; deeper trees lose too much context in the summary handoffs.

Do subagents share memory with the parent?

No. Every subagent starts with a blank context window. You must include any state the subagent needs in the prompt. This is also why subagents protect the parent's context: the parent sees only the subagent's final summary, not its work.

What is the agent-as-context-sandbox pattern?

It is the practice of using a subagent purely to keep an expensive lookup out of the parent's context window. The parent dispatches a question, the subagent does the 30 file reads, and the parent gets a clean 200-token summary back. The parent's working memory stays available for the actual plan.

Akashdeep Singh

Senior Frontend Developer

Senior frontend developer at withRemote. Writes on React, Next.js, performance budgets, and modern web tooling.

All posts