May 7, 2026 · 10 min read · Cadence Editorial

Claude Opus vs Sonnet vs Haiku: when to use which

Photo by [Matheus Bertelli](https://www.pexels.com/@bertellifotografia) on [Pexels](https://www.pexels.com/photo/chat-gpt-welcome-screen-on-computer-16027824/)


Use Haiku 4.5 ($1/$5 per million tokens) for classification, routing, and structured extraction. Use Sonnet 4.6 ($3/$15) as your production default for code generation, agent loops, and RAG. Use Opus 4.7 ($5/$25) only for the hardest reasoning: deep planning, novel research, complex multi-file refactors. The right answer is rarely one model. It is a router that sends each step to the cheapest tier that can handle it.

The three tiers, at a glance

Anthropic ships three Claude tiers in 2026, each priced and tuned for a different job. Picking one and using it everywhere is the most common mistake we see in production code.

| Model | Input $/Mtok | Output $/Mtok | Context | SWE-bench Verified | Best at |
| --- | --- | --- | --- | --- | --- |
| Haiku 4.5 | $1.00 | $5.00 | 200K (1M beta) | mid-60s | classification, routing, extraction, summarization |
| Sonnet 4.6 | $3.00 | $15.00 | 200K (1M beta) | 79.6% | production default, agent loops, code generation, RAG |
| Opus 4.7 | $5.00 | $25.00 | 1M | 87.6% | hardest reasoning, multi-file refactor, deep planning |

The Opus-to-Haiku gap is 5x on input tokens and 5x on output. On most agent loops the per-step cost difference compounds fast: a 50-step research agent that costs about a dollar on Haiku costs north of five on Opus.

The trick is that you almost never need a single tier across the whole loop. The classifier step does not need Opus. The planner step usually does not need Haiku. Mixing is the win.

What actually changed in 2026

Three concrete shifts make tier-routing more useful than it used to be.

First, Haiku got smart enough to be a real worker. Haiku 4.5 holds its own on structured extraction, intent classification, and tool-call routing. Two years ago this kind of small-model work needed prompt-engineering gymnastics. In 2026 it just works, and at $1 per million input tokens it is roughly a third of Sonnet's cost.

Second, Sonnet 4.6 closed most of the gap to Opus on coding. SWE-bench Verified puts Sonnet at 79.6% and Opus 4.7 at 87.6%. The 8-point gap matters on the hardest tickets but not on the median PR. For most production code-generation work, Sonnet is the new default and Opus is the escalation lane.

Third, prompt caching matured. With proper cache breakpoints, you can pay 90% less on the part of your prompt that does not change. That changes the cost calculus for long system prompts, codebase context, and few-shot examples. Cached input is usually cheaper on Sonnet than uncached input on Haiku.

Trait 1: Route by step, not by app

Most teams pick one model and call it from every code path. Stop doing that. Model picking is a step-level decision, not an app-level decision.

A typical agent loop looks like this:

  1. Parse user message into intent and entities.
  2. Decide which tool to call next.
  3. Call the tool, get a result.
  4. Decide whether to continue, replan, or answer.
  5. Synthesize the final response.

Steps 1 and 2 are routing problems. Haiku handles them. Step 3 is deterministic code, no model. Step 4 is mostly Sonnet, occasionally Opus when the loop is stuck. Step 5 is Sonnet by default, Opus when the user is asking for a deep analysis.

A simple TypeScript router, the kind we use in production at Cadence, looks like:

interface AgentStep {
  kind: "classify" | "extract" | "plan" | "reason" | "synthesize";
  complexity?: "low" | "high";
}
// Route each step to the cheapest tier that can handle it.
function pickModel(step: AgentStep): string {
  if (step.kind === "classify" || step.kind === "extract") return "claude-haiku-4-5";
  if (step.kind === "plan" && step.complexity === "high") return "claude-opus-4-7";
  return "claude-sonnet-4-6";
}

That small function plus a complexity heuristic typically cuts API spend 60 to 70% versus running everything on Sonnet, with no measurable quality drop on user-facing outputs.

Trait 2: Use Haiku 4.5 where you used regex

Anywhere you currently maintain brittle regex, intent rules, or hand-tuned extraction code is a place where Haiku now beats your existing logic on accuracy and rivals it on cost.

Concrete jobs Haiku handles well:

  • Intent classification. "Is this user message a billing question, a feature request, or a bug?" Haiku, one prompt, structured JSON output, sub-second latency.
  • Tool-call routing. "Given this user request and these 12 available tools, which one fires?" Haiku can read the tool descriptions in cache and route accurately.
  • Structured extraction. "Pull every email, phone number, and order ID from this transcript." Haiku is roughly as accurate as Sonnet here at a third of the cost.
  • Summarization at scale. Daily standups, support ticket digests, log triage. Cheap enough to run on every record.
  • Moderation and safety filtering. First-pass content checks before the message ever reaches an expensive model.

The mental model: Haiku is a fast, cheap, JSON-mode-reliable function call. Treat it like one.
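
A minimal sketch of that function-call pattern with Anthropic's TypeScript SDK. The model identifier follows the tier names used in this post; verify it against your account's model list before shipping:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// One cheap, fast classification call: message in, label out.
async function classifyIntent(message: string): Promise<string> {
  const response = await client.messages.create({
    model: "claude-haiku-4-5", // assumed identifier, per the tiers above
    max_tokens: 10,
    system:
      'Classify the user message as "billing", "feature_request", or "bug". Reply with the label only.',
    messages: [{ role: "user", content: message }],
  });
  const block = response.content[0];
  return block.type === "text" ? block.text.trim() : "unknown";
}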

Trait 3: Sonnet 4.6 is the production default

For most engineering teams shipping with Claude in 2026, Sonnet is the answer 70 to 80% of the time. It is the right floor for code generation, RAG synthesis, agent-loop reasoning, and customer-facing chat.

Specifically, Sonnet wins on:

  • Code generation against a real codebase. With the codebase in cache and a focused diff request, Sonnet 4.6 produces production-grade patches at scale. This is where teams using Claude Code for production engineering live by default.
  • Tool-using agents. Sonnet is the right default in agent loops that chain tool calls without going off the rails; a minimal sketch follows this list.
  • RAG synthesis. When you have hybrid retrieval feeding 10 to 30 chunks, Sonnet writes coherent answers grounded in the chunks. This pairs with the patterns covered in building production RAG.
  • Mid-complexity refactors. Renames, extract-method, type-tightening, dead-code removal. Sonnet does these without supervision.
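
As a sketch of the agent-loop default: one turn where Sonnet either calls a tool or answers directly. The search_codebase tool name and schema are illustrative, and the model identifier is assumed from the tiers above:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// One turn of a tool-using loop: Sonnet either calls the (hypothetical)
// search_codebase tool or answers directly.
async function agentTurn(history: Anthropic.Messages.MessageParam[]) {
  return client.messages.create({
    model: "claude-sonnet-4-6", // assumed identifier
    max_tokens: 2048,
    tools: [
      {
        name: "search_codebase",
        description: "Search the repository for symbols, files, or text.",
        input_schema: {
          type: "object",
          properties: { query: { type: "string" } },
          required: ["query"],
        },
      },
    ],
    messages: history,
  });
}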

If you are starting a new Claude integration today and unsure which tier to begin with, start on Sonnet. Add Haiku for the obvious cheap steps. Reserve Opus for the steps where Sonnet visibly fails.

Trait 4: Reserve Opus 4.7 for the top 10 to 15%

Opus is the right answer for a narrow band of work. Use it when the cost of a wrong answer materially exceeds the price of the call.

Real cases where Opus pays for itself:

  • Multi-file refactor planning. "Migrate this Express app to Fastify, list every file affected, the order of changes, and the rollback plan." This is reasoning over a large surface area, not pattern-matching.
  • Architectural review. Reading a system diagram and pointing out the load-bearing assumptions, the failure modes, the missing observability. Sonnet sometimes misses the second-order effects. Opus usually does not.
  • Novel debugging. Bugs that span four services, a queue, and a race condition. Opus's deeper reasoning catches the chain.
  • Compliance and brand voice. Legal language, policy edits, public statements. The 5x cost is worth it when "close enough" is not.
  • Hard scientific or analytical reasoning. Math-heavy proofs, paper analysis, novel research questions where the model needs to actually think rather than retrieve.

Most consumer apps will hit Opus on under 10% of calls. Most engineering teams will hit it on 15 to 20% of agent steps, almost all of them planning steps. Outside those bands, the spend is wasted.

Trait 5: Cache aggressively across all three tiers

Prompt caching is the single biggest cost lever in 2026, and it works on all three Claude tiers. With cache breakpoints set correctly, the static parts of your prompt (system instructions, tool definitions, codebase chunks, few-shot examples) cost about 90% less on cache hits than on the first call.

Practical caching patterns:

  • System prompt cache. Put your full agent prompt, all tool definitions, and your style guide in one cache block; a sketch follows this list. This block is identical across every call from a given agent.
  • Codebase cache. For Claude Code workflows, cache the relevant subtree (the package, the related types, the schema). The first call costs full. Every subsequent call in the session is on cache discount.
  • Few-shot cache. Long lists of input/output examples cache cleanly and keep classification accuracy stable.
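
A minimal sketch of the system-prompt pattern with the TypeScript SDK, using the API's cache_control breakpoint (model identifier assumed, as above):

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const AGENT_PROMPT = "..."; // full agent prompt + tool definitions + style guide

// The cache_control breakpoint tells the API to cache everything up to and
// including this block; later calls with the same prefix pay the cached rate.
async function reasonStep(userTurn: string) {
  return client.messages.create({
    model: "claude-sonnet-4-6", // assumed identifier
    max_tokens: 1024,
    system: [
      { type: "text", text: AGENT_PROMPT, cache_control: { type: "ephemeral" } },
    ],
    messages: [{ role: "user", content: userTurn }],
  });
}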

Cached Sonnet input lands at roughly $0.30 per million tokens, cheaper than uncached Haiku at $1. So if you are running a long-context agent loop with stable system prompts, Sonnet on cache often beats Haiku without cache. Run the math.

The Batch API is the second lever, halving costs on any non-realtime workload. Nightly summarization, weekly digests, async backfills, evals. If the user does not need the answer in the next 5 seconds, batch it.
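
A sketch of the batch pattern with the SDK's Message Batches endpoint. The ticket shape is hypothetical; the params are the same Messages params you would send on a live call:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Queue a nightly digest as one batch: half price, results within 24 hours.
async function submitNightlyDigest(tickets: { id: string; text: string }[]) {
  return client.messages.batches.create({
    requests: tickets.map((t) => ({
      custom_id: t.id, // lets you match results back to tickets
      params: {
        model: "claude-haiku-4-5", // assumed identifier
        max_tokens: 300,
        messages: [{ role: "user", content: `Summarize this ticket:\n${t.text}` }],
      },
    })),
  });
}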

A worked example: per-call costs on a realistic agent

Imagine a coding agent that handles 1,000 user requests a day. Each request:

  • 1 classification step (200 input tokens, 50 output)
  • 1 retrieval step (deterministic, no model)
  • 4 reasoning steps with 8K cached system prompt and 2K fresh tokens (8K cached + 2K input, 500 output each)
  • 1 final synthesis (10K input, 1.5K output)

| Strategy | Daily cost | Notes |
| --- | --- | --- |
| Everything on Opus 4.7 | ~$340 | naive baseline |
| Everything on Sonnet 4.6 | ~$204 | drop-in switch |
| Sonnet + caching | ~$117 | 90% off the 32K cached input on the reasoning steps |
| Routed (Haiku for classify, Sonnet for reasoning, Opus for the occasional hard plan) + caching | ~$116 | Haiku shaves about a dollar a day on this shape; routing pays off as cheap steps carry more tokens |

Numbers shift with prompt shape, but the ranking holds. On this workload caching is the dominant lever, cutting spend roughly 3x against running Opus on everything; routing adds only a little here because classification is a sliver of the tokens, and it compounds on classification- and extraction-heavy workloads. Either way, user-visible quality is indistinguishable on most tasks.
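
To rerun this arithmetic on your own token shapes, the whole model fits in a few lines. A minimal sketch, assuming the list prices from the comparison table above, a flat 90% cached-input discount, and no cache-write premium:

// List prices ($ per million tokens) from the comparison table above.
const PRICE = {
  "claude-haiku-4-5": { input: 1, output: 5 },
  "claude-sonnet-4-6": { input: 3, output: 15 },
  "claude-opus-4-7": { input: 5, output: 25 },
} as const;

const CACHED_RATE = 0.1; // cached input at ~10% of list price

// Dollar cost of one step.
function stepCost(
  model: keyof typeof PRICE,
  freshIn: number,
  cachedIn: number,
  out: number,
): number {
  const p = PRICE[model];
  return (freshIn * p.input + cachedIn * p.input * CACHED_RATE + out * p.output) / 1e6;
}

// One routed request from the agent above: classify, 4 reasoning steps, synthesis.
const perRequest =
  stepCost("claude-haiku-4-5", 200, 0, 50) +
  4 * stepCost("claude-sonnet-4-6", 2_000, 8_000, 500) +
  stepCost("claude-sonnet-4-6", 10_000, 0, 1_500);

console.log(`~$${(1_000 * perRequest).toFixed(0)}/day at 1,000 requests`); // ≈ $117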

How to evaluate this in your team

You can tell a team has internalized tier routing by the answers they give to two questions.

Ask "which Claude model do you call?" If the answer is one model name, they have not done the work. If the answer is "Sonnet by default, Haiku for these three steps, Opus for these two," they have.

Ask "what is your prompt cache hit rate?" If they don't know, caching is not deployed. If it is above 70% on agent loops, they are running an efficient stack.

This is the same instinct we score on the Cadence voice interview. Every Cadence engineer is AI-native by baseline, vetted on Cursor / Claude / Copilot fluency, prompt-as-spec discipline, and verification habits before they unlock bookings. Tier-routing judgment shows up in the same conversation: candidates who default to Opus for everything score lower than candidates who can articulate when Haiku is the right call. Across the 12,800-engineer pool, the cohort that thinks in routers ships faster and cheaper. Median time to first commit on Cadence is 27 hours, partly because engineers do not over-spend on the wrong tier when scaffolding.

Why it matters now

Anthropic's pricing has not dropped meaningfully in 18 months, and probably won't soon. The cost lever you control is routing and caching, not list price. Teams that get this right ship more features per dollar of API spend. Teams that don't end up rationing AI usage or running a smaller, weaker default model.

The same logic shapes vendor strategy. If you are still picking between Anthropic and the alternatives, the framework in choosing between OpenAI, Anthropic, and Google maps to the same routing instinct: pick the right vendor per workload, not one vendor per app.

What to do this week

Three concrete moves:

  1. Audit your top 5 Claude calls. List input tokens, output tokens, and which model. Sort by daily spend.
  2. For each, ask: could Haiku do this? Could caching cut 70% of the input cost? Could Opus be replaced by Sonnet?
  3. Ship a router. Even a 10-line function that switches model by step type will pay back inside a week.

If you are scoping a Claude integration and want a build/buy/book recommendation before you write the router, decide your next feature with the Cadence Build/Buy/Book tool. It will tell you whether to ship in-house, use an off-the-shelf product, or book a Cadence engineer for the week.

If you are deploying a Claude integration this quarter and want an AI-native engineer who already routes between Haiku, Sonnet, and Opus in production, every Cadence engineer is vetted on this exact judgment. Book a Mid at $1,000/week or a Senior at $1,500/week with a 48-hour free trial. Replace the engineer any week, no notice.

FAQ

Which Claude model is best for coding?

Sonnet 4.6 is the default for most coding work. It scores 79.6% on SWE-bench Verified at $3/$15 per million tokens, which is the right floor for production code generation. Escalate to Opus 4.7 (87.6% SWE-bench Verified) only on multi-file refactors or hard architectural work where the extra reasoning visibly pays off.

Is Haiku 4.5 good enough for production?

Yes, for the right jobs. Haiku is a strong production fit for classification, routing, structured extraction, summarization, and moderation. It is not the right pick for open-ended code generation or multi-step planning. Treat it as a fast, cheap, JSON-reliable function call.

How much can prompt caching actually save?

Cached input tokens on Anthropic's API cost roughly 90% less than uncached input tokens. On a long-context agent loop with a stable system prompt, that often means cached Sonnet costs less per call than uncached Haiku. Cache hit rates above 70% on agent loops are realistic and worth designing for.

When does Opus 4.7 actually pay for itself?

Use Opus when the cost of a wrong answer materially exceeds the 5x price premium over Sonnet. That usually means deep planning steps, multi-file refactor design, novel debugging across services, compliance-critical writing, and hard analytical reasoning. For most apps that band is 10 to 15% of calls.

Can I switch models inside one agent run?

Yes, and you should. The Anthropic API lets you call any tier per step. A typical pattern: Haiku classifies the intent, Sonnet does the reasoning steps, Opus handles the one hard planning step, Sonnet synthesizes the final answer. Per-step routing, combined with prompt caching, is the biggest cost lever available in 2026.
