
Use Haiku 4.5 ($1/$5 per million tokens) for classification, routing, and structured extraction. Use Sonnet 4.6 ($3/$15) as your production default for code generation, agent loops, and RAG. Use Opus 4.7 ($5/$25) only for the hardest reasoning: deep planning, novel research, complex multi-file refactors. The right answer is rarely one model. It is a router that sends each step to the cheapest tier that can handle it.
Anthropic ships three Claude tiers in 2026, each priced and tuned for a different job. Picking one and using it everywhere is the most common mistake we see in production code.
| Model | Input $/Mtok | Output $/Mtok | Context | SWE-bench Verified | Best at |
|---|---|---|---|---|---|
| Haiku 4.5 | $1.00 | $5.00 | 200K (1M beta) | mid-60s | classification, routing, extraction, summarization |
| Sonnet 4.6 | $3.00 | $15.00 | 200K (1M beta) | 79.6% | production default, agent loops, code generation, RAG |
| Opus 4.7 | $5.00 | $25.00 | 1M | 87.6% | hardest reasoning, multi-file refactor, deep planning |
The Opus-to-Haiku gap is 5x on input tokens and 5x on output. On most agent loops the per-step cost difference compounds fast: a 50-step research agent that costs about a dollar on Haiku runs closer to five on Opus.
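The compounding is easy to check. A back-of-envelope sketch in TypeScript, where the per-step token counts (10K input, 2K output) are illustrative assumptions, not measured figures:

```typescript
// Per-million-token prices from the tier table above.
const PRICES = {
  haiku: { input: 1, output: 5 },
  opus: { input: 5, output: 25 },
} as const;

// Cost in dollars for an agent run at a fixed per-step token shape.
// Integer arithmetic first, one division last, so the result is exact.
function runCost(
  tier: keyof typeof PRICES,
  steps: number,
  inputTokensPerStep: number,
  outputTokensPerStep: number,
): number {
  const { input, output } = PRICES[tier];
  return (
    steps * (inputTokensPerStep * input + outputTokensPerStep * output) / 1_000_000
  );
}

runCost("haiku", 50, 10_000, 2_000); // $1.00 for the 50-step agent
runCost("opus", 50, 10_000, 2_000);  // $5.00 for the same run
```

The 5x ratio holds at any token shape because both input and output prices scale by the same factor between the two tiers.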
The trick is that you almost never need a single tier across the whole loop. The classifier step does not need Opus. The planner step usually does not need Haiku. Mixing is the win.
Three concrete shifts make tier-routing more useful than it used to be.
First, Haiku got smart enough to be a real worker. Haiku 4.5 holds its own on structured extraction, intent classification, and tool-call routing. Two years ago this kind of small-model work needed prompt-engineering gymnastics. In 2026 it just works, and at $1 per million input tokens it is a third of Sonnet's cost.
Second, Sonnet 4.6 closed most of the gap to Opus on coding. SWE-bench Verified puts Sonnet at 79.6% and Opus 4.7 at 87.6%. The 8-point gap matters on the hardest tickets but not on the median PR. For most production code-generation work, Sonnet is the new default and Opus is the escalation lane.
Third, prompt caching matured. With proper cache breakpoints, you can pay 90% less on the part of your prompt that does not change. That changes the cost calculus for long system prompts, codebase context, and few-shot examples. Cached input is usually cheaper on Sonnet than uncached input on Haiku.
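A quick sanity check on that claim: the blended input price under a given cache hit rate, assuming the quoted 90% discount on cache hits (cache-write surcharges are ignored in this sketch):

```typescript
// Blended effective input price per million tokens, assuming cache hits
// cost 10% of the base price and misses cost full price.
function blendedInputPrice(basePerMtok: number, cacheHitRate: number): number {
  const cachedPrice = basePerMtok * 0.1;
  return basePerMtok * (1 - cacheHitRate) + cachedPrice * cacheHitRate;
}

blendedInputPrice(3, 0.9); // Sonnet at a 90% hit rate: ~$0.57/Mtok, under Haiku's $1
```

At a 90% hit rate, even the blended Sonnet input price (misses included) undercuts uncached Haiku.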
Most teams pick one model and call it from every code path. Stop doing that. Model picking is a step-level decision, not an app-level decision.
A typical agent loop looks like this:

1. Classify the user's intent.
2. Route the request and extract structured parameters.
3. Run the tool call (deterministic code).
4. Reason over the results.
5. Synthesize the final answer.
Steps 1 and 2 are routing problems. Haiku handles them. Step 3 is deterministic code, no model. Step 4 is mostly Sonnet, occasionally Opus when the loop is stuck. Step 5 is Sonnet by default, Opus when the user is asking for a deep analysis.
A simple TypeScript router, the kind we use in production at Cadence, looks like:
```typescript
type AgentStep = {
  kind: "classify" | "extract" | "plan" | "reason" | "synthesize";
  complexity?: "low" | "high";
};

function pickModel(step: AgentStep): string {
  if (step.kind === "classify" || step.kind === "extract") return "claude-haiku-4-5";
  if (step.kind === "plan" && step.complexity === "high") return "claude-opus-4-7";
  return "claude-sonnet-4-6";
}
```
That small function plus a complexity heuristic typically cuts API spend 60 to 70% versus running everything on Sonnet, with no measurable quality drop on user-facing outputs.
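The complexity heuristic itself can stay crude. One hypothetical sketch, where the step fields and thresholds are illustrative rather than tuned values:

```typescript
type PlanStep = {
  filesTouched: number;  // hypothetical signal: breadth of the change
  priorFailures: number; // hypothetical signal: how often the loop got stuck here
};

// Escalate to "high" complexity when the step spans many files or the
// loop has already failed on it; everything else stays "low" (Sonnet).
function complexity(step: PlanStep): "low" | "high" {
  if (step.filesTouched >= 5) return "high";
  if (step.priorFailures >= 2) return "high";
  return "low";
}

complexity({ filesTouched: 8, priorFailures: 0 }); // "high" -> escalate to Opus
complexity({ filesTouched: 1, priorFailures: 0 }); // "low"  -> stay on Sonnet
```

Retry count is a particularly cheap signal: a step that has already failed twice on Sonnet is exactly the escalation lane Opus exists for.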
Anywhere you currently maintain brittle regex, intent rules, or hand-tuned extraction code is a place Haiku now beats your existing logic on accuracy and rivals it on cost.
Concrete jobs Haiku handles well:

- Intent classification and tool-call routing
- Structured extraction to JSON
- Summarization of documents and threads
- Content moderation
The mental model: Haiku is a fast, cheap, JSON-mode-reliable function call. Treat it like one.
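Treating it like a function call means validating the JSON it returns the way you would validate any untrusted input. A minimal sketch, where the `Intent` shape is a made-up example schema:

```typescript
type Intent = { label: string; confidence: number };

// Parse and validate a model's JSON output; return null on anything
// malformed so the caller can fall back to a retry or a default route.
function parseIntent(raw: string): Intent | null {
  try {
    const value = JSON.parse(raw);
    if (
      typeof value === "object" && value !== null &&
      typeof value.label === "string" &&
      typeof value.confidence === "number" &&
      value.confidence >= 0 && value.confidence <= 1
    ) {
      return { label: value.label, confidence: value.confidence };
    }
    return null;
  } catch {
    return null;
  }
}

parseIntent('{"label": "billing", "confidence": 0.93}'); // { label: "billing", confidence: 0.93 }
parseIntent("not json");                                 // null
```

The null branch is the point: a cheap classifier is only production-grade if a bad output degrades to a safe default instead of crashing the loop.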
For most engineering teams shipping with Claude in 2026, Sonnet is the answer 70 to 80% of the time. It is the right floor for code generation, RAG synthesis, agent-loop reasoning, and customer-facing chat.
Specifically, Sonnet wins on:

- Code generation for the median PR
- RAG synthesis over retrieved context
- Agent-loop reasoning steps
- Customer-facing chat
If you are starting a new Claude integration today and unsure which tier to begin with, start on Sonnet. Add Haiku for the obvious cheap steps. Reserve Opus for the steps where Sonnet visibly fails.
Opus is the right answer for a narrow band of work. Use it when the cost of a wrong answer materially exceeds the price of the call.
Real cases where Opus pays for itself:

- Deep planning at the top of a long agent run
- Multi-file refactor design
- Novel debugging across services
- Compliance-critical writing
- Hard analytical reasoning
Most consumer apps will hit Opus on under 10% of calls. Most engineering teams will hit it on 15 to 20% of agent steps, almost all of them planning steps. Outside those bands, the spend is wasted.
Prompt caching is the single biggest cost lever in 2026, and it works on all three Claude tiers. With cache breakpoints set correctly, the static parts of your prompt (system instructions, tool definitions, codebase chunks, few-shot examples) cost about 90% less on cache hits than on the first call.
Practical caching patterns:

- Put the static parts of the prompt (system instructions, tool definitions, codebase chunks, few-shot examples) first, and set the cache breakpoint after them
- Keep that prefix byte-for-byte stable across calls; any change to it invalidates the cache
- Track your cache hit rate; above 70% on agent loops is a realistic target
Cached Sonnet input lands at roughly $0.30 per million tokens, cheaper than uncached Haiku at $1. So if you are running a long-context agent loop with stable system prompts, Sonnet on cache often beats Haiku without cache. Run the math.
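The break-even point falls straight out of the blended-price algebra. A sketch, assuming the 90% cache discount and the list prices above:

```typescript
// Minimum cache hit rate h at which a cached tier's blended input price
// drops to a cheaper tier's uncached price. Solves:
//   base * (1 - h) + 0.1 * base * h = rival   =>   h = (base - rival) / (0.9 * base)
function breakEvenHitRate(basePerMtok: number, rivalPerMtok: number): number {
  return (basePerMtok - rivalPerMtok) / (0.9 * basePerMtok);
}

breakEvenHitRate(3, 1); // ~0.74: above a 74% hit rate, cached Sonnet input beats uncached Haiku
```

That ~74% threshold is why a 70%+ hit rate on agent loops is the number worth designing toward.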
The Batch API is the second lever, halving costs on any non-realtime workload. Nightly summarization, weekly digests, async backfills, evals. If the user does not need the answer in the next 5 seconds, batch it.
Imagine a coding agent that handles 1,000 user requests a day. Each request:

- runs one classification step to figure out what the user wants
- runs several reasoning steps over a large, mostly static context
- occasionally needs one hard planning step
- ends with a synthesis step that writes the answer
| Strategy | Daily cost | Notes |
|---|---|---|
| Everything on Opus 4.7 | ~$78 | naive baseline |
| Everything on Sonnet 4.6 | ~$47 | drop-in switch |
| Sonnet + caching | ~$19 | 90% off cached input on reasoning steps |
| Routed (Haiku for classify, Sonnet for reasoning, Opus for hard plan) + caching | ~$12 | ~75% under naive Sonnet |
Numbers shift with prompt shape, but the ranking holds. Routing plus caching is roughly 6x cheaper than running Opus on everything, and the user-visible quality is indistinguishable on most tasks.
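The first two rows of that table are reproducible with simple arithmetic. Under one assumed request shape (10K input and 1.1K output tokens per request, chosen for illustration), the naive baselines land where the table puts them; the cached and routed rows depend on prompt shape and hit rate, so they are not modeled here:

```typescript
const TIERS = {
  sonnet: { input: 3, output: 15 },
  opus: { input: 5, output: 25 },
} as const;

// Daily spend in dollars for N requests at a fixed token shape.
function dailyCost(
  tier: keyof typeof TIERS,
  requestsPerDay: number,
  inputTokens: number,
  outputTokens: number,
): number {
  const { input, output } = TIERS[tier];
  return requestsPerDay * (inputTokens * input + outputTokens * output) / 1_000_000;
}

dailyCost("opus", 1_000, 10_000, 1_100);   // $77.50/day, ~ the $78 baseline
dailyCost("sonnet", 1_000, 10_000, 1_100); // $46.50/day, ~ the $47 row
```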
You can tell a team has internalized tier routing by the answers they give to two questions.
Ask "which Claude model do you call?" If the answer is one model name, they have not done the work. If the answer is "Sonnet by default, Haiku for these three steps, Opus for these two," they have.
Ask "what is your prompt cache hit rate?" If they don't know, caching is not deployed. If it is above 70% on agent loops, they are running an efficient stack.
This is the same instinct we score on the Cadence voice interview. Every Cadence engineer is AI-native by baseline, vetted on Cursor / Claude / Copilot fluency, prompt-as-spec discipline, and verification habits before they unlock bookings. Tier-routing judgment shows up in the same conversation: candidates who default to Opus for everything score lower than candidates who can articulate when Haiku is the right call. Across the 12,800-engineer pool, the cohort that thinks in routers ships faster and cheaper. Median time to first commit on Cadence is 27 hours, partly because engineers do not over-spend on the wrong tier when scaffolding.
Anthropic's pricing has not dropped meaningfully in 18 months, and probably won't soon. The cost lever you control is routing and caching, not list price. Teams that get this right ship more features per dollar of API spend. Teams that don't end up rationing AI usage or running a smaller, weaker default model.
The same logic shapes vendor strategy. If you are still picking between Anthropic and the alternatives, our framework for choosing between OpenAI, Anthropic, and Google maps to the same routing instinct: pick the right vendor per workload, not one vendor per app.
Three concrete moves:
If you are scoping a Claude integration and want a build/buy/book recommendation before you write the router, decide your next feature with the Cadence Build/Buy/Book tool. It will tell you whether to ship in-house, use an off-the-shelf product, or book a Cadence engineer for the week.
If you are deploying a Claude integration this quarter and want an AI-native engineer who already routes between Haiku, Sonnet, and Opus in production, every Cadence engineer is vetted on this exact judgment. Book a Mid at $1,000/week or a Senior at $1,500/week with a 48-hour free trial. Replace the engineer any week, no notice.
**Is Sonnet 4.6 good enough for production coding?**

Sonnet 4.6 is the default for most coding work. It scores 79.6% on SWE-bench Verified at $3/$15 per million tokens, which is the right floor for production code generation. Escalate to Opus 4.7 (87.6% SWE-bench Verified) only on multi-file refactors or hard architectural work where the extra reasoning visibly pays off.
**Is Haiku 4.5 viable for production workloads?**

Yes, for the right jobs. Haiku is a strong production fit for classification, routing, structured extraction, summarization, and moderation. It is not the right pick for open-ended code generation or multi-step planning. Treat it as a fast, cheap, JSON-reliable function call.
**How much does prompt caching actually save?**

Cached input tokens on Anthropic's API cost roughly 90% less than uncached input tokens. On a long-context agent loop with a stable system prompt, that often means cached Sonnet costs less per call than uncached Haiku. Cache hit rates above 70% on agent loops are realistic and worth designing for.
**When is Opus 4.7 worth the price?**

Use Opus when the cost of a wrong answer materially exceeds the 5x price premium over Sonnet. That usually means deep planning steps, multi-file refactor design, novel debugging across services, compliance-critical writing, and hard analytical reasoning. For most apps that band is 10 to 15% of calls.
**Can I mix Claude models in a single agent loop?**

Yes, and you should. The Anthropic API lets you call any tier per step. A typical pattern: Haiku classifies the intent, Sonnet does the reasoning steps, Opus handles the one hard planning step, Sonnet synthesizes the final answer. Routing per step is the single biggest cost optimization available in 2026.