Claude Opus vs Sonnet vs Haiku: Pricing 2026

Book a Free Strategy Call

Skip the read: talk to Walid in 30 min.

Free strategy call. We map your AI engineering team, you keep the notes.

TL;DR

The decision rule for Opus vs Sonnet vs Haiku: Sonnet for most production work and everyday coding, Opus for the hardest reasoning (deep refactors, complex planning, code review), Haiku for high-volume, low-latency work like classification and routing. As of mid-2026, Claude ships in three tiers and the right pick depends entirely on the task, not the brand.

Opus is the heavyweight: best reasoning, best long-horizon agent behavior, slow, and the most expensive token-per-token. Sonnet is the default workhorse most teams run all day: strong code, strong tool use, fast enough for interactive work, priced for real production. Haiku is the budget engine: near-instant, cheap enough to throw at high-volume classification, routing, batch enrichment, and front-line tool calls.

The wrong default, usually "always Opus" or "always Sonnet," is where AI infra bills spiral. The right pattern in 2026 is a routed stack: Haiku triages, Sonnet builds, Opus reviews. This guide shows when each model wins and how we wire that routing inside production agents.

Claude's family has split further apart in 2026. A year ago, picking between Opus and Sonnet was largely a "do I have budget?" question: the capability gap on most everyday tasks was small. That has changed. Opus has pulled ahead on multi-step agent loops, complex code refactors, and tasks with deep dependency chains. Sonnet has gotten dramatically faster and cheaper while holding its ground on day-to-day coding and tool use. Haiku has quietly become the most underrated model in the lineup; it now handles a surprising amount of work that teams reflexively send to Sonnet.

The hard part is that Anthropic's own marketing flattens the distinction. The pricing page lists three tiers, but it does not tell you that running Opus for a 30-step background agent costs roughly 5x what Sonnet costs for the same workflow, or that Haiku can route 80% of inbound classifier traffic with no measurable quality drop. Most teams find this out by looking at their bill.

This guide compares Claude Opus, Sonnet, and Haiku in 2026. Real specs where they are publicly documented (hedged "as of mid-2026" because Anthropic updates these without notice), honest tradeoffs, head-to-head matchups per task, and the cost-optimization routing patterns we run in production at AY Automate.

At a glance: the 3-way comparison table

The table below uses Anthropic's publicly listed pricing and capabilities as of mid-2026. Pricing is API list; Bedrock, Vertex, and enterprise contracts vary.

Dimension	Opus	Sonnet 5	Haiku
Price / 1M input tokens	~$15	~$2 (intro)	~$0.80
Price / 1M output tokens	~$75	~$10 (intro)	~$4
Output speed	Slow (20-60 tok/s)	Fast (60-120 tok/s)	Very fast (150-250+ tok/s)
Context window	200K (1M tier available)	200K (1M tier available)	200K
Reasoning depth	Best in family	Strong	Functional, shallow
Code generation	Best on complex refactors	Excellent (default for coding)	OK for small edits, weak on architecture
Tool use / agents	Best on long horizons	Strong, production-ready	Reliable for narrow tools
Vision	Yes	Yes	Yes
Best agent fit	Planner, reviewer, complex orchestrator	Builder, coder, primary agent	Triage, classification, sub-agent worker
Sweet spot	Hard problems, low volume	Most production work	High-volume, low-latency tasks

A few notes on the table. Token prices have drifted downward across 2025 and into 2026; assume Anthropic will keep cutting them. Speed numbers are observational, not contractual; expect variance by region and load. Context windows of 1M tokens are gated behind a separate tier on the API as of mid-2026, not the default 200K experience.

Free weekly brief

Steal our production automations

The exact n8n flows, Claude Code setups, and prompts we ship for clients, broken down step by step. No spam, unsubscribe anytime.

Opus deep dive

Opus is the model you reach for when the cost of a wrong answer is higher than the cost of the call. It is the strongest reasoner in the Claude family, the best at staying coherent across long multi-step plans, and the most capable when a task requires holding many constraints in mind simultaneously: large refactors, architecture decisions, security review, complex agent planning loops.

The tradeoff is real. Opus is roughly 5x the price of Sonnet on input tokens and 5x on output. It is also slower, sometimes noticeably so on long completions. For interactive use (a developer chatting with an IDE assistant), Opus often feels heavy. For background work (an agent running for 10 minutes to refactor a service), the speed cost evaporates and the quality matters.

Where Opus wins hardest in 2026:

Deep refactors across many files. Sonnet will sometimes lose the thread on the 8th file edit. Opus holds the architectural intent.
Plan-then-execute agents. Using Opus as the planner and Sonnet/Haiku as executors is a pattern we run in production. The planner only fires occasionally; the executors do the volume.
Code review. Opus catches subtle bugs Sonnet misses: race conditions, off-by-one errors in async code, security issues that require chaining multiple observations.
Complex retrieval-augmented reasoning. When the answer requires synthesizing 12 documents into one coherent response with cited evidence, Opus is more reliable.

Where Opus is the wrong choice: anything you want to run at high volume, anything that needs sub-second responses, anything where Sonnet's answer would be 95% as good for 20% of the price. For most "production chatbot" work, Opus is overkill.

Sonnet deep dive

Sonnet is the model you build your product on. It is the default for almost every team running Claude in production, and there is a reason: it hits a price-performance ratio that is hard to beat in 2026. Coding quality is excellent, only marginally behind Opus on most real tasks. Tool use is rock solid. Context handling is mature. Latency is good enough for interactive UIs.

The 2026 version of Sonnet has closed a meaningful chunk of the gap to Opus on everyday work. Where the gap shows is on the hardest 5-10% of tasks: very large refactors, deep architectural reasoning, multi-hop planning with shifting constraints. For everything else (writing a feature, fixing a bug, generating tests, calling tools, drafting copy, summarizing documents), Sonnet is the right call.

Sonnet's superpower is that it runs Claude Code well. The IDE assistant, the agent SDK, the background coding work: Sonnet is what most teams ship into production for these use cases. It is fast enough for autocomplete-adjacent work, cheap enough to run all day, and good enough that you rarely have to escalate to Opus.

Where Sonnet is the right default in 2026:

Claude Code as the daily driver. This is the workflow our Claude Code agency practice deploys most often. Sonnet handles 90%+ of work in the IDE; the other 10% gets escalated.
Customer-facing chatbots. The latency and cost profile fits real product UX.
Agent SDK builds. Sonnet's tool-call reliability is excellent. Production agents we ship usually run Sonnet for the main loop.
Document analysis and RAG. Strong synthesis, fast enough to feel responsive.

Where Sonnet is wrong: when you need the absolute best reasoning (use Opus) or when the task is so simple and high-volume that Haiku would do the job at a quarter of the price. Sonnet is the default, but it is not the only answer.

Haiku deep dive

Haiku is the most underused model in the Claude family. Most teams reflexively send everything to Sonnet because "Haiku is the small one." That instinct costs them money. In 2026, Haiku is fast, often 150+ tokens per second, and cheap enough that you can route enormous traffic through it without flinching at the bill.

Haiku's strengths are narrow but real. It is excellent at classification, intent detection, routing, short-form generation, and tool calls that follow a constrained schema. It is reliable for "is this email spam or not", "which of these 12 categories does this ticket belong to", "extract the company name from this paragraph", "summarize this in one sentence". For these tasks, Haiku's quality is essentially indistinguishable from Sonnet's at a fraction of the cost and a fraction of the latency.

What Haiku is not: a general-purpose coder, an architect, a long-horizon agent. Push it into those tasks and the quality drop is sharp. The error rate climbs, the reasoning gets shallow, and the model starts hallucinating tool calls. The mistake teams make is using Haiku for the wrong thing, declaring "Haiku is bad", and going back to Sonnet for everything, when the right answer was Haiku for the narrow task and Sonnet for the broad one.

Where Haiku wins in 2026:

Triage and routing. First-line classification before handing off to a heavier model.
Batch enrichment. Cleaning, labeling, normalizing thousands of records.
Sub-agent workers. In a multi-agent system, the orchestrator runs Sonnet or Opus; the workers run Haiku.
Tight tool calls. When the schema is fixed and the task is "call function X with these arguments", Haiku is reliable and fast.
Real-time UX. Autocomplete-style features, instant suggestions, anywhere latency dominates quality.

Where Haiku breaks: complex code, multi-step plans, long context synthesis, anything requiring deep reasoning. Use the right tool for the job.

Head-to-head: which task each wins

A practical map of which model to reach for, task by task, as of mid-2026.

Writing a new feature in a real codebase. Sonnet wins. Opus is overkill unless the feature touches 10+ files or has gnarly architectural implications. Haiku is too shallow.

Refactoring a complex module across many files. Opus wins on the hardest cases. Sonnet wins on most everyday refactors. Haiku does not compete here.

Writing unit tests for an existing module. Sonnet wins. Haiku can do simple cases. Opus is wasted budget.

Code review on a non-trivial PR. Opus wins. Sonnet is good; Opus catches more. Haiku misses too much.

Classifying an incoming support ticket into one of 20 categories. Haiku wins. Sonnet is wasteful. Opus is comical.

Long-running autonomous coding agent (30+ steps). Opus wins on planning quality. Sonnet wins on cost. The right answer is usually a routed stack: Opus plans, Sonnet executes.

Customer-facing chat assistant. Sonnet wins. The latency-quality-cost profile is right.

Extracting structured data from 50,000 PDFs. Haiku wins. The volume makes anything else uneconomic.

RAG question-answering over a 200-document corpus. Sonnet wins for most queries. Escalate to Opus for the hardest 5%.

Voice assistant where latency matters more than depth. Haiku wins. Speed is the product.

Generating production-quality marketing copy. Sonnet wins. Opus on the hero pages where every word counts.

Building an internal tool with Claude Code over a week. Sonnet for the daily work, Opus on demand for the hard parts. This is the pattern most of our Claude Code agency engagements settle into.

Cost optimization patterns

The single biggest waste of money in 2026 Claude deployments is sending every request to the same model. The teams that pay 5x what they need to are the ones running Opus by default. The teams that ship 3x slower than they could are the ones running Haiku for everything. The right pattern is routing.

Pattern 1: Triage → Build → Review. Haiku reads incoming work and decides what kind of task it is. Sonnet does the actual work. Opus reviews the output before it ships. This pattern is right for any high-stakes agent: coding assistants, customer ops automation, content generation pipelines. Haiku is essentially free at this volume; Sonnet does the bulk; Opus only fires when the review step actually catches something.

Pattern 2: Planner / Executor split. Opus generates the plan. Sonnet executes each step. Haiku handles the narrow tool calls within each step. The planner only runs once or twice per task; the executor runs many times. Net cost lands close to Sonnet-only pricing but the plan quality is closer to Opus-only. This is the pattern we run for long-horizon coding agents.

Pattern 3: Confidence-gated escalation. Start every request on Haiku. If the model's confidence is below a threshold (or if a downstream validator catches an error), escalate to Sonnet. If Sonnet's confidence is still low, escalate to Opus. Most traffic terminates at Haiku. The remaining 10-20% gets the heavier model. This works extremely well for classification and extraction at scale.

Pattern 4: Time-of-day routing. During business hours when latency matters, route to Sonnet. Overnight, when batch jobs run, route the same workload to Opus for higher quality. Cost evens out because nighttime traffic is lower volume.

Pattern 5: Prompt-caching layered on top. Independent of model choice, prompt caching cuts repeated-context costs by ~90% on cache hits. Any of the patterns above gets dramatically cheaper once caching is wired in.

The teams that hit Claude's pricing wall are almost always running pattern zero: "everything goes to Sonnet, or everything goes to Opus." A 20-minute routing refactor commonly cuts AI infra costs by 60-80% with no quality loss.

Real-world routing examples

A few concrete patterns we have shipped or seen ship in 2026.

Customer support triage at a SaaS company. Inbound tickets first hit Haiku for category classification, urgency scoring, and sentiment. Roughly 70% are routine and get a Haiku-generated draft response a human reviews. The remaining 30% get escalated to Sonnet for nuanced replies. Tickets flagged as legal, contractual, or escalation-risk go to a human queue with an Opus-generated brief attached. Net cost per ticket: ~$0.02. Time-to-first-response: under 30 seconds.

Codebase migration agent. Migrating a 200-file TypeScript codebase from one framework to another. Opus generates the migration plan once. Sonnet does the per-file rewrites, one file at a time, in parallel. Haiku handles narrow tasks like "rename this import everywhere it appears" and "extract the function signatures from this file." The agent runs unattended for ~6 hours. Opus tokens: maybe 200K total. Sonnet tokens: tens of millions. Haiku tokens: also tens of millions, at near-free pricing.

Sales enrichment pipeline. A list of 50,000 companies needs to be enriched with industry classification, employee-count buckets, and a sentence summary. Haiku handles all of it. Opus is never called. Sonnet is called only for the ~500 edge cases where Haiku flagged low confidence. Total cost: a few hundred dollars instead of a few thousand.

Multi-agent research assistant. Orchestrator runs Opus and decides which sub-agents to spawn. Each sub-agent runs Sonnet and goes off to research one angle. Sub-agents have narrow Haiku-powered tools for things like fetching pages, parsing PDFs, and extracting tables. The Opus orchestrator only fires a few times per session; the long tail is Sonnet and Haiku. Quality stays at Opus-level because the planning is Opus-level; cost stays close to Sonnet-level because the volume is Sonnet-and-below.

Real-time coding assistant inside an IDE. Sonnet handles all interactive completions and chat. When the user explicitly invokes a "deep refactor" or "review my PR" command, the request is routed to Opus. Haiku powers a separate ambient feature (fast inline suggestions and small edit predictions) that runs constantly without breaking the budget.

Pick the pattern that matches your workload. Default-routing is the trap.

Want help routing Claude in production?

We design and ship Claude-native systems for teams that are past the "let's try ChatGPT" phase and need a routed, cost-optimized stack in production. That includes picking the right model per task, wiring prompt caching, building agent loops with the right orchestrator-executor split, and shipping it inside Claude Code workflows your team can maintain. If you are sitting on a Claude bill that is bigger than it should be, or an agent that is slower than it should be, talk to our AI agent development and Claude Code agency team. We will scope the right routing pattern, ship it, and hand off the playbook. Book a consultation and we will look at your current setup.

FAQ

What is the actual difference between Opus, Sonnet, and Haiku?

They are three tiers of the same Claude family, trained by Anthropic, sharing the same safety and tool-use foundations, but tuned to different points on the price-speed-quality curve. Opus is the strongest reasoner, slowest, most expensive. Sonnet is the balanced default. Haiku is the smallest, fastest, and cheapest. As of mid-2026, prices are roughly $15/$75 per million input/output tokens for Opus, $2/$10 (intro, until Aug 2026) for Sonnet 5, and $0.80/$4 for Haiku.

Should I just always use Opus to be safe?

No. For most production work, Sonnet is 90-95% as good for 20% of the cost and 2-3x the speed. Running Opus by default is the most common way teams blow their AI budget in 2026. Reserve Opus for genuinely hard tasks: deep refactors, complex planning, security-sensitive review.

Is Haiku good enough for production?

For the right tasks, yes, and it is excellent. Classification, routing, batch enrichment, narrow tool calls, real-time UX. For complex code or long-horizon agent work, no. Use Haiku where its strengths fit; do not push it into reasoning-heavy work and then conclude "Haiku is bad." That is using the wrong tool.

Which model should I use inside Claude Code?

Sonnet is the default for everyday coding work inside Claude Code. Escalate to Opus on hard problems: large refactors, architecture decisions, PR review. Haiku does not really fit as a primary Claude Code model; it shows up more in agent-side tooling and sub-tasks.

How do I cut my Claude bill without losing quality?

Three levers. First, route: send the right model to the right task instead of running everything on one tier. Second, cache: prompt caching cuts repeated-context costs by ~90% on cache hits. Third, batch: for non-realtime work, Anthropic's batch API offers significant discounts. Combined, most teams cut costs 50-80% with no measurable quality drop.

Is Claude better than GPT for coding in 2026?

Sonnet and Opus are very competitive on coding work and ahead on many real-world agent and refactor tasks as of mid-2026, especially when run inside Claude Code. The gap on raw chat-style completion is narrower.

Do all three models support tool use and vision?

Yes, as of mid-2026 all three tiers support tool use and vision. Reliability and complexity-handling differ: Opus is best on multi-step tool chains, Sonnet is excellent and production-ready, Haiku is reliable on narrow, well-defined tool calls but degrades when the tool-use loop gets long or branching.

Can I switch models without changing my code?

Mostly yes. The Anthropic API uses the same request shape across all three models: you change a model string and that is usually it. Prompts may need light tuning when moving between tiers (Haiku often benefits from more explicit instructions; Opus tolerates looser prompting). Building a routed system that picks the right model per request is a separate architecture concern, and it is the highest-ROI work most teams skip. Our AI agent development service and Claude Code agency practice does this routing design as a standard part of production builds; book a consultation if you want help.

Book a Free Strategy Call

Building this in production?

Walid runs a 30-min call to map your AI engineering team. Free, no slides.

Or send us a brief →

Free weekly brief

Steal our production automations

The exact n8n flows, Claude Code setups, and prompts we ship for clients, broken down step by step. No spam, unsubscribe anytime.

Share this article

About the Author

Boulanouar Walid

Founder & CEO

Walid founded AY Automate to help businesses ship AI workflows that actually move revenue. He leads strategy and oversees every client engagement end-to-end.

Full Bio →