What is Claude Sonnet? Anthropic Mid-Tier Model

Why Sonnet is the production default

Sonnet hits the sweet spot of the cost-quality-speed triangle. It scores within a few percentage points of Opus on most benchmarks (MMLU, HumanEval, MATH) at one-fifth the price, with roughly half the latency. For 90% of business applications — RAG-grounded Q&A, document extraction, summarization, code review, agent reasoning — Sonnet is the right model. Anthropic's own product team uses Sonnet as the default model in Claude.ai for paid users.

Where Sonnet leads

Long-context tasks: 200K token window handles entire codebases or document sets.
Tool use and agents: high tool-call accuracy at a price point that makes agentic loops affordable.
Customer-facing chat: latency is fast enough for real-time conversation.
Code generation: paired with Claude Code, Sonnet handles the bulk of dev work.

Pricing in 2026

Approximately $3 per million input tokens and $15 per million output tokens. With prompt caching, repeated context drops to about $0.30 per million input tokens — a 10x cost reduction for workloads with stable system prompts and document context. This is what makes RAG-heavy customer support and document-extraction workloads economically viable.

When to escalate to Opus or drop to Haiku

Escalate to Opus when the task is multi-step reasoning with compounding errors (legal, financial, complex debugging) or long-horizon agents. Drop to Haiku for high-volume classification, simple extraction, or anywhere latency below 500ms matters more than quality. Anthropic explicitly recommends a model-routing pattern: a cheap model triages, a more capable model handles hard cases.

What it means for your business

When evaluating an AI vendor, ask which Claude tier they default to. Sonnet is the responsible answer for production SMB workloads. Opus-everywhere is wasteful; Haiku-everywhere is brittle.

Claude Opus — Claude Opus is Anthropic's most capable model, tuned for deep reasoning, long context, and agentic coding. Definition, pricing, and when to use it.
Claude Haiku — Claude Haiku is Anthropic's fastest, cheapest model — built for high-volume, low-latency workloads. Definition, pricing, and when to use it.
Large Language Model (LLM) — A Large Language Model is a transformer-based neural network trained on trillions of tokens to predict the next token. Definition, key models, and business use.
Claude Code — Claude Code is Anthropic's terminal-based AI coding agent — reads your repo, runs commands, edits files, and ships PRs. Definition, pricing, and use cases.
Prompt Engineering — Prompt engineering is the practice of writing instructions to LLMs to get reliable, structured output. Definition, techniques, and when to stop optimizing.