I run demand gen at Docket, and my team automates a lot of GTM work with Claude: lead research, list enrichment, content drafts, classification, internal tooling. At that volume the model you pick is not a detail. It is a line item. And the most expensive mistake I see, over and over, is teams defaulting to the biggest model for everything because it feels safe.
It is not safe. It is just expensive. A classification job that runs fine on Haiku does not get more correct on Opus 4.8. You just pay five times more on output for the same yes-or-no answer. The skill is not picking the smartest model. The skill is matching the model to the job, and most teams never build that habit.
This guide lays out the Claude models you can actually use as of June 2026: what each is for, the real differences in context, output, price, speed, and reasoning, and a decision framework you can apply to any task. It also covers Fable 5, which was briefly the ceiling and has since been withdrawn from availability, so you know where it fit and why you cannot reach for it today. Pricing and models move fast, so treat the numbers here as the source of truth for June 2026 and always check the official models and pricing pages before you wire anything up.
The lineup at a glance
Here is the whole roster, with API pricing per million tokens (input / output) as of June 2026. These four are the models you will actually choose between for most work.
Read the price columns as a pair, not a single number. Output tokens cost five times what input tokens cost on every model. That ratio matters more than the headline price, and I will come back to it in the cost math section.
- Claude Opus 4.8 (claude-opus-4-8): 1M context, 128K output, $5 / $25. The current ceiling for complex reasoning and agentic coding.
- Claude Fable 5 (claude-fable-5): was 1M context, up to 128K output, $10 input / $50 output, and briefly the most capable model. Anthropic has withdrawn it from availability under a US government order, so you cannot call it today.
- Claude Sonnet 4.6 (claude-sonnet-4-6): 1M context, 64K output, $3 / $15. Best balance of speed and intelligence. The everyday workhorse.
- Claude Haiku 4.5 (claude-haiku-4-5): 200K context, 64K output, $1 / $5. Fastest and cheapest, built for scale and low latency.
- Claude Mythos 5 (claude-mythos-5): limited availability, focused on defensive cybersecurity. Not a general-purpose pick, so do not reach for it for normal GTM work.
TipBigger context windows do not carry a long-context premium on these models. A 900K-token prompt on Sonnet 4.6 is billed at the same per-token rate as a 5K-token one. The cost comes from the token count, not from crossing a context threshold.
Claude Fable 5: briefly the ceiling, now withdrawn
For a few days, Fable 5 sat above Opus as the most capable model Anthropic shipped. It carried a 1M context window, up to 128K output tokens, and priced at $10 input / $50 output. Adaptive thinking was always on, which meant the model decided how much to reason per request rather than you setting a fixed thinking budget.
That window closed fast. Anthropic pulled Fable 5 from general availability under a US government order, only days after it launched. As of June 2026 you cannot call it: its status is withdrawn, not deprecated, and there is no API access today. So when you see Fable 5 referenced as the top tier, treat that as historical.
Because it was live for only a few days, nobody has meaningful production experience with Fable 5, including me. Treat any claim of deep Fable 5 expertise, mine or anyone else's, with skepticism. The practical takeaway is simple: Opus 4.8 is the real, usable ceiling, and that is where your hardest work should go.
TipFable 5 is currently unavailable. Anthropic withdrew it from general availability under a US government order within days of launch, so the specs above are for reference only. Plan around Opus 4.8 as the top tier you can actually use.
Claude Opus 4.8: complex reasoning and agentic coding
Opus 4.8 is the top usable tier: 1M context, 128K output, $5 input / $25 output. It is built for complex reasoning and agentic coding, and it is the model I reach for whenever a task is genuinely hard. Fable 5 was briefly more capable, but with it withdrawn from availability, Opus 4.8 is the current ceiling in practice.
The agentic coding angle is the clearest case. When Claude is driving a tool loop, reading files, running commands, and editing code across many steps, the quality of each decision compounds. A model that picks the right next action saves you the tokens and time of recovering from a wrong one. That is where Opus 4.8 pays for itself: long-horizon work where coherence across steps matters more than the cost of a single call.
Opus 4.8 uses adaptive thinking and supports an effort setting up to xhigh, which is the sweet spot for most coding and agentic use. For one-shot questions, classification, or content drafts, Opus is overkill. The intelligence is real, but you are paying for reasoning depth that a simpler task never uses.
Claude Sonnet 4.6: the everyday workhorse
Sonnet 4.6 is the model my team runs by default, and I think it should be yours too. It gives you 1M context, 64K output, and $3 input / $15 output: the best balance of speed and intelligence in the lineup. It is fast enough for interactive work, smart enough for the large majority of real tasks, and priced so that high volume does not bankrupt you.
The mental shift I push on people is this: do not start by asking whether Sonnet is good enough. Start on Sonnet and only move when you have evidence it is not. For lead research, drafting outbound copy, summarizing call transcripts, extracting structured data, rewriting content, and routing tickets, Sonnet 4.6 handles the job at a fraction of Opus or Fable cost.
When you do hit a ceiling, the answer is often not a bigger model. Raise the effort setting, add better examples to the prompt, or split the task. I move to Opus only after Sonnet has demonstrably failed on a representative sample, not on a hunch that the task feels important.
TipSonnet 4.6 supports the effort setting and adaptive thinking. If Sonnet output feels shallow on a hard task, try effort high before you assume you need Opus. It is a much cheaper experiment.
Claude Haiku 4.5: scale and low latency
Haiku 4.5 is the fastest and cheapest model: 200K context, 64K output, $1 input / $5 output. It is the only model with a smaller context window, but 200K is still enormous for the jobs Haiku is right for.
Haiku earns its place on two axes: volume and latency. If you are classifying tens of thousands of records, scoring leads, tagging content, or doing any high-frequency simple task, Haiku does it at one-third the input cost of Sonnet and one-tenth the cost of Opus. And because it is the fastest model, it is the right pick anywhere a human is waiting on a response in real time and the task is not intellectually demanding.
The trap with Haiku is using it for work that needs more reasoning than it has. It will give you an answer either way. For nuanced judgment, multi-step logic, or anything where a subtle mistake is costly, Haiku is false economy. The fix is to test it on a sample and look at the error rate, not to assume cheap means good enough.
How to choose: a decision framework
Here is the framework I actually use. Walk it top to bottom and stop at the first model that clears the bar.
Start with three questions about the task: How hard is the reasoning? How much does a wrong answer cost? Is a human waiting on it? Those three answers point you at a model far more reliably than gut feel.
- High-volume, simple work: classification, extraction, routing, tagging Haiku 4.5
- Most real work: drafting, research, analysis, summaries, rewriting Sonnet 4.6
- Genuinely hard reasoning and long agentic runs where being right the first time pays for itself Opus 4.8
- Simple, high-volume, or latency-sensitive (classification, tagging, scoring, extraction at scale): use Haiku 4.5. Test the error rate on a sample first.
- Most real work (research, drafting, summarization, structured extraction, routing): use Sonnet 4.6. This is your default. Tune effort before upgrading.
- Hard reasoning or agentic coding where step-to-step coherence matters and a wrong turn is expensive: use Opus 4.8. This is the current ceiling. Fable 5 was briefly more capable, but it has been withdrawn from availability, so it is not an option today.
- Default to the cheapest model that passes your quality bar on a representative sample, not the smartest model you can afford.
TipRun a real eval before you commit a model to a job. Take 50 to 100 representative inputs, run them on the cheaper model, and check the output yourself. Most teams skip this and overpay on a guess for months.
The cost math, worked through
Let me put numbers on why this matters. Say you classify 100,000 support tickets a month. Each ticket is about 500 input tokens and the model returns about 20 output tokens (a short label). That is 50M input tokens and 2M output tokens monthly.
On Haiku 4.5: 50M input at $1 per million is $50, plus 2M output at $5 per million is $10. Total: $60 a month. On Sonnet 4.6: $150 input plus $30 output, total $180. On Opus 4.8: $250 input plus $50 output, total $300. Same job, and the Opus bill is five times the Haiku bill for a simple label with no quality difference that justifies it. For reference, while Fable 5 was briefly available it would have run $500 input plus $100 output, total $600, but you cannot buy it today.
Now flip the shape. Say you generate 10,000 long-form drafts a month, each 2,000 input tokens and 4,000 output tokens. That is 20M input and 40M output. On Sonnet: $60 input plus $600 output, total $660. Notice the output dominates because it is 5x the input rate and there is more of it. This is the rule I keep hammering: output is where the money goes, so a model that writes tighter answers can be cheaper at a higher sticker price.
Two more levers stack on top of model choice. Prompt caching cuts the cost of repeated context by about 90 percent, so if you send the same large system prompt or document on every call, cache it. And the Batch API runs at about 50 percent of standard price for work that is not latency-sensitive. Combine batch processing with the right model and you can cut a bill in half again without touching quality.
TipBefore you optimize the model, count your output-to-input ratio. If you generate a lot of text per call, output pricing drives your bill and a more concise model or a tighter prompt beats a model downgrade.
How to actually select a model
In the apps at claude.ai, you pick a model from a menu. The Free tier has the most limited access; paid tiers open up the frontier models. In Claude Code, you switch with the /model command. In the API you set the model field on the request, and the controls below ride alongside it.
Two controls change cost and quality without changing the model. Adaptive thinking lets the model decide how much to reason per request, and the effort setting (low, medium, high, xhigh, max) trades thoroughness against token spend. Lower effort means fewer tool calls and terser output; higher effort means deeper reasoning at more cost. Reach for these before you upgrade the model.
The terminal below shows a real API call that pins the model and effort. Secrets stay in environment variables, never in code.
Common mistakes that cost you money
These are the patterns I see most often, and every one of them is a real bill paid for no quality gain.
Fix them in order. The first two alone usually cut a Claude bill by a third or more.
- Defaulting to the biggest model for everything. The most common and most expensive mistake. Start on Sonnet and prove you need more.
- Never running an eval. Picking a model by feel instead of by measured error rate on real inputs means you overpay or under-deliver indefinitely.
- Ignoring the effort setting. Teams upgrade the whole model when raising effort on a cheaper one would have closed the gap for far less.
- Not caching repeated context. Sending the same system prompt or document on every call without prompt caching throws away roughly 90 percent savings on that portion.
- Running latency-tolerant jobs at full price. Anything that does not need a fast answer should go through the Batch API at about half cost.
- Optimizing input when output drives the bill. If you generate a lot of text, a tighter prompt or a more concise model beats squeezing input tokens.
TipAudit your model usage quarterly. Pricing and models change, your workloads change, and a config that was right six months ago is probably leaving money on the table now.
Frequently asked questions
Which Claude model is best?
Opus 4.8 is the top usable tier as of June 2026. Fable 5 was briefly more capable, but Anthropic withdrew it from availability under a US government order days after launch, so it is not an option today. Best for a given task is not the same as most capable, though. For most real work, Sonnet 4.6 is the best default because it balances intelligence, speed, and cost. Move up to Opus 4.8 only when the reasoning is genuinely hard and a wrong answer is expensive.
What is the cheapest Claude model?
Haiku 4.5 is the cheapest at $1 input and $5 output per million tokens, and it is also the fastest. It is ideal for high-volume simple tasks like classification, tagging, scoring, and extraction, and for anything latency-sensitive. Test it on a sample of your real inputs first, because cheap is only good value if the error rate is acceptable for the job.
How big is Claude's context window?
As of June 2026, Opus 4.8 and Sonnet 4.6 each have a 1M token context window, and Haiku 4.5 has 200K. (Fable 5 also had 1M, but it has been withdrawn from availability.) None of these carry a long-context premium, so a large prompt is billed at the same per-token rate as a small one. The cost comes from the total token count, not from how full the context window is.
Should I use Opus or Sonnet?
Use Sonnet 4.6 by default. It handles the large majority of real tasks at $3 input and $15 output per million, which is well below Opus 4.8 at $5 and $25. Move to Opus only when the task needs complex multi-step reasoning or agentic coding where coherence across steps matters and a wrong turn is costly. Before upgrading, try raising Sonnet's effort setting, which is a much cheaper experiment.
What is Claude Fable 5?
Fable 5 (model id claude-fable-5) was briefly the most capable Claude model, but as of June 2026 it is unavailable. Anthropic withdrew it from general availability under a US government order, only days after it launched, so you cannot call it today. For reference, while it was live it had a 1M context window, up to 128K output tokens, prices of $10 input and $50 output per million, and adaptive thinking always on. Because it was available for only a matter of days, nobody has meaningful production experience with it. Plan around Opus 4.8 as the usable ceiling instead.
How much does Claude cost per million tokens?
As of June 2026, API pricing per million tokens (input / output) is: Opus 4.8 at $5 / $25, Sonnet 4.6 at $3 / $15, and Haiku 4.5 at $1 / $5. Fable 5 carried a $10 / $50 list price while it was briefly available, but it is currently unavailable, withdrawn under a US government order, so you cannot buy it today. Output costs five times input on every model. Prompt caching cuts repeat-context cost by about 90 percent and the Batch API runs at about half price. Always check the official pricing page, since numbers change.
How do I switch Claude models?
In the apps at claude.ai you pick a model from a menu, where the Free tier has the most limited access and paid tiers open up the frontier models. In Claude Code you switch with the /model command. In the API you set the model field on each request, for example claude-sonnet-4-6, and you can tune cost and quality further with the effort setting and adaptive thinking.
What is the effort setting and should I use it?
Effort (low, medium, high, xhigh, max) controls how much the model reasons and spends per request. Lower effort means fewer tool calls and terser output at lower cost; higher effort means deeper reasoning at more cost. It is one of the best levers for getting more quality from a cheaper model. Raise effort on Sonnet before you upgrade to Opus, since it often closes the gap for much less money.
Sources & further reading
Claude ships fast. This page was last reviewed Jun 16, 2026; verify time-sensitive details against the official docs above before relying on them.