Anthropic publishes its API pricing on a single page at anthropic.com/pricing, and the structure is mercifully simple: pay per million tokens, with separate rates for input and output, and discounts for cached prompts and batch jobs. The traps aren’t in the headline rates — they’re in the assumptions a founder makes when modeling cost. Output tokens cost roughly five times as much as input tokens across the lineup. Vision inputs cost more than text. And without prompt caching, a chatty multi-turn agent burns budget on the same system prompt over and over. This guide breaks down the per-token rates, the discounts, and the realistic monthly bills for three solo-SaaS personas.

Methodology. Per-token rates, the prompt-caching discount structure, and Batch API terms come from anthropic.com/pricing and Anthropic’s public API documentation, last reconciled in May 2026. Per-token pricing changes; treat this as a snapshot. The model lineup referenced (Opus 4.7, Sonnet 4.6, Haiku 4.5) reflects the public lineup at time of writing — check the pricing page for the current generation before you build a budget on these numbers.

TL;DR

  • Three models, three price points. Opus is the premium tier for hard reasoning, Sonnet is the workhorse middle, Haiku is the cheap-and-fast option for high-volume tasks.
  • Output tokens are roughly 5× the price of input tokens. The math punishes models that talk too much — design prompts that ask for terse answers when you can.
  • Prompt caching cuts 90% off cached input. If your system prompt or document context repeats across calls, this is the single biggest cost lever you have.
  • Batch API is 50% off, 24-hour SLA. If your workload is async — nightly summarization, bulk classification, embeddings prep — you should be running it through Batch.
  • The margin math at $20/month per user is forgiving for chat, brutal for code-gen. The persona walkthrough below shows where each business model falls.

Per-token pricing — the headline numbers

Anthropic charges in dollars per million tokens, separated by direction. A token is roughly three-quarters of an English word, so a million tokens is approximately 750,000 words — about ten typical novels of throughput.

Model               Input ($/MTok)   Output ($/MTok)   Cached input ($/MTok)
Claude Opus 4.7     $15              $75               $1.50 (90% off)
Claude Sonnet 4.6   $3               $15               $0.30 (90% off)
Claude Haiku 4.5    $0.80            $4                $0.08 (90% off)

Three observations worth internalizing before you write a single line of integration code.

First, the input/output split matters a lot. Output is 5× input on every model in the lineup. If you’re paying $15/MTok in and $75/MTok out, every token your model writes that the user doesn’t need is money lit on fire. Practical implication: instruct the model to be concise, set max_tokens tightly, and never use a chatty model for a one-word classification task.

Second, the gap between Opus and Haiku is roughly 19× on both input and output. Sonnet sits in the middle at roughly 4× Haiku’s price and one-fifth of Opus’s. The right routing strategy is “use the cheapest model that’s smart enough,” which for most solo-SaaS chat workloads is Haiku or Sonnet, with Opus reserved for the genuinely hard reasoning calls.

Third, the cached-input discount is uniform across the lineup: cached tokens bill at 10% of the model’s standard input rate. Caching is not a niche optimization — it’s the default playbook for any production application that sends repeated context.

Input vs output tokens — why the split exists

Input tokens are everything you send the model: the system prompt, the conversation history, any tool definitions, and the user’s current message. Output tokens are everything the model generates back. The pricing asymmetry exists because generating tokens is computationally expensive in a way that ingesting them is not — each output token requires a full forward pass through the network, while input tokens get processed in parallel.

For a typical chatbot turn, the input might be 2,000 tokens (system prompt + history + new user message) and the output 200 tokens (a short helpful reply). On Sonnet, that’s 2,000 × $3/MTok = $0.006 in, plus 200 × $15/MTok = $0.003 out, for a total of ~$0.009 per turn. The split flips for code generation: a 4K-token input followed by a 2K-token output costs $0.012 in and $0.030 out, with output dominating the bill.

The lesson: model your costs by direction, not by total token count. A doubling of average output length is far more expensive than a doubling of input.
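
In code, the directional model is a few lines. A minimal sketch using the snapshot rates from the table above (the model keys are shorthand for this article, not API identifiers):

```python
# Back-of-envelope cost model for a single call, split by direction.
# Rates are $/MTok from the snapshot table above; check anthropic.com/pricing.

RATES = {  # model: (input $/MTok, output $/MTok)
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.80, 4.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call, with input and output billed separately."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# The two turns from the text:
print(call_cost("sonnet", 2_000, 200))     # chatbot turn  -> 0.009
print(call_cost("sonnet", 4_000, 2_000))   # code-gen turn -> 0.042
```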

Prompt caching — the 90% discount most apps should use

Anthropic’s prompt caching lets you mark portions of your input as cacheable. On a cache hit, you pay 10% of the normal input rate for those tokens. The default cache lifetime is five minutes; an extended one-hour cache is available as a beta feature with slightly different write economics.

The mechanics determine whether you actually capture the savings:

  • Cache writes cost more than normal input — typically 1.25× the base rate for a five-minute cache. The first request that writes the cache pays a small premium; subsequent requests inside the window pay 10%.
  • Order matters. The cacheable content needs to come first in the prompt — system prompt, large documents, tool definitions. The user’s changing message goes at the end.
  • Five minutes of inactivity invalidates the cache. For a low-traffic app, you might not get many hits; for a busy chatbot, the hit rate approaches 100% during peak hours.
  • One-hour cache trades a higher write premium for a longer window — useful when traffic is bursty or when the cached content is genuinely static across a workday.

The economics: an app sending a 4,000-token system prompt plus 200 tokens of cacheable document context and 100 tokens of fresh user input, on Sonnet, would normally pay (4,300 × $3) / 1M = $0.0129 per call. With caching, the 4,200 cached tokens cost (4,200 × $0.30) / 1M = $0.00126 on a hit, and only the 100 fresh tokens cost full price. Total drops to roughly $0.0016 — an 87% reduction on the input side, ignoring the one-time write premium.
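
In practice, you mark the stable prefix with a cache_control block in the official Python SDK. A minimal sketch; the model ID is a placeholder for whatever the current pricing page lists:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SYSTEM_PROMPT = "You are a support assistant for ..."  # the large, stable prefix

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumption: placeholder ID, check the docs
    max_tokens=300,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Marks everything up to and including this block as cacheable
            # (default ~5-minute TTL, refreshed on each hit).
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "What's our refund policy?"}],
)

# The usage object reports cache activity, so you can verify your hit rate:
#   response.usage.cache_read_input_tokens     -> billed at the 10% rate
#   response.usage.cache_creation_input_tokens -> billed at the write premium
print(response.usage)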

Batch API — 50% off for asynchronous work

If you don’t need an answer in seconds, the Batch API delivers the same result at half the price. Anthropic processes batched requests within 24 hours, with most batches finishing significantly faster. The discount applies to both input and output tokens, and batch usage still counts toward your rate-limit tier.

Workloads that fit Batch:

  • Nightly content summarization across an entire user database
  • Bulk classification of support tickets, emails, or product reviews
  • Embedding-related preprocessing (chunking, summarizing) for a vector store
  • Generating SEO meta descriptions for thousands of pages
  • Backfilling AI features across historical data after launch

Workloads that don’t fit Batch: anything user-facing, anything where latency under a few seconds matters, and anything where you need to chain calls based on prior outputs.
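
A sketch of the bulk-classification case using the SDK’s batches endpoint. The model ID and the classification prompt are illustrative, not prescriptive:

```python
import anthropic

client = anthropic.Anthropic()

tickets = ["My invoice is wrong", "How do I cancel?", "Love the product!"]  # toy data

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"ticket-{i}",  # your key for matching results back
            "params": {
                "model": "claude-haiku-4-5",  # assumption: placeholder ID
                "max_tokens": 10,
                "messages": [{
                    "role": "user",
                    "content": f"Classify as billing, account, or feedback: {t}",
                }],
            },
        }
        for i, t in enumerate(tickets)
    ]
)
print(batch.id, batch.processing_status)  # poll until this reports "ended"

# Once finished, stream results (order isn't guaranteed, so join on custom_id):
# for result in client.messages.batches.results(batch.id):
#     print(result.custom_id, result.result)
```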

Vision pricing — image tokens

Claude’s vision capability is priced through the same input-token meter. Each image gets converted to a token count based on its dimensions, and that count gets billed at the model’s standard input rate. Anthropic’s rule of thumb is that a typical image consumes around 1,500–1,600 tokens of input budget — closer to 1,000 for small images, more for large or high-resolution ones.

The practical math: on Sonnet at $3/MTok, an image costs roughly $0.005 to process. On Opus, it’s closer to $0.024. For a vision-heavy product (think: receipt parsing, screenshot QA, document extraction), this is the dominant cost line. Worth modeling carefully before you commit to a monthly subscription price.
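
If you want to estimate before you build: Anthropic’s documented approximation is tokens ≈ (width × height) / 750, with oversized images scaled down by the API first, which is what caps typical images near ~1,600 tokens. A quick estimator that reproduces the numbers above:

```python
def image_cost_estimate(width_px: int, height_px: int, input_rate_per_mtok: float) -> float:
    """Rough per-image cost using Anthropic's documented approximation:
    tokens ~= (width * height) / 750. Images above the resize limit are
    scaled down first, capping typical images near ~1,600 tokens."""
    tokens = (width_px * height_px) / 750
    return tokens * input_rate_per_mtok / 1_000_000

print(image_cost_estimate(1092, 1092, 3.0))   # ~1,590 tokens -> ~$0.0048 on Sonnet
print(image_cost_estimate(1092, 1092, 15.0))  # same image    -> ~$0.0239 on Opus
```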

Realistic monthly bills for solo SaaS

Three personas, all running on Sonnet (the default workhorse), with prompt caching enabled. Numbers assume a 70% cache hit rate on the input side, which is a conservative assumption for any app with a stable system prompt.

Persona 1 — AI chatbot SaaS, 100 DAU
~$50–$55/month

100 daily active users, 5 messages each, 500 input + 200 output tokens per message. That’s 500 turns/day, or ~15,000/month. Input: 15K turns × 500 tokens = 7.5M tokens, blended (30% full + 70% cached) ≈ $8. Output: 15K × 200 = 3M tokens at $15/MTok = $45, so output dominates. Total: roughly $50–$55/month for the messages plus light overhead. Per-user cost lands around $0.50/month.

Persona 2 — Code-generation SaaS, 1K DAU
~$2,400/month

1,000 DAU, 2 generations/day, 4K input + 2K output tokens. 60K generations/month. Input: 240M tokens, blended with caching ≈ $266. Output: 120M tokens at $15/MTok = $1,800. Tool-use overhead and retry logic in real systems push this toward $2,400–$2,800. At a $20/user subscription with 1K DAU and ~3K paid users, that’s roughly $0.80–$1 in API cost per paid user per month — survivable but tight after Stripe fees and infrastructure.

Persona 3 — Content-writing SaaS, 10K DAU
~$8,500/month

10K DAU but only one generation per user per week, at 2K input + 1K output tokens. ~40K generations/month. Input: 80M tokens blended ≈ $89. Output: 40M tokens at $15/MTok = $600. Sounds cheap — but in practice content workflows are multi-turn (draft, refine, alternate versions), which pushes the real cost 5–10× higher. A realistic bill lands at $5K–$10K depending on iteration depth. Use Haiku for first-draft work and Sonnet only for polish to cut this in half.
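
All three personas fall out of the same blended-rate arithmetic. A sketch that reproduces the numbers above, assuming the 70% cache hit rate and the snapshot Sonnet rates:

```python
# Reproduces the persona math: blended input rate with a 70% cache hit rate.
SONNET_IN, SONNET_OUT, CACHE_DISCOUNT = 3.00, 15.00, 0.10  # $/MTok, snapshot rates

def monthly_cost(calls: int, in_tok: int, out_tok: int, hit_rate: float = 0.70) -> float:
    """Monthly API bill for a fixed call volume, caching applied to input only."""
    blended_in = SONNET_IN * ((1 - hit_rate) + hit_rate * CACHE_DISCOUNT)  # $1.11/MTok at 70%
    input_cost = calls * in_tok * blended_in / 1e6
    output_cost = calls * out_tok * SONNET_OUT / 1e6
    return input_cost + output_cost

print(monthly_cost(15_000, 500, 200))      # chatbot:  ~$53/month
print(monthly_cost(60_000, 4_000, 2_000))  # code-gen: ~$2,066 before overhead
print(monthly_cost(40_000, 2_000, 1_000))  # content:  ~$689 before iteration multiplier
```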

The margin math at $20/month

The implicit ceiling on AI cost-per-user is set by your subscription price minus payment processing minus your other infrastructure costs minus a profit margin. At $20/month, after Stripe’s ~$0.90, hosting at ~$1, and a 50% target gross margin, you have roughly $8 of API budget per paying user per month.

What that buys you on Sonnet:

  • $8 of output at $15/MTok = ~530K output tokens. At 200 tokens/reply, that’s ~2,650 messages/month per user. Plenty for a chatbot.
  • $8 of output at 1K tokens/generation = ~530 generations/month. Plenty for content tools.
  • $8 of output at 2K tokens/code-gen = ~265 generations/month. Tight if a power user generates 20/day.
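
The same list as arithmetic, so you can swap in your own subscription price and token profile (output-only framing, matching the bullets above):

```python
# How many generations a fixed output budget buys, ignoring input costs.
def generations_per_budget(budget_usd: float, out_rate_per_mtok: float,
                           tokens_per_gen: int) -> int:
    return int(budget_usd / out_rate_per_mtok * 1_000_000 / tokens_per_gen)

print(generations_per_budget(8, 15, 200))    # chat replies:        ~2,666
print(generations_per_budget(8, 15, 1_000))  # content generations: ~533
print(generations_per_budget(8, 15, 2_000))  # code generations:    ~266
```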

The chatbot model has comfortable margins. Code-gen and content products with heavy power users need usage caps, tiered pricing, or Haiku-by-default routing. The pattern most successful AI SaaS businesses adopt is a free tier on Haiku, a paid tier on Sonnet, and a premium tier on Opus — with each tier capped at usage levels that keep API cost under 30–40% of revenue. Our true cost of running an AI SaaS piece walks through the full unit-economics framework.

API vs Claude.ai subscription — when to use each

The Claude.ai consumer subscription (Pro at $20/month, Team and Max plans at higher tiers) is a flat monthly fee for using Claude through Anthropic’s own chat interface. It does not include API access in the developer sense — you can’t programmatically call Claude with a Pro subscription credential.

The right frame:

  • Use Claude.ai when you’re the user. Drafting, research, coding assistance via the chat interface or Claude Code — anything that’s about your own productivity.
  • Use the API when you’re building a product that calls Claude on behalf of your users. Anything programmatic — chatbots, code-gen, content tools, agents — runs through the API and gets billed per token.

Most solo founders end up with both: Claude.ai for personal use, the API for whatever they’re shipping. The two bills are independent.

Rate limits and the tier system

Anthropic gates API capacity through a tier system that scales with cumulative spend and account age. New developers start in the Build tier with low requests-per-minute and tokens-per-minute caps. As you spend, you move up:

  • Build T1 — entry tier, modest RPM and TPM caps, sufficient for prototyping.
  • Build T2 — unlocked after a small spending threshold and a wait period; higher caps.
  • Build T3 — meaningfully higher limits, suitable for early production traffic.
  • Build T4 — production-grade caps, reached after sustained spending.
  • Custom / scale tier — negotiated capacity for high-volume customers.

The exact dollar thresholds and time requirements between tiers are documented on anthropic.com/pricing and shift over time. The practical implication for solo founders: budget 1–2 weeks at the start of a launch to climb tiers if you expect spiky traffic. You can also request a manual tier upgrade by contacting Anthropic support if your product has predictable demand. Compared to OpenAI’s tier system, Anthropic’s tends to be slightly more conservative at the entry levels — worth knowing if you’re benchmarking. For a side-by-side, see our OpenAI API pricing breakdown.
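
Until you’ve climbed the tiers, expect 429s under load. The official Python SDK retries rate-limit and transient errors with exponential backoff out of the box; a minimal sketch of defensive handling (the model ID is again a placeholder):

```python
import anthropic

# Raising max_retries is the simplest defense while you're still in a low
# tier; the SDK backs off exponentially between attempts (default is 2 retries).
client = anthropic.Anthropic(max_retries=5)

try:
    response = client.messages.create(
        model="claude-haiku-4-5",  # assumption: placeholder ID
        max_tokens=100,
        messages=[{"role": "user", "content": "ping"}],
    )
except anthropic.RateLimitError:
    # Still throttled after retries: queue the job and try again later
    # rather than hammering the endpoint.
    ...
```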

Bottom line

Default to Sonnet, route low-stakes calls to Haiku, escalate to Opus only when reasoning quality matters, and turn on prompt caching from day one. Those four moves cover roughly 80% of the cost optimization available to a solo SaaS. Batch API closes another chunk for any async workload. The remaining margin work is product-side: capping power-user usage, charging tiered prices, and building features that don’t require multi-turn loops when a single shot will do.

For deeper integration help, our how to build an AI chatbot SaaS with Claude walkthrough shows the full caching pattern, and the how to build a SaaS with Claude guide covers the broader stack. If you want to model your own costs, the AI token cost calculator gives you a per-call number across both major providers, and the best AI tools for solo SaaS founders roundup puts Claude in context with the rest of the toolchain.
