The per-million-token costs for Claude Opus 4.7, Sonnet 4.6, and Haiku 4.5, the 90% prompt-caching discount, the 50% Batch API discount, and what your monthly bill actually looks like at solo-SaaS scale. Numbers verified against anthropic.com/pricing.
Anthropic publishes its API pricing on a single page at anthropic.com/pricing, and the structure is mercifully simple: pay per million tokens, with a separate rate for input and output, and discounts for cached prompts and batch jobs. The traps aren’t in the headline rates — they’re in the assumptions a founder makes when modeling cost. Output tokens are roughly five times more expensive than input on the flagship model. Vision inputs cost more than text. And without prompt caching, a chatty multi-turn agent burns budget on the same system prompt over and over. This guide breaks down the per-token rates, the discounts, and the realistic monthly bills for three solo-SaaS personas.
Methodology. Per-token rates, the prompt-caching discount structure, and Batch API terms come from anthropic.com/pricing and Anthropic’s public API documentation, last reconciled in May 2026. Per-token pricing changes; treat this as a snapshot. The model lineup referenced (Opus 4.7, Sonnet 4.6, Haiku 4.5) reflects the public lineup at time of writing — check the pricing page for the current generation before you build a budget on these numbers.
Anthropic charges in dollars per million tokens, separated by direction. A token is roughly three-quarters of an English word, so a million tokens is approximately 750,000 words — about ten typical novels of throughput.
| Model | Input ($/MTok) | Output ($/MTok) | Cached input |
|---|---|---|---|
| Claude Opus 4.7 | $15 | $75 | $1.50 (90% off) |
| Claude Sonnet 4.6 | $3 | $15 | $0.30 (90% off) |
| Claude Haiku 4.5 | $0.80 | $4 | $0.08 (90% off) |
Three observations worth internalizing before you write a single line of integration code.
First, the input/output split matters a lot. Output is 5× input on Opus and Sonnet. If you’re paying $15/MTok in and $75/MTok out, every token your model writes that the user doesn’t need is money lit on fire. Practical implication: instruct the model to be concise, set max_tokens tightly, and never use a chatty model for a one-word classification task.
Second, the gap between Opus and Haiku is roughly 19× on both input and output. Sonnet sits in the middle at roughly 4× Haiku and one-fifth of Opus. The right routing strategy is “use the cheapest model that’s smart enough,” which for most solo-SaaS chat workloads is Haiku or Sonnet, with Opus reserved for the genuinely hard reasoning calls.
Third, the cached-input rate is identical across the lineup at 10% of the standard input rate. Caching is not a niche optimization — it’s the default playbook for any production application that sends repeated context.
Input tokens are everything you send the model: the system prompt, the conversation history, any tool definitions, and the user’s current message. Output tokens are everything the model generates back. The pricing asymmetry exists because generating tokens is computationally expensive in a way that ingesting them is not — each output token requires a full forward pass through the network, while input tokens get processed in parallel.
For a typical chatbot turn, the input might be 2,000 tokens (system prompt + history + new user message) and the output 200 tokens (a short helpful reply). On Sonnet, that’s 2,000 × $3/MTok = $0.006 in, plus 200 × $15/MTok = $0.003 out, for a total of ~$0.009 per turn. The split flips for code generation: a 4K-token input followed by a 2K-token output costs $0.012 in and $0.030 out, with output dominating the bill.
The lesson: model your costs by direction, not by total token count. A doubling of average output length is far more expensive than a doubling of input.
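The directional math above is easy to encode. A minimal sketch, with the rates hard-coded from the table (a snapshot; update them against anthropic.com/pricing before relying on the output):

```python
# Per-million-token rates from the pricing table above (snapshot values;
# verify against anthropic.com/pricing before building a budget on them).
RATES = {
    "opus":   {"input": 15.00, "output": 75.00},
    "sonnet": {"input": 3.00,  "output": 15.00},
    "haiku":  {"input": 0.80,  "output": 4.00},
}

def turn_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single API call, split by direction."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# The chatbot turn from above: 2,000 tokens in, 200 out, on Sonnet.
chat = turn_cost("sonnet", 2_000, 200)        # ≈ $0.009
# The code-generation call: 4K in, 2K out; output dominates.
codegen = turn_cost("sonnet", 4_000, 2_000)   # ≈ $0.042
```

Running a few of your own typical turns through a function like this, by direction, is the fastest way to see whether output length is quietly dominating your bill.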
Anthropic’s prompt caching lets you mark portions of your input as cacheable. On a cache hit, you pay 10% of the normal input rate for those tokens. The default cache lifetime is five minutes; an extended one-hour cache is available as a beta feature with slightly different write economics.
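In the Messages API, you opt in by attaching a `cache_control` marker to the last block of the prefix you want cached. A sketch of the request body shape (the model id and prompt text are placeholders, not a recommendation):

```python
# Sketch of a Messages API request body with a cacheable system prompt.
# The "cache_control" marker on the last system block asks the API to
# cache everything up to and including that block. The model id below
# is a placeholder; check the current lineup before using it.
LONG_SYSTEM_PROMPT = "You are a support agent for ..."  # imagine ~4K tokens

request_body = {
    "model": "claude-sonnet-4-6",  # placeholder model id
    "max_tokens": 300,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # five-minute cache
        }
    ],
    "messages": [{"role": "user", "content": "Where is my order?"}],
}
```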
The mechanics matter for getting savings:

- Caching is prefix-based. You mark a breakpoint with the `cache_control` parameter, and everything before it (tool definitions, system prompt, early conversation turns) is eligible. Any change to that prefix, even a single token, invalidates the cache from that point on.
- Cache writes cost slightly more than standard input (a 25% premium on the five-minute cache), so caching only pays off when the same prefix is read at least twice.
- There is a minimum cacheable length, on the order of 1,024 tokens for the larger models, so a tiny system prompt won’t qualify.
- Each cache hit refreshes the five-minute lifetime, so steady traffic keeps the cache warm indefinitely.
The economics: an app sending a 4,200-token cacheable prefix (a 4,000-token system prompt plus 200 tokens of stable context) and 100 tokens of fresh user input, on Sonnet, would normally pay (4,300 × $3) / 1M = $0.0129 per call. With caching, the 4,200 cached tokens cost (4,200 × $0.30) / 1M = $0.00126 on a hit, and only the 100 fresh tokens cost full price ($0.0003). The total drops to roughly $0.0016 — an 88% reduction on the input side.
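The same arithmetic as a function, so you can plug in your own prompt sizes (Sonnet rates hard-coded from the table above; this ignores the one-time cache-write premium on the first call, which is negligible once amortized over many hits):

```python
SONNET_INPUT = 3.00    # $/MTok, standard input rate
SONNET_CACHED = 0.30   # $/MTok, cached input rate (90% off)

def input_cost(cached_tokens: int, fresh_tokens: int, cache_hit: bool) -> float:
    """Input-side cost of one call, with or without a cache hit."""
    if not cache_hit:
        return (cached_tokens + fresh_tokens) * SONNET_INPUT / 1_000_000
    return (cached_tokens * SONNET_CACHED
            + fresh_tokens * SONNET_INPUT) / 1_000_000

miss = input_cost(4_200, 100, cache_hit=False)  # $0.0129 per call
hit = input_cost(4_200, 100, cache_hit=True)    # ≈ $0.00156 per call
savings = 1 - hit / miss                        # ≈ 0.88
```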
If you don’t need an answer in seconds, the Batch API delivers the same result at half the price. Anthropic processes batched requests within 24 hours, with most batches finishing significantly faster. The discount applies to both input and output tokens, and batched requests can still use prompt caching for further input-side savings.
Workloads that fit Batch: nightly summarization runs, bulk classification and tagging backfills, eval suites, scheduled report generation, and any data-enrichment pipeline where results are consumed later rather than streamed to a user.
Workloads that don’t fit Batch: anything user-facing, anything where latency under a few seconds matters, and anything where you need to chain calls based on prior outputs.
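A quick sketch of what the 50% discount does to an async workload, using a hypothetical nightly job (the document count and token sizes are made-up inputs, not benchmarks):

```python
SONNET = {"input": 3.00, "output": 15.00}  # $/MTok, standard rates
BATCH_DISCOUNT = 0.5                       # Batch API: half price, both directions

def job_cost(n_requests: int, in_tok: int, out_tok: int, batch: bool) -> float:
    """Total dollar cost of a bulk job, run synchronously or via Batch."""
    per_call = (in_tok * SONNET["input"] + out_tok * SONNET["output"]) / 1_000_000
    total = n_requests * per_call
    return total * BATCH_DISCOUNT if batch else total

# Hypothetical: summarize 10,000 documents, 3K tokens in / 300 out each.
sync = job_cost(10_000, 3_000, 300, batch=False)     # $135.00
batched = job_cost(10_000, 3_000, 300, batch=True)   # $67.50
```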
Claude’s vision capability is priced through the same input-token meter. Each image gets converted to a token count based on its dimensions, and that count gets billed at the model’s standard input rate. Anthropic’s rule of thumb is that a typical image consumes around 1,500–1,600 tokens of input budget — closer to 1,000 for small images, more for large or high-resolution ones.
The practical math: on Sonnet at $3/MTok, an image costs roughly $0.005 to process. On Opus, it’s closer to $0.024. For a vision-heavy product (think: receipt parsing, screenshot QA, document extraction), this is the dominant cost line. Worth modeling carefully before you commit to a monthly subscription price.
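Anthropic’s documentation gives an approximation of tokens ≈ (width × height) / 750 for images within the resizing limits. A sketch of the per-image math using that rule (treat it as an estimate for budgeting, not a billing guarantee):

```python
def image_tokens(width_px: int, height_px: int) -> int:
    """Approximate token cost of one image, per Anthropic's documented
    rule of thumb: tokens ≈ (width × height) / 750."""
    return round(width_px * height_px / 750)

def image_cost(width_px: int, height_px: int, input_rate: float) -> float:
    """Dollar cost to ingest one image at a given $/MTok input rate."""
    return image_tokens(width_px, height_px) * input_rate / 1_000_000

tokens = image_tokens(1092, 1092)          # ≈ 1,590 tokens
on_sonnet = image_cost(1092, 1092, 3.00)   # ≈ $0.005
on_opus = image_cost(1092, 1092, 15.00)    # ≈ $0.024
```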
Three personas, all running on Sonnet (the default workhorse), with prompt caching enabled. Numbers assume a 70% cache hit rate on the input side, which is a conservative assumption for any app with a stable system prompt.
100 daily active users, 5 messages each, 500 input + 200 output tokens per message. That’s 500 turns/day, or ~15,000/month. Input: 15K turns × 500 tokens = 7.5M tokens; at the blended rate (30% full at $3 + 70% cached at $0.30 = $1.11/MTok) that’s ~$8. Output: 15K × 200 = 3M tokens at $15/MTok = $45. Note that output dominates even in a chat product. Total: roughly $50–$55/month for the messages, plus light overhead. Per-user cost lands around $0.50/month.
1,000 DAU, 2 generations/day, 4K input + 2K output tokens. 60K generations/month. Input: 240M tokens, blended with caching ($1.11/MTok) = ~$266. Output: 120M tokens at $15/MTok = $1,800. Tool-use overhead and retry logic in real systems push this toward $2,400–$2,800. At a $20/user subscription with 1K DAU and ~3K paid users, that’s roughly $0.80–$1 in API cost per paid user per month — survivable but tight after Stripe fees and infrastructure.
10K DAU but only one generation per user per week, at 2K input + 1K output tokens. ~40K generations/month. Input: 80M tokens, blended = ~$89. Output: 40M tokens at $15/MTok = $600. Sounds cheap, but in practice content workflows go multi-turn (draft, refine, alternate versions), pushing the real cost 5–10× higher. A realistic bill lands at $3.5K–$7K depending on iteration depth. Use Haiku for first-draft work and Sonnet only for polish to cut this roughly in half.
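All three personas follow the same blended-rate formula. A sketch that reproduces the chatbot and code-gen baselines (the call counts and token sizes are the persona assumptions above, not API facts):

```python
SONNET_IN, SONNET_CACHED, SONNET_OUT = 3.00, 0.30, 15.00  # $/MTok

def monthly_bill(calls_per_month: int, in_tok: int, out_tok: int,
                 cache_hit_rate: float = 0.7) -> float:
    """Monthly API cost with a blended input rate from partial cache hits."""
    blended_in = cache_hit_rate * SONNET_CACHED + (1 - cache_hit_rate) * SONNET_IN
    input_cost = calls_per_month * in_tok * blended_in / 1_000_000
    output_cost = calls_per_month * out_tok * SONNET_OUT / 1_000_000
    return input_cost + output_cost

# Chatbot persona: 15K turns/month, 500 in / 200 out, 70% cache hits.
chatbot = monthly_bill(15_000, 500, 200)       # ≈ $53/month
# Code-gen persona: 60K generations/month, 4K in / 2K out.
codegen = monthly_bill(60_000, 4_000, 2_000)   # ≈ $2,066/month before overhead
```

Swap in your own traffic numbers before pricing a subscription tier; the blended input rate falls fast as the cache hit rate climbs, but the output line barely moves.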
The implicit ceiling on AI cost-per-user is set by your subscription price minus payment processing minus your other infrastructure costs minus a profit margin. At $20/month, after Stripe’s ~$0.90, hosting at ~$1, and a 50% target gross margin, you have roughly $8 of API budget per paying user per month.
What that buys you on Sonnet:

- Roughly 900 chatbot turns at 500 input / 200 output tokens each (~$0.009 per turn uncached), or well over 2,000 turns with prompt caching.
- About 190 code generations at 4K input / 2K output tokens (~$0.042 each).
- On the order of 1,600 image analyses at ~$0.005 per image.
The chatbot model has comfortable margins. Code-gen and content products with heavy power users need usage caps, tiered pricing, or Haiku-by-default routing. The pattern most successful AI SaaS adopt is a free tier on Haiku, paid tier on Sonnet, premium tier on Opus — with each tier capped at usage levels that keep API cost under 30–40% of revenue. Our true cost of running an AI SaaS piece walks through the full unit-economics framework.
The Claude.ai consumer subscription (Pro at $20/month, Team and Max plans at higher tiers) is a flat monthly fee for using Claude through Anthropic’s own chat interface. It does not include API access in the developer sense — you can’t programmatically call Claude with a Pro subscription credential.
The right frame: the Claude.ai subscription is a personal productivity expense, a fixed line item like any other tool you buy; API usage is cost of goods sold, metered per token and scaling with your users. Budget them separately.
Most solo founders end up with both: Claude.ai for personal use, the API for whatever they’re shipping. The two bills are independent.
Anthropic gates API capacity through a tier system that scales with cumulative spend and account age. New developers start in the Build tier with low requests-per-minute and tokens-per-minute caps. As you spend and your account ages, you move up through successively higher rate limits and monthly usage caps.
The exact dollar thresholds and time requirements between tiers are documented on anthropic.com/pricing and shift over time. The practical implication for solo founders: budget 1–2 weeks at the start of a launch to climb tiers if you expect spiky traffic. You can also request a manual tier upgrade by contacting Anthropic support if your product has predictable demand. Compared to OpenAI’s tier system, Anthropic’s tends to be slightly more conservative at the entry levels — worth knowing if you’re benchmarking. For a side-by-side, see our OpenAI API pricing breakdown.
Default to Sonnet, route low-stakes calls to Haiku, escalate to Opus only when reasoning quality matters, and turn on prompt caching from day one. Those four moves cover roughly 80% of the cost optimization available to a solo SaaS. Batch API closes another chunk for any async workload. The remaining margin work is product-side: capping power-user usage, charging tiered prices, and building features that don’t require multi-turn loops when a single shot will do.
For deeper integration help, our how to build an AI chatbot SaaS with Claude walkthrough shows the full caching pattern, and the how to build a SaaS with Claude guide covers the broader stack. If you want to model your own costs, the AI token cost calculator gives you a per-call number across both major providers, and the best AI tools for solo SaaS founders roundup puts Claude in context with the rest of the toolchain.
The stack, prompts, pricing, and mistakes to avoid — for solo founders building with AI.