Estimate your monthly Claude or OpenAI bill before you ship. Plug in users, messages, and token sizes — see what each model would cost you and whether your pricing leaves any margin.
An AI SaaS founder told us last month they were charging $19/month for unlimited AI chat. Their actual token bill from Anthropic? $34 per active user. They were losing $15 every time someone signed up — and they only noticed three months in, when the credit-card statement landed. This calculator exists to make sure that doesn’t happen to you. Plug in realistic usage, pick the model, and see whether your pricing is mathematically possible.
Methodology. Token prices reflect public Anthropic and OpenAI API rate cards as of May 2026. Real-world bills include retries, system prompts, tool-use overhead, and cache reads — this calculator gives you the floor, not the ceiling. How we research.
Most non-AI SaaS has near-zero variable cost per user. You pay a flat $25/month for Vercel and another $35 for Supabase, and whether you serve 100 users or 1,000 users the bill barely moves. AI SaaS breaks that model. Every message costs you real money, paid out to Anthropic or OpenAI in real time. Variable cost is back, and it can eat your margin alive if you don’t price for it.
The mental model that works: think of every active user as a small monthly expense, not a margin contributor. A user generating 150 messages a month at 800 input + 400 output tokens on Sonnet 4.5 costs you roughly $1.26 per month in tokens ($0.36 input + $0.90 output). That's your floor, before retries, system prompts, tool overhead, or cache reads. To make any margin at all, you need to charge meaningfully more than that.
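That per-user floor is simple arithmetic. A minimal sketch, assuming Sonnet 4.5 at $3 per million input tokens and $15 per million output tokens (the rate-card figures the article's numbers imply):

```python
# Per-user monthly token cost at Sonnet 4.5 rate-card prices
# (assumed: $3/M input, $15/M output).
MESSAGES_PER_MONTH = 150
INPUT_TOKENS = 800    # per message
OUTPUT_TOKENS = 400   # per message

input_cost = MESSAGES_PER_MONTH * INPUT_TOKENS / 1e6 * 3.00    # $0.36
output_cost = MESSAGES_PER_MONTH * OUTPUT_TOKENS / 1e6 * 15.00  # $0.90

total = input_cost + output_cost
print(f"${total:.2f} per user/month")  # $1.26
```

Swap in your own message counts and token sizes; the structure stays the same for any model, only the two per-million rates change.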
Token cost is not your only cost. Add hosting, database, transactional email, payment-processing fees (Stripe takes ~3%), refunds, free-trial users who never convert, and your own time supporting the product. By the time you net everything out, the typical AI SaaS founder needs 3–5x token cost as their price point just to clear the same gross margin a non-AI SaaS gets at 1.1x infrastructure cost.
The 3x floor: at this multiple you’re probably hitting 50–60% gross margin, which is the lower bound of what acquirers and investors consider a real SaaS. Below 50% you’re a services business with subscription billing.
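Why does a 3x multiple land at 50–60% rather than the naive 66%? Payment fees and non-token COGS eat the difference. A sketch with illustrative overhead figures (the $0.50 per-user infrastructure share is an assumption, not an article figure):

```python
# Why price = 3 × token cost ≈ 50% gross margin once real COGS land.
token_cost = 1.27                  # per user/month (the article's Sonnet example)
price = 3 * token_cost             # $3.81

stripe_fee = 0.03 * price          # ~3% payment processing
other_cogs = 0.50                  # hosting/db/email share per user (assumed)

gross_margin = (price - token_cost - stripe_fee - other_cogs) / price
print(f"{gross_margin:.0%}")       # ~50%
```

Push `other_cogs` up, or add free-trial users who generate tokens but no revenue, and the same 3x price slides below the 50% line.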
The 5x ceiling: above this multiple, customers start feeling overcharged relative to the underlying compute. Cursor and Claude Code both sit roughly here. Anthropic charges Cursor for tokens, Cursor charges users at roughly a 4–5x markup, and the model just barely works. We unpack the structure in cursor pricing explained.
If you can’t price at 3x token cost — because the market won’t bear it — the right move is almost always to switch to a smaller model (Haiku, GPT-4o-mini) or to add caching. Trying to absorb the loss with the hope of “raising prices later” is how AI SaaS founders go broke.
The single biggest lever after model choice is prompt caching. Anthropic's prompt cache reduces the cost of cached input tokens by 90%. If your system prompt is 5,000 tokens and you serve it on every request, caching it cuts that fixed overhead from $3/M to $0.30/M on Sonnet, potentially 50–70% of your bill at low message volumes where the system prompt dominates.
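The saving compounds with volume. A sketch of the system-prompt overhead with and without caching, assuming Sonnet 4.5 input at $3/M and a cache-read rate of 10% of the input price (the cache-write surcharge is ignored here for clarity):

```python
# Monthly cost of a 5,000-token system prompt, cached vs uncached,
# assuming $3/M Sonnet input and a 10%-of-input cache-read rate.
SYSTEM_PROMPT_TOKENS = 5_000
requests_per_month = 300_000   # e.g. 1,000 DAU × 10 messages/day × 30

uncached = requests_per_month * SYSTEM_PROMPT_TOKENS / 1e6 * 3.00
cached = requests_per_month * SYSTEM_PROMPT_TOKENS / 1e6 * 0.30

print(f"uncached ${uncached:,.0f}/mo, cached ${cached:,.0f}/mo")
```

At this volume the uncached prompt alone costs $4,500 a month; cached, $450.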
The catch: caching only helps when the same prefix repeats across requests within the cache TTL window. Multi-user products where each request has different context get less benefit. Single-user agents (like coding assistants reusing project context) get massive benefit. Read the architecture pattern in how to build an AI chatbot SaaS with Claude.
RAG — retrieval-augmented generation — helps in a different way. Instead of stuffing your full knowledge base into context on every call, you embed it once and retrieve only the relevant chunks at query time. This typically cuts input tokens 5–10x for products with large reference data. The tradeoff is added complexity (embedding pipeline, vector store, retrieval tuning) and lower answer quality if your retrieval is bad.
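The input-token reduction is easy to estimate before you build anything. A back-of-envelope sketch with illustrative sizes (the 20,000-token knowledge base and five 400-token retrieved chunks are assumptions):

```python
# Input tokens per call: full-context stuffing vs RAG retrieval
# (all sizes illustrative).
USER_MESSAGE = 800
full_context = 20_000 + USER_MESSAGE   # whole knowledge base every call
rag_context = 5 * 400 + USER_MESSAGE   # five retrieved 400-token chunks

reduction = full_context / rag_context
print(f"{reduction:.1f}x fewer input tokens")  # ~7.4x
```

That lands in the 5–10x range the text cites; a larger knowledge base or tighter retrieval widens it further.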
The practical hierarchy of cost optimizations, in order:
1. Pick the smallest model that's good enough — often a 20x+ saving on its own.
2. Cache your system prompt and any repeated prefix.
3. Cap output length — output tokens cost roughly 4–5x what input tokens do.
4. Route by complexity: cheap model for simple queries, expensive model for hard ones.
5. Add RAG when you're stuffing large reference data into context.
Run the calculator with 1,000 DAU and 10 messages each on Sonnet 4.5: you’ll see roughly $2,520/month. Switch to Haiku 4.5: $840. Switch to GPT-4o-mini: $108. The same product spec, a 23x spread in cost. At 10K DAU the gap is the difference between a profitable business and a venture-funded burn machine.
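Those three figures fall straight out of the rate cards. A sketch reproducing them, using the per-million prices the article's numbers imply:

```python
# Monthly bill: 1,000 DAU × 10 messages/day × 30 days,
# 800 input + 400 output tokens per message.
PRICES = {  # (input, output) in $ per million tokens (assumed rate card)
    "Sonnet 4.5": (3.00, 15.00),
    "Haiku 4.5": (1.00, 5.00),
    "GPT-4o-mini": (0.15, 0.60),
}
messages = 1_000 * 10 * 30           # 300,000 messages/month
in_tok, out_tok = messages * 800, messages * 400

bills = {}
for model, (p_in, p_out) in PRICES.items():
    bills[model] = in_tok / 1e6 * p_in + out_tok / 1e6 * p_out
    print(f"{model}: ${bills[model]:,.0f}/month")
# Sonnet 4.5: $2,520  ·  Haiku 4.5: $840  ·  GPT-4o-mini: $108
```

Note that output tokens dominate the Sonnet bill ($1,800 of the $2,520), which is why capping output length is high on the optimization list.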
The decision isn’t “which model is best” — it’s “which model is good enough at the cheapest price point.” For most chatbot use cases (customer support, FAQ, internal Q&A, simple agents), Haiku and GPT-4o-mini are good enough. For coding, deep reasoning, or complex multi-step workflows, you need Sonnet or GPT-4o, and you’ll pay for it.
The smart pattern most AI SaaS converge on: route by complexity. Use Haiku for simple queries, Sonnet for complex ones, decide which is which with either a small classifier or a heuristic on input length. This typically cuts token bills 60–80% with no measurable quality drop.
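The input-length heuristic can be a one-liner. A minimal routing sketch; the 2,000-token threshold and the model-ID strings are illustrative assumptions, not benchmarked values:

```python
# Route-by-complexity sketch: short inputs go to the cheap model,
# long ones to the expensive one. A production router might use a
# small classifier instead; threshold and model IDs are assumptions.
CHEAP_MODEL = "claude-haiku-4-5"
STRONG_MODEL = "claude-sonnet-4-5"
THRESHOLD_TOKENS = 2_000

def pick_model(input_tokens: int) -> str:
    return CHEAP_MODEL if input_tokens < THRESHOLD_TOKENS else STRONG_MODEL

print(pick_model(600))    # claude-haiku-4-5
print(pick_model(9_000))  # claude-sonnet-4-5
```

The design choice to note: a misrouted simple query costs you a few cents of Sonnet; a misrouted hard query costs you a bad answer. Set the threshold to err toward the strong model until you have quality data.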
Three workflows we recommend:
Workflow 1: pre-launch sanity check. Before you announce pricing, plug in your projected DAU at 6 months and your realistic message-per-user pattern. The output is your monthly token bill at that scale. Multiply by 3 to get your minimum viable monthly subscription price per user. If that price is higher than what your market will pay, you have a pricing problem you need to solve before you launch.
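Workflow 1 is three lines of arithmetic. A sketch with placeholder projections (the DAU and message counts below are assumptions you'd replace with your own):

```python
# Pre-launch sanity check: projected bill at 6-month scale, then the
# 3x-rule minimum viable price per user. All inputs are assumptions.
dau = 500
messages_per_user_day = 8
in_tok, out_tok = 800, 400
p_in, p_out = 3.00, 15.00        # Sonnet 4.5 $/M tokens (assumed rates)

msgs = dau * messages_per_user_day * 30
bill = msgs / 1e6 * (in_tok * p_in + out_tok * p_out)
min_price = 3 * bill / dau       # minimum viable monthly price per user

print(f"bill ${bill:,.0f}/mo, charge at least ${min_price:.2f}/user/mo")
```

If that minimum price exceeds what your market will pay, the fix is upstream: a cheaper model, caching, or shorter outputs, not a thinner margin.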
Workflow 2: per-message pricing check. If you’re billing per-message (or per-credit), enter your price per message in the optional input. The calculator shows your gross margin. Below 60%? You’ll struggle to ever be profitable. Below 30%? You’re paying customers to use your product.
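The per-message margin check is the same division the calculator does. A sketch with an assumed $0.02-per-message price; swap in your own numbers:

```python
# Per-message gross margin check. Price and token sizes are assumptions.
price_per_message = 0.02
cost_per_message = (800 * 3.00 + 400 * 15.00) / 1e6   # Sonnet 4.5: $0.0084

margin = (price_per_message - cost_per_message) / price_per_message
print(f"{margin:.0%}")   # 58% — under the 60% comfort line
```

Note how sensitive this is to output length: double the output tokens and the same price drops to a 28% margin.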
Workflow 3: model selection. Hold all inputs constant and toggle through the four models. The cost ratio between the cheapest and most-expensive option is usually 20–25x. Your answer to “can I afford to use Sonnet?” is right there in dollars per month.
A few things worth flagging: the calculator gives you a floor, not a forecast — retries, system prompts, tool-use overhead, and cache behavior all push real bills higher. Rate cards change often; re-check them before committing to a price. And free-trial users who never convert still burn tokens while paying nothing, so budget for them explicitly.
AI SaaS economics are fundamentally different from traditional SaaS economics. Your variable costs are real, they scale with usage, and they will sink your margin if you don’t engineer for them. Pick the smallest model that works, cap output length, cache aggressively, and price at 3–5x your token cost. Do those four things and the math works. Skip them and your credit card statement will surprise you in six months.