Estimate your monthly Claude or OpenAI bill before you ship. Plug in users, messages, and token sizes — see what each model would cost you and whether your pricing leaves any margin.
An AI SaaS founder told us last month they were charging $19/month for unlimited AI chat. Their actual token bill from Anthropic? $34 per active user. They were losing $15 every time someone signed up — and they only noticed three months in, when the credit-card statement landed. This calculator exists to make sure that doesn’t happen to you. Plug in realistic usage, pick the model, and see whether your pricing is mathematically possible.
Methodology. Token prices reflect public Anthropic and OpenAI API rate cards as of May 2026. Real-world bills include retries, system prompts, tool-use overhead, and cache reads — this calculator gives you the floor, not the ceiling. How we research.
Most non-AI SaaS has near-zero variable cost per user. You pay a flat $25/month for Vercel and another $35 for Supabase, and whether you serve 100 users or 1,000 users the bill barely moves. AI SaaS breaks that model. Every message costs you real money, paid out to Anthropic or OpenAI in real time. Variable cost is back, and it can eat your margin alive if you don’t price for it.
The mental model that works: think of every active user as a small monthly expense, not a margin contributor. A user generating 150 messages a month at 800 input + 400 output tokens on Sonnet 4.5 costs you roughly $1.26 per month in tokens ($0.36 input + $0.90 output). That's your floor, before retries, system prompts, tool overhead, or cache reads. To make any margin at all, you need to charge meaningfully more than that.
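That per-user floor is simple arithmetic. A minimal sketch, assuming Sonnet 4.5 at $3 per million input tokens and $15 per million output tokens (the rate-card figures the article's numbers imply):

```python
# Per-user monthly token cost at Sonnet 4.5 rate-card prices
# (assumed: $3/M input, $15/M output).
MESSAGES_PER_MONTH = 150
INPUT_TOKENS = 800    # per message
OUTPUT_TOKENS = 400   # per message

input_cost = MESSAGES_PER_MONTH * INPUT_TOKENS / 1e6 * 3.00    # $0.36
output_cost = MESSAGES_PER_MONTH * OUTPUT_TOKENS / 1e6 * 15.00  # $0.90

total = input_cost + output_cost
print(f"${total:.2f} per user/month")  # $1.26
```

Swap in your own message counts and token sizes; the structure stays the same for any model, only the two per-million rates change.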
Token cost is not your only cost. Add hosting, database, transactional email, payment-processing fees (Stripe takes ~3%), refunds, free-trial users who never convert, and your own time supporting the product. By the time you net everything out, the typical AI SaaS founder needs 3–5x token cost as their price point just to clear the same gross margin a non-AI SaaS gets at 1.1x infrastructure cost.
The 3x floor: at this multiple you’re probably hitting 50–60% gross margin, which is the lower bound of what acquirers and investors consider a real SaaS. Below 50% you’re a services business with subscription billing.
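Why does a 3x multiple land at 50–60% rather than the naive 66%? Payment fees and non-token COGS eat the difference. A sketch with illustrative overhead figures (the $0.50 per-user infrastructure share is an assumption, not an article figure):

```python
# Why price = 3 × token cost ≈ 50% gross margin once real COGS land.
token_cost = 1.27                  # per user/month (the article's Sonnet example)
price = 3 * token_cost             # $3.81

stripe_fee = 0.03 * price          # ~3% payment processing
other_cogs = 0.50                  # hosting/db/email share per user (assumed)

gross_margin = (price - token_cost - stripe_fee - other_cogs) / price
print(f"{gross_margin:.0%}")       # ~50%
```

Push `other_cogs` up, or add free-trial users who generate tokens but no revenue, and the same 3x price slides below the 50% line.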
The 5x ceiling: above this multiple, customers start feeling overcharged relative to the underlying compute. Cursor and Claude Code both sit roughly here. Anthropic charges Cursor for tokens, Cursor charges users at roughly a 4–5x markup, and the model just barely works. We unpack the structure in cursor pricing explained.
If you can’t price at 3x token cost — because the market won’t bear it — the right move is almost always to switch to a smaller model (Haiku, GPT-4o-mini) or to add caching. Trying to absorb the loss with the hope of “raising prices later” is how AI SaaS founders go broke.
The single biggest lever after model choice is prompt caching. Anthropic's prompt cache reduces the cost of cached input tokens by 90%. If your system prompt is 5,000 tokens and you serve it on every request, caching it cuts that fixed overhead from $3/M to $0.30/M on Sonnet, potentially 50–70% of your bill at low message volumes where the system prompt dominates.
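The saving compounds with volume. A sketch of the system-prompt overhead with and without caching, assuming Sonnet 4.5 input at $3/M and a cache-read rate of 10% of the input price (the cache-write surcharge is ignored here for clarity):

```python
# Monthly cost of a 5,000-token system prompt, cached vs uncached,
# assuming $3/M Sonnet input and a 10%-of-input cache-read rate.
SYSTEM_PROMPT_TOKENS = 5_000
requests_per_month = 300_000   # e.g. 1,000 DAU × 10 messages/day × 30

uncached = requests_per_month * SYSTEM_PROMPT_TOKENS / 1e6 * 3.00
cached = requests_per_month * SYSTEM_PROMPT_TOKENS / 1e6 * 0.30

print(f"uncached ${uncached:,.0f}/mo, cached ${cached:,.0f}/mo")
```

At this volume the uncached prompt alone costs $4,500 a month; cached, $450.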
The catch: caching only helps when the same prefix repeats across requests within the cache TTL window. Multi-user products where each request has different context get less benefit. Single-user agents (like coding assistants reusing project context) get massive benefit. Read the architecture pattern in how to build an AI chatbot SaaS with Claude.
RAG — retrieval-augmented generation — helps in a different way. Instead of stuffing your full knowledge base into context on every call, you embed it once and retrieve only the relevant chunks at query time. This typically cuts input tokens 5–10x for products with large reference data. The tradeoff is added complexity (embedding pipeline, vector store, retrieval tuning) and lower answer quality if your retrieval is bad.
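The input-token reduction is easy to estimate before you build anything. A back-of-envelope sketch with illustrative sizes (the 20,000-token knowledge base and five 400-token retrieved chunks are assumptions):

```python
# Input tokens per call: full-context stuffing vs RAG retrieval
# (all sizes illustrative).
USER_MESSAGE = 800
full_context = 20_000 + USER_MESSAGE   # whole knowledge base every call
rag_context = 5 * 400 + USER_MESSAGE   # five retrieved 400-token chunks

reduction = full_context / rag_context
print(f"{reduction:.1f}x fewer input tokens")  # ~7.4x
```

That lands in the 5–10x range the text cites; a larger knowledge base or tighter retrieval widens it further.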
The practical hierarchy of cost optimizations, in order:
1. Pick the smallest model that's good enough — often a 20x+ saving on its own.
2. Cache your system prompt and any repeated prefix.
3. Cap output length — output tokens cost roughly 4–5x what input tokens do.
4. Route by complexity: cheap model for simple queries, expensive model for hard ones.
5. Add RAG when you're stuffing large reference data into context.
Run the calculator with 1,000 DAU and 10 messages each on Sonnet 4.5: you’ll see roughly $2,520/month. Switch to Haiku 4.5: $840. Switch to GPT-4o-mini: $108. The same product spec, a 23x spread in cost. At 10K DAU the gap is the difference between a profitable business and a venture-funded burn machine.
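Those three figures fall straight out of the rate cards. A sketch reproducing them, using the per-million prices the article's numbers imply:

```python
# Monthly bill: 1,000 DAU × 10 messages/day × 30 days,
# 800 input + 400 output tokens per message.
PRICES = {  # (input, output) in $ per million tokens (assumed rate card)
    "Sonnet 4.5": (3.00, 15.00),
    "Haiku 4.5": (1.00, 5.00),
    "GPT-4o-mini": (0.15, 0.60),
}
messages = 1_000 * 10 * 30           # 300,000 messages/month
in_tok, out_tok = messages * 800, messages * 400

bills = {}
for model, (p_in, p_out) in PRICES.items():
    bills[model] = in_tok / 1e6 * p_in + out_tok / 1e6 * p_out
    print(f"{model}: ${bills[model]:,.0f}/month")
# Sonnet 4.5: $2,520  ·  Haiku 4.5: $840  ·  GPT-4o-mini: $108
```

Note that output tokens dominate the Sonnet bill ($1,800 of the $2,520), which is why capping output length is high on the optimization list.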
The decision isn’t “which model is best” — it’s “which model is good enough at the cheapest price point.” For most chatbot use cases (customer support, FAQ, internal Q&A, simple agents), Haiku and GPT-4o-mini are good enough. For coding, deep reasoning, or complex multi-step workflows, you need Sonnet or GPT-4o, and you’ll pay for it.
The smart pattern most AI SaaS converge on: route by complexity. Use Haiku for simple queries, Sonnet for complex ones, decide which is which with either a small classifier or a heuristic on input length. This typically cuts token bills 60–80% with no measurable quality drop.
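The input-length heuristic can be a one-liner. A minimal routing sketch; the 2,000-token threshold and the model-ID strings are illustrative assumptions, not benchmarked values:

```python
# Route-by-complexity sketch: short inputs go to the cheap model,
# long ones to the expensive one. A production router might use a
# small classifier instead; threshold and model IDs are assumptions.
CHEAP_MODEL = "claude-haiku-4-5"
STRONG_MODEL = "claude-sonnet-4-5"
THRESHOLD_TOKENS = 2_000

def pick_model(input_tokens: int) -> str:
    return CHEAP_MODEL if input_tokens < THRESHOLD_TOKENS else STRONG_MODEL

print(pick_model(600))    # claude-haiku-4-5
print(pick_model(9_000))  # claude-sonnet-4-5
```

The design choice to note: a misrouted simple query costs you a few cents of Sonnet; a misrouted hard query costs you a bad answer. Set the threshold to err toward the strong model until you have quality data.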
Three workflows we recommend:
Workflow 1: pre-launch sanity check. Before you announce pricing, plug in your projected DAU at 6 months and your realistic message-per-user pattern. The output is your monthly token bill at that scale. Multiply by 3 to get your minimum viable monthly subscription price per user. If that price is higher than what your market will pay, you have a pricing problem you need to solve before you launch.
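Workflow 1 is three lines of arithmetic. A sketch with placeholder projections (the DAU and message counts below are assumptions you'd replace with your own):

```python
# Pre-launch sanity check: projected bill at 6-month scale, then the
# 3x-rule minimum viable price per user. All inputs are assumptions.
dau = 500
messages_per_user_day = 8
in_tok, out_tok = 800, 400
p_in, p_out = 3.00, 15.00        # Sonnet 4.5 $/M tokens (assumed rates)

msgs = dau * messages_per_user_day * 30
bill = msgs / 1e6 * (in_tok * p_in + out_tok * p_out)
min_price = 3 * bill / dau       # minimum viable monthly price per user

print(f"bill ${bill:,.0f}/mo, charge at least ${min_price:.2f}/user/mo")
```

If that minimum price exceeds what your market will pay, the fix is upstream: a cheaper model, caching, or shorter outputs, not a thinner margin.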
Workflow 2: per-message pricing check. If you’re billing per-message (or per-credit), enter your price per message in the optional input. The calculator shows your gross margin. Below 60%? You’ll struggle to ever be profitable. Below 30%? You’re paying customers to use your product.
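The per-message margin check is the same division the calculator does. A sketch with an assumed $0.02-per-message price; swap in your own numbers:

```python
# Per-message gross margin check. Price and token sizes are assumptions.
price_per_message = 0.02
cost_per_message = (800 * 3.00 + 400 * 15.00) / 1e6   # Sonnet 4.5: $0.0084

margin = (price_per_message - cost_per_message) / price_per_message
print(f"{margin:.0%}")   # 58% — under the 60% comfort line
```

Note how sensitive this is to output length: double the output tokens and the same price drops to a 28% margin.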
Workflow 3: model selection. Hold all inputs constant and toggle through the four models. The cost ratio between the cheapest and most-expensive option is usually 20–25x. Your answer to “can I afford to use Sonnet?” is right there in dollars per month.
A few things worth flagging: the calculator gives you a floor, not a forecast — retries, system prompts, tool-use overhead, and cache behavior all push real bills higher. Rate cards change often; re-check them before committing to a price. And free-trial users who never convert still burn tokens while paying nothing, so budget for them explicitly.
AI SaaS economics are fundamentally different from traditional SaaS economics. Your variable costs are real, they scale with usage, and they will sink your margin if you don’t engineer for them. Pick the smallest model that works, cap output length, cache aggressively, and price at 3–5x your token cost. Do those four things and the math works. Skip them and your credit card statement will surprise you in six months.