A line-by-line P&L for an AI SaaS at $5K MRR, why margins land at 30–50% rather than the 80% SaaS norm, and the four levers that meaningfully move the needle.
Methodology. This deep dive synthesizes public pricing pages (Anthropic, OpenAI, Vercel, Supabase) and founder-published cost reports as of May 2026. Numbers are illustrative for a typical solo-built AI SaaS — your mileage will vary based on prompt design, model choice, and traffic patterns. Compare against the typical $1K MRR SaaS cost stack and the AI token cost calculator for your own numbers.
The pattern is consistent across cost reports posted by AI founders on Indie Hackers and X through 2024 and 2025: the founder builds a tool, uses it for a few weeks, projects unit economics from their own usage, and ships at a price that looks comfortably profitable. Then real customers arrive. They ask longer questions, retry when the first answer isn’t quite right, paste entire documents instead of single paragraphs, and the token bill comes in 3–5x what the founder modeled.
This isn’t a model-selection problem — it’s a data-quality problem in pre-launch projections. You modeled a power user. The actual distribution has a long tail of heavy users who consume an outsized share of tokens. The same dynamic exists in any usage-based business, but it lands harder in AI because per-call cost is so visible.
Consider a hypothetical AI chatbot SaaS — the kind of product where users have ongoing conversations with an LLM about their domain (writing assistant, coding helper, customer-support agent, etc). The pricing is $19.99/month and the business has reached 250 paying customers, putting it at approximately $5,000 MRR. What does the monthly P&L actually look like?
| Line item | Monthly cost | Notes |
|---|---|---|
| Anthropic API (Claude Sonnet) | $1,400 | ~250 users × ~5.6M tokens each ≈ 1.4B tokens at ~$1/M blended |
| Vercel Pro | $20 | Base plan; usage overage broken out below |
| Vercel function/bandwidth overage | $60 | Streaming responses are bandwidth-heavy |
| Supabase Pro | $25 | Postgres, auth, conversation history |
| Stripe fees (2.9% + 30¢) | $220 | ~$5K processed at 4.4% effective |
| Resend (transactional email) | $20 | Pro tier for 50K emails/month |
| PostHog (analytics) | $50 | Product + session replay at this scale |
| Sentry (errors) | $26 | Team plan with reasonable transaction limit |
| Domain, monitoring, misc | $25 | Domain, BetterStack/UptimeRobot, etc |
| Marketing / SEO | $300 | Programmatic SEO tooling, content, ads |
| Total monthly cost | $2,146 | Net margin: ~57% |
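For readers who want to sanity-check the table, the arithmetic can be reproduced in a few lines (all figures are the illustrative ones above, including the 2.9% + 30¢ Stripe formula):

```python
# Illustrative P&L for the hypothetical $19.99/mo chatbot at 250 customers.
PRICE = 19.99
CUSTOMERS = 250
mrr = PRICE * CUSTOMERS  # ~$4,997.50, rounded to "$5K MRR" in the text

# Stripe's standard card pricing: 2.9% of volume plus 30 cents per charge.
stripe_fees = round(0.029 * mrr + 0.30 * CUSTOMERS)  # ~$220, i.e. ~4.4% effective

costs = {
    "anthropic_api": 1400,
    "vercel_pro": 20,
    "vercel_overage": 60,
    "supabase_pro": 25,
    "stripe_fees": stripe_fees,
    "resend": 20,
    "posthog": 50,
    "sentry": 26,
    "domain_misc": 25,
    "marketing": 300,
}

total = sum(costs.values())
margin = (mrr - total) / mrr
print(f"total=${total}, net margin={margin:.0%}")  # total=$2146, net margin=57%
```

Note that the "effective" Stripe rate is higher than the headline 2.9% because the fixed 30¢ per charge weighs more at a low price point.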
That looks like a reasonable business — until you look at the line items in proportion. The Anthropic bill alone is 28% of revenue, swallowing more than the founder typically expects. Stripe fees, the “quiet 4%” that vanilla SaaS founders barely think about, are another 4.4% of gross. Marketing is the only other meaningful line, and it’s the variable a founder has the most control over. Everything else is rounding error.
Compare this to a vanilla SaaS at the same MRR — a no-AI productivity tool, say. The infrastructure stack is nearly identical, but the API bill is replaced by a roughly $0 line. That same $5K MRR business runs at maybe $500–$700 in monthly costs and 86–90% gross margins. The AI version sits at 57%. That gap is structural, not a function of optimization effort.
The core reason is that you are reselling a commodity. The tokens flowing through your product come from Anthropic or OpenAI at published per-token prices anyone can read. Your value-add is the wrapper — prompts, UX, integrations, data — but the underlying compute is paid through to a third party at near-spot rates.
Vanilla SaaS doesn’t have this. When a project management tool charges $19/month per user, the marginal cost of serving that user is fractions of a cent. AI SaaS has the same engineering leverage on the wrapper, but every additional usage-heavy customer adds real, unavoidable token cost. You can’t optimize away the floor.
The “AI margin will improve as model costs drop” argument is also only half right. Prices for equivalent capability have fallen every year, but customer expectations rise in lockstep. As the floor gets cheaper, customers expect longer context, more sophisticated outputs, and more retries — net token spend per customer often stays flat.
There are exactly four production levers worth pulling to improve gross margin on an AI product. None of them are silver bullets — the right answer is to pull all four, hard, and accept the resulting 60–70% margins as the new normal.
Anthropic offers prompt caching at 90% off on cache reads, with a 5-minute default TTL (a 1-hour TTL is available at a higher cache-write price). For any product that repeatedly sends the same system prompt, documentation, or conversation history, this is a 5–10x cost reduction on the cached portion. Mark the static prefix with a cache-control breakpoint; subsequent requests within the TTL pay the cache-read rate.
The catch is that caching only helps if traffic clusters within the TTL window. A live chatbot session keeps a conversation hot. A backend job that processes one document per hour gets zero benefit. Audit your traffic pattern before assuming caching solves your bill.
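A back-of-envelope model makes the traffic-pattern point concrete. The sketch below assumes Anthropic's published multipliers for the 5-minute cache (writes at 1.25x the base input rate, reads at 0.1x), a Sonnet-class input price of $3/M tokens, and a hypothetical 90% hit rate; plug in your own numbers:

```python
# Rough effect of prompt caching on the input side of the bill.
# Assumed multipliers: cache writes cost 1.25x base input, reads 0.1x (90% off).
BASE_INPUT = 3.00   # $/M input tokens, Sonnet-class (illustrative)
WRITE_MULT = 1.25
READ_MULT = 0.10

def input_cost(prefix_mtok: float, hit_rate: float, uncached_mtok: float) -> float:
    """Monthly input cost when `prefix_mtok` (millions of tokens of static
    prefix traffic) hits the cache at `hit_rate`, plus always-uncached tokens."""
    writes = prefix_mtok * (1 - hit_rate) * WRITE_MULT  # misses re-write the cache
    reads = prefix_mtok * hit_rate * READ_MULT          # hits pay the read rate
    return (writes + reads + uncached_mtok) * BASE_INPUT

baseline = (100 + 20) * BASE_INPUT       # no caching: $360/month
with_cache = input_cost(100, 0.9, 20)    # ~$124.50/month, ~65% saved
print(baseline, with_cache)
```

Drop the hit rate to reflect a once-an-hour batch job and the writes dominate: caching can actually cost slightly more than not caching, which is why the traffic audit comes first.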
The Anthropic and OpenAI batch APIs offer a 50% discount with a 24-hour turnaround. Any work that doesn’t need a live response — nightly summarization, bulk classification, weekly reports — should run through batch. It’s a pure pricing mechanic with no quality trade-off.
If 30% of your token spend is non-realtime, moving it to batch cuts your effective bill by 15%. On the $1,400 line above, that’s $210/month, more than enough to justify a half-day of queue plumbing.
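The savings math is a one-liner, worth writing down only because founders routinely skip it before deciding the queue work isn't worth it:

```python
# Effect of moving non-realtime work to a 50%-off batch API.
def batched_bill(monthly_spend: float, batch_share: float) -> float:
    """batch_share = fraction of token spend that can tolerate a 24h turnaround."""
    return monthly_spend * (1 - batch_share * 0.5)

print(batched_bill(1400, 0.30))  # ~$1,190 -- the $210/month saving from the text
```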
Not every request needs the smartest model. Classification, formatting cleanup, short-text extraction, and yes/no decisions run accurately on Haiku-class models at a fraction of Sonnet pricing. Hard reasoning, multi-step planning, and long-context analysis still earn the more expensive model.
Add a small router that classifies requests by complexity. Hand-coded rules work; using Haiku itself to classify is more robust. For a chatbot where 60% of turns are short follow-ups, this can cut the API bill by 30–50% with no user-perceptible quality loss.
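A hand-coded router really can be this simple. The heuristics and model names below are illustrative placeholders, not recommendations; the point is that a few rules tuned against your own traffic capture most of the win before you bother with an LLM-based classifier:

```python
# Hand-coded complexity router. Thresholds and keywords are illustrative;
# tune them against logged production traffic, not intuition.
CHEAP_MODEL = "haiku-class"   # placeholder names, not real model IDs
SMART_MODEL = "sonnet-class"

def route(message: str, history_turns: int) -> str:
    words = message.split()
    # Short follow-ups inside an ongoing conversation rarely need deep reasoning.
    if history_turns > 0 and len(words) < 20:
        return CHEAP_MODEL
    # Verbs that usually signal multi-step work earn the expensive model.
    if any(k in message.lower() for k in ("analyze", "compare", "plan", "debug")):
        return SMART_MODEL
    # Long pasted context (whole documents) goes to the stronger model too.
    if len(words) > 300:
        return SMART_MODEL
    return CHEAP_MODEL

print(route("thanks, shorten that", history_turns=4))        # haiku-class
print(route("analyze this contract for risks", 0))           # sonnet-class
```

When the rules stop being enough, swap the body of `route` for a single cheap-model classification call and keep the same interface.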
Output tokens are 4–5x more expensive than input tokens. Letting the model ramble is the most expensive bug in AI SaaS. Three concrete moves: cap max_tokens at the actual cap your UI can render, use structured outputs (JSON schema or tool-call constraints) for tight payloads, and write system prompts that explicitly request brevity.
Founders who audit their output distribution often find a long tail of 3,000-token responses to questions that needed 200. Capping at the median is a quiet 15–25% per-call cost reduction.
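The asymmetry is easy to see in per-call terms. Using illustrative Sonnet-class prices of $3/M input and $15/M output:

```python
# Why output length dominates per-call cost (prices illustrative, $/M tokens).
INPUT_PER_M, OUTPUT_PER_M = 3.00, 15.00

def call_cost(in_tok: int, out_tok: int) -> float:
    return in_tok / 1e6 * INPUT_PER_M + out_tok / 1e6 * OUTPUT_PER_M

rambling = call_cost(1_000, 3_000)  # ~$0.048: the 3,000-token answer
capped = call_cost(1_000, 400)      # ~$0.009: the answer the question needed
print(rambling, capped)
```

Same input, same question, roughly a 5x difference per call, all of it in the output column.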
The single biggest pricing decision in AI SaaS is whether to pass usage variability through to the customer or absorb it. Pure usage-based pricing (charge per token, per query, or per credit) sounds appealing — it caps your downside and grows with the customer. In practice, most solo founders should not use it, for one reason: sticker shock.
Customers want predictable bills. A landing page that says “$0.05 per 1,000 tokens” is incomprehensible to anyone who isn’t already an AI insider. Even if the math works out cheaper than a competitor’s flat tier, the perceived risk — “what if I use too much and get a $400 bill?” — kills conversions before the customer even tries the product. This is a well-documented pattern from the broader cloud industry that AI is now rediscovering at smaller scale.
Pure flat-rate ($19.99/month, unlimited use) has the opposite problem: a small number of heavy users consume a disproportionate share of tokens, and the founder either eats negative margins on those users or quietly throttles them. Both options damage the business.
The hybrid pattern that works for most solo founders looks like this: $X/month includes Y credits, additional credits at $Z each. The flat base sets the expected bill and clears the sticker-shock hurdle. The included quota is sized to cover the median user comfortably (think 80–90% of users never look at the meter). The overage rate exists primarily to handle the long tail without subsidizing them, and the unit is something humans understand — “100 conversations” or “500 documents” rather than raw tokens. The complete guide to SaaS pricing walks through specific tier-design templates.
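In code, the hybrid model is trivial, which is part of its appeal. The base price, included quota, and overage rate below are illustrative placeholders:

```python
# Hybrid bill: flat base includes a quota, metered overage past it.
# All three constants are illustrative -- size them to YOUR median user.
BASE = 19.99      # $/month
INCLUDED = 100    # conversations included per month
OVERAGE = 0.15    # $ per extra conversation

def monthly_bill(conversations: int) -> float:
    extra = max(0, conversations - INCLUDED)
    return round(BASE + extra * OVERAGE, 2)

print(monthly_bill(60))    # 19.99 -- the median user never sees the meter
print(monthly_bill(400))   # 64.99 -- the heavy user covers their own tokens
```

The design constraint is that `OVERAGE` must sit above your marginal token cost per conversation, otherwise the long tail is still subsidized, just with extra billing plumbing.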
If your AI margin caps at 60–70%, the only way to lift the blended business margin is to add features that cost almost nothing per user. Almost every AI product accumulates these naturally: saved templates, search history, integrations, sharing and collaboration, dashboards, settings. Each is a one-time engineering cost spread across all users at near-zero marginal compute.
The strategic implication is that an AI SaaS at scale isn’t really an AI SaaS — it’s a SaaS with an AI feature, where the AI is the wedge and everything else carries the margin. Heavy users pay for themselves through credit overage; light users pay flat and subsidize the platform features they barely use.
This is also why pure “wrapper” products struggle to defend pricing. If 100% of the value is the AI call, customers can compute the token cost themselves and feel overcharged. The wrapper has to do real, accumulating work the underlying API doesn’t.
Whether to commit to one model provider or use a meta-router like OpenRouter is a recurring debate. Both have a case.
Single-provider commitment buys access to the provider’s best features — prompt caching, batch APIs, structured outputs — without compatibility friction. A product built around Anthropic’s caching has a real cost moat. The downside is no fallback when capacity is tight or prices change.
Multi-provider routing buys leverage and resilience but costs you provider-specific features: caching is usually the first casualty because it doesn't port cleanly across providers. For most solo founders, the right call is to commit to one provider for the first 12–18 months, optimize hard against its features, and only layer in routing past roughly $20K MRR, when negotiating leverage starts to matter. See Anthropic API pricing explained and OpenAI API pricing explained.
Margins don't look the same at every revenue level. They generally improve with scale because the fixed costs of the rest of the stack (Vercel, Supabase, monitoring) are amortized across more revenue, and because the four levers above start paying back the engineering effort required to implement them.
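That amortization effect can be sketched with a toy model. The numbers below are illustrative, loosely derived from the P&L above: roughly $450/month of flat stack costs, with token spend (~28% of revenue) and Stripe (~4.4%) scaling proportionally:

```python
# Toy model: blended gross margin as MRR grows, assuming fixed stack costs
# amortize while AI + payment costs stay proportional. All figures illustrative.
FIXED = 450                 # Vercel/Supabase/monitoring/email/analytics, roughly flat
VAR_RATE = 0.28 + 0.044     # token spend + Stripe, as a share of revenue

def gross_margin(mrr: float) -> float:
    return 1 - VAR_RATE - FIXED / mrr

for mrr in (2_000, 5_000, 20_000):
    print(f"${mrr}: {gross_margin(mrr):.0%}")
```

The variable share is the floor: in this toy model no amount of scale pushes margin past ~68% without pulling the four levers on the token line itself.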
AI SaaS is a real business if priced right. It is not a 90% gross margin business and never will be, because you are reselling commodity tokens with a wrapper. The companies you read about hitting traditional SaaS margins are either building above the wrapper layer (data, distribution, integrations carrying the value) or moving to dedicated infrastructure where compute economics are different. For a solo founder shipping their first AI product, the realistic target is 60–70% gross margin at maturity, with hybrid pricing that covers the long tail of heavy users.
The good news is that 60–70% is still a real business. It is enough to fund growth, pay yourself, and compound. It is just not the “build a product, watch the margins approach 100” story that vanilla SaaS founders have been telling for the last decade. AI requires a more honest conversation about cost structure — and the founders who price like they understand the structure outlast the ones who don’t.
If you want to model your own scenario, the AI token cost calculator walks through the math at any MRR level. For the build itself, how to build an AI chatbot SaaS with Claude covers the architecture decisions that compound into the cost structure analyzed here.
Cited references: Anthropic pricing, OpenAI pricing, Vercel pricing, Supabase pricing, and aggregated cost reports from public founder posts on Indie Hackers and X throughout 2024–2026.