Per-million-token costs across the GPT and o-series lineup, the cached-input discount, vision and embedding rates, DALL·E per-image pricing, and the 50% Batch API discount — with realistic monthly bills for solo SaaS. Numbers verified against openai.com/api/pricing.
OpenAI’s pricing surface is broader than Anthropic’s — chat models, reasoning models with hidden output tokens, image generation through DALL·E, embeddings, and an audio API. The published rates live at openai.com/api/pricing, split per million tokens for LLMs, per image for DALL·E, and per character or minute for audio. The trap for solo founders is treating “OpenAI API” as one cost line. In practice you’re budgeting four or five different meters, and the o-series reasoning models generate hidden tokens that don’t show up in your prompt but very much show up on the bill.
Methodology. Per-token rates, the cached-input discount structure, DALL·E and embedding pricing, and Batch API terms come from openai.com/api/pricing and OpenAI’s public API documentation, last reconciled in May 2026. Pricing changes frequently; treat this as a snapshot. Model names referenced (GPT-4.1, 4o, o1, o3-mini) reflect the public lineup at time of writing — check the pricing page for the current generation before you build a budget on these numbers.
OpenAI ships in three pricing families that map to three jobs.
| Model | Input ($/MTok) | Cached input | Output ($/MTok) |
|---|---|---|---|
| GPT-4.1 | $2.00 | $0.50 | $8.00 |
| GPT-4.1-mini | $0.40 | $0.10 | $1.60 |
| GPT-4o | $2.50 | $1.25 | $10.00 |
| GPT-4o-mini | $0.15 | $0.075 | $0.60 |
| o1 | $15.00 | $7.50 | $60.00 |
| o1-mini | $1.10 | $0.55 | $4.40 |
| o3-mini | $1.10 | $0.55 | $4.40 |
The headline cheat-sheet: OpenAI prices output at 4× input on most models, slightly tighter than Anthropic's 5× ratio but conceptually identical. The same advice applies: model your costs by direction, instruct your prompts to be terse where you can, and set max_tokens defensively.
One subtlety on the o-series: the output cost includes reasoning tokens the model generates internally and never shows you. A simple math problem might use 500 visible output tokens and another 2,000 reasoning tokens. You pay for all 2,500. For an o1 prompt with 500 visible output and 2,000 hidden reasoning tokens, you pay 2,500 × $60/MTok = $0.15 per call — not the $0.03 you’d estimate from visible output alone.
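The hidden-token math above is worth encoding once so you never estimate from visible output alone. A minimal helper, using the rates from the table:

```python
def o_series_call_cost(visible_out, reasoning_out, out_rate_per_mtok):
    """O-series bills visible AND hidden reasoning tokens at the output rate."""
    return (visible_out + reasoning_out) * out_rate_per_mtok / 1_000_000

# 500 visible + 2,000 hidden reasoning tokens on o1 ($60/MTok output):
real_cost = o_series_call_cost(500, 2_000, 60.00)   # $0.15
naive_cost = o_series_call_cost(500, 0, 60.00)      # $0.03 — what you'd guess
```

The 5× gap between the two numbers is typical; reasoning-heavy prompts can be worse.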
OpenAI’s prompt caching kicks in automatically when you send a prompt with a prefix that matches a recently-seen prompt. Cached portions get billed at roughly 50% of standard input rate — less aggressive than Anthropic’s 90% discount, but it requires no API changes and applies retroactively to repeated prefixes.
What this means in practice: put stable content (system prompt, few-shot examples, tool definitions) at the front of every request and per-user content at the end, so the shared prefix stays byte-identical across calls and keeps hitting the cache.
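The cache discount turns your input rate into a blend of the base and cached rates, weighted by hit rate. A one-line model of that:

```python
def blended_input_rate(base, cached, hit_rate):
    """Effective $/MTok for input when a fraction of tokens hit the prompt cache."""
    return hit_rate * cached + (1 - hit_rate) * base

# GPT-4.1 ($2.00 base, $0.50 cached) at a 70% prefix hit rate:
rate_41 = blended_input_rate(2.00, 0.50, 0.70)        # 0.95 $/MTok
# GPT-4.1-mini ($0.40 base, $0.10 cached), same hit rate:
rate_41_mini = blended_input_rate(0.40, 0.10, 0.70)   # 0.19 $/MTok
```

At a 70% hit rate the effective input price roughly halves, which is why prompt structure is worth real engineering effort.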
The o-series handles problems where deeper internal computation produces better answers — competitive math, multi-step logic, complex code refactoring, tricky planning. The price for that depth is twofold: a higher per-token output rate (o1 at $60/MTok) and higher token consumption per problem because the model produces internal reasoning chains.
Practical pattern: never call o1 from a user-facing latency-sensitive path. Reasoning calls take seconds to tens of seconds and cost an order of magnitude more than 4.1 calls. The right architecture is a router: send each request to 4o-mini or 4.1-mini first, and escalate to o1 or o3-mini only when the simpler model flags low confidence. The cheaper reasoning options (o1-mini, o3-mini) at $1.10 / $4.40 give you reasoning behavior at a fraction of o1’s price — for many workloads they’re the sweet spot.
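The router pattern can be sketched as below. Note the hedges: `cheap_answer` and `reasoning_answer` are hypothetical callables standing in for real API calls, and the confidence score is something you produce yourself (for example by asking the cheap model to rate its own answer), not a field the API returns:

```python
def route(prompt, cheap_answer, reasoning_answer, threshold=0.7):
    """Try the cheap model first; escalate to a reasoning model on low confidence."""
    answer, confidence = cheap_answer(prompt)   # e.g. 4o-mini or 4.1-mini
    if confidence >= threshold:
        return answer                           # most traffic stops here
    return reasoning_answer(prompt)             # e.g. o3-mini, or o1 for the hardest tier
```

The threshold is a cost dial: raise it and more traffic escalates, lower it and you trade accuracy on hard cases for a smaller bill.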
OpenAI charges for image inputs through the same input-token meter, with a twist: you can specify detail: "low" for a flat ~85-token cost per image, or detail: "high" which tiles the image and bills based on tile count (roughly 170 tokens per 512×512 tile plus a base).
The split matters: detail: "low" is enough for coarse judgments (is this a cat, is this image safe), while reading text out of an image needs detail: "high".
For a vision-heavy product (receipts, invoices, screenshot QA), high detail is non-negotiable. On 4o, a high-detail 1024×1024 image costs about $0.002. Across thousands of users that’s a meaningful line item but rarely a dominant one.
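A rough calculator for the high-detail meter. The constants here are assumptions drawn from OpenAI's documented tiling at time of writing (an 85-token base charge, images fit within 2048×2048 and then scaled so the shortest side is at most 768, 170 tokens per 512×512 tile); verify them against the current vision pricing docs before budgeting on them:

```python
import math

def high_detail_tokens(width, height, base=85, per_tile=170):
    """Approximate input-token charge for a detail:'high' image."""
    scale = min(1.0, 2048 / max(width, height))   # fit within a 2048 square
    w, h = width * scale, height * scale
    scale = min(1.0, 768 / min(w, h))             # shortest side down to 768
    w, h = w * scale, h * scale
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return base + per_tile * tiles

tokens = high_detail_tokens(1024, 1024)           # 765 tokens
cost = tokens * 2.50 / 1_000_000                  # ≈ $0.0019 on GPT-4o
```

That reproduces the ~$0.002 figure above for a 1024×1024 image on 4o.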
OpenAI’s embedding models price per million input tokens with no separate output meter (embeddings produce a fixed vector, not generated text):

| Model | Input ($/MTok) |
|---|---|
| text-embedding-3-small | $0.02 |
| text-embedding-3-large | $0.13 |
To put this in context: indexing a 100,000-page knowledge base at ~500 tokens per page is 50M tokens. On the small model that’s $1. On the large model, $6.50. Embeddings are the cheapest line item in the entire OpenAI bill for most products. Don’t over-engineer to avoid them.
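The indexing arithmetic, spelled out (rates as implied by the $1 / $6.50 figures above):

```python
def embedding_index_cost(pages, tokens_per_page, rate_per_mtok):
    """One-time cost in dollars to embed a corpus."""
    return pages * tokens_per_page * rate_per_mtok / 1_000_000

small = embedding_index_cost(100_000, 500, 0.02)   # $1.00 on the small model
large = embedding_index_cost(100_000, 500, 0.13)   # $6.50 on the large model
```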
Image generation is priced per image, with rates that depend on resolution and quality tier:
| Model / setting | Resolution | Per image |
|---|---|---|
| DALL·E 3 Standard | 1024×1024 | $0.04 |
| DALL·E 3 Standard | 1024×1792 / 1792×1024 | $0.08 |
| DALL·E 3 HD | 1024×1024 | $0.08 |
| DALL·E 3 HD | 1024×1792 / 1792×1024 | $0.12 |
For a SaaS shipping generated thumbnails, marketing imagery, or AI avatars, this is the budget driver. At $0.04 per image, a product generating 100,000 images/month spends $4,000 on DALL·E alone. Cache aggressively, dedupe by prompt hash, and stick with standard resolution unless HD is genuinely required.
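The dedupe advice can be as simple as a content-addressed cache keyed on prompt plus settings. A minimal sketch, where `generate_fn` is a hypothetical stand-in for the real image API call (in production you would persist the store and cache image URLs or bytes, not in-memory values):

```python
import hashlib

class ImageCache:
    """Pay for a DALL·E generation only on a true cache miss."""
    def __init__(self, generate_fn):
        self.generate_fn = generate_fn
        self.store = {}

    def get(self, prompt, size="1024x1024", quality="standard"):
        key = hashlib.sha256(f"{prompt}|{size}|{quality}".encode()).hexdigest()
        if key not in self.store:            # miss: this call costs $0.04+
            self.store[key] = self.generate_fn(prompt, size, quality)
        return self.store[key]               # hit: free
```

If even 20% of your 100K monthly images are repeat prompts, this is $800/month back.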
Same playbook as Anthropic’s Batch API: submit jobs, OpenAI returns results within 24 hours, and you pay 50% of the standard rate on both input and output. The discount stacks with cached input. For nightly digests, bulk SEO meta generation, review classification, embedding refresh, or evaluation runs over historical traffic, Batch is a no-brainer. It doesn’t fit anything user-facing or workflows that depend on a prior call’s output.
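Batch jobs are submitted as a JSONL file, one request per line, with a custom_id to join results back to your records. The line shape below follows OpenAI's Batch API docs at time of writing; check the current reference before building on it:

```python
import json

# One JSONL line per request; custom_id lets you match results to source rows.
requests = [
    {
        "custom_id": f"review-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4.1-mini",
            "messages": [{"role": "user", "content": f"Classify review #{i}"}],
            "max_tokens": 20,
        },
    }
    for i in range(3)
]
batch_jsonl = "\n".join(json.dumps(r) for r in requests)
```

Every token in that file bills at half rate, and the cached-input discount still applies on top.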
Three personas, all running on GPT-4.1-mini (the equivalent default workhorse), with cached input enabled. The math mirrors our Anthropic API pricing breakdown so you can compare side by side.
100 DAU, 5 messages/day, 500 input + 200 output tokens. ~15K turns/month. Input: 7.5M tokens with ~70% cache hit rate, blended cost roughly $1.60. Output: 3M tokens at $1.60/MTok = $4.80. Total around $6–$7. Per-user cost lands near $0.07/month — chatbot economics on 4.1-mini are extremely friendly. On full GPT-4.1 the same workload runs about $30.
1K DAU, 2 generations/day, 4K input + 2K output. 60K generations/month. On GPT-4.1: input 240M blended ~$200, output 120M at $8/MTok = $960. Total roughly $1,150 before optimization — lower if you route easy generations to 4.1-mini. Code-gen quality usually demands 4.1 or better; using o1 for the same workload would 7–8× the bill due to reasoning tokens. Budget aggressively.
10K DAU, 1 generation/week, 2K input + 1K output. ~40K generations/month. On GPT-4.1: input 80M blended ~$67, output 40M at $8/MTok = $320 — roughly $390 base. In practice, content workflows are multi-turn (rewrites, alternate versions, tone adjustments), pushing real cost 5–10× higher. Use 4.1-mini for first drafts and 4.1 only for polish to keep this under $2K. Add DALL·E imagery and the bill grows accordingly.
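The persona math above follows one formula, so it is worth writing down once. A sketch with the 70% cache-hit assumption used throughout this section baked in as a default:

```python
def monthly_llm_cost(turns, in_tokens, out_tokens,
                     in_rate, cached_rate, out_rate, cache_hit=0.7):
    """Monthly bill in dollars from per-turn token counts and $/MTok rates."""
    input_mtok = turns * in_tokens / 1_000_000
    output_mtok = turns * out_tokens / 1_000_000
    blended_in = cache_hit * cached_rate + (1 - cache_hit) * in_rate
    return input_mtok * blended_in + output_mtok * out_rate

# Chatbot persona on GPT-4.1-mini: 15K turns/month, 500 in / 200 out per turn.
chatbot = monthly_llm_cost(15_000, 500, 200, 0.40, 0.10, 1.60)   # ≈ $6.23
```

Swap in your own turn counts and the rates from the table to rebuild any row of this section.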
Same framing as the Anthropic page. At $20/month, after Stripe fees of ~$0.90, hosting at ~$1, and a 50% gross margin target, you have roughly $8 of API budget per paying user per month.
What that buys you on GPT-4.1 (output at $8/MTok), ignoring the smaller input-side cost:

- ~1M output tokens per paying user per month
- ~5,000 chatbot replies at 200 output tokens each
- ~500 code generations at 2K output tokens each
- ~1,000 content drafts at 1K output tokens each
On 4.1-mini (output at $1.60/MTok), each line above gets 5× more headroom — chatbots become trivially profitable. On o1 (output at $60/MTok), $8 buys you only ~133K output tokens, or roughly 25 reasoning calls at 5K tokens each. Reasoning models are not a paid-tier’s default; they’re a premium-tier escape hatch.
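The headroom arithmetic, as a reusable helper:

```python
def output_tokens_for_budget(budget_usd, out_rate_per_mtok):
    """How many output tokens a per-user dollar budget buys at a given rate."""
    return budget_usd * 1_000_000 / out_rate_per_mtok

on_41 = output_tokens_for_budget(8, 8.00)    # 1,000,000 tokens on GPT-4.1
on_o1 = output_tokens_for_budget(8, 60.00)   # ≈ 133,333 tokens on o1
calls = on_o1 // 5_000                       # 26 reasoning calls at 5K tokens each
```

Remember the o-series figure overstates what you get in practice: hidden reasoning tokens eat into that 133K before any visible output does.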
OpenAI gates capacity through five usage tiers tied to a combination of cumulative spend and time since first payment. New developers start in Tier 1 with low requests-per-minute and tokens-per-minute caps. As you spend and time passes, you graduate to higher tiers with progressively larger limits.
The exact thresholds shift over time and are documented on openai.com/api/pricing and in the rate-limit docs. For solo founders, the practical implication: budget time at the start of a launch to climb tiers if you expect spiky traffic. Submit a tier-upgrade request through the dashboard if you have predictable demand.
Both providers ship excellent models at competitive prices. The choice comes down to four questions: which models fit your workloads, how the token rates compare at your volume, which discounts (caching, Batch) your architecture can actually exploit, and whose rate limits accommodate your traffic.
The pragmatic answer is “use both.” Wire your code through a thin abstraction so you can route workloads to whichever provider serves the use case best. Our best AI tools for solo SaaS founders roundup covers the broader landscape, and the Anthropic API pricing breakdown sits alongside this page for direct comparison.
Default to GPT-4.1-mini for general workloads, 4o-mini for cost-sensitive multimodal, and 4.1 when quality matters. Reserve o1 / o3-mini for genuinely hard reasoning and route to them via a confidence-threshold pattern, not by default. Turn on cached input by structuring prompts with stable content first, run async work through Batch for the 50% discount, and watch the o-series hidden-token math carefully.
The OpenAI bill is rarely one line; it’s four or five meters running in parallel. Modeling each independently — chat, embeddings, vision, DALL·E, audio — is the difference between a product that hits margin and one that doesn’t. The AI token cost calculator helps you build a per-user cost model, and our true cost of running an AI SaaS deep-dive covers the full unit-economics framework.