Research-based methodology. This guide draws on Anthropic’s vision API documentation, AWS Textract and Google Document AI evaluations, the Veryfi public benchmarks, QuickBooks and Xero developer docs, and our own builds with Claude. Where we have first-person experience we say so; otherwise we’re working from public sources.

Why a receipt scanner SaaS in 2026

Two things changed in 2024–2025 that make this category newly attractive for solo founders. First, vision-capable LLMs (Claude, GPT-4o, Gemini) closed the gap on receipt OCR accuracy that AWS Textract and Google Document AI used to dominate. Second, Expensify and Concur drifted hard upmarket: Expensify’s small business plan is $20/user/month with annoying minimums, and the freelance use case is left to a long tail of mediocre apps. The gap is real: a freelance designer collecting 30 receipts a month for tax season has no good $5/month option that auto-categorizes, exports clean QuickBooks CSVs, and works on the phone in their hand at the register.

This guide is for someone who wants to ship a paid receipt-scanner product in 3–5 weeks using Claude as their primary thinking partner and Claude vision as the extraction engine. We’ll spend most of our time on the parts that are load-bearing: the vision prompt that produces clean JSON, the categorization layer that learns from the user’s edits, and the export pipeline that doesn’t produce bookkeeping rework. If you want the broader build playbook first, our How to build a SaaS with Claude guide covers the general scaffolding workflow this one builds on top of.

Why receipts are harder than they look

The visible product is “snap a photo, get a tidy expense.” The actual product is a small AI pipeline that has to work on a crumpled thermal-paper receipt photographed under fluorescent lighting at a 30-degree angle.

OCR accuracy is the moat, not the UI

If your extraction returns “Strarbucks” or misreads $12.47 as $12.41, your user fixes it once, twice, and then churns. The single most important engineering decision is the vision prompt and its JSON output schema. Get this right and the rest of the product is conventional CRUD. Get it wrong and you have a categorization tool with bad data.

Currency detection matters more than you think

A US user expensing a London business trip has receipts in GBP. A digital nomad has receipts in five currencies in the same month. The vision model has to detect the currency from the receipt itself (symbol, country, language), and your DB has to store both the original currency and an FX-converted USD amount as of the receipt date.
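Here is a minimal sketch of the storage rule: keep the original amount and also store a converted USD amount. It assumes the FX provider’s rate means USD per one unit of the receipt currency. It also assumes you round the converted value to whole cents to match `numeric(12,2)`; check the rounding convention with your accountant. `convertToUsd` and `ReceiptMoney` are hypothetical names.

```typescript
// Hypothetical shape for the money columns on a receipts row.
interface ReceiptMoney {
  currency_code: string;    // ISO 4217 code detected on the receipt
  total_amount: number;     // original amount in major units (e.g. 12.47)
  fx_rate: number;          // assumed meaning: USD per 1 unit of currency_code
  total_amount_usd: number; // converted amount, rounded to cents
}

// Convert an extracted total to USD while keeping the original intact.
// Rounding goes through integer cents so the stored value has exactly
// two decimals.
function convertToUsd(
  totalAmount: number,
  currencyCode: string,
  fxRate: number,
): ReceiptMoney {
  if (!Number.isFinite(fxRate) || fxRate <= 0) {
    throw new Error(`Invalid FX rate for ${currencyCode}: ${fxRate}`);
  }
  const code = currencyCode.toUpperCase();
  const rate = code === "USD" ? 1 : fxRate; // USD receipts never convert
  const usdCents = Math.round(totalAmount * rate * 100);
  return {
    currency_code: code,
    total_amount: totalAmount,
    fx_rate: rate,
    total_amount_usd: usdCents / 100,
  };
}

// A £100.00 London receipt at an assumed rate of 1.25 USD per GBP.
console.log(convertToUsd(100, "GBP", 1.25)); // total_amount_usd: 125
```

Storing `fx_rate` next to both amounts lets you recompute or audit the conversion later without calling the FX API again.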

Line items are different from totals

For a personal-expense use case, the total is enough. For a business or tax use case, line items matter: a Costco run for $284 might be $200 of office supplies (deductible) and $84 of groceries (not). If your data model stores only one amount per receipt, you’ve excluded the entire small-business segment.
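To make the Costco example concrete, here is a minimal sketch of rolling one receipt up by category. It assumes a line item can carry its own category or inherit the receipt-level one, which matches a nullable `line_items.category_id`. Any remainder (tax, untagged items) goes to the receipt-level category, so the parts always add up to the total. Amounts are integer cents, and the names and category IDs are illustrative.

```typescript
interface LineItem {
  description: string;
  amountCents: number;
  categoryId: string | null; // null = inherit the receipt-level category
}

interface Receipt {
  totalCents: number;
  categoryId: string;        // receipt-level category
  lineItems: LineItem[];
}

// Sum one receipt's spend per category. Line items with their own
// category are split out; whatever is left (untagged items, tax,
// rounding) is attributed to the receipt-level category.
function totalsByCategory(receipt: Receipt): Map<string, number> {
  const totals = new Map<string, number>();
  const add = (categoryId: string, cents: number) =>
    totals.set(categoryId, (totals.get(categoryId) ?? 0) + cents);

  let assigned = 0;
  for (const item of receipt.lineItems) {
    if (item.categoryId !== null) {
      add(item.categoryId, item.amountCents);
      assigned += item.amountCents;
    }
  }
  const remainder = receipt.totalCents - assigned;
  if (remainder !== 0) add(receipt.categoryId, remainder);
  return totals;
}

// The $284 Costco run: $200 of office supplies, $84 of groceries.
const costco: Receipt = {
  totalCents: 28400,
  categoryId: "personal",
  lineItems: [
    { description: "Printer paper, toner", amountCents: 20000, categoryId: "office_supplies" },
    { description: "Groceries", amountCents: 8400, categoryId: null },
  ],
};
console.log(totalsByCategory(costco)); // office_supplies -> 20000 cents, personal -> 8400 cents
```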

Categorization has to learn from feedback, or it stays static

The first version of your category prediction will be a Claude prompt with examples. The second version has to learn: if a user repeatedly recategorizes “Adobe” from “Software” to “Subscriptions,” your system should remember that for that user. This is one of the few places where you genuinely need a small per-user training mechanism, even if it’s as simple as a vendor-to-category map.
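A vendor-to-category map only works if the same vendor always produces the same key. Below is a minimal sketch of one plausible normalization, plus the override lookup. The schema only says “lowercased + stripped,” so the exact rules here are our assumption. `normalizeVendor` and `lookupOverride` are hypothetical names. The ASCII-only character filter means non-Latin vendor names would need a different rule. Suffixes such as “Inc.” are kept, so “Adobe Inc.” and “Adobe” still produce different keys.

```typescript
// One plausible reading of "lowercased + stripped" (an assumption; tune it
// to your data): drop accents, lowercase, drop "#1234"-style store numbers,
// drop punctuation, collapse whitespace.
function normalizeVendor(raw: string): string {
  return raw
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // combining accent marks
    .toLowerCase()
    .replace(/#\s*\d+/g, " ")        // store numbers like "#1234"
    .replace(/[^a-z0-9 ]+/g, " ")    // punctuation and symbols (ASCII only)
    .replace(/\s+/g, " ")
    .trim();
}

// Per-workspace vendor_normalized -> category_id rules (vendor_overrides).
type VendorOverrides = Map<string, string>;

// Returns the user's remembered category, or null to fall through to the
// Claude categorization call.
function lookupOverride(overrides: VendorOverrides, vendorName: string): string | null {
  return overrides.get(normalizeVendor(vendorName)) ?? null;
}

const overrides: VendorOverrides = new Map([["adobe", "subscriptions"]]);
lookupOverride(overrides, "ADOBE");           // "subscriptions"
lookupOverride(overrides, "Starbucks #1234"); // null: ask Claude
```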

Step 1 — Data model with extracted-JSON schema

The schema for a receipt scanner is small but the JSON discipline is everything. You need workspaces, users, receipts (raw image + extracted JSON + reviewed status), line_items, categories, vendor_overrides, and exports.

Prompt 1 — Receipt data model with extracted JSON
I'm building a receipt scanner SaaS targeted at US freelancers and
1-3-person agencies. Receipts get photographed on phone, sent to Claude
vision for extraction, and reviewed by the user before being exportable
to QuickBooks/Xero.

Design the Postgres schema:

- workspaces: id, name, owner_id, default_currency, fiscal_year_start_month
- users: id, email, workspace_id, role
- categories: id, workspace_id, name, parent_id (optional, 1-level
  nesting), accounting_code (the QB or Xero category code), tax_deductible
  boolean, tax_pct (default 100, can be 50 for meals)
  Seed with default categories: Office Supplies, Software, Meals,
  Travel, Equipment, Phone, Internet, Education, Marketing, Other
- receipts: id, workspace_id, user_id, image_url (Supabase Storage),
  thumbnail_url, status (pending_extraction, extracted, extraction_failed,
  reviewed, exported, archived), extracted_at, reviewed_at, exported_at,
  vendor_name, vendor_normalized (lowercased + stripped),
  transaction_date, total_amount numeric(12,2), currency_code,
  total_amount_usd numeric(12,2), fx_rate numeric(12,6),
  payment_method (card, cash, check, other), card_last4,
  subtotal, tax_amount, tip_amount (all numeric(12,2)),
  raw_extraction jsonb (full Claude output, for audit),
  category_id (the receipt-level category, may differ from line items),
  notes text, exported_to text[]
- line_items: id, receipt_id, description, quantity numeric (receipts
  show fractional quantities like 1.5 lb), unit_price numeric(12,2),
  amount numeric(12,2), category_id
- vendor_overrides: id, workspace_id, vendor_normalized, category_id
  (the rule "this user always wants Adobe -> Subscriptions")
- exports: id, workspace_id, format (qb_csv, xero_csv, pdf, raw_json),
  receipt_ids uuid[], file_url, created_at, period_start, period_end

Indexes:
- (workspace_id, transaction_date DESC) on receipts -- the dashboard
- (workspace_id, vendor_normalized) on receipts -- vendor search
- (workspace_id, vendor_normalized) UNIQUE on vendor_overrides

Supabase RLS:
- Users can read/write only receipts in their workspace
- Image URLs in Storage are signed and scoped per workspace
- raw_extraction is restricted to workspace owners since it contains
  full vendor info for audit. RLS filters rows, not columns, so enforce
  this with column privileges or an owner-only side table

Output a single SQL file.

The non-obvious detail is raw_extraction jsonb. Always store the full Claude output verbatim. When a user disputes a charge, or when you want to test a new prompt against past receipts, you need the original extraction, not the reviewed-and-corrected version.

Step 2 — Claude vision extraction prompt

This is the engineering center of the product. The vision call needs to return strict JSON, never prose, with high accuracy on amounts and dates and reasonable accuracy on line items. The prompt design is the single biggest determinant of your customer’s extraction-error rate.

Prompt 2 — Claude vision receipt extraction
Write a Supabase Edge Function `extract_receipt(receipt_id)` that:

1. Loads the receipt's image_url and downloads the bytes
2. Sends it to Claude with the system prompt below
3. Validates the JSON output against a Zod schema
4. On schema failure, retries once with a stricter follow-up
5. On second failure, sets status='extraction_failed' and stores the
   error in raw_extraction with an error_text key
6. On success, populates the receipt fields and inserts line_items rows

System prompt for Claude (use claude-3-5-sonnet, max_tokens=2000,
temperature=0):

"""
You extract structured data from receipt photos. You output STRICT JSON
matching the schema below and nothing else. No prose. No markdown fences.

Schema:
{
  "vendor_name": string,         // exact name as shown on receipt
  "transaction_date": "YYYY-MM-DD" | null,
  "currency_code": "USD" | "EUR" | "GBP" | "CAD" | ... (ISO 4217),
  "subtotal": number | null,
  "tax_amount": number | null,
  "tip_amount": number | null,
  "total_amount": number,        // REQUIRED
  "payment_method": {
    "type": "card" | "cash" | "check" | "other" | null,
    "card_last4": string | null
  },
  "line_items": [
    {
      "description": string,
      "quantity": number | null,
      "unit_price": number | null,
      "amount": number
    }
  ],
  "confidence": {
    "total_amount": "high" | "medium" | "low",
    "vendor_name": "high" | "medium" | "low",
    "date": "high" | "medium" | "low"
  },
  "warnings": string[]   // e.g. ["Receipt is partially cut off",
                         //       "Tip line is handwritten"]
}

Rules:
- All currency amounts are NUMBERS, not strings, in major units (12.47
  not 1247)
- If you cannot read the total with high confidence, set total_amount
  to your best guess and confidence.total_amount to 'low' and add a
  warning
- For currency_code, look at the symbol, country indicators, and
  language. Default to USD only if completely ambiguous.
- For thermal-paper receipts where line items are ambiguous, return at
  least the total and an empty line_items array. Do NOT invent line items.
- For tipped receipts (restaurants), distinguish subtotal vs total
  carefully. The largest written number is usually the total.
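- Date format MUST be ISO. If only month/day are visible, use the year
  from today's date (given in the user message), or the previous year if
  that would put the date in the future. If completely illegible, return null.
"""

User message:
- The image as a base64-encoded image content block (source type
  "base64" with the matching media_type, e.g. image/jpeg)
- A short text: "Extract this receipt. Today's date is YYYY-MM-DD."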

Then in the handler:
- If currency_code is non-USD, call an FX rate API (we use exchangerate.host)
  for transaction_date and store fx_rate + total_amount_usd
- If a vendor_overrides row matches vendor_normalized, set category_id
  from the override. Otherwise set category_id to null and let the
  categorize step run next.
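Steps 2 through 5 of that prompt are where most of the reliability lives. Below is a minimal, dependency-free sketch of the validate-and-retry loop. It hand-rolls a few checks instead of using the Zod schema the prompt asks for, so treat it as a stand-in. The model call is passed in as a function (`CallModel`, a hypothetical name), which lets you test the retry logic without network access. The message shape follows Anthropic’s Messages API: an `image` block with a base64 `source`, then a text block. The retry wording and the `today` parameter are our assumptions.

```typescript
// Minimal types for the parts of Anthropic's Messages API used here.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; source: { type: "base64"; media_type: string; data: string } };
type Message = { role: "user" | "assistant"; content: string | ContentBlock[] };

// Injected model call (hypothetical): in the Edge Function this would POST
// { model, max_tokens, temperature, system, messages } to
// https://api.anthropic.com/v1/messages and return the reply text.
type CallModel = (messages: Message[]) => Promise<string>;

interface Extraction {
  vendor_name: string;
  transaction_date: string | null;
  currency_code: string;
  total_amount: number;
  line_items: { description: string; amount: number }[];
}

type Validation = { ok: true; data: Extraction } | { ok: false; error: string };

// Dependency-free stand-in for the Zod schema: checks only the fields the
// handler cannot work without and reports the first problem found.
function validateExtraction(text: string): Validation {
  let raw: unknown;
  try {
    raw = JSON.parse(text);
  } catch {
    return { ok: false, error: "reply was not valid JSON" };
  }
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { ok: false, error: "reply was not a JSON object" };
  }
  const r = raw as Record<string, unknown>;
  const currency = r.currency_code;
  const date = r.transaction_date;
  const items = r.line_items;
  if (typeof r.vendor_name !== "string") return { ok: false, error: "vendor_name must be a string" };
  if (typeof r.total_amount !== "number") return { ok: false, error: "total_amount must be a number" };
  if (typeof currency !== "string" || !/^[A-Z]{3}$/.test(currency)) {
    return { ok: false, error: "currency_code must be a 3-letter ISO 4217 code" };
  }
  if (date !== null && (typeof date !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(date))) {
    return { ok: false, error: "transaction_date must be YYYY-MM-DD or null" };
  }
  if (!Array.isArray(items) || items.some((li) => typeof li?.amount !== "number")) {
    return { ok: false, error: "line_items must be an array of items with numeric amounts" };
  }
  return { ok: true, data: r as unknown as Extraction };
}

// Call, validate, retry once with a stricter follow-up, then give up with
// an error_text the caller stores in raw_extraction before setting
// status = 'extraction_failed'.
async function extractWithRetry(
  callModel: CallModel,
  imageBase64: string,
  mediaType: string, // e.g. "image/jpeg"
  today: string,     // "YYYY-MM-DD", lets month/day-only dates resolve
): Promise<{ ok: true; data: Extraction } | { ok: false; error_text: string }> {
  const messages: Message[] = [
    {
      role: "user",
      content: [
        { type: "image", source: { type: "base64", media_type: mediaType, data: imageBase64 } },
        { type: "text", text: `Extract this receipt. Today's date is ${today}.` },
      ],
    },
  ];
  const first = await callModel(messages);
  const firstCheck = validateExtraction(first);
  if (firstCheck.ok) return firstCheck;

  const second = await callModel([
    ...messages,
    { role: "assistant", content: first },
    {
      role: "user",
      content: `That reply did not match the schema (${firstCheck.error}). ` +
        "Reply again with ONLY the JSON object: no prose, no markdown fences.",
    },
  ]);
  const secondCheck = validateExtraction(second);
  return secondCheck.ok ? secondCheck : { ok: false, error_text: secondCheck.error };
}
```

The follow-up quotes the specific validation error back to the model. That is usually more effective than repeating the original instruction, and it also leaves a readable reason in `error_text` when both attempts fail.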

Two engineering details Claude won’t volunteer: store every extraction’s token count and latency for cost analysis, and rate-limit per-workspace at the application layer (Anthropic’s rate limits are global to your account, so one heavy customer shouldn’t throttle the rest).
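A minimal sketch of both details: a per-workspace token bucket and a timing wrapper for the cost log. The bucket lives in memory, and Edge Function instances do not share memory. In production you would keep the same counters in Postgres or Redis, so read this as an illustration of the algorithm, not a deployable limiter. `WorkspaceRateLimiter`, `ExtractionUsageRow`, and `timed` are hypothetical names. The token counts come from the `usage.input_tokens` and `usage.output_tokens` fields on each Messages API response.

```typescript
// Per-workspace token bucket: each workspace gets `capacity` extractions of
// burst, refilled at `refillPerSec`. A heavy workspace can only drain its
// own bucket, never anyone else's.
class WorkspaceRateLimiter {
  private readonly buckets = new Map<string, { tokens: number; updatedMs: number }>();
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSec: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now; // injectable clock so the refill logic is testable
  }

  tryAcquire(workspaceId: string): boolean {
    const t = this.now();
    const bucket = this.buckets.get(workspaceId) ?? { tokens: this.capacity, updatedMs: t };
    const elapsedSec = (t - bucket.updatedMs) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec);
    bucket.updatedMs = t;
    this.buckets.set(workspaceId, bucket);
    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}

// Cost-analysis row (hypothetical table). Token counts come from the API
// response's usage object; latency is measured around the call.
interface ExtractionUsageRow {
  workspace_id: string;
  receipt_id: string;
  input_tokens: number;
  output_tokens: number;
  latency_ms: number;
}

async function timed<T>(fn: () => Promise<T>, now: () => number = Date.now): Promise<{ value: T; latencyMs: number }> {
  const start = now();
  const value = await fn();
  return { value, latencyMs: now() - start };
}
```

When `tryAcquire` returns false, queue the receipt as `pending_extraction` rather than rejecting the upload. The user still gets a result, just a few seconds later.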

Step 3 — Line-item categorization

Categorization is the second AI call. It runs after extraction, takes the vendor + line items + workspace categories list, and returns a category assignment. Crucially, it considers the user’s history.

Prompt 3 — Line-item categorization with user history
Write a Supabase Edge Function `categorize_receipt(receipt_id)` that:

1. Loads the receipt + its line_items + the workspace's categories
2. Loads the workspace's last 50 categorized receipts (vendor_normalized
   + category) as historical context
3. Calls Claude with this user message:

"""
Categorize the following receipt for tax/accounting purposes.

AVAILABLE CATEGORIES (you must pick from this exact list):
- Office Supplies (id: cat_001)
- Software (id: cat_002)
- Meals (id: cat_003)
- Travel (id: cat_004)
- Equipment (id: cat_005)
- Phone (id: cat_006)
- Internet (id: cat_007)
- Education (id: cat_008)
- Marketing (id: cat_009)
- Other (id: cat_010)

USER HISTORY (use these as strong signal for similar vendors):
- Vendor "adobe" -> Software (cat_002)
- Vendor "starbucks" -> Meals (cat_003)
- Vendor "amazon" -> Office Supplies (cat_001)
- Vendor "uber" -> Travel (cat_004)
- ...

RECEIPT:
- Vendor: {vendor_name}
- Total: {total_amount} {currency_code}
- Line items:
  - {description} -- {amount}
  ...

Output STRICT JSON:
{
  "receipt_category_id": "cat_NNN",
  "line_item_categories": [
    { "line_item_index": 0, "category_id": "cat_NNN" },
    ...
  ],
  "rationale": "one short sentence explaining the choice",
  "confidence": "high" | "medium" | "low"
}

Rules:
- If line items are diverse (e.g., Costco run with office supplies and
  groceries), categorize line items individually
- If line items are all consistent (e.g., a SaaS subscription), set
  the receipt category and leave line_item_categories empty
- Default to Other only if truly ambiguous
- If a user history rule matches the vendor exactly, USE IT
"""

4. Apply the result: set receipt.category_id and update line_items.category_id
5. If receipt.category_id is set and vendor_normalized has 3+ prior
   receipts in the same category, INSERT into vendor_overrides with
   ON CONFLICT DO NOTHING (never overwrite an existing rule) so the
   next receipt skips Claude and uses the rule

Set temperature=0 for deterministic categorization. Cache by
(vendor_normalized, line_item_descriptions hash) to avoid re-paying
Claude for identical receipts.
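Two guards belong between the Claude call and the database write, sketched here with hypothetical names. The first is a cache key built from `(vendor_normalized, hash of line-item descriptions)`, as the prompt specifies. The second is a check that every returned ID comes from the allowed list, because “you must pick from this exact list” is an instruction, not a guarantee. Trimming and lowercasing descriptions before hashing is our assumption. Falling back to `cat_010` (Other) mirrors the prompt’s example list.

```typescript
import { createHash } from "node:crypto";

interface CategorizationResult {
  receipt_category_id: string;
  line_item_categories: { line_item_index: number; category_id: string }[];
  rationale: string;
  confidence: "high" | "medium" | "low";
}

// Cache key for (vendor_normalized, line-item descriptions hash).
// Descriptions are trimmed and lowercased so trivial OCR spacing or casing
// differences still hit the cache; order is kept because the model's
// answer refers to line items by index.
function categorizationCacheKey(vendorNormalized: string, descriptions: string[]): string {
  const canonical = descriptions.map((d) => d.trim().toLowerCase()).join("\n");
  const digest = createHash("sha256").update(canonical, "utf8").digest("hex");
  return `${vendorNormalized}:${digest}`;
}

// Enforce the allowed-category list before writing anything: unknown IDs
// fall back to Other, out-of-range line-item indexes are dropped, and a
// fallback on the receipt category forces confidence to "low".
function sanitizeCategorization(
  result: CategorizationResult,
  allowedIds: Set<string>,
  lineItemCount: number,
  otherId = "cat_010", // "Other" in the prompt's example list
): CategorizationResult {
  const safe = (id: string) => (allowedIds.has(id) ? id : otherId);
  const receiptIdOk = allowedIds.has(result.receipt_category_id);
  return {
    ...result,
    receipt_category_id: safe(result.receipt_category_id),
    confidence: receiptIdOk ? result.confidence : "low",
    line_item_categories: result.line_item_categories
      .filter((c) => Number.isInteger(c.line_item_index) && c.line_item_index >= 0 && c.line_item_index < lineItemCount)
      .map((c) => ({ line_item_index: c.line_item_index, category_id: safe(c.category_id) })),
  };
}
```

Forcing low confidence on a fallback lets the review screen highlight those receipts instead of silently filing them under Other.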

The vendor_overrides shortcut is what makes the product feel smart over time. Month one, every Adobe charge goes through Claude. Month four, you’ve learned the user’s preference and skip the API call entirely. The user perceives this as “the app finally understands me.”
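The promotion rule from step 5 of the prompt fits in one function. This sketch counts receipts per vendor and category and returns the rules to insert. It adds one guard the prompt does not state, so treat it as our assumption: a vendor with mixed history, such as an Amazon account used for both supplies and personal items, is never promoted and keeps going to Claude. The function and field names are hypothetical.

```typescript
interface CategorizedReceipt {
  vendorNormalized: string;
  categoryId: string;
}

// Promote vendor -> category rules once a vendor has `threshold` receipts
// (3, per the prompt) in a single category and no receipts elsewhere.
function overridesToPromote(
  history: CategorizedReceipt[],
  existing: Map<string, string>, // current vendor_overrides for the workspace
  threshold = 3,
): Map<string, string> {
  const counts = new Map<string, Map<string, number>>();
  history.forEach(({ vendorNormalized, categoryId }) => {
    const perCategory = counts.get(vendorNormalized) ?? new Map<string, number>();
    perCategory.set(categoryId, (perCategory.get(categoryId) ?? 0) + 1);
    counts.set(vendorNormalized, perCategory);
  });

  const promote = new Map<string, string>();
  counts.forEach((perCategory, vendor) => {
    if (existing.has(vendor)) return;   // never overwrite an existing rule
    if (perCategory.size !== 1) return; // mixed history: keep asking Claude
    perCategory.forEach((count, categoryId) => {
      if (count >= threshold) promote.set(vendor, categoryId);
    });
  });
  return promote;
}
```

With a monthly Adobe charge, the first three months go through Claude and the fourth hits the rule, which matches the month-one to month-four story above.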

Step 4 — Monthly expense report PDF

Tax season is when your customer feels the value of your product the most. A clean monthly PDF, broken down by category with thumbnails of every receipt, is the artifact they hand to their accountant. Build it well and they’ll renew without thinking.

Prompt 4 — Monthly expense report PDF generation
Write a Next.js route /api/reports/generate that produces a monthly
expense report PDF.

Input: { workspace_id, period_start (YYYY-MM-01), period_end
(YYYY-MM-DD), include_receipts: boolean }

Steps:
1. Query receipts in workspace where transaction_date in [start, end]
   AND status in ('reviewed', 'exported')
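2. Group by category and sum total_amount_usd per category (for split
   receipts, use each line item's category, converting its amount with the
   receipt's fx_rate)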
3. Generate the PDF using @react-pdf/renderer or Puppeteer (preferred:
   @react-pdf/renderer for Vercel compat)

PDF layout:
- Cover page:
  - Workspace name, period (e.g., "March 2026"), date generated
  - Total expenses, total deductible, total non-deductible
  - Receipt count
- Summary page:
  - Table: Category | Count | Total USD | Deductible USD
  - Bar chart of category totals (use a server-rendered SVG, not chart.js)
- Per-category pages:
  - Category name + total at top
  - Table of every receipt: Date, Vendor, Amount, Category notes
- Optional appendix (if include_receipts=true):
  - One page per receipt with the thumbnail image + extracted data
  - Useful for IRS audit-trail purposes

Special handling:
- Multi-currency: show original amount AND USD-converted amount
- Tax category (Meals, default 50% deductible) shown explicitly with
  the deductible vs full split
- Mileage placeholder if workspace has mileage tracking enabled (skip
  for v1)

Save the generated PDF to Supabase Storage at
/reports/[workspace_id]/[period].pdf, insert a row into the exports
table, and return a signed URL valid for 24 hours.

Email the user a notification with Resend: "Your March 2026 expense
report is ready" + download link.
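Most of the report’s numbers come from one aggregation, sketched here under stated assumptions. Each row is one receipt, or one line item when a receipt is split across categories, already converted to USD cents. The deductible share uses the `tax_deductible` and `tax_pct` columns from the categories table and is rounded per row, which is our assumption. `monthPeriod` builds `period_start`, `period_end`, and the cover-page label. All names are hypothetical.

```typescript
interface ReportRow {
  categoryName: string;
  amountUsdCents: number;
  taxDeductible: boolean; // categories.tax_deductible
  taxPct: number;         // categories.tax_pct: 100 by default, 50 for Meals
}

interface CategorySummary {
  categoryName: string;
  count: number;          // rows in the category (receipts or split line items)
  totalCents: number;
  deductibleCents: number;
}

// Summary-page table: Category | Count | Total USD | Deductible USD.
// The deductible share is rounded to the cent per row before summing, so
// per-category pages and the summary agree.
function summarizeByCategory(rows: ReportRow[]): CategorySummary[] {
  const byCategory = new Map<string, CategorySummary>();
  rows.forEach((row) => {
    const s = byCategory.get(row.categoryName) ??
      { categoryName: row.categoryName, count: 0, totalCents: 0, deductibleCents: 0 };
    s.count += 1;
    s.totalCents += row.amountUsdCents;
    if (row.taxDeductible) s.deductibleCents += Math.round((row.amountUsdCents * row.taxPct) / 100);
    byCategory.set(row.categoryName, s);
  });
  return Array.from(byCategory.values()).sort((a, b) => b.totalCents - a.totalCents);
}

const MONTHS = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

// Period bounds and label for a calendar month (month is 1-12).
// Day 0 of the following month is the last day of this one.
function monthPeriod(year: number, month: number) {
  const lastDay = new Date(Date.UTC(year, month, 0)).getUTCDate();
  const mm = String(month).padStart(2, "0");
  return {
    periodStart: `${year}-${mm}-01`,
    periodEnd: `${year}-${mm}-${String(lastDay).padStart(2, "0")}`,
    label: `${MONTHS[month - 1]} ${year}`,
  };
}

const formatUsd = (cents: number): string => `$${(cents / 100).toFixed(2)}`;
```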

The IRS audit-trail appendix is the feature that converts hesitant trial users. Most receipt apps store the photo and call it done. A PDF that includes the photo with extracted data side-by-side is what an accountant actually wants.

Step 5 — Bank export and reconciliation

The accounting export is where most receipt apps quietly fail. They produce a CSV that QuickBooks rejects, or that imports but creates duplicate entries because the user already has bank-feed transactions for the same charges.

Prompt 5 — QuickBooks CSV export with bank reconciliation
Write a Next.js route /api/exports/quickbooks that produces a
QuickBooks-compatible CSV from selected receipts.

Input: { workspace_id, receipt_ids: uuid[], match_bank_transactions:
boolean, bank_csv_file?: File }

Output: a CSV file with these EXACT columns (QuickBooks Online
3-column expense import format):
Date,Description,Amount

Plus optional extended format if user has selected QB Desktop:
Date,Vendor,Account,Amount,Memo,Class

Steps:

1. Validate all receipt_ids belong to workspace
2. For each receipt:
   - Date in MM/DD/YYYY format
   - Description = vendor_name + " - " + (notes or category name)
   - Amount = total_amount_usd as positive number with 2 decimals
   - Vendor = vendor_name (Desktop only)
   - Account = the category's accounting_code (Desktop only)
   - Memo = first 100 chars of line_items concatenated (Desktop only)

3. If match_bank_transactions=true and a bank CSV is provided:
   - Parse the bank CSV (auto-detect Chase, Bank of America, AmEx,
     generic 3-col formats)
   - For each receipt, search for a matching bank transaction within
     +/- 3 days and amount within $0.01
   - Leave matched receipts out of the import CSV (the bank feed already
     imported those charges) and report them with their bank transaction
     IDs instead, so the QB import doesn't duplicate them
   - Return a summary of matched vs unmatched

4. Save the CSV to Storage and return signed URL

5. Update receipts.exported_to to include 'quickbooks' and set
   exported_at = now()

Important: QuickBooks rejects CSVs with BOM-marked UTF-8 in some
imports. Write the file as plain UTF-8 without BOM. Test with QB Online
sandbox before going live.

Add Xero variant at /api/exports/xero with their slightly different
format (Date,Amount,Payee,Description,Reference,Bank Account).
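The fiddly parts of the QuickBooks Online 3-column file are date format, quoting, amount formatting, and the missing BOM. This sketch covers them with hypothetical names. It uses RFC 4180 quoting and CRLF line endings. Those are standard CSV conventions, not something we have confirmed QuickBooks requires, so verify against the QB sandbox as the prompt says.

```typescript
interface ExportReceipt {
  transactionDate: string; // stored as ISO "YYYY-MM-DD"
  vendorName: string;
  notes: string | null;
  categoryName: string;
  totalAmountUsd: number;
}

// "2026-03-04" -> "03/04/2026" by splitting the string; going through Date
// and local getters can shift a date-only value by a day west of UTC.
function toUsDate(iso: string): string {
  const [year, month, day] = iso.split("-");
  return `${month}/${day}/${year}`;
}

// RFC 4180 quoting: quote fields containing a comma, quote, or line break,
// and double any embedded quotes.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// QuickBooks Online 3-column layout from the prompt: Date,Description,Amount.
function toQuickBooksCsv(receipts: ExportReceipt[]): string {
  const lines = ["Date,Description,Amount"];
  for (const r of receipts) {
    const description = `${r.vendorName} - ${r.notes || r.categoryName}`;
    const amount = Math.abs(r.totalAmountUsd).toFixed(2); // positive, 2 decimals
    lines.push([toUsDate(r.transactionDate), csvField(description), amount].join(","));
  }
  return lines.join("\r\n") + "\r\n";
}

// TextEncoder always produces UTF-8 without a byte-order mark, so these
// bytes can be uploaded to Storage as-is.
function toCsvBytes(csv: string): Uint8Array {
  return new TextEncoder().encode(csv);
}
```

Vendor names like “Acme, Inc.” are the common way a hand-built CSV breaks. Quoting every field through `csvField` prevents that.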

Bank reconciliation is the feature accountants actually pay for. Without it, the exported receipts duplicate the bank-feed entries already in QuickBooks and the user manually deletes half of them. With it, the user trusts your export and uses it monthly. For broader payments architecture, our invoicing SaaS guide covers the inverse problem (issuing receipts) using similar patterns.
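The matching rule from the export prompt (±3 days, amounts within $0.01) fits in a short function. This sketch assumes the bank CSV has already been parsed into `{ id, date, amountCents }` records; format auto-detection is out of scope. It compares absolute amounts because banks disagree on whether a charge is negative. Matching is greedy: each receipt takes the closest-dated unused transaction, and each bank transaction is used at most once. Foreign-currency receipts can miss the $0.01 tolerance, because the card network’s rate differs from the FX API’s. The names are hypothetical.

```typescript
interface BankTxn { id: string; date: string; amountCents: number }        // date "YYYY-MM-DD"
interface ReceiptToMatch { id: string; date: string; amountCents: number }

const DAY_MS = 86400000;

// Whole days since the epoch for a date-only ISO string (UTC, no TZ drift).
function dayNumber(iso: string): number {
  const [y, m, d] = iso.split("-").map(Number);
  return Date.UTC(y, m - 1, d) / DAY_MS;
}

function matchReceipts(
  receipts: ReceiptToMatch[],
  txns: BankTxn[],
  maxDays = 3,
  toleranceCents = 1,
): { matched: Map<string, string>; unmatched: string[] } {
  const used = new Set<string>();
  const matched = new Map<string, string>(); // receipt id -> bank txn id
  const unmatched: string[] = [];
  for (const r of receipts) {
    let best: BankTxn | null = null;
    let bestGap = Infinity;
    for (const t of txns) {
      if (used.has(t.id)) continue;
      const gap = Math.abs(dayNumber(t.date) - dayNumber(r.date));
      const amountOk = Math.abs(Math.abs(t.amountCents) - Math.abs(r.amountCents)) <= toleranceCents;
      if (gap <= maxDays && amountOk && gap < bestGap) {
        best = t;
        bestGap = gap;
      }
    }
    if (best) {
      used.add(best.id);
      matched.set(r.id, best.id);
    } else {
      unmatched.push(r.id);
    }
  }
  return { matched, unmatched };
}
```

Greedy matching can pair an early receipt with a transaction that would have suited a later one better. For a freelancer’s monthly volume this rarely matters, but sorting receipts by date before matching keeps results predictable.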

Pricing and monetization

Per-receipt or per-month tiered pricing both work. Choose based on the niche:

  • Freelancer tier — $9/mo for 50 receipts/month, single user, QuickBooks export, monthly PDF report.
  • Business tier — $19/mo for 200 receipts/month, up to 3 users, both QuickBooks and Xero exports, mileage tracking, custom categories.
  • Pro tier — $49/mo for unlimited receipts, unlimited users, multi-currency reporting, audit-ready PDF appendix, API access.

Avoid free plans with even 5 receipts/month. The Claude vision call costs you real money per extraction (about $0.01–$0.03 per receipt at current pricing), and free users churn before they hit anything that monetizes. A 14-day full-feature trial converts dramatically better. Storage of receipt images is essentially free on Supabase Storage at this scale — storage cost is not what you optimize for. For your own SaaS database choice, our Supabase vs Firebase comparison covers this exact stack tradeoff.
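The worst-case arithmetic shows why quotas and trial design matter. This sketch takes the top of the quoted $0.01–$0.03 range and assumes every user exhausts their monthly quota. It ignores categorization calls, retries, and payment fees, so real costs run higher. `worstCaseVisionCost` is an illustrative helper, not a billing API. The unlimited Pro tier has no cap to compute against.

```typescript
interface Tier { name: string; priceCents: number; receiptsPerMonth: number }

// Worst-case extraction spend for a capped tier, in integer cents.
function worstCaseVisionCost(tier: Tier, costPerReceiptCents = 3) {
  const costCents = tier.receiptsPerMonth * costPerReceiptCents;
  return { tier: tier.name, costCents, shareOfPrice: costCents / tier.priceCents };
}

const tiers: Tier[] = [
  { name: "Freelancer", priceCents: 900, receiptsPerMonth: 50 },
  { name: "Business", priceCents: 1900, receiptsPerMonth: 200 },
];
for (const t of tiers) {
  const c = worstCaseVisionCost(t);
  console.log(`${c.tier}: $${(c.costCents / 100).toFixed(2)} (${(c.shareOfPrice * 100).toFixed(0)}% of price)`);
}
// Freelancer: $1.50 (17% of price)
// Business: $6.00 (32% of price)
```

At the cap, the Business tier spends roughly a third of its price on extraction alone. That is a strong argument for hard per-tier caps and for skipping the categorization call via vendor_overrides wherever you can.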

Where solo founders win against Expensify

You will not out-feature Expensify on enterprise concerns — multi-policy, approval workflows, GL coding, corporate cards. You can’t out-engineer them on team-of-50 use cases. Solo founders win in three places:

  • Niche tier pricing. The $9/mo solo freelancer is invisible to Expensify and its $20/user/month minimum. Pricing yourself for the freelancer or 1-3-person agency is real differentiation when the incumbent doesn’t want them.
  • AI extraction as the default, not an upsell. Expensify’s SmartScan is reasonable but slower than modern vision LLMs. Claude vision in 2026 extracts a receipt in under 2 seconds with comparable accuracy. Make this the always-on default, not a paid feature.
  • One deep accounting integration. Pick QuickBooks Online and make the export better than Expensify’s — with bank-transaction reconciliation, with category-code mapping that uses the user’s actual chart of accounts (not generic categories). Most competitors treat exports as a checkbox; the niche tool treats it as the product.

Each of these is a real wedge into a freelancer/SMB market with thousands of potential customers. Pick the niche, talk to twenty potential users, and ship the “snap photo, get clean PDF” loop end-to-end before you build the dashboard.

Receipt scanner SaaS, in one paragraph
Vision prompt is the moat. Categorization learns from edits. QuickBooks export reconciles with the bank feed.

A receipt scanner SaaS that nails the Claude vision extraction with strict JSON output, learns vendor categories from user edits, and produces accountant-ready exports already beats most of what freelancers currently use. Pick the niche, ship the camera-to-CSV loop first.
