Research-based overview. Sample-size math drawn from Statistics for Experimenters (Box, Hunter & Hunter) and Evan Miller's public sample-size calculator. How we research.

Definition
A/B testing is a controlled experiment in which two (or more) variants of a page, feature, email, or pricing element are randomly served to your users, with traffic split as evenly as possible between them. After enough exposures and outcomes are collected, you compare the variants on a single metric and decide which one was better.

The simplest way to picture it: imagine a fork in the road. Half the visitors who reach your site take the left fork (variant A) and half take the right fork (variant B). At the end of each fork there is a single yes/no event you care about — signed up, paid, clicked. You count how many people in each group reached the event. The one with the higher rate — assuming the difference is large enough to not be a coincidence — wins.
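To make the fork concrete in code: here is a minimal sketch of the assignment step, assuming each visitor carries a stable identifier such as a cookie or user ID (the experiment name is invented for illustration). Hashing the ID instead of calling a random number generator keeps a returning visitor on the same fork every visit.

```python
import hashlib

def assign_variant(visitor_id: str, experiment: str = "pricing-page-test") -> str:
    """Deterministically split visitors ~50/50 between variants A and B."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100   # stable bucket in [0, 100)
    return "A" if bucket < 50 else "B"

print(assign_variant("visitor-42"))  # the same visitor always takes the same fork
```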

That coincidence question is where the entire intellectual content of A/B testing lives. The randomness in who lands in which group, plus the randomness in human behavior, means that any difference you see between two variants might be real, or it might just be noise. Statistics is the discipline of telling the two apart with a stated error tolerance.
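Concretely, the standard check for a yes/no metric is a two-proportion z-test. A minimal sketch, with made-up counts, showing why the same 20% lift can be noise at one scale and credible at another:

```python
from math import erf, sqrt

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)   # shared rate if there is no real difference
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # 2 * P(Z > |z|)

# A 20% relative lift (3.0% -> 3.6%) observed on 1,000 visitors per variant:
print(round(two_proportion_p_value(30, 1_000, 36, 1_000), 2))      # ~0.45: indistinguishable from noise
# The same lift observed on 10,000 visitors per variant:
print(round(two_proportion_p_value(300, 10_000, 360, 10_000), 3))  # ~0.018: now credible
```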

What you actually need to run an A/B test

The minimum viable A/B test — not the textbook version, but the smallest honest version — requires four things specified before you start.

  1. A single metric. One number you care about, with no ambiguity. “Signup conversion rate measured as Stripe checkouts created divided by unique visitors to the pricing page.” Not “engagement.” Not “quality of signups.”
  2. A sample size estimate. Calculated up front, given your baseline conversion rate, the minimum effect you would care about detecting, and the statistical power you want (usually 80%). Evan Miller has the canonical free calculator at evanmiller.org/ab-testing/sample-size.html.
  3. A duration. Calculated by dividing your total sample size by your traffic. If your test needs 8,000 visitors per variant and your site gets 500 visitors a week, each variant sees only about 250 a week, so the test takes 32 weeks, not 2. (The sketch after this list turns both calculations into code.)
  4. A stopping rule. “I will run the test until either we hit our sample size or 4 weeks pass, whichever is later.” Not “I'll peek at it daily and stop when it looks significant,” which is the most common reason A/B tests in SaaS produce false positives.
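
For items 2 and 3, the arithmetic is mechanical enough to script. A minimal sketch using the textbook two-proportion sample-size approximation, which lands within a handful of visitors of Evan Miller's calculator for inputs like these; the traffic figure is just an example:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH variant to detect the given relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 at 95% confidence
    z_beta = NormalDist().inv_cdf(power)            # 0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

n = sample_size_per_variant(baseline=0.03, relative_lift=0.20)  # ~13,911
weekly_traffic = 500                          # total visitors/week, split across both variants
print(ceil(2 * n / weekly_traffic), "weeks")  # both variants fill in parallel -> 56 weeks
```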

Skip any of those four and you are not running an A/B test. You are running an extremely expensive coin flip and convincing yourself it told you something.

Why A/B testing is mostly USELESS for solo founders at low volume

This is the section the marketing-blog version of this article will not write. The honest answer is that for almost every solo SaaS at under $10k MRR, formal A/B testing is a waste of time. The math says so.

Suppose your landing page converts at 3% (the industry average for SaaS landing pages, per Unbounce's conversion benchmark report). You want to detect a 20% relative lift, meaning a new variant that converts at 3.6% instead of 3%. Plug that into Evan Miller's calculator at 80% power and 95% confidence and you need:

Baseline: 3.0%
Minimum detectable rate: 3.6% (a 20% relative lift)
Power: 80%
Confidence: 95%
→ Required sample: ~13,900 visitors per variant
→ Total visitors: ~27,800

If your site gets 200 visitors a week, that is 139 weeks, well over two and a half years, to run a single test that detects a substantial improvement. If you wanted to detect a 5% lift instead of 20%, the sample size grows to roughly 208,000 per variant, which at your traffic is more years than your business has runway for.

The unfortunate truth is that small sites should optimize beliefs, not test them. You are working with too little signal to distinguish real changes from noise. The right move is to make a defensible bet, ship it confidently, and only run a formal test when the volume justifies the experimental overhead.

When solo founders SHOULD A/B test

There are real situations where the math works out. They share one feature: high volume on the surface being tested.

| Surface | Why testing works here | Threshold (rough) |
| --- | --- | --- |
| Paid acquisition landing pages | Traffic is bought, controlled, and high-volume. Per-visitor cost is real, so a 10% lift is real money. | $2k+/mo ad spend |
| Pricing pages with significant traffic | Pricing changes are high-leverage; even small lifts move ARR meaningfully. | 10k+ pricing-page visits/month |
| Signup form variants | Bottom-of-funnel; small percentage changes matter; traffic is concentrated. | 5k+ signup-page visits/month |
| Onboarding flows | Affects every signup; even small lifts compound across all future users. See our onboarding playbook. | 200+ signups/month |
| Pricing experiments themselves | Real revenue impact justifies the design overhead. Covered in detail in our pricing experiments playbook. | 500+ trials/month |

Even at these thresholds, you should resist running more than one test at a time on the same surface. Running three tests in parallel sounds productive but creates interaction effects you cannot untangle.

Tools that work at solo-founder scale

The tooling decision splits between tools light enough to deploy on a Tuesday and enterprise platforms that assume a growth team. For solo founders, the light end of the spectrum is everything you need.

Reasonable for solo founders

Overkill for solo founders

If you are still picking analytics in general, our SaaS metrics that matter guide covers which numbers a solo founder should track first — and which to ignore until later.

What to test instead at low volume

If formal A/B testing does not work at your scale, what should you do? The honest answer: place bets. Make changes, ship them confidently, and judge them on directional signal rather than statistical confidence. Three tactics work especially well at low volume.

The full landing page rewrite

Instead of testing one headline against another, replace the entire page. New headline, new copy structure, new visuals, new social proof. The combined effect is large enough to be visible even with limited traffic. You will not know which element drove the lift, but you do not need to — you need to know whether the new page is better, and at low volume that is the only question you can answer.
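
At low volume the readout on a bet like this is deliberately crude: a few weeks of before/after conversion, judged on direction and size rather than a p-value. A minimal sketch, with invented weekly numbers:

```python
def conversion_rate(visitors: list[int], signups: list[int]) -> float:
    return sum(signups) / sum(visitors)

# Four weeks before the rewrite vs. four weeks after (figures invented for illustration)
before = conversion_rate(visitors=[210, 190, 205, 195], signups=[6, 5, 7, 5])
after = conversion_rate(visitors=[200, 215, 190, 205], signups=[9, 8, 10, 9])

print(f"before: {before:.1%}  after: {after:.1%}")    # 2.9% -> 4.4%
print(f"relative change: {after / before - 1:+.0%}")  # +55%: big and consistent enough to keep
```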

The price-tier reduction

Instead of A/B testing pricing, just change it and watch what happens. Trial conversion, ARPU, and churn all respond to pricing within a week or two. The signal is fast and unambiguous. Pricing changes are also reversible. The full framework lives in our pricing playbook.

The headline change as a bet

Read your existing landing page out loud to a friend. Ask them what the product does in one sentence. If they can't answer, your headline is wrong, and no amount of A/B testing will save it. Just rewrite it. Ship it. Move on. Treat it as a bet, not an experiment.

The takeaway

A/B testing is a powerful instrument when you have enough volume to use it honestly. For most solo SaaS founders pre-$10k MRR, that volume does not exist, and pretending otherwise is how founders convince themselves a 47-visitor experiment proved something. Optimize beliefs at low volume. Reserve formal experimentation for surfaces where the math works. And when you do run a test, decide your sample size, duration, and stopping rule before you look at the dashboard — not after.
