An experiment-design walkthrough for solo founders: cohort-substitution, what to test, and how to roll back when the data goes sideways.
How this guide works. This is a methodology page that argues for one specific experiment design — cohort-substitution — and walks through how to run it cleanly. It is not an encyclopedia of pricing tactics. For pricing strategy, see the pricing playbook.
Solo founders ask the same question every quarter: “should I A/B test my pricing?” The honest answer is no, not in the conventional sense. The classic A/B test — show 50% of visitors version A and 50% version B, measure conversion, declare a winner — doesn’t work for SaaS pricing for three structural reasons. Founders who run those tests anyway end up with noise, customer-trust damage, or both.
This playbook walks through the experiment design that does work at indie scale: cohort-substitution. It is slower and simpler than classical A/B testing, and it produces more honest results. It also keeps you out of the trust-damaging trap of charging two different customers two different prices for the same product on the same day.
Three reasons, none of which are usually mentioned in growth-marketing blog posts.
The first reason: the variants aren’t independent. When you A/B test a landing page, the two variants are independent. Visitor A sees variant 1; visitor B sees variant 2; their experiences don’t interact. When you A/B test pricing on a real, public-facing page, the two variants are not independent. Visitor A might tell visitor B about your product. Visitor B clicks the link and sees a different price. They Google around. They find your old pricing on archive.org. They take a screenshot of price A and ask why they’re being shown price B. Trust damage compounds across visitors in a way it never does on landing-page A/B tests.
The second reason: trust. Pricing is the part of your product where buyers expect honesty. If a buyer sees $29 on Tuesday, decides to come back and buy on Thursday, and finds $39, they don’t conclude that they got randomly assigned to a treatment group. They conclude you tried to sneak a price increase past them. They tell their friends. They post on Twitter. Customers in screenshot-driven communities — which is most software buyers in 2026 — will surface and amplify any inconsistency.
The same isn’t true of headline copy or button colors. Nobody screenshots a CTA copy variant and accuses you of fraud. They will absolutely do that with prices.
The third reason: sample size, and this is the killer for solo founders specifically. Classical A/B testing on a metric like conversion rate requires roughly 1,000+ data points per variant to detect a 20% effect at 95% confidence. If your site converts 50 customers per month, splitting them 50/50 gives you 25 per variant per month. At that rate, a properly powered pricing test takes years, not months. During those years, every other variable in your business changes.

You don’t have years of stable conditions; you don’t even have six months. So your “A/B test” ends up measuring whatever else changed during the test window: a launch on Product Hunt, a seasonal slowdown, a viral tweet. The pricing signal is buried in the noise.
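A back-of-the-envelope power calculation makes the arithmetic concrete. This is a minimal sketch in plain Python; the 5% baseline conversion rate and 80% power are illustrative assumptions, and even these relatively generous numbers put a properly powered test more than a year out.

```python
# Visitors and conversions needed for a classical two-proportion A/B test.
# Baseline rate, lift, and power below are illustrative assumptions.
from math import ceil, sqrt
from statistics import NormalDist

def visitors_per_variant(baseline=0.05, relative_lift=0.20,
                         alpha=0.05, power=0.80):
    """Visitors per variant to detect a relative lift in conversion rate."""
    p1, p2 = baseline, baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p2 - p1) ** 2
    return ceil(n)

visitors = visitors_per_variant()        # ~8,200 visitors per variant
conversions = ceil(visitors * 0.05)      # ~410 conversions per variant
months = conversions / 25                # at 25 conversions per variant per month
print(f"{visitors} visitors, {conversions} conversions, {months:.0f} months per variant")
```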
The experiment design that works at indie scale is simpler than A/B testing and stronger than “just change the price.” The shape of it:

- Pick one pricing variable and set one new value for it.
- From a chosen start date, every new signup gets the new value. That is your new cohort.
- Every existing customer is grandfathered at their current price.
- After a 4–6 week window, compare the new cohort’s conversion and revenue per signup against the previous cohort’s, using a pre-committed 25%+ effect threshold.
Each cohort is internally consistent: every member of the cohort got the same price under the same site copy on the same product version. You compare cohort to cohort, not visitor to visitor. This is the same logic used by industry researchers like Patrick Campbell at ProfitWell when they study price elasticity at scale — the unit of analysis is the cohort, not the impression.
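To make the cohort logic concrete, here is a minimal sketch of the comparison itself. The signup counts and prices are made up; the point is that every input is a cohort-level aggregate, never an individual visitor.

```python
# Cohort-to-cohort comparison: the unit of analysis is the cohort, not the visitor.
# All field names and numbers here are illustrative.

def cohort_metrics(signups, paid, monthly_price):
    """Conversion rate and revenue per signup for one internally consistent cohort."""
    return {
        "conversion": paid / signups,
        "revenue_per_signup": paid * monthly_price / signups,
    }

old = cohort_metrics(signups=1200, paid=48, monthly_price=29)  # cohort before the change
new = cohort_metrics(signups=1150, paid=36, monthly_price=39)  # cohort after the change

for key in old:
    change = (new[key] - old[key]) / old[key]
    print(f"{key}: {change:+.1%} relative change")
# conversion: -21.7%; revenue_per_signup: +5.2%.
# +5.2% is below the 25% threshold discussed later, so this result is inconclusive.
```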
Why grandfather existing customers? Because customers who feel punished for being early will churn loudly and tell other prospects you can’t be trusted on price. Grandfathering is the cost of running pricing experiments without burning the trust capital of your existing base.
You have one cohort substitution per quarter to spend, realistically — the experiment cycle is long, and changing pricing too often signals instability to the market. So the question is: which variable matters most?
Four things are worth testing, in roughly this order of leverage:

1. Price level: the headline number on each tier.
2. Billing period: monthly versus annual.
3. Tier structure: how many tiers you offer and where the lines between them sit.
4. Included quantity: how much usage each tier includes.
Test one of these per cycle. Don’t change tier count, price level, and included quantity simultaneously — you’ll have no idea which variable moved the result. Patrick Campbell’s public research on SaaS pricing repeatedly shows that founders who change one variable at a time learn faster than founders who overhaul their pricing page wholesale.
Here is the part most founders get wrong. They look up classical statistical-significance calculators, see they need 1,000 conversions per variant, and conclude their volume is too small to make any pricing decisions at all. So they default to never changing price.
This is the wrong conclusion. The right one is: at low volume, you don’t need statistical significance — you need an effect size large enough to dominate the noise. Specifically, you need a 25%+ relative change in the metric you care about, measured over a 4–6 week window.
The math: at 50 conversions per month, your noise floor on conversion-rate measurement is roughly ±15% relative standard error. To detect a real signal, the effect has to exceed that floor by a meaningful margin. A 25% relative change comfortably clears it; a 10% change is indistinguishable from noise.
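That ±15% figure is just counting error: the relative standard error on a monthly conversion count scales roughly as 1/√n. A quick sketch (the monthly volumes are illustrative):

```python
# Relative standard error on a count of n conversions scales as 1/sqrt(n).
from math import sqrt

for n in (25, 50, 100, 400):
    print(f"{n:>4} conversions/month -> ±{1 / sqrt(n):.0%} relative standard error")
# At 50 conversions/month the floor is ~14%: a 10% effect is invisible,
# while a 25%+ effect comfortably clears it.
```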
Stripe’s public benchmarks on SaaS pricing — published on the Stripe blog and in their annual reports — show that price-level changes typically produce 30–60% effect sizes on revenue per signup, well above the noise threshold for indie-scale data. This is precisely why pricing is worth experimenting on even when sample sizes look small. The effects are big.
What this means in practice: if you double your price and conversion drops 30%, your revenue per signup went up 40%, and that’s a real signal you can act on without classical significance testing. If conversion drops 5% and revenue is flat, that’s noise — treat it as inconclusive and revert.
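The arithmetic behind those two examples is multiplicative, and it is worth encoding once so you never eyeball it: revenue per signup changes by (price multiplier × conversion multiplier) − 1. A two-line sketch:

```python
def revenue_change(price_multiplier, conversion_multiplier):
    """Relative change in revenue per signup after a price change."""
    return price_multiplier * conversion_multiplier - 1

print(f"{revenue_change(2.00, 0.70):+.0%}")  # double the price, conversion -30% -> +40%: real signal
print(f"{revenue_change(1.05, 0.95):+.1%}")  # price +5%, conversion -5% -> roughly flat: noise, revert
```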
For more on the underlying revenue math, see our explainer on what MRR is and the broader playbook on SaaS metrics that actually matter.
The mechanics of running cohort substitution depend on your billing platform, but the core moves are the same: create the new price alongside the old one, point new signups at it from the cohort start date, leave every existing subscription untouched, and record which cohort each customer belongs to so you can compare them later.
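If you bill through Stripe (an assumption; the same moves exist on any subscription platform), the grandfathering falls out of the data model: Price objects are immutable, existing subscriptions keep the Price they were created with, and new checkouts simply reference a new Price. A sketch with placeholder IDs, amounts, and URLs:

```python
# Cohort substitution on Stripe: create a new Price, point new checkouts at it,
# and leave existing subscriptions untouched (they keep their original Price).
# The API key, product ID, amounts, and URLs below are placeholders.
import stripe

stripe.api_key = "sk_test_..."

# 1. Create the new-cohort price alongside the old one (never edit the old Price).
new_price = stripe.Price.create(
    product="prod_XXXX",
    unit_amount=3900,                  # $39.00, up from $29.00
    currency="usd",
    recurring={"interval": "month"},
)

# 2. From the cohort start date, new signups check out against the new Price.
session = stripe.checkout.Session.create(
    mode="subscription",
    line_items=[{"price": new_price.id, "quantity": 1}],
    success_url="https://example.com/welcome",
    cancel_url="https://example.com/pricing",
)

# 3. Existing subscriptions are never modified: that is the grandfathering.
#    Rolling back means pointing checkout at the old Price ID again.
```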
Most founders are too slow to roll back. They have an emotional investment in the change they made and they want it to work. Three signals say roll back, and you should treat each as a hard rule, not a suggestion:

- Conversion drops more than 30% relative to the prior cohort.
- MRR fails to grow over the measurement window.
- Trust signals show damage: pricing complaints, refund requests, screenshots circulating.

Because these are hard rules, they can live in code; see the sketch after this list.
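A minimal sketch of the pre-committed decision rule, encoded so emotion can’t renegotiate it mid-experiment. The function name and example inputs are illustrative; the thresholds come straight from the three signals above.

```python
# Pre-committed roll-back triggers. Any single trigger fires the revert.

def should_roll_back(conversion_change, mrr_change, trust_damage):
    """Hard rules, not suggestions: one hit means revert."""
    return (
        conversion_change <= -0.30   # conversion down more than 30%
        or mrr_change <= 0.0         # MRR failed to grow over the window
        or trust_damage              # complaints, refunds, screenshots
    )

print(should_roll_back(conversion_change=-0.22, mrr_change=0.05, trust_damage=False))  # False: hold
print(should_roll_back(conversion_change=-0.35, mrr_change=0.12, trust_damage=False))  # True: revert
```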
The roll-back protocol is symmetric to the experiment protocol: revert the public price to the original, leave new-cohort customers on whichever price they signed up at (you don’t want a third price tier in the wild), and announce the revert clearly. “We tried a new price; we’ve reverted; existing customers are unaffected” is a fine message and most buyers will respect the honesty.
Some pricing variables are not worth testing at solo-founder scale, because the signal is too weak (a sub-10% price tweak never clears the noise floor) or the trust risk is too high (anything that shows two visitors different prices on the same day).
Run through this list before flipping any price:

- One variable, one new value.
- Every new signup gets the new price; every existing customer is grandfathered.
- Success threshold written down before launch: a 25%+ relative change in the metric you care about.
- Roll-back triggers written down before launch.
- A 4–6 week measurement window on the calendar, and no peeking at daily numbers in week 1.
Classical A/B testing fails for SaaS pricing because of network effects, trust risk, and insufficient sample size at indie scale. Replace it with cohort substitution: one variable, one new value, applied to all new signups, existing customers grandfathered, measured over a 4–6 week window with a 25%+ effect threshold.
Test in priority order: price level first, billing period second, tier structure third, included quantity fourth. Pre-commit to your success threshold and your roll-back triggers. Don’t look at daily numbers in week 1.
Roll back if conversion drops more than 30%, if MRR fails to grow, or if trust signals show damage. Symmetric communication on the revert. The point is not to win every experiment; the point is to learn what your real price elasticity looks like, in your real market, with your real product. Most solo founders run two or three of these in their first eighteen months and end up at a price 2x what they started with.