Choosing a Stats Engine

A vs B ships three statistical engines. They answer the same underlying question — "is this variation better than Control?" — in different ways. This page is a plain-English comparison to help you pick the right one.

The three engines at a glance

Bayesian (default) — reports the probability each variation beats Control, plus a credible interval. Forgiving of mid-experiment peeks, easy for non-statisticians to read. Answers: "how likely is it that Variant B is better?"
Frequentist — reports a p-value and 95% confidence interval. Declares significance only at the pre-declared sample size. Requires commitment to an experiment duration. Answers: "is the difference statistically significant at α = 0.05?"
Sequential — always-valid inference. Peek as often as you like, stop as soon as the evidence is in, without inflating the false-positive rate. Slightly wider intervals than Frequentist in exchange for the flexibility. Answers the same question as Frequentist, but safely at any point.

When Bayesian is the right pick

You want the result to read as "87% probability B beats Control" rather than "p = 0.018."
You want the team to be able to peek at the results without worrying about invalidating them.
Your stakeholders are non-technical and respond better to probability than to p-values.
Your traffic varies, so pre-committing to a precise sample size is hard.
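
To make the "probability B beats Control" number concrete, here is a minimal sketch of how such a figure can be estimated for a conversion-rate measure. It assumes a flat Beta(1, 1) prior and a simple Beta-Binomial model; the function name, the counts, and the prior are illustrative assumptions, not a description of the engine's internals.

```python
import random


def prob_beats_control(conv_c, n_c, conv_v, n_v, draws=20_000, seed=0):
    """Monte Carlo estimate of P(variation rate > Control rate).

    Assumes a flat Beta(1, 1) prior on each conversion rate. The real
    engine's prior and model may differ.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Draw one plausible conversion rate per arm from its Beta posterior.
        rate_c = rng.betavariate(1 + conv_c, 1 + n_c - conv_c)
        rate_v = rng.betavariate(1 + conv_v, 1 + n_v - conv_v)
        wins += rate_v > rate_c
    return wins / draws


# Hypothetical counts: 500/10,000 on Control, 560/10,000 on the variation.
print(f"P(B beats Control) = {prob_beats_control(500, 10_000, 560, 10_000):.1%}")
```

The output is a direct probability statement about the variation, which is why this style of result tends to be easier to explain to non-technical stakeholders.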

When Frequentist is the right pick

A stakeholder or compliance reviewer expects p-values and confidence intervals.
You're in a regulated industry (pharma, finance, healthcare) where a fixed-α, pre-declared-sample-size trial is required.
You have predictable traffic and you're willing to plan the experiment up front.
You want the strictest classical guarantees about false-positive rates.
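
For reference, the classical result described above can be sketched as a two-proportion z-test with a Wald interval on the difference. This is the textbook calculation under a normal approximation; the function name and the counts are hypothetical, and the engine's exact test may differ.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_c, n_c, conv_v, n_v, alpha=0.05):
    """Return (two-sided p-value, Wald confidence interval for the difference)."""
    p_c, p_v = conv_c / n_c, conv_v / n_v
    diff = p_v - p_c

    # Pooled standard error for the test of "no difference".
    pooled = (conv_c + conv_v) / (n_c + n_v)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_v))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_pooled)))

    # Unpooled standard error for the Wald interval on the difference.
    se = sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return p_value, (diff - z_crit * se, diff + z_crit * se)


# Hypothetical counts, evaluated once at the pre-declared sample size.
p_value, (low, high) = two_proportion_test(500, 10_000, 600, 10_000)
print(f"p = {p_value:.4f}, 95% CI for the difference = [{low:.4f}, {high:.4f}]")
```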

Frequentist p-values only hold their false-positive guarantee if you look once, at the pre-declared sample size. Peeking every day and stopping the first time the result looks significant inflates the real false-positive rate from the nominal 5% to roughly 15-20%, and the inflation grows with the number of looks. If you want to peek, pick Sequential.
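
The peeking problem is easy to reproduce with a small A/A simulation, where both arms share the same true rate and every "significant" result is a false positive. The sketch below compares checking once at the end against checking daily and stopping at the first significant result. All parameters (traffic, rate, number of days, run count) are illustrative assumptions, and the exact rates depend on them.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(conv_a, conv_b, n, z_crit):
    """Two-proportion z-test with equal arm sizes n."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    return se > 0 and abs(conv_b - conv_a) / n / se > z_crit


def false_positive_rates(runs=500, days=14, daily_n=100, rate=0.10,
                         alpha=0.05, seed=1):
    """Return (rate when peeking daily, rate when looking once at the end)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        ever_significant = False
        for _day in range(days):
            # Both arms have the same true rate, so any "win" is spurious.
            conv_a += sum(rng.random() < rate for _ in range(daily_n))
            conv_b += sum(rng.random() < rate for _ in range(daily_n))
            n += daily_n
            if is_significant(conv_a, conv_b, n, z_crit):
                ever_significant = True  # a daily peeker would stop here
        peeking_hits += ever_significant
        final_hits += is_significant(conv_a, conv_b, n, z_crit)
    return peeking_hits / runs, final_hits / runs


peeking, single_look = false_positive_rates()
print(f"peeking daily: {peeking:.1%}, single look: {single_look:.1%}")
```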

When Sequential is the right pick

You want to be able to check the results every day and stop as soon as the signal is there.
Traffic is unpredictable and pre-committing to a duration feels wrong.
You want the classical p-value interpretation but you also want to move fast.
You're running many experiments in parallel and need to make fast ship/kill decisions.

See the Sequential Engine page for the deeper reference: how always-valid inference works, why intervals are slightly wider than Frequentist, and what the "safe to stop" decision text means in practice.
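
As a rough illustration of always-valid inference, one common construction is the mixture sequential probability ratio test (mSPRT), which yields a p-value that can be checked after every observation. This page does not say which construction the Sequential engine uses, so treat the sketch below as an assumption-laden example: it takes a stream of per-pair differences with a known variance and a normal mixing prior whose variance `tau2` is a hypothetical tuning parameter.

```python
import math


def always_valid_p_values(diffs, variance, tau2=1.0, theta0=0.0):
    """Running always-valid p-values from a normal-mixture SPRT.

    diffs    -- stream of observed differences (variation minus Control)
    variance -- assumed known variance of a single difference
    tau2     -- variance of the normal mixing prior (hypothetical choice)
    """
    p_value = 1.0
    total = 0.0
    history = []
    for n, d in enumerate(diffs, start=1):
        total += d - theta0
        # Log of the mixture likelihood ratio against the null theta0.
        log_lr = 0.5 * math.log(variance / (variance + n * tau2)) + (
            tau2 * total ** 2 / (2 * variance * (variance + n * tau2))
        )
        # The p-value can only go down, so stopping at any time is safe.
        p_value = min(p_value, math.exp(-log_lr))
        history.append(p_value)
    return history


# Hypothetical stream with a steady positive difference.
p_values = always_valid_p_values([0.5] * 200, variance=1.0)
first_safe_stop = next(i for i, p in enumerate(p_values, start=1) if p < 0.05)
print(f"p-value first drops below 0.05 after {first_safe_stop} observations")
```

Because the running p-value never increases, the decision does not depend on how often you look. The price is a more conservative threshold at any single look, which is where the slightly wider intervals come from.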

Side-by-side comparison

Question	Bayesian	Frequentist	Sequential
Safe to peek?	Yes	No	Yes
Requires pre-declared sample size?	Recommended, not required	Yes	Recommended, not required
Reports probability-to-beat-control?	Yes	No	No
Reports p-value?	No	Yes	Yes (always-valid)
Interval style	Credible interval	Confidence interval (Wald)	Confidence sequence
Forgives stopping early?	Yes	No	Yes

Calculator modes

The sample-size calculator has three modes, all engine-aware (see Analysis Defaults for the tab link):

Fixed-horizon — inputs: baseline rate, minimum detectable effect (MDE), α, power, daily traffic. Output: required sample size per variation and estimated duration in days. Use this to plan a new experiment.
Power calculator — inputs: sample size, MDE. Output: achieved power at the end of the experiment. Use this to sanity-check how confident you can be in a null result.
Duration estimator — inputs: daily traffic, MDE. Output: estimated calendar duration. Use this when you care most about the timeline, not the exact sample size.
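
For a conversion-rate measure, the three modes can be approximated with the standard two-proportion formulas. The sketch below is not the calculator's code: the function names are hypothetical, it assumes the MDE is expressed as a relative lift over the baseline, and it assumes traffic is split evenly across variations.

```python
from math import ceil, sqrt
from statistics import NormalDist

_norm = NormalDist()


def sample_size_per_variation(baseline, mde_relative, alpha=0.05, power=0.80):
    """Fixed-horizon mode: visitors needed in each variation."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)  # assumes a relative MDE
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    z_beta = _norm.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def achieved_power(n_per_variation, baseline, mde_relative, alpha=0.05):
    """Power mode: chance of detecting the MDE at a given sample size."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variation)
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    return _norm.cdf(abs(p2 - p1) / se - z_alpha)


def duration_days(n_per_variation, daily_traffic, variations=2):
    """Duration mode: calendar days, assuming an even traffic split."""
    return ceil(n_per_variation * variations / daily_traffic)


# Hypothetical plan: 5% baseline, 10% relative MDE, 5,000 visitors per day.
n = sample_size_per_variation(0.05, 0.10)
print(n, duration_days(n, 5_000), round(achieved_power(n, 0.05, 0.10), 3))
```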

In Bayesian mode the calculator adds a prior input and reports the estimated duration to reach a target probability-to-beat-control. In Sequential mode it warns that the reported power is a lower bound (Sequential stopping can end the experiment earlier than the fixed-horizon calculation suggests).

At the top of the calculator you pick the measure: Conversion rate uses the classic sample-size formula; Rate uses the delta method; Percentile uses a bootstrap simulation against a historical sample; Composite uses a weighted-variance combination of components. Each measure renders the inputs it actually needs — the calculator never asks you for a baseline rate when you're sizing a percentile.
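
As an illustration of the delta method mentioned above, the sketch below approximates the variance of a ratio of means, such as events per session. It assumes that "Rate" refers to a ratio-type measure of this kind, which this page does not define, and the function name and inputs are hypothetical.

```python
from statistics import fmean


def ratio_variance(numerators, denominators):
    """Delta-method variance of mean(numerators) / mean(denominators).

    Each position holds one unit's totals, e.g. one user's events and
    sessions. Needs at least two units and a non-zero denominator mean.
    """
    n = len(numerators)
    mean_y = fmean(numerators)
    mean_x = fmean(denominators)
    ratio = mean_y / mean_x
    var_y = sum((y - mean_y) ** 2 for y in numerators) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in denominators) / (n - 1)
    cov_xy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(denominators, numerators)
    ) / (n - 1)
    # First-order Taylor expansion of y/x around the two means.
    return (var_y - 2 * ratio * cov_xy + ratio ** 2 * var_x) / (n * mean_x ** 2)


# Hypothetical per-user data: events and sessions.
events = [3, 0, 5, 2, 4, 1]
sessions = [2, 1, 3, 2, 2, 1]
print(f"variance of events per session: {ratio_variance(events, sessions):.5f}")
```

The covariance term matters: when the numerator and denominator move together, ignoring it overstates the variance and therefore the required sample size.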

Can I change the engine mid-flight?

No. Once an experiment or flag rule is running, the engine is locked. Swapping engines during a live experiment is a classic p-hacking hazard — the engine that happens to look best becomes tempting. A vs B blocks it by design.

The Results page does include an "Explore under" dropdown that lets you re-render a past or running experiment under a different engine for exploratory analysis. The official result, the one that appears in reports, exports, and audit logs, stays locked to the engine you chose at launch. See Comparing engines for how Explore-under and Compare-engines work.