Sequential Engine
The Sequential engine reports an always-valid p-value and a 95% confidence sequence. It's designed for one specific situation: you want to look at the results every day and stop the experiment the moment the evidence is in, without inflating your false-positive rate.
What Sequential analysis actually does
A Frequentist p-value is only valid once, at the pre-declared sample size. A vs B's Sequential engine sidesteps that constraint by computing a confidence interval that is valid at every sample size simultaneously — what statisticians call an always-valid or anytime-valid interval. The technical name is a confidence sequence (CS).
A vs B uses the asymptotic confidence sequence (AsympCS) of Waudby-Smith et al. (2024), which builds on the time-uniform boundaries of Howard, Ramdas, McAuliffe, and Sekhon (2021) and extends them to large-sample settings through a time-uniform CLT. Here it is specialised to the difference of means / proportions. It belongs to the same family of methods Netflix describes running in production.
In plain English: the engine widens the interval enough to cover every peek you might take, not just one. As more data comes in, the interval narrows. You can stop the moment it excludes zero.
With Sequential analysis, looking at your dashboard every morning is fine. Stopping the experiment the moment the result looks good is also fine. The math has already paid for the privilege.
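To make the "widen enough to cover every peek" idea concrete, here is a minimal sketch of the Gaussian-mixture AsympCS half-width for a single running mean. It is not A vs B's implementation: the function name, the `rho` tuning value, and the single-stream setup are assumptions for illustration, and the exact specialisation the engine uses for two-arm differences is not shown here.

```python
import math

def asymp_cs_radius(t: int, sigma_hat: float, alpha: float = 0.05, rho: float = 0.05) -> float:
    """Half-width of a two-sided (1 - alpha) asymptotic confidence sequence
    for a running mean after t observations.

    rho is a tuning parameter that controls where the bound is tightest;
    0.05 is an illustrative value, not a product default.
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    v = t * rho**2 + 1.0
    return sigma_hat * math.sqrt(2.0 * v / (t**2 * rho**2) * math.log(math.sqrt(v) / alpha))

# The interval narrows as data accumulates, yet stays valid at every t.
for t in (500, 5_000, 50_000):
    print(t, round(asymp_cs_radius(t, sigma_hat=1.0), 4))
```

The interval at time `t` is the running estimate plus or minus this radius; the experiment can stop the first time that interval excludes zero.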
Reading a Sequential result
On the Results page, a Sequential experiment's Significance column shows one of three states:
- ✓ Safe to stop — the always-valid 95% CS no longer contains zero. The variation is statistically distinguishable from Control. You can ship, conclude, or pause the experiment with confidence.
- Inconclusive — the CS still contains zero. There may or may not be a real effect; keep collecting data, or stop and accept the inconclusive result.
- — (em-dash) — not enough data yet. Common in the first few hundred visitors per arm.
The Confidence Interval column is labelled Always-valid 95% CS for Sequential experiments to distinguish it from a Frequentist Wald CI. The numbers inside read the same way (they're plausible values for the true effect size), but the guarantee is stronger: this interval is valid no matter how often you peek.
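The three states reduce to a simple check on the interval. The sketch below is illustrative only: the function name, the returned labels, and the `MIN_VISITORS_PER_ARM` threshold are hypothetical, since the exact data floor the product applies is not documented here.

```python
from typing import Optional, Tuple

# Hypothetical floor; the product's actual "not enough data" rule is not specified.
MIN_VISITORS_PER_ARM = 300

def significance_state(cs: Optional[Tuple[float, float]], visitors_per_arm: int) -> str:
    """Map an always-valid 95% CS on the effect to a Significance state."""
    if cs is None or visitors_per_arm < MIN_VISITORS_PER_ARM:
        return "not_enough_data"   # shown as a dash on the Results page
    lower, upper = cs
    if lower > 0 or upper < 0:
        return "safe_to_stop"      # the CS excludes zero
    return "inconclusive"          # the CS still contains zero

print(significance_state((0.004, 0.031), visitors_per_arm=4_200))
```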
The stopping rule
The Sequential workflow is simple:
- Launch the experiment.
- Open the results page whenever you want.
- If the Significance column reads ✓ Safe to stop, you can declare a winner and ship.
- If it's still Inconclusive, keep waiting — or stop the experiment and accept that the data didn't separate.
There's no peek-protection banner, no blocking modal, no audit stamp on early stops. Sequential is the engine that makes those layers unnecessary.
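A small A/A simulation shows why no guard rails are needed. In the sketch below the true effect is zero, so every "stop" is a false positive; it compares stopping on a fixed-sample 95% interval against stopping on a confidence sequence when both are checked at every peek. Everything here is an assumption for illustration: the unit-variance Gaussian per-visitor differences, the peek schedule, and the `rho` value are not taken from the product.

```python
import math
import random

def cs_radius(t, sigma_hat, alpha=0.05, rho=0.05):
    """Gaussian-mixture asymptotic CS half-width (rho is an illustrative tuning value)."""
    v = t * rho**2 + 1.0
    return sigma_hat * math.sqrt(2.0 * v / (t**2 * rho**2) * math.log(math.sqrt(v) / alpha))

def simulate_aa(runs=300, horizon=2000, peek_every=100, seed=7):
    """Return (naive_rate, cs_rate): the share of A/A runs stopped at any peek."""
    rng = random.Random(seed)
    naive_stops = cs_stops = 0
    for _ in range(runs):
        total = total_sq = 0.0
        naive_hit = cs_hit = False
        for t in range(1, horizon + 1):
            x = rng.gauss(0.0, 1.0)  # simulated per-visitor difference, no real effect
            total += x
            total_sq += x * x
            if t % peek_every:
                continue             # only look at the data on peek days
            mean = total / t
            sigma_hat = math.sqrt(max(total_sq / t - mean**2, 1e-12))
            naive_hit = naive_hit or abs(mean) > 1.96 * sigma_hat / math.sqrt(t)
            cs_hit = cs_hit or abs(mean) > cs_radius(t, sigma_hat)
        naive_stops += naive_hit
        cs_stops += cs_hit
    return naive_stops / runs, cs_stops / runs

naive_rate, cs_rate = simulate_aa()
print(f"fixed-sample interval, stopped at first exclusion of zero: {naive_rate:.3f}")
print(f"confidence sequence, stopped at first exclusion of zero:  {cs_rate:.3f}")
```

Because the confidence sequence is wider than the fixed-sample interval at every peek, it can never stop a run that the naive rule would not also have stopped; the naive rule simply stops far too often.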
Why intervals are slightly wider than Frequentist
Sequential confidence sequences are strictly wider than Frequentist Wald confidence intervals at the same sample size. That's not a bug; it's the price of being able to peek. Where the Wald interval multiplies the standard error by the constant z_{α/2} (about 1.96 at 95%), the AsympCS bound uses a larger multiplier that, at large sample sizes, keeps growing slowly as data accumulates. That extra width is what keeps the time-uniform guarantee intact.
Practically, this means a Sequential experiment needs roughly 1.5×–2× the visitors of a comparable Frequentist experiment to detect the same effect at the same confidence level — if you wait the full duration. But because Sequential lets you stop the moment the evidence is in, the actual sample size you collect is often smaller, especially when the effect is large.
The sample-size calculator's Sequential mode reports a planning estimate. Treat it as a conservative upper bound: real Sequential experiments routinely stop earlier.
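The width penalty can be inspected directly by comparing the factor each method applies to the standard error. The following is a sketch under assumptions: `cs_multiplier` is a hypothetical helper based on the Gaussian-mixture AsympCS boundary, and `rho = 0.05` is an illustrative tuning value, so the printed ratios will not match the product's calculator.

```python
import math
from statistics import NormalDist

def cs_multiplier(t, alpha=0.05, rho=0.05):
    """Factor multiplying sigma_hat / sqrt(t) in the confidence-sequence radius."""
    v = t * rho**2 + 1.0
    return math.sqrt(2.0 * v / (t * rho**2) * math.log(math.sqrt(v) / alpha))

# Classical two-sided 95% critical value, a constant regardless of sample size.
z = NormalDist().inv_cdf(1 - 0.05 / 2)

for t in (1_000, 10_000, 100_000):
    m = cs_multiplier(t)
    print(f"n={t:>7}  CS multiplier={m:.2f}  width ratio vs Wald={m / z:.2f}")
```

The ratio depends on `rho` and on where in the experiment you look, which is one reason a planning estimate is best read as a bound rather than a forecast.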
Multiple variations
A vs B does not apply multiple-comparison correction (MCC) to Sequential experiments by default, even with more than two variations. The always-valid guarantee is per-comparison; correction across many comparisons is a separate problem with no consensus solution in the anytime-valid literature.
If you're running an experiment with several variations and you want experiment-wise error guarantees across all comparisons, the Frequentist engine with Bonferroni or Benjamini-Hochberg correction is the established choice. See Frequentist: multiple-variations and corrections.
When to pick Sequential
- You want to peek freely. Stakeholders ask "how is the test doing?" every other day, and you don't want every glance to require a disclaimer.
- You want to ship as soon as the answer is in. Long-running experiments have a real cost — opportunity cost on the winning variation, engineering cost on holding a feature flag in flight, traffic cost on whichever variation turns out to be worse. Sequential lets you cash out as soon as the evidence is conclusive.
- Traffic is unpredictable. Pre-committing to a sample size feels wrong when daily traffic could swing 5× over the next month.
- You want classical-flavoured output without the planning constraint. If your team thinks in p-values and confidence intervals but you don't want to pay the peek-protection tax, Sequential is the closest equivalent.
When not to pick Sequential
- A regulator requires a fixed-α, pre-declared-sample-size trial. Pharma, finance, and healthcare workflows often map directly onto Frequentist; Sequential's anytime-valid guarantee is a different (stronger) property and reviewers may not be familiar with it.
- You explicitly want a probability-to-beat-control number. That's the Bayesian engine's output, not Sequential.
- You're running many parallel comparisons and need MCC. Sequential + MCC is a research-frontier problem; for now, use Frequentist with Benjamini-Hochberg.
Where to configure
Set the engine in one of three places:
- Project level. Project Settings → Analysis sets the default for every new experiment and feature-flag A/B test rule in the project.
- Per experiment. In the experiment builder's Analysis Overrides card, before launch. Once the experiment is running, the engine is locked: switching engines mid-flight would invalidate the always-valid guarantee.
- Per flag rule. In the feature-flag A/B Test rule editor's Analysis section, before the rule is enabled. Same lock-on-launch behaviour as experiments.
Further reading
- Howard, Ramdas, McAuliffe & Sekhon (2021). Time-uniform, nonparametric, nonasymptotic confidence sequences. Annals of Statistics, 49(2). The nonasymptotic confidence-sequence framework that AsympCS builds on.
- Waudby-Smith, Arbour, Sinha, Kennedy & Ramdas (2024). Time-uniform central limit theory and asymptotic confidence sequences. The AsympCS paper, and the specialisation A vs B uses for mean and proportion differences.
- Netflix Tech Blog, Sequential Testing Keeps the World Streaming Netflix. A practical writeup of the same family of methods used here.
For a plain-English comparison of all three engines, see Choosing a Stats Engine.