
Early stopping & peek protection

Frequentist p-values only hold their advertised guarantee when you look at the data once, at the pre-declared sample size. Stopping the experiment early because the result looks good — even unintentionally — quietly inflates the false-positive rate. A vs B gives you three layers of protection so you can't do it by accident.

Why peeking is a problem

A test designed for one final look at, say, α = 0.05 has a 5% chance of crying “winner” when there's actually no effect. But if you check the result every day and stop the experiment the first time the p-value drops below 0.05, you're running many tests, not one. Each peek gets its own roll of the dice, so the real false-positive rate goes up, sometimes from 5% to 15-20% over a two-week experiment.
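To see the inflation for yourself, here is a minimal A/A simulation (no real effect) in Python. It assumes a two-sided z-test with known variance and the same traffic every day, so each day adds one independent standard-normal increment to the running z-statistic. It compares a single look on day 14 with stopping on the first day the p-value drops below α. The function `false_positive_rates` and its parameters are hypothetical illustrations, not part of A vs B.

```python
import math
import random
from statistics import NormalDist


def false_positive_rates(days: int = 14, trials: int = 20_000,
                         alpha: float = 0.05, seed: int = 7) -> tuple[float, float]:
    """Return (single-look rate, stop-at-first-significant-peek rate) under no effect.

    Assumption: equal traffic per day and a known-variance z-test, so after k days
    the z-statistic is (sum of k iid N(0, 1) increments) / sqrt(k).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, ~1.96 for 0.05
    single_look_hits = 0
    daily_peek_hits = 0
    for _ in range(trials):
        running_sum = 0.0
        crossed_at_some_peek = False
        z = 0.0
        for day in range(1, days + 1):
            running_sum += rng.gauss(0.0, 1.0)   # one day's worth of (null) evidence
            z = running_sum / math.sqrt(day)     # z-statistic a daily peeker would see
            if abs(z) > z_crit:
                crossed_at_some_peek = True      # a daily peeker would stop and declare a winner
        if crossed_at_some_peek:
            daily_peek_hits += 1
        if abs(z) > z_crit:                      # z here is the final, pre-declared look
            single_look_hits += 1
    return single_look_hits / trials, daily_peek_hits / trials


single, peeking = false_positive_rates()
print(f"one look at day 14:            {single:.1%}")
print(f"stop at first daily p < 0.05:  {peeking:.1%}")
```

Under these assumptions the single-look rate stays near 5% while the daily-peeking rate lands well above it; the exact figure depends on how often you look and how traffic is spread over the experiment.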

The fix is structural: don't make stopping decisions on a Frequentist test before its target sample size. (If you genuinely need to peek and stop, that's what the Sequential engine is for — it uses always-valid inference, designed to survive continuous monitoring.)

Layer 1 — the status banner

While a Frequentist experiment is still accumulating sample, the results page shows a persistent info banner at the top:

Day 5 of 14 · 3,400 of 10,000 visitors · not yet valid for stopping decisions.

The banner updates as data accumulates and disappears the moment you reach your target sample size (set by the sample-size calculator) or your scheduled end date. Bayesian and Sequential experiments don't see the banner — they don't need it.
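The rule for when the banner appears can be sketched as a small function. This is an illustration under stated assumptions, not A vs B's code: `status_banner` and its parameters are hypothetical, engines are plain strings, and "scheduled end date reached" is passed in as a boolean.

```python
from typing import Optional


def status_banner(engine: str, day: int, total_days: int,
                  visitors: int, target_visitors: int,
                  past_end_date: bool) -> Optional[str]:
    """Return the banner text, or None when no banner should be shown."""
    # Bayesian and Sequential experiments never get the banner.
    if engine != "frequentist":
        return None
    # The banner disappears once the target sample size or the scheduled end date is reached.
    if visitors >= target_visitors or past_end_date:
        return None
    return (f"Day {day} of {total_days} · {visitors:,} of {target_visitors:,} visitors"
            " · not yet valid for stopping decisions.")


print(status_banner("frequentist", 5, 14, 3400, 10_000, past_end_date=False))
```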

Layer 2 — the blocking modal

If you try to pause, stop, or declare a winning variation on a Frequentist experiment before it has reached its target, A vs B interrupts the action with a modal. The modal explains, in plain English, what stopping early would do to your false-positive rate. For example: “at your current sample, the real false-positive rate is approximately 13%, about 2.6× higher than the 5% you configured.”
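The multiplier in that message is just the estimated real rate divided by the configured α (13% / 5% = 2.6). The page doesn't say how the 13% estimate is produced, so the sketch below takes it as an input; `early_stop_warning` is a hypothetical name used only for illustration.

```python
def early_stop_warning(inflated_rate: float, configured_alpha: float) -> str:
    """Format the modal's warning from an estimated real false-positive rate (supplied by the caller)."""
    multiplier = inflated_rate / configured_alpha  # e.g. 0.13 / 0.05 = 2.6
    return (f"at your current sample, the real false-positive rate is approximately "
            f"{inflated_rate:.0%}, about {multiplier:.1f}× higher than the "
            f"{configured_alpha:.0%} you configured.")


print(early_stop_warning(0.13, 0.05))
```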

You then have three choices:

  • Let it run. Dismiss the modal and keep collecting data. Nothing changes.
  • Stop anyway and log. Proceed with the action — but the experiment record is stamped “Early-stopped under Frequentist · validity reduced.” The decision is captured in the audit log along with an optional reason, the timestamp, and your user identity. Future readers of the result see a small badge next to the engine name; CSV exports include the same information in a footer block.
  • Switch future experiments to Sequential. Updates your project default. The current experiment continues unchanged; new experiments inherit the Sequential engine.
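The effect of each modal choice can be summarised as a small state transition. This is a minimal sketch assuming simplified in-memory `Project` and `Experiment` objects; the class names, the `ModalChoice` enum and `resolve_modal` are hypothetical, and the audit-log writes of "Stop anyway" are reduced here to a single flag.

```python
from dataclasses import dataclass
from enum import Enum


class ModalChoice(Enum):
    LET_IT_RUN = "let_it_run"
    STOP_ANYWAY = "stop_anyway"
    SWITCH_TO_SEQUENTIAL = "switch_to_sequential"


@dataclass
class Project:
    default_engine: str = "frequentist"


@dataclass
class Experiment:
    engine: str = "frequentist"
    running: bool = True
    early_stopped_under_frequentist: bool = False


def resolve_modal(choice: ModalChoice, experiment: Experiment, project: Project) -> None:
    if choice is ModalChoice.LET_IT_RUN:
        # Dismiss the modal and keep collecting data: nothing changes.
        return
    if choice is ModalChoice.STOP_ANYWAY:
        # Proceed with the gated action (simplified: pause, stop and declare-winner
        # all halt data collection here) and stamp the record as reduced-validity.
        experiment.running = False
        experiment.early_stopped_under_frequentist = True
        return
    if choice is ModalChoice.SWITCH_TO_SEQUENTIAL:
        # Only the project default changes; the current experiment keeps its engine and keeps running.
        project.default_engine = "sequential"
```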

Layer 3 — the audit stamp

Once you proceed with “Stop anyway,” A vs B writes five things:

  1. Experiment.earlyStoppedUnderFrequentist = true: surfaced as a badge on the results page.
  2. Experiment.earlyStopReason: the optional reason you typed into the modal's reason field.
  3. Experiment.earlyStopAt: the timestamp at which the override was confirmed.
  4. Experiment.earlyStopByUserId: the identity of the user who chose to stop.
  5. A new EARLY_STOPPED entry in the org's audit log, with metadata recording the engine, the target sample size, the reason, and how the override was triggered (from a Pause, a Stop, or a Declare-Winner action).

The same fields exist on flag rules. Concluding a Frequentist A/B Test rule early stamps the rule and writes the same audit entry against the rule resource type.
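The five writes can be sketched as one function that works for both experiments and flag rules. This is a minimal sketch, not A vs B's implementation: the snake_case fields mirror the documented `Experiment.*` fields, while `Stoppable`, `AuditEntry`, `stamp_early_stop`, the resource-type strings and the metadata key names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Stoppable:
    """Fields shared by experiments and flag rules (mirroring the Experiment.* fields)."""
    id: str
    resource_type: str  # hypothetical values: "experiment" or "flag_rule"
    target_sample_size: int
    early_stopped_under_frequentist: bool = False
    early_stop_reason: Optional[str] = None
    early_stop_at: Optional[datetime] = None
    early_stop_by_user_id: Optional[str] = None


@dataclass
class AuditEntry:
    action: str
    resource_type: str
    resource_id: str
    user_id: str
    at: datetime
    metadata: dict = field(default_factory=dict)


def stamp_early_stop(resource: Stoppable, user_id: str, trigger: str,
                     reason: Optional[str], audit_log: list[AuditEntry],
                     now: Optional[datetime] = None) -> AuditEntry:
    if trigger not in {"pause", "stop", "declare_winner"}:
        raise ValueError(f"unexpected trigger: {trigger}")
    now = now or datetime.now(timezone.utc)
    # Writes 1-4: fields on the experiment or flag rule itself.
    resource.early_stopped_under_frequentist = True
    resource.early_stop_reason = reason or None  # the reason field is optional
    resource.early_stop_at = now
    resource.early_stop_by_user_id = user_id
    # Write 5: the org-level audit entry, against the resource's own type.
    entry = AuditEntry(
        action="EARLY_STOPPED",
        resource_type=resource.resource_type,
        resource_id=resource.id,
        user_id=user_id,
        at=now,
        metadata={"engine": "frequentist",
                  "targetSampleSize": resource.target_sample_size,
                  "reason": resource.early_stop_reason,
                  "trigger": trigger},
    )
    audit_log.append(entry)
    return entry
```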

Stopping early is a real choice — but it should be a deliberate one

The audit stamp doesn't prevent you from making the call. It just makes sure you, your team, and any auditors reading the result later can see how the decision was made. Reduced statistical validity isn't a hidden cost — it's a labelled one.

Exports and history

When you export a Frequentist experiment's results to CSV, an early-stopped record gets a footer block at the bottom of the file:

Early stop
"Stopped under Frequentist with reduced validity"
"Reason: Hit p<0.05 at week 2"
"Stopped at: 2026-04-25T10:00:00Z"
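A footer in that shape can be produced with plain CSV-style quoting (wrap each value in double quotes and double any embedded quotes). This is a sketch that reproduces the example above; `early_stop_footer` is a hypothetical name, and omitting the reason line when no reason was given is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def _quote(value: str) -> str:
    # CSV-style quoting: wrap in double quotes and double any embedded quotes.
    return '"' + value.replace('"', '""') + '"'


def early_stop_footer(reason: Optional[str], stopped_at: datetime) -> str:
    lines = ["Early stop", _quote("Stopped under Frequentist with reduced validity")]
    if reason:  # assumption: the reason line is skipped when the optional reason is empty
        lines.append(_quote(f"Reason: {reason}"))
    ts = stopped_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines.append(_quote(f"Stopped at: {ts}"))
    return "\n".join(lines) + "\n"


print(early_stop_footer("Hit p<0.05 at week 2",
                        datetime(2026, 4, 25, 10, 0, tzinfo=timezone.utc)), end="")
```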

You can filter the org-wide audit log by the new Early Stopped action to see every Frequentist override across your org, ordered by date.
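Conceptually the filter is a match on the action plus a sort by timestamp. In this sketch the `AuditEntry` shape and `early_stop_overrides` are hypothetical, and newest-first ordering is an assumption (the page only says "ordered by date").

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class AuditEntry:
    action: str
    resource_id: str
    at: datetime


def early_stop_overrides(entries: Iterable[AuditEntry],
                         newest_first: bool = True) -> list[AuditEntry]:
    # Keep only Frequentist overrides, then order them by when they happened.
    hits = [e for e in entries if e.action == "EARLY_STOPPED"]
    return sorted(hits, key=lambda e: e.at, reverse=newest_first)
```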

What does not trigger peek protection

  • Resuming a paused experiment. Resuming doesn't end the experiment, so there's nothing to gate.
  • Deleting an experiment. Deletion is destructive, but once the data is gone there's nothing left to cherry-pick from the record.
  • Exporting results, viewing the dashboard, or fetching results via the API. Passive viewing is harmless. Peek protection is about the moment you make a stopping decision, not about looking at numbers.
  • Bayesian and Sequential experiments. Bayesian inference is robust to peeks by design; Sequential analysis is built specifically to make stopping decisions safe. Neither surfaces the banner or the modal.
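Taken together with the Layer 2 description, the gating rule reduces to a single predicate: only a stopping decision on a Frequentist experiment that hasn't reached its target opens the modal. The `Action` enum and `needs_peek_protection` below are hypothetical names for illustration.

```python
from enum import Enum


class Action(Enum):
    PAUSE = "pause"
    STOP = "stop"
    DECLARE_WINNER = "declare_winner"
    RESUME = "resume"
    DELETE = "delete"
    EXPORT = "export"
    VIEW = "view"  # dashboard views and API result fetches


# Only stopping decisions are gated; passive or non-ending actions are not.
GATED_ACTIONS = {Action.PAUSE, Action.STOP, Action.DECLARE_WINNER}


def needs_peek_protection(engine: str, action: Action, reached_target: bool) -> bool:
    return engine == "frequentist" and action in GATED_ACTIONS and not reached_target
```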