Winning Probability

Winning probability is the single most important number on the Results page. It tells you how confident the statistical model is that a variation is genuinely better than the control — not just better by random chance. Here is how to interpret it and when to act on it.

What it means

When the Results page shows a variation with a winning probability of 95%, it means: based on all the data collected so far, there is a 95% probability that this variation has a genuinely higher true conversion rate than the control.

There is still a 5% chance the observed difference is a fluke — that the control is actually just as good or better — but the evidence strongly points toward the variation winning. A 95% probability is a high bar, and most business decisions can be made confidently at this level.

Conversely, a probability of 60% means: there is a 60% chance the variation is better, but also a 40% chance it is not. That is barely better than a coin flip — you need more data.

The 95% threshold

By default, A vs B uses 95% as the threshold for calling a winner. When a variation crosses this threshold, it is marked with a Significant badge in the results table and the Winning Probability summary card highlights the leading variation.

You can adjust this threshold in your project settings. Some teams prefer a lower threshold (such as 90%) when moving fast and the cost of a wrong decision is low. Others use a higher threshold (such as 99%) for changes with high business risk or irreversible consequences.

The Significant badge

The Significant badge appears next to a variation in the results table when its probability to beat control has exceeded the configured threshold. This badge is your at-a-glance signal that there is enough evidence to declare a winner.

The badge does not automatically stop the experiment. You must decide to conclude the experiment yourself — either by stopping it through the experiment management interface or by letting it continue to gather more data.

When to call a winner

The winning probability crossing 95% is a necessary condition for calling a winner, but not the only one. Before making a final decision, also verify:

The experiment has run long enough. Results collected in the first few days of an experiment can be misleading due to novelty effects (visitors clicking on something simply because it is different) or day-of-week bias. Try to run experiments for at least one full week, and ideally two, so you cover a complete business cycle.
No health guardrails are red. Check the health guardrails section for any warnings about sample ratio mismatch or data quality issues. A significant result on corrupted data is not a reliable result.
Secondary metrics look sensible. If your primary metric improved but a related secondary metric got worse, investigate before shipping. For example, if purchases went up but so did refunds, that warrants a closer look.
The effect size is meaningful. A 0.1% improvement in conversion rate with 99% winning probability might not be worth the engineering effort to ship. Consider whether the magnitude of the lift justifies implementing the change.

Why you should not stop too early

Even with a Bayesian model — which handles peeking better than traditional frequentist tests — stopping an experiment the moment it first crosses 95% can inflate your false positive rate. Early in an experiment, random variation can push a probability above 95% briefly before settling back down as more data arrives.

A practical rule: once the winning probability crosses 95%, wait at least 24–48 more hours to confirm it stays above the threshold. If it drops back below, the experiment needs more data. If it remains above, you can be more confident in the result.

Running an experiment for at least one full business cycle (7–14 days) is the single most impactful thing you can do to get reliable results, regardless of what the winning probability says. Traffic patterns vary by day of the week, and an experiment run only on weekdays may look very different from one that runs over a full week.

If your experiment concludes without a winner — the winning probability never reached the threshold — that is a valid result too. It means the two variations performed similarly. Sometimes knowing that a change does not improve (or hurt) conversion is exactly the information you need to prioritize other work.