Health Guardrails

Health guardrails are automatic checks that A vs B runs on your experiment data to alert you to potential problems. They appear on the Results page with green, yellow, or red indicators. A red guardrail does not mean your experiment is broken — it means you should investigate before trusting the results.

The three guardrails

Sample Ratio Mismatch (SRM)

Sample Ratio Mismatch occurs when the actual distribution of visitors across your variations does not match the expected distribution.

For example, if you configured a 50/50 split between control and variation, you would expect roughly equal numbers of visitors in each group over time. If the control has 10,000 visitors and the variation has only 7,000, that is a mismatch — something in your setup may be causing one variation to receive more or less traffic than intended.

A vs B flags an SRM when the actual split deviates from the expected split by more than 5% in either direction. On a 50/50 experiment, a split of 50.1%/49.9% is fine; a split of 58%/42% is an SRM.
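
The threshold check described above can be sketched in a few lines of Python. This is only an illustration, not A vs B's actual implementation: the function name `srm_check` and its parameters are hypothetical, and it assumes the 5% tolerance is measured in percentage points of total traffic.

```python
def srm_check(visitors, expected_share, tolerance=0.05):
    """Return (flagged, actual_share) for an experiment.

    visitors: dict of variation name -> visitor count
    expected_share: dict of variation name -> expected fraction of traffic
    tolerance: ASSUMPTION, read as percentage points (0.05 = 5 points)
    """
    total = sum(visitors.values())
    if total == 0:
        raise ValueError("no visitors recorded yet")

    # Actual fraction of traffic each variation received.
    actual_share = {name: count / total for name, count in visitors.items()}

    # Flag if any variation is further from its expected share than allowed.
    flagged = any(
        abs(actual_share[name] - expected_share[name]) > tolerance
        for name in visitors
    )
    return flagged, actual_share


fifty_fifty = {"control": 0.5, "variation": 0.5}
print(srm_check({"control": 5010, "variation": 4990}, fifty_fifty)[0])   # False
print(srm_check({"control": 10000, "variation": 7000}, fifty_fifty)[0])  # True
```

In the second call the control holds about 58.8% of traffic, which is more than 5 points away from the expected 50%, so the split is flagged.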

Common causes of SRM

Bot traffic — bots may not execute JavaScript consistently and can be bucketed into one variation more than another.
Caching issues — if a CDN or server-side cache is serving cached versions of a page, some visitors may not see the experiment variation they were assigned to.
JavaScript errors — if a variation's code throws an error and prevents the snippet from completing, those visitors may not be recorded.
Experiment code modifying the URL — if variation code changes the URL and causes the URL targeting rules to no longer match, the experiment may stop running for those visitors.
Multiple snippets — having two copies of the A vs B snippet on a page can cause double-bucketing or missed bucketing.

A Sample Ratio Mismatch means the data is biased in some way. The results may show an apparent winner, but that winner could be an artifact of the mismatch rather than a genuine effect of the variation. Always investigate and resolve the root cause before drawing conclusions.

Statistical Confidence

This guardrail checks whether any variation has a winning probability above 80%. Below 80%, the experiment is considered to have low confidence: the data does not yet point strongly enough in any direction to provide a meaningful signal.

Green — at least one variation is above 80% winning probability. The experiment is producing signal.
Yellow — the highest winning probability is between 60% and 80%. The experiment is trending but needs more data.
Red — no variation exceeds 60% winning probability. Results are essentially noise at this point.
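
As a rough illustration, the three levels above could be expressed as the following Python sketch. The function name `confidence_status` is hypothetical, and the treatment of values that land exactly on 60% or 80% is an assumption, since the thresholds above do not say which side those fall on.

```python
def confidence_status(winning_probabilities):
    """Map the highest winning probability to a guardrail colour.

    winning_probabilities: dict of variation name -> probability (0.0 to 1.0)
    """
    best = max(winning_probabilities.values())
    if best > 0.80:
        return "green"   # at least one variation is above 80%
    if best > 0.60:
        return "yellow"  # ASSUMPTION: exactly 80% counts as yellow
    return "red"         # ASSUMPTION: exactly 60% counts as red


print(confidence_status({"control": 0.12, "variation": 0.88}))  # green
print(confidence_status({"control": 0.30, "variation": 0.70}))  # yellow
print(confidence_status({"control": 0.45, "variation": 0.55}))  # red
```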

Low confidence is normal early in an experiment. It does not mean something is wrong — it means you should keep running.

Data Quality

This guardrail checks whether any variation has fewer than 30 visitors. With very small sample sizes, conversion rates are extremely volatile — a single conversion can shift the rate by several percentage points, and the winning probability can swing wildly.
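
To make the volatility concrete: with a fixed number of visitors, one additional conversion moves the conversion rate by 1 divided by the visitor count. The short Python sketch below works through that arithmetic for a few sample sizes; the helper name `shift_per_conversion` is hypothetical and used only for illustration.

```python
def shift_per_conversion(visitors):
    """Percentage points the conversion rate moves when one more visitor converts."""
    return 100 / visitors


for n in (10, 30, 300, 3000):
    points = shift_per_conversion(n)
    print(f"{n:>5} visitors: one conversion moves the rate by {points:.2f} points")
```

At 30 visitors a single conversion is worth about 3.33 percentage points, while at 3,000 visitors it is worth about 0.03.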

Green — all variations have at least 30 visitors. The data is stable enough to read.
Yellow — some variations are close to 30 visitors but have not reached it yet.
Red — at least one variation has fewer than 30 visitors. The results are not yet meaningful.

The 30-visitor threshold is a bare minimum: below it, results are essentially noise. For trustworthy results you typically want hundreds of visitors per variation, not just 30. Use the Days Remaining card as a guide for when you will have enough data.

Green, yellow, red — what to do

All green — everything looks healthy. Check the winning probability and credible intervals to understand the results.

Yellow — the experiment is in a transitional state. Keep running and check back in a few days.

Red SRM — pause the experiment and investigate the cause. Do not ship a result based on mismatched data.

Red confidence or data quality — keep running. These will resolve naturally as traffic accumulates. There is nothing to fix unless traffic is unusually low.
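
The guidance in this section can be summarized as a small decision function. The Python sketch below is illustrative only: `recommended_action` is a hypothetical name, and the order of precedence (a red SRM outranks everything else) is an assumption drawn from the advice above rather than a documented rule.

```python
def recommended_action(srm, confidence, data_quality):
    """Suggest a next step. Each argument is 'green', 'yellow', or 'red'."""
    statuses = (srm, confidence, data_quality)

    # ASSUMPTION: a red SRM takes priority because the data itself is suspect.
    if srm == "red":
        return "Pause the experiment and investigate the cause of the SRM."
    if "red" in (confidence, data_quality):
        return "Keep running; this resolves as traffic accumulates."
    if "yellow" in statuses:
        return "Keep running and check back in a few days."
    return "Healthy: review the winning probability and credible intervals."


print(recommended_action("green", "red", "green"))
# Keep running; this resolves as traffic accumulates.
```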