Reading Your Results

The Results page gives you a complete picture of how your experiment is performing. It is organized into a summary section at the top, a detailed variation performance table in the middle, and metric breakdowns below. Here is a walk-through of every element and what it is telling you.

Analysis label

Near the top of every Results page you'll see a small pill showing how the experiment was analyzed, for example Bayesian · Auto (CUPED applied) or Bayesian · Off. Hover over the pill for a plain-English explanation of the label and, when variance reduction was applied, how much noise it removed from the numbers.

When variance reduction is applied to a specific metric, that metric's lift in the Secondary Metrics list also has a hover tooltip showing the CUPED adjustment coefficient θ and the percentage of variance removed for that metric specifically. See Variance Reduction (CUPED) for a deeper explanation.
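
For readers who want to see where θ and the variance-reduction percentage come from, here is a minimal Python sketch of the textbook CUPED adjustment. It assumes one pre-experiment covariate per visitor and the standard estimator θ = cov(Y, X) / var(X); the function name `cuped_adjust` and the simulated data are illustrative only, not A vs B's implementation.

```python
import numpy as np


def cuped_adjust(y, pre):
    """Textbook CUPED: adjust y using a pre-experiment covariate.

    y   -- metric value per visitor during the experiment
    pre -- the same visitor's pre-experiment value (hypothetical covariate)
    Returns the adjusted values, theta, and the fraction of variance removed.
    """
    y = np.asarray(y, dtype=float)
    pre = np.asarray(pre, dtype=float)
    # theta is the slope that minimizes Var(y - theta * pre)
    theta = np.cov(y, pre, ddof=1)[0, 1] / np.var(pre, ddof=1)
    # Subtracting a mean-zero term leaves the metric's average unchanged
    y_adj = y - theta * (pre - pre.mean())
    reduction = 1.0 - np.var(y_adj, ddof=1) / np.var(y, ddof=1)
    return y_adj, theta, reduction


# Simulated visitors whose metric is strongly correlated with past behavior
rng = np.random.default_rng(0)
pre = rng.normal(10.0, 2.0, size=5000)
y = 0.8 * pre + rng.normal(0.0, 1.0, size=5000)
y_adj, theta, reduction = cuped_adjust(y, pre)
print(f"theta = {theta:.2f}, variance reduction = {reduction:.0%}")
```

With this choice of θ, the fraction of variance removed equals the squared correlation between the metric and the covariate, which is why covariates that track the metric closely remove the most noise.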

Summary cards

Four summary cards appear at the top of the Results page, each giving you a high-level answer to a key question about your experiment.

Winning Probability

This card shows the probability that the leading variation is genuinely better than the control. A value of 95% or higher is the threshold for calling a winner. Values below 80% are generally considered inconclusive — you need more data. See Winning Probability for a full explanation.
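
The exact model behind this card is not described on this page. As an illustration only, here is one common Bayesian way to compute such a probability for a conversion metric: give each arm a Beta posterior and count how often a draw from the variant's posterior beats a draw from the control's. The uniform Beta(1, 1) prior, the function name `prob_beat_control`, and the example counts are assumptions.

```python
import numpy as np


def prob_beat_control(conv_c, n_c, conv_v, n_v, draws=200_000, seed=0):
    """Monte Carlo estimate of P(variant conversion rate > control rate).

    Assumes independent Beta(1, 1) priors, so each posterior is
    Beta(1 + conversions, 1 + visitors - conversions).
    """
    rng = np.random.default_rng(seed)
    control = rng.beta(1 + conv_c, 1 + n_c - conv_c, size=draws)
    variant = rng.beta(1 + conv_v, 1 + n_v - conv_v, size=draws)
    # Fraction of paired posterior draws in which the variant comes out ahead
    return float(np.mean(variant > control))


# Hypothetical counts: 3.0% vs 3.6% conversion on 10,000 visitors per arm
p = prob_beat_control(300, 10_000, 360, 10_000)
print(f"P(variant beats control) = {p:.1%}")
```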

Observed Lift

Observed Lift is the relative improvement in conversion rate between the best-performing variation and the control. For example, if the control converts at 3% and a variation converts at 3.6%, the observed lift is +20%. This tells you the magnitude of the difference, not just whether one is better.
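
The arithmetic behind the example above, as a small Python helper (the function name is illustrative). The same relative-change formula applies to the Improvement vs Control column in the variation table.

```python
def relative_lift(control_rate: float, variant_rate: float) -> float:
    """Relative change of the variant over the control, as a fraction."""
    if control_rate == 0:
        raise ValueError("relative lift is undefined when the control rate is 0")
    return (variant_rate - control_rate) / control_rate


print(f"{relative_lift(0.03, 0.036):+.0%}")  # +20%
```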

Revenue Impact

Revenue Impact appears when your experiment includes a metric with revenue data. It estimates how much additional revenue the winning variation would generate per month compared to the control, based on your current traffic levels and the observed revenue-per-visitor difference. This is an estimate — it assumes current traffic levels continue and the observed lift holds.
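
A minimal sketch of the estimate described above, assuming it is simply the observed revenue-per-visitor difference multiplied by a monthly visitor count. How A vs B turns current traffic into a monthly count is not stated here, so the sketch takes it as an input; the function name and numbers are hypothetical.

```python
def monthly_revenue_impact(rpv_control: float, rpv_variant: float,
                           monthly_visitors: int) -> float:
    """Extra revenue per month if every visitor saw the variant.

    Assumes traffic stays at monthly_visitors and the observed
    revenue-per-visitor (RPV) difference holds, the same caveats as the card.
    """
    return (rpv_variant - rpv_control) * monthly_visitors


# Hypothetical inputs: $1.42 vs $1.58 per visitor, 50,000 visitors a month
print(f"${monthly_revenue_impact(1.42, 1.58, 50_000):,.0f}")  # $8,000
```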

Days Remaining

Days Remaining estimates how much longer the experiment needs to run before its result reaches the target winning probability threshold. The estimate is based on the current traffic volume, the observed effect size, and that threshold. If the experiment has already crossed the threshold, this card shows that the experiment is ready to call.
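
The estimator behind this card is not documented on this page. The sketch below is a plug-in heuristic under stated assumptions, not A vs B's method: it assumes the observed conversion rates stay exactly as they are and uses a normal approximation, P(win) ≈ Φ(Δ / SE(n)), to solve for the visitors per arm at which the winning probability would reach the threshold. It shows how the inputs interact: a larger effect or more daily traffic shortens the estimate, and a higher threshold lengthens it. All names and numbers are illustrative.

```python
import math
from statistics import NormalDist


def days_remaining(p_control, p_variant, visitors_per_arm, daily_visitors_per_arm,
                   threshold=0.95):
    """Plug-in estimate of days until the winning probability reaches threshold.

    Assumes the observed rates hold exactly and that, under a normal
    approximation, P(win) ~= Phi(delta / SE(n)) with
    SE(n) = sqrt((pc * (1 - pc) + pv * (1 - pv)) / n) for n visitors per arm.
    """
    delta = abs(p_variant - p_control)  # effect size in either direction
    if delta == 0:
        return math.inf  # no observed difference: the threshold is never reached
    z = NormalDist().inv_cdf(threshold)
    spread = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    needed_per_arm = z ** 2 * spread / delta ** 2
    remaining = max(0.0, needed_per_arm - visitors_per_arm)
    return math.ceil(remaining / daily_visitors_per_arm)  # 0 means ready to call


# 3.0% vs 3.6% with 4,000 visitors per arm so far and 500 new per arm per day
print(days_remaining(0.03, 0.036, 4000, 500))  # 2
```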

Variation performance table

The variation table shows one row per variation, including the control. Each row contains the following columns:

Visitors

The total number of unique visitors enrolled in this variation. A vs B counts each visitor once, regardless of how many sessions they had or how many pages they visited during the experiment.

Conversions

The number of visitors in this variation who converted on the primary metric. Each visitor can convert at most once, so repeat conversions by the same visitor are not counted again.

Conversion Rate

Conversions divided by Visitors, expressed as a percentage. This is the core performance number for each variation.
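
To make the counting rules concrete, here is a small Python sketch that builds the Visitors, Conversions, and Conversion Rate columns from raw records. The record shapes, the function name `variation_counts`, and the rule that a visitor's first recorded variation wins are assumptions for illustration; the point is that repeat sessions and repeat conversions by the same visitor are each counted once.

```python
from collections import defaultdict


def variation_counts(exposures, conversions):
    """Return {variation: (visitors, conversions, conversion_rate)}.

    exposures   -- (visitor_id, variation) pairs; a visitor may appear once per
                   session or page view
    conversions -- visitor_ids that fired the primary metric, repeats allowed
    """
    assigned = {}
    for visitor_id, variation in exposures:
        # Each visitor belongs to one variation; keep the first record seen
        assigned.setdefault(visitor_id, variation)

    # A set collapses repeat conversions by the same visitor into one
    converted = set(conversions).intersection(assigned)

    visitors = defaultdict(int)
    converters = defaultdict(int)
    for visitor_id, variation in assigned.items():
        visitors[variation] += 1
        if visitor_id in converted:
            converters[variation] += 1

    return {v: (n, converters[v], converters[v] / n) for v, n in visitors.items()}


exposures = [("u1", "control"), ("u1", "control"), ("u2", "control"),
             ("u3", "B"), ("u4", "B")]
conversions = ["u1", "u1", "u3", "u4"]
print(variation_counts(exposures, conversions))
# {'control': (2, 1, 0.5), 'B': (2, 2, 1.0)}
```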

Improvement vs Control

The relative change in conversion rate compared to the control variation. A positive number means the variation is converting better than the control. A negative number means it is converting worse. The control row always shows 0% since it is the baseline.

Credible Interval

The credible interval (sometimes shown as a confidence interval range) represents the range within which the true conversion rate of this variation most likely falls. A narrower interval means more certainty about the result; a wider interval means you need more data. For a variation's improvement over control, the difference is not yet conclusive if its credible interval includes zero.
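
As an illustration of how a credible interval for the improvement can be read, here is a sketch using a common Bayesian setup for conversion rates: independent Beta(1, 1) priors, posterior draws for each arm, and the equal-tailed interval of the simulated relative lift. The prior, the 95% level, and the function name are assumptions, not taken from A vs B's documentation.

```python
import numpy as np


def lift_credible_interval(conv_c, n_c, conv_v, n_v, level=0.95,
                           draws=200_000, seed=0):
    """Equal-tailed credible interval for relative lift (variant vs control).

    Assumes independent Beta(1, 1) priors, so each posterior is
    Beta(1 + conversions, 1 + visitors - conversions).
    """
    rng = np.random.default_rng(seed)
    control = rng.beta(1 + conv_c, 1 + n_c - conv_c, size=draws)
    variant = rng.beta(1 + conv_v, 1 + n_v - conv_v, size=draws)
    lift = (variant - control) / control
    tail = (1 - level) / 2
    low, high = np.quantile(lift, [tail, 1 - tail])
    return float(low), float(high)


low, high = lift_credible_interval(300, 10_000, 360, 10_000)
conclusive = low > 0 or high < 0  # the interval excludes zero
print(f"lift interval {low:+.1%} to {high:+.1%}, conclusive: {conclusive}")
```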

Probability to Beat Control

The probability that this specific variation is genuinely better than the control. This is the key number for decision making. Values above 95% indicate strong evidence that the variation is better.

Significant badge
Variations that have exceeded the 95% probability threshold are marked with a Significant badge in the results table. This is your signal that you have enough evidence to make a decision.

Metric sections

Below the variation table, you will find collapsible sections for each metric attached to the experiment — first the primary metric, then each secondary metric in order. Each section shows the same variation table breakdown for that specific metric.

Expanding secondary metric sections gives you supporting context. For example, your primary metric might be purchases, while a secondary metric is add-to-cart clicks. If purchases went up but add-to-cart also went up proportionally, that supports the result. If purchases went up but add-to-cart went down, something unexpected may have changed in the user journey.

The health guardrails section appears above or alongside the results table when there are potential data quality issues. Always check for health warnings before making a decision on your experiment. See Health Guardrails for details.

Measure badge

Every metric row, primary and secondary, shows a small measure pill next to the metric name. The pill describes the measure in plain English (Unique conversions per visitor, Total events, Value per visitor, Rate, Composite (weighted)) so you can see the analysis kind without opening the metric configuration. Percentile measures show the specific percentile in the pill, for example p95 of value, and winsorized rows append a small · Capped suffix to signal that outlier capping is active.
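
The · Capped suffix indicates winsorization: per-visitor values above a cap are clipped to the cap before analysis, so a handful of extreme values cannot dominate the mean. A minimal Python sketch, assuming the cap is an upper percentile of the observed values; the percentile A vs B uses is not given on this page, so the default below is hypothetical.

```python
import numpy as np


def winsorize_upper(values, percentile=99.0):
    """Clip values above the given percentile to that percentile's value."""
    values = np.asarray(values, dtype=float)
    cap = np.percentile(values, percentile)
    return np.minimum(values, cap)


# Hypothetical per-visitor revenue: 99 ordinary visitors and one extreme outlier
revenue = np.array([20.0] * 99 + [5000.0])
capped = winsorize_upper(revenue, percentile=99.0)
print(revenue.mean(), capped.mean())
```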

Reading ratio metric rows

Ratio metric rows show per-visitor ratios, for example revenue per visitor. Point estimates render in the metric's units (e.g., $1.42 / visitor for control, $1.58 / visitor for variant). Lift is reported as a percentage with an interval: a confidence interval computed with the delta method under the frequentist engine, or a credible interval from a normal-approximation posterior under the Bayesian engine. A small badge on the row reads Ratio (delta method) so you can see at a glance which variance estimator is in play.

If the row's diagnostic shows a non-zero dropped count, those are visitors with a zero denominator. They were excluded from the variant because they had no exposure to the ratio (e.g., revenue per visitor where no qualifying purchase event was recorded). The count is surfaced so you can sanity-check the exclusion. See Ratio metrics for the underlying math.
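
A minimal sketch of the frequentist side of this row, assuming the ratio is a ratio of means (the mean per-visitor numerator over the mean per-visitor denominator) and that the two variations are independent. It drops zero-denominator visitors and reports how many were dropped, applies the delta-method variance to each ratio, and then applies the delta method again to the quotient of the two ratios to get an interval for the lift. Function names, the 95% level, and the simulated data are illustrative; the Bayesian engine's normal-approximation posterior is not shown.

```python
import numpy as np
from statistics import NormalDist


def ratio_with_variance(num, den):
    """Ratio of means, its delta-method variance, and the count of dropped visitors."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    keep = den != 0  # zero-denominator visitors are excluded, but counted
    dropped = int(np.count_nonzero(~keep))
    num, den = num[keep], den[keep]
    n = num.size
    mx, my = den.mean(), num.mean()
    vx, vy = den.var(ddof=1), num.var(ddof=1)
    cxy = np.cov(num, den, ddof=1)[0, 1]
    ratio = my / mx
    # First-order Taylor (delta method) variance of mean(num) / mean(den)
    var = (vy / mx**2 - 2 * my * cxy / mx**3 + my**2 * vx / mx**4) / n
    return ratio, var, dropped


def ratio_lift_ci(num_c, den_c, num_v, den_v, level=0.95):
    """Relative lift of the variant's ratio over control's, with a normal CI."""
    r_c, var_c, drop_c = ratio_with_variance(num_c, den_c)
    r_v, var_v, drop_v = ratio_with_variance(num_v, den_v)
    rel = r_v / r_c
    # Delta method again, for the quotient of two independent ratios
    se = rel * np.sqrt(var_v / r_v**2 + var_c / r_c**2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lift = rel - 1.0
    return lift, (lift - z * se, lift + z * se), (drop_c, drop_v)


# Simulated data: per-visitor denominators (some zero) and matching numerators
rng = np.random.default_rng(1)
den_c = rng.poisson(2.0, size=5000)
num_c = den_c * rng.gamma(2.0, 5.0, size=5000)
den_v = rng.poisson(2.0, size=5000)
num_v = den_v * rng.gamma(2.0, 5.5, size=5000)
lift, (ci_low, ci_high), dropped = ratio_lift_ci(num_c, den_c, num_v, den_v)
print(f"lift {lift:+.1%}, CI {ci_low:+.1%} to {ci_high:+.1%}, dropped {dropped}")
```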

Reading quantile metric rows

Quantile metric rows show the percentile in question (e.g., p90 of page-load time) along with control and variant point estimates in the metric's units. Lift is a percentage; the CI comes from a bias-corrected percentile bootstrap (1000 resamples by default). The row badge reads Quantile (bootstrap CI).
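
A minimal Python sketch of a bias-corrected (BC) percentile bootstrap for the relative lift in a quantile, assuming visitors are resampled independently within each variation. The bias correction z0 measures how far the bootstrap distribution sits from the observed estimate and shifts the percentile cut points accordingly; there is no acceleration term, so this is BC rather than BCa. It uses exact quantiles on the raw samples rather than t-digest, and the function name and simulated page-load data are illustrative.

```python
import numpy as np
from statistics import NormalDist


def quantile_lift_bc_ci(control, variant, q=0.9, resamples=1000, level=0.95, seed=0):
    """Relative lift in the q-th quantile with a bias-corrected bootstrap CI."""
    rng = np.random.default_rng(seed)
    control = np.asarray(control, dtype=float)
    variant = np.asarray(variant, dtype=float)

    def lift(c, v):
        return np.quantile(v, q) / np.quantile(c, q) - 1.0

    observed = lift(control, variant)
    boots = np.empty(resamples)
    for i in range(resamples):
        # Resample visitors with replacement, independently within each arm
        c = rng.choice(control, size=control.size, replace=True)
        v = rng.choice(variant, size=variant.size, replace=True)
        boots[i] = lift(c, v)

    norm = NormalDist()
    # z0: normal quantile of the share of bootstrap lifts below the estimate,
    # clamped away from 0 and 1 so inv_cdf stays finite
    share = float(np.mean(boots < observed))
    share = min(max(share, 1 / (resamples + 1)), resamples / (resamples + 1))
    z0 = norm.inv_cdf(share)
    alpha = 1.0 - level
    lo_p = norm.cdf(2 * z0 + norm.inv_cdf(alpha / 2))
    hi_p = norm.cdf(2 * z0 + norm.inv_cdf(1 - alpha / 2))
    low, high = np.quantile(boots, [lo_p, hi_p])
    return float(observed), (float(low), float(high))


# Simulated page-load times in seconds; the variant is roughly 10% faster
rng = np.random.default_rng(3)
load_c = rng.lognormal(0.0, 0.5, size=2000)
load_v = rng.lognormal(-0.1, 0.5, size=2000)
observed, (low, high) = quantile_lift_bc_ci(load_c, load_v, q=0.9)
print(f"p90 lift {observed:+.1%}, BC 95% CI {low:+.1%} to {high:+.1%}")
```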

A Show both CIs toggle on quantile rows displays the bootstrap CI alongside a normal-approximation CI, which is useful for checking whether the parametric and non-parametric intervals agree at extreme percentiles. For percentiles above p95, the row also shows the gap between ClickHouse's quantileTDigest (used for the point estimate) and quantileExact as a transparency indicator. A large gap warns that the t-digest approximation may be affecting the estimate, which matters most at small sample sizes.

For very large experiments (N > 100k per variant), the bootstrap falls back to a t-digest-sampled subset, and the row is marked approximate (sampled bootstrap). From the row you can request a high-precision recompute (5000 resamples), which runs asynchronously. See Quantile metrics for details.

Reading composite metric rows

Composite metric rows show the weighted composite point estimate per variation, lift as a percentage, and a CI from the combined weighted variance. That variance includes the pairwise covariance between components, so correlated components are not double-counted. The row badge reads Composite (weighted).

Click the Decomposition disclosure on a composite row to expand a per-component breakdown: each component metric's own lift, its CI, and its weighted contribution to the composite. The decomposition explains why a composite moved. If the composite is up but one component is flat or negative, the Decomposition makes that obvious instead of hiding it inside the headline number.
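
A minimal sketch of the composite variance and the decomposition, assuming the composite is a linear weighted sum of per-visitor component values and the two variations are independent. Within each variation the variance of the weighted mean is wᵀΣw / n, where Σ is the component covariance matrix, so covariance terms are included rather than assumed away. Each component's weighted difference is its contribution, and the contributions add up to the composite difference. Whether A vs B weights raw values, standardized values, or lifts is not stated here; the function name and data are illustrative.

```python
import numpy as np


def composite_summary(components_c, components_v, weights):
    """Composite difference, its standard error, and per-component contributions.

    components_c / components_v -- arrays of shape (visitors, k): one column per
                                   component metric, one row per visitor
    weights                     -- length-k weights for the linear composite
    """
    w = np.asarray(weights, dtype=float)
    xc = np.asarray(components_c, dtype=float)
    xv = np.asarray(components_v, dtype=float)

    def mean_and_var(x):
        # Var of the weighted mean is w^T Cov w / n; the off-diagonal covariance
        # terms keep correlated components from counting as independent evidence
        cov = np.cov(x, rowvar=False, ddof=1)
        return x.mean(axis=0), (w @ cov @ w) / x.shape[0]

    mean_c, var_c = mean_and_var(xc)
    mean_v, var_v = mean_and_var(xv)
    contributions = w * (mean_v - mean_c)  # weighted per-component differences
    diff = contributions.sum()             # equals w . mean_v - w . mean_c
    se = np.sqrt(var_c + var_v)
    return diff, se, contributions


# Two positively correlated components; only the first one moves
rng = np.random.default_rng(2)
cov = [[1.0, 0.6], [0.6, 1.0]]
xc = rng.multivariate_normal([1.0, 2.0], cov, size=4000)
xv = rng.multivariate_normal([1.1, 2.0], cov, size=4000)
diff, se, contributions = composite_summary(xc, xv, weights=[0.7, 0.3])
print(f"composite diff {diff:+.3f} (SE {se:.3f}); contributions {np.round(contributions, 3)}")
```

Dividing the composite difference by the control's composite mean gives the percentage lift shown on the row.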

If a component metric is paused or archived mid-experiment, the composite row flags component unavailable and you are prompted to amend the analysis plan (via the sealed-plan amendment flow) before results recompute. The composite is never silently re-weighted onto the remaining components. See Composite metrics for the underlying math.