Multi-Armed Bandits
Bandit rules let the platform automatically shift traffic toward better-performing variations as reward data accumulates, rather than waiting for a fixed experiment to reach significance. You configure the algorithm, attach a reward metric, and the SDK serves each user the variation most likely to maximise that metric.
What it is
A multi-armed bandit is an adaptive algorithm that balances two competing goals: exploration (trying variations that have not yet been measured enough to judge with confidence) and exploitation (serving the variation that currently looks best). Unlike an A/B test, which keeps a fixed traffic split for the entire experiment duration, a bandit updates its model continuously and re-weights allocations as data arrives.
A vs B supports three algorithms, all operating on the same BanditConfig shape attached to a flag rule:
- Epsilon-greedy: serves the best known variation with probability 1 - explorationRate, and a random variation with probability explorationRate. Simple and predictable.
- Thompson sampling: models each variation's reward as a Beta(α, β) distribution and samples from it at decision time. Naturally explores more when estimates are uncertain and converges quickly once a clear winner emerges.
- UCB1: selects the variation with the highest upper confidence bound, mean + sqrt(2 ln N / n_i), where N is the total number of decisions so far and n_i is the number of times variation i has been served. There is no random exploration: given the same statistics, it always makes the same choice. Suited to deterministic environments.
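The three selection rules above can be sketched as small pure functions. This is a minimal illustration, not the A vs B implementation: the ArmStats shape, the function names, the binary-reward assumption, and the uniform Beta(1, 1) prior are all assumptions made for the example. The random source is injected so that behaviour can be reproduced in tests.

```typescript
// Per-arm statistics. Hypothetical shape for illustration, not the A vs B schema.
interface ArmStats {
  id: string
  successes: number // reward events observed (binary reward assumed)
  pulls: number     // times this arm has been served
}

type Rng = () => number // uniform in [0, 1)

const meanOf = (a: ArmStats): number => (a.pulls === 0 ? 0 : a.successes / a.pulls)

// Scores every arm exactly once (important when the score is random) and
// returns the highest-scoring arm; ties go to the earlier arm.
function argmax(arms: ArmStats[], score: (a: ArmStats) => number): ArmStats {
  if (arms.length === 0) throw new Error('at least one arm is required')
  let best = arms[0]
  let bestScore = score(best)
  for (const arm of arms.slice(1)) {
    const s = score(arm)
    if (s > bestScore) {
      best = arm
      bestScore = s
    }
  }
  return best
}

// Epsilon-greedy: explore uniformly with probability explorationRate, else exploit.
function epsilonGreedy(arms: ArmStats[], explorationRate: number, rng: Rng = Math.random): ArmStats {
  if (rng() < explorationRate) {
    return arms[Math.floor(rng() * arms.length)]
  }
  return argmax(arms, meanOf)
}

// UCB1: mean + sqrt(2 ln N / n_i); arms never served are tried first.
function ucb1(arms: ArmStats[]): ArmStats {
  const total = arms.reduce((n, a) => n + a.pulls, 0)
  return argmax(arms, (a) =>
    a.pulls === 0 ? Infinity : meanOf(a) + Math.sqrt((2 * Math.log(total)) / a.pulls),
  )
}

// Gamma(k, 1) for a positive integer k, as the sum of k Exp(1) draws.
function sampleGammaInt(k: number, rng: Rng): number {
  let sum = 0
  for (let i = 0; i < k; i++) sum -= Math.log(1 - rng()) // 1 - u avoids log(0)
  return sum
}

// Beta(alpha, beta) for integer parameters, built from two Gamma draws.
function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGammaInt(alpha, rng)
  const y = sampleGammaInt(beta, rng)
  return x / (x + y)
}

// Thompson sampling: draw one sample per arm from its posterior, serve the largest.
function thompson(arms: ArmStats[], rng: Rng = Math.random): ArmStats {
  return argmax(arms, (a) => sampleBeta(a.successes + 1, a.pulls - a.successes + 1, rng))
}
```

The integer-parameter Beta sampler costs time proportional to the number of observations, which keeps the sketch short but would not suit production volumes; a general-purpose Gamma sampler would replace it there.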
When to use it
Prefer a bandit rule over a standard A/B test when:
- You have several content variations (e.g., five hero images, three email subject lines) and do not care about statistical inference — you just want the best one served as quickly as possible.
- The cost of serving a losing variation is high (e.g., real money, user churn) and you want to minimise regret rather than wait for a fixed experiment to conclude.
- You are running a recommendation or personalisation use case where contextual attributes (user segment, page context) should influence which variation is optimal.
Stick with a standard A/B test when you need clean statistical significance, want to measure multiple metrics simultaneously, or are doing primary metric analysis for a business decision that requires rigorous inference.
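To make the regret mentioned above concrete, the following sketch computes the expected number of conversions lost by an allocation compared with always serving the best variation. The function name and all figures are illustrative assumptions, not measurements from the platform.

```typescript
// Expected regret: conversions lost versus always serving the best arm.
// trueRates[i] is the (normally unknown) conversion rate of arm i;
// servesPerArm[i] is how many users were served arm i.
function expectedRegret(trueRates: number[], servesPerArm: number[]): number {
  const best = Math.max(...trueRates)
  return trueRates.reduce((sum, rate, i) => sum + (best - rate) * servesPerArm[i], 0)
}

// Illustrative numbers only: two variations converting at 5% and 4%, 10,000 users.
const evenSplit = expectedRegret([0.05, 0.04], [5000, 5000]) // about 50 lost conversions
const shifted = expectedRegret([0.05, 0.04], [9000, 1000])   // about 10 lost conversions
```

A fixed even split pays the full cost of the weaker variation for the whole run, while an allocation that shifts toward the leader pays it only on the traffic still used for exploration.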
How it works
The bandit configuration lives in the flag's rule, and the reward model is stored server-side and refreshed on each datafile update. The SDK reads the current model snapshot and selects an action (variation) at evaluation time:
```typescript
// BanditConfig shape (part of the flag rule in the datafile)
interface BanditConfig {
  algorithm: 'epsilon-greedy' | 'thompson-sampling' | 'ucb1'
  explorationRate?: number // only for epsilon-greedy (0–1)
  rewardMetric: string // event key tracked via client.track()
  actions: Array<{
    id: string
    variationId: string
    contextAttributes?: Record<string, number | string>
  }>
}

// BanditModel snapshot: produced by offline training, read by SDK
interface BanditModel {
  version: string
  algorithm: 'epsilon-greedy' | 'thompson-sampling' | 'ucb1'
  perAction: Record<string, { mean: number; variance: number; samples: number }>
}
```

The decision is logged as a BanditDecisionLogEntry, which extends the standard DecisionLogEntry with the action ID, model version, the probability assigned to the action, and the optimality gap (epsilon at decision time for epsilon-greedy; null for UCB1).
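As a rough illustration of what such a log entry might carry, the sketch below defines a possible shape and shows how the logged probability could be derived for epsilon-greedy. Every field name here is hypothetical, including the base DecisionLogEntry fields, since only the kinds of data are described above. The probability formula assumes the exploration step picks uniformly among all actions, including the current best.

```typescript
// Hypothetical base entry; the real DecisionLogEntry fields are not shown here.
interface DecisionLogEntry {
  flagKey: string
  variationKey: string
  timestamp: number
}

// Hypothetical field names for the four additions described in the text.
interface BanditDecisionLogEntry extends DecisionLogEntry {
  actionId: string
  modelVersion: string
  actionProbability: number
  optimalityGap: number | null // epsilon for epsilon-greedy; null for UCB1
}

// Probability that epsilon-greedy selects a given action when exploration
// is uniform over all actionCount actions.
function epsilonGreedyProbability(isBest: boolean, actionCount: number, explorationRate: number): number {
  const exploreShare = explorationRate / actionCount
  return isBest ? 1 - explorationRate + exploreShare : exploreShare
}

// Example entry with made-up values: 4 actions, explorationRate 0.2, best action served.
const sampleEntry: BanditDecisionLogEntry = {
  flagKey: 'hero-image-bandit',
  variationKey: 'variant-b',
  timestamp: Date.now(),
  actionId: 'action-2',
  modelVersion: 'example-version',
  actionProbability: epsilonGreedyProbability(true, 4, 0.2),
  optimalityGap: 0.2,
}
```

Recording the selection probability alongside each decision is what makes later off-policy analysis possible, because rewards can be re-weighted by how likely each action was to be served.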
Per-SDK usage
```typescript
import { AvsbClient } from '@avsbhq/browser'

const client = new AvsbClient({ sdkKey: 'sdk_production_abc123' })
await client.onReady()

// Evaluation: same API as any flag
const flag = client.getFlag('hero-image-bandit', 'control')
// flag.source === 'bandit', flag.variationKey === 'variant-b' (example)

// Track the reward metric the bandit is optimising on
document.querySelector('.cta')?.addEventListener('click', () => {
  client.track('hero_cta_click')
})
```

```typescript
import { AvsbServer } from '@avsbhq/node'

const server = new AvsbServer({ sdkKey: process.env.AVSB_SDK_KEY })
await server.onReady()

// Server-side evaluation with context
// (userContext is the evaluation context supplied by your application)
const flag = server.getFlag('pricing-plan-bandit', 'starter', userContext)

// Track reward event (e.g., after a plan upgrade)
server.track('plan_upgrade', { value: 49, context: userContext })
```

```python
import os

from avsb import AvsbServer

server = AvsbServer(sdk_key=os.environ["AVSB_SDK_KEY"])
server.wait_for_ready()

# Evaluate: identical API to non-bandit flags
# (context is the evaluation context supplied by your application)
flag = server.get_flag("email-subject-bandit", "control", context)

# Track reward
server.track("email_open", context=context)
```

```java
import java.time.Duration;

import cloud.avsb.AvsbServer;
import cloud.avsb.core.EvalContext;

AvsbServer server = AvsbServer.builder()
    .sdkKey(System.getenv("AVSB_SDK_KEY"))
    .build();
server.blockUntilReady(Duration.ofSeconds(5));

// ctx is an EvalContext supplied by your application
Flag<String> flag = server.getFlag("email-subject-bandit", "control", ctx);

// Track reward metric
server.track("email_open", ctx);
```