Bomber

Current context

Calm

Game1

Windcalm

Turn0

You0 — 0AI

Reward signal

Bomb within 60px:

✓ alpha + 1 (+ neighbors)

✗ beta + 1

Direct hit: alpha + 3

3 HP per bomber

Why Thompson Sampling works

Thompson Sampling continuously learns and relearns. When conditions shift, it re-explores automatically.

In this game, wind changes, you move, and craters reshape terrain. Each is a contextual parameter the AI adapts to.

No rules. No retraining. Just Bayesian updating from sparse signals.

Attempt log

AI hasn't fired yet. Move and fire first.

AI posterior beliefs

context: wind calm

hover a bar for details

5%power →100%

Wind shifting

Fixed

Wind stays fixed. The AI only needs one context. Turn this on to see contextual learning in the Knowledge Surface below.

Play fair

OFF

The AI only sees success/failure signals. It never sees terrain, positions, or trajectories. Toggle this to play with the same blind constraint. You'll only see hit/miss distance after each shot. This mirrors real-world decision-making: sparse feedback, no full picture.

Penalty strength

Miss: +1.0β

Off-screen: +2.5β

Higher = faster learning but more risk of over-correction. The AI may abandon a good arm too quickly after a single unlucky miss.