So far, we have been talking about multi-armed bandits where the rewards are stochastic: for each arm, rewards are drawn independently and identically from a fixed but unknown distribution. Today, we'll look at a different setup: adversarial rewards. Instead of a distribution for each arm, we assume there is a hidden sequence $r_{i,1}, \ldots, r_{i,T}$ for each arm $i$. We observe $r_{i,t}$ if we pull arm $i$ at ...
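
To make the feedback model concrete, here is a minimal Python sketch of such an environment. The class name `AdversarialBandit`, the method `pull`, and the example reward table are hypothetical, introduced only for illustration. The sketch assumes that pulling arm i in round t reveals only that arm's entry for round t, and it uses 0-indexed rounds.

```python
class AdversarialBandit:
    """Hypothetical environment: each arm has a hidden reward sequence,
    rather than a reward distribution."""

    def __init__(self, rewards):
        # rewards[i][t] is the hidden reward of arm i in round t (0-indexed).
        self._rewards = rewards
        self.num_arms = len(rewards)
        self.horizon = len(rewards[0])
        self.t = 0  # current round

    def pull(self, arm):
        # Assumption: only the pulled arm's reward for the current round
        # is revealed; the other arms' entries stay hidden.
        if self.t >= self.horizon:
            raise RuntimeError("horizon exhausted")
        reward = self._rewards[arm][self.t]
        self.t += 1
        return reward


# Two arms over T = 4 rounds; the sequences are arbitrary, not sampled.
hidden = [
    [1.0, 0.0, 1.0, 0.0],  # arm 0
    [0.0, 1.0, 0.0, 1.0],  # arm 1
]
env = AdversarialBandit(hidden)
observed = [env.pull(arm) for arm in [0, 0, 1, 1]]
print(observed)
```

Because the table is arbitrary, repeated pulls of the same arm need not look like samples from any distribution, which is the key difference from the stochastic setting.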