This method has several good points:
- It can reasonably handle more than two options at once, e.g. A, B, C, D, E, F, G, …
- New options can be added or removed at any time.
But the most enticing part is that you can set it and forget it.
The strategy that has been shown to win out time after time in practical problems is the epsilon-greedy method. For each lever, we keep track of the number of pulls and the total reward received from it. 10% of the time, we choose a lever at random (exploration). The other 90% of the time, we choose the lever with the highest expected reward so far, that is, its total reward divided by its number of pulls (exploitation).
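The epsilon-greedy rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the class name `EpsilonGreedy`, its method names, and the `simulate` helper are made up for this example, and treating an untried lever as if its expected reward were 1.0 (so that every lever gets tried at least once) is a design choice the text does not prescribe.

```python
import random


class EpsilonGreedy:
    """Hypothetical helper: epsilon-greedy choice over named options."""

    def __init__(self, options, epsilon=0.1, rng=None):
        self.epsilon = epsilon              # share of choices made at random
        self.rng = rng or random.Random()
        self.pulls = {o: 0 for o in options}      # times each lever was pulled
        self.rewards = {o: 0.0 for o in options}  # total reward per lever

    def expected_reward(self, option):
        pulls = self.pulls[option]
        # Assumption: an untried lever is scored optimistically at 1.0
        # so that it gets pulled at least once.
        return self.rewards[option] / pulls if pulls else 1.0

    def choose(self):
        options = list(self.pulls)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(options)            # explore
        return max(options, key=self.expected_reward)  # exploit

    def update(self, option, reward):
        self.pulls[option] += 1
        self.rewards[option] += reward

    def add_option(self, option):
        # Options can be added at any time; existing counts are kept.
        self.pulls.setdefault(option, 0)
        self.rewards.setdefault(option, 0.0)

    def remove_option(self, option):
        self.pulls.pop(option, None)
        self.rewards.pop(option, None)


def simulate(true_rates, visitors=10_000, seed=0):
    """Simulate visitors who convert with the given (made-up) rates."""
    rng = random.Random(seed)
    bandit = EpsilonGreedy(true_rates, epsilon=0.1, rng=rng)
    for _ in range(visitors):
        option = bandit.choose()
        converted = rng.random() < true_rates[option]
        bandit.update(option, 1.0 if converted else 0.0)
    return bandit
```

Calling something like `simulate({"A": 0.04, "B": 0.05, "C": 0.06})` and then inspecting `bandit.pulls` shows how the traffic was split between the options. The conversion rates there are invented for illustration only.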
Update: a reaction by Visual WebSite Optimizer: Why multi-armed bandit algorithm is not “better” than A/B testing.
There’s a clear tradeoff between average conversion rate and the time it takes to detect statistical significance. Moreover, it is also clear that any advantages of multi-armed bandit algorithms vanish if conversion rate of different versions is similar. […]
So, comparing A/B testing and multi-armed bandit algorithms head to head is wrong because they are clearly meant for different purposes. A/B testing is meant for strict experiments where focus is on statistical significance, whereas multi-armed bandit algorithms are meant for continuous optimization where focus is on maintaining higher average conversion rate.