Learning Data Science: Day 20 - Bayesian Bandit Problem

Haydar Ali Ismail
4 min read · Jan 30, 2017

Photo by John Schnobrich.

In the last story, we talked about Naive Bayes conceptually. Today, we are going to talk about the Bayesian Bandit, one of the famous problems in Bayesian statistics.

Bayesian Bandit

Imagine you are at a casino with a choice of N slot machines, also called bandits. Each slot machine has an unknown probability of giving a reward. The probabilities of these bandits are independent of each other and unknown to us. The objective is to find the slot machine with the highest probability of winning so that we maximize our winnings.

The difficulty is that estimating a machine's probability accurately requires a large number of samples, yet we cannot afford to run many trials on a low-probability machine, because that reduces our winnings. So we need to find the best machine as fast as possible.

Explore or Exploit

The main problem in the Bayesian Bandit is deciding whether to explore other slot machines in search of a better one, or to exploit the current slot machine to maximize our chance of winning more. The solution is to use Bayes' rule.

Solution

P(θ|x) = P(x|θ) * P(θ) / P(x)

  • P(x|θ) is the likelihood: the probability of observing the data x given our current belief about the parameter θ.
  • P(θ) is the prior: our assumption about the parameter before seeing any data.
  • P(θ|x) is the posterior: the updated belief about the parameter after observing the data x.

Usually, we use the Beta distribution. It has two parameters, a and b, and we refer to it as Beta(a, b). Initially, we set both a and b to 1, which results in a uniform distribution; since we have no data yet, we assume nothing. This is called a minimally informative prior. After observing a reward x (1 for a win, 0 for a loss), we update the posterior to Beta(a + x, b + 1 − x).
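As a minimal sketch of this update (the single pull and its outcome below are hypothetical, just to show how the parameters move), the bookkeeping is only two counters per machine:

```python
a, b = 1, 1          # minimally informative prior Beta(1, 1)

x = 1                # hypothetical reward from one pull: 1 = win, 0 = loss

# Conjugate update: Beta(a, b) -> Beta(a + x, b + 1 - x)
a, b = a + x, b + 1 - x

print(a, b)          # Beta(2, 1): the posterior now leans toward winning
```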

In short, here are the steps to solve the Bayesian Bandit problem (a minimal Python sketch follows the list).

For each trial do the following:

  1. Take a random sample from each slot machine's Beta distribution with its current a and b values.
  2. Use the slot machine with the largest sample.
  3. Update that slot machine's distribution with the outcome of that pull.
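Here is a minimal Python sketch of these three steps. The true win probabilities in `true_probs` and the number of trials are made up for illustration; in a real casino they would of course be unknown to us.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true win probabilities of three machines (unknown to the player).
true_probs = [0.2, 0.5, 0.75]

# One Beta(a, b) posterior per machine, starting from the uniform prior Beta(1, 1).
a = np.ones(len(true_probs))
b = np.ones(len(true_probs))

for trial in range(1000):
    # 1. Take a random sample from each machine's current Beta(a, b) posterior.
    samples = rng.beta(a, b)
    # 2. Use the machine with the largest sample.
    k = int(np.argmax(samples))
    x = int(rng.random() < true_probs[k])   # observe a win (1) or a loss (0)
    # 3. Update that machine's posterior with the observed outcome.
    a[k] += x
    b[k] += 1 - x

# The posterior means concentrate around the best machine's true probability,
# and that machine ends up being pulled most often.
print(a / (a + b))
print(a + b - 2)   # number of pulls per machine
```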

Visualization

Here are some visualizations of the way we solve the Bayesian Bandit problem (the visualizations originally came from this post).

Notice that after hundreds of trials, the slot machine with the red line distribution has the highest probability of winning, and its distribution gets sharper as the number of trials increases.

The slot machine with the red line distribution shows early on that it is worth playing over time compared to the other machines. Because of that, we become less inclined to choose the other slot machines, since based on the samples this particular machine seems much more capable than the others.

The easiest way to decide whether to switch machines is to look at the shape of the distribution. If a machine's distribution becomes 'fatter' (wider), we are better off exploring other machines. If it becomes 'sharper' (narrower), that machine has better odds of winning, and naturally we stick with the 'sharper' machine.
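One way to see this numerically (a small illustration with made-up counts): the standard deviation of a Beta posterior shrinks as its counts grow, so a heavily played, frequently winning machine ends up with a 'sharper' curve.

```python
from scipy.stats import beta

# Few observations: wide ('fat') posterior, still worth exploring.
print(beta(2, 2).std())    # ~0.22

# Many observations with a similar win rate: narrow ('sharp') posterior.
print(beta(60, 40).std())  # ~0.05
```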

Wrap Up

Today, we have talked about one of the famous Bayesian problems, the Bayesian Bandit problem. Hopefully, this story helps you understand how the problem behaves and how to build a solution. Thanks for reading.
