Learning Data Science: Day 19 - Naive Bayes

Haydar Ali Ismail
4 min read · Jan 20, 2017
Image by Aaron Burden.

In the last two stories, we talked about Bayes’ Theorem and its applications. Today we are going to learn more about the Naive Bayes classifier.

Prior, Posterior, Likelihood, and Evidence

To learn more about Naive Bayes, I think it’s better if we first understand Bayes’ rule in a more technical way, so that we won’t run into any problems as we go deeper.

So far we have used Bayes’ rule without really understanding the role of each variable. The simplified Bayes’ rule is quoted below.
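
Pr(A|X) = (Pr(X|A) * Pr(A)) / Pr(X)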

Prior - Pr(A)

The key point of the prior is that it is what we know prior to (before) our observation. In our previous case about spam, the prior is the probability that a given email is spam (or not), before we look at its contents.

Posterior - Pr(A|X)

The posterior is the counterpart of the prior: it is what we know post (after) our observation. In the spam case, the posterior is the probability that a certain email is spam given the words (X) it contains.

Evidence - Pr(X)

The main idea behind the evidence is that we already know which emails are spam and which are not, so we can compute how often the observed words (X) occur across all emails, regardless of class. This probability of the observation itself having occurred is called the evidence.

Likelihood - Pr(X|A)

The likelihood is the probability that a certain observation will happen given a class. Some people use this term interchangeably with probability, which is not correct. In the spam case, it is the probability that certain words (X), which we assume to be typical of spam, show up in an email given that the email is spam.
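
To see how the four terms fit together, here is a minimal Python sketch. The counts (100 emails, 20 of them spam, and so on) are invented purely for illustration and are not taken from the earlier stories.

# Toy spam example; all counts below are made up for illustration.
total_emails = 100
spam_emails = 20                  # emails labeled as spam
emails_with_word = 25             # emails containing the word "free"
spam_with_word = 15               # spam emails containing "free"

prior = spam_emails / total_emails              # Pr(A)   = Pr(spam)
likelihood = spam_with_word / spam_emails       # Pr(X|A) = Pr("free" | spam)
evidence = emails_with_word / total_emails      # Pr(X)   = Pr("free")

# Bayes' rule: Pr(A|X) = Pr(X|A) * Pr(A) / Pr(X)
posterior = likelihood * prior / evidence       # Pr(spam | "free")
print(posterior)                                # 0.6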

Naive Bayes

To be honest, what is Naive Bayes anyway? It is a classification technique in which the features are assumed to be independent of each other; that is, each feature is treated as unrelated to every other feature within the same class. For example, suppose we build a Naive Bayes classifier for fruits. An apple would be red, round-shaped, and about 3 cm in radius. Even though these features look correlated to us, Naive Bayes ignores the correlations and assumes all features are independent.
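
In equation form, for a class A and features X1, X2, ..., Xn, the naive independence assumption means the likelihood factorizes into a product of per-feature likelihoods:

P(X1, X2, ..., Xn | A) = P(X1|A) * P(X2|A) * ... * P(Xn|A)

so the posterior P(A | X1, ..., Xn) is proportional to P(A) * P(X1|A) * ... * P(Xn|A).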

How it Works

Let’s say we have data on whether a football game is played or not based on the weather. First, convert the dataset into a frequency table and a likelihood table.

Dataset, Frequency Table, and Likelihood Table

Let’s say we want to know the probability that a game will be played given that the day in question is sunny.

If we convert the problem into an equation,

P(Yes|Sunny) = (P(Sunny|Yes) * P(Yes)) / P(Sunny)

We have,

P(Sunny|Yes) = 3/9 = 0.33

P(Yes) = 9/14 = 0.64

P(Sunny) = 5/14 = 0.36

So what about the P(Yes|Sunny)?

P(Yes|Sunny) = (0.33 * 0.64) / 0.36 = 0.60

Since this probability is greater than 0.5, it means the game will most probably be played when it is sunny.
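
As a quick sanity check, the same calculation can be done in a few lines of Python using the counts from the tables above (3 sunny days among the 9 played games, and 5 sunny days and 9 played games among the 14 records):

# Counts taken from the frequency table above.
sunny_and_yes = 3    # sunny days on which the game was played
yes_total = 9        # days on which the game was played
sunny_total = 5      # sunny days overall
days_total = 14      # all records

p_sunny_given_yes = sunny_and_yes / yes_total   # P(Sunny|Yes) ~ 0.33
p_yes = yes_total / days_total                  # P(Yes)       ~ 0.64
p_sunny = sunny_total / days_total              # P(Sunny)     ~ 0.36

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(p_yes_given_sunny)                        # 0.6

Working with the exact fractions gives exactly 3/5 = 0.60, while multiplying the rounded values (0.33 * 0.64 / 0.36) gives roughly the same result.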

Scikit-Learn Example

Scikit-Learn already provides Naive Bayes classifiers out of the box. Let’s see the example from the Scikit-Learn documentation.

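Below is a minimal sketch along the lines of the Gaussian Naive Bayes example in the Scikit-Learn documentation: it fits GaussianNB on the bundled iris dataset and counts the mislabeled points. The dataset and numbers here are illustrative rather than a copy of the original snippet.

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

# Load the iris dataset: 150 samples, 4 continuous features, 3 classes.
iris = datasets.load_iris()

# Fit a Gaussian Naive Bayes model and predict on the same data.
gnb = GaussianNB()
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)

# Count how many points the classifier mislabeled.
mislabeled = (iris.target != y_pred).sum()
print("Number of mislabeled points out of a total %d points : %d"
      % (iris.data.shape[0], mislabeled))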

There are various types of Naive Bayes available in Scikit-Learn. In the example, we use the Gaussian one, which is suitable for datasets whose features follow a normal distribution.

Final Words

Today we have discussed Naive Bayes and its implementation using Scikit-Learn. To improve the quality of these stories, the next Learning Data Science story will be available later, on the 30th of January, because the upcoming material is harder to understand than what I can learn in a single day. Thanks for reading and have a nice weekend!

