Have you ever wondered how your email provider implements spam filtering? Or how online news channels perform news text classification? Or how companies perform sentiment analysis of their audience on social media?

With a machine learning algorithm called a Naive Bayes classifier, you can do all of these things.


This tutorial covers the following topics:

- What is Naive Bayes?
- Naive Bayes and machine learning
- Where is Naive Bayes used?
- Why do we need Naive Bayes?
- Understanding a Naive Bayes classifier
- Demo: text classification using Naive Bayes

Let's start with a basic introduction to the Bayes theorem, named after Thomas Bayes from the 1700s. The Naive Bayes classifier works on the principle of conditional probability, as given by the Bayes theorem.

Let us go through some simple probability concepts that we will use. Consider the example of tossing two coins. Listing every possible outcome gives the sample space: {HH, HT, TH, TT}

We usually denote probability as P. Some of the probabilities for this experiment are as follows:

- The probability of getting two heads = 1/4
- The probability of at least one tail = 3/4
- The probability of the second coin being head given the first coin is tail = 1/2
- The probability of getting two heads given the first coin is a head = 1/2
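The four values above can be checked by enumerating the sample space. The following is a minimal sketch, assuming two fair coins so that all four outcomes are equally likely; the helper name `prob` is introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two coins: HH, HT, TH, TT.
sample_space = ["".join(p) for p in product("HT", repeat=2)]

def prob(event, given=lambda outcome: True):
    """P(event | given), counted over the equally likely outcomes."""
    conditioned = [o for o in sample_space if given(o)]
    favourable = [o for o in conditioned if event(o)]
    return Fraction(len(favourable), len(conditioned))

p_two_heads = prob(lambda o: o == "HH")
p_at_least_one_tail = prob(lambda o: "T" in o)
p_second_head_given_first_tail = prob(
    lambda o: o[1] == "H", given=lambda o: o[0] == "T"
)
p_two_heads_given_first_head = prob(
    lambda o: o == "HH", given=lambda o: o[0] == "H"
)

print(p_two_heads, p_at_least_one_tail,
      p_second_head_given_first_tail, p_two_heads_given_first_head)
# prints: 1/4 3/4 1/2 1/2
```

Conditioning simply shrinks the sample space to the outcomes where the given event holds, which is why the last two values are computed over two outcomes rather than four.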

The Bayes theorem gives us the conditional probability of event A, given that event B has occurred. In this case, the first coin toss will be B and the second coin toss A. The lettering can look reversed, because the toss that happens first is labeled B and the toss we want to predict is labeled A.

According to Bayes theorem:

P(A|B) = [ P(B|A) * P(A) ] / P(B)

Let us apply Bayes theorem to our coin example. Here, we have two coins, and the first two probabilities of getting two heads and at least one tail are computed directly from the sample space.

Now in this sample space, let A be the event that the second coin is head, and B be the event that the first coin is tails. Again, we reversed it because we want to know what the second event is going to be.

We're going to focus on A, and we write that out as a probability of A given B:

Probability = P(A|B)

= [ P(B|A) * P(A) ] / P(B)

= [ P(First coin being tail given the second coin is the head) * P(Second coin being head) ] / P(First coin being tail)

= [ (1/2) * (1/2) ] / (1/2)

= 1/2 = 0.5
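The same arithmetic can be written as a small function. This is only a sketch; the function name `bayes` and its parameter names are chosen here for illustration.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# A: the second coin is a head. B: the first coin is a tail.
half = Fraction(1, 2)
p_a_given_b = bayes(p_b_given_a=half, p_a=half, p_b=half)

print(p_a_given_b)         # prints: 1/2
print(float(p_a_given_b))  # prints: 0.5
```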

Bayes theorem calculates the conditional probability of the occurrence of an event based on prior knowledge of conditions that might be related to the event.

As with any other machine learning tool, it's important to understand where Naive Bayes fits in the hierarchy.

Machine learning is commonly divided into two broad categories:

- Supervised learning
- Unsupervised learning

Supervised learning falls into two categories:

- Classification
- Regression

The Naive Bayes algorithm falls under classification.

You can use Naive Bayes for the following things:

- As a classifier, it can be used to identify faces or facial features such as the nose, mouth, and eyes.
- It can be used to predict whether the weather will be good or bad.
- Doctors can diagnose patients by using the information that the classifier provides. Healthcare professionals can use Naive Bayes to indicate whether a patient is at high risk for certain diseases and conditions, such as heart disease and cancer.
- A news service such as Google News can use a Naive Bayes classifier to recognize whether an article is political news, world news, and so on.

As the Naive Bayes Classifier has so many applications, it’s worth learning more about how it works.

Based on the Bayes theorem, the Naive Bayes Classifier gives the conditional probability of an event A given event B.

Let us use the following demo to understand the concept of a Naive Bayes classifier:

Problem statement: To predict whether a person will purchase a product given a specific combination of day, discount, and free delivery using a Naive Bayes classifier.

The day attribute takes one of three values: weekday, weekend, or holiday. For any given day, check whether there is a discount and whether there is free delivery. Based on this information, we can predict whether a customer would buy the product or not.

Consider a small sample data set of 30 rows, 15 of which are shown below:

Based on the dataset containing the three input types (day, discount, and free delivery), the frequency table for each attribute is populated.

For Bayes theorem, let the event ‘buy’ be A and the independent variables (discount, free delivery, day) be B.

Let us calculate the likelihood for the “day” variable, which takes the values weekday, weekend, and holiday.

We get a total of:

- 11 weekdays
- 8 weekends
- 11 holidays

The total number of days adds up to 30 days.

Of the 24 purchases:

- 9 were made on weekdays
- 7 were made on weekends
- 8 were made on holidays

Based on the above likelihood table, let us calculate some conditional probabilities:

P(B) = P(Weekday) = 11/30 = 0.37

P(A) = P(No Buy) = 6/30 = 0.2

P(B | A) = P(Weekday | No Buy) = 2/6 = 0.33

P(A | B) = P(No Buy | Weekday)

= P(Weekday | No Buy) * P(No Buy) / P(Weekday)

= (0.33 * 0.2) / 0.37

= 0.18
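The calculation can be reproduced from the raw counts. The sketch below uses the purchase counts and day totals given in the text; the no-purchase counts per day are not stated directly and are derived here as total minus purchases. The dictionary name `day_counts` is introduced for illustration.

```python
from fractions import Fraction

# Frequency table for the "day" attribute: day -> (buy, no_buy).
# Buy counts and day totals come from the text; no_buy = total - buy.
day_counts = {"weekday": (9, 2), "weekend": (7, 1), "holiday": (8, 3)}

total = sum(b + n for b, n in day_counts.values())      # 30 rows
total_no_buy = sum(n for _, n in day_counts.values())   # 6 rows

buy, no_buy = day_counts["weekday"]
p_weekday = Fraction(buy + no_buy, total)                # 11/30
p_no_buy = Fraction(total_no_buy, total)                 # 6/30
p_weekday_given_no_buy = Fraction(no_buy, total_no_buy)  # 2/6

# Bayes theorem: P(No Buy | Weekday)
p_no_buy_given_weekday = p_weekday_given_no_buy * p_no_buy / p_weekday

print(p_no_buy_given_weekday, round(float(p_no_buy_given_weekday), 2))
# prints: 2/11 0.18
```

Working with exact fractions gives 2/11, which matches a direct count: 2 of the 11 weekday rows ended without a purchase.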

P(Weekday) = 11/30 or 0.37. It means that out of the 30 people who came into the store across weekdays, weekends, and holidays, 11 came on a weekday.

P(No Buy) = 6/30 or 0.2. There's a 20 percent chance that a customer is not going to make a purchase when the day is not taken into account.

P(Weekday | No Buy) = 2/6 or 0.33. Of the six visits that ended without a purchase, two were on a weekday.

Finally, the probability of no purchase given that it is a weekday, P(No Buy | Weekday), is 0.18 or 18 percent. As this is less than 0.5, the customer will most likely buy the product on a weekday. Next, let’s see how the table and conditional probabilities work in the Naive Bayes Classifier.

We have the frequency tables of all three independent variables, and from them we can construct the likelihood tables for all three.

See the likelihood tables for the three variables below:

The likelihood tables can be used to calculate whether a customer will purchase a product for a specific combination of day, discount, and free delivery. Consider the following combination of factors, where B equals:

- Day = Holiday
- Discount = Yes
- Free Delivery = Yes

Let us find the probability of them not purchasing based on the conditions above.

A = No Purchase

Applying Bayes Theorem, we get P(A | B) as shown:

Similarly, let us find the probability of them purchasing a product under the conditions above.

Here, A = Buy

Applying Bayes Theorem, we get P(A | B) as shown:
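Both calculations share the same form. As a sketch, assuming the three attributes are conditionally independent given the class A (the "naive" assumption that gives the classifier its name), Bayes theorem for this combination can be written as:

```latex
P(A \mid B) =
  \frac{P(\text{Discount} = \text{Yes} \mid A)\,
        P(\text{Free Delivery} = \text{Yes} \mid A)\,
        P(\text{Day} = \text{Holiday} \mid A)\,
        P(A)}
       {P(B)}
```

Here A is either Buy or No Purchase, and each factor in the numerator is read from the likelihood table of the corresponding attribute. The denominator P(B) is the same for both classes, so it does not change which class scores higher.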

From the two calculations above, we find that:

Probability of purchase = 0.986

Probability of no purchase = 0.178

These two scores do not sum to 1, so they cannot be read directly as probabilities of the two outcomes.

Next, normalize them to get the likelihood of each event:

Sum of probabilities = 0.986 + 0.178 = 1.164

Likelihood of purchase = 0.986 / 1.164 = 84.71 percent

Likelihood of no purchase = 0.178 / 1.164 = 15.29 percent

Result: As 84.71 percent is greater than 15.29 percent, we can conclude that an average customer is likely to buy on a holiday with a discount and free delivery.
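The normalization step is a division of each score by their sum. A minimal sketch, taking the two scores from the worked example as given (the variable names are illustrative):

```python
# Unnormalized scores taken from the worked example above.
scores = {"purchase": 0.986, "no purchase": 0.178}

total = sum(scores.values())
normalized = {label: score / total for label, score in scores.items()}

for label, p in normalized.items():
    print(f"{label}: {p:.2%}")
# prints:
# purchase: 84.71%
# no purchase: 15.29%

# The predicted class is the one with the larger normalized score.
prediction = max(normalized, key=normalized.get)
print("Predicted class:", prediction)  # prints: Predicted class: purchase
```

Normalizing does not change which class wins; it only rescales the two scores so that they add up to 100 percent.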

After understanding how Naive Bayes Classifier works, we can explore its benefits.

The following are some of the benefits of the Naive Bayes classifier:

- It is simple and easy to implement
- It doesn’t require as much training data as many other classifiers
- It handles both continuous and discrete data
- It is highly scalable with the number of predictors and data points
- It is fast and can be used to make real-time predictions
- It is not sensitive to irrelevant features


Text classification is one of the most popular applications of a Naive Bayes classifier.

Problem statement: To perform text classification of news headlines and classify news into different topics for a news website.
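To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier written from scratch. It is not the demo's actual code: the four headlines and their labels are hypothetical toy data, the function names `tokenize`, `fit`, and `predict` are introduced here, and add-one (Laplace) smoothing is an assumption made so that unseen word and class pairs do not produce a zero probability.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy headlines; a real demo would use a much larger labelled corpus.
train = [
    ("election results announced by parliament", "politics"),
    ("senate votes on new tax bill", "politics"),
    ("team wins championship final", "sports"),
    ("striker scores twice in cup match", "sports"),
]

def tokenize(text):
    """Split a headline into lowercase words."""
    return text.lower().split()

def fit(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict(text, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / n_docs)  # log P(class)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # log P(word | class) with add-one (Laplace) smoothing
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = fit(train)
print(predict("parliament votes on election bill", model))  # prints: politics
print(predict("cup final match", model))                    # prints: sports
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow when a headline contains many words; the class ranking is unchanged.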

Watch the video to see how to use a Naive Bayes classifier for text classification -


