Understanding Naive Bayes Classifier

Have you ever wondered how your email provider implements spam filtering? Or how online news channels perform news text classification? Or how companies perform sentiment analysis of their audience on social media?

With a machine learning algorithm called a Naive Bayes classifier, you can do all of these things. 


What is Naive Bayes?

Let's start with a basic introduction to the Bayes theorem, named after the 18th-century statistician Thomas Bayes. The Naive Bayes classifier works on the principle of conditional probability, as given by the Bayes theorem.

Let us go through some simple probability concepts that we will use. Consider the example of tossing two coins. If we toss two coins and list all the possible outcomes, we get the sample space: {HH, HT, TH, TT}

We usually denote probability as P. Some of the probabilities for this experiment are as follows:

  • The probability of getting two heads = 1/4 
  • The probability of at least one tail = 3/4 
  • The probability of the second coin being heads given the first coin is tails = 1/2 
  • The probability of getting two heads given the first coin is a head = 1/2
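
The four probabilities above can be checked by counting outcomes in the sample space. The following is a minimal sketch, assuming both coins are fair so that all four outcomes are equally likely; the helper name `prob` is only illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin tosses: HH, HT, TH, TT (assumed equally likely).
sample_space = ["".join(p) for p in product("HT", repeat=2)]

def prob(event, given=lambda outcome: True):
    """P(event | given), computed by counting equally likely outcomes."""
    conditioned = [o for o in sample_space if given(o)]
    favourable = [o for o in conditioned if event(o)]
    return Fraction(len(favourable), len(conditioned))

# First character = first coin, second character = second coin.
p_two_heads = prob(lambda o: o == "HH")
p_at_least_one_tail = prob(lambda o: "T" in o)
p_second_head_given_first_tail = prob(
    lambda o: o[1] == "H", given=lambda o: o[0] == "T"
)
p_two_heads_given_first_head = prob(
    lambda o: o == "HH", given=lambda o: o[0] == "H"
)

print(p_two_heads, p_at_least_one_tail,
      p_second_head_given_first_tail, p_two_heads_given_first_head)
```

Conditioning simply shrinks the sample space to the outcomes where the given event holds, then counts within that smaller set.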

The Bayes theorem gives us the conditional probability of event A, given that event B has occurred. In this case, the first coin toss will be B and the second coin toss A. This can be confusing because the labels are reversed relative to the order of the tosses: B is the earlier event and A is the later one.


According to Bayes theorem:

P(A|B) = [ P(B|A) * P(A) ] / P(B)

Let us apply Bayes theorem to our coin example. Here, we have two coins, and the first two probabilities of getting two heads and at least one tail are computed directly from the sample space.

Now in this sample space, let A be the event that the second coin is heads, and B be the event that the first coin is tails. Again, we reversed it because we want to know what the second event is going to be. 

We're going to focus on A, and we write that out as a probability of A given B:

P(A|B)

= [ P(B|A) * P(A) ] / P(B)

= [ P(First coin being tails given the second coin is heads) * P(Second coin being heads) ] / P(First coin being tails)

= [ (1/2) * (1/2) ] / (1/2)

= 1/2 = 0.5
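
As a sanity check on this derivation, the sketch below computes P(A|B) in two ways: through the Bayes theorem, and directly as P(A and B) / P(B). It assumes fair coins; the function names `is_a` and `is_b` are illustrative only.

```python
from fractions import Fraction

# Equally likely outcomes; first character = first coin, second = second coin.
outcomes = ["HH", "HT", "TH", "TT"]

def p(event):
    """Probability of an event by counting equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def is_a(outcome):
    return outcome[1] == "H"  # A: second coin is heads

def is_b(outcome):
    return outcome[0] == "T"  # B: first coin is tails

p_a = p(is_a)
p_b = p(is_b)
p_a_and_b = p(lambda o: is_a(o) and is_b(o))

# P(B|A) from the definition of conditional probability.
p_b_given_a = p_a_and_b / p_a

# Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b_bayes = p_b_given_a * p_a / p_b

# Direct computation for comparison: P(A|B) = P(A and B) / P(B)
p_a_given_b_direct = p_a_and_b / p_b

print(p_a_given_b_bayes, p_a_given_b_direct)
```

Both routes give the same value, which is expected because the two coin tosses are independent of each other.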

Bayes theorem calculates the conditional probability of the occurrence of an event based on prior knowledge of conditions that might be related to the event. 

As with any other machine learning tool, it's important to understand where Naive Bayes fits in the hierarchy: it is a supervised learning algorithm used for classification. 


Where is Naive Bayes Used?

You can use Naive Bayes for the following things:

Face Recognition

As a classifier, it is used to identify faces or facial features such as the nose, mouth, and eyes.

Weather Prediction 

It can be used to predict if the weather will be good or bad.

Medical Diagnosis 

Doctors can diagnose patients by using the information that the classifier provides. Healthcare professionals can use Naive Bayes to indicate if a patient is at high risk for certain diseases and conditions, such as heart disease, cancer, and other ailments. 

News Classification 

With the help of a Naive Bayes classifier, Google News recognizes whether a news story is political news, world news, and so on. 

As the Naive Bayes Classifier has so many applications, it’s worth learning more about how it works.

Understanding Naive Bayes Classifier 

Based on the Bayes theorem, the Naive Bayes Classifier gives the conditional probability of an event A given event B.

Let us use the following demo to understand the concept of a Naive Bayes classifier:

Shopping Example 

Problem statement: To predict whether a person will purchase a product for a specific combination of day, discount, and free delivery using a Naive Bayes classifier. 


The day variable takes the values weekday, weekend, and holiday. For any given day, check whether there is a discount and whether there is free delivery. Based on this information, we can predict if a customer would buy the product or not. 

The sample data set has 30 rows, 15 of which are shown below:

[Image: sample data]

Based on the dataset containing the three input types—day, discount, and free delivery— the frequency table for each attribute is populated.


For Bayes theorem, let the event ‘buy’ be A and the independent variables (discount, free delivery, day) be B.


Let us calculate the likelihood for one of the variables, "day", which takes the values weekday, weekend, and holiday. 


We get a total of:

  • 11 weekdays
  • 8 weekends
  • 11 holidays

The total number of days adds up to 30. Of the 24 purchases:

  • 9 were made on weekdays
  • 7 were made on weekends
  • 8 were made on holidays
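
The counts above are enough to rebuild the frequency and likelihood tables for the day variable. The following is a rough sketch rather than the original tables: the "No Buy" counts are not listed in the text and are derived here by subtracting purchases from the day totals, and the variable names are illustrative.

```python
from fractions import Fraction

# Purchase counts per day type come from the text. The "No Buy" counts are
# an inference: day total minus purchases (11 - 9, 8 - 7, 11 - 8).
day_counts = {
    "Weekday": {"Buy": 9, "No Buy": 11 - 9},
    "Weekend": {"Buy": 7, "No Buy": 8 - 7},
    "Holiday": {"Buy": 8, "No Buy": 11 - 8},
}

# Total number of observations and number of observations per class.
total = sum(sum(row.values()) for row in day_counts.values())
class_totals = {
    label: sum(row[label] for row in day_counts.values())
    for label in ("Buy", "No Buy")
}

# P(day): share of all observations that fall on each day type.
p_day = {
    day: Fraction(sum(row.values()), total) for day, row in day_counts.items()
}

# P(day | class): share of each class's observations on that day type.
likelihood = {
    day: {label: Fraction(row[label], class_totals[label]) for label in row}
    for day, row in day_counts.items()
}

for day in day_counts:
    print(day, p_day[day], likelihood[day]["Buy"], likelihood[day]["No Buy"])
```

The derived totals (24 purchases and 6 non-purchases out of 30 observations) agree with the figures used in the calculations that follow.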

Based on the above likelihood table, let us calculate some conditional probabilities:

P(B) = P(Weekday) = 11/30 = 0.37

P(A) = P(No Buy) = 6/30 = 0.2

P(B | A) = P(Weekday | No Buy) = 2/6 = 0.33

P(A | B) = P(No Buy | Weekday)

= P(Weekday | No Buy) * P(No Buy) / P(Weekday)

= (0.33 * 0.2) / 0.37

= 0.18
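
The same calculation can be done with exact fractions, which avoids the rounding in the intermediate steps. This is a small sketch using only the three probabilities above; the variable names are illustrative.

```python
from fractions import Fraction

p_weekday = Fraction(11, 30)             # P(B)
p_no_buy = Fraction(6, 30)               # P(A)
p_weekday_given_no_buy = Fraction(2, 6)  # P(B | A)

# Bayes theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_no_buy_given_weekday = p_weekday_given_no_buy * p_no_buy / p_weekday

# Only two outcomes are possible, so the complement is P(Buy | Weekday).
p_buy_given_weekday = 1 - p_no_buy_given_weekday

print(p_no_buy_given_weekday, round(float(p_no_buy_given_weekday), 2))
print(p_buy_given_weekday, round(float(p_buy_given_weekday), 2))
```

The exact result is 2/11, which rounds to the 0.18 obtained above.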

The probability of a weekday = 11/30 or 0.37

It means that out of the 30 people who came into the store across weekdays, weekends, and holidays, 11 came on a weekday.

The probability of not making a purchase = 6/30 or 0.2. Overall, there's a 20 percent chance that a customer does not make a purchase, regardless of the day.

The probability of a weekday given that no purchase occurred = 2/6 or 0.33.

Finally, the probability of no purchase given a weekday = 0.18 or 18 percent. As P(No Buy | Weekday) is less than 0.5, the customer will most likely buy the product on a weekday. Next, let’s see how the table and conditional probabilities work in the Naive Bayes Classifier.

We have the frequency tables of all three independent variables, and from them we can construct the likelihood tables for all three variables. 

See the likelihood tables for the three variables below:



The likelihood tables can be used to calculate whether a customer will purchase a product for a specific combination of day, discount, and free delivery. Consider the following combination of factors as B:

  • Day = Holiday
  • Discount = Yes
  • Free Delivery = Yes 

Let us find the probability of them not purchasing based on the conditions above. 

A = No Purchase

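Applying Bayes Theorem, and making the "naive" assumption that the three factors are independent of one another given A, we get:

P(A | B) = [ P(Day = Holiday | A) * P(Discount = Yes | A) * P(Free Delivery = Yes | A) * P(A) ] / P(B), with A = No Purchase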

Similarly, let us find the probability of them purchasing a product under the conditions above.  

Here, A = Buy

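Applying Bayes Theorem, and treating the three factors as independent of one another given A, we get:

P(A | B) = [ P(Day = Holiday | A) * P(Discount = Yes | A) * P(Free Delivery = Yes | A) * P(A) ] / P(B), with A = Buy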

From the two calculations above, we find that:

Probability of purchase = 0.986 

Probability of no purchase = 0.178

These are the conditional probability scores for purchase and no purchase under the conditions above. 

The two values do not sum to 1, so next, normalize them to get the likelihood of each event:

Sum of probabilities = 0.986 + 0.178 = 1.164

Likelihood of purchase = 0.986 / 1.164 = 84.71 percent

Likelihood of no purchase = 0.178 / 1.164 = 15.29 percent

Result: As 84.71 percent is greater than 15.29 percent, we can conclude that an average customer will buy on a holiday with a discount and free delivery.
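
The normalization and the final decision can be sketched in a few lines. The two scores are taken directly from the calculations above; the dictionary layout and variable names are illustrative.

```python
# Unnormalized scores from the two calculations above.
scores = {"Buy": 0.986, "No Buy": 0.178}

# Divide each score by the sum so that the results add up to 1.
total = sum(scores.values())
normalized = {label: score / total for label, score in scores.items()}

# The predicted class is the one with the largest normalized value.
prediction = max(normalized, key=normalized.get)

for label, value in normalized.items():
    print(f"{label}: {value * 100:.2f} percent")
print("Prediction:", prediction)
```

Normalizing does not change which class has the larger score, so the prediction is the same before and after; it only makes the two numbers easier to read as shares of a whole.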

After understanding how Naive Bayes Classifier works, we can explore its benefits.


Advantages of Naive Bayes Classifier 

The following are some of the benefits of the Naive Bayes classifier: 

  • It is simple and easy to implement
  • It doesn’t require as much training data as many other classifiers
  • It handles both continuous and discrete data
  • It is highly scalable with the number of predictors and data points
  • It is fast and can be used to make real-time predictions
  • It is not sensitive to irrelevant features 

Use Case - Text Classification 

Text classification is one of the most popular applications of a Naive Bayes classifier.

Problem statement: To perform text classification of news headlines and classify news into different topics for a news website.
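
To make the problem statement more concrete, here is a minimal pure-Python sketch of a word-count (multinomial) Naive Bayes classifier for headlines. It is not the implementation used in this article: the four training headlines are hypothetical examples invented for illustration, and the add-one (Laplace) smoothing and the use of log probabilities are assumptions chosen to keep the example numerically stable.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy headlines, invented purely for illustration.
training_data = [
    ("election results announced by parliament", "politics"),
    ("minister proposes new tax bill", "politics"),
    ("team wins championship final", "sports"),
    ("striker scores twice in cup final", "sports"),
]

def train(examples):
    """Count words per topic and headlines per topic."""
    word_counts = defaultdict(Counter)
    topic_counts = Counter()
    for text, topic in examples:
        topic_counts[topic] += 1
        word_counts[topic].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, topic_counts, vocabulary

def predict(text, word_counts, topic_counts, vocabulary):
    """Return the topic with the highest log posterior score."""
    total_docs = sum(topic_counts.values())
    scores = {}
    for topic in topic_counts:
        # Start from the log prior, log P(topic).
        score = math.log(topic_counts[topic] / total_docs)
        total_words = sum(word_counts[topic].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen (word, topic) pairs above zero.
            count = word_counts[topic][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[topic] = score
    return max(scores, key=scores.get)

model = train(training_data)
print(predict("parliament debates tax bill", *model))
print(predict("cup final tonight", *model))
```

Each word plays the role that day, discount, and free delivery played in the shopping example: its per-topic likelihood is multiplied into the score (here added, because logarithms are used), and the topic with the highest score wins.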



About the Author

Mayank Banoula

Mayank is a Research Analyst at Simplilearn. He is proficient in machine learning and artificial intelligence with Python.
