The Best Guide On How To Implement Decision Tree In Python

Machine learning algorithms are used in almost every sector of business to solve critical problems and build intelligent systems and processes. Supervised machine learning algorithms, specifically, are used for solving classification and regression problems. In this article, we’ll cover one of the most widely used supervised learning algorithms: decision trees in Python.

We’ll cover the following topics in this article:

  • What is a decision tree?
  • Advantages of using decision trees
  • Decision tree applications
  • Important terms used in decision trees
  • How does a decision tree algorithm work?
  • Building a decision tree in Python

What is a Decision Tree?

A decision tree is a tree-based supervised learning method used to predict the output of a target variable. Supervised learning uses labeled data (data with known output variables) to train regression and classification models; the known outputs act as a supervisor during training. A decision tree learns simple decision rules from the data features and applies those rules to predict the target. Decision trees in Python can be used to solve both classification and regression problems, and they are frequently used to estimate the odds of an outcome.

The following is an example of a simple decision tree used to classify different animals based on their features. We will be using the color and height of the animals as input features.


Fig: Decision tree to classify animals

Advantages of Using Decision Trees

  • Decision trees are simple to understand, interpret, and visualize
  • They can effectively handle both numerical and categorical data
  • They can determine the worst, best, and expected values for several scenarios
  • Decision trees require little data preparation and generally no feature scaling or normalization
  • They perform reasonably well even if the data somewhat violate the assumptions the model makes


Decision Tree Applications

  1. A decision tree can be used to determine whether an applicant is likely to default on a loan.
  2. It can be used to determine the odds of an individual developing a specific disease.
  3. It can help ecommerce companies predict whether a consumer is likely to purchase a specific product.
  4. Decision trees can also be used to predict which customers are likely to churn.

Important Terms Used in Decision Trees

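1. Entropy: Entropy is the measure of uncertainty or randomness in a data set. It guides how a decision tree splits the data: a good split produces subsets with lower entropy.

It is calculated using the following formula:

Entropy(Y) = - Σ p(i) * log2(p(i)), where the sum runs over all classes i and p(i) is the proportion of samples that belong to class i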

2. Information Gain: The information gain measures the decrease in entropy after the data set is split.

It is calculated as follows:

IG(Y, X) = Entropy(Y) - Entropy(Y | X)
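To make these two definitions concrete, here is a minimal sketch in plain Python. The function names and the tiny label lists are our own illustration, not part of any library; Entropy(Y | X) is computed as the size-weighted average entropy of the groups a split produces.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy(Y) minus the size-weighted entropy of the groups after a split."""
    total = len(labels)
    conditional = sum(len(group) / total * entropy(group) for group in groups)
    return entropy(labels) - conditional


# Hypothetical labels: an evenly mixed node that a split separates perfectly.
parent = ["cat", "cat", "dog", "dog"]
split = [["cat", "cat"], ["dog", "dog"]]

print("entropy before split:", entropy(parent))
print("information gain:", information_gain(parent, split))
```

An evenly mixed two-class node has an entropy of one bit, and a split that leaves only pure groups recovers all of it as information gain.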

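3. Gini Index: The Gini Index is an impurity measure used to choose the variable on which to split a node. It measures how often a randomly chosen sample from the node would be incorrectly labeled if it were labeled at random according to the class distribution in that node.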
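As a rough illustration, the Gini Index of a node can be computed as one minus the sum of the squared class proportions. The function name and sample labels below are assumptions made for this sketch.

```python
from collections import Counter


def gini_impurity(labels):
    """Chance of mislabeling a random sample from `labels` when the label
    is drawn at random according to the class frequencies in `labels`."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


# Hypothetical nodes: one evenly mixed, one pure.
mixed_node = ["yes", "yes", "no", "no"]
pure_node = ["yes", "yes", "yes", "yes"]

print("mixed node:", gini_impurity(mixed_node))
print("pure node:", gini_impurity(pure_node))
```

A pure node scores zero, and the score grows as the classes become more evenly mixed, so a lower Gini Index after a split indicates a better split.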

4. Root Node: The root node is always the top node of a decision tree. It represents the entire population or data sample, and it can be further divided into different sets.

5. Decision Node: Decision nodes are subnodes that can be split into different subnodes; they contain at least two branches. 

6. Leaf Node: A leaf node in a decision tree carries the final results. These nodes, which are also known as terminal nodes, cannot be split any further. 

How Does a Decision Tree Algorithm Work?

Suppose there are different animals, and you want to identify each animal and classify them based on their features. We can easily accomplish this by using a decision tree.

The following is a cluttered sample data set with high entropy:


We have to determine which features split the data so that the information gain is the highest. We can do that by splitting the data using each feature and checking the information gain that we obtain from them. The feature that returns the highest gain will be used for the first split.
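The selection procedure described above can be sketched as follows. The animal records, class names, and candidate conditions are made up for illustration (the original sample data is not reproduced here); the code scores each candidate split by information gain and picks the highest.

```python
from collections import Counter
from math import log2

# Hypothetical data set: (color, height, animal).
animals = [
    ("yellow", 16, "giraffe"),
    ("yellow", 15, "giraffe"),
    ("yellow", 4, "tiger"),
    ("yellow", 5, "tiger"),
    ("grey", 11, "elephant"),
    ("brown", 3, "monkey"),
    ("brown", 2, "monkey"),
    ("brown", 3, "monkey"),
]

# Candidate conditions to try for the first split (also assumptions).
candidates = {
    "color == yellow": lambda row: row[0] == "yellow",
    "height > 10": lambda row: row[1] > 10,
}


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain(rows, condition):
    """Information gain from splitting `rows` into condition-true and condition-false sides."""
    labels = [row[2] for row in rows]
    true_side = [row[2] for row in rows if condition(row)]
    false_side = [row[2] for row in rows if not condition(row)]
    weighted = sum(
        len(side) / len(rows) * entropy(side)
        for side in (true_side, false_side)
        if side  # skip an empty side
    )
    return entropy(labels) - weighted


gains = {name: gain(animals, condition) for name, condition in candidates.items()}
best = max(gains, key=gains.get)

print(gains)
print("first split:", best)
```

On this made-up data the color condition yields the larger gain, so it would be chosen for the first split.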

For our demo, we will take the following features into consideration:


We’ll use the information gain method to determine which variable yields the maximum gain; that variable is used for the split at the root node.

Suppose Color == Yellow results in the maximum information gain, so that is what we will use for our first split at the root node.


Fig: Using Color == Yellow for our first split of decision tree

The entropy after splitting should decrease considerably. However, we still need to split the child nodes on both branches to attain an entropy value equal to zero.

We will split both nodes on the ‘height’ variable, using height > 10 and height < 10 as our conditions.


Fig: Splitting the decision tree with the height variable

The decision tree above can now predict all the classes of animals present in the data set.
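Once the splits are fixed, the tree is just a set of nested rules. A minimal sketch, assuming four hypothetical animal classes (the class names are placeholders, since the figure is not reproduced here):

```python
def classify_animal(color, height):
    """Follow the two-level tree: split on color first, then on height.

    The class names are placeholders. A height of exactly 10 is not covered
    by the conditions in the text and falls into the 'not greater than 10'
    branch here.
    """
    if color == "yellow":
        return "giraffe" if height > 10 else "tiger"
    return "elephant" if height > 10 else "monkey"


print(classify_animal("yellow", 15))
print(classify_animal("brown", 3))
```

Each call walks from the root node to a leaf node, which is how a trained decision tree makes a prediction.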

Now, it’s time to build a prediction model using the decision tree in Python.

Building a Decision Tree in Python

We’ll now predict if a consumer is likely to repay a loan using the decision tree algorithm in Python. The data set contains a wide range of information for making this prediction, including the initial payment amount, last payment amount, credit score, house number, and whether the individual was able to repay the loan.


1. First, we’ll import the libraries required to build a decision tree in Python.


2. Load the data set using the read_csv() function in pandas.


3. Display the top five rows from the data set using the head() function.
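Steps 1 to 3 might look like the sketch below. The file contents, column names, and values are made up for illustration, and an in-memory CSV stands in for the real file so the example runs on its own; with the actual data set you would pass its path to read_csv() instead.

```python
import io

import pandas as pd

# Imported here because step 1 gathers all libraries; used in the later steps.
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the loan data set (column names and values are assumptions).
csv_text = """initial_payment,last_payment,credit_score,house_number,result
201,10018,250,3046,yes
205,10016,395,3044,yes
257,13245,688,4251,no
246,10474,310,4232,yes
117,12078,549,3412,no
406,11059,681,3511,no
"""

# Step 2: load the data set (replace the StringIO object with a file path for real data).
data = pd.read_csv(io.StringIO(csv_text))

# Step 3: inspect the size of the data set and its first five rows.
print("rows, columns:", data.shape)
print(data.head())
```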


4. Separate the independent and dependent variables using the slicing method.
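A possible way to do the slicing, assuming the target is the last column. The small DataFrame below is a hypothetical stand-in with made-up column names and values.

```python
import pandas as pd

# Hypothetical stand-in for the loaded data set.
data = pd.DataFrame(
    {
        "initial_payment": [201, 205, 257, 246],
        "last_payment": [10018, 10016, 13245, 10474],
        "credit_score": [250, 395, 688, 310],
        "house_number": [3046, 3044, 4251, 4232],
        "result": ["yes", "yes", "no", "yes"],
    }
)

# All rows, every column except the last: the independent variables (features).
X = data.iloc[:, :-1].values
# All rows, last column only: the dependent variable (target).
y = data.iloc[:, -1].values

print(X.shape, y.shape)
```

Slicing by position with iloc keeps the example independent of the exact column names; selecting columns by name works just as well when the names are known.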


5. Split the data into training and testing sets.
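The split can be done with scikit-learn's train_test_split(). In this sketch the arrays are synthetic placeholders, and the 30 percent test size and the random_state value are our assumptions rather than values given in the article.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 samples, 4 features, binary target.
X = np.arange(40).reshape(10, 4)
y = np.array([0, 1] * 5)

# Hold out 30% of the rows for testing; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100
)

print("training rows:", X_train.shape[0])
print("testing rows:", X_test.shape[0])
```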


6. Train the model using the decision tree classifier.


7. Predict the test data set values using the model above.


8. Calculate the accuracy of the model using the accuracy score function.
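Steps 6 to 8 can be sketched end to end as below. The data is synthetic (generated from a made-up rule), and the hyperparameters (entropy criterion, max_depth of 3, fixed random_state) are assumptions chosen for illustration, so the accuracy it prints will differ from the figure reported for the real loan data.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data: 200 samples, 4 numeric features.
rng = np.random.default_rng(0)
X = rng.integers(0, 1000, size=(200, 4))
# Made-up rule for the target, only so the tree has something to learn.
y = (X[:, 0] + X[:, 1] > X[:, 2]).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100
)

# Step 6: train a decision tree that chooses splits by information gain (entropy).
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=100)
clf.fit(X_train, y_train)

# Step 7: predict the labels of the held-out test set.
y_pred = clf.predict(X_test)

# Step 8: compare the predictions with the true labels.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
```

Limiting max_depth keeps the tree small and easier to interpret; leaving it unset lets the tree grow until its leaves are pure, which tends to overfit the training data.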

Our prediction model achieves an accuracy score of 93.67 percent on the test data.



In this article, we covered one of the most widely used supervised learning algorithms: decision trees in Python. We introduced the basics of the algorithm, went over its benefits, explored how it works, and walked through a demo that builds a decision tree classifier.

Do you have any questions about this article and what we covered? Please leave them in the comments section below, and someone from our team will get back to you as soon as possible.


About the Author


Simplilearn is one of the world’s leading providers of online training for Digital Marketing, Cloud Computing, Project Management, Data Science, IT, Software Development, and many other emerging technologies.
