Everything You Need To Know About Bias And Variance

Last updated on Nov 7, 2023

Tutorial Playlist

The Ultimate Machine Learning Tutorial
Overview
An Introduction To Machine Learning
Lesson - 1
What is Machine Learning and How Does It Work?
Lesson - 2
Machine Learning Steps: A Complete Guide
Lesson - 3
Top 10 Machine Learning Applications in 2025
Lesson - 4
Different Types of Machine Learning: Exploring AI's Core
Lesson - 5
A Beginner's Guide to Supervised & Unsupervised Learning in AI
Lesson - 6
Everything You Need to Know About Feature Selection
Lesson - 7
Linear Regression in Python
Lesson - 8
Everything You Need to Know About Classification in Machine Learning
Lesson - 9
Logistic Regression
Lesson - 10
Understanding the Difference Between Linear vs Logistic Regression
Lesson - 11
Random Forest Algorithm
Lesson - 12
Understanding Naive Bayes Classifier
Lesson - 13
Guide to Confusion Matrix
Lesson - 14
How to Leverage KNN Algorithm in Machine Learning?
Lesson - 15
K Means Clustering Algorithm: Applications, Types, Demos and Use Cases
Lesson - 16
PCA in Machine Learning: Your Complete Guide to Principal Component Analysis
Lesson - 17
What is Cost Function in Machine Learning
Lesson - 18
The Ultimate Guide to Cross-Validation in Machine Learning
Lesson - 19
Stock Price Prediction Using Machine Learning
Lesson - 20
What Is Reinforcement Learning: A Complete Guide
Lesson - 21
What Is Q-Learning: The Best Guide to Understand Q-Learning
Lesson - 22
The Best Guide to Regularization in Machine Learning
Lesson - 23
Everything You Need to Know About Bias and Variance
Lesson - 24
The Complete Guide on Overfitting and Underfitting in Machine Learning
Lesson - 25
Mathematics for Machine Learning - Important Skills You Must Possess
Lesson - 26
A One-Stop Guide to Statistics for Machine Learning
Lesson - 27
Embarking on a Machine Learning Career? Here’s All You Need to Know
Lesson - 28
How to Become a Machine Learning Engineer?
Lesson - 29
Top 45 Machine Learning Interview Questions and Answers for 2025
Lesson - 30
Explaining the Concepts of Quantum Computing
Lesson - 31
Supervised Machine Learning: All You Need to Know
Lesson - 32
10 Machine Learning Platforms to Revolutionize Your Business
Lesson - 33
What Is Boosting in Machine Learning ?: A Comprehensive Guide
Lesson - 34
Machine Learning vs. Neural Networks: Understanding the Differences
Lesson - 35
Unlocking the Future: 5 Compelling Reasons to Master Machine Learning in 2025
Lesson - 36
Feature Engineering
Lesson - 37
How to Create a Fake News Detection System?
Lesson - 38
Automated Machine Learning: A Quick Guide
Lesson - 39
Gaussian Mixture Models (GMM) Explained
Lesson - 40

When discussing model accuracy, we need to keep in mind the prediction errors, i.e., bias and variance, that are associated with any machine learning model. There will always be some difference between what our model predicts and the actual values. These differences are called errors. The goal of an analyst is not to eliminate errors but to reduce them, and there is always a tradeoff in how low each kind of error can go: reducing one tends to increase the other. In this article, titled ‘Everything You Need to Know About Bias and Variance’, we will discuss what these errors are.

Errors in Machine Learning

We can describe an error as an action that is inaccurate or wrong. In machine learning, error measures how accurately our model predicts, both on the data it learns from and on new, unseen data. Based on this error, we choose the machine learning model that performs best on a particular dataset.

There are two main types of errors present in any machine learning model. They are Reducible Errors and Irreducible Errors.

Irreducible errors are errors that will always be present in a machine learning model because of unknown variables, and whose values cannot be reduced.
Reducible errors are errors whose values can be reduced further to improve a model. They arise because our model’s output function does not match the desired output function, and they can be lowered through optimization.

We can further divide reducible errors into two: Bias and Variance.

Figure 1: Errors in Machine Learning
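
The split into bias, variance, and irreducible error is usually written as a decomposition of the expected squared error. The symbols below do not appear in the original article and are assumptions for this sketch: f is the true function, \hat{f} is the trained model, and the data follow y = f(x) + \varepsilon with noise of mean zero and variance \sigma^2. The expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```

The first two terms are the reducible part; the last term stays no matter which model we choose.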

What is Bias?

To make predictions, our model analyzes the data and finds patterns in it. Using these patterns, it can make generalizations about instances in the data. After training, the model applies these learned patterns to the test set to make predictions.

Bias is the difference between the values our model predicts on average and the actual values. It arises from the simplifying assumptions that our model makes about the data in order to predict new data.

Figure 2: Bias

When the bias is high, the assumptions made by our model are too basic, and the model can’t capture the important features of our data. This means that our model hasn’t captured the patterns in the training data and hence cannot perform well on the testing data either. If this is the case, our model cannot perform on new data and cannot be sent into production.

This instance, where the model cannot find patterns in our training set and hence fails for both seen and unseen data, is called Underfitting.

The figure below shows an example of underfitting. As we can see, the model has found no patterns in our data, and the line of best fit is a straight line that does not pass through any of the data points. The model has failed to train properly on the given data and cannot predict new data either.

Figure 3: Underfitting
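
As a rough illustration of underfitting, the sketch below fits a straight line to data that actually follows a curve. The synthetic data (y = x² plus noise), the seed, and the helper name `mse` are assumptions made for this example and are not part of the article's weather dataset.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = x^2 plus a little noise.
rng = np.random.default_rng(0)
x_train = np.linspace(-3, 3, 40)
x_test = np.linspace(-2.9, 2.9, 30)
y_train = x_train**2 + rng.normal(0, 0.3, x_train.size)
y_test = x_test**2 + rng.normal(0, 0.3, x_test.size)


def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))


# A straight line (degree 1) is too simple for a curved pattern: high bias.
line = np.polyfit(x_train, y_train, deg=1)
# A quadratic (degree 2) matches the shape of the data.
curve = np.polyfit(x_train, y_train, deg=2)

line_train = mse(y_train, np.polyval(line, x_train))
line_test = mse(y_test, np.polyval(line, x_test))
curve_train = mse(y_train, np.polyval(curve, x_train))
curve_test = mse(y_test, np.polyval(curve, x_test))

print(f"line  train={line_train:.2f} test={line_test:.2f}")
print(f"curve train={curve_train:.2f} test={curve_test:.2f}")
```

The straight line should show a large error on both the training and the testing data, which is the signature of underfitting described above.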

What is Variance?

Variance is, in a sense, the opposite of bias. During training, our model is allowed to ‘see’ the data a certain number of times to find patterns in it. If it does not work on the data for long enough, it will not find the patterns, and bias occurs. On the other hand, if our model is allowed to view the data too many times, it will learn very well for only that data. It will capture most patterns in the data, but it will also learn from the unnecessary data present, that is, from the noise.

We can define variance as the model’s sensitivity to fluctuations in the training data. A high-variance model may learn from noise, which causes it to treat trivial features as important.

Figure 4: Example of Variance

In the above figure, we can see that our model has learned extremely well from our training data, which has taught it to identify cats. But when given new data, such as the picture of a fox, our model predicts it as a cat, as that is what it has learned. This happens when the variance is high: our model captures all the features of the data given to it, including the noise, tunes itself to that data, and predicts it very well. When given new data, however, it cannot predict well because it is too specific to the training data.

Hence, our model will perform really well on the training data and get high accuracy but will fail to perform on new, unseen data. New data may not have exactly the same features, and the model won’t be able to predict it very well. This is called Overfitting.

Figure 5: Over-fitted model, showing model performance on (a) training data and (b) new data
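
Overfitting can be sketched in a few lines with NumPy. Here a very flexible polynomial is given enough freedom to pass through every noisy training point. The linear true function, the noise level, the seed, and the names `simple` and `flexible` are all assumptions chosen for this illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)


def true_fn(x):
    """Hypothetical underlying pattern: a plain straight line."""
    return 2.0 * x


# Twelve noisy training points and two hundred new, unseen points.
x_train = np.linspace(-1, 1, 12)
y_train = true_fn(x_train) + rng.normal(0, 0.5, x_train.size)
x_new = rng.uniform(-1, 1, 200)
y_new = true_fn(x_new) + rng.normal(0, 0.5, x_new.size)


def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))


# Degree 1 matches the true pattern; degree 11 can hit all 12 training points.
simple = Polynomial.fit(x_train, y_train, deg=1)
flexible = Polynomial.fit(x_train, y_train, deg=11)

simple_train = mse(y_train, simple(x_train))
simple_new = mse(y_new, simple(x_new))
flexible_train = mse(y_train, flexible(x_train))
flexible_new = mse(y_new, flexible(x_new))

print(f"simple   train={simple_train:.3f} new={simple_new:.3f}")
print(f"flexible train={flexible_train:.3f} new={flexible_new:.3f}")
```

The flexible model should report a training error close to zero because it has memorized the noise, while its error on the new points is noticeably worse than that of the simple model.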

Bias-Variance Tradeoff

For any model, we have to find the right balance between bias and variance. This ensures that our model captures the essential patterns in the data while ignoring the noise present in it. This is called the Bias-Variance Tradeoff. It helps optimize the error in our model and keeps it as low as possible.

An optimized model will be sensitive to the patterns in our data, but at the same time will be able to generalize to new data. In such a model, both the bias and the variance should be as low as possible so as to prevent both underfitting and overfitting.

Figure 6: Error in Training and Testing with high Bias and Variance

In the above figure, we can see that when bias is high, the error on both the training and testing sets is also high. If we have high variance, the model performs well on the training set, where the error is low, but gives a high error on the testing set. We can see that there is a region in the middle where the error on both the training and testing sets is low and the bias and variance are in balance.

Figure 7: Bull’s Eye Graph for Bias and Variance

The above bull’s eye graph helps explain the bias-variance tradeoff better. The best fit is when the points are concentrated in the center, i.e., at the bull’s eye. We can see that as we get farther and farther away from the center, the error in our model increases. The best model is one where bias and variance are both low.
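
One way to make the bull’s eye picture concrete is to retrain the same kind of model on many freshly drawn training sets and treat each retrained model as one throw at the target. The sketch below does this with NumPy. The sine-shaped true function, the noise level, the polynomial degrees, and the function name `bias2_and_variance` are assumptions for illustration only.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def true_fn(x):
    """Hypothetical true pattern the models try to learn."""
    return np.sin(2 * np.pi * x)


x_train = np.linspace(0, 1, 20)   # inputs used for every training set
x_grid = np.linspace(0, 1, 50)    # points where predictions are compared
noise_sd = 0.3
n_rounds = 200                    # number of retrained models ("throws")


def bias2_and_variance(degree):
    """Estimate squared bias and variance of a polynomial of given degree."""
    preds = np.empty((n_rounds, x_grid.size))
    for i in range(n_rounds):
        # Draw a fresh noisy training set and refit the model.
        y_train = true_fn(x_train) + rng.normal(0, noise_sd, x_train.size)
        model = Polynomial.fit(x_train, y_train, deg=degree)
        preds[i] = model(x_grid)
    mean_pred = preds.mean(axis=0)
    # Squared bias: how far the average prediction is from the truth.
    bias2 = float(np.mean((mean_pred - true_fn(x_grid)) ** 2))
    # Variance: how much predictions scatter around their own average.
    variance = float(np.mean(preds.var(axis=0)))
    return bias2, variance


rigid_bias2, rigid_var = bias2_and_variance(1)
flexible_bias2, flexible_var = bias2_and_variance(7)

print(f"degree 1: bias^2={rigid_bias2:.4f} variance={rigid_var:.4f}")
print(f"degree 7: bias^2={flexible_bias2:.4f} variance={flexible_var:.4f}")
```

The rigid model should land consistently off-center (high bias, low variance), while the flexible one should center on the target but scatter more (low bias, higher variance).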

Plotting Bias and Variance Using Python

Let’s find out the bias and variance in our weather prediction model. For this we use the daily forecast data as shown below:

Figure 8: Weather forecast data

We start off by importing the necessary modules and loading in our data.

Figure 9: Importing modules
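
The original code for this step was shown only as an image. A minimal sketch, assuming pandas is used to load the data, might look like the following. The inline CSV text and the column names (`Date`, `TempAvgF`, `Events`, `Precipitation`) are hypothetical stand-ins for the real weather file, which is not reproduced here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the weather file. With a real file this would be
# pd.read_csv("<path to your CSV>") instead of reading from a string.
csv_text = """Date,TempAvgF,Events,Precipitation
20190105,48,Rain,0.46
20190214,55,,0
20190720,88,Thunderstorm,T
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows, to check that the columns loaded as expected
print(df.shape)    # (number of rows, number of columns)
```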

In the data, we can see that the date and month are in military time and share one column. The day of the month will not have much effect on the weather, but monthly seasonal variations are important for predicting it. So, let’s make a new column that holds only the month.

Figure 10: Creating new month column
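
A possible way to build the month column with pandas is sketched below. It assumes the combined date is stored as compact `YYYYMMDD` text in a column named `Date`; both the format and the column names are guesses, since the original data is visible only in a screenshot.

```python
import pandas as pd

# Hypothetical data: dates stored as compact YYYYMMDD values in one column.
df = pd.DataFrame({
    "Date": ["20190105", "20190214", "20190720"],
    "TempAvgF": [48, 55, 88],
})

# Parse the combined value, then keep only the month number (1 to 12).
df["Month"] = pd.to_datetime(df["Date"], format="%Y%m%d").dt.month

print(df)
```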

The dataset now looks as shown below.

Figure 11: New dataset

Dropping unnecessary columns.

Figure 12: Dropping columns
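
Dropping columns in pandas can be sketched as follows. Which columns the article actually drops is not recoverable from the text, so the `Date` column here is only a hypothetical example.

```python
import pandas as pd

# Hypothetical data: the raw Date column is no longer needed once Month exists.
df = pd.DataFrame({
    "Date": ["20190105", "20190214"],
    "Month": [1, 2],
    "TempAvgF": [48, 55],
})

# drop returns a new DataFrame, so the result is assigned back to df.
df = df.drop(columns=["Date"])

print(df.columns.tolist())
```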

The dataset becomes as shown.

Figure 13: New Dataset

Let’s convert categorical columns to numerical ones.

Figure 14: Converting categorical columns to numerical form
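
One common way to turn a text column into numbers is pandas category codes, sketched below. This is an assumption about the approach, not necessarily the method used in the original screenshot, and the `Events` column with its values is hypothetical.

```python
import pandas as pd

# Hypothetical categorical column describing the weather on each day.
df = pd.DataFrame({"Events": ["Sunny", "Rain", "Sunny", "Cloudy"]})

# Categories are sorted alphabetically and numbered from 0:
# Cloudy -> 0, Rain -> 1, Sunny -> 2.
df["Events"] = df["Events"].astype("category").cat.codes

print(df["Events"].tolist())
```

Note that integer codes impose an arbitrary order on the categories. For models that are sensitive to this, one-hot encoding with `pd.get_dummies` is a frequently used alternative.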

The dataset becomes as shown:

Figure 15: New Numerical Dataset

Let’s convert the precipitation column to numerical form, too.

Figure 16: Converting precipitation column to numerical form
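
If the precipitation column is stored as text and contains non-numeric markers, `pd.to_numeric` is one way to convert it. The column name and the `"T"` marker below are assumptions for illustration; the real column contents are not shown in the text.

```python
import pandas as pd

# Hypothetical text column: numbers stored as strings plus one non-numeric marker.
df = pd.DataFrame({"Precipitation": ["0.46", "0", "T"]})

# errors="coerce" converts valid numbers and turns everything else into NaN.
df["Precipitation"] = pd.to_numeric(df["Precipitation"], errors="coerce")

print(df)
```

Values that cannot be parsed end up as NaN, which is why a missing-value check is a natural next step.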

Finding all missing values

Figure 17: Finding Missing values

Replacing missing values with ‘0’.

Figure 18: Replacing ‘NaN’ with 0
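
Finding and replacing missing values can be sketched with `isna` and `fillna`. The small DataFrame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "TempAvgF": [48.0, np.nan, 88.0],
    "Precipitation": [0.46, 0.0, np.nan],
})

# Count the missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Replace every NaN with 0, as the article does.
df = df.fillna(0)
print(df)
```

Filling with 0 is simple, but whether 0 is a sensible stand-in depends on the column. For a temperature column, for example, the column mean or median may distort the data less.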

Let’s drop the prediction (output) column from our dataset so that only the input variables remain.

Figure 19: Input variable

The output column looks as shown.

Figure 20: Output Variable

Splitting the dataset into training and testing data and fitting our model to it.

Figure 21: Splitting and fitting our dataset
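
The steps of separating inputs from the output, splitting, and fitting can be sketched with scikit-learn. The article does not name the model in its text, so `LinearRegression` is an assumption, as are the synthetic data, the column names, and the 75/25 split.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical numeric weather data with 40 rows.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Month": rng.integers(1, 13, 40),
    "Humidity": rng.uniform(30, 90, 40),
})
df["TempAvgF"] = 50 + 3 * df["Month"] + rng.normal(0, 2, 40)

# Input variables: everything except the column we want to predict.
X = df.drop(columns=["TempAvgF"])
# Output variable: the column we want to predict.
y = df["TempAvgF"]

# Hold out 25% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit on the training rows only, then predict the held-out rows.
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print(predictions[:5])
```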

Predicting on our test set and using the var function of NumPy to find the variance of the predictions

Figure 22: Finding variance

Using mean squared error as a rough measure of bias

Figure 23: Finding Bias
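
The two calculations above can be sketched with NumPy alone. The arrays `y_test` and `predictions` are small hypothetical values standing in for the real test labels and model output.

```python
import numpy as np

# Hypothetical actual values and model predictions for four test rows.
y_test = np.array([3.0, 5.0, 7.0, 9.0])
predictions = np.array([2.0, 5.0, 8.0, 9.0])

# Spread of the predictions around their own mean.
variance = float(np.var(predictions))

# Mean squared error between the actual and predicted values.
mse = float(np.mean((y_test - predictions) ** 2))

print(variance, mse)  # prints 7.5 0.5
```

These two numbers are quick proxies rather than exact measurements. `np.var(predictions)` measures how spread out the predictions are across test rows, not how much the model changes between training sets, and the mean squared error mixes squared bias, variance, and irreducible noise together.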

Conclusion

In this article, ‘Everything You Need to Know About Bias and Variance’, we looked at the various errors that can be present in a machine learning model. We then learned about bias and variance, the two types of reducible error that are used to help optimize a model. Finally, we covered model optimization and error reduction and learned how to find the bias and variance of our model using Python.

About the Author

Mayank Banoula

Mayank is a Research Analyst at Simplilearn. He is proficient in machine learning and artificial intelligence with Python.
