The Complete Guide on Overfitting and Underfitting in Machine Learning

Overfitting and underfitting are two crucial concepts in machine learning and are common causes of poor model performance. This tutorial explores overfitting and underfitting in machine learning and helps you understand how to avoid them with a hands-on demonstration.

What is Overfitting?

When a model performs very well on training data but poorly on test data (new data), it is known as overfitting. In this case, the machine learning model learns the details and noise in the training data to the extent that they hurt its performance on test data. Overfitting is associated with low bias and high variance.

Figure: Overfitting in machine learning
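
To make the symptom concrete, here is a minimal sketch, assuming scikit-learn and NumPy are available. The synthetic sine data, the noise level, and the choice of an unconstrained DecisionTreeRegressor are illustrative assumptions and are not taken from the tutorial's demo.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a sine curve plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# With no depth limit, the tree can memorize every training point, noise included.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_r2 = tree.score(X_train, y_train)  # R^2 on data the model has seen
test_r2 = tree.score(X_test, y_test)     # R^2 on held-out data
print(f"train R^2: {train_r2:.2f}, test R^2: {test_r2:.2f}")
```

A training score near 1.0 paired with a clearly lower test score is the gap that characterizes overfitting.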

Reasons for Overfitting

  • The training data is not cleaned and contains noise (garbage values)
  • The model has a high variance
  • The training dataset is too small
  • The model is too complex

Ways to Tackle Overfitting

  • Using K-fold cross-validation
  • Using Regularization techniques such as Lasso and Ridge
  • Training the model with sufficient data
  • Adopting ensembling techniques
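
Two of the remedies above, K-fold cross-validation and Ridge/Lasso regularization, can be combined in a short sketch. It assumes scikit-learn; the synthetic dataset (40 features, of which only 5 matter) and the alpha values are arbitrary assumptions chosen for illustration and would normally be tuned.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (assumed for illustration): 40 features, only 5 of which matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))
true_coef = np.zeros(40)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ true_coef + rng.normal(scale=1.0, size=60)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "no regularization": LinearRegression(),
    "ridge (L2)": Ridge(alpha=1.0),   # alpha values are arbitrary assumptions
    "lasso (L1)": Lasso(alpha=0.2),
}

cv_mse = {}
for name, model in models.items():
    # scikit-learn reports negated MSE so that a higher score is always better.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[name] = -scores.mean()
    print(f"{name:>18}: mean CV MSE = {cv_mse[name]:.2f}")
```

With many features and few samples, the unregularized model tends to fit noise, so the penalized models would typically show a lower cross-validated error on data like this.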

What is Underfitting?

When a model has not learned the patterns in the training data well and is unable to generalize well on the new data, it is known as underfitting. An underfit model has poor performance on the training data and will result in unreliable predictions. Underfitting occurs due to high bias and low variance.

Figure: Underfitting in machine learning

Reasons for Underfitting

  • The training data is not cleaned and contains noise (garbage values)
  • The model has a high bias
  • The training dataset is too small
  • The model is too simple

Ways to Tackle Underfitting

  • Increase the number of features in the dataset
  • Increase model complexity
  • Reduce noise in the data
  • Increase the duration of training
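
As a rough illustration of "increase model complexity", the sketch below fits a straight line and then a quadratic model to the same curved data. It assumes scikit-learn; the synthetic quadratic dataset and the use of PolynomialFeatures are assumptions made for this example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumed for illustration): a quadratic trend with a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A straight line is too simple for a curved relationship, so it underfits.
simple = LinearRegression().fit(X_train, y_train)

# Adding a squared feature gives the model enough capacity to follow the curve.
richer = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
richer.fit(X_train, y_train)

# Each entry holds (train R^2, test R^2).
scores = {
    "linear": (simple.score(X_train, y_train), simple.score(X_test, y_test)),
    "quadratic": (richer.score(X_train, y_train), richer.score(X_test, y_test)),
}
for name, (train_r2, test_r2) in scores.items():
    print(f"{name:>9}: train R^2 = {train_r2:.2f}, test R^2 = {test_r2:.2f}")
```

The underfit model scores poorly on both the training and the test split, which distinguishes it from an overfit model that scores well on training data only.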

Now that you understand what overfitting and underfitting are, let’s see what a good fit model looks like.


What Is a Good Fit In Machine Learning?

To find a good fit, look at how the model's error changes over the course of training. As the algorithm learns, the error on the training data decreases, and so does the error on the test dataset. If you train for too long, however, the model may start learning the unnecessary details and noise in the training set, which leads to overfitting: the training error keeps falling while the test error begins to rise. To achieve a good fit, stop training at the point where the test error starts to increase.

Figure: Good fit model
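
The stopping rule described above is commonly implemented as "early stopping". The following is a minimal sketch in plain Python; the function name early_stopping_epoch, the patience parameter, and the error values are hypothetical and serve only to illustrate the idea.

```python
def early_stopping_epoch(val_errors, patience=2):
    """Return the index of the epoch with the lowest held-out error seen
    before that error failed to improve for `patience` consecutive epochs."""
    best_epoch = 0
    best_error = float("inf")
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            # Held-out error is still falling: remember this epoch.
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training here.
            break
    return best_epoch


# Hypothetical per-epoch errors on held-out data: falling at first, then rising.
val_errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
print(early_stopping_epoch(val_errors))  # prints 3
```

The patience value trades off stopping too early on a noisy error curve against training longer than necessary.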

Demo - Analyzing Goodness of Fit For IRIS Dataset

  • Import the libraries

Figure: Importing the libraries

  • Load the IRIS dataset

Figure: Loading the dataset

  • Use K-fold cross-validation with 20 folds (K=20) to evaluate how well the model generalizes. Within each fold, estimate the train and test error using the training and testing datasets, respectively.

Figure: K-fold cross-validation

  • Plot the mean absolute error (MAE) of the training phase and the MAE of the testing phase

Figure: Calculating the MAE

Figure: Plotting the errors
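
The demo steps above can be sketched as follows. This is not the tutorial's own code: it assumes scikit-learn, and because the text does not name a model, the use of LogisticRegression, the shuffling, and the random_state are all assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

# Load the IRIS dataset: 150 samples, 4 features, 3 classes encoded as 0, 1, 2.
X, y = load_iris(return_X_y=True)

# 20 folds, as in the text. Shuffling is a choice made here because the
# samples are stored ordered by class.
kfold = KFold(n_splits=20, shuffle=True, random_state=0)

train_mae, test_mae = [], []
for train_idx, test_idx in kfold.split(X):
    # Assumption: logistic regression stands in for the unnamed model.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    # MAE is computed on the predicted class labels within each fold.
    train_mae.append(mean_absolute_error(y[train_idx], model.predict(X[train_idx])))
    test_mae.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))

print(f"mean train MAE: {np.mean(train_mae):.3f}")
print(f"mean test MAE:  {np.mean(test_mae):.3f}")
```

Plotting train_mae and test_mae against the fold index, for example with matplotlib, gives the kind of chart the last step describes. A test error far above the training error would suggest overfitting, while high errors on both would suggest underfitting.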

With K-fold cross-validation, the error on the testing data is averaged over all 20 folds, which gives a more reliable estimate of how the model performs on unseen data than a single train/test split.


Conclusion

Overfitting and underfitting are two vital concepts related to the bias-variance trade-off in machine learning. In this tutorial, you learned the basics of overfitting and underfitting and how to avoid them. You also looked at the various reasons for their occurrence.

If you want to learn the fundamentals of machine learning and gain a comprehensive, work-ready understanding of it, Simplilearn’s Post Graduate Program in AI and Machine Learning, in partnership with Purdue and in collaboration with IBM, should be ideal for you. This 12-month bootcamp features comprehensive applied training in key concepts of machine learning, deep learning with Keras and TensorFlow, advanced deep learning and computer vision, natural language processing, and more.

Do you have any questions about this tutorial on overfitting and underfitting in machine learning? If so, please put them in the comments section, and we’ll help answer them. To learn more, check the following video: Overfitting and Underfitting.

Happy learning!

About the Author

Avijeet Biswal

Avijeet is a Senior Research Analyst at Simplilearn. Passionate about Data Analytics, Machine Learning, and Deep Learning, Avijeet is also interested in politics, cricket, and football.
