The Complete Guide on Overfitting and Underfitting in Machine Learning

Overfitting and underfitting are two crucial concepts in machine learning and two of the most common causes of poor model performance. This tutorial explains both and shows you how to avoid them with a hands-on demonstration.

What is Overfitting?

When a model performs very well on the training data but poorly on test data (new data), it is known as overfitting. In this case, the machine learning model learns the details and noise in the training data to the extent that they hurt its performance on test data. An overfit model typically has low bias and high variance.

[Figure: Overfitting in ML]
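
To make the definition concrete, here is a minimal sketch of what overfitting can look like in practice. It is not taken from the original demo: the synthetic dataset, the choice of a decision tree, and all parameter values are assumptions made for illustration, and it assumes scikit-learn is installed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration). flip_y=0.2 assigns a
# random class to about 20% of the samples, which simulates label noise.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# With no depth limit, the tree keeps splitting until it fits every
# training point, including the noisy ones.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

A large gap between the two numbers, with near-perfect training accuracy, is the typical signature of an overfit model.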

Reasons for Overfitting

  • Data used for training is not cleaned and contains noise (garbage values) in it
  • The model has a high variance
  • The size of the training dataset used is not enough
  • The model is too complex

Ways to Tackle Overfitting

  • Using K-fold cross-validation
  • Using Regularization techniques such as Lasso and Ridge
  • Training the model with sufficient data
  • Adopting ensembling techniques
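
Two of the techniques above can be combined in a short sketch: K-fold cross-validation to measure the gap between training and validation error, and Ridge regularization to shrink it. The synthetic data, the polynomial degree, and `alpha=1.0` are assumptions chosen for illustration, not values from this tutorial.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Small, noisy synthetic dataset (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.2, size=40)

def evaluate(regressor):
    """Return mean train and validation MSE over 5 folds."""
    # Degree-12 polynomial features make the model deliberately too flexible.
    model = make_pipeline(PolynomialFeatures(degree=12, include_bias=False),
                          StandardScaler(), regressor)
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_validate(model, X, y, cv=cv,
                            scoring="neg_mean_squared_error",
                            return_train_score=True)
    # scikit-learn reports negated MSE, so flip the sign back.
    return -scores["train_score"].mean(), -scores["test_score"].mean()

ols_train, ols_test = evaluate(LinearRegression())
ridge_train, ridge_test = evaluate(Ridge(alpha=1.0))

print(f"unregularized: train MSE {ols_train:.3f}, validation MSE {ols_test:.3f}")
print(f"ridge:         train MSE {ridge_train:.3f}, validation MSE {ridge_test:.3f}")
```

The unregularized model always fits the training folds at least as closely as Ridge does. On data like this, you would usually expect Ridge to give up a little training accuracy in exchange for a smaller gap between training and validation error, but the exact numbers depend on the data and the seed.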

What is Underfitting?

When a model has not learned the patterns in the training data well and is unable to generalize to new data, it is known as underfitting. An underfit model performs poorly even on the training data and produces unreliable predictions. An underfit model typically has high bias and low variance.

[Figure: Underfitting in ML]

Reasons for Underfitting

  • Data used for training is not cleaned and contains noise (garbage values) in it
  • The model has a high bias
  • The size of the training dataset used is not enough
  • The model is too simple

Ways to Tackle Underfitting

  • Increase the number of features in the dataset
  • Increase model complexity
  • Reduce noise in the data
  • Increase the training duration (for example, train for more epochs)
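
As a rough illustration of the first two remedies, the sketch below fits a straight line to data with a quadratic pattern (too simple, so it underfits) and then adds a squared feature. The synthetic data and the choice of degree 2 are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with a quadratic relationship (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Too simple: a straight line cannot follow the curve.
simple = LinearRegression().fit(X_train, y_train)

# More features and more capacity: add x^2 as an input.
richer = make_pipeline(PolynomialFeatures(degree=2),
                       LinearRegression()).fit(X_train, y_train)

simple_train_r2 = simple.score(X_train, y_train)
simple_test_r2 = simple.score(X_test, y_test)
richer_train_r2 = richer.score(X_train, y_train)
richer_test_r2 = richer.score(X_test, y_test)

print(f"linear:    train R^2 {simple_train_r2:.2f}, test R^2 {simple_test_r2:.2f}")
print(f"quadratic: train R^2 {richer_train_r2:.2f}, test R^2 {richer_test_r2:.2f}")
```

The underfit model scores poorly on both the training and the test split, which is what distinguishes underfitting from overfitting.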

Now that you understand what overfitting and underfitting are, let’s look at what a good fit is.

What Is a Good Fit In Machine Learning?

To find a good fit, look at how the model's error changes over the course of training. As the algorithm learns, its error on the training data decreases, and so does its error on the test data. If you train the model for too long, however, it may start to learn unnecessary details and noise in the training set: the training error keeps falling while the test error begins to rise, which is overfitting. To achieve a good fit, stop training at the point where the test error starts to increase.

[Figure: Good fit model]
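
The stopping rule described above is commonly called early stopping. Here is a minimal sketch of the idea, assuming you record the validation (test) error after each epoch. The function name `early_stopping_epoch`, the `patience` parameter, and the error values are all hypothetical and chosen for illustration.

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return (best_epoch, best_error) for a sequence of per-epoch errors.

    Scanning stops once the error has failed to improve for `patience`
    consecutive epochs, mimicking a training loop that halts early.
    """
    best_epoch = 0
    best_error = float("inf")
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            # New best: remember it and keep training.
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            break
    return best_epoch, best_error


# Made-up validation errors: they fall, then start to rise.
val_errors = [0.9, 0.7, 0.55, 0.5, 0.52, 0.56, 0.6, 0.4]
stop_epoch, stop_error = early_stopping_epoch(val_errors, patience=3)
print(stop_epoch, stop_error)  # -> 3 0.5
```

Note that the last value (0.4) is never examined, because training would already have stopped. A larger `patience` trades extra training time for a lower risk of stopping on a temporary bump in the error curve.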

Demo - Analyzing Goodness of Fit For IRIS Dataset

  • Import the libraries

[Figure: Importing the libraries]

  • Load the IRIS dataset

[Figure: Loading the dataset]

  • Now, use K-fold cross-validation with 20 folds (K=20) to evaluate how well the model generalizes. Within each fold, estimate the training and testing error using that fold's training and testing subsets, respectively.

[Figure: K-fold cross-validation]

  • Plot the mean absolute error (MAE) of the training phase and the MAE of the testing phase

[Figure: Calculating the MAE]

[Figure: Plotting the errors]
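
The steps above can be sketched in code as follows. This is not the original demo code: the model (`LogisticRegression`), the shuffling seed, and the helper `plot_fold_errors` are assumptions made for illustration, since the steps do not name a specific estimator. It assumes scikit-learn is installed, plus matplotlib if you want the plot.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

# Load the IRIS dataset: 150 samples, 4 features, 3 classes (labels 0, 1, 2).
X, y = load_iris(return_X_y=True)

# 20-fold cross-validation; shuffling matters because the rows are sorted by class.
kf = KFold(n_splits=20, shuffle=True, random_state=0)

train_mae, test_mae = [], []
for train_idx, test_idx in kf.split(X):
    # Assumed model; swap in any estimator with fit/predict.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    # MAE between true and predicted class labels, per fold.
    train_mae.append(mean_absolute_error(y[train_idx], model.predict(X[train_idx])))
    test_mae.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))

print(f"mean train MAE: {np.mean(train_mae):.3f}")
print(f"mean test MAE:  {np.mean(test_mae):.3f}")


def plot_fold_errors(train_mae, test_mae):
    """Plot the per-fold training and testing MAE (requires matplotlib)."""
    import matplotlib.pyplot as plt

    folds = range(1, len(train_mae) + 1)
    plt.plot(folds, train_mae, "o-", label="Training MAE")
    plt.plot(folds, test_mae, "o-", label="Testing MAE")
    plt.xlabel("Fold")
    plt.ylabel("Mean absolute error")
    plt.legend()
    plt.show()
```

Call `plot_fold_errors(train_mae, test_mae)` to draw the two error curves. Because each test fold holds only seven or eight samples, the per-fold testing MAE will be noisier than the training MAE; the means are the more useful numbers to compare.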

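Comparing the training and testing MAE across the 20 folds shows how well the model generalizes: similar, low values for both indicate a good fit. Note that K-fold cross-validation does not reduce the test error by itself; it gives you a more reliable estimate of it.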

Conclusion

Overfitting and underfitting are two vital concepts related to the bias-variance trade-off in machine learning. In this tutorial, you learned the basics of overfitting and underfitting, the various reasons they occur, and how to avoid them.

If you want to learn the fundamentals of machine learning and gain a comprehensive, work-ready understanding of it, Simplilearn’s AI ML Course, offered in partnership with Purdue and in collaboration with IBM, should be ideal for you. This 12-month bootcamp features comprehensive applied training in key concepts of machine learning, deep learning with Keras and TensorFlow, advanced deep learning and computer vision, natural language processing, and more.

Do you have any questions related to this tutorial on overfitting and underfitting in machine learning? In case you have questions, please put them in the comments section. We’ll help you answer them. To learn more, check the following video: Overfitting and Underfitting.

Happy learning!

About the Author

Avijeet Biswal

Avijeet is a Senior Research Analyst at Simplilearn. Passionate about Data Analytics, Machine Learning, and Deep Learning, Avijeet is also interested in politics, cricket, and football.
