Linear regression is a model that predicts one variable's values based on another's importance. It's one of the most popular and widely-used models in machine learning, and it's also one of the first things you should learn as you explore machine learning.

Linear regression is so popular because it's so simple: all it does is try to predict values based on past data, which makes it easy to get started with and understand. The simplicity means it's also easy to implement, which makes it a great starting point if you're new to machine learning.

There are two types of linear regression algorithms - 

  • Simple - deals with two features.
  • Multiple - deals with more than two features.

In this guide, let’s understand multiple linear regression in depth.

PCP in AI and Machine Learning

In Partnership with Purdue UniversityExplore Course
PCP in AI and Machine Learning

What Is Multiple Linear Regression (MLR)?

One of the most common types of predictive analysis is multiple linear regression. This type of analysis allows you to understand the relationship between a continuous dependent variable and two or more independent variables.

The independent variables can be either continuous (like age and height) or categorical (like gender and occupation). It's important to note that if your dependent variable is categorical, you should dummy code it before running the analysis.

Formula and Calculation of Multiple Linear Regression 

Several circumstances that influence the dependent variable simultaneously can be controlled through multiple regression analysis. Regression analysis is a method of analyzing the relationship between independent variables and dependent variables. 

Let k represent the number of variables denoted by x1, x2, x3, ……, xk. 

For this method, we assume that we have k independent variables x1, . . . , xk that we can set, then they probabilistically determine an outcome Y. 

Furthermore, we assume that Y is linearly dependent on the factors according to 

Y = β0 + β1x1 + β2x2 + · · · + βkxk + ε

  • The variable yi is dependent or predicted
  • The slope of y depends on the y-intercept, that is, when xi and x2 are both zero, y will be β0.
  • The regression coefficients β1 and β2 represent the change in y as a result of one-unit changes in xi1 and xi2.
  • βp refers to the slope coefficient of all independent variables
  • ε term describes the random error (residual) in the model.

Where ε is a standard error, this is just like we had for simple linear regression, except k doesn’t have to be 1. 

We have n observations, n typically being much more than k. 

For i th observation, we set the independent variables to the values xi1, xi2 . . . , xik and measure a value yi for the random variable Yi. 

Thus, the model can be described by the equations. 

Yi = β0 + β1xi1 + β2xi2 + · · · + βkxik + i for i = 1, 2, . . . , n, 

Where the errors i are independent standard variables, each with mean 0 and the same unknown variance σ2.

Altogether the model for multiple linear regression has k + 2 unknown parameters: 

β0, β1, . . . , βk, and σ 2. 

When k was equal to 1, we found the least squares line y = βˆ 0 +βˆ 1x. 

It was a line in the plane R 2. 

Now, with k ≥ 1, we’ll have a least squares hyperplane. 

y = βˆ 0 + βˆ 1x1 + βˆ 2x2 + · · · + βˆ kxk in Rk+1. 

The way to find the estimators βˆ 0, βˆ 1, . . ., and βˆ k is the same. 

Take the partial derivatives of the squared error. 

Q = Xn i=1 (yi − (β0 + β1xi1 + β2xi2 + · · · + βkxik))2 

When that system is solved we have fitted values 

yˆi = βˆ 0 + βˆ 1xi1 + βˆ 2xi2 + · · · + βˆ kxik for i = 1, . . . , n that should be close to the actual values yi.

FREE Machine Learning Course

Learn In-demand Machine Learning Skills and ToolsStart Now
FREE Machine Learning Course

Assumptions of Multiple Linear Regression

In multiple linear regression, the dependent variable is the outcome or result from you're trying to predict. The independent variables are the things that explain your dependent variable. You can use them to build a model that accurately predicts your dependent variable from the independent variables.

For your model to be reliable and valid, there are some essential requirements:

  • The independent and dependent variables are linearly related.
  • There is no strong correlation between the independent variables.
  • Residuals have a constant variance.
  • Observations should be independent of one another.
  • It is important that all variables follow multivariate normality.

Example of How to Use Multiple Linear Regression

from sklearn.datasets import load_boston

import pandas as pd

from sklearn.model_selection import train_test_split

def sklearn_to_df(data_loader):

    X_data = data_loader.data

    X_columns = data_loader.feature_names

    X = pd.DataFrame(X_data, columns=X_columns)

    y_data = data_loader.target

    y = pd.Series(y_data, name='target')

    return x, y

x, y = sklearn_to_df(load_boston())

x_train, x_test, y_train, y_test = train_test_split(

    x, y, test_size=0.2, random_state=42)

from load_dataset import x_train, x_test, y_train, y_test

from multiple_linear_regression import MultipleLinearRegression

from sklearn.linear_model import LinearRegression

mulreg = MultipleLinearRegression()

# fit our LR to our data

mulreg.fit(x_train, y_train)

# make predictions and score

pred = mulreg.predict(x_test)

# calculate r2_score

score = mulreg.r2_score(y_test, pred)

print(f'Our Final R^2 score: {score}')

The Difference Between Linear and Multiple Regression

When predicting a complex process's outcome, it is best to use multiple linear regression instead of simple linear regression.

A simple linear regression can accurately capture the relationship between two variables in simple relationships. On the other hand, multiple linear regression can capture more complex interactions that require more thought.

A multiple regression model uses more than one independent variable. It does not suffer from the same limitations as the simple regression equation, and it is thus able to fit curved and non-linear relationships. The following are the uses of multiple linear regression.

  1. Planning and Control.
  2. Prediction or Forecasting. 

Estimating relationships between variables can be exciting and useful. As with all other regression models, the multiple regression model assesses relationships among variables in terms of their ability to predict the value of the dependent variable.

Free Course: Machine Learning Algorithms

Learn the Basics of Machine Learning AlgorithmsEnroll Now
Free Course: Machine Learning Algorithms

Why and When to Use Multiple Regression Over a Simple OLS Regression?

When you're trying to predict something, it's usually helpful to start with a linear model. But sometimes things aren't so simple.

Multiple regression is used when you want to predict a dependent variable using more than one independent variable. It's the same type of regression as ordinary linear squares (OLS) regression. On the other hand, OLS regression distinguishes the effect of an explanatory variable on a continuous dependent variable by comparing the distributions of these variables based on the changes in the value of the explanatory variables.

MLR can use more than one explanatory variable at once. This allows you to make better predictions about what might happen in your data if certain changes were made.

Our Learners Also Ask

1. When should we use multiple linear regression?

Multiple linear regression is a statistical technique used to analyze a dataset with various independent variables affecting the dependent variable. When forecasting more complex relationships, this is often the case.

The technique allows researchers to predict a dependent variable's outcome based on certain variables' values. It also will enable researchers to assess whether or not there are any interactions between independent variables, which can help them understand more about how they affect each other.

2. What is multiple regression used for?

When making a prediction or forecasting, it's best to have as much data as possible. Multiple linear regression is a model that allows you to account for all of these potentially significant variables in one model. 

The benefits of this approach include a more accurate and detailed view of the relationship between each particular factor and the outcome. It means you can plan and monitor your data more effectively.

3. What is the difference between linear and multiple regression?

Simple linear regression is the way to go when trying to model a relationship between two variables. But what if the relationship is more complex? That's when multiple linear regression comes in handy!

Multiple regressions are used for: 

  1. Planning and monitoring
  2. Prediction or forecasting. 

Multiple linear regression uses many variables to predict the outcome of a dependent variable. It can account for nonlinear relationships and interactions between variables in ways that simple linear regression can't. And it does so with greater accuracy!

3. What is the formula for multiple linear regression?

MLR formula look like : y = a + bx1 + cx2 + dx3 + ……. 

The coefficients tell you exactly how much each independent variable contributes to the dependent variable and how much each independent variable contributes in isolation. 

For example, if you had two independent variables (x1 and x2), then the coefficient for x1 would tell you how strongly each unit change in x1 affects y—and likewise for x2.

4. What are the assumptions for multiple linear regression?

To ensure that your data is appropriate for the linear regression analysis, you need to make sure that it meets the following five conditions:

  1. A linear relationship between the dependent and independent variables.
  2. The independent variables are not highly correlated with each other.
  3. The variance of the residuals is constant.
  4. Independence of observation (that is, each observation should have been collected independently).
  5. Multivariate normality (that is, all variables should be normally distributed).
Stay ahead of the tech-game with our PG Program in AI and Machine Learning in partnership with Purdue and in collaboration with IBM. Explore more!

Conclusion

Multiple linear regression is a statistical technique that uses multiple linear regression to model more complex relationships between two or more independent variables and one dependent variable. It is used when there are two or more x variables.

Are you looking for a career in AI, Machine Learning, Deep Learning, and Computer Vision?

You can do it with our intensive Post Graduate Program In AI And Machine Learning. We offer this program in collaboration with IBM and Purdue University and include live sessions from outside experts, laboratories, and business projects.

About the Author

SimplilearnSimplilearn

Simplilearn is one of the world’s leading providers of online training for Digital Marketing, Cloud Computing, Project Management, Data Science, IT, Software Development, and many other emerging technologies.

View More
  • Disclaimer
  • PMP, PMI, PMBOK, CAPM, PgMP, PfMP, ACP, PBA, RMP, SP, and OPM3 are registered marks of the Project Management Institute, Inc.