- Multiple linear regression (MLR) is a statistical technique for forecasting a single outcome from several variables.
- It is an extension of simple regression, which makes predictions based on just one factor. MLR uses two or more independent variables (factors) that impact a single dependent variable.
- Envision a straight line. MLR aids in locating the line that best fits the data and best describes how the various elements affect the result.
- MLR assumes that the relationship between the factors and the outcome is linear, which isn't always the case.

Linear regression is a model that predicts one variable's values based on the values of another. It's one of the most popular and widely-used models in machine learning, and it's also one of the first things you should learn as you explore machine learning.

Linear regression is so popular because it's so simple: all it does is try to predict values based on past data, which makes it easy to get started with and understand. The simplicity means it's also easy to implement, which makes it a great starting point if you're new to machine learning.

There are two types of linear regression algorithms:

- Simple - uses a single independent variable (feature) to predict the outcome.
- Multiple - uses two or more independent variables.

In this guide, let’s understand multiple linear regression in depth.

## What Is Multiple Linear Regression (MLR)?

In machine learning and data analysis, multiple linear regression (MLR) is a statistical technique used to model the relationship between one dependent variable and two or more independent variables. By adding more predictors to the simple linear regression model, this technique helps us better understand how the predictors jointly affect the outcome variable. Using an equation that best fits the observed data, the main objective of multiple linear regression is to forecast the dependent variable's value from the values of the independent variables. This methodology is extensively employed across many domains, including economics, finance, biology, and social sciences, to facilitate forecasting, detect patterns, and comprehend the impact of multiple elements on a single result.

## Formula and Calculation of Multiple Linear Regression

Multiple regression analysis makes it possible to account for several factors that influence the dependent variable simultaneously. Regression analysis, in general, is a method of analyzing the relationship between independent variables and a dependent variable.

Suppose we have k independent variables, denoted x1, x2, …, xk, whose values we can set and which probabilistically determine an outcome Y.

We assume that Y depends linearly on these factors according to

Y = β0 + β1x1 + β2x2 + ⋯ + βkxk + ε

- Y is the dependent (predicted) variable.
- β0 is the intercept: when all of x1, …, xk are zero, Y equals β0.
- Each slope coefficient βj (for j = 1, …, k) represents the change in Y resulting from a one-unit change in xj, holding the other variables fixed.
- ε describes the random error (residual) in the model.

Here ε is the random error term, just as in simple linear regression, except that k no longer has to be 1.
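The regression equation is easy to evaluate directly once the coefficients are known. A minimal sketch in Python with made-up coefficients for k = 3 predictors (all values here are hypothetical, for illustration only):

```python
import numpy as np

# Hypothetical coefficients for a model with k = 3 predictors:
# Y = β0 + β1·x1 + β2·x2 + β3·x3 + ε
beta0 = 2.0                          # intercept β0
betas = np.array([0.5, -1.2, 3.0])   # β1, β2, β3 (made-up values)

x = np.array([4.0, 1.0, 2.0])        # one observation of (x1, x2, x3)

# Deterministic part of the model (ε has mean 0, so we drop it here)
y_hat = beta0 + betas @ x
print(y_hat)  # 2.0 + 0.5·4 − 1.2·1 + 3.0·2 = 8.8
```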

We have n observations, n typically being much more than k.

For the i-th observation, we set the independent variables to the values xi1, xi2, …, xik and measure a value yi for the random variable Yi.

Thus, the model can be described by the equations

Yi = β0 + β1xi1 + β2xi2 + ⋯ + βkxik + εi,  for i = 1, 2, …, n,

where the errors εi are independent random variables, each with mean 0 and the same unknown variance σ².

Altogether the model for multiple linear regression has k + 2 unknown parameters:

β0, β1, …, βk, and σ².

When k was equal to 1, we found the least squares line ŷ = β̂0 + β̂1x.

It was a line in the plane R².

Now, with k ≥ 1, we'll have a least squares hyperplane

ŷ = β̂0 + β̂1x1 + β̂2x2 + ⋯ + β̂kxk in R^(k+1).

The way to find the estimators β̂0, β̂1, …, β̂k is the same.

Take the partial derivatives of the squared error

Q = Σᵢ (yi − (β0 + β1xi1 + β2xi2 + ⋯ + βkxik))², summed over i = 1, …, n,

with respect to each coefficient and set them to zero; this yields a system of k + 1 linear equations.

When that system is solved, we have the fitted values

ŷi = β̂0 + β̂1xi1 + β̂2xi2 + ⋯ + β̂kxik for i = 1, …, n,

which should be close to the actual values yi.
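The whole fitting procedure can be sketched numerically. The example below generates synthetic data from known coefficients (made up for illustration) and minimizes Q with NumPy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n = 200 observations, k = 2 predictors, known coefficients
n, k = 200, 2
X = rng.normal(size=(n, k))
true_beta = np.array([3.0, 1.5, -2.0])            # β0, β1, β2 (made up)
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=n)

# Minimizing Q is a linear least-squares problem; a column of ones carries β0
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

print(beta_hat)              # close to [3.0, 1.5, -2.0]
fitted = A @ beta_hat        # the fitted values ŷi, close to the observed yi
```

With low noise and n much larger than k, the estimated coefficients land very close to the true ones, as the theory above predicts.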

## Assumptions of Multiple Linear Regression

Multiple linear regression relies on several key assumptions to produce valid and reliable results:

### 1. Linearity

The relationship between the dependent variable and each independent variable is linear. This means the change in the dependent variable is proportional to the change in each independent variable.

### 2. Independence

The observations are independent of each other. This assumption ensures that the value of the dependent variable for one observation is not influenced by the value for another.

### 3. Homoscedasticity

The variance of the residuals (errors) is constant across all levels of the independent variables. This means that the spread of residuals should be roughly the same for all predicted values.
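One rough way to check this assumption is to compare the spread of the residuals across the range of the predictor. The sketch below deliberately generates heteroscedastic data (noise that grows with x, a hypothetical setup) to show how the residual spread reveals the violation:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(0, 10, size=n)

# Hypothetical heteroscedastic data: the noise scale grows with x,
# violating the constant-variance assumption
y = 1.0 + 2.0 * x + rng.normal(scale=0.2 * x)

# Fit a line and inspect the residuals
A = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

# Crude check: residual spread in the lower vs. upper half of x
low = resid[x < 5].std()
high = resid[x >= 5].std()
print(low, high)  # high being much larger than low signals heteroscedasticity
```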

### 4. Normality of Residuals

The residuals (differences between observed and predicted values) are normally distributed. This is particularly important for hypothesis testing and constructing confidence intervals.

### 5. No Multicollinearity

The independent variables should not be too highly correlated with each other. High multicollinearity makes it difficult to determine the individual effect of each independent variable.
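A standard diagnostic for this assumption is the variance inflation factor (VIF): regress each predictor on the others and compute 1/(1 − R²). A self-contained sketch on synthetic data (the variables and the construction are illustrative, not prescriptive):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # x2 is almost a copy of x1
x3 = rng.normal(size=n)                    # x3 is unrelated
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: regress it on the other columns, return 1/(1 - R²)."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1.0 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

print([round(vif(X, j), 1) for j in range(3)])
# x1 and x2 get large VIFs (they are nearly collinear); x3 stays near 1
```

A common rule of thumb treats VIFs above roughly 5 to 10 as a sign of problematic multicollinearity.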

### 6. No Autocorrelation

There is no correlation between the residuals. Autocorrelation can indicate that the model is missing some crucial predictors.

### 7. Fixed Independent Variables

The values of the independent variables are fixed in repeated samples, meaning they are measured without error.

## Example of How to Use Multiple Linear Regression

The example below fits a multiple linear regression with scikit-learn's `LinearRegression`. Note that the Boston housing dataset used in many older tutorials was removed from scikit-learn (version 1.2 onward), so the California housing dataset is used here instead:

```python
from sklearn.datasets import fetch_california_housing
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def sklearn_to_df(data_loader):
    # Convert a scikit-learn dataset bunch into a DataFrame and a Series
    X = pd.DataFrame(data_loader.data, columns=data_loader.feature_names)
    y = pd.Series(data_loader.target, name='target')
    return X, y

X, y = sklearn_to_df(fetch_california_housing())

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# fit our multiple linear regression to the training data
mulreg = LinearRegression()
mulreg.fit(X_train, y_train)

# make predictions and score
pred = mulreg.predict(X_test)

# calculate the R^2 score
score = r2_score(y_test, pred)
print(f'Our Final R^2 score: {score:.3f}')
```

## The Difference Between Linear and Multiple Regression

When predicting a complex process's outcome, it is best to use multiple linear regression instead of simple linear regression.

A simple linear regression can accurately capture the relationship between two variables. Multiple linear regression, on the other hand, can capture more complex relationships involving several factors at once.

A multiple regression model uses more than one independent variable. It does not suffer from the same limitations as the simple regression equation, and by including transformed predictors (such as polynomial or interaction terms) it can even approximate curved, non-linear relationships. The following are the uses of multiple linear regression:

- Planning and Control.
- Prediction or Forecasting.

Estimating relationships between variables can be exciting and useful. As with all other regression models, the multiple regression model assesses relationships among variables in terms of their ability to predict the value of the dependent variable.

## Why and When to Use Multiple Regression Over a Simple OLS Regression?

In situations where more than one predictor variable influences the result variable, multiple regression should be utilized instead of a straightforward OLS (Ordinary Least Squares) regression. The intricacy of the relationships in the data may be missed by simple OLS regression, which only uses one predictor, producing biased or insufficient findings. A more thorough and accurate model that considers the combined effects of numerous factors on the dependent variable can be created by using multiple regression, which enables the inclusion of multiple independent variables. This is especially crucial in real-world situations where various factors usually influence the outcome. Multiple regression analysis helps you determine each predictor's relative importance, account for confounding variables, and enhance the model's overall predictive strength and explanatory capacity.

## Our Learners Also Ask

### 1. When should we use multiple linear regression?

Multiple linear regression is the technique to use when a dataset has several independent variables affecting the dependent variable, which is typically the case when forecasting more complex relationships.

The technique allows researchers to predict a dependent variable's outcome based on the values of several variables. It also enables researchers to assess whether there are any interactions between independent variables, which can help them understand more about how the predictors jointly affect the outcome.

### 2. What is multiple regression used for?

When making a prediction or forecast, it's best to use as much relevant data as possible. Multiple linear regression is a model that allows you to account for all of these potentially significant variables at once.

The benefits of this approach include a more accurate and detailed view of the relationship between each particular factor and the outcome, which in turn supports better planning and monitoring.

### 3. What is the difference between linear and multiple regression?

Simple linear regression is the way to go when trying to model a relationship between two variables. But what if the relationship is more complex? That's when multiple linear regression comes in handy!

Multiple regressions are used for:

- Planning and monitoring
- Prediction or forecasting.

Multiple linear regression uses several variables to predict the outcome of a dependent variable. With interaction terms and transformed predictors, it can capture relationships that simple linear regression can't. And it often does so with greater accuracy!

### 4. What is the formula for multiple linear regression?

The MLR formula looks like this: y = a + bx1 + cx2 + dx3 + …

Each coefficient tells you how much the corresponding independent variable contributes to the dependent variable, holding the others constant.

For example, if you had two independent variables (x1 and x2), then the coefficient for x1 would tell you how strongly each unit change in x1 affects y, and likewise for x2.
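This interpretation can be checked on synthetic data where the true coefficients a, b, and c are known in advance (the values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# Hypothetical data generated from y = a + b·x1 + c·x2
# with a = 1.0, b = 2.0, c = -0.5 (illustrative values)
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.05, size=n)

# Fit by least squares: a column of ones carries the intercept a
A = np.column_stack([np.ones(n), x1, x2])
a, b, c = np.linalg.lstsq(A, y, rcond=None)[0]
print(round(a, 2), round(b, 2), round(c, 2))  # close to 1.0, 2.0, -0.5

# b says: holding x2 fixed, a one-unit increase in x1 raises y by about 2 units
```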

### 5. What are the assumptions for multiple linear regression?

To ensure that your data is appropriate for the linear regression analysis, you need to make sure that it meets the following five conditions:

- A linear relationship between the dependent and independent variables.
- The independent variables are not highly correlated with each other.
- The variance of the residuals is constant.
- Independence of observation (that is, each observation should have been collected independently).
- Normality of the residuals (that is, the residuals should be approximately normally distributed).

### 6. What Makes a Multiple Regression Multiple?

The word "multiple" refers to a regression in which more than one independent variable is used to predict a single dependent variable. In contrast to simple regression, which employs only one predictor, a multiple regression incorporates several predictors to capture the complexity of real-world scenarios where an outcome is influenced by multiple factors at once. As a result, the relationships within the data can be understood more thoroughly, taking into account the cumulative impact of all the factors included. Multiple regression is a powerful tool for statistical analysis and machine learning because it can yield more accurate predictions and insights when many relevant factors are included.

### 7. Why Would One Use a Multiple Regression Over a Simple OLS Regression?

When numerous independent variables affect the dependent variable, a multiple regression is preferable to a basic OLS (Ordinary Least Squares) regression. Simple OLS regression can only analyze a single predictor's relationship to the outcome variable, which may only partially represent the complexity of the data. Multiple regression, by contrast, enables the inclusion of several predictors, resulting in a more complete model that can consider the combined impacts of different elements. As a result, forecasts and insights become more accurate and trustworthy because they better capture the real-world situation in which several variables often impact outcomes. Furthermore, multiple regression can be used to determine the relative significance of each predictor, providing a more in-depth understanding of the connections within the data.

### 8. Can I Do a Multiple Regression by Hand?

It is possible to carry out a multiple regression by hand, but it is a complex and drawn-out procedure. The procedures entail computing several variables, including the covariances between each pair of variables and the means and variances of the independent and dependent variables. Afterward, you build a set of linear equations to find the regression coefficients. Matrix algebra is usually needed to manage calculations efficiently. Multiple regression analysis can be done more precisely and effectively using statistical software, programming languages like R and Python, or tools like Excel, given the possibility of error and the laborious nature of these computations, especially with more extensive datasets.
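In matrix form, the by-hand recipe described above boils down to the normal equations β̂ = (XᵀX)⁻¹Xᵀy. A tiny sketch with a made-up dataset small enough to check by hand:

```python
import numpy as np

# Tiny made-up dataset: 5 observations, 2 predictors;
# y was built exactly from y = 1 + 2·x1 + 1·x2, so the fit is exact
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 5.0]])          # first column of ones carries β0
y = np.array([5.0, 6.0, 11.0, 12.0, 16.0])

# The normal equations: β̂ = (XᵀX)⁻¹ Xᵀ y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta_hat)  # close to [1, 2, 1]
```

In practice, `np.linalg.lstsq` is preferred over explicitly inverting XᵀX, since it is numerically more stable; the explicit inverse is shown here only because it mirrors the hand calculation.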

### 9. What Does It Mean for a Multiple Regression to Be Linear?

Being linear in multiple regression refers to the model's assumption that a straight line best represents the relationship between the dependent variable (what you're attempting to predict) and the independent variables (factors you believe impact the outcome). The model implies that, similar to points on a graph following a straight line, changes in the independent variables cause proportionate changes in the dependent variable. This contrasts with non-linear regression, in which a more complex or curved relationship may exist.


## Conclusion

Multiple linear regression is a statistical technique that models the relationship between two or more independent variables and one dependent variable. Use it whenever there are two or more predictor (x) variables.

Are you looking for a career in AI, Machine Learning, Deep Learning, and Computer Vision?

You can do it with our intensive Post Graduate Program In AI And Machine Learning. We offer this program in collaboration with IBM and Purdue University, and it includes live sessions from outside experts, laboratories, and business projects.