Regression is a tool that allows you to estimate how the dependent variable changes as the independent variable(s) change.

Regression models describe the relationship between variables by fitting a line to the observed data. Linear regression models use a straight line, while logistic and nonlinear regression models use a curved line.

Regression models can be used for many purposes:

  • Evaluating the effect of an independent variable on a dependent variable.
  • Forecasting future values of the dependent variable based on prior observations of both variables.


What Is Simple Linear Regression?

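Simple linear regression is a statistical method for establishing the relationship between two variables using a straight line. The line is defined by its slope and intercept, which are chosen to minimize the regression errors, specifically the sum of the squared differences between the observed and predicted values of y (the least-squares criterion).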
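To make "minimize regression errors" concrete, the minimal NumPy sketch below fits a least-squares line with `np.polyfit` (degree 1) to the same sample data used in the Python implementation later in this article. It then shows that nudging either the intercept or the slope away from the fitted values only increases the sum of squared errors. The helper name `sse` is our own, chosen for illustration.

```python
import numpy as np

def sse(b0, b1, x, y):
    """Sum of squared errors (residuals) for the line y = b0 + b1 * x."""
    residuals = y - (b0 + b1 * x)
    return float(np.sum(residuals ** 2))

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)

# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line
b1, b0 = np.polyfit(x, y, deg=1)
best = sse(b0, b1, x, y)

# Shifting the intercept or the slope away from the least-squares values
# always gives a larger sum of squared errors.
for d0, d1 in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    print(f"b0{d0:+}, b1{d1:+}: SSE = {sse(b0 + d0, b1 + d1, x, y):.3f} (best {best:.3f})")
```

Because the sum of squared errors is a bowl-shaped (convex) function of the intercept and slope, the least-squares pair is the single lowest point, and any move away from it raises the error.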

Simple linear regression has exactly one x variable and one y variable. The x variable is the independent (predictor) variable: it is the input you use to make the prediction. The y variable is the dependent (response) variable: it is the value you are trying to predict, and it depends on x.

The formula used for simple linear regression is y = β0 + β1x + ε, where:

  • y is the value of the dependent variable for a given value of the independent variable x. The line's prediction for y is β0 + β1x, without the error term.
  • β0 is the intercept: the predicted value of y when x is 0.
  • β1 is the regression coefficient (slope): how much we expect y to change when x increases by one unit.
  • x is the independent variable (the variable we expect to influence y).
  • ε is the error term: the part of y that the line does not explain, i.e., each observation's random deviation from the regression line.
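To see what each term means in practice, the sketch below generates data from y = β0 + β1x + ε using chosen values β0 = 2 and β1 = 0.5 and normally distributed noise for ε (all of these values, the sample size, and the seed are arbitrary, picked only for illustration), then fits a line with `np.polyfit`. The estimates land close to, but not exactly on, the true coefficients, because ε adds random variation around the line.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed so the run is reproducible

beta0, beta1 = 2.0, 0.5               # "true" intercept and slope (chosen arbitrarily)
n = 1000

x = rng.uniform(0, 10, size=n)        # independent variable
eps = rng.normal(0, 1.0, size=n)      # error term: random noise around the line
y = beta0 + beta1 * x + eps           # dependent variable

# Least-squares fit; polyfit returns the slope first, then the intercept
b1_hat, b0_hat = np.polyfit(x, y, deg=1)
print(f"estimated b0 = {b0_hat:.3f} (true {beta0})")
print(f"estimated b1 = {b1_hat:.3f} (true {beta1})")
```

Increasing n or shrinking the noise level makes the estimates cluster more tightly around the true values.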

Simple linear regression always produces the straight line that fits your data as closely as a straight line can, but it does not guarantee that the line is good enough. For example, if your data follow a curved pattern, or are widely scattered around the trend, the fitted line may explain little of the variation in y and give poor predictions.
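One common way to judge whether the fitted line is "good enough" is the coefficient of determination, R², the fraction of the variance in y that the line explains (1 means a perfect fit, 0 means the line explains nothing). The minimal sketch below computes it with NumPy for two small made-up datasets: one that lies exactly on a line, and one that follows the curve y = x². For this symmetric curve the best-fitting line is flat, so R² comes out at 0 even though y clearly depends on x. The function name `r_squared` is our own.

```python
import numpy as np

def r_squared(x, y):
    """Fit a least-squares line and return R^2 = 1 - SS_res / SS_tot."""
    slope, intercept = np.polyfit(x, y, deg=1)
    y_pred = intercept + slope * x
    ss_res = np.sum((y - y_pred) ** 2)      # variation left unexplained by the line
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total variation of y around its mean
    return 1 - ss_res / ss_tot

x = np.arange(-5, 6, dtype=float)

print(r_squared(x, 3 * x + 1))   # points exactly on a line
print(r_squared(x, x ** 2))      # curved (quadratic) relationship
```

A low R² does not prove there is no relationship; as the second case shows, it can simply mean the relationship is not a straight line.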

Simple Linear Regression vs. Multiple Linear Regression

When predicting the outcome of a complex process that depends on several factors, it's usually better to use multiple linear regression instead of simple linear regression. But there is no need to use complex models for simple problems.

A simple linear regression can accurately capture the relationship between two variables when that relationship is simple. But when the outcome depends on several interacting factors, you need to switch from simple to multiple regression.

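A multiple regression model uses more than one independent variable, so it can account for several factors at once. Because the model only needs to be linear in its coefficients, not in the variables themselves, it can also fit curved relationships when you add transformed predictors, such as x², as extra independent variables.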
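As a minimal sketch of multiple linear regression, the code below builds a design matrix with an intercept column and two predictors, then solves for the coefficients with NumPy's least-squares solver, `np.linalg.lstsq`. Here the second predictor is x² rather than a separately measured variable, which shows how a model that is linear in its coefficients can still follow a curve. The data are noise-free and invented for illustration, so the known coefficients are recovered exactly (up to floating-point error).

```python
import numpy as np

x = np.linspace(0, 5, 20)
y = 1.0 + 2.0 * x - 0.5 * x ** 2       # curved relationship, no noise (illustrative)

# Design matrix: a column of ones (intercept), x, and x^2 as a second predictor
X = np.column_stack([np.ones_like(x), x, x ** 2])

# Ordinary least squares: find beta minimizing ||X @ beta - y||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # approximately [1.0, 2.0, -0.5]
```

With real measurements, you would replace the x² column with your other independent variables (or keep both), and the same call returns one coefficient per column.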


Implementation of Simple Linear Regression Algorithm using Python

import numpy as np
import matplotlib.pyplot as plt


def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of the x and y vectors
    m_x = np.mean(x)
    m_y = np.mean(y)

    # cross-deviation of x and y, and deviation of x about its mean
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x

    # least-squares regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x

    return (b_0, b_1)


def plot_regression_line(x, y, b):
    # plot the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1] * x

    # plot the regression line
    plt.plot(x, y_pred, color="g")

    # axis labels
    plt.xlabel('x')
    plt.ylabel('y')

    # display the plot
    plt.show()


def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimate the coefficients
    b = estimate_coef(x, y)
    print(f"Estimated coefficients:\nb_0 = {b[0]}\nb_1 = {b[1]}")

    # plot the regression line
    plot_regression_line(x, y, b)


if __name__ == "__main__":
    main()
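As a quick sanity check on `estimate_coef`, you can compare its output with NumPy's built-in least-squares polynomial fit, `np.polyfit`, using degree 1. A minimal sketch, repeating the same formulas so it runs on its own without matplotlib:

```python
import numpy as np

def estimate_coef(x, y):
    # same closed-form least-squares formulas as the implementation above
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    b_1 = SS_xy / SS_xx
    return m_y - b_1 * m_x, b_1

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

b_0, b_1 = estimate_coef(x, y)
slope, intercept = np.polyfit(x, y, deg=1)   # polyfit returns the highest degree first

print(b_0, b_1)            # about 1.236 and 1.170
print(intercept, slope)    # should match to floating-point precision
assert np.isclose(b_0, intercept) and np.isclose(b_1, slope)
```

Writing out the formulas is useful for learning; in day-to-day work, `np.polyfit` or a statistics library gives the same coefficients with less code.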

Assumptions of Simple Linear Regression

Linearity

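The relationship between x and y should be linear. This means that a one-unit increase in x is associated with the same change in y (up or down) across the whole range of x. A scatterplot of y against x should show the points scattered around a straight line rather than a curve.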

Independence of Errors

It is essential to check that the errors (residuals) are independent of one another. If the residuals show a pattern, such as a trend against the fitted values or against the order in which the data were collected, this could cause problems with your model. To check the independence of errors, examine a scatterplot of “residuals versus fits”; the points should scatter randomly around 0 with no visible pattern.
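A minimal sketch of the residuals-versus-fits check, reusing the sample data from the implementation above. The helper names `residuals_vs_fits` and `plot_residuals_vs_fits` are our own. The first function needs only NumPy; the plotting helper imports matplotlib inside the function, so the calculation runs even where matplotlib is not installed.

```python
import numpy as np

def residuals_vs_fits(x, y):
    """Return (fitted values, residuals) for a least-squares line."""
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = intercept + slope * x
    residuals = y - fitted
    return fitted, residuals

def plot_residuals_vs_fits(fitted, residuals):
    import matplotlib.pyplot as plt   # imported here so the helper above needs only NumPy
    plt.scatter(fitted, residuals)
    plt.axhline(0, color="gray", linestyle="--")   # residuals should scatter evenly around 0
    plt.xlabel("fitted value")
    plt.ylabel("residual")
    plt.show()

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)

fitted, residuals = residuals_vs_fits(x, y)
for f, r in zip(fitted, residuals):
    print(f"fit = {f:6.2f}   residual = {r:+.2f}")
# plot_residuals_vs_fits(fitted, residuals)  # uncomment to draw the plot
```

With only ten points, any visual pattern is hard to judge; the check becomes more informative with larger samples. If the data were collected over time, plotting residuals against collection order is a useful second view.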

Normal Distribution

It is also essential to check that the residuals are approximately normally distributed. To do this, examine a histogram (or a normal probability plot) of the residuals: it should be roughly bell-shaped, symmetric, and centered at 0. This assumption matters most for the validity of confidence intervals and hypothesis tests on the regression coefficients.
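A minimal, NumPy-only sketch of the histogram check: it simulates data with normally distributed errors (the coefficients, noise level, sample size, and seed are arbitrary choices for illustration), fits a line, and prints a text histogram of the residuals. With normal errors, the bars should form a rough bell shape centered near 0. With your own data you would pass your residuals instead, for example to matplotlib's `plt.hist`.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 10, size=500)
y = 3.0 + 1.5 * x + rng.normal(0, 2.0, size=500)   # normal errors by construction

slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Bin the residuals and draw one row per bin (one '#' per 5 residuals)
counts, edges = np.histogram(residuals, bins=9)
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:6.2f} to {right:6.2f} | {'#' * (int(count) // 5)}")

print(f"mean residual = {residuals.mean():.2e}")   # essentially 0 for least squares with an intercept
```

A histogram that is clearly skewed or has heavy tails suggests the normality assumption does not hold.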

Variance Equality

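Finally, it is essential to check that the residuals have roughly equal variance across all values of x (homoscedasticity). To do this, examine the residuals-versus-fits scatterplot (you can also use statistics software like Minitab or Excel): the vertical spread of the points should be about the same everywhere. If the spread widens or narrows as the fitted values change (a funnel shape), or a few points have much larger residuals than the rest, the equal-variance assumption is likely violated.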
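A rough numerical companion to the visual check, sketched here as our own heuristic (similar in spirit to the Goldfeld-Quandt test, but not a formal test), is to split the observations at the median fitted value and compare the spread of the residuals in each half. A ratio near 1 is consistent with equal variance; a ratio far from 1 suggests a funnel shape. The simulated data below are made up, and the second dataset deliberately makes the noise grow with x. The function name `spread_ratio` is hypothetical.

```python
import numpy as np

def spread_ratio(x, y):
    """Residual standard deviation in the upper half of fitted values divided by the lower half."""
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = intercept + slope * x
    residuals = y - fitted
    upper = fitted > np.median(fitted)
    return residuals[upper].std() / residuals[~upper].std()

rng = np.random.default_rng(seed=2)
x = rng.uniform(1, 10, size=400)

y_equal = 2 + 0.8 * x + rng.normal(0, 1.0, size=400)        # constant noise level
y_funnel = 2 + 0.8 * x + rng.normal(0, 0.3 * x, size=400)   # noise grows with x

print(f"equal variance: {spread_ratio(x, y_equal):.2f}")
print(f"funnel-shaped:  {spread_ratio(x, y_funnel):.2f}")
```

For a formal test, statistics packages offer procedures such as the Breusch-Pagan test.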


Our Learners Also Asked

1. What is simple linear regression, and when do we use it?

Simple linear regression is a statistical method that you can use to estimate the relationship between two quantitative variables. It is most frequently used in situations where there is a linear relationship between them. 

Simple linear regression captures this relationship well when it is close to a straight line. It is less reliable when the data are very noisy or the relationship is curved.

2. What is the difference between simple regression and simple linear regression?

Regression, in general, estimates how a dependent variable changes as one or more independent variables change. The fitted relationship can be a straight line (linear regression) or a curve (for example, logistic or nonlinear regression).

Simple linear regression is the special case with exactly one independent variable and a straight-line relationship. The basic form of this model is: y = β0 + β1x + ε

3. What are the steps of simple linear regression?

In the world of statistics, linear regression analysis is a staple. But just because you know how to do it doesn't mean you understand what it's all about.

Linear regression analysis involves more than just fitting a straight line through a cloud of data points. It consists of three phases:

  1. Analyzing the correlation and directionality of the data.
  2. Estimating the model, i.e., fitting the line.
  3. Evaluating the validity and usefulness of the model.
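The three phases can be sketched in a few lines of NumPy, using the sample data from the implementation section above: compute the Pearson correlation coefficient (strength and direction), fit the line with `np.polyfit`, then evaluate the fit with R² and the residuals. This is an illustrative outline, not a complete validation procedure.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)

# 1. Correlation and direction: the sign of r gives the direction,
#    its magnitude (0 to 1) the strength of the linear association.
r = np.corrcoef(x, y)[0, 1]

# 2. Estimate the model: least-squares slope and intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# 3. Evaluate: share of variance explained, based on the residuals.
residuals = y - (intercept + slope * x)
r2 = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r2:.3f}")
```

For simple linear regression, R² equals the square of the correlation coefficient r, so phases 1 and 3 are closely linked; the residuals from phase 3 also feed the assumption checks described earlier.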

If you're performing any statistical analysis, these three phases are vital to understanding what you're doing and why it matters!

4. What is an example of simple linear regression?

Simple linear regression uses a straight line to establish the relationship between two variables: a dependent variable and an independent variable.

Independent and dependent variables are terms used to describe the relationship between two or more variables. 

One variable is called the independent variable, and its value is used to predict or explain the value of the other variable. The other variable is called the dependent variable, and its value depends on the value of the independent variable.

For example, if you wanted to predict a person's salary from their years of experience at a company, experience would be the independent variable and salary would be the dependent variable.

5. What are the assumptions of simple linear regression?

Simple linear regression is a parametric method that makes certain assumptions about the data. These assumptions are:

  1. Homoscedasticity: The variance of the errors (residuals) should be constant across the whole range of x-values.
  2. Independence of observations: The observations, and therefore their errors, should be independent of one another.
  3. Normality: The residuals should be approximately normally distributed.
  4. Linearity: The relationship between the independent and dependent variables is linear.

Conclusion

Simple linear regression is an approach for predicting a response using a single feature. It is a basic technique that can be used to analyze data from a wide range of fields.


About the Author

Simplilearn

Simplilearn is one of the world’s leading providers of online training for Digital Marketing, Cloud Computing, Project Management, Data Science, IT, Software Development, and many other emerging technologies.
