Regression is a tool that allows you to estimate how the dependent variable changes as the independent variable(s) change.

Regression models describe the relationship between variables by fitting a line to the observed data. Linear regression models use a straight line, while logistic and nonlinear regression models use a curved line.

Regression models can be used for many purposes:

  • Evaluating the effect of an independent variable on a dependent variable.
  • Forecasting future values of the dependent variable based on prior observations of both variables.

What Is Simple Linear Regression?

Simple linear regression is a statistical method for establishing the relationship between two variables using a straight line. The line is defined by its slope and intercept, which are chosen to minimize the regression errors (typically the sum of squared differences between the observed and predicted values).

Simple linear regression has exactly one x variable and one y variable. The x variable is the independent variable: it is the input you use to make a prediction, and its value does not depend on y. The y variable is the dependent variable: it is the quantity you try to predict, and its value depends on x.

y = β0 + β1x + ε is the formula used for simple linear regression.

  • y is the value of the dependent variable for any given value of the independent variable (x).
  • β0 is the intercept, the predicted value of y when x is 0.
  • β1 is the regression coefficient (the slope): how much we expect y to change when x increases by one unit.
  • x is the independent variable (the variable we expect is influencing y).
  • ε is the error term: the part of y that the straight line does not explain.
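To make the roles of the terms concrete, here is a minimal sketch that builds a few observations from the formula. The parameter values, the noise level, and all variable names are assumptions chosen only for illustration; they do not come from a real data set.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
beta_0 = 2.0  # intercept: predicted y when x is 0
beta_1 = 0.5  # slope: expected change in y per one-unit increase in x

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Deterministic part of the model: the regression line itself.
y_line = beta_0 + beta_1 * x

# Error term: random noise with mean 0 (seeded so the run is repeatable).
rng = np.random.default_rng(0)
epsilon = rng.normal(loc=0.0, scale=0.3, size=x.size)

# Observed values scatter around the line.
y_observed = y_line + epsilon

print("line:    ", y_line)
print("observed:", y_observed)
```

The difference between the two printed rows is exactly the error term, which is the part of y the line leaves unexplained.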

Simple linear regression always produces the straight line that fits your data most closely, but it does not guarantee that the line is good enough. For example, if your data points have an upward trend but are widely scattered, the fitted line will still slope upward, yet many points will lie far from it and predictions based on the line will be unreliable.
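One common way to judge whether a fitted line is good enough is the coefficient of determination, R squared, which is the share of the variation in y that the line explains. The sketch below is an illustration under stated assumptions: `fit_and_score` is a hypothetical helper, `y_tight` is a small example data set, and `y_scattered` is made-up data that follows an upward trend with a large alternating disturbance added.

```python
import numpy as np

def fit_and_score(x, y):
    """Return (slope, intercept, r_squared) for a least-squares line."""
    slope, intercept = np.polyfit(x, y, deg=1)
    y_pred = intercept + slope * x
    ss_res = np.sum((y - y_pred) ** 2)      # variation the line misses
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total variation in y
    return slope, intercept, 1.0 - ss_res / ss_tot

x = np.arange(10, dtype=float)

# Points that stay close to a straight line.
y_tight = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)

# Made-up data: upward trend (y = x) plus a large alternating disturbance.
y_scattered = x + 6.0 * np.where(np.arange(10) % 2 == 0, 1.0, -1.0)

slope_t, _, r2_t = fit_and_score(x, y_tight)
slope_s, _, r2_s = fit_and_score(x, y_scattered)

print(f"tight:     slope = {slope_t:.3f}, R^2 = {r2_t:.3f}")
print(f"scattered: slope = {slope_s:.3f}, R^2 = {r2_s:.3f}")
```

Both fitted slopes are positive, but the scattered data yields a much lower R squared, which signals that the line explains little of the variation.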

Simple Linear Regression vs. Multiple Linear Regression

When an outcome depends on several factors, multiple linear regression is usually a better choice than simple linear regression. For simple problems, however, a more complex model is unnecessary.

Simple linear regression can accurately capture the relationship between two variables when that relationship is close to a straight line. When the outcome is driven by several variables at once, you need to switch from simple to multiple regression.

A multiple regression model uses more than one independent variable, so it is not limited to a single predictor. It is still linear in its coefficients: it can fit a curved relationship only when transformed terms, such as x squared, are included as additional independent variables.
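As a rough sketch of what "more than one independent variable" looks like in practice, the example below fits a model with two predictors using NumPy's least-squares solver. The data is made up and generated exactly from y = 1 + 2*x1 + 3*x2, so the recovered coefficients can be checked by eye; the names `x1`, `x2`, and `design` are assumptions for this illustration.

```python
import numpy as np

# Made-up predictors; y is generated exactly from y = 1 + 2*x1 + 3*x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a column of ones for the intercept, then one column per predictor.
design = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution for [intercept, coefficient of x1, coefficient of x2].
coef, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)

print("intercept, b1, b2 =", np.round(coef, 6))
```

Because the data contains no noise, the solver recovers the intercept and both coefficients used to generate it. With real data the estimates would only approximate the underlying relationship.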

Implementation of Simple Linear Regression Algorithm using Python

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
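As a sanity check, the closed-form estimates can be compared with NumPy's built-in polynomial fit. This is a sketch rather than part of the original program: it repeats the same formulas inline on the same data, skips the plot, and uses `np.polyfit` with degree 1, which returns the slope first and the intercept second.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Same closed-form formulas as estimate_coef.
n = np.size(x)
m_x, m_y = np.mean(x), np.mean(y)
SS_xy = np.sum(y * x) - n * m_y * m_x
SS_xx = np.sum(x * x) - n * m_x * m_x
b_1 = SS_xy / SS_xx
b_0 = m_y - b_1 * m_x

# Independent check: degree-1 least-squares fit.
slope, intercept = np.polyfit(x, y, deg=1)

print("closed form:", b_0, b_1)
print("np.polyfit: ", intercept, slope)
print("match:", np.isclose(b_0, intercept) and np.isclose(b_1, slope))
```

Both approaches should agree up to floating-point rounding, since each computes the ordinary least-squares line.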

Assumptions of Simple Linear Regression

Linearity

The relationship between x and y should be linear: as x increases, y changes at a roughly constant rate, whether upward or downward. A scatterplot of y against x should show this linearity.

Independence of Errors

It is essential to check that the errors are independent of one another. If the residuals are related to each other or to the fitted values, this could cause problems with your model. To check the independence of errors, examine a scatterplot of “residuals versus fits”; it should not show any pattern.
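Alongside the residuals-versus-fits plot, a numeric check that is sometimes used when the observations have a natural order is the Durbin-Watson statistic: the sum of squared differences between consecutive residuals divided by the sum of squared residuals. It lies between 0 and 4, and values near 2 suggest little first-order correlation between neighboring residuals. The sketch below computes it by hand with NumPy on a small example data set; it is an illustration, not a full diagnostic.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)

# Fit the line and compute residuals (observed minus fitted).
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted

# Durbin-Watson: squared changes between neighboring residuals,
# relative to the overall size of the residuals.
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

print(f"Durbin-Watson statistic: {dw:.2f}")
```

With only ten points the statistic is a rough indicator at best, so it is worth reading together with the plot.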

Normal Distribution

It is also essential to check that the residuals are normally distributed. To do this, examine a histogram of the residuals; it should be approximately bell-shaped, with most residuals close to 0 and fewer at the extremes. This helps ensure that confidence intervals and significance tests based on the model are reliable.
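A histogram of the residuals can be produced without any plotting library. The following sketch, on a small example data set, bins the residuals with `np.histogram` and prints a text histogram; the choice of five bins is an assumption made only for this illustration.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)

# Residuals of the least-squares line.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Count residuals per bin; five bins is an arbitrary choice for ten points.
counts, edges = np.histogram(residuals, bins=5)

for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"[{left:6.2f}, {right:6.2f}] {'#' * int(count)}")

print(f"mean of residuals: {residuals.mean():.2e}")
```

When the model includes an intercept, the residuals of a least-squares fit average to zero, so the histogram should be centered near 0. With so few observations the shape is only suggestive.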

Variance Equality

Finally, it is essential to check that the residuals have equal variance across the range of x. To do this, examine a scatterplot of the residuals and look for outliers or for regions where the points are spread much farther apart than elsewhere (you can also use statistics software like Minitab or Excel). If some regions show much higher variance than others, the equal-variance assumption is violated.
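A crude numeric version of this check is to split the residuals into a low-x half and a high-x half and compare their variances. The sketch below uses made-up data in which the disturbance grows with x, so the ratio comes out well above 1; the split point and the data are assumptions for illustration, and this is not a formal statistical test.

```python
import numpy as np

# Made-up data: y = 2x plus a disturbance whose size grows with x.
x = np.arange(1, 21, dtype=float)
signs = np.where(np.arange(x.size) % 2 == 0, 1.0, -1.0)
y = 2.0 * x + 0.5 * x * signs

# Residuals of the least-squares line.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Compare the spread of residuals in the lower and upper halves of x.
half = x.size // 2
var_low = np.var(residuals[:half])
var_high = np.var(residuals[half:])
ratio = var_high / var_low

print(f"variance (low x):  {var_low:.2f}")
print(f"variance (high x): {var_high:.2f}")
print(f"ratio high/low:    {ratio:.2f}")
```

A ratio close to 1 is consistent with equal variance, while a ratio far from 1, as in this constructed example, points to unequal variance.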

Our Learners Also Asked

1. What is simple linear regression, and when do we use it?

Simple linear regression is a statistical method that you can use to estimate the relationship between two quantitative variables. It is most frequently used in situations where there is a linear relationship between them. 

Simple linear regression may capture this relationship well when it is straightforward and clear-cut. Still, it may not be able to do so when the data are noisy or otherwise difficult to interpret.

2. What is the difference between simple regression and simple linear regression?

Regression is the general tool for estimating how a dependent variable changes as one or more independent variables change. Linear regression models fit a straight line to the observed data, while logistic and nonlinear regression models fit a curved line.

Simple linear regression is the special case with only one independent variable and a straight-line relationship. The basic form of this model is: y = β0 + β1x + ε

3. What are the steps of simple linear regression?

In the world of statistics, linear regression analysis is a staple. But just because you know how to do it doesn't mean you understand what it's all about.

Linear regression analysis involves more than just fitting a straight line through a cloud of data points. It consists of three phases:

  1. Analyzing the correlation and directionality of the data.
  2. Estimating the model, i.e., fitting the line.
  3. Evaluating the validity and usefulness of the model.
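The three phases can be sketched in a few lines of NumPy. This is a minimal illustration on a small example data set: it uses the Pearson correlation coefficient for the first phase, a degree-1 least-squares fit for the second, and R squared for the third.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)

# Phase 1: correlation and direction (the sign of r gives the direction).
r = np.corrcoef(x, y)[0, 1]

# Phase 2: estimate the model, i.e. fit the line.
slope, intercept = np.polyfit(x, y, deg=1)

# Phase 3: evaluate the fit with R squared.
y_pred = intercept + slope * x
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"correlation r = {r:.3f}")
print(f"line: y = {intercept:.3f} + {slope:.3f} * x")
print(f"R^2 = {r2:.3f}")
```

In simple linear regression, R squared equals the square of the correlation coefficient, which gives a quick consistency check between the first and third phases.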

If you're performing any statistical analysis, these three phases are vital to understanding what you're doing and why it matters!

4. What is an example of simple linear regression?

Using a straight line, simple linear regression establishes the relationship between two variables: one dependent and one independent.

Independent and dependent variables are terms used to describe the roles that variables play in a relationship.

One variable is called the independent variable, and its value is used to explain or predict the value of the other variable. The other variable is called the dependent variable, and its value depends on the value of the independent variable.

For example, if you wanted to know what a person's salary would be based on their experience with a company, then experience would be the independent variable, and salary would be the dependent variable.
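The salary example can be sketched as follows. The figures are entirely hypothetical and exist only to show the mechanics: years of experience is the independent variable, salary is the dependent variable, and the fitted line is used to predict the salary at six years.

```python
import numpy as np

# Hypothetical data, invented for illustration only.
years = np.array([1.0, 2.0, 3.0, 4.0, 5.0])                      # independent variable
salary = np.array([35000.0, 41000.0, 44000.0, 51000.0, 54000.0])  # dependent variable

# Fit salary = intercept + slope * years by least squares.
slope, intercept = np.polyfit(years, salary, deg=1)

# Predict the salary for someone with six years of experience.
predicted = intercept + slope * 6.0

print(f"slope (salary change per extra year): {slope:.0f}")
print(f"intercept (predicted salary at 0 years): {intercept:.0f}")
print(f"predicted salary at 6 years: {predicted:.0f}")
```

For this made-up data the fitted slope is 4800 per year and the intercept is 30600, giving a predicted salary of 59400 at six years. Predicting beyond the observed range of experience assumes that the straight-line trend continues.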

5. What are the assumptions of simple linear regression?

Simple linear regression is a parametric method that makes certain assumptions about the data. These assumptions are:

  1. Homoscedasticity: The variance of the residuals should be constant throughout the range of x-values.
  2. Independence of observations: Each observation should be independent of all other observations in the sample.
  3. Normality: The residuals should be approximately normally distributed.
  4. Linearity: The relationship between the independent and dependent variables should be linear.

Conclusion

Simple linear regression is an approach for predicting a response using a single feature. It is a basic technique that can be used to analyze data from a wide range of fields.
