Some machine learning algorithms learn from data that pairs input values with their corresponding output values. These are called supervised learning algorithms, because the output values provided in the data guide what the model learns to predict. Two of the most commonly used supervised learning algorithms are Linear Regression and Logistic Regression. In this tutorial, titled ‘Understanding the difference between Linear vs. Logistic Regression’, you will see how these two algorithms work and how they differ.
What Is Regression?
Regression is a statistical method that allows you to predict a dependent output variable based on the values of independent input variables.
Regression, a type of supervised learning, finds the relationship between input and output values and uses it to predict the output value for new input data. In its simplest form, linear regression, it does this by finding a linear mathematical relationship between the inputs and the output. A regression model can have multiple inputs but has a single numeric output.
You can understand regression better using the diagram below. Think of the input variables as grocery ingredients and the output as a new dish. Regression acts as the recipe: it describes how these variables go together and how they relate to one another.
Figure 1: Regression
What Is Classification?
Classification allows you to divide a given input into predefined categories. The output is a discrete value, such as 0/1, True/False, or a predefined class label.
Simply put, classification is the process of sorting objects into groups. It is a supervised learning method in which input data is assigned to output classes. It learns a mapping function that converts input values into known, discrete output classes. It can have multiple inputs, and its output can be any one of several classes.
The diagram below clearly explains classification. Given a list of grocery items, you can separate them into different categories like vegetables, fruits, dairy products, groceries, etc., using classification.
Figure 2: Classification
What Is Linear Regression?
Consider the data points given below. The input variables, X, are called independent variables and are used to predict the response values. They are called independent because their values are not determined by the other variables in the model. The output variable, Y, is called the dependent variable because its value depends on the value of X. It is predicted by deriving a relationship between the input and output variables.
Figure 3: Data
Linear Regression finds the relationship between the input and output data by fitting a line through the data points that maps inputs onto outputs. This line represents the mathematical relationship between the independent input variables and the dependent output variable, and is called the Line of Best Fit. It is the line that passes as close as possible to the data points overall, capturing the general trend rather than every individual point, since some of the variation is noise. For your given data, the best fit is a straight line.
Figure 4: Linear Regression line of best fit
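To make "best fit" concrete: the most common criterion, ordinary least squares (OLS), scores a candidate line by the sum of squared vertical distances between each data point and the line, and the best-fit line is the one with the smallest score. The minimal Python sketch below compares two candidate lines on a small, made-up dataset; the data and the function name `sum_squared_errors` are hypothetical, for illustration only.

```python
def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared vertical distances between each point and the line Y = b0 + b1*X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data points, invented for illustration
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 9]

# Candidate 1 is the least-squares line for this data; candidate 2 is hand-picked
print(sum_squared_errors(xs, ys, b0=1.0, b1=1.9))  # about 0.70
print(sum_squared_errors(xs, ys, b0=0.0, b1=2.0))  # 3.0
```

The line with the lower total squared error fits the data better, and no other straight line scores lower than the first candidate on this data.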
The line is described by the equation of a straight line, which gives the output variable in terms of the input variable, the point where the line crosses the y-axis, and the inclination (slope) of the line. The line can be found using the following equation:
Figure 5: Equation of a straight line
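Since Figure 5 is an image, here is the same relationship written out. This is the standard form of simple linear regression, assumed here to match the figure because it uses exactly the terms defined below:

```latex
Y = b_0 + b_1 X + e
```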
In the above equation,
Y = Dependent Output Variable
X = Independent Input Variable
b0 = Y-intercept, or the point at which the line meets the y-axis
b1 = Slope, or the inclination of the line
e = Error term
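For simple linear regression, b0 and b1 can be estimated directly with the ordinary least squares formulas: b1 is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x)², and b0 = mean y - b1 · mean x. The sketch below assumes OLS is the fitting method (the text does not name one) and uses hypothetical data; `fit_line` is an illustrative name, not a library function. The residuals it prints are the estimated values of the error term e.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates of b0 (intercept) and b1 (slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # b1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean  # the fitted line passes through (x_mean, y_mean)
    return b0, b1

# Hypothetical data, invented for illustration
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 9]

b0, b1 = fit_line(xs, ys)
# Residuals: actual Y minus predicted Y, i.e. estimates of the error term e
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")   # b0 = 1.00, b1 = 1.90
print([round(r, 2) for r in residuals])  # [0.1, 0.2, -0.7, 0.4]
```

When the model includes an intercept, OLS residuals always sum to zero (up to floating-point rounding), which is a quick sanity check on the fit.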
Using this line, you can find the output value for a given input: start at the input value on the X-axis, move up to the line of best fit, and read off the corresponding value on the Y-axis. Consider the data displayed below, which shows the sales corresponding to the amount spent on advertising.
Figure 6: Advertising data
Using Linear Regression, you can plot the graph of Sales vs. Advertising, find the line of best fit between them, and use that line to predict sales for advertising amounts that are not in the data.
Figure 7: Sales vs Advertising
Using regression, given an advertising amount, you can predict the resulting sales.
Figure 8: Prediction using Linear regression
What Is Logistic Regression?
Logistic Regression is a classification algorithm used to predict the category of a dependent variable based on the values of the independent variables. It outputs a probability between 0 and 1, which is then converted into a class label of 0 or 1.
In Logistic Regression, the output is categorical, which means many different input values map onto the same output value. Using Logistic Regression, you can find the category that a new input value belongs to. Unlike Linear Regression, Logistic Regression does not assume a linear relationship between the inputs and the output value itself; instead, it assumes a linear relationship between the inputs and the log-odds of the output. Consider the data below, which shows input data mapped onto two output categories, 0 and 1.
Figure 9: Logistic Regression data
When the data is plotted, Logistic Regression fits an S-shaped curve that represents the relationship between the points and connects the two output classes. To classify values into these two categories, you need to set a threshold value between them.
Figure 10: Setting a threshold
Each input is then mapped onto a category depending on where its predicted value falls relative to the threshold. Values of Y above the threshold are classified as category 1, and values below it are classified as category 0.
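A minimal sketch of the thresholding step, assuming the model has already produced a probability for each input and that the common default threshold of 0.5 is used. A value exactly at the threshold is treated as class 1 here, which is a convention choice; `classify` and the probabilities are hypothetical.

```python
def classify(probabilities, threshold=0.5):
    """Map each predicted probability to class 1 if it reaches the threshold, else class 0."""
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical probabilities produced by a fitted logistic curve
probs = [0.05, 0.40, 0.50, 0.73, 0.98]

print(classify(probs))        # [0, 0, 1, 1, 1]
print(classify(probs, 0.75))  # [0, 0, 0, 0, 1]
```

Raising the threshold makes the classifier more conservative about predicting class 1, which matters when the two kinds of mistakes have different costs.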
Logistic Regression finds the relationship between the points by fitting a curve between the output classes. This curve is called a sigmoid, and it is represented by the sigmoid function. In the equation, Y is the probability that the output belongs to class 1, e is Euler's number (the base of the natural logarithm), X is the independent input variable, b0 is the intercept, and b1 is the coefficient of X.
Figure 12: Sigmoid Function
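Since Figure 12 is an image, here is the standard sigmoid used in logistic regression with a single input, assumed to match the figure and using the same Y, X, b0, and b1. When b0 + b1X is large and positive, the exponential term approaches 0 and Y approaches 1; when it is large and negative, Y approaches 0. A threshold of 0.5 corresponds exactly to b0 + b1X = 0.

```latex
Y = \frac{1}{1 + e^{-(b_0 + b_1 X)}}
```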
Suppose you have credit card numbers and their transaction histories, and you want to classify each card as fraudulent or legitimate. Because the output is one of two categories, you can do this with Logistic Regression.
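The sketch below trains a one-feature logistic regression from scratch with batch gradient descent on the log loss, then applies a 0.5 threshold. Everything here is an assumption for illustration: the feature (transactions per hour), the tiny dataset, the learning rate and epoch count, and the helper names. A real fraud model would use many features and a library implementation such as scikit-learn's `LogisticRegression`. The feature is standardized before training, which keeps gradient descent well behaved; the same scaling must then be applied to new inputs, so the stored mean and standard deviation are returned with the coefficients.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x_scaled) by batch gradient descent
    on the mean log loss. Returns (b0, b1, mean, std)."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    zs = [(x - mean) / std for x in xs]  # standardize the single feature
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        # Gradient of the mean log loss: average of (p - y) and of (p - y) * z
        errors = [sigmoid(b0 + b1 * z) - y for z, y in zip(zs, ys)]
        grad_b0 = sum(errors) / n
        grad_b1 = sum(err * z for err, z in zip(errors, zs)) / n
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1, mean, std

def predict(x, model, threshold=0.5):
    """Return (probability of class 1, predicted class) for a new input x."""
    b0, b1, mean, std = model
    p = sigmoid(b0 + b1 * (x - mean) / std)
    return p, 1 if p >= threshold else 0

# Hypothetical data: transactions per hour on a card (x) and whether the card
# was fraudulent (1) or legitimate (0). Invented for illustration only.
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

model = train_logistic_regression(xs, ys)
for x in [2, 4.5, 8]:
    p, label = predict(x, model)
    print(f"x={x}: P(fraud)={p:.3f} -> class {label}")
```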
Linear vs. Logistic Regression: Differences
The table below lists the differences between these two supervised learning algorithms.
Table 1: Linear vs. Logistic Regression
In this tutorial, titled ‘Understanding the difference between Linear vs. Logistic Regression’, you looked at the definitions of regression and classification. You then learned about Linear Regression, a regression algorithm, and Logistic Regression, a classification algorithm. Finally, you explored the differences between these two algorithms.
We hope this helped you understand the difference between Linear and Logistic Regression.