This is the ‘Regression’ tutorial and is part of the Machine Learning course offered by Simplilearn. We will learn Regression and Types of Regression in this tutorial.
Let us look at the objectives below covered in this Regression tutorial.
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables.
Let us look at the types of Regression below:
Linear Regression is a statistical model used to predict the relationship between independent and dependent variables. It examines two questions: first, which variables in particular are significant predictors of the outcome variable; and second, how well the regression line fits the data, so that predictions can be made with the highest possible accuracy.
Linear Regression Formula
Linear regression is a linear approach for modeling the relationship between a scalar dependent variable y and an independent variable x.
where x and y are vectors of real numbers and w is a vector of weight parameters. The equation is also written as: y = wx + b, where b is the bias, that is, the value of the output for zero input.
Let's look at a Linear Regression example:
If you had to invest in a company, you would definitely like to know how much money you could expect to make. Let’s take a look at a venture capitalist firm and try to understand which companies they should invest in. First, we need to figure out:
Now that we have our company’s data for different expenses, marketing, location and the kind of administration, we would like to calculate the profit based on all this different information.
Let's consider a single variable, R&D expenditure, and find out which companies to invest in. We plot each company's profit against the amount of money it puts into research and development.
We then draw a line through the data; the line shows how much profit to expect for a given R&D investment. We can also observe that companies that spend more on R&D make higher profits, so we invest in the ones that spend more on R&D.
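The idea of drawing a line through the data can be sketched as follows. This is a minimal illustration, not the tutorial's own code: the R&D and profit figures are hypothetical values invented for the example, and `fit_line` is a helper name introduced here. It computes the least-squares slope w and intercept b of y = wx + b for a single variable.

```python
# Hypothetical figures (in thousands of dollars), invented purely for illustration.
rd_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
profit = [25.0, 45.0, 65.0, 85.0, 105.0]


def fit_line(xs, ys):
    """Return slope w and intercept b of the least-squares line y = w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


w, b = fit_line(rd_spend, profit)

# Use the fitted line to estimate profit for a new R&D spend of 60.
predicted_profit = w * 60.0 + b
print(w, b, predicted_profit)
```

A positive slope w corresponds to the observation above that higher R&D spending goes along with higher profit.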
A few applications of Linear Regression are mentioned below:
Multiple Linear Regression is a statistical technique used to predict the outcome of a response variable through several explanatory variables and to model the relationships between them. It fits a linear function between multiple inputs and one output, typically:
$y = w_1 x_1 + w_2 x_2 + b$
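As a rough sketch of fitting this two-input model, the example below recovers w1, w2 and b with NumPy's least-squares solver. The data is synthetic, generated from assumed weights (3, 2) and bias 1 chosen only for the illustration.

```python
import numpy as np

# Synthetic data generated from y = 3*x1 + 2*x2 + 1 (illustrative values only).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0

# Append a column of ones so the bias b is learned as one more weight.
X_with_bias = np.hstack([X, np.ones((X.shape[0], 1))])

# Least-squares solution for [w1, w2, b].
coeffs, *_ = np.linalg.lstsq(X_with_bias, y, rcond=None)
w1, w2, b = coeffs
print(w1, w2, b)
```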
Polynomial regression is applied when the data does not follow a straight line. It fits a linear model to non-linear data by creating new features from powers of the original features. Example: quadratic features
$x_2' = x_2^2$
$y = w_1 x_1 + w_2 x_2^2 + 6 = w_1 x_1 + w_2 x_2' + 6$
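The quadratic-feature trick can be sketched like this. The data is synthetic and the weights 2 and 0.5 are assumptions made for the example; only the bias 6 mirrors the equation above. Once the new feature x2' is created, the fit is an ordinary linear least-squares problem.

```python
import numpy as np

# Synthetic data following y = 2*x1 + 0.5*x2^2 + 6 (illustrative values only).
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, -2.0, 3.0, 0.0, 2.0, -1.0])
y = 2.0 * x1 + 0.5 * x2 ** 2 + 6.0

# Create the new feature x2' = x2^2; the model stays linear in its weights.
x2_prime = x2 ** 2
features = np.column_stack([x1, x2_prime, np.ones_like(x1)])

# Ordinary linear least squares on the transformed features.
weights, *_ = np.linalg.lstsq(features, y, rcond=None)
w1, w2, bias = weights
print(w1, w2, bias)
```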
A decision tree is a graphical representation of all the possible solutions to a decision based on a few conditions.
The algorithms involved in Decision Tree Regression are mentioned below.
Decision Trees can perform regression tasks. The following is a decision tree on a noisy quadratic dataset:
Let us look at the steps to perform Regression using Decision Trees.
The regression plot is shown below. Notice that the predicted value for each region is the average of the target values of the instances in that region.
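The averaging behaviour can be checked with a short sketch. It assumes scikit-learn's `DecisionTreeRegressor` and a synthetic noisy quadratic dataset generated here for illustration; it is not the dataset behind the original plot.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy quadratic data, generated here only for illustration.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = 4.0 * (X[:, 0] - 0.5) ** 2 + rng.normal(0.0, 0.1, size=200)

tree = DecisionTreeRegressor(max_depth=2, random_state=42)
tree.fit(X, y)

# apply() returns the index of the leaf (region) each instance falls into.
leaf_ids = tree.apply(X)
predictions = tree.predict(X)

# For every leaf, compare the prediction with the mean target of its instances.
for leaf in np.unique(leaf_ids):
    in_leaf = leaf_ids == leaf
    print(leaf, predictions[in_leaf][0], y[in_leaf].mean())
```

Each printed row should show the same value twice: the tree's prediction for that region and the mean of the training targets that fall in it.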
Let us understand Regularization in detail below.
Given below are some of the features of Regularization.
The table below explains some of the regularization hyperparameters and what they do.
Hyperparameter | Function |
---|---|
max_depth | limit the maximum depth of the tree |
min_samples_split | the minimum number of samples a node must have before it can be split |
min_samples_leaf | the minimum number of samples a leaf node must have |
min_weight_fraction_leaf | same as min_samples_leaf but expressed as a fraction of total instances |
max_leaf_nodes | maximum number of leaf nodes |
max_features | maximum number of features that are evaluated for splitting at each node |
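A minimal sketch of how two of these hyperparameters constrain a tree, assuming scikit-learn's `DecisionTreeRegressor` (whose parameter names match the table). The dataset is synthetic and the specific values `max_depth=4` and `min_samples_leaf=10` are arbitrary choices for the illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy quadratic data, generated here only for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = 4.0 * (X[:, 0] - 0.5) ** 2 + rng.normal(0.0, 0.1, size=200)

# No restrictions: the tree keeps splitting until it fits the noise.
unrestricted = DecisionTreeRegressor(random_state=0).fit(X, y)

# Regularized: cap the depth and require at least 10 samples per leaf.
regularized = DecisionTreeRegressor(
    max_depth=4, min_samples_leaf=10, random_state=0
).fit(X, y)

print(unrestricted.get_n_leaves(), regularized.get_n_leaves())
print(unrestricted.get_depth(), regularized.get_depth())
```

The regularized tree ends up with far fewer leaves, which is what makes it less prone to fitting noise.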
To perform a regression task, the CART algorithm follows the same logic as in classification; however, instead of trying to minimize the leaf impurity, it tries to minimize the MSE (mean squared error), which measures the squared difference between the observed and target output, $(y - y')^2$.
$J(k, t_k)$ represents the total loss function that one wishes to minimize. It is the sum of the MSE of the left and right nodes after the split, each weighted by its number of samples.
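The weighted split cost can be sketched in plain Python. This is a simplified illustration for a single feature (so k is fixed and only the threshold varies); `mse`, `split_cost` and `best_threshold` are helper names introduced here, and the toy data is made up.

```python
def mse(values):
    """Mean squared error of a node that predicts the mean of its values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def split_cost(xs, ys, threshold):
    """Weighted MSE of the left and right nodes for the split x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    m = len(ys)
    return len(left) / m * mse(left) + len(right) / m * mse(right)


def best_threshold(xs, ys):
    """Try midpoints between consecutive sorted x values; keep the cheapest."""
    points = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return min(candidates, key=lambda t: split_cost(xs, ys, t))


# Toy data with an obvious jump between x = 3 and x = 10 (illustrative only).
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]

threshold = best_threshold(xs, ys)
print(threshold, split_cost(xs, ys, threshold))
```

A full CART implementation would repeat this search over every feature k and then recurse into the two child nodes.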
Ensemble Learning uses the same algorithm multiple times or a group of different algorithms together to improve the prediction of a model. Random Forests use an ensemble of decision trees to perform regression tasks.
A random decision forest is a method that operates by constructing multiple decision trees. For classification, the random forest takes the decision of the majority of the trees as the final decision; for regression, it averages the predictions of the trees.
Let us look at the applications of Random Forest below:
Remote Sensing
Used in the ETM devices to look at images of the Earth's surface. The accuracy is higher and training time is less than many other machine learning tools.
Object Detection
Multi-class object detection is done using random forest algorithms and it provides a better detection in complicated environments.
Kinect
Kinect uses a random forest as part of the game console: it tracks body movements and recreates them in the game.
Let us look at the Algorithm steps for Random Forest below.
1. Pick any random K data points from the dataset
2. Build a decision tree from these K points
3. Choose the number of trees you want (N) and repeat steps 1 and 2
4. For a new data point, average the value of y predicted by all the N trees. This is the predicted value.
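The steps above can be sketched as follows. This is a simplified illustration rather than a production random forest: it assumes scikit-learn's `DecisionTreeRegressor` for the individual trees, draws the K points with replacement (an assumption; the steps do not say), and uses synthetic data. `fit_forest` and `predict_forest` are names introduced for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_forest(X, y, n_trees=20, k=None, seed=0):
    """Build n_trees decision trees, each on k randomly drawn data points."""
    rng = np.random.default_rng(seed)
    k = len(X) if k is None else k
    trees = []
    for _ in range(n_trees):  # repeat the first two steps N times
        # Pick K random data points (assumption: sampled with replacement).
        idx = rng.integers(0, len(X), size=k)
        # Build a decision tree from these K points.
        tree = DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X_new):
    """Average the value of y predicted by all the trees."""
    return np.mean([tree.predict(X_new) for tree in trees], axis=0)


# Synthetic noisy quadratic data, generated here only for illustration.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(300, 1))
y = 4.0 * (X[:, 0] - 0.5) ** 2 + rng.normal(0.0, 0.1, size=300)

trees = fit_forest(X, y, n_trees=20, k=100)
prediction = predict_forest(trees, np.array([[0.5]]))
print(prediction)
```

In practice scikit-learn's `RandomForestRegressor` packages the same idea and additionally randomizes the features considered at each split.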
Mean-squared error (MSE) is used to measure the performance of a model.
The objective is to design an algorithm that decreases the MSE by adjusting the weights w during the training session.
The above function is also called the LOSS FUNCTION or the COST FUNCTION. The value needs to be minimized.
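As a small illustration, the MSE can be computed as the average of the squared differences between the target values and the model's outputs. The function name and the sample numbers below are made up for the example.

```python
def mean_squared_error(targets, outputs):
    """Average of squared differences between target and model output."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(targets, outputs)]
    return sum(squared_errors) / len(targets)


# Illustrative values only.
targets = [3.0, 5.0, 7.0]
close_outputs = [3.0, 5.0, 8.0]   # off by 1 on a single sample
poor_outputs = [0.0, 0.0, 0.0]    # far from every target

mse_close = mean_squared_error(targets, close_outputs)
mse_poor = mean_squared_error(targets, poor_outputs)
print(mse_close, mse_poor)
```

The better model has the lower MSE, which is why training aims to drive this value down.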
Find parameters θ that minimize the ordinary least squares (OLS) equation, also called the Loss Function:
This decreases the difference between observed output [h(x)] and desired output [y].
There are two ways to learn the parameters:
Normal Equation: Set the derivative (slope) of the Loss function to zero (this represents minimum error point).
LMS Algorithm: The minimization of the MSE loss function, in this case, is called the LMS (least mean squares) rule or Widrow-Hoff learning rule. This typically uses the Gradient Descent algorithm.
To minimize $MSE_{train}$, solve for the point where the gradient (or slope) with respect to the weights w is 0.
This can be simplified as: $w = (X^T X)^{-1} X^T y$. This is called the Normal Equation.
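A minimal NumPy sketch of the Normal Equation, using synthetic data generated from an assumed line y = 2x + 1. The column of ones is added so the bias is learned as part of w. Solving the linear system is used here instead of forming the inverse explicitly, which gives the same result with better numerical behaviour.

```python
import numpy as np

# Synthetic data generated from y = 2*x + 1 (illustrative values only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: a column of ones (bias) next to the input column.
X = np.column_stack([np.ones_like(x), x])

# Normal Equation: solve (X^T X) w = X^T y for w.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # w[0] is the bias, w[1] is the slope
```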
In the case of Linear Regression, the hypotheses are represented as:
Where the θi are parameters (or weights). θ0 can also be represented as θ0*x0 where x0 = 1, so:
The cost function defined here (also called Ordinary Least Squares or OLS) is essentially the MSE. The ½ is there only to cancel the 2 that appears when the derivative is taken; it does not change where the minimum lies.
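For reference, the hypothesis and cost function described here are commonly written in the following standard form. This is an assumed reconstruction, since the original equations are not shown; n denotes the number of features and m the number of training samples.

$$h_\theta(x) = \sum_{i=0}^{n} \theta_i x_i = \theta^T x, \qquad x_0 = 1$$

$$J(\theta) = \frac{1}{2} \sum_{j=1}^{m} \left( h_\theta(x^{(j)}) - y^{(j)} \right)^2$$

Differentiating with respect to one parameter shows how the 2 from the square cancels the ½:

$$\frac{\partial J(\theta)}{\partial \theta_i} = \sum_{j=1}^{m} \left( h_\theta(x^{(j)}) - y^{(j)} \right) x_i^{(j)}$$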
It is advisable to start with random θ. Then repeatedly adjust θ to make J(θ) smaller.
Gradient descent is an algorithm used to minimize the loss function.
The J(θ) in dJ(θ)/dθ represents the cost function or error function that you wish to minimize, for example, OLS or $(y - y')^2$.
Minimizing this means that y' approaches y. In other words, the observed output approaches the expected output. Other examples of loss or cost functions include cross-entropy, that is, $-y \log(y')$, which also tracks the difference between y and y'.
Steps required to plot a graph are mentioned below.
Calculate the derivative term for one training sample (x, y) to begin with.
Update rule for one training sample
The algorithm moves from outward to inward to reach the minimum error point of the loss function bowl.
Extend the rule for more than one training sample:
In stochastic gradient descent (also called incremental gradient descent), one updates the parameters after each training sample is processed.
The graph shows how the weight adjustment with each learning step brings down the cost or the loss function until it converges to a minimum cost.
An epoch refers to one complete pass over the training data in the model training loop.
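The per-sample update and the notion of an epoch can be sketched in plain Python. This is an illustrative implementation of the LMS update θi := θi + α (y − h(x)) xi for a single input feature. The learning rate, the epoch count and the data are assumptions made for the example, and θ starts at zero rather than at random values so the run is reproducible.

```python
def sgd_linear_regression(samples, learning_rate=0.05, epochs=500):
    """Fit h(x) = theta0 + theta1 * x with one update per training sample."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(epochs):  # one epoch = one pass over all samples
        for x, y in samples:
            error = y - (theta0 + theta1 * x)
            # LMS update; the input paired with theta0 is x0 = 1.
            theta0 += learning_rate * error
            theta1 += learning_rate * error * x
    return theta0, theta1


def cost(samples, theta0, theta1):
    """Half the sum of squared errors over the training samples."""
    return 0.5 * sum((theta0 + theta1 * x - y) ** 2 for x, y in samples)


# Synthetic noise-free data from y = 2*x + 1 (illustrative values only).
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

theta0, theta1 = sgd_linear_regression(samples)
print(theta0, theta1, cost(samples, theta0, theta1))
```

With each epoch the cost shrinks, and the parameters approach the intercept and slope used to generate the data.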
Let us understand Regularization in detail below.
Steps to Regularize a model are mentioned below.
Let us quickly go through what you have learned so far in this Regression tutorial.
This concludes the “Regression” tutorial. The next lesson is "Classification."