Online SAS Tools and Excel Training: Predictive Modeling Techniques


In this slide, we will look at a graph that illustrates the concept of linear regression. In this graph, the dependent variable Y is plotted on the Y axis and the independent variable is plotted on the X axis. The dots in orange are the observed values of the dependent variable. The fitted linear regression line is drawn in blue. Let us now look at the graph in detail.
For the given scatter plot data, a regression line has been fitted. The regression line is the line that minimizes the sum of the squared vertical distances between itself and the data points. As you can see, the regression line passes through some data points, but not others.
The slope of the line, as given by the coefficient beta, is shown here as a blue dotted line.
The intercept, alpha, is the value of Y at which the regression line crosses the Y axis, that is, the predicted value of Y when X is zero, as shown in the figure.
The line is fitted using the equation mentioned in the previous slide, that is, y equals alpha plus beta times x plus epsilon.
For an input x i, the predicted value is read from the line, as shown here. The observed value for x i is given by the orange dot. The difference between the observed and predicted values is denoted by epsilon, the random error.
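To make the fitted line and the error term concrete, here is a minimal sketch in plain Python. It assumes ordinary least squares as the fitting method; the helper name fit_line and the sample data are made up for illustration and are not taken from the slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates for y = alpha + beta * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    alpha = y_mean - beta * x_mean
    return alpha, beta


# Made-up sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha, beta = fit_line(xs, ys)

# Predicted value on the line for each x i, and the observed-minus-predicted error.
predicted = [alpha + beta * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
for x, y, p, e in zip(xs, ys, predicted, residuals):
    print(f"x = {x:.0f}  observed = {y:.2f}  predicted = {p:.2f}  error = {e:+.2f}")
```

Each printed error is the vertical gap between an orange dot and the blue line at the same x. With an intercept in the model, these errors sum to approximately zero.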
In the next slide, we will look at the coefficient of determination for linear regression.

The coefficient of determination, denoted by R squared, is a measure of goodness of fit, that is, how well a statistical model, like a line or curve, fits the data. R squared is calculated in different ways for different models. For a simple linear regression, R squared is calculated as the square of the correlation coefficient between the observed and predicted values.
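The calculation described above can be sketched in plain Python as follows. The helper name pearson_r and the sample data are assumptions made for illustration. The second calculation, one minus the residual sum of squares over the total sum of squares, is not mentioned on the slide; it is included only as a cross-check, since it gives the same value for a least squares line with an intercept.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    cov = sum((p - a_mean) * (q - b_mean) for p, q in zip(a, b))
    var_a = sum((p - a_mean) ** 2 for p in a)
    var_b = sum((q - b_mean) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least squares fit of y = alpha + beta * x.
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
sxx = sum((x - x_mean) ** 2 for x in xs)
beta = sxy / sxx
alpha = y_mean - beta * x_mean
predicted = [alpha + beta * x for x in xs]

# R squared as the squared correlation between observed and predicted values.
r_squared = pearson_r(ys, predicted) ** 2

# Cross-check: 1 - (residual sum of squares / total sum of squares).
ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
ss_tot = sum((y - y_mean) ** 2 for y in ys)
r_squared_check = 1 - ss_res / ss_tot

print(f"R squared (squared correlation) = {r_squared:.4f}")
print(f"R squared (1 - SS_res / SS_tot) = {r_squared_check:.4f}")
```

For this nearly linear sample the value comes out close to 1, which indicates that the line explains almost all of the variation in the observed values.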
We will now look at how different values of R squared are interpreted. In the first figure, the line is perfectly horizontal, and the R squared is zero, which implies no linear relationship.
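In the second figure, the correlation coefficient is -1, implying a perfect negative linear relationship. Because R squared is the square of the correlation coefficient, it cannot be negative, and its value here is 1.
The last figure denotes a correlation coefficient of +1, a perfect positive linear relationship. The R squared value is again 1.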
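One way to check the three cases numerically is to compute the correlation coefficient r between X and Y together with its square. For a simple linear regression, the square of this r equals R squared. The sketch below uses made-up data sets chosen to mimic the three figures; the helper name pearson_r is an assumption for illustration.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    cov = sum((p - a_mean) * (q - b_mean) for p, q in zip(a, b))
    var_a = sum((p - a_mean) ** 2 for p in a)
    var_b = sum((q - b_mean) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# Made-up data sets, one per figure.
cases = {
    "no linear trend": [2.0, 1.0, 3.0, 1.0, 2.0],      # fitted line is horizontal
    "perfect negative": [10.0 - 2.0 * x for x in xs],  # points fall exactly on a falling line
    "perfect positive": [1.0 + 2.0 * x for x in xs],   # points fall exactly on a rising line
}

results = {}
for name, ys in cases.items():
    r = pearson_r(xs, ys)
    results[name] = (r, r ** 2)
    print(f"{name}: r = {r:.2f}, R squared = {r ** 2:.2f}")
```

The sign of the relationship shows up in r, not in R squared. R squared only reports how closely the points follow the line, so both perfect cases give a value of 1.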
In the next slide, we will look at ways to determine the goodness of a model.