Machine learning models need to be accurate to be useful in the real world. But how do you measure how right or wrong your model is? This is where the cost function comes into the picture. A cost function is a measure used to judge a model: it tells you how well the model has estimated the relationship between your input and output variables.

## What Is Cost Function in Machine Learning?

After training your model, you need to see how well it is performing. While accuracy metrics tell you how well the model is performing, they do not give you insight into how to improve it. Hence, you need a corrective function that helps you work out when the model is most accurate, because you need to hit the sweet spot between an undertrained model and an overtrained one.

A cost function measures how wrong the model is in estimating the relationship between the input and the output. It tells you how badly your model is predicting.
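To make the idea concrete, here is a minimal sketch that scores two sets of predictions. It assumes mean squared error as the cost, which is one common choice rather than the only one; the data and the names `model_a` and `model_b` are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: the true outputs and the predictions of two candidate models.
actual = [3.0, 5.0, 7.0, 9.0]
model_a = [2.5, 5.5, 6.5, 9.5]   # predictions close to the true values
model_b = [1.0, 8.0, 4.0, 12.0]  # predictions far from the true values

cost_a = mean_squared_error(actual, model_a)
cost_b = mean_squared_error(actual, model_b)

print(f"cost of model A: {cost_a:.2f}")
print(f"cost of model B: {cost_b:.2f}")
```

The model with the lower cost is the one whose predictions sit closer to the actual values.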

Consider a robot trained to stack boxes in a factory. The robot might have to consider certain changeable parameters, called Variables, which influence how it performs. Let’s say the robot comes across an obstacle, like a rock. The robot might bump into the rock and realize that it is not the correct action.

The robot learns from this, and next time it avoids rocks. In the same way, your model adjusts its variables to better fit the data. The outcome of each obstacle further optimizes the robot and helps it perform better. It will generalize and learn to avoid obstacles in general, such as a fire that might have broken out. The outcome acts as a cost function, which helps you optimize the variables to get the best fit for the model.

Figure 1: Robot learning to avoid obstacles

## What Is Gradient Descent?

Gradient Descent is an algorithm that is used to optimize the cost function or the error of the model. It is used to find the minimum value of error possible in your model.

Gradient Descent can be thought of as the direction you have to take to reach the least possible error. The error in your model can be different at different points, and you have to find the quickest way to minimize it, to prevent resource wastage.

Gradient descent can be visualized as a ball rolling down a hill. The ball rolls toward the lowest point on the hill, which corresponds to the point where the error is smallest. For a simple bowl-shaped cost function, the error falls as you approach this point and rises again once you move past it.

In gradient descent, you compute the error of your model for the current values of its variables, then adjust those values in the direction that reduces the error. This is repeated, and the error values keep getting smaller. Eventually you arrive at the values of the variables for which the error is smallest, and the cost function is optimized.

Figure 2: Gradient Descent
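The ball-and-hill picture can be sketched in a few lines. This is a minimal illustration on a hypothetical one-variable cost, `(w - 3)**2`; the starting point, learning rate, and number of steps are all assumed values.

```python
def cost(w):
    # Hypothetical bowl-shaped cost with its lowest point at w = 3.
    return (w - 3.0) ** 2


def gradient(w):
    # Derivative of the cost with respect to w.
    return 2.0 * (w - 3.0)


w = 0.0              # assumed starting point
learning_rate = 0.1  # assumed step size
history = []         # cost recorded before each step

for step in range(50):
    history.append(cost(w))
    w -= learning_rate * gradient(w)  # move against the slope, i.e. downhill

print(f"w after 50 steps: {w:.4f}")
print(f"cost went from {history[0]:.4f} to {cost(w):.8f}")
```

Each step moves `w` a little further down the slope, so the recorded cost shrinks from one iteration to the next.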

## What Is the Cost Function For Linear Regression?

A linear regression model fits a straight line to the data. This is done using the equation for a straight line, as shown:

Figure 3: Linear regression function
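Written out, and assuming the symbols used in this section (a for the intercept and b for the slope), the equation of the line is:

```latex
y = a + b x
```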

In the equation, two quantities can change (the variables): a, the point at which the line intercepts the y-axis, and b, the slope, which is how steep the line is.

At first, if the variables are not properly optimized, you get a line that does not fit the data well. As you optimize them, you arrive at values that give the best fit: a straight line that runs close to most of the data points while being less affected by noise and outliers. A properly fit linear regression model looks as shown below:

Figure 4: Linear regression graph

For the linear regression model, the cost function is based on the differences between the predicted values and the actual values, typically the Mean Squared Error (MSE) or its square root, the Root Mean Squared Error (RMSE). Training aims to find the values of the variables that minimize this cost.

Figure 5: Linear regression cost function
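One common way to write this cost is shown below. It assumes n data points $(x_i, y_i)$, predictions $\hat{y}_i = a + b x_i$, and mean squared error; the original figure may use a different scaling, such as $1/(2n)$, or take a square root.

```latex
J(a, b) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2,
\qquad \hat{y}_i = a + b x_i
```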

By the definition of gradient descent, you have to move in the direction in which the error decreases. That direction is found by differentiating the cost function with respect to each variable. To move down the slope, you subtract this derivative, scaled by the learning rate, from the variable's previous value.

Figure 6: Linear regression gradient descent function

After substituting the value of the cost function (J) in the above equation, you get:

Figure 7: Linear regression gradient descent function simplified
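As a worked version of this step, assume the mean squared error cost $J(a, b) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$ with $\hat{y}_i = a + b x_i$, and a learning rate $\alpha$. Under those assumptions (the figure may differ by a constant factor), the update rules are:

```latex
\begin{aligned}
a &\leftarrow a - \alpha \frac{\partial J}{\partial a}
   = a - \alpha \cdot \frac{2}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right) \\
b &\leftarrow b - \alpha \frac{\partial J}{\partial b}
   = b - \alpha \cdot \frac{2}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right) x_i
\end{aligned}
```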

In the above equations, α (alpha) is known as the learning rate. It decides how fast you move down the slope. If alpha is large, you take big steps; if it is small, you take small steps. If alpha is too large, you can overshoot the point of least error entirely, and your results will not be accurate. If it is too small, the model takes too long to optimize and you waste computational power. Hence, you need to choose a suitable value of alpha.

Figure 8: (a) Large learning rate, (b) Small learning rate, (c) Optimum learning rate
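The three cases can be reproduced with a small experiment. This sketch assumes a hypothetical cost `J(w) = w**2`, whose minimum is at `w = 0`; the three learning rates are chosen only to show the contrast.

```python
def run_gradient_descent(learning_rate, start=5.0, steps=20):
    """Minimise the hypothetical cost J(w) = w**2 and return the final w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2.0 * w  # dJ/dw = 2w
    return w


too_large = run_gradient_descent(1.1)    # every step overshoots the minimum
too_small = run_gradient_descent(0.001)  # barely moves in 20 steps
reasonable = run_gradient_descent(0.3)   # settles close to the minimum

print(f"learning rate 1.1:   w = {too_large:.4f}")
print(f"learning rate 0.001: w = {too_small:.4f}")
print(f"learning rate 0.3:   w = {reasonable:.8f}")
```

With the large rate, `w` ends up further from the minimum than where it started; with the small rate it has hardly moved; with the moderate rate it lands very close to zero.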

## What Is the Cost Function for Neural Networks?

A neural network is a machine learning model made up of layers of simple units. Each unit takes in multiple inputs, combines them in a weighted sum, applies a function to the result, and passes its output on to the next layer, until the last layer produces the final output.

The cost function of a neural network measures the error between the network's final output and the expected output. During training, this error is passed backward through the network so that each layer's contribution to it can be worked out and its variables adjusted. You can represent a neural network with cost function optimization as:

Figure 9: Neural network with the error function

For neural networks, the cost function is usually not a single bowl: it can have several valleys, each with its own local minimum. Depending on where you start, gradient descent can arrive at a different minimum. Ideally, you want the lowest value among all the local minima. This value is called the global minimum.

Figure 10: Cost function graph for Neural Networks
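The effect of the starting point can be shown on a simple curve with two valleys. The cost below is a hypothetical one-variable function chosen for illustration, not the cost of an actual neural network.

```python
def cost(w):
    # Hypothetical non-convex cost with two valleys of different depth.
    return w**4 - 3.0 * w**2 + w


def gradient(w):
    # Derivative of the cost with respect to w.
    return 4.0 * w**3 - 6.0 * w + 1.0


def descend(start, learning_rate=0.01, steps=500):
    """Run plain gradient descent from the given starting point."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_from_right = descend(2.0)   # rolls into the shallower valley
w_from_left = descend(-2.0)   # rolls into the deeper valley

print(f"start at  2.0: w = {w_from_right:.4f}, cost = {cost(w_from_right):.4f}")
print(f"start at -2.0: w = {w_from_left:.4f}, cost = {cost(w_from_left):.4f}")
```

Both runs stop at a minimum, but only the run that starts on the left reaches the lower of the two.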

The cost function for neural networks is given as:

Figure 11: Cost function for Neural Networks

Gradient descent uses the derivative of the cost function with respect to each variable to decide how to update that variable. It is given as:

Figure 12: Gradient descent for Neural Networks
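The formulas in the figures are not reproduced here, so the following is only a sketch under stated assumptions: a network with one hidden layer, sigmoid activations, and a mean squared error cost, trained on the XOR toy problem. The layer sizes, learning rate, and all variable names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical toy data: the XOR problem (4 samples, 2 inputs, 1 output).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# One hidden layer with 4 units; the sizes and initial weights are arbitrary.
W1 = rng.normal(0.0, 1.0, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(0.0, 1.0, size=(4, 1))
b2 = np.zeros((1, 1))

learning_rate = 0.5
n = X.shape[0]
costs = []

for _ in range(2000):
    # Forward pass: inputs -> hidden layer -> output.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    costs.append(float(np.mean((output - y) ** 2)))

    # Backward pass: apply the chain rule from the output back to the inputs.
    d_output = 2.0 * (output - y) / n * output * (1.0 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1.0 - hidden)

    # Gradient descent step on every weight and bias.
    W2 -= learning_rate * (hidden.T @ d_output)
    b2 -= learning_rate * d_output.sum(axis=0, keepdims=True)
    W1 -= learning_rate * (X.T @ d_hidden)
    b1 -= learning_rate * d_hidden.sum(axis=0, keepdims=True)

print(f"cost at start: {costs[0]:.4f}")
print(f"cost at end:   {costs[-1]:.4f}")
```

The cost is computed once, at the output, and the backward pass works out how much each layer's weights contributed to it.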

## How to Implement Cost Functions in Python?

You have looked at what a cost function is and the formulae for the cost functions of different algorithms. Now let’s implement a cost function using Python. For this, you will use a NumPy array of random numbers as the data.

Start by importing the necessary modules.

Figure 13: Importing necessary modules

Now, let’s load up the data.

Figure 14: Importing data
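The original code screenshots are not reproduced here. As a stand-in, this sketch imports NumPy and builds a small synthetic data set; the seed, the number of points, and the underlying line `y = 2x + 1` are assumptions chosen so that a straight-line fit makes sense.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable

# Assumed synthetic data: 50 points scattered around the line y = 2x + 1.
x_values = rng.uniform(0.0, 10.0, size=50)
noise = rng.normal(0.0, 1.0, size=50)
y_values = 2.0 * x_values + 1.0 + noise

# 2-D array with one row per point; the columns are (x, y).
data = np.column_stack((x_values, y_values))

print(data.shape)
print(data[:3])
```

Only NumPy is needed for the computations; a plotting library such as Matplotlib is required only for the plotting steps.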

The NumPy array is a 2-D array of random points. Each element (row) of the array holds an x and a y coordinate. Here, x is the input and y is the required output. Let’s separate these points and plot them.

Figure 15: Plotting the data

Now, let's set our theta value and store the x and y values in separate arrays so we can predict the y values from the x values.

Figure 16: Setting theta values and separating x and y

Let’s initialize the ‘m’ and ‘b’ values along with the learning rate.

Figure 17: Setting learning parameters
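A minimal sketch of this setup, assuming the same kind of synthetic `(x, y)` array described above and treating theta as the pair `(m, b)`; the starting values and the learning rate are arbitrary choices.

```python
import numpy as np

# Assumed synthetic data: 50 points scattered around the line y = 2x + 1.
rng = np.random.default_rng(42)
x_values = rng.uniform(0.0, 10.0, size=50)
y_values = 2.0 * x_values + 1.0 + rng.normal(0.0, 1.0, size=50)
data = np.column_stack((x_values, y_values))

# Separate the inputs (first column) from the outputs (second column).
x = data[:, 0]
y = data[:, 1]

# Starting values for the slope and intercept, plus the learning rate.
m = 0.0
b = 0.0
learning_rate = 0.01

# With m = 0 and b = 0 the model predicts 0 for every input.
predictions = m * x + b
print(predictions[:5])
```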

Using mathematical operations, find the cost function value for our inputs.

Figure 18: Finding cost function
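As an illustration of this step, the sketch below computes the cost for a line `y = m*x + b`, assuming mean squared error. The function name `compute_cost` and the four data points are hypothetical; the points lie exactly on `y = 2x + 1` so the result is easy to check by hand.

```python
import numpy as np


def compute_cost(x, y, m, b):
    """Mean squared error of the line y = m*x + b on the given data."""
    predictions = m * x + b
    return float(np.mean((predictions - y) ** 2))


# Hypothetical data that lies exactly on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

cost_at_start = compute_cost(x, y, m=0.0, b=0.0)  # an untrained line
cost_at_best = compute_cost(x, y, m=2.0, b=1.0)   # the line the data came from

print(cost_at_start)  # (9 + 25 + 49 + 81) / 4 = 41.0
print(cost_at_best)   # 0.0
```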

Using the cost function, you can update the theta value.

Figure 19: Updating theta value
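A single update can be sketched as follows, again assuming a mean squared error cost and treating theta as the pair `(m, b)`. The function name `gradient_step` and the data are hypothetical.

```python
import numpy as np


def gradient_step(x, y, m, b, learning_rate):
    """Return (m, b) after one gradient descent step on the MSE cost."""
    n = len(x)
    error = (m * x + b) - y                 # prediction minus actual value
    grad_m = (2.0 / n) * np.sum(error * x)  # derivative of the cost w.r.t. m
    grad_b = (2.0 / n) * np.sum(error)      # derivative of the cost w.r.t. b
    return m - learning_rate * grad_m, b - learning_rate * grad_b


# Hypothetical data that lies exactly on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

m, b = gradient_step(x, y, m=0.0, b=0.0, learning_rate=0.01)
print(m, b)
```

Starting from zero, both values move upward after one step, toward the slope and intercept that generated the data.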

Now, run gradient descent and print the updated value of theta at every iteration.

Figure 20: Finding gradient descent
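Putting the pieces together, a full training loop might look like the sketch below. It assumes synthetic data around `y = 2x + 1`, a mean squared error cost, and arbitrary values for the learning rate and iteration count. To keep the output short, it prints every 100th iteration rather than every one.

```python
import numpy as np

# Assumed synthetic data: 50 points scattered around the line y = 2x + 1.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=50)

m, b = 0.0, 0.0       # theta: slope and intercept
learning_rate = 0.01
n = len(x)
losses = []           # cost recorded before each update

for iteration in range(1000):
    error = (m * x + b) - y
    losses.append(float(np.mean(error ** 2)))

    # Move each parameter against its derivative of the cost.
    m -= learning_rate * (2.0 / n) * np.sum(error * x)
    b -= learning_rate * (2.0 / n) * np.sum(error)

    if iteration % 100 == 0:
        print(f"iteration {iteration:4d}: m={m:.4f}, b={b:.4f}, loss={losses[-1]:.4f}")

print(f"final: m={m:.4f}, b={b:.4f}")
```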

On plotting the loss recorded during gradient descent, you can see it decrease at each iteration.

Figure 21: Plotting gradient descent
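A sketch of the plotting step is shown below, assuming Matplotlib is installed and using the same kind of synthetic data and mean squared error cost as before. The helper names `run_gradient_descent` and `plot_losses` are hypothetical. Matplotlib is imported inside `plot_losses` so that the training part runs even without it.

```python
import numpy as np


def run_gradient_descent(x, y, learning_rate=0.01, iterations=200):
    """Fit y = m*x + b by gradient descent and return the loss per iteration."""
    m, b = 0.0, 0.0
    n = len(x)
    losses = []
    for _ in range(iterations):
        error = (m * x + b) - y
        losses.append(float(np.mean(error ** 2)))
        m -= learning_rate * (2.0 / n) * np.sum(error * x)
        b -= learning_rate * (2.0 / n) * np.sum(error)
    return losses


def plot_losses(losses):
    """Draw the loss curve (requires Matplotlib)."""
    import matplotlib.pyplot as plt

    plt.plot(range(len(losses)), losses)
    plt.xlabel("Iteration")
    plt.ylabel("Loss (MSE)")
    plt.title("Loss during gradient descent")
    plt.show()


# Assumed synthetic data: 50 points scattered around the line y = 2x + 1.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=50)

losses = run_gradient_descent(x, y)
print(f"loss at first iteration: {losses[0]:.4f}")
print(f"loss at last iteration:  {losses[-1]:.4f}")
# plot_losses(losses)  # uncomment to display the curve
```

The curve drops steeply in the first few iterations and then flattens out as the parameters approach their best values.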


## Conclusion

In this article titled ‘Cost Function in Machine Learning: The important parameter you must know about’, you learned what a cost function is and why it is important. You then explored gradient descent, which can be used to optimize the cost function. You also looked at cost functions for linear regression and neural networks. Finally, you saw how cost functions in machine learning can be implemented from scratch in Python.
