Machine learning models are the mathematical engines that drive Artificial Intelligence, so they are vital to successful AI implementation. In fact, you could say that your AI is only as good as the machine learning models that drive it.
So, now convinced of the importance of a good machine learning model, you apply yourself to the task, and after some hard work, you finally create what you believe to be a great machine learning model. Congratulations!
But wait. How can you tell whether your machine learning model is as good as you believe it is? Clearly, you need an objective means of measuring your model’s performance and determining whether it’s good enough for implementation. That’s where the ROC curve comes in.
This article covers everything you need to know about ROC curves. We will define ROC curves and the term “area under the ROC curve,” explain how to use ROC curves in performance modeling, and share a wealth of other valuable information. We begin with some definitions.
What Is a ROC Curve?
A ROC (receiver operating characteristic) curve is a graph that shows a classification model’s performance at all classification thresholds. It is a probability curve that plots two parameters, the True Positive Rate (TPR) against the False Positive Rate (FPR), at different threshold values, separating the so-called ‘signal’ from the ‘noise.’
If the user lowers the classification threshold, more items get classified as positive, which increases both the False Positives and the True Positives.
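To make this concrete, here is a minimal sketch of computing the points of a ROC curve with scikit-learn (assumed to be installed; the labels and scores below are a made-up toy example):

```python
# Sketch: computing TPR and FPR at each distinct threshold with scikit-learn.
from sklearn.metrics import roc_curve

# True labels and the model's predicted probabilities for the positive class
# (toy data for illustration).
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.5]

# roc_curve returns the FPR and TPR at each threshold, from strictest to loosest.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```

Plotting `fpr` on the x-axis against `tpr` on the y-axis (for example with matplotlib) draws the ROC curve itself. Notice in the printed rows that as the threshold drops, both rates rise, exactly as described above.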
What Is a ROC Curve: AUC — Area Under the ROC Curve
AUC is short for "Area Under the ROC Curve." It measures the two-dimensional area underneath the entire ROC curve from (0,0) to (1,1) and is used as a summary of the curve. The AUC reflects the classifier's ability to distinguish between classes: the higher the AUC, the better the model differentiates between the positive and negative classes. In this way, AUC supplies an aggregate measure of the model's performance across all possible classification thresholds.
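Computing the AUC takes one call in scikit-learn. A minimal sketch with toy data (the labels and scores here are invented for illustration):

```python
# Sketch: computing AUC from labels and predicted probabilities.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.2, 0.3, 0.7, 0.8, 0.4, 0.6]  # predicted P(class = 1)

auc = roc_auc_score(y_true, y_score)
print(auc)  # 1.0 here: every positive is scored above every negative
```

Because every positive example in this toy data outranks every negative one, the AUC is a perfect 1.0; shuffling the scores so some negatives outrank positives would pull it toward 0.5.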
Model creators favor AUC for two chief reasons:
- AUC is scale-invariant. The AUC measures how well the predictions were ranked instead of measuring their absolute values.
- AUC is classification-threshold-invariant, meaning it measures the quality of the model's predictions regardless of the classification threshold.
However, AUC has its downsides, which manifest in certain situations:
- Scale invariance is not always wanted. For instance, sometimes, the situation calls for well-calibrated probability outputs, and AUC doesn’t deliver that.
- Classification-threshold invariance isn't always wanted, especially when there are wide disparities in the cost of false negatives compared to false positives, and it may be essential to minimize only one type of classification error. For instance, when designing an email spam detection model, you probably want to prioritize minimizing false positives, even if that results in a notable increase in false negatives. Unfortunately, AUC isn't a good metric for this kind of optimization.
What Is a ROC Curve: How Do You Assess Model Performance?
AUC is a valuable tool for assessing model performance. An excellent model has an AUC close to 1, indicating a good measure of separability. Conversely, a poor model's AUC leans closer to 0, showing the worst measure of separability. In fact, an AUC near 0 means the model reverses the result, predicting the negative class as positive and vice versa, showing 0s as 1s and 1s as 0s. Finally, an AUC of 0.5 shows that the model has no class separation capacity at all.
So, when we have a result where 0.5 < AUC < 1, there’s a high likelihood that the classifier can distinguish the positive class values from the negative class values, because it detects more True Positives and True Negatives than False Negatives and False Positives.
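The "reversed predictions" case is easy to demonstrate: if a model ranks every negative above every positive, its AUC is 0, and simply inverting its scores yields a perfect classifier. A toy sketch (invented scores for illustration):

```python
# Sketch: an AUC near 0 means the ranking is reversed, not random.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
y_score = [0.8, 0.9, 0.1, 0.2]  # every negative outranks every positive

auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.0: the ranking is exactly backwards

# Inverting the scores flips the ranking, and the AUC becomes perfect.
flipped = roc_auc_score(y_true, [1 - s for s in y_score])
print(flipped)  # 1.0
```

This is why an AUC of 0.5 (random ranking) is the true worst case in practice: a model with AUC near 0 still carries usable signal once its output is inverted.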
The Relation Between Sensitivity, Specificity, FPR, and Threshold
Before we examine the relation between Specificity, FPR, Sensitivity, and Threshold, we should first cover their definitions in the context of machine learning models. For that, we'll need a confusion matrix to help us understand the terms better. Here is an example of a confusion matrix:

|                 | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | TP                 | FN                 |
| Actual Negative | FP                 | TN                 |
TP stands for True Positive, and TN means True Negative. FP stands for False Positive, and FN means False Negative.
- Sensitivity: Sensitivity, also termed "recall," is the metric that shows a model's ability to predict the true positives of all available categories. It shows what proportion of the positive class was classified correctly. For example, when trying to figure out how many people have the flu, sensitivity, or True Positive Rate, measures the proportion of people who have the flu and were correctly predicted as having it.
Here’s how to mathematically calculate sensitivity:
Sensitivity = (True Positive)/(True Positive + False Negative)
- Specificity: Specificity evaluates a model's ability to predict the true negatives of all available categories. It shows what proportion of the negative class was classified correctly. For example, in our flu scenario, specificity measures the proportion of people who don't have the flu and were correctly predicted as not suffering from it.
Here’s how to calculate specificity:
Specificity = (True Negative)/(True Negative + False Positive)
- FPR: FPR stands for False Positive Rate and shows what proportion of the negative class was incorrectly classified. These formulas show how we calculate FPR:

FPR = (False Positive)/(False Positive + True Negative) = 1 – Specificity
- Threshold: The threshold is the specified cut-off point for an observation to be classified as either 0 or 1. Typically, 0.5 is used as the default threshold, although it isn’t always the right choice.
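The formulas above reduce to simple arithmetic on the four confusion-matrix counts. A pure-Python sketch (the counts are a made-up example):

```python
# Sketch: sensitivity, specificity, and FPR from confusion-matrix counts.
TP, FN = 40, 10   # positives classified correctly / incorrectly
TN, FP = 45, 5    # negatives classified correctly / incorrectly

sensitivity = TP / (TP + FN)   # True Positive Rate, a.k.a. recall
specificity = TN / (TN + FP)   # True Negative Rate
fpr = FP / (FP + TN)           # False Positive Rate, equal to 1 - specificity

print(sensitivity)  # 0.8
print(specificity)  # 0.9
print(fpr)          # 0.1
```

With these counts, 40 of the 50 actual positives are caught (sensitivity 0.8) and 5 of the 50 actual negatives are falsely flagged (FPR 0.1), which is exactly 1 minus the specificity of 0.9.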
Sensitivity and specificity are inversely proportional, so if we boost sensitivity, specificity drops, and vice versa. Furthermore, we net more positive values when we decrease the threshold, thereby raising the sensitivity and lowering the specificity.
On the other hand, if we boost the threshold, we will get more negative values, which results in higher specificity and lower sensitivity.
And since the FPR is 1 – specificity, when we increase TPR, the FPR also increases and vice versa.
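The trade-off described above can be seen directly by recomputing both metrics at several thresholds. A pure-Python sketch with invented scores:

```python
# Sketch: sensitivity falls and specificity rises as the threshold increases.
y_true = [0, 0, 0, 1, 1, 1]
y_score = [0.2, 0.4, 0.6, 0.5, 0.7, 0.9]  # toy predicted probabilities

def rates(threshold):
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

for t in (0.3, 0.5, 0.8):
    sens, spec = rates(t)
    print(f"threshold={t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

At the low threshold of 0.3 every positive is caught (sensitivity 1.0) but specificity suffers; at 0.8 no negative is flagged (specificity 1.0) but sensitivity drops, illustrating the inverse relationship.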
How to Use the AUC - ROC Curve for the Multi-Class Model
We can use the One vs. ALL methodology to plot N AUC-ROC curves for the N classes of a multi-class model. One vs. ALL gives us a way to leverage binary classification: if you have a classification problem with N possible outcomes, One vs. ALL provides one binary classifier for each possible outcome.
So, for example, if you have three classes named 0, 1, and 2, you will have one ROC curve for 0 classified against 1 and 2, another ROC curve for 1 classified against 0 and 2, and finally a third for 2 classified against 0 and 1.
We should take a moment and explain the One vs. ALL methodology to better answer the question “what is a ROC curve?”. This methodology consists of N separate binary classifiers, and the model runs through the sequence during training, training each classifier to answer its own yes/no question. For instance, with four image classes, a cat picture is a positive example for one recognizer (the cat classifier) and a negative example for the other three. It would look like this:
- Is this image a rutabaga? No
- Is this image a cat? Yes
- Is this image a dog? No
- Is this image a hammer? No
This methodology works well with a small number of total classes. However, as the number of classes rises, the model becomes increasingly inefficient.
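scikit-learn implements this one-vs-rest averaging directly in `roc_auc_score`. A hedged sketch for a three-class problem (the per-class probabilities below are made up for illustration):

```python
# Sketch: One vs. ALL (one-vs-rest) AUC for a three-class problem.
from sklearn.metrics import roc_auc_score

y_true = [0, 1, 2, 0, 1, 2]
# One probability column per class; each row sums to 1.
y_prob = [
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
]

# multi_class="ovr" computes one binary AUC per class (that class vs. the
# rest) and averages them, mirroring the N-classifier scheme above.
auc = roc_auc_score(y_true, y_prob, multi_class="ovr")
print(auc)  # 1.0: each class's column ranks its own samples highest
```

To draw the N individual curves rather than the averaged score, you would binarize the labels per class and call `roc_curve` on each probability column in turn.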
Accelerate your career in AI and ML with the AI and Machine Learning Courses offered with Purdue University in collaboration with IBM.
Are You Interested in a Career in Machine Learning?
There’s a lot to learn about Machine Learning, as you can tell from this “what is a ROC curve” article! However, both machine learning and artificial intelligence are the waves of the future, so it’s worth acquiring skills and knowledge in these fields. Who knows? You could find yourself in an exciting machine learning career!
If you want a career in machine learning, Simplilearn can help you on your way. The AI and ML Certification offers students an in-depth overview of machine learning topics. You will learn to develop algorithms using supervised and unsupervised learning, work with real-time data, and learn about concepts like regression, classification, and time series modeling. You will also learn how Python can be used to draw predictions from data. In addition, the program features 58 hours of applied learning, interactive labs, four hands-on projects, and mentoring.
And since machine learning and artificial intelligence work together so frequently, check out Simplilearn’s Artificial Intelligence Engineer Master’s program, and cover all of your bases.
According to Glassdoor, Machine Learning Engineers in the United States enjoy an average annual base pay of $133,001. Payscale.com reports that Machine Learning Engineers in India can potentially earn ₹732,566 a year on average.
So visit Simplilearn today, and explore the rich possibilities of a rewarding vocation in the machine learning field!