Machine learning models are the mathematical engines that drive Artificial Intelligence, so they are vital to any successful AI implementation. In fact, you could say that your AI is only as good as the machine learning models that drive it.

So, now convinced of the importance of a good machine learning model, you apply yourself to the task, and after some hard work, you finally create what you believe to be a great machine learning model. Congratulations!

But wait. How can you tell if your machine learning model is as good as you believe it is? You need an objective way to measure its performance and decide whether it’s good enough for implementation. That’s where the ROC curve comes in.

This article has everything you need to know about ROC curves. We will define ROC curves and the term “area under the ROC curve,” explain how to use ROC curves to evaluate model performance, and cover a wealth of other valuable information. We begin with some definitions.

## What Is a ROC Curve?

A ROC (which stands for “receiver operating characteristic”) curve is a graph that shows a classification model’s performance at all classification thresholds. It plots two parameters, the True Positive Rate (TPR) against the False Positive Rate (FPR), at different threshold values, showing how well the model separates the so-called ‘signal’ (the positive class) from the ‘noise’ (the negative class).

Each point on the curve corresponds to one classification threshold. If the user lowers the classification threshold, more items get classified as positive, which increases both the False Positives and the True Positives; raising the threshold does the opposite.

## ROC Curve

An ROC (Receiver Operating Characteristic) curve is a graphical representation used to evaluate the performance of a binary classifier. It plots two key metrics:

- True Positive Rate (TPR): Also known as sensitivity or recall, it measures the proportion of actual positives correctly identified by the model. It is calculated as:

  $$\text{TPR} = \frac{TP}{TP + FN}$$

- False Positive Rate (FPR): This measures the proportion of actual negatives incorrectly identified as positives by the model. It is calculated as:

  $$\text{FPR} = \frac{FP}{FP + TN}$$

The ROC curve plots TPR (y-axis) against FPR (x-axis) at various threshold settings. Both rates are built from the four possible outcomes of a prediction:

- True Positive (TP): The instance is positive, and the model correctly classifies it as positive.
- False Positive (FP): The instance is negative, but the model incorrectly classifies it as positive.
- True Negative (TN): The instance is negative, and the model correctly classifies it as negative.
- False Negative (FN): The instance is positive, but the model incorrectly classifies it as negative.
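Once you pick a threshold, the four outcomes can be counted directly and turned into TPR and FPR. The sketch below is a minimal illustration, not a library API: `confusion_counts` and `tpr_fpr` are hypothetical helpers, label 1 is assumed to mean "positive", and a score at or above the threshold is assumed to count as a positive prediction.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Count TP, FP, TN, FN when scores >= threshold are labelled positive (1)."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1  # positive, predicted positive
        elif predicted == 1 and label == 0:
            fp += 1  # negative, predicted positive
        elif predicted == 0 and label == 0:
            tn += 1  # negative, predicted negative
        else:
            fn += 1  # positive, predicted negative
    return tp, fp, tn, fn


def tpr_fpr(y_true, scores, threshold=0.5):
    """Return (TPR, FPR) at a single threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    tpr = tp / (tp + fn)  # share of actual positives flagged positive
    fpr = fp / (fp + tn)  # share of actual negatives flagged positive
    return tpr, fpr


# Hypothetical labels (1 = positive) and model scores, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

print(confusion_counts(y_true, scores))  # (3, 1, 3, 1)
print(tpr_fpr(y_true, scores))           # (0.75, 0.25)
```

Each threshold gives one (FPR, TPR) point; repeating this for every threshold traces out the ROC curve.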

### Interpreting the ROC Curve

- A curve closer to the top left corner indicates a better-performing model.
- The diagonal line (from (0,0) to (1,1)) represents a random classifier.
- The area under the ROC curve (AUC) is a single scalar value that measures the model's overall performance. It ranges from 0 to 1, and a higher AUC indicates a better-performing model.

## Area Under the ROC Curve (AUC)

The Area Under the ROC Curve (AUC) is a single scalar value that summarizes the overall performance of a binary classification model. It measures the ability of the model to distinguish between the positive and negative classes. Here's what you need to know about AUC:

### Key Points About AUC

Range of AUC:

- The AUC value ranges from 0 to 1.
- An AUC of 0.5 indicates a model that performs no better than random chance.
- An AUC closer to 1 indicates a model with excellent performance.
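To make these reference points concrete, here is a minimal sketch using scikit-learn's `roc_auc_score` (assuming scikit-learn is installed) on tiny, made-up label and score lists: a perfect ranking, a perfectly reversed ranking, and a model that gives everyone the same score.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical labels, for illustration only.
y_true = [0, 0, 1, 1]

# Every positive scored above every negative: perfect ranking.
print(roc_auc_score(y_true, [0.1, 0.2, 0.8, 0.9]))  # 1.0
# Every positive scored below every negative: perfectly reversed ranking.
print(roc_auc_score(y_true, [0.9, 0.8, 0.2, 0.1]))  # 0.0
# The same score for everyone: no ability to separate the classes.
print(roc_auc_score(y_true, [0.5, 0.5, 0.5, 0.5]))  # 0.5
```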

Interpretation of AUC Values:

- 0.9 - 1.0: Excellent
- 0.8 - 0.9: Good
- 0.7 - 0.8: Fair
- 0.6 - 0.7: Poor
- 0.5 - 0.6: Fail

Advantages of Using AUC:

- Threshold Independent: AUC evaluates the model's performance across all possible classification thresholds.
- Scale Invariant: AUC measures how well the predictions are ranked rather than their absolute values.
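Scale invariance is easy to check: apply any transformation that preserves the order of the scores, such as the logistic (sigmoid) function, and the AUC does not change. A minimal sketch with scikit-learn's `roc_auc_score`, using made-up logits:

```python
import math

from sklearn.metrics import roc_auc_score

# Hypothetical labels and raw model scores (e.g. logits), for illustration only.
y_true = [0, 1, 0, 1, 1, 0]
logits = [-1.2, 0.4, 0.3, 2.0, -0.5, -2.1]

# Squash the logits into probabilities: the values change, their order does not.
probs = [1 / (1 + math.exp(-z)) for z in logits]

# AUC depends only on the ranking, so both calls give the same value
# (8 of the 9 positive-negative pairs are ordered correctly, so about 0.889).
print(roc_auc_score(y_true, logits))
print(roc_auc_score(y_true, probs))
```

The flip side is that AUC says nothing about whether the probabilities are well calibrated.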

Calculation of AUC:

- AUC is typically calculated using numerical integration methods, such as the trapezoidal rule, applied to the ROC curve.
- In practical terms, libraries like Scikit-learn in Python provide functions to compute AUC directly from model predictions and true labels.
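Here is a minimal sketch of both routes in scikit-learn, assuming it is installed. `roc_curve` produces the curve's points, `auc` integrates them with the trapezoidal rule, and `roc_auc_score` computes the same area in a single call. The labels and scores are made up for illustration.

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

# roc_curve returns FPR and TPR arrays, one entry per threshold it keeps.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# auc() applies the trapezoidal rule to any curve given as x and y coordinates.
area_trapezoid = auc(fpr, tpr)

# roc_auc_score goes straight from labels and scores to the area.
area_direct = roc_auc_score(y_true, y_score)

print(area_trapezoid, area_direct)
```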

## Key Terms Used in AUC and ROC Curve

### 1. True Positive (TP)

- Definition: The number of positive instances correctly identified by the model.
- Example: In a medical test, a TP is when the test correctly identifies a person with a disease as having the disease.

### 2. True Negative (TN)

- Definition: The number of negative instances correctly identified by the model.
- Example: In a spam filter, a TN is when a legitimate email is correctly identified as not spam.

### 3. False Positive (FP)

- Definition: The number of negative instances incorrectly identified as positive by the model.
- Example: In a fraud detection system, an FP is when a legitimate transaction is incorrectly flagged as fraudulent.

### 4. False Negative (FN)

- Definition: The number of positive instances incorrectly identified as negative by the model.
- Example: In a cancer screening test, an FN is when a person with cancer is incorrectly identified as not having cancer.

### 5. True Positive Rate (TPR)

- Definition: Also known as sensitivity or recall, it measures the proportion of actual positives that are correctly identified by the model.
- Formula: $\text{TPR} = \frac{TP}{TP + FN}$
- Example: A TPR of 0.8 means 80% of actual positive cases are correctly identified.

### 6. False Positive Rate (FPR)

- Definition: It measures the proportion of actual negatives that are incorrectly identified as positive by the model.
- Formula: $\text{FPR} = \frac{FP}{FP + TN}$
- Example: An FPR of 0.1 means 10% of actual negative cases are incorrectly identified.

### 7. Threshold

- Definition: The value at which the model's prediction is converted into a binary classification. By adjusting the threshold, different TPR and FPR values can be obtained.
- Example: In a binary classification problem, a threshold of 0.5 might mean that predicted probabilities above 0.5 are classified as positive.

### 8. ROC Curve

- Definition: A graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. It plots TPR against FPR at various threshold settings.
- Example: An ROC curve close to the top left corner indicates a better-performing model.

### 9. Area Under the Curve (AUC)

- Definition: A single scalar value that summarizes the overall performance of a binary classifier across all possible thresholds. It is the area under the ROC curve.
- Range: 0 to 1, where 1 indicates perfect performance and 0.5 indicates no better than random guessing.
- Example: An AUC of 0.9 indicates excellent performance.

### 10. Precision

- Definition: The proportion of positive identifications that are actually correct.
- Formula: $\text{Precision} = \frac{TP}{TP + FP}$
- Example: A precision of 0.75 means 75% of the instances classified as positive are actually positive.

### 11. Recall (Sensitivity)

- Definition: Another term for True Positive Rate (TPR), measuring the proportion of actual positives correctly identified.
- Example: A recall of 0.8 means 80% of actual positive cases are correctly identified.

### 12. Specificity

- Definition: The proportion of actual negatives correctly identified by the model.
- Formula: $\text{Specificity} = \frac{TN}{TN + FP}$
- Example: A specificity of 0.9 means 90% of actual negative cases are correctly identified.
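Precision, recall, and specificity are all simple ratios of the confusion counts. A small sketch with a hypothetical helper, `classification_rates`, and made-up counts for a spam filter:

```python
def classification_rates(tp, fp, tn, fn):
    """Return precision, recall (TPR), specificity and FPR from confusion counts."""
    precision = tp / (tp + fp)    # of everything flagged positive, how much was right
    recall = tp / (tp + fn)       # of all actual positives, how many were found
    specificity = tn / (tn + fp)  # of all actual negatives, how many were cleared
    fpr = fp / (fp + tn)          # mathematically equal to 1 - specificity
    return precision, recall, specificity, fpr


# Hypothetical counts, for illustration only.
precision, recall, specificity, fpr = classification_rates(tp=40, fp=10, tn=90, fn=10)
print(precision, recall, specificity, fpr)  # 0.8 0.8 0.9 0.1
```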

## What Is a ROC Curve: How Do You Assess Model Performance?

AUC is a valuable tool for assessing model performance. An excellent model has an AUC close to 1, indicating a strong separability measure. An AUC close to 0 does not mean the model has learned nothing: it means the model ranks the classes in reverse, predicting the negative class as positive and vice versa (showing 0s as 1s and 1s as 0s), so inverting its output would yield an AUC close to 1. Finally, if the AUC is 0.5, the model has no class separation capacity at all.
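The reversed-ranking case is easy to demonstrate. Negating a model's scores reverses its ranking, and the AUC becomes 1 minus the original AUC. A minimal sketch with scikit-learn's `roc_auc_score` and made-up data for a model that ranks the classes backwards:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical labels and scores for a model that mostly ranks negatives higher.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.2, 0.1, 0.4, 0.9, 0.3, 0.8]

auc_original = roc_auc_score(y_true, scores)
# Negating the scores reverses the ranking, which turns AUC into 1 - AUC.
auc_flipped = roc_auc_score(y_true, [-s for s in scores])

print(auc_original, auc_flipped)  # close to 0 and close to 1, summing to 1
```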

When 0.5 < AUC < 1, the classifier distinguishes the positive class values from the negative class values better than chance. More precisely, AUC equals the probability that the model gives a randomly chosen positive instance a higher score than a randomly chosen negative instance, so the higher the AUC, the more often positives are ranked above negatives.
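This ranking interpretation can be computed directly by comparing every (positive, negative) pair, which is equivalent to the normalized Mann-Whitney U statistic. A pure-Python sketch: `pairwise_auc` is a hypothetical helper, tied scores count as half a correct pair (matching how the ROC curve treats ties), and the data are made up. It is quadratic in the number of samples, so it is meant for illustration rather than large datasets.

```python
from itertools import product


def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # positive ranked above negative
        elif p == n:
            wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.1]
print(pairwise_auc(y_true, scores))  # 8 of 9 pairs in order, about 0.889
```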

## The Relation Between Sensitivity, Specificity, FPR, and Threshold

Before we examine the relation between Specificity, FPR, Sensitivity, and Threshold, we should first cover their definitions in the context of machine learning models. For that, we'll need a confusion matrix to help us understand the terms better. Here is an example of a confusion matrix:

| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | TP | FN |
| Actual negative | FP | TN |

TP stands for True Positive, and TN means True Negative. FP stands for False Positive, and FN means False Negative.

- Sensitivity: Sensitivity, also termed "recall," measures a model's ability to find the positive cases: it shows what proportion of the actual positive class was classified correctly. For example, when trying to figure out how many people have the flu, sensitivity, or True Positive Rate, measures the proportion of people who have the flu and were correctly predicted as having it.

Here’s how to mathematically calculate sensitivity:

Sensitivity = (True Positive)/(True Positive + False Negative)

- Specificity: Specificity evaluates a model's ability to identify the negative cases: it shows what proportion of the actual negative class was classified correctly. In our flu scenario, for example, specificity measures the proportion of people who don't have the flu and were correctly predicted as not having it.

Here’s how to calculate specificity:

Specificity = (True Negative)/(True Negative + False Positive)

- FPR: FPR stands for False Positive Rate and shows what proportion of the negative class was incorrectly classified. This formula shows how we calculate FPR:

FPR= 1 – Specificity

- Threshold: The threshold is the cut-off point that decides whether an observation is classified as 0 or 1. A threshold of 0.5 is a common default, but it is not always the right choice for a given problem.

Sensitivity and specificity trade off against each other: they are inversely related (though not inversely proportional), so moving the threshold to boost one lowers the other or, at best, leaves it unchanged. Decreasing the threshold labels more observations as positive, which raises sensitivity and lowers specificity.

On the other hand, increasing the threshold labels more observations as negative, which results in higher specificity and lower sensitivity.

And since FPR = 1 – specificity, any threshold change that increases the TPR also increases the FPR (or leaves it unchanged), and vice versa.
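The trade-off shows up clearly when you sweep a few thresholds over the same set of scores. A minimal sketch in plain Python; the data and the `sensitivity_specificity` helper are hypothetical, and scores at or above the threshold are assumed to count as positive.

```python
# Hypothetical labels and scores, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.7, 0.55, 0.3, 0.6, 0.4, 0.2, 0.05]


def sensitivity_specificity(threshold):
    """Label scores >= threshold as positive, then return (sensitivity, specificity)."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Lowering the threshold labels more observations positive:
# sensitivity (TPR) rises, specificity falls, so FPR = 1 - specificity rises.
for threshold in (0.8, 0.5, 0.25):
    sens, spec = sensitivity_specificity(threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, "
          f"specificity={spec:.2f}, FPR={1 - spec:.2f}")
```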

## How AUC-ROC Works

The AUC-ROC (Area Under the Curve - Receiver Operating Characteristic) is a performance measurement for classification problems at various threshold settings. Here's how it works:

### Threshold Variation:

- The ROC curve is generated by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold levels.
- By varying the threshold, different pairs of TPR and FPR values are obtained.

### Plotting the ROC Curve:

- True Positive Rate (TPR), also known as Sensitivity or Recall, is plotted on the y-axis. It is the ratio of true positives to the sum of true positives and false negatives.
- False Positive Rate (FPR) is plotted on the x-axis. It is the ratio of false positives to the sum of false positives and true negatives.

### Calculating AUC:

- The area under the ROC curve (AUC) quantifies the overall ability of the model to discriminate between positive and negative classes.
- An AUC value ranges from 0 to 1. A value of 0.5 suggests no discrimination (random performance), while a value closer to 1 indicates excellent model performance.
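These three steps can be written out from scratch. The sketch below uses NumPy; `roc_points` and `trapezoid_area` are hypothetical helper names, every distinct score is tried as a threshold, and the data are made up. In practice a library routine such as scikit-learn's `roc_curve` does the same job more efficiently.

```python
import numpy as np


def roc_points(y_true, scores):
    """Sweep the threshold over every distinct score, from high to low, and
    return FPR and TPR arrays that start at (0, 0) and end at (1, 1)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing is positive
    for t in np.unique(scores)[::-1]:  # distinct thresholds, highest first
        predicted_pos = scores >= t
        tpr.append(np.sum(predicted_pos & (y_true == 1)) / n_pos)
        fpr.append(np.sum(predicted_pos & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)


def trapezoid_area(x, y):
    """Trapezoidal rule: sum of (width * average height) over consecutive points."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))


# Hypothetical labels and scores, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

fpr, tpr = roc_points(y_true, scores)
print(trapezoid_area(fpr, tpr))  # 0.875
```

Tied scores share one threshold, so they produce a diagonal segment, which the trapezoidal rule credits as half a correct ranking.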

## When to Use the AUC-ROC Evaluation Metric?

The AUC-ROC metric is particularly useful in the following scenarios:

- Binary Classification Problems: It is primarily used for binary classification tasks with only two classes.
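- Imbalanced Datasets: Because TPR and FPR are each computed within a single class, AUC-ROC is not directly affected by the ratio of positives to negatives, which makes it a common aggregate measure across all possible classification thresholds for imbalanced datasets. When positives are very rare, however, a precision-recall curve is often more informative.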
- Model Comparison: It is useful for comparing the performance of different models. A higher AUC value indicates a better-performing model.
- Threshold-Independent Evaluation: When you need a performance metric that does not depend on selecting a specific classification threshold.

## Understanding the AUC-ROC Curve

### 1. ROC Curve Interpretation

- Closer to Top Left Corner: A curve that hugs the top left corner indicates a high-performing model with high TPR and low FPR.
- Diagonal Line: A curve along the diagonal line (from (0,0) to (1,1)) indicates a model with no discrimination capability, equivalent to random guessing.

### 2. AUC Value Interpretation

- 0.9 - 1.0: Excellent discrimination capability.
- 0.8 - 0.9: Good discrimination capability.
- 0.7 - 0.8: Fair discrimination capability.
- 0.6 - 0.7: Poor discrimination capability.
- 0.5 - 0.6: Fail, model performs barely better than random guessing.

## How to Use the AUC - ROC Curve for the Multi-Class Model

We can use the One vs. All (also called One vs. Rest) methodology to plot N ROC curves for N classes when working with a multi-class model. One vs. All lets us reuse binary classification: if you have a classification problem with N possible outcomes, One vs. All gives us one binary classifier for each possible outcome.

So, for example, suppose you have three classes named 0, 1, and 2. You will have one ROC curve for class 0 against classes 1 and 2, another for class 1 against classes 0 and 2, and a third for class 2 against classes 0 and 1.
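In scikit-learn, one way to get these per-class curves' AUCs is to binarize the labels with `label_binarize` and score each column separately; `roc_auc_score` with `multi_class="ovr"` computes and averages them in one call. A minimal sketch with made-up probabilities for three classes (the `multi_class` option expects each row of probabilities to sum to 1):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Hypothetical labels for classes 0, 1, 2 and predicted class probabilities,
# shaped like predict_proba output (each row sums to 1). Illustration only.
y_true = np.array([0, 1, 2, 0, 1, 2])
y_proba = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.3, 0.2],
    [0.3, 0.3, 0.4],
    [0.2, 0.2, 0.6],
])

# One vs. Rest: class k becomes the positive class, the other two the negatives.
y_onehot = label_binarize(y_true, classes=[0, 1, 2])
for k in range(3):
    auc_k = roc_auc_score(y_onehot[:, k], y_proba[:, k])
    print(f"class {k} vs rest: AUC = {auc_k:.3f}")

# The unweighted mean of the three one-vs-rest AUCs.
print(roc_auc_score(y_true, y_proba, multi_class="ovr", average="macro"))
```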

To better answer the question “what is a ROC curve?”, let's take a moment to explain the One vs. All methodology. It is made up of N separate binary classifiers, each trained to answer a single yes-or-no classification question. For instance, with four classes (rutabaga, cat, dog, and hammer), you train four binary recognizers; a picture of a cat is a positive example for the cat recognizer and a negative example for the other three. For that cat picture, it would look like this:

- Is this image a rutabaga? No
- Is this image a cat? Yes
- Is this image a dog? No
- Is this image a hammer? No

This methodology works well with a small number of total classes. However, because it trains one classifier per class, it becomes increasingly inefficient as the number of classes grows.


## Are You Interested in a Career in Machine Learning?

There’s a lot to learn about Machine Learning, as you can tell from this “what is a ROC curve” article! However, both machine learning and artificial intelligence are the waves of the future, so it’s worth acquiring skills and knowledge in these fields. Who knows? You could find yourself in an exciting machine learning career!

If you want a career in machine learning, Simplilearn can help you on your way. The AI and ML Certification offers students an in-depth overview of machine learning topics. You will learn to develop algorithms using supervised and unsupervised learning, work with real-time data, and learn about concepts like regression, classification, and time series modeling. You will also learn how Python can be used to draw predictions from data. In addition, the program features 58 hours of applied learning, interactive labs, four hands-on projects, and mentoring.

And since machine learning and artificial intelligence work together so frequently, check out Simplilearn’s Artificial Intelligence Engineer Master’s program, and cover all of your bases.

According to Glassdoor, Machine Learning Engineers in the United States enjoy an average annual base pay of $133,001. Payscale.com reports that Machine Learning Engineers in India can potentially earn ₹732,566 a year on average.

So visit Simplilearn today, and explore the rich possibilities of a rewarding vocation in the machine learning field!

## FAQs

### 1. What does a perfect AUC-ROC curve look like?

A perfect AUC-ROC curve reaches the top left corner of the plot, indicating a True Positive Rate (TPR) of 1 and a False Positive Rate (FPR) of 0 for some threshold. This means the model perfectly distinguishes between positive and negative classes, resulting in an AUC value of 1.0.

### 2. What does an AUC value of 0.5 signify?

An AUC value of 0.5 signifies that the model's performance is no better than random guessing: it cannot distinguish between positive and negative classes. This typically corresponds to a ROC curve lying along the diagonal, where the True Positive Rate (TPR) equals the False Positive Rate (FPR) at every threshold.

### 3. How do you compare ROC curves of different models?

To compare ROC curves of different models, plot each model's ROC curve on the same graph and examine their shapes and positions. The model with the ROC curve closest to the top left corner and the highest Area Under the Curve (AUC) value generally performs better.

### 4. What are some limitations of the ROC curve?

Some limitations of the ROC curve include:

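- It can be less informative, and overly optimistic, for highly imbalanced datasets: when negatives vastly outnumber positives, even a large number of false positives yields a small FPR, so the curve can look good while precision is poor.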
- It does not account for the cost of false positives and false negatives, which can be crucial in some applications.
- Interpretation can be less intuitive compared to precision-recall curves in certain contexts.

### 5. What are common metrics derived from ROC curves?

Common metrics derived from ROC curves include:

- True Positive Rate (TPR): Also known as sensitivity or recall, it measures the proportion of actual positives correctly identified.
- False Positive Rate (FPR): Measures the proportion of actual negatives incorrectly identified as positives.
- Area Under the Curve (AUC): Summarizes the model's overall performance across all thresholds.
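- Optimal Threshold: The threshold that best balances a high TPR against a low FPR for the problem at hand; one common choice is the threshold that maximizes TPR - FPR (Youden's J statistic).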
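One common (but not the only) way to pick an "optimal" threshold is Youden's J statistic, J = TPR - FPR, which selects the ROC point farthest above the diagonal. A minimal sketch with scikit-learn's `roc_curve` and made-up data; if false positives and false negatives carry different costs in your application, a cost-weighted criterion may be a better fit.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical labels and scores, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Youden's J: vertical distance between the ROC curve and the diagonal.
j = tpr - fpr
best = np.argmax(j)
print(f"threshold={thresholds[best]}, TPR={tpr[best]}, FPR={fpr[best]}")
```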