A Simple Guide to ROC Curves, Sensitivity and Specificity and the Confusion Matrix

Tom Ribaroff
Nov 10, 2020 · 5 min read

ROC Curves can look a little confusing at first, so here’s a handy guide to understanding what they mean, starting from the basic related concepts:

Confusion Matrix

When building a classification model, we want to look at how well it is performing. The results of its performance can be summarised in a handy table called a Confusion Matrix.

Diagram 1 – Confusion Matrix

The model’s correct classifications are totalled in the green boxes, and the incorrect ones are in the red boxes.

For example, the model predicted 50 data points correctly as negative, but incorrectly predicted 10 data points as positive when they should have been called negative.
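If you want to build this table in code, here’s a minimal sketch using scikit-learn’s confusion_matrix; the label arrays below are made up purely for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth labels and model predictions (1 = positive, 0 = negative)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 1, 0, 1, 0, 0])

# scikit-learn lays the matrix out with actual classes as rows and predicted classes as columns
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"True negatives: {tn}, False positives: {fp}")
print(f"False negatives: {fn}, True positives: {tp}")
```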

Sensitivity

Sensitivity is the measure of how well your model is performing on your ‘positives’.

It is the number of positives your model correctly predicted divided by the number of positives it *should* have predicted.

Number of Correctly Predicted Positives / Number of Actual Positives

In the example above, we can see that there were 100 correct positives and 5 false negatives (that should have been predicted positive). This means that our model predicted 100 out of 105 positives, or had a “sensitivity of roughly 95%”.

Thus, a model with 100% sensitivity never misses a positive data point.
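As a quick sanity check, here’s that calculation in Python, using the counts from the example above (100 true positives, 5 false negatives):

```python
# Counts from the confusion matrix example above
true_positives = 100
false_negatives = 5

sensitivity = true_positives / (true_positives + false_negatives)
print(f"Sensitivity: {sensitivity:.0%}")  # roughly 95%
```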

Specificity

Specificity is the measure of how well your model is classifying your ‘negatives’.

It is the number of true negatives (the data points your model correctly classified as negative) divided by the total number of negatives your model *should* have predicted.

Number of Correctly Predicted Negatives / Number of Actual Negatives

In the example above, we can see that there were 50 correct negatives and 10 false positives (that should have been predicted negative). This means that our model predicted 50 out of 60 negatives, or had a “specificity of 83%”

Models with 100% specificity always get the negatives right.
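The same one-liner works for specificity, using the 50 true negatives and 10 false positives from the example:

```python
# Counts from the confusion matrix example above
true_negatives = 50
false_positives = 10

specificity = true_negatives / (true_negatives + false_positives)
print(f"Specificity: {specificity:.0%}")  # roughly 83%
```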

We’ll use Logistic Regression in the example we work through, but any binary classifier would do (logistic regression, decision trees, etc.).

Say we are trying to predict whether an animal is a cat or a dog from its weight.

Diagram 2 – Logistic Regression Curve

We’ve fit our data to this logistic (S-shaped) curve, hence the name, and set the threshold to 0.5. Any animal above this threshold is predicted to be a dog; any animal below is not. Simple, right?

Let’s say the point at y=0.8 is actually a negative value: it’s a very large cat confusing the model. Our model would label this a positive.

What if the value at 0.3 is actually a positive? Our model would label this negative, and hence we’d have one dog being labelled a cat.

These incorrect predictions are not a huge problem; they’re a sacrifice we’d happily make to have a model that works well on a large dataset of dogs. We don’t want to overfit! But what if we aren’t predicting dogs, but a serious disease? Then the stakes are higher, and it is much less acceptable to miss positives, so you would have to consider lowering the threshold so you don’t miss any.
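Here’s a minimal sketch of that cats-and-dogs setup with scikit-learn’s LogisticRegression. The weights and labels are made up for illustration, and the 0.5 threshold is applied by hand to the predicted probabilities so it is easy to raise or lower later:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical animal weights in kg (feature) and labels (1 = dog, 0 = cat)
weights = np.array([[3.0], [4.5], [5.0], [6.0], [8.0], [12.0], [20.0], [30.0]])
is_dog = np.array([0, 0, 0, 1, 0, 1, 1, 1])  # the 8 kg animal is a very large cat

model = LogisticRegression().fit(weights, is_dog)

# Probability of "dog" for each animal, then apply the 0.5 threshold
probabilities = model.predict_proba(weights)[:, 1]
threshold = 0.5
predictions = (probabilities >= threshold).astype(int)

print(probabilities.round(2))
print(predictions)
```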

ROC Curve

The ROC curve is a plot of how well the model performs at every threshold from 0 to 1. We work through all the different thresholds, plotting as we go, until we have the whole curve. We can then compare this curve to the ROC curves of other models to see which performs better overall.

Let’s have a closer look at an example one:

Diagram 3 – ROC Curve

The True Positive Rate is the rate that we correctly predict positive values to be positive:

Number of Correctly Predicted Positives / Number of Real Positives

This is the same as Sensitivity, which we saw above!

The False Positive Rate is the rate that we incorrectly labelled negatives to be positive.

Number of Incorrectly Predicted Positives / Number of Real Negatives

NB This is actually the same as 1 – Specificity: every real negative is either predicted correctly (a true negative) or incorrectly (a false positive), so the False Positive Rate is 1 – (Number of Correctly Predicted Negatives / Number of Real Negatives) = 1 – Specificity.
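If you want to convince yourself numerically, a couple of lines of Python with the counts from the earlier example show the two quantities match:

```python
# Counts from the confusion matrix example above
true_negatives, false_positives = 50, 10

false_positive_rate = false_positives / (false_positives + true_negatives)
specificity = true_negatives / (true_negatives + false_positives)

print(false_positive_rate)  # 0.1666...
print(1 - specificity)      # 0.1666..., the same number
```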

The Dotted Line: this marks our baseline, which we are hoping to beat. On this line, the True Positive Rate and the False Positive Rate are equal, meaning that our model would be useless, as a positive prediction is just as likely to be true as it is to be false.

Each point along the orange line is the result of our model’s performance at a different threshold value.

Imagine this ROC curve is from our Dogs and Cats example.

Let’s start at the bottom left: if we set the threshold to one, our logistic regression model will predict that every single animal is a cat. If our model predicts zero dogs, then the sensitivity (or True Positive Rate) would be zero (as the numerator of the sensitivity formula above would be zero). The same goes for our False Positive Rate; you can’t have any false positives if you predict zero positives! So our first point on the graph is at (0,0).

As we lower our threshold, we start to correctly predict dogs, shooting our orange line up the graph, occasionally being pulled to the right when false positives are picked up (like the y=0.8 point in Diagram 2).

As we approach Threshold = 0, our orange line approaches (1,1): a zero threshold would predict all the animals as dogs, meaning that while every dog is correctly predicted to be a dog, every cat is also incorrectly predicted to be a dog, so the True and False Positive Rates are both 1.

In an ideal scenario, our model would pick up on every positive while not misdiagnosing any of the negatives as positives. Represented on the graph, that would be the point (0,1) in the top left corner: a False Positive Rate of 0 and a True Positive Rate of 1. So the closer the orange line gets to the top left, the better the model is performing.
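In practice you don’t have to sweep the thresholds by hand: scikit-learn’s roc_curve does it for you. Here’s a minimal sketch reusing the made-up weights-and-labels data from the logistic regression example above:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Same hypothetical weights (kg) and labels (1 = dog, 0 = cat) as in the earlier sketch
weights = np.array([[3.0], [4.5], [5.0], [6.0], [8.0], [12.0], [20.0], [30.0]])
is_dog = np.array([0, 0, 0, 1, 0, 1, 1, 1])

probabilities = LogisticRegression().fit(weights, is_dog).predict_proba(weights)[:, 1]

# roc_curve sweeps the thresholds for us and returns one (FPR, TPR) pair per threshold
fpr, tpr, thresholds = roc_curve(is_dog, probabilities)

plt.plot(fpr, tpr, label="Logistic regression")
plt.plot([0, 1], [0, 1], linestyle="--", label="Baseline")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```

The dashed diagonal plays the role of the dotted baseline in Diagram 3: the closer the model’s curve pulls towards the top left, the better.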
