Confusion Metrics:
Neural Narrator
Jun 18, 2024
#ModelEvaluation #Series #2
One way to view the various metrics of a classification model is the confusion matrix.
It also helps us understand precision and recall better.
In a classification problem such as spam detection, during the testing phase each example has two possible categories: HAM or SPAM. Keep in mind that an email that is really SPAM could still be predicted as HAM. This means that with two possible classes, you end up with 4 separate groups at the end of testing:
| Real Condition | Predicted Ham | Predicted Spam |
|----------------|---------------|----------------|
| Ham            | True          | False          |
| Spam           | False         | True           |
Now if we expand this table further:
| Real Condition | Predicted Ham (Positive) | Predicted Spam (Negative) |
|----------------|--------------------------|---------------------------|
| Ham            | True Positive            | False Negative            |
| Spam           | False Positive           | True Negative             |
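The same four groups can be tallied in code. Here is a minimal sketch, assuming a handful of made-up emails and scikit-learn's confusion_matrix; the data is illustrative, not from this post:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for six emails (illustrative only).
y_true = ["ham", "ham", "ham", "spam", "spam", "ham"]
y_pred = ["ham", "spam", "ham", "spam", "ham", "ham"]

# Rows follow the real condition and columns the prediction, in the order
# given by `labels`, so the layout matches the table above (Ham as positive).
cm = confusion_matrix(y_true, y_pred, labels=["ham", "spam"])
print(cm)
# [[3 1]   <- 3 true positives, 1 false negative
#  [1 1]]  <- 1 false positive, 1 true negative
```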
The key point with the confusion matrix, and the various metrics calculated from it, is the question: what constitutes a GOOD metric?
It really depends on the specific situation.
Let's use a confusion matrix to evaluate our model. In this example we are going to test for the presence of a disease. This is supervised learning, so before we run the patients through the testing program, we already know their true conditions, i.e. whether they have the disease or not. So imagine this as testing a new diagnostic tool.
For the presence of the disease:
Yes = Positive Test, or True, or 1
No = Negative Test, or False, or 0
Total people in the test = 165, so N = 165
The result is:

| Real Condition | Predicted No (Negative) | Predicted Yes (Positive) |
|----------------|-------------------------|--------------------------|
| Actual No      | 50                      | 10                       |
| Actual Yes     | 5                       | 100                      |
Let's map each value in the table:

| Real Condition | Predicted No (Negative) | Predicted Yes (Positive) |
|----------------|-------------------------|--------------------------|
| Actual No      | 50 (TN)                 | 10 (FP)                  |
| Actual Yes     | 5 (FN)                  | 100 (TP)                 |
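Where does the headline accuracy figure come from? A minimal sketch in Python, using only the four counts from the mapped table above:

```python
# The four counts from the mapped table above.
TN, FP, FN, TP = 50, 10, 5, 100
total = TN + FP + FN + TP            # 165 people in the test

accuracy = (TP + TN) / total         # correct predictions / all predictions
print(f"Accuracy: {accuracy:.2%}")   # Accuracy: 90.91%
```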
Accuracy = (TP + TN) / Total = (100 + 50) / 165 ≈ 0.91, or about 91%. Now, is 91% accuracy good enough?
This depends on the situation. If you are dealing with cancer, that's a high-stakes game, so 91% is not good enough.
The really important statistic here is the False Negatives: these 5 people had cancer that we predicted as safe. That is an extremely dangerous situation to be in, so you have to keep in mind the context of what your ML model is trying to achieve.
So there is always going to be a trade-off between false negatives and false positives.
In this situation we want to minimize the false negatives. What we'd really like to avoid here is telling someone they are clear of the disease when they actually have it.
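Precision and recall, mentioned earlier, put numbers on exactly this trade-off. A hedged sketch using the same counts; the formulas are the standard definitions rather than anything computed in this post:

```python
TP, FN, FP = 100, 5, 10

recall = TP / (TP + FN)        # of all patients who truly have the disease, how many we caught
precision = TP / (TP + FP)     # of all positive predictions, how many were right

print(f"Recall:    {recall:.2%}")     # 95.24% -- the 5 missed patients drag this down
print(f"Precision: {precision:.2%}")  # 90.91%
```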
You can also calculate the Misclassification Rate:
= (FP + FN) / Total
= 15 / 165
≈ 0.09, or a 9% error rate.
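The same number in code, just to show it is the complement of accuracy (again using the example's counts):

```python
FP, FN, total = 10, 5, 165

misclassification_rate = (FP + FN) / total          # wrong predictions / all predictions
print(f"Error rate: {misclassification_rate:.2%}")  # 9.09%, i.e. roughly 1 - accuracy
```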
In statistics, false positives and false negatives are referred to as Type I and Type II errors.
Type I error, i.e. a false positive: telling a man he is pregnant.
Type II error, i.e. a false negative: telling a pregnant woman she is not pregnant.