1. False Positive and False Negative
False Positive (Type I Error): This occurs when we incorrectly predict a positive result when it is actually negative. For example, diagnosing a disease when the person is healthy.
False Negative (Type II Error): This happens when we incorrectly predict a negative result when it is actually positive. For example, failing to diagnose a disease that is actually present.
Comparison:
In medical contexts, a Type I Error is generally considered less dangerous than a Type II Error: a false positive may lead to unnecessary tests or treatment, while a false negative means a missed diagnosis, which can be life-threatening.
Both errors can have significant consequences in fields like medicine, where accuracy is critical.
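A minimal sketch of where these two errors sit in a confusion matrix, assuming scikit-learn is available and using made-up labels (1 = disease present, 0 = healthy):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions (1 = disease present, 0 = healthy)
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]

# For binary 0/1 labels, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("False Positives (Type I errors): ", fp)   # predicted sick, actually healthy
print("False Negatives (Type II errors):", fn)   # predicted healthy, actually sick
```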
2. Accuracy Paradox
The Accuracy Paradox occurs when a model's accuracy is misleading, especially in imbalanced datasets. For example, a model that always predicts the majority class might appear to be highly accurate but perform poorly for the minority class.
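As an illustration, here is a small sketch of the paradox on a hypothetical dataset with 990 negatives and 10 positives; the labels and the always-majority "model" are made up for the example:

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced data: 990 negatives, 10 positives
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000          # a "model" that always predicts the majority class

print("Accuracy:", accuracy_score(y_true, y_pred))        # 0.99 -> looks excellent
print("Minority recall:", recall_score(y_true, y_pred))   # 0.0  -> never finds a positive
```

The 99% accuracy hides the fact that the model never detects a single positive case, which is exactly the paradox described above.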
3. CAP Curve (Cumulative Accuracy Profile Curve)
The CAP Curve is used to assess the performance of classification models, particularly their ability to rank predictions effectively. The curve helps visualize how well the model identifies the target class.
Better Model: A larger area under the curve (relative to the random baseline) indicates a better model.
Interpretation: The CAP curve lets us evaluate the gain we get from using the model's ranking compared with selecting observations at random.
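A minimal sketch of how a CAP curve can be built, assuming NumPy and a model that outputs a probability score per observation; the labels, scores, and the cap_curve helper are illustrative, not part of any particular library:

```python
import numpy as np

def cap_curve(y_true, y_scores):
    """Return (fraction of observations contacted, fraction of positives captured)."""
    y_true = np.asarray(y_true)
    order = np.argsort(y_scores)[::-1]                    # rank by score, best first
    captured = np.cumsum(y_true[order]) / y_true.sum()    # cumulative share of positives found
    contacted = np.arange(1, len(y_true) + 1) / len(y_true)
    return contacted, captured

# Hypothetical labels and predicted probabilities
y_true  = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]

x, y = cap_curve(y_true, y_score)
# The area between the model's curve and the 45-degree random baseline is one
# way to summarise the gain from the model's ranking over random selection.
print("Area above the random baseline:", round(np.trapz(y, x) - 0.5, 3))
```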
4. ROC Curve (Receiver Operating Characteristic Curve)
The ROC Curve plots the True Positive Rate (sensitivity) on the y-axis against the False Positive Rate (1 - specificity) on the x-axis at different classification thresholds, making the trade-off between sensitivity and specificity visible.
Key Insight: Neither of the axes represents "correct" or "incorrect" alone—it's a balance between false positives and false negatives at various thresholds.
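A minimal sketch, assuming scikit-learn and hypothetical scores, of computing the points on an ROC curve and the area under it (AUC):

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical labels and predicted probabilities
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.55, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("FPR (x-axis, 1 - specificity):", fpr)
print("TPR (y-axis, sensitivity):    ", tpr)
print("AUC:", roc_auc_score(y_true, y_score))
```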
5. CAP Curve Analysis
A typical CAP plot shows three lines:
Blue Line: Represents a random model (baseline performance).
Red Line: Represents the performance of the model being evaluated.
Grey Line: Represents the performance of a perfect model.
Interpretation:
If your model is closer to the grey line, it means it is performing very well.
If it’s closer to the blue line, it’s performing poorly (random guesswork).
If your model's curve sits extremely close to the perfect (grey) line, for example capturing well over 90% of the positives very early, treat it with suspicion: this often points to overfitting or to a variable that leaks information about the outcome, rather than to a genuinely excellent model.
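One common rule of thumb is to read off the share of positives captured in the top 50% of ranked observations; the exact thresholds below are a widely used convention rather than something fixed by these notes. This reuses the hypothetical cap_curve helper and data from the CAP sketch above:

```python
import numpy as np

x, y = cap_curve(y_true, y_score)          # helper and data from the earlier CAP sketch
captured_at_50 = np.interp(0.5, x, y)      # share of positives captured at x = 50%

if captured_at_50 > 0.9:
    print("Suspiciously good: check for overfitting or a leaking variable")
elif captured_at_50 > 0.8:
    print("Very good model")
elif captured_at_50 > 0.7:
    print("Good model")
elif captured_at_50 > 0.6:
    print("Poor model")
else:
    print("Close to random guessing")
```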
6. Overfitting
Overfitting occurs when a model learns not only the genuine patterns but also the noise or random fluctuations in the training data. As a result, it performs well on the training data but poorly on new, unseen data.
Prevention: Use techniques like cross-validation, pruning, or regularization to prevent overfitting.
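A minimal sketch of two of those techniques, assuming scikit-learn and using its built-in breast cancer dataset purely for illustration: k-fold cross-validation to spot overfitting, and L2 regularization (the C parameter of LogisticRegression) to limit it.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Smaller C means stronger L2 regularization, which constrains the model
model = make_pipeline(StandardScaler(), LogisticRegression(C=0.5))

# 5-fold cross-validation: a large gap between training accuracy and these
# held-out scores is a typical symptom of overfitting
scores = cross_val_score(model, X, y, cv=5)
print("Cross-validated accuracy:", round(scores.mean(), 3))
```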
7. False Positive and False Negative (Definitions Recap)
False Positive: We predict a positive outcome when it is actually negative.
False Negative: We predict a negative outcome when it is actually positive.