Confusion Matrix in Machine Learning

In this post, I explain the Confusion Matrix in detail: its Definition and Intuition, a Claim Approval Example, the Confusion Matrix Table Layout, the Core Concepts (TP, TN, FP, FN), the Confusion Matrix Formulae, the Derived Metrics from the Confusion Matrix (Precision, Recall, F1, Specificity), and Visualization and Code.
If you also want to learn about the following confusion matrix topics, or to leave a comment, you can do so on my original Confusion Matrix article on LinkedIn here:
Thresholding; ROC and PR Curves; Imbalanced Data and the Accuracy Paradox; Multiclass and Multi-Label Confusion Matrices (Visualization and Interpretation); Cost-Sensitive Decisions (Cost Matrix, Business Tradeoffs, and Setting Operational Thresholds); Calibration, Confidence, and When to Trust Model Probabilities; Practical Tips and Troubleshooting (Data Leakage, Label Noise, Sampling Effects) with a Debugging Checklist for AI Developers and AI QA Testers; Ethics, Fairness, and Human Impact in Confusion Matrix Decisions; and Performance Monitoring and Alerts for Using Confusion Matrix Metrics in Production (Drift Detection, Rolling Windows).

Confusion Matrix Definition & Intuition

A Confusion Matrix is a diagnostic table used to evaluate classification models in supervised learning. It cross-tabulates the model's Predicted labels against the True (actual) labels, so you can see not just overall accuracy but also the pattern of errors (misclassifications). Intuitively, the matrix answers questions such as: when the model says Approve, how often is it correct, and when it says Deny, what mistakes is it making? For Machine Learning (ML) developers and Artificial Intelligence (AI) / ML testers, the value of the confusion matrix lies less in a single evaluation score and more in exposing operational errors.

Claim Approval Scenario Example: Consider an automated claim adjudication model that classifies incoming insurance claims as Approve or Deny. Business impact differs by error type: an erroneous Approve may pay out a fraudulent claim (potentially a large, unnecessary cost for the business), while an erroneous Deny inconveniences a customer (and may trigger an appeal against the decision). Overall accuracy alone cannot tell you which kind of mistake the model makes when it is wrong. The Confusion Matrix provides exactly that information.

Confusion Matrix Table Layout: True (vertical) vs Predicted (horizontal)

Draw the matrix as a 2×2 table with the True (actual) axis on the vertical (rows) and the Predicted axis on the horizontal (columns). The top row is the actual Deny cases; the bottom row is the actual Approve cases. The left column contains the model's predictions of Deny and the right column contains predictions of Approve.
Placing True on the vertical axis (rows) and Predicted on the horizontal axis (columns) is the standard orientation used by common tooling such as scikit-learn; it also aligns with how row-based rates (e.g., sensitivity and specificity) are computed and read in production dashboards. Note: always check the documentation, because some libraries and textbooks transpose the layout (predicted on rows, true on columns).
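
To make the layout concrete, here is a minimal sketch that labels the rows and columns of scikit-learn's confusion_matrix output. The toy labels are purely illustrative (not the worked-example counts used later), and pandas is assumed to be available only for display:

import pandas as pd
from sklearn.metrics import confusion_matrix

# Illustrative toy labels: 0 = Deny, 1 = Approve
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 1]

# scikit-learn returns rows = true labels, columns = predicted labels
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(pd.DataFrame(cm,
                   index=["True Deny", "True Approve"],
                   columns=["Pred Deny", "Pred Approve"]))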

TP, TN, FP, FN Explained

Each of the four cells has an operational meaning in this Claim Approval example. The bottom-right cell is TP (True Positive): the model correctly Approves a legitimate claim. The bottom-left cell is FN (False Negative): the model incorrectly Denies a legitimate claim. The top-right cell is FP (False Positive): the model incorrectly Approves a fraudulent claim (which is an expensive operational error). The top-left cell is TN (True Negative): the model correctly Denies a fraudulent claim. These labels let you translate counts into business questions: How many fraudulent payouts slipped through (FP)? How many real customers suffered incorrect denials (FN)?
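
To tie these cell names to code, here is a minimal sketch (assuming the same 0 = Deny, 1 = Approve encoding used in this post, with illustrative toy labels) that unpacks the four counts from scikit-learn's confusion_matrix:

from sklearn.metrics import confusion_matrix

# Illustrative toy labels: 1 = legitimate claim (Approve), 0 = fraudulent claim (Deny)
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]

# With labels=[0, 1], ravel() returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=2, TN=1, FP=1, FN=1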

Confusion Matrix Formulae: Overall Accuracy & Sensitivity (True Positive Rate)

Two foundational formulas use the four cell counts directly. Let TP, TN, FP, and FN denote the integer counts in their respective cells. For the worked example in this post, suppose the model classified 10 claims with TP = 6, TN = 2, FP = 1, FN = 1.

Accuracy = (TP + TN) / (TP + TN + FP + FN) = 8 / 10 = 0.8 (80.0%).
Sensitivity (also called True Positive Rate, or Recall for the positive class) = TP / (TP + FN) = 6 / 7 ≈ 0.857 (≈ 85.7%).
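
As a quick sanity check, here is a minimal sketch that reproduces this arithmetic directly from the counts (the variable names are purely illustrative):

# Worked-example counts from the Claim Approval scenario
TP, TN, FP, FN = 6, 2, 1, 1

accuracy = (TP + TN) / (TP + TN + FP + FN)   # 8 / 10
sensitivity = TP / (TP + FN)                 # 6 / 7, also called recall / true positive rate

print(f"Accuracy    = {accuracy:.3f}")    # 0.800
print(f"Sensitivity = {sensitivity:.3f}") # 0.857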

Derived Metrics from the Confusion Matrix (Precision, Recall, F1, Specificity)

From the four cell counts of a confusion matrix (TP, TN, FP, FN), we can compute evaluation metrics that answer operational questions. Precision (also called positive predictive value) measures how many of the items the model labeled Approve were actually legitimate: Precision = TP / (TP + FP)

Recall (also called Sensitivity or true positive rate) measures how many of the truly legitimate claims the model successfully approved: Recall = TP / (TP + FN)

F1 combines precision and recall into a single harmonic mean to balance both concerns: F1 = 2 · (Precision · Recall) / (Precision + Recall)

Specificity (true negative rate) measures how many fraudulent claims were correctly denied: Specificity = TN / (TN + FP).

Worked example, using the Claim Approval counts above:
TP = 6, TN = 2, FP = 1, FN = 1
Precision = 6 / (6 + 1) = 6/7 ≈ 0.857 (about 85.7%)
Recall = 6 / (6 + 1) = 6/7 ≈ 0.857 (about 85.7%)
F1 = 2·(0.857·0.857)/(0.857+0.857) = 0.857 (about 85.7%).
Specificity = 2 / (2 + 1) = 2/3 ≈ 0.667 (about 66.7%)
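
These values can be cross-checked with scikit-learn; here is a minimal sketch, assuming toy arrays that reproduce TP = 6, TN = 2, FP = 1, FN = 1 with Approve encoded as 1 (the same arrays are reused in the plotting code below):

from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Toy arrays reproducing TP=6, TN=2, FP=1, FN=1 (1 = Approve, 0 = Deny)
y_true = [1]*7 + [0]*3
y_pred = [1]*6 + [0]*1 + [1]*1 + [0]*2

print(f"Precision   = {precision_score(y_true, y_pred):.3f}")  # 0.857
print(f"Recall      = {recall_score(y_true, y_pred):.3f}")     # 0.857
print(f"F1          = {f1_score(y_true, y_pred):.3f}")         # 0.857

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"Specificity = {tn / (tn + fp):.3f}")                    # 0.667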

Interpreting these numbers in the Claim Approval domain: when the system approves, it is correct about 86% of the time (precision); it captures about 86% of legitimate claims (recall); and it correctly denies only about 67% of fraudulent claims (specificity), meaning roughly one in three fraudulent claims is approved (the false positive rate). That indicates a need for stronger fraud controls if the business cost of an FP is high.

Visualization & Code: Confusion Matrix in Python with sklearn

Below is a compact plotting helper that wraps sklearn's utilities and produces the annotated matrix used in the worked examples. The toy data reproduces the exact counts used above (TP = 6, TN = 2, FP = 1, FN = 1).

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

def plot_confusion_matrix(y_true, y_pred, labels=("Deny", "Approve"), normalize=None, cmap=plt.cm.Blues):
    """Plot an annotated confusion matrix (rows = true labels, columns = predicted labels)."""
    fig, ax = plt.subplots()
    disp = ConfusionMatrixDisplay.from_predictions(
        y_true, y_pred, display_labels=labels, normalize=normalize, cmap=cmap, ax=ax
    )
    ax.set_title("Claim Approval Confusion Matrix")
    return disp, ax

# Toy arrays matching TP=6, TN=2, FP=1, FN=1 (total 10)
y_true = [1]*7 + [0]*3        # seven true Approve (1), three true Deny (0)
y_pred = [1]*6 + [0]*1 + [1]*1 + [0]*2  # predictions arranged to yield counts above

disp, ax = plot_confusion_matrix(y_true, y_pred, normalize=None)
plt.show()
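
If you prefer rates over raw counts, ConfusionMatrixDisplay also supports normalization. For example, passing normalize="true" to the helper above scales each true-class row to sum to 1, so the diagonal shows sensitivity and specificity directly:

# Row-normalized view: each true-class row sums to 1
disp, ax = plot_confusion_matrix(y_true, y_pred, normalize="true")
plt.show()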
Want to learn more? Read my original Confusion Matrix article on LinkedIn. It additionally covers the topics listed at the top of this post, from Thresholding, ROC and PR Curves, and Imbalanced Data through Multiclass Matrices, Cost-Sensitive Decisions, Calibration, Practical Troubleshooting, Ethics and Fairness, and Performance Monitoring in Production. Thank you!
