User 136 · Health · Diagnostics · Cancer Classification

Executive Summary

Overall classification accuracy, AUC, and the single most discriminating tumor measurement

Metric	Value
Total Tumors	569
Malignant Rate	37.3%
Model Accuracy	92.9%
AUC Score	0.987
Top Feature	Radius Worst
Sensitivity	83.3%
Specificity	98.6%

Interpretation

The logistic regression model achieves 92.9% accuracy on held-out test cases with an AUC of 0.987. The dataset contains 569 tumors (37.3% malignant). The strongest single predictor of malignancy is Radius Worst.

Visualization

Diagnosis Class Distribution

Count of malignant (M) and benign (B) tumors in the dataset

Interpretation

The dataset contains 212 malignant and 357 benign tumors (37.3% malignant). The modest class imbalance is not severe enough to require resampling for logistic regression.
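
The malignant rate follows directly from the two class counts. A minimal sketch of that arithmetic, using only the counts quoted above (the variable names are illustrative):

```python
# Class counts quoted in the interpretation above.
malignant = 212
benign = 357

total = malignant + benign
malignant_rate = malignant / total
# Ratio of majority to minority class, a common quick check for imbalance.
imbalance_ratio = benign / malignant

print(f"Total tumors: {total}")
print(f"Malignant rate: {malignant_rate:.1%}")
print(f"Benign per malignant case: {imbalance_ratio:.2f}")
```

A ratio below 2 benign cases per malignant case is what "modest imbalance" refers to here.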

Visualization

PCA Biplot — PC1 vs PC2 by Diagnosis

Each point is one tumor plotted in the 2-D space of the first two principal components

Interpretation

PC1 explains 44.3% and PC2 explains 19% of total feature variance, about 63% together. Malignant tumors (M) cluster at higher PC1 values, reflecting larger and more irregular cell nuclei. The visible separation between the clusters in only two components suggests that the 30 features carry largely linearly separable class information, which is consistent with the strong logistic regression performance.

Visualization

Scree Plot — Variance Explained per PC

Percentage of total feature variance captured by each principal component

Interpretation

The first 7 principal components capture 90% of the variance in the 30 cell-nucleus measurements. PC1 alone explains 44.27%. The steep drop after PC1 is what one expects when many of the measurements are strongly correlated (radius, perimeter, and area, for example), so a few dominant size and shape factors account for most of the variability.
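
The "number of components needed to reach 90%" figure is read off the cumulative sum of the per-component variance ratios. The sketch below shows that calculation. The function name is hypothetical, and the example ratios are placeholders chosen for illustration: only the first two are rounded from the figures in this report, and the rest are assumed.

```python
from itertools import accumulate


def components_for_threshold(ratios, threshold=0.90):
    """Return the smallest number of leading components whose cumulative
    explained-variance ratio reaches `threshold`, or None if it never does."""
    for k, cumulative in enumerate(accumulate(ratios), start=1):
        # Small tolerance so float rounding does not miss an exact hit.
        if cumulative >= threshold - 1e-12:
            return k
    return None


# ASSUMPTION: illustrative ratios only. PC1 and PC2 are rounded from the
# report; the remaining values are placeholders, not results from this model.
example_ratios = [0.443, 0.190, 0.094, 0.066, 0.055, 0.040, 0.023, 0.016]

print(components_for_threshold(example_ratios))
```

With real output, the same function would be applied to the full list of 30 ratios from the fitted PCA.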

Visualization

Logistic Regression Coefficients

Standardised logistic regression coefficients for the top features

Interpretation

Of the top 10 features shown, 5 push the model toward malignancy (positive coefficient) and 5 push toward benign (negative). Features with the largest absolute coefficients have the greatest influence on the predicted probability. All features were z-scored before fitting, so coefficients are directly comparable across measurements.
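
To make the role of z-scoring and coefficient sign concrete, here is a minimal sketch of how a standardized feature vector turns into a predicted probability in a logistic model. The coefficients and intercept are hypothetical values for illustration, not the fitted values from this report.

```python
import math
from statistics import mean, pstdev


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predicted_probability(z_features, coefficients, intercept):
    """Probability of the positive class: sigmoid of the linear log-odds."""
    log_odds = intercept + sum(c * z for c, z in zip(coefficients, z_features))
    return sigmoid(log_odds)


# ASSUMPTION: hypothetical coefficients for two standardized features.
coefs = [1.5, -0.8]
baseline = predicted_probability([0.0, 0.0], coefs, intercept=0.0)
raised = predicted_probability([1.0, 0.0], coefs, intercept=0.0)   # +1 SD on a positive-coefficient feature
lowered = predicted_probability([0.0, 1.0], coefs, intercept=0.0)  # +1 SD on a negative-coefficient feature
print(baseline, raised, lowered)
```

Because every feature is on the same scale after z-scoring, a one-standard-deviation change shifts the log-odds by exactly the coefficient, which is why the magnitudes can be compared across measurements.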

Visualization

ROC Curve

True positive rate vs false positive rate across all classification thresholds

Interpretation

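The model achieves an AUC of 0.987; values above 0.9 are conventionally read as excellent discrimination between malignant and benign tumors. The curve shows that the model can reach sensitivity above 90% while keeping the false positive rate below 15%. Sensitivity at the threshold used for the confusion matrix is 83.3%, so reaching that operating point means lowering the decision threshold and accepting more false positives, a trade-off that is often reasonable for cancer screening.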
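
AUC can be read as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case. The sketch below computes it that way and also shows how moving the threshold changes the true and false positive rates. The function names are hypothetical and the labels and scores are toy values, not this model's predictions.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def rates_at_threshold(labels, scores, threshold):
    """Return (true positive rate, false positive rate) when every case with
    score >= threshold is predicted positive. Labels must be 0 or 1."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg


# ASSUMPTION: toy data for illustration only (1 = malignant, 0 = benign).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

print(auc_from_scores(toy_labels, toy_scores))
print(rates_at_threshold(toy_labels, toy_scores, 0.5))
print(rates_at_threshold(toy_labels, toy_scores, 0.35))
```

In the toy data, lowering the threshold from 0.5 to 0.35 catches the remaining positive case without adding a false positive; on real data the gain in sensitivity usually costs some specificity.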

Visualization

Confusion Matrix

Predicted vs actual diagnosis on the held-out test set

Interpretation

The model correctly classified 35 malignant and 70 benign tumors. There were 7 false negatives (malignant tumors predicted benign) and 1 false positive. Sensitivity (83.3%, or 35 of 42 malignant cases) measures how well the model catches true cancers; minimising false negatives is the clinical priority.
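
The threshold-dependent metrics in this report can all be derived from the four confusion-matrix counts. A minimal sketch using the counts stated above (the function name is hypothetical):

```python
def classification_metrics(tp, tn, fp, fn):
    """Derive standard binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # recall: share of true cancers caught
        "specificity": tn / (tn + fp),   # share of benign cases correctly cleared
        "precision": tp / (tp + fp),     # share of malignant predictions that are correct
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Counts from the confusion matrix: 35 true positives, 70 true negatives,
# 1 false positive, 7 false negatives (113 test cases in total).
metrics = classification_metrics(tp=35, tn=70, fp=1, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals these reproduce the accuracy, sensitivity, specificity, precision, and F1 values reported for the test set. AUC is not included because it depends on the full ranking of scores rather than on a single threshold.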

Visualization

Feature Importance — Top Predictors

Top cell-nucleus measurements ranked by absolute logistic regression coefficient

Interpretation

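The most discriminating feature is Radius Worst (|coef| = 736.8858). Features from the 'worst' measurement group (largest values recorded across nuclei) tend to dominate the ranking, reflecting that the most extreme cellular abnormalities are the clearest indicators of malignancy. A coefficient of this magnitude on a z-scored feature is unusually large and may indicate an unregularized fit on nearly separable training data, so the ordering of features is more informative than the raw magnitudes.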
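
Ranking by absolute coefficient means the sign is ignored for ordering but kept for interpretation. A short sketch of that ranking follows; the function name, feature names, and coefficient values are hypothetical placeholders, not the fitted model's values.

```python
def rank_by_abs_coefficient(coefficients, top_n=10):
    """Return the top_n (feature, coefficient) pairs ordered by |coefficient|,
    largest first. The signed coefficient is kept so direction stays visible."""
    ordered = sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)
    return ordered[:top_n]


# ASSUMPTION: placeholder names and values for illustration only.
example_coefficients = {"feature_a": 0.5, "feature_b": -2.0, "feature_c": 1.2}

for name, coef in rank_by_abs_coefficient(example_coefficients, top_n=2):
    direction = "toward malignant" if coef > 0 else "toward benign"
    print(f"{name}: |coef| = {abs(coef):.4f} ({direction})")
```

This ordering is only meaningful when the features share a scale, which is the case here because they were z-scored before fitting.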

Data Table

Model Performance Metrics

Complete classification performance on held-out test set

Metric	Value
Accuracy	0.9292
Sensitivity (Recall)	0.8333
Specificity	0.9859
Precision	0.9722
F1 Score	0.8974
AUC	0.9869

Interpretation

Five of the six metrics exceed 0.89; sensitivity is the weakest at 83.3%. The classifier flags 35 of the 42 true cancers in the test set and misses 7, so the metric tied most directly to the primary clinical objective is also the one with the most room for improvement. AUC of 0.987 confirms strong overall discrimination independent of the chosen threshold.
