Executive Summary

Overall classification accuracy, AUC, and the single most discriminating tumor measurement

Total Tumors: 569
Malignant Rate: 37.3%
Model Accuracy: 92.9%
AUC Score: 0.987
Top Feature: Radius Worst
Sensitivity: 83.3%
Specificity: 98.6%
Interpretation

The logistic regression model achieves 92.9% accuracy on held-out test cases with an AUC of 0.987. The dataset contains 569 tumors (37.3% malignant). The strongest single predictor of malignancy is Radius Worst.
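
The modeling pipeline can be sketched with scikit-learn, whose bundled Wisconsin breast cancer data has the same 569 tumors and 30 features described here. The report's exact split and solver settings are not given, so results from this sketch will differ slightly from the quoted 92.9% and 0.987.

```python
# Minimal sketch of the pipeline: standardize, fit logistic regression,
# score on a held-out test set. Split and solver settings are assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # in sklearn: 0 = malignant, 1 = benign
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_train)       # z-score using training statistics only
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)

accuracy = model.score(scaler.transform(X_test), y_test)
auc = roc_auc_score(y_test, model.predict_proba(scaler.transform(X_test))[:, 1])
```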

Visualization

Diagnosis Class Distribution

Count of malignant (M) and benign (B) tumors in the dataset

Interpretation

The dataset contains 212 malignant and 357 benign tumors (37.3% malignant). The modest class imbalance is not severe enough to require resampling for logistic regression.
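
The class balance quoted above follows directly from the counts:

```python
# Malignant rate from the reported class counts.
malignant, benign = 212, 357
total = malignant + benign            # 569 tumors
malignant_rate = malignant / total
print(f"{malignant_rate:.1%}")        # 37.3%
```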

Visualization

PCA Biplot — PC1 vs PC2 by Diagnosis

Each point is one tumor plotted in the 2-D space of the first two principal components

Interpretation

PC1 explains 44.3% and PC2 explains 19% of total feature variance. Malignant tumors (M) cluster at higher PC1 values, reflecting larger and more irregular cell nuclei. The visible separation between clusters confirms that the 30 features encode linearly separable class information, supporting good logistic regression performance.
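
The projection behind the biplot can be sketched as follows: standardize the 30 features, then project each tumor onto the first two principal components. The report's plotting details are not given; this reproduces only the PC1/PC2 coordinates and variance shares.

```python
# PCA projection sketch: z-score first (PCA is scale-sensitive), then
# keep the two components with the largest explained variance.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
Z = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(Z)
scores = pca.transform(Z)                  # (569, 2): one (PC1, PC2) point per tumor
pc1_var, pc2_var = pca.explained_variance_ratio_
```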

Visualization

Scree Plot — Variance Explained per PC

Percentage of total feature variance captured by each principal component

Interpretation

The first 7 principal components capture 90% of the variance in the 30 cell-nucleus measurements. PC1 alone explains 44.27% — the steep drop after PC1 is typical of medical imaging datasets where a few dominant shape factors account for most variability.
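
The "first 7 components capture 90%" figure is a cumulative sum over per-PC variance shares. In the sketch below only the PC1 and PC2 shares come from the report; the remaining values are illustrative placeholders.

```python
# Count how many leading components are needed to reach a variance threshold.
from itertools import accumulate

def n_components_for(variance_pct, threshold=90.0):
    for i, total in enumerate(accumulate(variance_pct), start=1):
        if total >= threshold:
            return i
    return len(variance_pct)

# PC1, PC2 from the report; later shares are assumed for illustration.
variance_pct = [44.3, 19.0, 9.4, 6.6, 5.5, 4.0, 2.3]
n = n_components_for(variance_pct)
```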

Visualization

Logistic Regression Coefficients

Standardised logistic regression coefficients for the top features

Interpretation

Of the top 10 features shown, 5 push the model toward malignancy (positive coefficient) and 5 push toward benign (negative). Features with the largest absolute coefficients have the greatest influence on the predicted probability. All features were z-scored before fitting, so coefficients are directly comparable across measurements.
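
Why z-scoring makes coefficients comparable: after standardization every feature has mean 0 and unit standard deviation, so each coefficient measures the effect of a one-standard-deviation change rather than a change in raw units.

```python
# Plain z-score transform; the input values are illustrative raw measurements.
from statistics import mean, stdev

def zscore(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

z = zscore([12.5, 14.1, 19.8, 25.0, 11.2])  # hypothetical radius readings
```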

Visualization

ROC Curve

True positive rate vs false positive rate across all classification thresholds

Interpretation

The model achieves an AUC of 0.987 — values above 0.9 indicate excellent discrimination between malignant and benign tumors. The curve shows that the model can achieve sensitivity above 90% while keeping the false positive rate below 15%, a clinically acceptable trade-off for cancer screening.
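
AUC has a direct probabilistic reading that can be computed from its definition: the chance that a randomly chosen malignant case gets a higher predicted score than a randomly chosen benign case, with ties counting half. The scores below are illustrative, not the report's model outputs.

```python
# Pairwise (Mann-Whitney) formulation of AUC over positive/negative scores.
def auc_from_scores(pos_scores, neg_scores):
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

auc = auc_from_scores([0.90, 0.80, 0.35], [0.10, 0.40, 0.30])
```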

Visualization

Confusion Matrix

Predicted vs actual diagnosis on the held-out test set

Interpretation

The model correctly classified 35 malignant and 70 benign tumors. There were 7 false negatives (malignant tumors predicted benign) and 1 false positive. Sensitivity (83.3%) measures how well the model catches true cancers; minimising false negatives is the clinical priority.
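
Sensitivity and specificity follow directly from the confusion matrix counts above:

```python
# 35 true positives, 7 false negatives, 70 true negatives, 1 false positive.
tp, fn, tn, fp = 35, 7, 70, 1
sensitivity = tp / (tp + fn)   # share of true cancers caught
specificity = tn / (tn + fp)   # share of benign tumors correctly cleared
print(f"sensitivity={sensitivity:.1%}, specificity={specificity:.1%}")
```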

Visualization

Feature Importance — Top Predictors

Top cell-nucleus measurements ranked by absolute logistic regression coefficient

Interpretation

The most discriminating feature is Radius Worst (|coef| = 736.8858). Features from the 'worst' measurement group (largest values recorded across nuclei) tend to dominate the ranking, reflecting that the most extreme cellular abnormalities are the clearest indicators of malignancy.
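
The importance ranking itself is just a sort by absolute coefficient. The (feature, coefficient) pairs below are hypothetical placeholders; only the ordering logic mirrors what the chart shows.

```python
# Rank features by |coefficient|; sign indicates direction, magnitude influence.
coefs = {
    "radius_worst": 3.1,           # hypothetical values for illustration
    "texture_mean": -0.4,
    "concave_points_worst": 2.2,
    "smoothness_se": 0.1,
}
ranking = sorted(coefs, key=lambda f: abs(coefs[f]), reverse=True)
```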

Data Table

Model Performance Metrics

Complete classification performance on held-out test set

Metric                  Value
Accuracy                0.9292
Sensitivity (Recall)    0.8333
Specificity             0.9859
Precision               0.9722
F1 Score                0.8974
AUC                     0.9869
Interpretation

Across all six metrics the model performs consistently well. Sensitivity of 83.3% means the classifier correctly flags the vast majority of true cancers, which is the primary clinical objective. AUC of 0.987 confirms strong overall discrimination independent of the chosen threshold.
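
Five of the six table metrics can be re-derived from the confusion matrix counts alone; only AUC needs the full score distribution rather than thresholded predictions.

```python
# Derive the table metrics from the reported counts (35 TP, 7 FN, 70 TN, 1 FP).
tp, fn, tn, fp = 35, 7, 70, 1
accuracy    = (tp + tn) / (tp + tn + fp + fn)
recall      = tp / (tp + fn)                       # sensitivity
specificity = tn / (tn + fp)
precision   = tp / (tp + fp)
f1          = 2 * precision * recall / (precision + recall)
```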
