Free — no account required

Breast Cancer Classification with PCA + Logistic Regression In Minutes

Upload your data and get a complete breast cancer classification report built with PCA and logistic regression. Free.

24,000+ analyses run
Encrypted & deleted in 7 days
PDF & citation included

Drop your CSV here

or click to browse · max 3 MB

Sample Output

Every report includes interactive charts, tables, and AI insights

Upload your data to get your own report


How it works

PCA reduces the 30 cell nucleus measurements to a few principal components for visualization, while logistic regression on the scaled features gives an interpretable binary classification of malignant vs. benign tumors, with ROC/AUC diagnostics.
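Both PCA and logistic regression coefficients depend on feature scale, which is why the features are scaled first. The service runs R, so the Python below is only an illustrative sketch of z-score scaling, assuming population standard deviation; the function name `standardize` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(columns):
    """Z-score each feature column: (value - mean) / standard deviation."""
    scaled = {}
    for name, values in columns.items():
        mu = mean(values)
        sigma = pstdev(values)
        # A constant column has zero spread; map it to zeros instead of dividing by zero.
        scaled[name] = [0.0 if sigma == 0 else (v - mu) / sigma for v in values]
    return scaled

# Hypothetical values for two of the 30 measurements (not real patient data).
features = {
    "radius_mean": [10.0, 12.0, 14.0, 16.0, 18.0],
    "area_mean": [300.0, 450.0, 600.0, 750.0, 900.0],
}
scaled = standardize(features)
```

After scaling, every column has mean 0 and standard deviation 1, so a large-valued measurement such as area no longer dominates the principal components simply because of its units.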

Use this when you have labeled binary-outcome data with many numeric features and want both a dimensionality-reduction visualization and an interpretable linear classifier.

Do not use it if the classes are not linearly separable, if you need calibrated probabilities, or if the dataset has fewer than 50 observations per class.

Built for: Clinical researchers, bioinformatics analysts, medical data scientists, oncology research fellows, pathology informatics specialists

Typical data source: CSV with 30 cell nucleus measurements from fine needle aspirate (FNA) biopsies plus a diagnosis label column (M=malignant, B=benign)

healthcare · medical research · pharmaceuticals · academic medicine

What data do you need?

Dataset with 32 columns

patient_id (identifier)
diagnosis (categorical)
Mean values (numeric): radius_mean, texture_mean, perimeter_mean, area_mean, smoothness_mean, compactness_mean, concavity_mean, concave_points_mean, symmetry_mean, fractal_dimension_mean
Standard errors (numeric): radius_se, texture_se, perimeter_se, area_se, smoothness_se, compactness_se, concavity_se, concave_points_se, symmetry_se, fractal_dimension_se
Worst values (numeric): radius_worst, texture_worst, perimeter_worst, area_worst, smoothness_worst, compactness_worst, concavity_worst, concave_points_worst, symmetry_worst, fractal_dimension_worst

Minimum 50 rows
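Before uploading, it can help to check a file against the requirements above. The following is a rough Python sketch, not the service's own validation: `check_csv` and its messages are hypothetical, and it assumes the column names are spelled exactly as listed and that diagnosis uses the codes M and B.

```python
import csv
import io

MEASUREMENTS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
STATS = ["mean", "se", "worst"]
# 10 measurements x 3 statistics = 30 numeric feature columns.
FEATURES = [f"{m}_{s}" for s in STATS for m in MEASUREMENTS]
EXPECTED = ["patient_id", "diagnosis"] + FEATURES

def check_csv(text, min_rows=50):
    """Return a list of problems; an empty list means the file looks usable."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in EXPECTED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    rows = list(reader)
    if len(rows) < min_rows:
        problems.append(f"only {len(rows)} rows, need at least {min_rows}")
    # Data rows start on line 2 of the file because line 1 is the header.
    for line_no, row in enumerate(rows, start=2):
        if row["diagnosis"] not in ("M", "B"):
            problems.append(f"line {line_no}: diagnosis must be M or B")
        for name in FEATURES:
            try:
                float(row[name])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: {name} is not numeric")
    return problems
```

To check a file on disk, read it as text and pass the contents, for example `check_csv(open("tumors.csv").read())`, where the file name is a placeholder.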

What's in the report?

569 tumors with 30 numeric features (mean, SE, and worst value for each of 10 measurements) and a binary diagnosis (M=malignant, B=benign). About 63% are benign. PCA reduces the 30 features to 2-3 components for visualization, and logistic regression identifies the most discriminating features.

📊

Diagnosis Class Distribution

Balance between malignant and benign cases, and the risk posed by class imbalance

🔵

PCA Biplot — PC1 vs PC2 by Diagnosis

PCA 2D visualization showing cluster separation between malignant and benign tumors

📊

Scree Plot — Variance Explained per PC

Scree plot showing variance explained by each principal component
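The quantity on a scree plot is each component's eigenvalue divided by the sum of all eigenvalues. A minimal Python sketch of that arithmetic follows; the eigenvalues are made-up numbers for a four-feature example, not output from the report.

```python
def explained_variance_ratio(eigenvalues):
    """Share of total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

def cumulative(ratios):
    """Running total, useful for deciding how many components to keep."""
    running, out = 0.0, []
    for r in ratios:
        running += r
        out.append(running)
    return out

# Hypothetical eigenvalues, sorted from largest to smallest.
eigenvalues = [2.0, 1.0, 0.6, 0.4]
ratios = explained_variance_ratio(eigenvalues)
running_total = cumulative(ratios)
```

In this made-up example the first two components carry about three quarters of the variance, which is the kind of reading the scree plot supports when choosing 2-3 components for visualization.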

📊

Logistic Regression Coefficients

Logistic regression coefficients showing feature direction and magnitude
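To make direction and magnitude concrete: a logistic regression turns a weighted sum of scaled features into a probability through the logistic function, and exponentiating a coefficient gives an odds ratio per one standard deviation. The Python sketch below assumes malignant is the positive class; the coefficients, intercept, and function names are hypothetical, not values from a fitted model.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def malignancy_probability(intercept, coefficients, scaled_row):
    """Probability of the positive class (assumed here to be malignant)."""
    z = intercept + sum(b * scaled_row[name] for name, b in coefficients.items())
    return sigmoid(z)

# Hypothetical coefficients on z-scored features (illustration only).
coefficients = {"radius_worst": 1.8, "texture_worst": 1.1, "smoothness_se": -0.4}
intercept = -0.5
row = {"radius_worst": 1.0, "texture_worst": 0.5, "smoothness_se": 0.0}

p = malignancy_probability(intercept, coefficients, row)
# exp(coefficient) is the multiplicative change in odds per one standard deviation.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
```

A positive coefficient pushes the predicted probability toward malignant as the feature increases, and a negative one pushes it toward benign.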

🔵

ROC Curve

ROC curve with AUC score for model discrimination ability
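One way to read AUC is as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case. The Python sketch below computes it by direct pairwise comparison, which is simple but quadratic in the number of cases; the function name, labels, and scores are illustrative assumptions.

```python
def auc(labels, scores):
    """Fraction of (malignant, benign) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == "M"]
    neg = [s for y, s in zip(labels, scores) if y == "B"]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of malignancy for five tumors.
labels = ["M", "M", "B", "B", "B"]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
example_auc = auc(labels, scores)
```

Here five of the six malignant-benign pairs are ordered correctly. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.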

🟧

Confusion Matrix

Confusion matrix showing false negatives and sensitivity vs specificity trade-off

📊

Feature Importance — Top Predictors

Top discriminating features ranked by absolute logistic regression coefficient

📋

Model Performance Metrics

Complete model performance metrics including accuracy, sensitivity, specificity, F1, and AUC
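All of these metrics except AUC follow directly from the four confusion matrix cells. A short Python sketch, treating malignant as the positive class and assuming no cell combination produces a zero denominator; the counts are invented for illustration.

```python
def metrics(tp, fn, fp, tn):
    """Derive summary metrics from confusion matrix counts.

    tp: malignant predicted malignant    fn: malignant predicted benign
    fp: benign predicted malignant       tn: benign predicted benign
    """
    sensitivity = tp / (tp + fn)   # share of malignant cases caught
    specificity = tn / (tn + fp)   # share of benign cases cleared
    precision = tp / (tp + fp)     # share of malignant calls that are right
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts for a held-out set of 114 tumors.
m = metrics(tp=40, fn=2, fp=3, tn=69)
```

In a screening context the false negative count (`fn`) usually matters most, because each one is a malignant tumor classified as benign; sensitivity is the metric that tracks it.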

🤖

AI Insights

Plain-English interpretation — what the numbers mean, what's significant, and what to do next.

Related tools

Need something simpler? Diabetes Risk Drivers — When you only need to identify which health risk factors drive a clinical outcome, without building a full binary classifier or generating ROC/AUC performance diagnostics

Need more power? Fraud Anomaly — When your classes are heavily imbalanced or partially unlabeled and you need anomaly detection rather than supervised binary classification with labeled training data

Similar: Churn Drivers, Attrition Drivers

The Question This Answers

Breast Tumor Malignancy Classification

Upload a dataset with 30 cell nucleus measurements and a diagnosis label. The module builds a PCA-reduced feature space, fits a logistic regression classifier, and outputs accuracy, AUC, sensitivity, specificity, feature importance, and a full confusion matrix.

Questions?

See our FAQ for details on pricing, data privacy, and how the analysis works. Every report includes a Methodology section showing the statistical test, assumptions checked, and diagnostics run.

Your data has more stories to tell

Run any analysis on your own data — validated R analyses, interactive reports, AI insights, and PDF export.

Try Free — No Credit Card
Powered by MCP Analytics