Executive Summary

Model performance overview and strongest risk factor

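| metric | value |
|---|---|
| n_observations | 768 |
| auc | 0.9398 |
| accuracy | 0.8815 |
| sensitivity | 0.8022 |
| specificity | 0.924 |
| diabetes_prevalence | 0.349 |
| strongest_predictor (odds ratio) | 4.1546 |
| ppv | 0.8498 |
| tp | 215 |
| tn | 462 |
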
Interpretation

The logistic regression model achieved an AUC of 0.94, indicating strong discriminative ability between diabetic and non-diabetic patients. The predictor with the largest odds ratio was diabetes_pedigree (4.15), meaning each one-unit increase multiplies the odds of diabetes by that factor when the other predictors are held constant. Overall model accuracy was 88.2%, and 34.9% of the 768 patients in this dataset were diagnosed as diabetic.
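To make the odds-ratio statement concrete, the sketch below converts a starting probability to odds, multiplies by the odds ratio, and converts back. Using the cohort prevalence (34.9%) as the starting risk is purely an illustrative assumption, and the function name is hypothetical.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float, units: float = 1.0) -> float:
    """Probability after raising one predictor by `units`, other predictors held fixed."""
    odds = baseline_prob / (1.0 - baseline_prob)   # probability -> odds
    new_odds = odds * odds_ratio ** units          # odds ratios compound multiplicatively
    return new_odds / (1.0 + new_odds)             # odds -> probability


# Assumption for illustration: start from the cohort prevalence rather than a real patient.
baseline = 0.349
after_one_unit = apply_odds_ratio(baseline, 4.15)
print(f"{baseline:.3f} -> {after_one_unit:.3f}")
```

The probability roughly doubles rather than quadrupling, because an odds ratio scales odds, not probabilities.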

Visualization

Risk Factor Ranking by Odds Ratio

Relative impact of each biomarker on diabetes risk

Interpretation

Biomarkers are ranked by their odds ratios from the logistic regression model. diabetes_pedigree has the largest odds ratio (4.15), making it the top-ranked predictor in this cohort. Because each odds ratio is expressed per one-unit change on that predictor's own scale, the ranking reflects per-unit effect rather than standardized importance. All eight predictors reach statistical significance at p < 0.05 after controlling for the other biomarkers in the model.

Visualization

Measurement Distributions by Diabetes Status

Spread of each biomarker in diabetic vs non-diabetic groups

Interpretation

Box plots show the interquartile range and median for each biomarker, split by diagnosis. Glucose shows the clearest separation: median 136.3 mg/dL in diabetic vs 105 mg/dL in non-diabetic patients. A larger offset between the two groups' boxes, relative to box height, indicates stronger discriminating power, while overlapping distributions suggest a predictor has weaker standalone discriminative ability.
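The quantities a box plot draws can be computed directly. The following is a minimal sketch with hypothetical glucose readings (not the report's data); the quartile method used by the report is not stated, so the standard-library default is assumed.

```python
import statistics


def box_summary(values):
    """Median, quartiles and IQR, the numbers a box plot draws for one group."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {"q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1}


# Hypothetical glucose readings in mg/dL, for illustration only.
non_diabetic = [85, 95, 100, 105, 110, 120, 125]
diabetic = [110, 125, 130, 136, 145, 160, 175]

non_box = box_summary(non_diabetic)
dia_box = box_summary(diabetic)

# Boxes that do not overlap signal strong standalone separation.
boxes_overlap = dia_box["q1"] <= non_box["q3"] and non_box["q1"] <= dia_box["q3"]
print(non_box, dia_box, boxes_overlap)
```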

Visualization

Feature Correlation Matrix

Pairwise correlations among biomarkers to assess multicollinearity

Interpretation

The correlation heatmap shows pairwise Pearson correlations among all eight biomarkers. The highest correlation is between bmi and glucose (r = 0.2). Correlations above 0.7 can cause multicollinearity in the regression model, making it difficult to isolate each predictor's independent contribution to diabetes risk. With the highest observed correlation at only 0.2, multicollinearity is not a concern for this model.
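The multicollinearity screen described above can be sketched as a loop over all predictor pairs that flags any Pearson correlation above 0.7 in absolute value. The column values and the `glucose_repeat` column are hypothetical, included only so that one pair trips the threshold.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def flag_collinear(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical toy columns; glucose_repeat is a near-duplicate added on purpose.
toy = {
    "glucose": [90, 110, 130, 150, 170],
    "bmi": [31, 24, 35, 27, 30],
    "glucose_repeat": [91, 109, 131, 151, 169],
}
flagged = flag_collinear(toy)
print(flagged)
```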

Visualization

Mean Biomarker Profiles: Diabetic vs Non-Diabetic

Z-score standardized group means for each biomarker

Interpretation

Z-score standardized mean values allow fair comparison across biomarkers with different units. Glucose shows the largest group difference (1.08 standard deviations). Diabetic patients consistently score higher on metabolic risk indicators. Bars extending further from zero indicate measurements where the two groups diverge most.
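A minimal sketch of the standardization follows, assuming each biomarker is z-scored against the whole-cohort mean and sample standard deviation before group means are taken (the report does not state which variant it uses). The readings are hypothetical.

```python
import statistics


def standardized_group_means(values, is_diabetic):
    """Z-score a biomarker over the whole cohort, then average within each group."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation of the full cohort
    z = [(v - mean) / sd for v in values]
    diabetic_z = [zi for zi, flag in zip(z, is_diabetic) if flag]
    other_z = [zi for zi, flag in zip(z, is_diabetic) if not flag]
    return statistics.fmean(diabetic_z), statistics.fmean(other_z)


# Hypothetical glucose readings and diagnosis flags, for illustration only.
glucose = [90, 100, 110, 140, 150, 160]
diagnosed = [False, False, False, True, True, True]

diabetic_mean, other_mean = standardized_group_means(glucose, diagnosed)
group_difference = diabetic_mean - other_mean  # in standard deviations
print(round(diabetic_mean, 3), round(other_mean, 3), round(group_difference, 3))
```

Because the result is expressed in standard deviations, the same calculation applied to BMI or insulin yields values that can share one axis.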

Visualization

Diabetes Prevalence by Age Group

Fraction of patients diagnosed with diabetes by age band

Interpretation

Diabetes prevalence rises sharply with age across decade bands. The youngest group (20-29) has a prevalence of 24.3%, while the oldest recorded group reaches 87.5%. Prevalence first exceeds 50% in the 50-59 age group. This monotone increase aligns with the biological understanding that insulin resistance accumulates over time alongside lifestyle and metabolic changes.
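Prevalence by decade band is a grouped proportion. The sketch below shows one way to compute it, with hypothetical ages and outcomes (1 = diabetic) rather than the report's records.

```python
from collections import defaultdict


def prevalence_by_decade(ages, outcomes):
    """Fraction of diabetic patients (outcome == 1) within each ten-year age band."""
    counts = defaultdict(lambda: [0, 0])  # band start -> [diabetic count, total count]
    for age, outcome in zip(ages, outcomes):
        start = (age // 10) * 10
        counts[start][0] += outcome
        counts[start][1] += 1
    # Sort numerically by band start so the bands appear in age order.
    return {f"{s}-{s + 9}": d / n for s, (d, n) in sorted(counts.items())}


# Hypothetical patients, for illustration only.
ages = [22, 25, 28, 29, 34, 37, 51, 55]
outcomes = [0, 0, 1, 0, 1, 0, 1, 1]

prevalence = prevalence_by_decade(ages, outcomes)
print(prevalence)
```

Bands containing few patients give unstable estimates, so it is worth checking the per-band counts before reading much into the oldest groups.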

Visualization

Glucose vs BMI by Diabetes Outcome

Joint distribution of glucose and BMI colored by diabetes diagnosis

Interpretation

Each point is one patient, with size proportional to age. Diabetic patients cluster toward higher glucose levels (mean 130 mg/dL vs 95.5 mg/dL in non-diabetics) and higher BMI (mean 34.1). The upper-right region, where high glucose combines with elevated BMI, shows the highest concentration of diabetic cases, reflecting the joint metabolic risk these two factors create.

Data Table

Logistic Regression Coefficients

Full regression table with odds ratios, confidence intervals, and p-values

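| predictor_name | odds_ratio | ci_lower | ci_upper | p_value | significance_stars |
|---|---|---|---|---|---|
| pregnancies | 1.746 | 1.53 | 1.992 | <0.0001 | *** |
| glucose | 1.058 | 1.046 | 1.071 | <0.0001 | *** |
| blood_pressure | 1.039 | 1.02 | 1.059 | 0.0001 | *** |
| skin_thickness | 1.104 | 1.072 | 1.138 | <0.0001 | *** |
| insulin | 1.015 | 1.009 | 1.02 | <0.0001 | *** |
| bmi | 1.111 | 1.073 | 1.151 | <0.0001 | *** |
| diabetes_pedigree | 4.155 | 1.965 | 8.785 | 0.0002 | *** |
| age | 1.074 | 1.048 | 1.102 | <0.0001 | *** |
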
Interpretation

The table shows all eight biomarker coefficients from the logistic regression, expressed as odds ratios with 95% confidence intervals. All eight predictors are statistically significant at p < 0.05 after controlling for the other variables. diabetes_pedigree has the largest odds ratio (4.155, 95% CI: 1.965 to 8.785), though its confidence interval is also the widest in the table.
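The columns of the table are linked: an odds ratio is the exponential of a regression coefficient, and the interval implies a standard error. The sketch below recovers the coefficient, standard error, z statistic and two-sided p-value for diabetes_pedigree, assuming the intervals are Wald intervals of the form exp(beta ± 1.96·SE). The report does not state the interval method, so this is a consistency check under that assumption.

```python
import math


def wald_from_ci(odds_ratio, ci_lower, ci_upper, z_crit=1.959964):
    """Recover coefficient, standard error, z and two-sided p from an OR and its 95% CI."""
    beta = math.log(odds_ratio)                                   # log-odds coefficient
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)  # CI width on the log scale
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))                          # two-sided normal p-value
    return beta, se, z, p


# Values taken from the diabetes_pedigree row of the table.
beta, se, z, p = wald_from_ci(4.155, 1.965, 8.785)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.4f}")
```

Under this assumption the recovered p-value rounds to the 0.0002 shown in the table.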

Data Table

Model Performance Metrics

Summary of classification performance at the chosen decision threshold

| metric_name | metric_value |
|---|---|
| AUC (ROC) | 0.9398 |
| Accuracy | 0.8815 |
| Sensitivity (Recall) | 0.8022 |
| Specificity | 0.924 |
| Positive Predictive Value | 0.8498 |
| N Observations | 768 |
| Diabetes Prevalence | 0.349 |

Interpretation

At a classification threshold of 0.5, the model achieves an AUC of 0.94, an overall accuracy of 88.2%, a sensitivity of 80.2% (fraction of actual diabetics correctly identified), and a specificity of 92.4% (fraction of non-diabetics correctly classified). AUC itself does not depend on the threshold. Raising the classification threshold would increase specificity at the cost of sensitivity, and lowering it would do the reverse; the optimal threshold depends on the relative cost of false negatives vs false positives.
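The threshold-dependent metrics all follow from the four confusion-matrix counts. The report gives tp = 215 and tn = 462; the false-negative and false-positive counts below (53 and 38) are not reported and are derived here from n = 768 and a prevalence of 0.349, so treat them as an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics derived from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # recall on actual diabetics
        "specificity": tn / (tn + fp),   # recall on actual non-diabetics
        "ppv": tp / (tp + fp),           # precision of a positive prediction
        "prevalence": (tp + fn) / total,
    }


# tp and tn come from the report; fn and fp are derived (assumed), not reported.
n_observations = 768
tp, tn = 215, 462
positives = round(0.349 * n_observations)   # 268 diabetic patients
fn = positives - tp                          # 53
fp = (n_observations - positives) - tn       # 38

metrics = classification_metrics(tp, tn, fp, fn)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```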
