Diabetes Risk Drivers
Executive Summary

Key findings from the diabetes risk driver analysis

Observations: 768
Diabetic Patients: 268
Non-Diabetic Patients: 500
AUC: 0.8666
Sensitivity: 0.8284
Specificity: 0.782
Optimal Threshold: 0.3227
Top Predictor: Glucose
Zeros Imputed: 652
Predictors Used: 8

Interpretation

The logistic regression model achieves an AUC of 0.8666 (86.7%), indicating good discrimination between diabetic and non-diabetic patients across all classification thresholds. At the Youden-optimal threshold of 0.3227, sensitivity is 82.8% and specificity is 78.2%. The strongest single predictor of diabetes onset is Glucose. Before model fitting, 652 zero-coded missing values across five clinical variables were detected and imputed.

Data Table

Clinical Summary by Diabetes Status

Mean clinical measurements with Welch t-test p-values by diabetes outcome

Predictor                    Mean Diabetic   Mean Non-Diabetic   P Value
Pregnancies                  4.866           3.298               0
Glucose                      142.3           110.6               0
Blood Pressure               75.27           70.84               0
Skin Thickness               32.67           27.17               0
Insulin                      187.6           117.2               0
BMI                          35.4            30.85               0
Diabetes Pedigree Function   0.55            0.43                0
Age                          37.07           31.19               0
Interpretation

All 8 clinical measurements differ significantly between diabetic and non-diabetic patients (p < 0.05 by Welch t-test). Every p-value is reported as 0 after rounding, so the table itself does not separate the predictors by significance; the analysis lists Pregnancies as the most significant (mean diabetic = 4.866 vs. non-diabetic = 3.298). Across the 268 diabetic and 500 non-diabetic patients, the mean of every measurement is higher in the diabetic group.
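
As a rough illustration of the Welch t-test behind the p-values, the sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom for two groups with unequal variances. The function name `welch_t` and the glucose readings are hypothetical and are not taken from the report's data.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each group mean (sample variance / n)
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical glucose readings (mg/dL), for illustration only
diabetic = [148, 183, 137, 197, 125, 168]
non_diabetic = [85, 89, 116, 110, 103, 99, 92]

t, df = welch_t(diabetic, non_diabetic)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Turning `t` and `df` into a p-value requires the Student t distribution, which the standard library does not provide; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the whole test in one call.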

Visualization

Zero-Value Missing Data Profile

Count of biologically impossible zero values by clinical variable

Interpretation

Zero-coded missing values are most prevalent in Insulin with 374 zeros (48.7% of patients). In total, 652 zero-coded values were detected across 5 clinical variables before model fitting. All zeros were replaced with class-conditional medians (computed separately for diabetic and non-diabetic patients) prior to analysis.
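
A minimal sketch of class-conditional median imputation, assuming each patient is a dictionary with an outcome label. The function name, the column names `Insulin` and `Outcome`, and the records are all hypothetical.

```python
from statistics import median

def impute_zeros_by_class(rows, column, label="Outcome"):
    """Replace zeros in `column` with the median of the non-zero values
    observed in the same outcome class."""
    medians = {}
    for cls in {r[label] for r in rows}:
        observed = [r[column] for r in rows if r[label] == cls and r[column] != 0]
        medians[cls] = median(observed)
    # Build new records so the input list is left unchanged
    return [
        {**r, column: medians[r[label]] if r[column] == 0 else r[column]}
        for r in rows
    ]

# Hypothetical records, for illustration only
patients = [
    {"Insulin": 0, "Outcome": 1},
    {"Insulin": 180, "Outcome": 1},
    {"Insulin": 200, "Outcome": 1},
    {"Insulin": 0, "Outcome": 0},
    {"Insulin": 90, "Outcome": 0},
    {"Insulin": 110, "Outcome": 0},
]

imputed = impute_zeros_by_class(patients, "Insulin")
print([r["Insulin"] for r in imputed])  # [190.0, 180, 200, 100.0, 90, 110]
```

Because the medians depend on the outcome label, this step can only be applied to patients whose diabetes status is already known.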

Visualization

Clinical Measurements by Diabetes Status

Standardised distributions of top predictors by diabetes outcome

Interpretation

Box plots show standardised (z-score) clinical measurements for diabetic vs. non-diabetic patients across the top 5 predictors ranked by model importance. Standardisation places all variables on a common scale, enabling direct visual comparison of distributional separation. Glucose shows the greatest separation between groups, consistent with its position as the strongest predictor in the logistic regression model.
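
The z-score standardisation described above can be sketched as follows. Whether the report divides by the sample or the population standard deviation is not stated; the sample version is assumed here, and the values are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Centre values on their mean and scale by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical readings on very different scales
glucose = [90, 110, 130]   # mg/dL
bmi = [25.0, 30.0, 35.0]   # kg/m^2

print(z_scores(glucose))  # [-1.0, 0.0, 1.0]
print(z_scores(bmi))      # [-1.0, 0.0, 1.0]
```

After standardisation both variables share the same unitless scale, which is what allows their box plots to sit on one axis.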

Visualization

Odds Ratios — Logistic Regression Predictors

Exponentiated logistic regression coefficients

Interpretation

6 of 8 clinical predictors have odds ratio confidence intervals that exclude 1.0, indicating statistically significant associations with diabetes at the 95% confidence level. Diabetes Pedigree Function has the highest odds ratio, 2.198 (95% CI: 1.2235 to 3.9853). An odds ratio above 1.0 indicates increased odds of diabetes per unit increase in that predictor, so its size depends on the predictor's measurement scale. A CI that includes 1.0 means the effect is not statistically distinguishable from no effect (an odds ratio of 1, equivalently a log-odds coefficient of zero).
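
To show how an odds ratio and its interval follow from a logistic regression coefficient, here is a sketch that exponentiates a coefficient and a Wald interval. The coefficient and standard error are hypothetical, and the report's intervals may have been computed by a different method (for example profile likelihood), so this is not expected to reproduce its numbers exactly.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) from a log-odds
    coefficient and its standard error, using a Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient and standard error, for illustration only
or_, lower, upper = odds_ratio_ci(beta=0.5, se=0.2)
significant = lower > 1.0 or upper < 1.0  # interval excludes the null value of 1
print(f"OR = {or_:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}), significant = {significant}")
```

The significance check compares the interval with 1 rather than 0 because the interval is on the odds ratio scale, not the log-odds scale.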

Visualization

ROC Curve — Model Discrimination

Receiver operating characteristic curve with AUC

Interpretation

The ROC curve plots sensitivity (true positive rate) against 1 - specificity (false positive rate) across all possible classification thresholds. The model achieves an AUC of 0.8666 (86.7%), indicating good discriminative ability for clinical screening. At the Youden-optimal threshold, sensitivity is 82.8% and specificity is 78.2%, meaning the model correctly identifies 82.8% of true diabetic patients while correctly ruling out 78.2% of non-diabetic patients.
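
A minimal sketch of how a Youden-optimal threshold can be found: evaluate every candidate threshold and keep the one that maximises J = sensitivity + specificity - 1. The function name, scores, and labels are hypothetical, and ties are resolved in favour of the lowest threshold as an arbitrary choice.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximising sensitivity + specificity - 1.
    A patient is predicted positive when score >= threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical predicted probabilities and true labels (1 = diabetic)
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8]
labels = [0, 0, 1, 0, 1, 1]

threshold, j = youden_threshold(scores, labels)
print(threshold, round(j, 4))  # 0.35 0.6667
```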

Visualization

Confusion Matrix at Optimal Threshold

True and false positives/negatives at Youden's J threshold

Interpretation

At the Youden-optimal threshold of 0.323, the model correctly classifies 222 diabetic patients (true positives) and 391 non-diabetic patients (true negatives), for an overall accuracy of 79.8%. 46 diabetic patients are missed (false negatives) and 109 non-diabetic patients are incorrectly flagged (false positives). Sensitivity of 82.8% means the model catches most true cases, which is important for a clinical screening tool where missed diagnoses carry high cost.
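
The reported rates can be checked directly from the four confusion matrix counts above. The snippet below is only an arithmetic check; the variable names are illustrative.

```python
# Counts taken from the confusion matrix at the Youden-optimal threshold
tp, fn = 222, 46    # diabetic patients: caught vs. missed
tn, fp = 391, 109   # non-diabetic patients: ruled out vs. flagged

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)
youden_j = sensitivity + specificity - 1

print(f"sensitivity = {sensitivity:.4f}")  # 0.8284
print(f"specificity = {specificity:.4f}")  # 0.7820
print(f"accuracy    = {accuracy:.4f}")     # 0.7982
print(f"Youden's J  = {youden_j:.4f}")     # 0.6104
```

The row totals also reproduce the class sizes: 222 + 46 = 268 diabetic patients and 391 + 109 = 500 non-diabetic patients.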

Visualization

Variable Importance by Log-Odds Magnitude

Predictors ranked by absolute standardised log-odds

Interpretation

Glucose is the most important predictor with an absolute standardised log-odds of 0.9265, meaning a one-standard-deviation increase in glucose corresponds to the largest shift in diabetes log-odds among all clinical variables. The gap to the second-ranked predictor is 0.4459, which places that predictor at 0.4806. Standardising removes the effect of measurement units, so variables on different scales (e.g., glucose in mg/dL vs. pregnancies as a count) are directly comparable in this ranking.
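
One common way to obtain a standardised log-odds is to multiply the per-unit coefficient by the predictor's standard deviation; refitting the model on z-scored predictors gives the same value. The report does not say which route it takes, so the sketch below is an assumption, and the coefficients and data are hypothetical.

```python
from statistics import stdev

def standardised_log_odds(beta_per_unit, values):
    """Change in log-odds for a one-standard-deviation increase in the predictor."""
    return beta_per_unit * stdev(values)

# Hypothetical per-unit coefficients and observed values, for illustration only
glucose_values = [90, 105, 120, 135, 150]   # mg/dL
pregnancy_values = [0, 1, 2, 4, 8]          # count

importance = {
    "Glucose": abs(standardised_log_odds(0.03, glucose_values)),
    "Pregnancies": abs(standardised_log_odds(0.12, pregnancy_values)),
}
# Rank predictors by absolute standardised log-odds, largest first
for name, value in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.4f}")
```

In this toy example the larger per-unit coefficient (Pregnancies) still ranks below Glucose once each coefficient is scaled by its predictor's spread.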
