Diabetes Risk Drivers

Executive Summary

Key findings from the diabetes risk driver analysis

Observations: 768
Diabetic Patients: 268
Non-Diabetic Patients: 500
AUC: 0.8666
Sensitivity: 0.8284
Specificity: 0.782
Optimal Threshold: 0.3227
Top Predictor: Glucose
Zeros Imputed: 652
Predictors Used: 8

The logistic regression model achieves an AUC of 0.8666 (86.7%), indicating good discrimination between diabetic and non-diabetic patients across all classification thresholds. At the Youden-optimal threshold of 0.3227, the model reaches a sensitivity of 82.8% and a specificity of 78.2%. The strongest single predictor of diabetes onset is Glucose. Before model fitting, 652 zero-coded missing values across five clinical variables were detected and imputed.
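A minimal sketch of how a fitted logistic regression turns a patient's measurements into a predicted probability and then into a diabetic / non-diabetic flag at the reported threshold. The function names, intercept, and coefficients below are hypothetical placeholders, not the fitted model from this analysis; only the threshold 0.3227 comes from the summary.

```python
import math

# Youden-optimal threshold reported in the summary above.
THRESHOLD = 0.3227

def predict_probability(intercept, coefs, x):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    eta = intercept + sum(b * v for b, v in zip(coefs, x))
    # Numerically stable sigmoid: avoid overflow in exp() for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def classify(p, threshold=THRESHOLD):
    """Flag a patient as predicted diabetic when p >= threshold."""
    return p >= threshold

# Hypothetical intercept and coefficients for two standardised predictors.
p = predict_probability(-0.8, [0.9, 0.4], [1.0, 0.5])
print(f"probability={p:.4f}, flagged={classify(p)}")
```

Lowering the threshold below 0.5 trades specificity for sensitivity, which is why a screening model can sensibly flag patients whose predicted probability is only about one in three.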

Data Table

Clinical Summary by Diabetes Status

Mean clinical measurements with Welch t-test p-values by diabetes outcome

Predictor	Mean (Diabetic)	Mean (Non-Diabetic)
Pregnancies	4.866	3.298
Glucose	142.3	110.6
Blood Pressure	75.27	70.84
Skin Thickness	32.67	27.17
Insulin	187.6	117.2
BMI	35.4	30.85
Diabetes Pedigree Function	0.55	0.43
Age	37.07	31.19

Interpretation

All 8 clinical measurements differ significantly between diabetic and non-diabetic patients (p < 0.05 by Welch t-test). The most significant difference is for Pregnancies (mean diabetic = 4.866 vs. non-diabetic = 3.298), with a p-value displayed as 0, meaning it is below the display precision rather than exactly zero. Among the 268 diabetic and 500 non-diabetic patients, all eight measurements have a higher mean in the diabetic group.
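The Welch t-test compares two group means without assuming equal variances. The sketch below, using only the Python standard library, computes the t statistic and the Welch-Satterthwaite degrees of freedom; it is an illustration, not the code used for this report. The p-value then comes from the t distribution with those degrees of freedom (for example, scipy.stats.ttest_ind(a, b, equal_var=False) runs the whole test). For very large |t|, that p-value can fall below the displayed precision and print as 0.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    The two groups' variances are not pooled, which suits groups of
    unequal size and spread (here 268 diabetic vs. 500 non-diabetic).
    """
    na, nb = len(a), len(b)
    # Squared standard errors of each group mean (sample variance / n).
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Toy example with made-up values (not patient data).
t, df = welch_t([5, 7, 4, 6, 8], [2, 3, 4, 3, 2, 4])
print(f"t = {t:.3f}, df = {df:.1f}")
```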

Visualization

Zero-Value Missing Data Profile

Count of biologically impossible zero values by clinical variable

Interpretation

Zero-coded missing values are most prevalent in Insulin with 374 zeros (48.7% of patients). In total, 652 zero-coded values were detected across 5 clinical variables before model fitting. All zeros were replaced with class-conditional medians (computed separately for diabetic and non-diabetic patients) prior to analysis.
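A sketch of zero-value detection and class-conditional median imputation on a list of row dictionaries. The report names only Insulin among the affected variables, so the five-variable list below is an assumption (Pregnancies is excluded because zero is a valid count), as are the column keys and the "Outcome" label name.

```python
from statistics import median

# Assumed zero-coded variables (Pregnancies is excluded: zero is a valid count).
ZERO_CODED = ["Glucose", "Blood Pressure", "Skin Thickness", "Insulin", "BMI"]

def count_zeros(rows, cols=ZERO_CODED):
    """Count biologically impossible zero values per variable."""
    return {c: sum(1 for r in rows if r[c] == 0) for c in cols}

def impute_class_median(rows, outcome="Outcome", cols=ZERO_CODED):
    """Return new rows with each zero replaced by the median of the non-zero
    values of that variable within the same outcome class."""
    medians = {}
    for cls in {r[outcome] for r in rows}:
        for c in cols:
            observed = [r[c] for r in rows if r[outcome] == cls and r[c] != 0]
            medians[(cls, c)] = median(observed)  # raises if a class has no observed values
    return [
        {k: medians[(r[outcome], k)] if k in cols and v == 0 else v for k, v in r.items()}
        for r in rows
    ]
```

Because the medians are computed within each outcome class, this step uses the diabetes label itself; it cannot be applied unchanged to new patients whose outcome is not yet known.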

Visualization

Clinical Measurements by Diabetes Status

Standardised distributions of top predictors by diabetes outcome

Interpretation

Box plots show standardised (z-score) clinical measurements for diabetic vs. non-diabetic patients across the top 5 predictors ranked by model importance. Standardisation places all variables on a common scale, enabling direct visual comparison of distributional separation. Glucose shows the greatest separation between groups, consistent with its position as the strongest predictor in the logistic regression model.
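Z-score standardisation subtracts the mean and divides by the standard deviation. This sketch standardises one variable across all patients and then splits the z-scores by outcome, which is one plausible way to produce per-group box plots on a common scale. The report does not state whether it uses the sample or population standard deviation; the sample version is assumed here, and the input values are made up.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardise to mean 0 and sample standard deviation 1."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]

def standardise_by_group(values, labels):
    """Standardise across all patients, then split z-scores by outcome label,
    so both groups share one common scale."""
    groups = {}
    for z, label in zip(zscores(values), labels):
        groups.setdefault(label, []).append(z)
    return groups

# Toy glucose-like values and outcome labels (made up for illustration).
groups = standardise_by_group([95, 105, 110, 140, 160], [0, 0, 0, 1, 1])
print(groups)
```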

Visualization

Odds Ratios — Logistic Regression Predictors

Exponentiated logistic regression coefficients

Interpretation

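Six of 8 clinical predictors have odds ratio confidence intervals that exclude 1.0, indicating statistically significant associations with diabetes at the 95% confidence level. Diabetes Pedigree Function has the highest odds ratio, 2.198 (95% CI: 1.2235 to 3.9853). An odds ratio above 1.0 indicates increased odds of diabetes per one-unit increase in that predictor; a CI that includes 1.0 means the effect is not statistically distinguishable from no effect (an odds ratio of 1.0, equivalent to a log-odds coefficient of zero). Because each odds ratio is per unit on that variable's own scale, odds ratios are not directly comparable across predictors. One unit of Diabetes Pedigree Function is large relative to its group means (0.43 and 0.55), whereas one unit of Glucose is a single mg/dL; this is how Diabetes Pedigree Function can have the highest odds ratio while Glucose ranks as the strongest predictor on the standardised scale.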
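Each odds ratio is the exponentiated coefficient, exp(β), and a Wald 95% interval is exp(β ± 1.96·SE). The helpers below are a sketch under that assumption; the report does not state which interval method it used, so its intervals may differ slightly from Wald intervals. The hypothetical odds_ratio_per helper shows how an odds ratio rescales for a k-unit increase, exp(k·β), which is why per-unit odds ratios depend on measurement units.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio exp(beta) with a Wald CI exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def excludes_one(lower, upper):
    """True when the interval lies entirely above or below OR = 1 (no effect)."""
    return lower > 1.0 or upper < 1.0

def odds_ratio_per(beta, units):
    """Odds ratio for an increase of `units` in the predictor: exp(units * beta)."""
    return math.exp(units * beta)

# Hypothetical coefficient and standard error, for illustration only.
or_, lo, hi = odds_ratio_ci(beta=0.79, se=0.30)
print(f"OR = {or_:.3f}, 95% CI {lo:.3f} to {hi:.3f}, significant = {excludes_one(lo, hi)}")
```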

Visualization

ROC Curve — Model Discrimination

Receiver operating characteristic curve with AUC

Interpretation

The ROC curve plots sensitivity (true positive rate) against 1 - specificity (false positive rate) across all possible classification thresholds. The model achieves an AUC of 0.8666 (86.7%), indicating good discriminative ability for clinical screening. At the Youden-optimal threshold, sensitivity is 82.8% and specificity is 78.2%, meaning the model correctly identifies 82.8% of true diabetic patients while correctly ruling out 78.2% of non-diabetic patients.
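A plain-Python sketch of the quantities behind the ROC curve: sensitivity and specificity at every candidate cut-off, the threshold that maximises Youden's J = sensitivity + specificity - 1, and the AUC computed as the probability that a randomly chosen diabetic patient scores higher than a randomly chosen non-diabetic one (ties count one half). It compares all pairs, which is quadratic but fine for a few hundred patients; in practice, sklearn.metrics.roc_curve and roc_auc_score are common alternatives. The scores and labels in the example are made up.

```python
def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for every distinct score used as a cut-off."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, tp / pos, 1 - fp / neg))
    return points

def youden_threshold(scores, labels):
    """Return the ROC point maximising J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)

def auc(scores, labels):
    """Probability a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.91, 0.74, 0.40, 0.35, 0.28, 0.12]  # toy predicted probabilities
labels = [1, 1, 0, 1, 0, 0]                    # 1 = diabetic
print(auc(scores, labels), youden_threshold(scores, labels))
```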

Visualization

Confusion Matrix at Optimal Threshold

True and false positives/negatives at Youden's J threshold

Interpretation

At the Youden-optimal threshold of 0.3227, the model correctly classifies 222 diabetic patients (true positives) and 391 non-diabetic patients (true negatives), for an overall accuracy of 79.8% (613 of 768). It misses 46 diabetic patients (false negatives) and incorrectly flags 109 non-diabetic patients (false positives). A sensitivity of 82.8% means the model catches most true cases, which matters for a clinical screening tool where missed diagnoses carry a high cost.
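The headline rates follow directly from the four confusion-matrix counts reported above. This small sketch recomputes them; positive predictive value (the share of flagged patients who are actually diabetic) is not reported in this card and is included only as an additional quantity derived from the same counts.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Rates derived from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),                # share of diabetic patients flagged
        "specificity": tn / (tn + fp),                # share of non-diabetic patients cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # share of all patients classified correctly
        "ppv": tp / (tp + fp),                        # share of flagged patients who are diabetic
    }

# Counts reported for the Youden-optimal threshold.
metrics = confusion_metrics(tp=222, fn=46, tn=391, fp=109)
print({k: round(v, 3) for k, v in metrics.items()})
```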

Visualization

Variable Importance by Log-Odds Magnitude

Predictors ranked by absolute standardised log-odds

Interpretation

Glucose is the most important predictor, with an absolute standardised coefficient of 0.9265: a one-standard-deviation increase in glucose shifts the log-odds of diabetes more than a one-standard-deviation increase in any other clinical variable. Its lead over the second-ranked predictor is 0.4459. Standardising by each predictor's standard deviation puts all coefficients on a common footing, so variables measured in different units (e.g., glucose in mg/dL vs. pregnancies as a count) can be compared directly in this ranking.
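Standardised importance can be computed as |β_j × SD_j|, the change in log-odds for a one-standard-deviation increase in predictor j; for an unpenalised logistic regression, fitting on z-scored predictors yields the same standardised coefficients. The sketch below assumes that definition; the coefficients and standard deviations in the example are hypothetical, not the report's fitted values.

```python
def standardised_importance(coefs, sds):
    """Rank predictors by |beta_j * sd_j|, the log-odds change per one-SD increase."""
    scores = {name: abs(coefs[name] * sds[name]) for name in coefs}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-unit coefficients and predictor standard deviations.
coefs = {"Glucose": 0.03, "Pregnancies": 0.12, "BMI": 0.08}
sds = {"Glucose": 30.0, "Pregnancies": 3.4, "BMI": 7.0}
for name, score in standardised_importance(coefs, sds):
    print(f"{name}: {score:.3f}")
```

Note how Pregnancies has the largest per-unit coefficient in this toy example yet ranks last once each coefficient is scaled by its predictor's spread.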
