Stroke Risk Factors
Executive Summary

Top-line stroke risk findings: AUC, top odds ratios, and key predictors

Total Patients: 5110
Stroke Cases: 249
Stroke Rate (%): 4.87
Model AUC: 0.8457
BMI Values Imputed: 201
Top Risk Predictor: Hypertension
Interpretation

The logistic regression model achieves AUC = 0.8457 for predicting stroke from clinical and demographic predictors in 5110 patients (249 stroke cases, prevalence 4.87%). The predictor with the largest adjusted odds ratio is Hypertension (OR = 1.485). XGBoost gain importance ranks Age highest; Age also has a 95% confidence interval entirely above 1.0 in the logistic model, so the two models agree that age matters even though they rank the predictors differently. The 201 missing BMI values were imputed with the median.
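To make the AUC figure concrete, the sketch below computes AUC as the share of (stroke, no-stroke) pairs in which the stroke case receives the higher predicted score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations; they are not the report's data or its actual implementation.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = stroke, 0 = no stroke.
toy_labels = [0, 0, 0, 1, 0, 1]
toy_scores = [0.05, 0.10, 0.30, 0.25, 0.02, 0.60]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

Read this way, an AUC of 0.8457 would mean that a randomly chosen stroke patient is scored above a randomly chosen non-stroke patient in roughly 85% of pairs. Because AUC depends only on ranking, it says nothing about calibration at a 4.87% prevalence.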

Data Table

Risk Factor Summary by Stroke Outcome

Mean clinical values by stroke vs no-stroke group

Outcome    Age   BMI   Hypertension  Heart Disease  Avg Glucose Level
No Stroke  42    28.8  0.089         0.047          104.8
Stroke     67.7  30.1  0.265         0.189          132.5
Interpretation

Stroke patients are on average 67.7 years old versus 42 years for non-stroke patients, a gap of about 26 years. Mean glucose level is 132.5 in stroke patients versus 104.8 in non-stroke patients, consistent with the link between hyperglycemia and cerebrovascular risk. Hypertension (26.5% versus 8.9%) and heart disease (18.9% versus 4.7%) are both markedly more common in the stroke group, consistent with their roles as primary cardiovascular risk factors.

Visualization

Age Distribution by Stroke Outcome

Age distribution comparison between stroke and non-stroke patients

Interpretation

The median age for stroke patients is 71 years versus 43 years for non-stroke patients, a difference of 28 years. The age distribution for stroke cases is visibly shifted higher and has a tail toward younger ages (its mean of 67.7 sits below its median of 71, which indicates left skew rather than right skew). This is consistent with age being the strongest continuous predictor of stroke risk. Even the lower quartile of the stroke group exceeds the median age of non-stroke patients.

Visualization

Stroke Rate by Clinical Risk Factor Group

Raw stroke incidence rates by clinical and demographic subgroups

Interpretation

Among clinical subgroups, Heart Disease has the highest raw stroke incidence at 17.03%. Patients with hypertension or pre-existing heart disease show stroke rates roughly 3-4x the overall cohort average of 4.87%. These unadjusted rates are subject to confounding (e.g., older patients are more likely to have hypertension), so the logistic regression odds ratios provide the adjusted picture.
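The "roughly 3-4x" figure can be checked directly from the numbers quoted in the report. The sketch below recomputes the overall stroke rate from the case counts and the unadjusted rate ratio for the heart disease subgroup; the helper name `rate_ratio` is a hypothetical illustration.

```python
# Values quoted in the report.
total_patients = 5110
stroke_cases = 249
heart_disease_rate = 17.03  # percent, raw stroke incidence in the subgroup


def rate_ratio(subgroup_rate, reference_rate):
    """Unadjusted ratio of a subgroup's event rate to a reference rate."""
    return subgroup_rate / reference_rate


overall_rate = 100.0 * stroke_cases / total_patients  # percent
ratio = rate_ratio(heart_disease_rate, overall_rate)

print(f"overall stroke rate: {overall_rate:.2f}%")
print(f"heart disease vs overall: {ratio:.2f}x")
```

This is a crude ratio against the whole-cohort rate, not an adjusted effect, so it should not be compared directly with the odds ratios from the logistic model.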

Visualization

Logistic Regression Odds Ratios

Adjusted odds ratios with 95% confidence intervals

Interpretation

Odds ratios are adjusted for all other predictors in the model. The predictor with the largest odds ratio is Hypertension (OR = 1.485). Four predictors have 95% CIs entirely above 1.0 (i.e., independently elevate stroke risk): Hypertension, Age, Avg. Glucose Level, and a fourth whose name is missing from the report output (shown as NA). Error bars show 95% profile-likelihood confidence intervals.
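An odds ratio is the exponential of a logistic regression coefficient, and it multiplies odds rather than probabilities. The sketch below shows both conversions for OR = 1.485. Using the cohort prevalence of 4.87% as the baseline probability is an assumption made purely for illustration; the report does not give the baseline risk for a comparable patient without hypertension. The function names are hypothetical.

```python
import math


def log_odds_coefficient(odds_ratio):
    """Logistic regression coefficient (log-odds) implied by an odds ratio."""
    return math.log(odds_ratio)


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


beta = log_odds_coefficient(1.485)
# Assumed baseline: the cohort-wide prevalence, for illustration only.
shifted = apply_odds_ratio(0.0487, 1.485)

print(f"coefficient: {beta:.4f}")
print(f"illustrative probability: {shifted:.4f}")
```

Odds ratios for binary predictors (present versus absent) and continuous predictors (per unit increase) are on different scales, so the largest odds ratio is not necessarily the predictor with the greatest overall influence.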

Visualization

XGBoost Feature Importance

Non-linear feature importance by gain metric from gradient-boosted trees

Interpretation

XGBoost gain importance ranks Age as the most predictive feature (gain = 0.55998), followed by BMI. Gain measures how much the splits that use a feature improve the model's training objective (loss reduction), rather than classification accuracy directly. Comparing this ranking with the logistic regression odds ratios shows whether non-linear effects or interactions change which predictors matter most.
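As a rough illustration of what a single split's gain is, the sketch below applies the split-gain formula from XGBoost's regularized objective to a toy split under log loss. The values of `reg_lambda` and `gamma`, the toy rows, and the function names are assumptions; the report does not state its hyperparameters or how it aggregates and normalizes per-split gains into the per-feature figure.

```python
def logistic_grad_hess(y, p):
    """Gradient and Hessian of log loss with respect to the raw score."""
    return p - y, p * (1.0 - p)


def split_gain(left, right, reg_lambda=1.0, gamma=0.0):
    """Gain of one split; left/right are lists of (label, probability)."""

    def sums(rows):
        g_total = 0.0
        h_total = 0.0
        for y, p in rows:
            g, h = logistic_grad_hess(y, p)
            g_total += g
            h_total += h
        return g_total, h_total

    def score(g, h):
        return g * g / (h + reg_lambda)

    g_left, h_left = sums(left)
    g_right, h_right = sums(right)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


# Hypothetical split on age: every row starts at predicted probability 0.5.
younger = [(0, 0.5), (0, 0.5), (0, 0.5)]
older = [(1, 0.5), (1, 0.5), (0, 0.5)]
toy_gain = split_gain(younger, older)
print(f"toy split gain = {toy_gain:.4f}")
```

A split that separates outcomes well yields a large gain, while a split whose two sides look alike yields a gain near zero, which is why a feature used in many informative splits accumulates high gain importance.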

Visualization

Risk Factor Correlation Matrix

Correlation matrix of numeric risk factors to detect multicollinearity

Interpretation

The heatmap shows Pearson correlations among numeric risk factors. The maximum off-diagonal correlation is 0.324. Age and average glucose level tend to be moderately correlated with stroke, but correlations among predictors are generally low, suggesting that multicollinearity is not severely inflating the variance of the logistic regression coefficient estimates.
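One way to gauge what a correlation of 0.324 implies is the variance inflation factor, which for a model with exactly two predictors reduces to 1 / (1 - r^2). The sketch below computes a Pearson correlation and that two-predictor VIF. The function names and the example inputs other than 0.324 are hypothetical, and pairwise correlations alone cannot rule out collinearity involving three or more predictors.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def pairwise_vif(r):
    """VIF for a model with exactly two predictors correlated at r."""
    if abs(r) >= 1.0:
        raise ValueError("correlation must be strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Maximum off-diagonal correlation quoted in the report.
vif = pairwise_vif(0.324)
print(f"two-predictor VIF at r = 0.324: {vif:.3f}")
```

A value this close to 1 is consistent with the report's reading that the coefficient estimates are not badly affected.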

Visualization

Stroke Rate by Smoking Status

Stroke prevalence by smoking status category

Interpretation

Among smoking status categories, 'formerly smoked' has the highest stroke rate at 7.91% versus the overall rate of 4.87%. Smoking status is a self-reported behavioral variable with a meaningful 'unknown' category, which may reflect missingness or historical non-recording. The adjusted logistic regression separates smoking's independent effect from its correlation with other risk factors like age and hypertension.
