Hospital Stay Prediction P1778698833
Executive Summary

Key findings from the hospital length-of-stay prediction model

Patients Analyzed: 2000
Median Predicted Stay: 3.8 days
Top Predictor: readmission_count
Model R²: 0.846
Test RMSE: 0.97 days
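Analyzed 2000 patient admissions using random forest regression. On the held-out test set, the model achieved an R² of 0.846 and an RMSE of 0.97 days, with a median predicted stay of 3.8 days. The strongest predictor is readmission_count, followed by major_depression and hemoglobin, suggesting that prior utilization, comorbidity, and lab status are together associated with hospital duration. The model shows promise as a clinical triage aid for identifying admissions likely to require longer stays.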

Data Quality & Preprocessing

Visualization

Length of Stay Distribution

Distribution of hospital stay duration across all patients

Interpretation

Hospital stays range from 1 to 14 days, with a median of 4.0 days (mean 4.0 days). The middle 50% of patients stay between 2 and 5 days (IQR = 3 days). Most admissions are short, with a right tail of extended stays up to 14 days, consistent with routine acute care mixed with a smaller number of complex cases.
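The summary statistics quoted above (range, median, mean, interquartile range) can be reproduced with a short sketch like the one below. It assumes the stays are available as a plain list of day counts; the function name `los_summary` and the toy data are hypothetical, and the quartile rule (`method="inclusive"`, which matches R's default `quantile()` rule) is an assumption about how the report computed its IQR.

```python
import statistics

def los_summary(stays):
    """Summarise length-of-stay values (days): range, median, mean, quartiles, IQR."""
    # method="inclusive" interpolates like R's default quantile() (type 7)
    q1, _, q3 = statistics.quantiles(stays, n=4, method="inclusive")
    return {
        "min": min(stays),
        "max": max(stays),
        "median": statistics.median(stays),
        "mean": statistics.fmean(stays),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }

# Hypothetical toy data, not the report's patients
example = [1, 2, 2, 3, 4, 4, 5, 5, 6, 14]
print(los_summary(example))
```

In this toy list the single 14-day stay pulls the mean (4.6) above the median (4), which is the signature of a right tail.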

Visualization

Length of Stay by Major Comorbidities

Hospital stay duration stratified by end-stage renal disease and dialysis status

Interpretation

Patients with end-stage renal disease on dialysis (n=72) have a median stay of 6.0 days versus 4.0 days for those without (n=1928). This 2.0-day difference suggests that renal comorbidity is associated with substantially longer hospitalizations, likely reflecting the greater clinical complexity and longer treatment and recovery needs of dialysis patients. Note that the dialysis group is small (72 of 2000 patients, 3.6%).
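The stratified comparison amounts to splitting stays by dialysis status and taking each group's median. A minimal sketch, assuming one boolean dialysis flag per patient; the helper name `median_by_group` and the data are illustrative only:

```python
import statistics
from collections import defaultdict

def median_by_group(stays, groups):
    """Return {group: (n, median stay)} for parallel lists of stays and group labels."""
    buckets = defaultdict(list)
    for stay, group in zip(stays, groups):
        buckets[group].append(stay)
    return {g: (len(v), statistics.median(v)) for g, v in buckets.items()}

# Hypothetical toy data: True = end-stage renal disease on dialysis
stays = [6, 7, 5, 4, 3, 4, 5, 2]
on_dialysis = [True, True, True, False, False, False, False, False]

summary = median_by_group(stays, on_dialysis)
difference = summary[True][1] - summary[False][1]  # dialysis median minus non-dialysis median
print(summary, difference)
```

A difference in medians like this is descriptive; it does not adjust for other patient characteristics that may differ between the groups.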

Visualization

Feature Correlation Matrix

Pearson correlations between clinical lab values, vitals, and length of stay

Interpretation

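Clinical labs and vitals show varying relationships with length of stay. Serum creatinine (a renal function marker) and blood urea nitrogen (BUN) would normally be expected to be strongly collinear, but in this dataset their correlation is only 0.011, meaning they are essentially uncorrelated. The strongest correlation with length of stay is for blood_urea_nitrogen (r = 0.137), which is weak: renal status may contribute to hospital duration, but no single lab value or vital sign shows a strong linear relationship on its own. Because creatinine and BUN are not collinear here, collinearity between them is unlikely to distort the regression estimates, though other correlated pairs should still be checked.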
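The values in the correlation matrix are Pearson coefficients. Below is a small, dependency-free sketch of that computation plus a check that flags strongly correlated feature pairs; the 0.8 cutoff, the helper names, and the toy values are assumptions for illustration, not values from the report.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def collinear_pairs(features, threshold=0.8):
    """List feature pairs whose absolute correlation meets a (hypothetical) threshold."""
    names = list(features)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(features[a], features[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical toy values: here creatinine and BUN move together perfectly
features = {
    "creatinine": [1.0, 2.0, 3.0, 4.0],
    "bun": [10.0, 20.0, 30.0, 40.0],
    "hemoglobin": [13.0, 11.0, 14.0, 12.0],
}
print(collinear_pairs(features))
```

Under a cutoff like this, the report's creatinine-BUN correlation of 0.011 would not be flagged.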

Visualization

Feature Importance Ranking

Random forest feature importance (mean decrease in node impurity), ranked by predictive power

Interpretation

Random forest identifies readmission_count, major_depression, and hemoglobin as the three strongest predictors of length of stay; together they account for 72% of the model's total feature importance. These top features span prior utilization (readmission count), comorbidity (major depression), and lab markers (hemoglobin), suggesting that overall clinical complexity is associated with longer hospital stays.
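A "share of importance" figure like the 72% above is the top features' summed importance divided by the total across all features. A sketch, assuming raw per-feature scores are available as a dict; the helper name and every number below are hypothetical:

```python
def top_k_share(importances, k=3):
    """Rank features by importance; return (top-k names, their share of total importance)."""
    total = sum(importances.values())
    ranked = sorted(importances, key=importances.get, reverse=True)
    top = ranked[:k]
    share = sum(importances[name] for name in top) / total
    return top, share

# Hypothetical raw importance scores (e.g. total decrease in node impurity per feature)
scores = {
    "readmission_count": 50.0,
    "major_depression": 20.0,
    "hemoglobin": 10.0,
    "dialysis": 8.0,
    "pneumonia": 7.0,
    "serum_creatinine": 5.0,
}
top, share = top_k_share(scores)
print(top, f"{share:.0%}")
```

Impurity-based importances are relative scores, so shares like this are more meaningful than the raw values.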

Visualization

Actual vs. Predicted Length of Stay

Model predictions vs. observed hospital stay duration on test set

Interpretation

The model achieves R² = 0.846 on the test set, with a mean absolute error (MAE) of 0.69 days. Predictions cluster reasonably closely around the diagonal, although very long stays (>7 days) tend to be underestimated. The mean bias is -0.03 days, close to zero, so the model is well calibrated on average even though its errors are larger at the long-stay end.
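The test-set metrics reported here can all be derived from the paired actual and predicted stays. A minimal sketch follows; it assumes bias is defined as predicted minus actual (the report does not state its sign convention) and computes R² against the test-set mean. The function name and data are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """R², RMSE, MAE and mean bias (predicted minus actual) for paired test-set values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)                 # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,
    }

# Hypothetical stays (days); the longest one is underpredicted
actual = [2, 3, 4, 5, 9]
predicted = [2.5, 3, 3.5, 5, 8]
print(regression_metrics(actual, predicted))
```

In this toy example the underprediction of the 9-day stay contributes most of the negative bias, mirroring the long-stay pattern described above.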

Visualization

Regression Coefficients (Effect Sizes)

Linear regression coefficients showing marginal effect of each predictor on length of stay (days)

Interpretation

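The linear regression quantifies associations for specific predictor levels: patients in the Readmission Count (5+) category stay 5.45 days longer than those in the reference category, while the Readmission Count (Level 1) category is associated with a stay 0.93 days shorter. Because these are categorical levels, each coefficient is a difference relative to the reference category rather than a per-unit effect. Positive coefficients identify factors associated with longer hospitalization; negative coefficients identify factors associated with shorter stays and may suggest protective factors. This complements the random forest by quantifying effect magnitudes for clinical decision support.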
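The coefficients above are for categorical levels of readmission count, so they are read against a reference level. In the simplest case, an OLS model with an intercept and a single treatment-coded (dummy) predictor, the fitted intercept equals the reference level's mean stay and each level's coefficient equals that level's mean minus the reference mean. The sketch below uses that equivalence; the reference level "0", the helper name, and the data are hypothetical, since the report does not name its reference category.

```python
from collections import defaultdict

def treatment_coded_effects(stays, levels, reference):
    """
    Coefficients of an OLS model stay ~ level with treatment (dummy) coding and an intercept.
    With one categorical predictor, the intercept is the reference level's mean stay and
    each dummy coefficient is that level's mean minus the reference mean.
    """
    groups = defaultdict(list)
    for stay, level in zip(stays, levels):
        groups[level].append(stay)
    means = {lvl: sum(v) / len(v) for lvl, v in groups.items()}
    intercept = means[reference]
    effects = {lvl: m - intercept for lvl, m in means.items() if lvl != reference}
    return intercept, effects

# Hypothetical readmission-count categories and stays (days)
stays = [3, 4, 5, 2, 3, 9, 10]
levels = ["0", "0", "0", "1", "1", "5+", "5+"]
intercept, effects = treatment_coded_effects(stays, levels, reference="0")
print(intercept, effects)
```

In a multivariable model like the report's, each coefficient is additionally adjusted for the other predictors, so it will generally differ from a raw difference in group means.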

Data Table

Model Performance Metrics

Summary of random forest and linear regression model performance on the test set

Metric                        Value
Total Patients                2000
Train Set Size                1600
Test Set Size                 400
R² (Test Set)                 0.846
RMSE (Days)                   0.97
MAE (Days)                    0.69
Median Predicted LOS (Days)   3.8
Random Forest mtry            5
Random Forest Trees           100
Interpretation

The model was trained on 80% of the 2000 patients (1600) and evaluated on a held-out test set of the remaining 400 patients. It explains about 85% of the variance in length of stay (R² = 0.846), with a typical prediction error of about one day (RMSE = 0.97 days), which may be adequate for clinical triage support such as identifying admissions at risk of long stays. Further model selection and hyperparameter tuning could improve accuracy.
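The 1600/400 figures follow from a random 80/20 split of 2000 patients. A hand-rolled sketch, assuming a simple shuffled split (the report does not say whether it was random or stratified); the function name and the seed of 42 are hypothetical:

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

patient_ids = list(range(2000))
train, test = train_test_split(patient_ids)
print(len(train), len(test))  # 1600 400
```

For orientation only: in scikit-learn terms, the table's mtry = 5 and 100 trees would correspond roughly to `RandomForestRegressor(max_features=5, n_estimators=100)`.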
