Healthcare · Patients · Hospital Stay Prediction P1778698833

Executive Summary

Key findings from hospital length of stay prediction model

Patients Analyzed

2000

Median Predicted Stay

3.8

Top Predictor

readmission_count

Model R²

0.846

Test RMSE (Days)

0.97

Analyzed 2000 patient admissions using random forest regression. The model achieved a test-set R² of 0.846 and an RMSE of 0.97 days, with a median predicted stay of 3.8 days. The strongest predictor is readmission_count, which suggests that prior hospital utilization is closely tied to stay duration. The model shows promise for clinical triage applications.

Data Preparation

Data Quality & Preprocessing

Data Quality

Visualization

Length of Stay Distribution

Distribution of hospital stay duration across all patients

Interpretation

Hospital stays range from 1 to 14 days, with a median of 4.0 days (mean 4.0). The middle 50% of patients stay between 2 and 5 days (IQR = 3 days). The distribution shows mostly short admissions with a tail of extended stays, consistent with acute care mixed with complex cases.
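The summary statistics quoted here can be reproduced from a column of stay durations. The following is a minimal sketch using only Python's standard library; the `stays_days` sample and the `summarize_stays` helper are hypothetical and are not the report's data or code.

```python
import statistics

# Hypothetical length-of-stay values in days; NOT the report's dataset.
stays_days = [1, 2, 2, 3, 4, 4, 5, 5, 7, 14]

def summarize_stays(values):
    """Return range, median, mean, quartiles, and IQR for stay durations."""
    # "inclusive" treats the sample as containing its own min and max.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }

summary = summarize_stays(stays_days)
```

A mean noticeably above the median, as in this made-up sample, is the usual sign of a right tail of long stays.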

Visualization

Length of Stay by Major Comorbidities

Hospital stay duration stratified by end-stage renal disease and dialysis status

Interpretation

Patients with end-stage renal disease on dialysis (n=72) have a median stay of 6.0 days versus 4.0 days for those without (n=1928). This 2.0-day difference points to renal comorbidity as a major correlate of extended hospitalization, consistent with dialysis patients having higher clinical complexity and needing longer treatment and recovery.
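A stratified comparison like this one amounts to grouping stays by a flag and taking the median of each group. The sketch below assumes records of the form `(on_dialysis, stay_days)`; the records and the `median_stay_by_group` helper are hypothetical, chosen only to illustrate the calculation.

```python
import statistics
from collections import defaultdict

# Hypothetical (on_dialysis, stay_days) records; NOT the report's data.
records = [
    (True, 6), (True, 7), (True, 5),
    (False, 4), (False, 3), (False, 5), (False, 4),
]

def median_stay_by_group(rows):
    """Group stay durations by a boolean flag and return the median per group."""
    groups = defaultdict(list)
    for flag, days in rows:
        groups[flag].append(days)
    return {flag: statistics.median(days) for flag, days in groups.items()}

medians = median_stay_by_group(records)
# Difference in medians between the flagged and unflagged groups.
difference = medians[True] - medians[False]
```

With groups as unbalanced as 72 versus 1928, it is worth reporting the group sizes next to the medians, as the card does.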

Visualization

Feature Correlation Matrix

Pearson correlations between clinical lab values, vitals, and length of stay

Interpretation

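Clinical labs and vitals show only weak linear relationships with length of stay. Serum creatinine (a renal function marker) and blood urea nitrogen would normally be expected to be collinear, but their reported correlation here is 0.011, which indicates essentially no linear association in this dataset. The largest correlation with length of stay is for blood_urea_nitrogen (0.137), a weak association that on its own does not show that renal or electrolyte status drives hospital duration. Strongly correlated predictor pairs require careful interpretation in regression; at r = 0.011 the creatinine-BUN pair does not meet that condition here.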
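Pearson's r, the statistic shown in the matrix, can be computed directly from two columns. This is a minimal sketch with a hypothetical `pearson_r` helper and made-up lab values; values near 0 mean little linear association, and values near 1 or -1 mean a strong one.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical lab values (NOT the report's data), built to move together.
creatinine = [0.8, 1.0, 1.2, 1.9, 2.5]
bun = [12, 15, 18, 30, 41]
r_labs = pearson_r(creatinine, bun)
```

In this made-up sample the two labs rise together, so `r_labs` is close to 1, which is what a strongly collinear pair looks like.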

Visualization

Feature Importance Ranking

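Random forest feature importance (reported as Mean Decrease Gini; in a regression forest this corresponds to the decrease in node impurity, measured by variance rather than Gini), ranked by predictive power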

Interpretation

Random forest identifies readmission_count, major_depression, and hemoglobin as the three strongest predictors of length of stay. Together these three features account for 72% of total feature importance. The ranking is led by prior utilization (readmission count), a psychiatric comorbidity (major depression), and a lab marker (hemoglobin), suggesting that clinical complexity is associated with hospital duration.
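A "share of importance" figure for the top features is the sum of their scores divided by the sum over all features. The sketch below uses a hypothetical `top_k_share` helper and made-up importance scores; the numbers are illustrative and are not the report's values.

```python
def top_k_share(importances, k=3):
    """Return the top-k feature names and their share of total importance."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    share = sum(value for _, value in top) / total
    return [name for name, _ in top], share

# Hypothetical raw importance scores; NOT the report's values.
raw_importance = {
    "readmission_count": 50.0,
    "major_depression": 15.0,
    "hemoglobin": 10.0,
    "blood_urea_nitrogen": 9.0,
    "serum_creatinine": 8.0,
    "dialysis": 8.0,
}
top_features, top_share = top_k_share(raw_importance)
```

Impurity-based importances describe how the forest used each feature, not the direction or size of its clinical effect.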

Visualization

Actual vs. Predicted Length of Stay

Model predictions vs. observed hospital stay duration on test set

Interpretation

The model achieves R² = 0.846 on the test set with a mean absolute error of 0.69 days. Predictions cluster around the diagonal, though very long stays (>7 days) tend to be underestimated. Mean bias is -0.03 days, close to zero, suggesting the model is well calibrated on average.
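The four metrics on this card (R², RMSE, MAE, mean bias) all derive from the residuals between predicted and actual stays. This is a minimal sketch with a hypothetical `regression_metrics` helper and made-up values. It assumes bias is defined as predicted minus actual, so a negative value means underestimation; the report does not state its sign convention.

```python
import math

def regression_metrics(actual, predicted):
    """Compute R^2, RMSE, MAE, and mean bias (predicted minus actual)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    residuals = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "mean_bias": sum(residuals) / n,
    }

# Hypothetical stays in days (NOT the report's data); the longest stay is underpredicted.
actual_days = [2, 3, 4, 5, 9]
predicted_days = [2.5, 3.0, 3.5, 5.0, 8.0]
metrics = regression_metrics(actual_days, predicted_days)
```

A near-zero mean bias can coexist with systematic underestimation of long stays, because errors in opposite directions cancel in the average.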

Visualization

Regression Coefficients (Effect Sizes)

Linear regression coefficients showing marginal effect of each predictor on length of stay (days)

Interpretation

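Linear regression quantifies effects by category: Readmission Count (5+) is associated with a stay 5.45 days longer than the reference category, while Readmission Count (Level 1) is associated with a stay 0.93 days shorter. Because these are indicator variables for category levels, each coefficient is a difference from the reference level rather than a per-unit slope. Positive coefficients identify factors associated with longer hospitalization; negative coefficients identify factors associated with shorter stays, which is not by itself evidence of a protective effect. This complements the random forest by quantifying specific effect magnitudes for clinical decision support.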
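Coefficients attached to category levels are read as differences from a reference level. The sketch below assumes treatment (dummy) coding with level "0" as the reference, which the report does not state. In a model containing only this one categorical predictor, each coefficient equals that level's mean stay minus the reference level's mean stay. The data and the `dummy_effects` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical (readmission_level, stay_days) rows; NOT the report's data.
rows = [
    ("0", 4), ("0", 5), ("0", 3),
    ("1", 3), ("1", 4),
    ("5+", 9), ("5+", 10),
]

def dummy_effects(rows, reference):
    """Intercept and per-level effects for a single dummy-coded predictor."""
    groups = defaultdict(list)
    for level, days in rows:
        groups[level].append(days)
    means = {level: sum(v) / len(v) for level, v in groups.items()}
    intercept = means[reference]  # mean stay at the reference level
    effects = {
        level: mean - intercept   # difference from the reference level
        for level, mean in means.items()
        if level != reference
    }
    return intercept, effects

intercept, effects = dummy_effects(rows, reference="0")
```

In a model with additional predictors the coefficients are adjusted for those covariates and no longer equal raw differences in means, but they are still contrasts with the reference level.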

Data Table

Model Performance Metrics

Summary of random forest and linear regression model performance on test set

Metric Name	Metric Value
Total Patients	2000
Train Set Size	1600
Test Set Size	400
R² (Test Set)	0.846
RMSE (Days)	0.97
MAE (Days)	0.69
Median Predicted LOS	3.8
Random Forest MTry	5
Random Forest Trees	100

Interpretation

The model was trained on 80% of the 2000 patients (1600) and evaluated on a held-out test set of 400 patients. Performance metrics show R² = 0.846 with RMSE = 0.97 days, indicating strong predictive power suitable for clinical triage support (identifying high-risk admissions). Model selection and hyperparameter tuning could further improve accuracy.
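An 80/20 split of 2000 rows gives the 1600/400 sizes listed in the table. This is a minimal sketch of a shuffled index split; the `train_test_split_indices` helper, the seed, and the use of simple random shuffling are assumptions, since the report does not describe how the split was made.

```python
import random

def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into train and test index lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(n_rows))
    rng = random.Random(seed)  # fixed seed (hypothetical) for reproducibility
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(2000)
```

Keeping the test indices fixed across model comparisons means the reported R² and RMSE are measured on the same 400 patients each time.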
