User 136 · Insurance · Claims · Cost Prediction
Executive Summary


Key metrics from the medical insurance cost GLM

Observations: 1338
R-Squared: 0.7216
RMSE: 8060.35
Model MAE: 4206.44
Smoker Premium: 23615.96
Interpretation

The GLM with a smoker×BMI interaction explains 72.2% of the variance in medical charges (R² = 0.722) across 1,338 policyholders. Smokers are charged on average $23,616 more per year than non-smokers. The model's root mean squared error is $8,060 and its mean absolute error is $4,206, both measured in dollars of annual charges.
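
The three fit metrics quoted above can be reproduced from paired actual and predicted charges. The following is a minimal sketch, assuming R² is defined as 1 - SS_res/SS_tot on the original dollar scale; the report does not state which definition it uses. The function name `fit_metrics` and the toy numbers are illustrative and are not taken from the report's data.

```python
import math

def fit_metrics(actual, predicted):
    """Return R-squared, RMSE and MAE for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    residuals = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(r * r for r in residuals)                  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)    # total sum of squares
    return {
        "r_squared": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }

# Made-up values for illustration only (assumption, not the report's data).
toy_actual = [1000.0, 2000.0, 3000.0, 10000.0]
toy_predicted = [1200.0, 1800.0, 3500.0, 7000.0]
metrics = fit_metrics(toy_actual, toy_predicted)
```

In the toy data a single large miss pushes RMSE well above MAE, which is the same pattern as the reported $8,060 against $4,206.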

Visualization

Distribution of Medical Charges

Histogram of individual annual medical insurance charges

Interpretation

Medical charges are strongly right-skewed: the median is $9,382 but the mean is $13,270, pulled up by high-cost outliers. The maximum charge reaches $63,770. This skew supports the use of a Gamma GLM with log link, which suits a strictly positive, right-skewed response and models cost drivers as multiplicative rather than additive.
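
A quick way to check the right skew described above is to compare mean and median and compute a moment-based skewness coefficient. This is a sketch using only the Python standard library; `skew_summary` and the toy charges are hypothetical and stand in for the real charges column.

```python
import statistics

def skew_summary(values):
    """Summarise location and skew of a list of positive amounts."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Moment coefficient of skewness: average cubed z-score.
    skewness = sum(((v - mean) / sd) ** 3 for v in values) / len(values)
    return {"mean": mean, "median": median, "skewness": skewness}

# Made-up charges for illustration only (assumption, not the report's data).
toy_charges = [1100.0, 4000.0, 7000.0, 9000.0, 12000.0, 40000.0, 60000.0]
summary = skew_summary(toy_charges)
```

A mean above the median together with a positive skewness coefficient is the signature the histogram shows for the real data.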

Visualization

Average Charges by Smoker Status

Mean annual insurance charges comparing smokers to non-smokers

Interpretation

Smokers incur substantially higher medical costs than non-smokers. The average annual charge for smokers is $32,050 vs $8,434 for non-smokers — a gap of $23,616. Smoking status is the single strongest predictor in the model, and this gap is amplified further by the BMI interaction term.

Visualization

BMI vs Charges by Smoking Status

Scatter plot of BMI against annual charges, colored by smoking status

Interpretation

The scatter reveals a clear interaction: among smokers, higher BMI is strongly associated with higher charges (r = 0.81), while among non-smokers the relationship is much weaker (r = 0.08). These diverging slopes are consistent with the smoker×BMI interaction term in the GLM capturing a real effect rather than noise.
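
The per-group correlations quoted above can be computed by splitting the rows on smoking status and applying Pearson's r within each group. The sketch below assumes rows are dictionaries with `smoker`, `bmi` and `charges` keys; those field names, the helper names and the toy rows are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_by_group(rows, group_key, x_key, y_key):
    """Compute Pearson's r between x_key and y_key within each group."""
    groups = {}
    for row in rows:
        xs, ys = groups.setdefault(row[group_key], ([], []))
        xs.append(row[x_key])
        ys.append(row[y_key])
    return {g: pearson_r(xs, ys) for g, (xs, ys) in groups.items()}

# Made-up rows for illustration only (assumption, not the report's data).
toy_rows = [
    {"smoker": "yes", "bmi": 22.0, "charges": 18000.0},
    {"smoker": "yes", "bmi": 28.0, "charges": 24000.0},
    {"smoker": "yes", "bmi": 34.0, "charges": 41000.0},
    {"smoker": "yes", "bmi": 38.0, "charges": 46000.0},
    {"smoker": "no", "bmi": 22.0, "charges": 6000.0},
    {"smoker": "no", "bmi": 28.0, "charges": 9000.0},
    {"smoker": "no", "bmi": 34.0, "charges": 5500.0},
    {"smoker": "no", "bmi": 38.0, "charges": 8000.0},
]
r_by_group = correlation_by_group(toy_rows, "smoker", "bmi", "charges")
```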

Visualization

GLM Predictor Effects

GLM log-scale coefficients sorted by absolute magnitude

Interpretation

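The coefficient reported as largest in magnitude is 'Region: Southwest' (coefficient = -0.155 on the log scale). A large coefficient is not the same as a strong predictor: coefficients of continuous terms such as BMI are per-unit effects, and smoking enters the model partly through the interaction term, so this ranking does not contradict smoking status being the dominant cost driver. In a Gamma GLM with log link, each coefficient represents a multiplicative shift in expected charges: positive values increase cost, negative values reduce it. The smoker×BMI interaction appears among the top effects, consistent with smoking amplifying the cost impact of excess BMI.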
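
Because the link is logarithmic, a coefficient converts to a multiplicative factor by exponentiation. The sketch below applies this to the -0.155 coefficient quoted above; the helper names are illustrative, and the interpretation assumes the other predictors are held fixed and that the effect is measured against the model's reference region, which the report does not name.

```python
import math

def multiplicative_effect(log_coefficient):
    """Factor applied to expected charges for a one-unit change in the predictor."""
    return math.exp(log_coefficient)

def percent_change(log_coefficient):
    """Same effect expressed as a percentage change in expected charges."""
    return (math.exp(log_coefficient) - 1.0) * 100.0

southwest_coef = -0.155  # log-scale coefficient quoted in the report
factor = multiplicative_effect(southwest_coef)   # about 0.856
change = percent_change(southwest_coef)          # about -14.4 percent
```

Read this way, the Southwest coefficient corresponds to expected charges roughly 14% lower than the reference region, all else equal.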

Visualization

Average Charges by Region

Mean annual insurance charges by US geographic region

Interpretation

Regional variation in average charges is moderate. The highest-cost region is 'southeast' ($14,735 mean) and the lowest is 'southwest' ($12,347 mean), a gap of $2,388. These regional differences likely reflect variation in healthcare costs and possibly demographic composition, though they are much smaller than the smoker premium.

Visualization

Actual vs Predicted Charges

Scatter of actual vs model-predicted annual charges

Interpretation

Points cluster around the 45-degree line (perfect prediction), with the model explaining 72.2% of charge variance. The RMSE of $8,060 weights large misses heavily; the typical absolute prediction error is better described by the MAE of $4,206. Systematic under-prediction at very high charges is common in Gamma GLMs due to extreme outliers in the right tail of the cost distribution.

Visualization

Residuals vs Fitted Values

GLM diagnostic: residuals plotted against fitted values

Interpretation

Ideally, residuals should be randomly scattered around zero across all fitted values. Patterns in this plot, such as a funnel shape or a curve, indicate heteroscedasticity or missing non-linear terms. Any systematic structure at high fitted values may reflect the heavy right tail of charges, which the Gamma distribution partially addresses but cannot eliminate entirely.
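
A Gamma model assumes variance proportional to the squared mean, so raw residuals are expected to fan out as fitted values grow; residuals scaled by the fitted value are the ones that should look flat. The sketch below computes Pearson residuals under the Gamma variance function V(mu) = mu², without dividing by the dispersion. The report does not say which residual type its plot uses, so this is an assumption, and the toy numbers are made up.

```python
def gamma_pearson_residuals(actual, fitted):
    """Pearson residuals for a Gamma model: (y - mu) / sqrt(V(mu)) with V(mu) = mu**2."""
    return [(y - mu) / mu for y, mu in zip(actual, fitted)]

# Made-up values for illustration only (assumption, not the report's data).
toy_actual = [1100.0, 9000.0, 45000.0]
toy_fitted = [1000.0, 10000.0, 30000.0]
scaled = gamma_pearson_residuals(toy_actual, toy_fitted)
```

If the scaled residuals still widen or curve with the fitted values, that points to structure the Gamma variance assumption does not cover.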

Data Table

Descriptive Statistics

Summary statistics for all numeric variables in the dataset

Variable   Mean    Median   SD      Min     Max
Age        39.21   39       14.05   18      64
BMI        30.66   30.4     6.1     15.96   53.13
Children   1.09    1        1.21    0       5
Charges    13270   9382     12110   1122    63770

Interpretation

The dataset contains 1,338 complete observations. Medical charges are widely dispersed (SD of about $12,110 against a mean of $13,270), and the mean exceeding the median of $9,382 is consistent with strong right skew. BMI averages around 30.7 and age spans from 18 to 64 years.
