User 136 · Insurance · Claims · Negative Binomial

Executive Summary

Overdispersion confirmation and key NB regression findings

Observations

1338

Variance-Mean Ratio

1.33

NB Theta

2.597

AIC Improvement (NB)

65.2

LRT P-Value

2.401e-16

Interpretation

Among 1,338 policyholders, the variance-to-mean ratio of 1.33 exceeds 1, indicating mild to moderate overdispersion and supporting negative binomial (NB) regression over Poisson. The NB model improves AIC by 65.2 over Poisson (LRT p = 2.401e-16, i.e. p < 0.001), and the estimated NB theta of 2.6 is consistent with moderate overdispersion. The largest point estimate is Region: Northwest (IRR = 1.1), but no predictor's 95% CI excludes 1, so none of the covariates shows a statistically significant association with claim frequency.
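To make the theta value concrete, the following minimal sketch computes the variance a negative binomial model implies at a given mean. It assumes the common NB2 parameterization (variance = mu + mu^2 / theta), which the report does not state explicitly; the function name is illustrative only. The mean of 1.09 is the overall mean reported in the claim count distribution card.

```python
def nb2_variance(mu: float, theta: float) -> float:
    """Variance implied by NB2: mu + mu^2 / theta (assumed parameterization)."""
    if mu < 0 or theta <= 0:
        raise ValueError("mu must be non-negative and theta must be positive")
    return mu + mu ** 2 / theta


# Values taken from the report: overall mean 1.09, NB theta 2.597.
mu, theta = 1.09, 2.597
implied_var = nb2_variance(mu, theta)
implied_ratio = implied_var / mu  # algebraically equal to 1 + mu / theta

print(f"implied variance at the overall mean: {implied_var:.3f}")
print(f"implied variance-to-mean ratio:       {implied_ratio:.3f}")
```

As theta grows, the extra mu^2 / theta term shrinks and the model approaches Poisson; smaller theta means stronger overdispersion. The implied ratio at the overall mean need not match the raw variance-to-mean ratio exactly, because theta is estimated conditional on the covariates.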

Visualization

Claim Count Distribution

Histogram of raw claim counts per policyholder

Interpretation

The distribution of claim counts has a mean of 1.09 and a variance of 1.45, giving a variance-to-mean ratio of 1.33. This is above the value of 1 expected under Poisson and indicates mild to moderate overdispersion. 42.9% of policyholders have zero claims, more than the roughly 33.6% (exp(-1.09)) that a Poisson distribution with the same mean would predict, and the long right tail shows that a small fraction of policyholders account for a disproportionate share of claim events. Poisson cannot accommodate this pattern without extra variance, so the shape supports the negative binomial specification.
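The dispersion check described above can be sketched in a few lines. The function name and the toy counts are made up for illustration; only the mean of 1.09 comes from the report. The sketch uses the sample variance (n - 1 denominator), which is an assumption about how the report's variance was computed.

```python
import math
from statistics import mean, variance


def dispersion_summary(counts):
    """Summarize overdispersion evidence for a list of non-negative counts."""
    m = mean(counts)
    v = variance(counts)  # sample variance, n - 1 denominator
    return {
        "mean": m,
        "variance": v,
        "vmr": v / m,  # variance-to-mean ratio; 1 under Poisson
        "observed_zero_share": sum(1 for c in counts if c == 0) / len(counts),
        "poisson_zero_share": math.exp(-m),  # P(Y = 0) for Poisson with mean m
    }


# Toy data (not the report's data), only to show the shape of the output.
toy_counts = [0, 0, 0, 1, 1, 2, 3, 5]
summary = dispersion_summary(toy_counts)
print(summary)

# With the report's mean of 1.09, Poisson would predict this share of zeros:
print(round(math.exp(-1.09), 3))  # 0.336
```

A ratio above 1 together with more zeros than exp(-mean) predicts is the pattern the card describes.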

Data Table

Model Comparison: Poisson vs Negative Binomial

AIC, log-likelihood, dispersion ratio, and LRT p-value for both models

Model	AIC	Log-Likelihood	Dispersion Ratio	LRT P-Value
Poisson	3899.8	-1941.9	1.5	n/a
Negative Binomial	3834.6	-1908	1.13	2.401e-16

Interpretation

The Poisson model has AIC = 3899.8 and log-likelihood = -1941.9. The negative binomial model lowers AIC by 65.2 units (AIC = 3834.6), and the likelihood ratio test against Poisson is highly significant (LRT p = 2.401e-16). The Poisson dispersion ratio of 1.5 is clearly above 1, consistent with overdispersion, while the NB ratio of 1.13 is much closer to 1 and indicates a better-calibrated variance.
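The comparison above rests on two standard quantities, AIC = 2k - 2 logL and the likelihood ratio statistic 2 (logL_NB - logL_Poisson). The sketch below computes both from the rounded log-likelihoods in the table, so it will not reproduce the reported p-value exactly. The parameter count k = 8 is hypothetical, and the one-degree-of-freedom chi-square reference distribution is an assumption; some software halves this p-value because the Poisson model lies on the boundary of the NB parameter space.

```python
import math


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 logL."""
    return 2 * n_params - 2 * log_lik


def lrt_one_df(log_lik_null: float, log_lik_alt: float):
    """Likelihood ratio test with 1 degree of freedom.

    For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (log_lik_alt - log_lik_null)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Rounded log-likelihoods from the table; k = 8 is a hypothetical count.
ll_poisson, ll_nb = -1942.0, -1908.0
k = 8
aic_gain = aic(ll_poisson, k) - aic(ll_nb, k + 1)  # NB adds one parameter (theta)
stat, p = lrt_one_df(ll_poisson, ll_nb)

print(f"AIC improvement: {aic_gain:.1f}")
print(f"LRT statistic:   {stat:.1f}, p-value: {p:.3e}")
```

Because the NB model adds a single parameter, the AIC improvement is always the LRT statistic minus 2, which matches the relationship between the report's figures.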

Visualization

Mean Claim Count by Region

Observed mean claim counts per geographic region

Interpretation

Geographic region shows modest variation in mean claim counts. Policyholders in the northwest region have the highest average at 1.15 claims, while those in the northeast average 1.05 claims. These raw differences do not control for age, BMI, or smoking status; the IRR chart shows the region effect after adjusting for all covariates. All regions contain at least 5 policyholders.

Visualization

Mean Claim Count by Policyholder Group

Observed mean claim counts for smoker/sex subgroups

Interpretation

Policyholder subgroups defined by smoking status and sex show modest differences in mean claim counts. The Yes / Male subgroup (male smokers) has the highest average at 1.19 claims, compared with an overall mean of 1.09. Smoking status is often expected to drive subgroup differences because smokers tend to have more frequent health-related events, but the gap observed here is small. These observed means are unadjusted and provide intuition only; the IRR card shows the adjusted effects.

Visualization

Negative Binomial Coefficients (Incidence Rate Ratios)

IRR with 95% CI for each predictor — values > 1 increase expected claim count

Interpretation

Incidence rate ratios (IRRs) from the negative binomial model show the multiplicative effect of each predictor on expected claim count, holding other variables constant. No predictor has an IRR significantly above 1 (95% CI excluding 1), and none has an IRR significantly below 1, so the model finds no statistically significant increase or decrease in claim frequency for any covariate. The largest point estimate belongs to Region: Northwest (IRR = 1.1): policyholders in that region have an expected claim count about 1.1 times that of the reference region, although the interval includes 1.
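An IRR is the exponential of a log-link regression coefficient, and its Wald 95% CI is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The sketch below shows that conversion; the standard error of 0.08 is hypothetical, since the report gives only the IRR, and the function name is illustrative.

```python
import math


def irr_with_ci(coef: float, std_err: float, z: float = 1.96):
    """Convert a log-scale coefficient to an IRR with a Wald confidence interval.

    Returns (irr, lower, upper, significant), where `significant` is True
    only when the interval excludes 1.
    """
    irr = math.exp(coef)
    lower = math.exp(coef - z * std_err)
    upper = math.exp(coef + z * std_err)
    significant = not (lower <= 1.0 <= upper)
    return irr, lower, upper, significant


# IRR = 1.1 is from the report; the standard error is a made-up value.
coef = math.log(1.1)
irr, lower, upper, significant = irr_with_ci(coef, std_err=0.08)
print(f"IRR = {irr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f}), significant: {significant}")
```

With an interval that spans 1, an IRR of 1.1 is compatible with no effect, which is why the card reports zero significant predictors despite a point estimate above 1.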

Visualization

Predicted vs Actual Claim Counts

NB fitted values vs observed counts — close alignment indicates good model calibration

Interpretation

The scatter of predicted versus actual claim counts assesses whether the negative binomial model captures the observed distribution without systematic bias. The correlation between fitted and observed values is 0.059 and the mean absolute error is 0.98 claims, which is large relative to the mean of 1.09; the covariates therefore explain very little of the policyholder-level variation in claim counts. Points aligned along the diagonal indicate accurate predictions; clusters offset to one side would signal under- or over-prediction for a particular policyholder profile. Discrete integer outcomes produce horizontal bands, which is normal for count models.
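The two fit metrics quoted above, the Pearson correlation between fitted and observed values and the mean absolute error, can be computed as in this minimal sketch. The toy arrays are invented for illustration and are not the report's data; the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def mean_absolute_error(observed, predicted):
    """Average absolute gap between observed counts and fitted means."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Toy example: fitted means barely vary, so correlation with outcomes is weak.
observed = [0, 0, 1, 1, 2, 3]
fitted = [1.10, 1.05, 1.12, 1.08, 1.15, 1.07]
print(f"correlation: {pearson_r(fitted, observed):.3f}")
print(f"MAE:         {mean_absolute_error(observed, fitted):.3f}")
```

When fitted means cluster tightly around the overall mean, as in the toy data, the MAE ends up close to the average absolute deviation of the raw counts, which is the situation the reported figures suggest.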
