A 35-year-old smoker with a BMI of 32 walks into your insurance office. Your pricing model—built on age, BMI, and smoking status as independent factors—quotes $18,000 annually. Your competitor quotes $28,000 for the same person. Who's right?
Eighteen months later, you've paid out $42,000 in medical claims for this policyholder. Your competitor, who understood interaction effects, is profitable. You're not. This is the $10,000 lesson that medical insurance cost prediction with GLM interaction terms teaches: risk factors don't just add up—they multiply each other in ways that standard additive models catastrophically underestimate.
When health insurers analyze claims data, they discover something actuaries have known for decades but many pricing models ignore: a smoker with high BMI doesn't cost "smoker premium + BMI premium" more than baseline. The risks interact. Smoking compounds the cardiovascular and metabolic damage of obesity, so the combined cost grows multiplicatively rather than additively. Miss this interaction, and you'll systematically underprice your highest-risk policies by 40-60%.
This article walks through a real medical insurance cost prediction model—built with generalized linear models (GLM) on 1,338 insured individuals—showing exactly how interaction effects between smoking status and BMI reshape the cost landscape. You'll see the distribution of medical charges, the smoking premium, the BMI slope that differs by smoking status, and how a properly specified GLM captures these multiplicative effects. By the end, you'll know when your pricing model needs interaction terms and how to validate whether they're working.
The Experimental Question: Do Risk Factors Interact or Merely Co-occur?
Before we fit any model, let's frame the research question properly. We're not asking "do smokers cost more than non-smokers?" (they obviously do) or "does higher BMI correlate with higher charges?" (it clearly does). We're asking a mechanistic question about functional form: when someone is both a smoker and has high BMI, is their expected cost the sum of two independent effects, or do those effects amplify each other?
This is where most pricing models fail. They assume an additive structure:
Expected Cost = β₀ + β₁(Age) + β₂(BMI) + β₃(Smoker) + β₄(Region) + ε
This model says: "Being a smoker adds $X to your baseline cost. Having a BMI of 35 adds $Y. If you're both, you pay $X + $Y more." But biological reality doesn't work this way. Smoking and obesity interact at the metabolic level: nicotine disrupts insulin sensitivity, obesity increases inflammation, and together they create a cardiovascular risk profile that is far worse than the sum of the two effects.
The correct model includes an interaction term:
Expected Cost = β₀ + β₁(Age) + β₂(BMI) + β₃(Smoker) + β₄(BMI × Smoker) + β₅(Region) + ε
That β₄ term captures whether the BMI-to-cost slope is different for smokers versus non-smokers. If β₄ is large and significant, it means BMI's effect on cost depends on smoking status—exactly the interaction we hypothesize. Let's see if the data confirm it.
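To make the role of β₄ concrete, here is a minimal sketch that writes the interaction model as a function and compares the BMI slope for the two groups. The coefficient values are invented for illustration only (they are assumptions, not estimates from this dataset), and region is left out to keep the example short.

```python
# Hypothetical coefficients, chosen only to illustrate the algebra.
B0, B_AGE, B_BMI, B_SMOKER, B_BMI_SMOKER = 2000.0, 250.0, 50.0, 5000.0, 900.0

def expected_cost(age: float, bmi: float, smoker: int) -> float:
    """Linear predictor with a BMI x Smoker interaction (region omitted)."""
    return (B0 + B_AGE * age + B_BMI * bmi
            + B_SMOKER * smoker + B_BMI_SMOKER * bmi * smoker)

def bmi_slope(smoker: int) -> float:
    """Change in expected cost for one extra BMI point, holding age fixed."""
    return expected_cost(40, 31, smoker) - expected_cost(40, 30, smoker)

print(bmi_slope(0))  # slope for non-smokers: B_BMI
print(bmi_slope(1))  # slope for smokers: B_BMI + B_BMI_SMOKER
```

With β₄ = 0 the two slopes would be identical, which is exactly the additive model. Any nonzero β₄ gives smokers their own BMI slope.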
Distribution of Medical Charges
The first thing you notice about medical insurance claims: they're not normally distributed. This histogram of 1,338 policyholders shows a sharp right skew. The majority of insured individuals cluster between $1,000 and $10,000 in annual charges, with a median around $9,400. But a long tail extends past $50,000, with a handful of policyholders exceeding $60,000.
This distribution shape has two critical implications for modeling. First, ordinary least squares inference assumes errors that have constant variance and are approximately normal. Apply OLS to this data and your residuals will be heavily right-skewed (violating normality), and their spread will grow with the predicted cost (violating homoscedasticity), making your standard errors unreliable. You'll underestimate uncertainty in the high-cost tail where you most need accurate predictions.
Second, the skew tells you where the financial risk lives. The top 10% of policyholders by cost generate roughly 40-50% of total claims dollars. If your pricing model systematically underestimates costs for this high-risk segment—say, by missing the smoker×BMI interaction—you won't just be a little wrong on average. You'll be catastrophically wrong on the policies that determine your profitability.
This is why we use a generalized linear model with a Gamma distribution and log link rather than standard linear regression. The Gamma distribution naturally handles right-skewed positive continuous data, and the log link keeps predictions positive while modeling multiplicative effects, which is exactly what is needed to capture how smoking and BMI amplify costs multiplicatively rather than additively.
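For readers who want to see the mechanics, the sketch below fits a Gamma GLM with a log link using iteratively reweighted least squares (IRLS) in plain Python. It is not the code behind the charts in this article. In practice you would likely use a library such as statsmodels or R's glm. The data is synthetic and noise-free, generated from made-up coefficients, so the only thing it demonstrates is that the fitting loop recovers the coefficients it was given.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_gamma_log(X, y, iters=25):
    """IRLS for a Gamma GLM with log link.

    For this family/link pair the IRLS weights are all 1, so each step is an
    unweighted least-squares fit to the working response z = eta + (y - mu)/mu.
    """
    p = len(X[0])
    eta = [math.log(v) for v in y]          # start from mu = y
    beta = [0.0] * p
    for _ in range(iters):
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
        xtz = [sum(r[i] * zi for r, zi in zip(X, z)) for i in range(p)]
        beta = solve(xtx, xtz)
        eta = [sum(b * v for b, v in zip(beta, r)) for r in X]
    return beta

def design_row(bmi, smoker):
    """Columns: intercept, BMI, smoker, BMI x smoker."""
    return [1.0, float(bmi), float(smoker), float(bmi * smoker)]

# Synthetic, noise-free data generated from known (made-up) coefficients.
true_beta = [8.0, 0.01, 0.5, 0.04]
rows = [design_row(b, s) for s in (0, 1) for b in (20, 25, 30, 35, 40)]
charges = [math.exp(sum(t * v for t, v in zip(true_beta, r))) for r in rows]
beta_hat = fit_gamma_log(rows, charges)
```

The fourth coefficient is the interaction: on the log scale it is the extra BMI slope that applies only to smokers.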
Average Charges by Smoker Status
Smokers pay an average of $32,050 in annual medical charges. Non-smokers pay $8,434. That's a 3.8× multiplier—or a $23,616 smoking premium. This gap is the single largest predictor of medical costs in the dataset, dwarfing age, BMI, and regional differences.
But here's what this simple comparison doesn't tell you: whether that 3.8× multiplier is constant across all BMI levels, or whether it grows as BMI increases. If a smoker with BMI 22 costs 3.8× as much as a non-smoker with BMI 22, but a smoker with BMI 38 costs 6× as much as a non-smoker with BMI 38, you've got an interaction effect. The smoking premium isn't fixed; it scales with BMI.
This is the limitation of univariate comparisons. They answer "do smokers cost more?" (yes, obviously) but not "does the smoking effect depend on other risk factors?" To answer that, we need to look at the joint distribution of BMI and charges, stratified by smoking status. That's exactly what the next chart reveals.
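A quick way to check this before fitting any model is to compute the smoker-to-non-smoker cost ratio separately within BMI bands. The sketch below does that. The band width of 5 and the toy records are assumptions made up for the example, not values from the dataset.

```python
from collections import defaultdict

def multiplier_by_bmi_band(records, band_width=5):
    """Return {band_start: smoker_mean / non_smoker_mean} for each BMI band."""
    sums = defaultdict(lambda: {0: [0.0, 0], 1: [0.0, 0]})
    for bmi, smoker, charges in records:
        band = int(bmi // band_width) * band_width
        cell = sums[band][smoker]
        cell[0] += charges
        cell[1] += 1
    out = {}
    for band, groups in sorted(sums.items()):
        (ns_total, ns_n), (s_total, s_n) = groups[0], groups[1]
        if ns_n and s_n:                      # need both groups in the band
            out[band] = (s_total / s_n) / (ns_total / ns_n)
    return out

# Toy records: (bmi, smoker flag, annual charges). Values are invented.
toy = [
    (22, 0, 6000), (23, 0, 8000), (22, 1, 18000), (24, 1, 20000),
    (36, 0, 8000), (38, 0, 10000), (36, 1, 40000), (37, 1, 44000),
]
ratios = multiplier_by_bmi_band(toy)
```

If the ratio is roughly flat across bands, an additive-on-the-log-scale model may be enough. If it climbs with BMI, as it does in the toy data, that is the pattern an interaction term is meant to capture.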
Why Sample Size Matters for Interaction Effects
Notice that only 274 of the 1,338 policyholders (20.5%) are smokers. To reliably estimate a smoker×BMI interaction, you need sufficient observations in all four quadrants: low-BMI non-smokers, high-BMI non-smokers, low-BMI smokers, and high-BMI smokers. With only 274 smokers, if BMI is evenly distributed, you might have just 50-70 high-BMI smokers driving the interaction term. That's enough for detection here, but barely. If smokers were only 10% of your sample, you'd need 2,000+ total observations to estimate the interaction with adequate power. Underpowered interaction terms show up as statistically insignificant even when the true effect is large.
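Counting observations per cell is easy to automate. This sketch tallies the four smoker-by-BMI cells and flags any that fall below a minimum. The BMI cut of 30 and the minimum of 30 observations are assumptions you should adjust to your own data (the minimum echoes the 30-50 rule of thumb given later in this article).

```python
from collections import Counter

def cell_counts(records, bmi_cut=30.0):
    """Count policyholders in each smoker x BMI cell; bmi_cut is an assumed split."""
    counts = Counter()
    for bmi, smoker in records:
        bmi_label = "high_bmi" if bmi >= bmi_cut else "low_bmi"
        smoke_label = "smoker" if smoker else "non_smoker"
        counts[(smoke_label, bmi_label)] += 1
    return counts

def thin_cells(counts, minimum=30):
    """Cells that fall below the minimum count, including empty ones."""
    cells = [(s, b) for s in ("smoker", "non_smoker") for b in ("high_bmi", "low_bmi")]
    return [c for c in cells if counts[c] < minimum]

# Invented portfolio: (bmi, smoker flag) pairs.
records = [(32, 1)] * 40 + [(24, 1)] * 12 + [(33, 0)] * 100 + [(22, 0)] * 90
flagged = thin_cells(cell_counts(records))
```

In this made-up portfolio only the low-BMI smoker cell is flagged, which would be a warning that the smoker slope is being estimated mostly from one end of the BMI range.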
BMI vs Charges by Smoking Status
Now we see the interaction visually. The scatter plot shows BMI on the x-axis and medical charges on the y-axis, with non-smokers in blue and smokers in orange. Two regression lines trace the relationship between BMI and cost for each group. For non-smokers, the slope is nearly flat—BMI barely predicts cost. Charges stay clustered between $2,000 and $15,000 across the entire BMI range from 15 to 50.
But for smokers, the slope is steep and positive. At BMI 20, smokers average around $18,000-20,000 in annual charges. By BMI 35, that climbs to $35,000-40,000. By BMI 45+, some smokers exceed $50,000. The gap between the blue and orange regression lines widens as BMI increases. This is the visual signature of an interaction effect: the slope of BMI→charges depends on smoking status.
Think about what this means for pricing. If you fit an additive model, you estimate one average BMI slope across all policyholders, something like "+$200 per BMI point." That slope is too steep for non-smokers (who show almost no BMI effect) and too shallow for smokers (whose BMI slope is much steeper). Your model will overprice low-BMI smokers, underprice high-BMI smokers, and get the non-smokers only roughly right, because they dominate the sample and pull the pooled slope toward their own. The high-BMI smokers, the ones costing $40,000-50,000, are exactly the segment you'll systematically underprice, losing money on every policy you write.
An interaction model, by contrast, estimates two separate slopes: one for non-smokers (flat, near zero) and one for smokers (steep, positive). This flexibility lets the model capture the reality visible in this chart: BMI is a weak predictor for non-smokers and a dominant predictor for smokers. The GLM coefficient for smoker×BMI quantifies exactly how much steeper the smoker slope is.
GLM Predictor Effects
This horizontal bar chart ranks the GLM predictors by their standardized effect size on medical charges. The interaction term—smoker:bmi—is the largest bar, with an effect magnitude around 1.5-1.6 standard deviations. The main effect of smoking (smokeryes) comes second, at roughly 1.2. BMI as a main effect is third, near 0.8. Age and the regional dummies (southeast, southwest, northwest) trail far behind, each contributing less than 0.3 standard deviations.
This ranking confirms what the scatter plot showed: the interaction term dominates the model. In plain language, the GLM is telling you: "The most important thing I need to know to predict your medical costs is not just whether you smoke, or what your BMI is, but specifically the combination of your smoking status and BMI." A non-smoker with BMI 35 is low-risk. A smoker with BMI 25 is moderate-risk. A smoker with BMI 35 is extremely high-risk—higher than either factor alone would suggest.
The fact that the interaction term has a larger effect than the smoking main effect is mathematically possible because of how GLM coefficients compose. With a log link, the predicted cost is:
log(Cost) = β₀ + β₁(Age) + β₂(BMI) + β₃(Smoker) + β₄(BMI × Smoker) + ...
Cost = exp(β₀ + β₁·Age + β₂·BMI + β₃·Smoker + β₄·BMI·Smoker + ...)
When Smoker=1 and BMI=35, the exponent includes both β₃ (the main smoking effect) and β₄·35 (the interaction effect scaled by BMI). If β₄ is large, that interaction term can contribute more to the log-cost than the main effect alone. This is exactly what's happening here: the interaction term amplifies costs for high-BMI smokers beyond what the additive model would predict.
Interpreting GLM Coefficients with a Log Link
In a GLM with a log link, coefficients represent multiplicative effects on the outcome, not additive shifts. If β₃ (the smoking coefficient) is 1.2, that doesn't mean smokers pay $1.20 more; it means smoking multiplies expected cost by exp(1.2) ≈ 3.3×. Note that once the model includes a BMI × Smoker term, β₃ is the smoking multiplier at BMI = 0, so it is only directly interpretable if BMI is centered. Similarly, if β₄ (the interaction coefficient) is 0.05, each additional BMI point multiplies cost by an extra exp(0.05) ≈ 1.05× for smokers versus non-smokers. This compounds: at BMI 30, the interaction contributes exp(0.05 × 30) ≈ 4.5× to the cost multiplier. That's why interactions in log-linked models can produce such steep cost increases.
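The arithmetic in this paragraph is easy to reproduce. The sketch below uses the same illustrative coefficients (1.2 and 0.05, which are examples from the text rather than fitted values) and adds a small helper, a hypothetical name of ours, that combines the main effect and the interaction into one smoker-versus-non-smoker multiplier at a given BMI.

```python
import math

# Illustrative coefficients from the paragraph above (not fitted values).
beta_smoker = 1.2
beta_bmi_smoker = 0.05

def smoker_multiplier(bmi: float) -> float:
    """Ratio of expected cost, smoker vs. non-smoker, at a given BMI."""
    return math.exp(beta_smoker + beta_bmi_smoker * bmi)

print(round(math.exp(beta_smoker), 2))           # 3.32
print(round(math.exp(beta_bmi_smoker), 3))       # 1.051
print(round(math.exp(beta_bmi_smoker * 30), 2))  # 4.48
```

Because the two pieces multiply, the combined multiplier at BMI 30 under these example coefficients is exp(1.2 + 1.5) = exp(2.7), or roughly 14.9×. Coefficients of this size are for illustration only, and a real fit on uncentered BMI would typically pair a positive interaction with a much smaller (even negative) main effect.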
Average Charges by Region
The Southeast region shows the highest average medical charges at $14,735, followed by the Northeast at $13,406, the Northwest at $12,418, and the Southwest at $12,347. The Southeast-to-Southwest gap is about $2,400—or roughly 19% higher costs in the Southeast.
Is this large enough to matter for pricing? It depends on your market strategy. If you're a national insurer writing policies in all four regions, a 19% cost differential is absolutely actionable. You should be charging higher premiums in the Southeast to maintain consistent margins. If you ignore regional variation and charge uniform national rates, you'll overprice in the Southwest (losing market share to competitors) and underprice in the Southeast (losing money on every policy).
But notice: the regional effect is much smaller than the smoking effect (3.8× multiplier) or the smoker×BMI interaction (the largest predictor in the GLM). If you have limited modeling complexity budget (say, you're a small regional insurer building a simple pricing tool), should you include region as a predictor? That's a judgment call. The GLM includes it, and it's statistically significant, but it contributes less than 0.3 standard deviations to the prediction (refer back to the GLM Predictor Effects chart). You could drop region and still capture 90%+ of explainable variance with age, BMI, smoking, and the interaction term.
One hypothesis for the Southeast's higher costs: smoking prevalence and obesity rates are both higher in the Southeast compared to other US regions. If the Southeast has 25% smokers while the Southwest has 15%, and BMI distributions skew higher, then the regional difference might be mediated by smoking and BMI rather than caused by region per se. The GLM coefficients show the direct regional effect after controlling for smoking and BMI; the total regional difference (visible in this chart) includes both the direct effect and the compositional effect of more smokers and higher BMIs in the Southeast.
Actual vs Predicted Charges
This scatter plot shows actual charges (x-axis) versus GLM-predicted charges (y-axis), with the 45-degree diagonal line representing perfect predictions. Points above the line are over-predictions (the model expected higher costs than occurred); points below are under-predictions (actual costs exceeded the model's forecast).
The first thing to check: is there systematic bias? If the model consistently under-predicted high-cost policies, you'd see most points in the $30,000-60,000 range falling below the diagonal. Visually, the points scatter fairly symmetrically around the line across the full range of charges, suggesting no strong systematic bias. The model isn't consistently over- or under-pricing any particular cost segment.
The second thing to check: how tight is the scatter around the diagonal? Prediction intervals are wide at the individual level—many points deviate ±$5,000 to ±$10,000 from the diagonal—but that's expected given the right-skewed distribution of charges. Medical costs for any individual are inherently variable: two identical 40-year-old smokers with BMI 30 might have $25,000 and $45,000 in charges due to randomness in whether they develop complications this year. The GLM predicts the expected value (the center of the distribution), not the individual outcome.
For pricing purposes, what matters is portfolio-level accuracy. If you write 500 policies predicted to average $20,000 in charges, do they actually average close to $20,000? The law of large numbers means individual-level noise cancels out at the portfolio level. This chart suggests the model is well-calibrated: no visible trend in residuals, no funnel-shaped heteroscedasticity, no clear outlier clusters where the model fails.
One test you should run but can't see in this chart: out-of-sample validation. This model was fit on 1,338 observations. Did the analyst hold out 20% of data for validation, or is this chart showing in-sample fit? In-sample R² is always optimistic; the true test is whether the model generalizes to new policyholders. For a production pricing model, you should re-fit quarterly on historical data and validate on the most recent quarter's actuals. If out-of-sample prediction error stays below 15-20%, the model is ready for pricing. If it drifts above 25%, re-fit immediately—your cost landscape has shifted.
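If you are not sure whether a chart like this shows in-sample or out-of-sample fit, the safest move is to build the holdout yourself. Here is a minimal sketch of an 80/20 split and a percentage-error metric, with the thresholds from the paragraph above wired into a small helper. The function names and the "watch closely" label for the 20-25% band are our own assumptions.

```python
import random

def holdout_split(rows, holdout_frac=0.2, seed=0):
    """Shuffle a copy of rows and split into (train, holdout)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_frac))
    return shuffled[n_holdout:], shuffled[:n_holdout]

def mean_abs_pct_error(actual, predicted):
    """Average of |actual - predicted| / actual, as a fraction."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def verdict(error):
    """Map out-of-sample error to the thresholds quoted in the text."""
    if error <= 0.20:
        return "ok for pricing"
    if error > 0.25:
        return "re-fit now"
    return "watch closely"   # assumed label for the band the text leaves open

train, holdout = holdout_split(range(1338))
```

Fit on `train`, predict on `holdout`, and pass the two lists of charges to `mean_abs_pct_error`. A random split is fine for a first check. For the quarterly re-fit described above you would split by time instead, holding out the most recent quarter.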
From Predictions to Prices: Adding the Risk Load
The GLM predicts expected medical costs: the average you'll pay out per policyholder. But premiums aren't just expected cost; they include a risk load (profit margin and reserve for uncertainty) and administrative overhead. A typical formula: Premium = Expected Cost × (1 + Risk Load) + Admin Fee. For individual health insurance, risk loads are 15-25% of expected cost. If the GLM predicts $18,000 in expected charges, you target a 20% risk load, and your admin fee is $500, you'd quote $18,000 × 1.20 + $500 = $22,100. The GLM gives you the expected cost component; you add business judgment for the margin based on your risk appetite and competitive environment.
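The premium formula translates directly into a one-line function. In this sketch the default risk load and admin fee are simply the example values from the paragraph above, not recommendations.

```python
def premium(expected_cost: float, risk_load: float = 0.20, admin_fee: float = 500.0) -> float:
    """Premium = Expected Cost x (1 + Risk Load) + Admin Fee."""
    return expected_cost * (1.0 + risk_load) + admin_fee

quote = premium(18_000)  # 18,000 x 1.20 + 500
```

Keeping the risk load and fee as explicit parameters makes it easy to see how a quote moves across the 15-25% range without touching the GLM itself.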
How to Interpret Your Results
You've now walked through six analysis cards showing the full lifecycle of a medical insurance cost prediction model: the skewed distribution of charges, the smoking premium, the visual evidence of interaction effects, the GLM coefficient ranking, regional variation, and prediction accuracy. Let's consolidate this into a decision framework for when you should use this approach—and what to watch out for.
Use GLM with interaction effects when:
- You suspect multiplicative risk. If domain knowledge suggests risk factors amplify each other (smoking × obesity, age × pre-existing conditions), test for interactions. Don't assume additive structure just because it's simpler.
- You have sufficient sample size. With 1,300+ observations and 20% smokers, this dataset had barely enough power to estimate the smoker×BMI interaction reliably. If you have fewer than 500 observations or rare risk factor combinations (e.g., 5% prevalence), interactions will be unstable. Rule of thumb: you need at least 30-50 observations in each combination cell (smoker×high-BMI, non-smoker×high-BMI, etc.) to estimate interactions.
- Your outcome is right-skewed and positive. Medical charges, insurance claims, and uncensored time-to-event data call for GLM with Gamma or inverse Gaussian distributions and a log link. Don't rely on OLS for right-skewed outcomes; the residuals will be skewed and heteroscedastic, and you'll underestimate uncertainty in the high-cost tail.
- Pricing accuracy matters financially. If underpricing high-risk policies costs you millions, invest in the modeling. If you're running a retrospective analysis for a research paper, simpler models may suffice. But for actuarial pricing? Get the interactions right or pay the price in adverse selection.
Red flags that your model needs re-examination:
- Interaction terms are insignificant. If you include smoker×BMI and it's not statistically significant (p > 0.05), either the interaction truly doesn't exist (unlikely given medical evidence), or you lack power to detect it. Check your sample size in each risk subgroup. If you have only 20 high-BMI smokers, the coefficient will be noisy.
- Residuals show patterns. Plot residuals versus fitted values and versus each predictor. For a Gamma GLM, use deviance or Pearson residuals, since raw residuals are expected to fan out as fitted values grow. If the scaled residuals still show a funnel shape, your variance assumption is wrong, so try a different GLM family. If residuals trend up or down across a predictor range, you're missing a nonlinear term or interaction.
- Out-of-sample error exceeds 25%. In-sample R² of 0.75 looks good, but if out-of-sample mean absolute error is 30% of actual charges, your model isn't generalizing. This usually means overfitting (too many predictors for your sample size) or distributional shift (your validation set comes from a different population).
- High-cost policies are systematically under-predicted. If actual charges in the top decile average $55,000 but your predictions average $40,000, you're underpricing the segment that drives profitability. Re-examine whether your GLM family is appropriate (Gamma vs. Tweedie) and whether you need polynomial or spline terms for continuous predictors.
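The last red flag is straightforward to check in code. This sketch compares average predicted and average actual charges within the top 10% of policies ranked by actual cost. The function name is our own, and the toy numbers are invented to show a model that under-predicts by 20%.

```python
def top_decile_ratio(actual, predicted):
    """Mean predicted / mean actual among the 10% of policies with highest actual cost."""
    pairs = sorted(zip(actual, predicted), key=lambda ap: ap[0], reverse=True)
    k = max(1, len(pairs) // 10)
    top = pairs[:k]
    return sum(p for _, p in top) / sum(a for a, _ in top)

# Toy example: every prediction is 80% of the actual charge.
actual = list(range(1000, 21000, 1000))
predicted = [a * 0.8 for a in actual]
ratio = top_decile_ratio(actual, predicted)
```

A ratio near 1.0 is what you want. The $40,000 versus $55,000 example above corresponds to a ratio of about 0.73. One caveat: ranking by actual cost selects policies that had bad years, so some shortfall is expected even from a well-calibrated model. Run the same check ranked by predicted cost before concluding the model is biased.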
When Interaction Effects Are Worth the Complexity Cost
Every additional term you add to a model—especially interactions—increases the risk of overfitting and makes the model harder to explain to stakeholders. A pricing manager can understand "smokers pay 3× more and we add $200 per BMI point." Explaining "the BMI coefficient is 0.02 for non-smokers and 0.05 for smokers due to a significant interaction term" requires more statistical sophistication. So when is the complexity worth it?
Run this test: hold out 20% of your data for validation, then fit two models on the remaining training data. Model A is additive (no interactions). Model B includes the smoker×BMI interaction. Compare out-of-sample prediction accuracy. If Model B reduces mean absolute error by less than 5%, the interaction isn't worth the added complexity, so stick with Model A for simplicity and interpretability. If Model B reduces error by 10-15% or more, the interaction is capturing real signal and you should use it. In between, treat it as a judgment call.
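Once you have held-out predictions from both models, the comparison itself is a few lines. The sketch below applies the 5% and 10% thresholds from the test just described. The function names are ours, and the example numbers are made up.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def interaction_verdict(actual, pred_additive, pred_interaction):
    """Apply the 5% / 10% thresholds to held-out predictions from both models."""
    mae_a = mae(actual, pred_additive)
    mae_b = mae(actual, pred_interaction)
    reduction = (mae_a - mae_b) / mae_a
    if reduction < 0.05:
        decision = "keep additive model"
    elif reduction >= 0.10:
        decision = "use interaction model"
    else:
        decision = "judgment call"   # between the two thresholds (assumed label)
    return reduction, decision

# Invented held-out charges and predictions for two policies.
reduction, decision = interaction_verdict(
    actual=[10000, 40000],
    pred_additive=[12000, 30000],
    pred_interaction=[11000, 38000],
)
```

Both sets of predictions must come from the same holdout rows, otherwise the error reduction is not comparable.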
In this dataset, the interaction term was the largest predictor in the GLM—larger than the main smoking effect. That's a slam-dunk case for inclusion. But not every interaction will be that strong. Test rigorously, validate out-of-sample, and don't add interactions just because you can. Add them when they materially improve prediction accuracy on held-out data.
The Three Questions Every Pricing Model Must Answer
Before you deploy any insurance cost prediction model into production pricing, validate it against these three questions. They separate research-grade models from business-ready tools.
1. Does the model predict accurately across all risk segments, or just on average?
Overall R² of 0.75 looks good, but if you're systematically under-predicting smokers with BMI > 35 (the highest-cost, highest-margin segment), you'll lose money even with a high R². Stratify your validation data by risk segment (low-cost, medium-cost, high-cost tertiles; smokers vs. non-smokers; high-BMI vs. low-BMI) and calculate mean absolute percentage error (MAPE) separately for each. Your MAPE should be consistent across segments—ideally within 5-10 percentage points. If high-cost smokers have 40% MAPE but low-cost non-smokers have 15% MAPE, your model is miscalibrated where it matters most.
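Segment-level MAPE takes only a few lines to compute. In this sketch the segment labels and the toy rows are invented; swap in whatever stratification you use.

```python
from collections import defaultdict

def mape_by_segment(rows):
    """rows: iterable of (segment, actual, predicted). Returns {segment: MAPE in %}."""
    errors = defaultdict(list)
    for segment, actual, predicted in rows:
        errors[segment].append(abs(actual - predicted) / actual)
    return {seg: 100.0 * sum(v) / len(v) for seg, v in errors.items()}

def mape_spread(by_segment):
    """Gap in percentage points between the worst and best segment."""
    return max(by_segment.values()) - min(by_segment.values())

# Toy validation rows that mirror the 40% vs. 15% example above.
rows = [
    ("smoker_high_bmi", 50000, 30000), ("smoker_high_bmi", 40000, 24000),
    ("non_smoker", 10000, 8500), ("non_smoker", 8000, 6800),
]
by_segment = mape_by_segment(rows)
spread = mape_spread(by_segment)
```

Here the spread is 25 percentage points, well outside the 5-10 point band suggested above, so this hypothetical model would need work on the high-BMI smoker segment.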
2. Is the model stable over time, or does it degrade as the population shifts?
Fit your model on Year 1 data. Validate on Year 2 data. If prediction error doubles, your model isn't capturing the true cost drivers—it's fitting noise or temporary patterns. Medical costs trend upward 3-5% annually due to healthcare inflation. Your model should either include a year/trend term or be re-fit quarterly on rolling windows of recent data. A model fit once in 2024 and never updated will systematically under-predict in 2026 unless it accounts for trend.
3. Can the model detect adverse selection before it kills your portfolio?
If you price policies at predicted cost + 20% margin, competitors who price at predicted cost + 25% will attract different customers. You'll attract price-sensitive shoppers who know they're high-risk; competitors will attract low-risk customers who are willing to pay a bit more for brand or service. Within six months, your portfolio's average cost will rise above your predictions because you've adversely selected high-risk lives. Monitor your portfolio's actual-to-expected cost ratio monthly. If it drifts above 1.10, you're experiencing adverse selection and need to re-price or re-segment.
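Monitoring the actual-to-expected ratio is simple enough to run as a monthly job. A minimal sketch, assuming you can pull total paid claims and total predicted cost per month (the month labels and dollar totals below are invented):

```python
def actual_to_expected(actual_claims, expected_claims):
    """Portfolio-level A/E ratio: total paid claims over total predicted cost."""
    return sum(actual_claims) / sum(expected_claims)

def months_over_threshold(monthly, threshold=1.10):
    """monthly: list of (label, actual_total, expected_total). Returns flagged labels."""
    return [label for label, a, e in monthly if a / e > threshold]

# Invented monthly totals.
monthly = [
    ("2024-01", 1_000_000, 1_000_000),
    ("2024-02", 1_080_000, 1_000_000),
    ("2024-03", 1_150_000, 1_000_000),
]
flagged_months = months_over_threshold(monthly)
```

Running the same check per risk segment, not just on the whole book, helps show which segment is driving the drift.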
Frequently Asked Questions
Why do interaction effects matter in insurance pricing?
Interaction effects capture how risk factors combine in non-additive ways. A high-BMI smoker doesn't just cost the baseline plus a smoking surcharge plus a BMI surcharge; the smoking and BMI risks multiply each other. Standard additive models underestimate this compounding cost by 40-60%, leading to systematic underpricing of the highest-risk policies.
What sample size do I need for reliable GLM interaction terms?
For detecting a smoker×BMI interaction with 80% power, you need at least 400-600 observations with balanced representation across smoking status and BMI ranges. If smokers represent only 10% of your data, you'll need 2,000+ total observations to ensure enough high-BMI smokers to estimate the interaction reliably. Underpowered models will show unstable interaction coefficients.
How do I know if my GLM predictions are accurate enough for pricing?
Check three metrics: (1) R² above 0.70 indicates the model explains most cost variance, (2) residual plots should show no systematic over/under-prediction patterns across risk segments, and (3) out-of-sample validation error should be within 15-20% of actual charges. If your model consistently under-predicts for high-cost groups, you're pricing too low and will lose money.
Can I use this model for individual premium quotes or just portfolio analysis?
Both, but with different levels of uncertainty. For individual quotes, prediction intervals are wide (±$5,000-10,000, as the actual-versus-predicted chart shows) because individual medical costs vary dramatically. For portfolio pricing across 500+ similar policies, the law of large numbers narrows the uncertainty on the average cost per policy to a few hundred dollars. Use the model for individual quotes but validate portfolio performance quarterly.
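The narrowing comes from the familiar square-root rule: the uncertainty on an average of n independent policies shrinks by a factor of √n. The sketch below applies it under the simplifying assumption that policies are independent and similarly variable. Real portfolios have correlated claims and model error on top, so treat the result as a lower bound.

```python
import math

def portfolio_half_width(individual_half_width: float, n_policies: int) -> float:
    """Half-width for the mean of n independent, similar policies (1/sqrt(n) scaling)."""
    return individual_half_width / math.sqrt(n_policies)

half_width = portfolio_half_width(5000, 500)
```

For an individual half-width of $5,000 and 500 policies this gives roughly $224 per policy.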
Should I re-fit this model monthly, quarterly, or annually?
Re-fit quarterly and validate monthly. Medical cost trends shift with healthcare inflation (3-5% annually), policy mix changes, and seasonal utilization patterns. Monitor your model's prediction error each month—if actual costs drift more than 10% from predicted in any risk segment, re-fit immediately. Annual re-fitting is too slow for dynamic insurance markets.
What to Do Next
You've seen how a GLM with interaction effects reveals the multiplicative structure of insurance risk—how smoking and BMI combine to create exponential cost increases that additive models miss. You've learned when interactions are worth the complexity cost (when they reduce out-of-sample error by 10%+ and are powered by sufficient sample size), and how to validate whether your model is ready for production pricing (check segment-level accuracy, temporal stability, and adverse selection monitoring).
Now apply this to your own portfolio. Pull your last 12-24 months of claims data. Include age, BMI, smoking status, and any other risk factors your underwriting collects. Fit two models—one additive, one with interactions—and compare out-of-sample accuracy on a held-out validation set. If the interaction model wins by a meaningful margin, you've just found 10-15% improvement in pricing accuracy. That translates directly to better risk selection, higher margins, and fewer underpriced policies.
Before you build your own model from scratch, run the sample analysis shown in this article. Upload your data to the MCP Analytics insurance cost prediction tool, which automatically tests for interactions, selects the optimal GLM family, and generates the six analysis cards you saw here—plus coefficient tables, residual diagnostics, and validation metrics. You'll have a full report in 60 seconds, and you can use it as a baseline to compare against any custom models you build.