Every year, researchers rank 156 countries by happiness. GDP per capita correlates strongly with those rankings (r = 0.78). But does wealth drive happiness, or does it just correlate with the things that do? When you control for health, freedom, social support, and corruption in a multiple regression model, GDP's standardized coefficient drops to 0.18 — meaning 77% of its bivariate effect disappears once you account for what money actually buys. Here's what the data shows.
Multiple regression lets you test whether a predictor matters after controlling for everything else. It's the difference between "wealthy countries are happier" (correlation) and "holding health, freedom, and social support constant, does an extra $10K GDP still predict higher happiness?" (partial effect). That second question requires proper controls. Without them, you're confounding direct effects with mediated pathways.
The World Happiness Report explains national well-being with six predictors: GDP per capita, healthy life expectancy, social support, freedom to make life choices, generosity, and perceptions of corruption. A seventh component, the dystopia residual, captures what those six predictors leave unexplained, so it does not enter the regression as a predictor. The outcome is a continuous happiness score (0–10 scale). With 156 countries, you have enough statistical power to detect even modest effects, but you need to check for multicollinearity, interpret standardized coefficients correctly, and resist causal claims without experimental evidence.
Predictor Descriptive Statistics
Before fitting a regression, check the descriptive statistics. The table shows mean, standard deviation, minimum, and maximum for each predictor. GDP per capita has an enormous range, from $600 to $125,000, with a mean around $24,000. That roughly 200-fold spread (a little over two orders of magnitude) means a few ultra-wealthy nations (Norway, Luxembourg, Singapore) anchor the high end while low-income countries cluster near zero. Log-transforming GDP would compress that range and linearize the relationship with happiness, but the raw-scale model is easier to interpret for policy audiences.
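To make the compression concrete, here is a minimal sketch that uses only the minimum and maximum GDP figures quoted above. Base 10 is an arbitrary choice for illustration; a natural log would compress the range in the same way.

```python
import math

# Minimum and maximum GDP per capita quoted in the text (USD).
gdp_min, gdp_max = 600, 125_000

# Ratio between the richest and poorest country on the raw scale.
raw_ratio = gdp_max / gdp_min

# On a log10 scale the same range becomes a short interval.
log_min, log_max = math.log10(gdp_min), math.log10(gdp_max)
log_span = log_max - log_min  # width of the range in orders of magnitude

print(f"raw ratio: {raw_ratio:.0f}x")
print(f"log10 range: {log_min:.2f} to {log_max:.2f} (span {log_span:.2f})")
```

On the log scale, a step from $1,000 to $10,000 counts the same as a step from $10,000 to $100,000, which is usually closer to how income relates to well-being.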
Healthy life expectancy ranges from 46 to 77 years, with a mean of 64. Social support (the percentage who have someone to count on in a crisis) varies from 0.29 to 0.99, with most countries above 0.80. Freedom to make life choices shows similar variance (0.26 to 0.98). Generosity and corruption perceptions have tighter ranges but still show meaningful variation. The key takeaway: every predictor has enough spread to explain variance in happiness. If a variable had no variance, it couldn't predict anything.
Standard deviations tell you the scale of variation. GDP's SD is $23,000 — nearly as large as the mean — reflecting extreme inequality across nations. Life expectancy's SD is 7 years, social support's is 0.13. These SDs matter for interpreting standardized coefficients: a one-SD increase in social support (from 0.80 to 0.93) is a realistic policy target, while a one-SD increase in GDP ($23K) is generational change. When you see a standardized coefficient, ask whether a one-SD shift is plausible.
Predictor Correlation Matrix
The correlation heatmap shows pairwise Pearson correlations between happiness and the six predictors, plus inter-predictor correlations. GDP correlates most strongly with happiness (r = 0.78), followed by healthy life expectancy (r = 0.77) and social support (r = 0.73). Freedom correlates moderately (r = 0.58), while generosity shows almost no relationship (r = 0.08). Corruption perceptions correlate negatively (r = –0.43): higher perceived corruption goes with lower happiness.
But notice the inter-predictor correlations. GDP and life expectancy correlate at r = 0.82 — wealthy countries have longer life spans. GDP and social support correlate at r = 0.69. These high correlations signal multicollinearity risk: when two predictors are strongly correlated, the regression can't cleanly separate their effects. The model becomes unstable, standard errors inflate, and coefficients become unreliable. That's why you need VIF diagnostics (we'll check those later).
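A quick way to screen for risky pairs is to compute every pairwise correlation and flag the ones above a cutoff. The sketch below uses hypothetical toy values for five countries (not the World Happiness data), and the 0.8 threshold and the function names are assumptions made for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| above threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged

# HYPOTHETICAL toy values, chosen only to illustrate the screening step.
data = {
    "gdp": [1.2, 8.5, 24.0, 45.0, 62.0],              # thousands of USD
    "life_expectancy": [52.0, 61.0, 68.0, 72.0, 74.0],  # years
    "generosity": [0.30, 0.10, 0.25, 0.05, 0.20],
}

for a, b, r in collinear_pairs(data):
    print(f"{a} vs {b}: r = {r:.2f}")
```

A flagged pair is a prompt to look at VIF, not a verdict: pairwise correlations can miss collinearity that involves three or more predictors at once.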
Generosity shows near-zero correlation with happiness and near-zero correlation with other predictors. That independence is good for multicollinearity (low VIF expected), but the lack of bivariate association means it's unlikely to be significant in the regression. If a predictor doesn't correlate with the outcome in simple analysis, controlling for other variables rarely makes it significant — unless you're uncovering a suppression effect (rare and usually a sign of model misspecification).
The heatmap also reveals which predictors might be redundant. GDP and life expectancy are so highly correlated (r = 0.82) that including both may not add much explanatory power. One option: run the model with and without life expectancy and compare R². If R² barely changes, life expectancy is redundant (its variance is already captured by GDP). But if life expectancy has a significant coefficient even after controlling for GDP, it means longevity predicts happiness beyond what wealth alone explains.
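The with-and-without comparison can be previewed from the three correlations already quoted. This is a two-predictor illustration only (happiness on GDP, then on GDP plus life expectancy), so it ignores the other four predictors and will not reproduce the full model's R².

```python
# Bivariate correlations quoted in the article.
r_y_gdp = 0.78     # happiness vs GDP per capita
r_y_life = 0.77    # happiness vs healthy life expectancy
r_gdp_life = 0.82  # GDP per capita vs healthy life expectancy

# R² of the one-predictor model (happiness ~ GDP) is the squared correlation.
r2_gdp_only = r_y_gdp ** 2

# R² of the two-predictor model, written in terms of the three correlations.
r2_both = (
    r_y_gdp ** 2 + r_y_life ** 2 - 2 * r_y_gdp * r_y_life * r_gdp_life
) / (1 - r_gdp_life ** 2)

print(f"R² with GDP only:              {r2_gdp_only:.3f}")
print(f"R² with GDP + life expectancy: {r2_both:.3f}")
print(f"gain from adding life expectancy: {r2_both - r2_gdp_only:.3f}")
```

Under these numbers, life expectancy adds about five percentage points of explained variance on top of GDP: not nothing, but far less than its bivariate r = 0.77 would suggest on its own.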
Standardized Regression Coefficients
Here's the core finding. With all six predictors in the model simultaneously, social support has the largest standardized coefficient (β = 0.42, p < 0.001), meaning a one-standard-deviation increase in social support is associated with a 0.42-SD increase in happiness, holding everything else constant. Healthy life expectancy is second (β = 0.31, p < 0.001). Freedom to make life choices is third (β = 0.28, p < 0.001). GDP per capita, despite its massive bivariate correlation, drops to fourth place (β = 0.18, p < 0.01).
Generosity and corruption perceptions are not statistically significant (p > 0.05). Their confidence intervals include zero, meaning the data are consistent with no effect. That doesn't prove they don't matter. It may mean the true effects are too small for a 156-country sample to detect, or that the relationship is obscured by omitted variables. Either way, you can't claim generosity or corruption independently predict happiness once you control for the other five factors.
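The link between "the confidence interval includes zero" and "p > 0.05" can be sketched with a normal approximation. The estimates and standard errors below are hypothetical, not values from the article, and regression software would normally use a t distribution (close to the normal with roughly 150 residual degrees of freedom).

```python
from statistics import NormalDist

def wald_summary(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval and two-sided p-value."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    z = estimate / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return lower, upper, p_value

# HYPOTHETICAL coefficient and standard error for a weak predictor.
weak = wald_summary(estimate=0.05, std_error=0.04)
# HYPOTHETICAL coefficient and standard error for a strong predictor.
strong = wald_summary(estimate=0.42, std_error=0.06)

for name, (lower, upper, p) in {"weak": weak, "strong": strong}.items():
    print(f"{name}: 95% CI [{lower:.3f}, {upper:.3f}], p = {p:.4f}")
```

The weak predictor's interval straddles zero and its p-value exceeds 0.05; the two statements are the same fact expressed two ways.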
Why does GDP's coefficient shrink so dramatically (from r = 0.78 to β = 0.18)? Because GDP is highly correlated with life expectancy, social support, and freedom. When you include those variables in the model, they "soak up" the variance that GDP previously explained. What's left is GDP's unique contribution — the effect of wealth beyond what it buys in health, social infrastructure, and political freedom. That residual effect is real but modest.
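The "soaking up" can be seen with just one control. For a model with exactly two predictors, the standardized coefficients follow directly from the three pairwise correlations. The sketch below plugs in the article's correlations for GDP and life expectancy; it is a two-predictor illustration, not the full six-predictor model, so it will not reproduce β = 0.18.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for a model with exactly two predictors.

    r_y1, r_y2: correlations of each predictor with the outcome.
    r_12: correlation between the two predictors.
    """
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2

# Correlations quoted in the article: GDP (1), life expectancy (2), happiness (y).
beta_gdp, beta_life = standardized_betas(r_y1=0.78, r_y2=0.77, r_12=0.82)

print(f"GDP alone:                  {0.78:.2f}")
print(f"GDP with life expectancy:   {beta_gdp:.2f}")
print(f"life expectancy with GDP:   {beta_life:.2f}")
```

Adding a single correlated control already cuts GDP's standardized coefficient from 0.78 to about 0.45. Each further predictor that overlaps with GDP removes more of the shared variance.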
Standardized coefficients let you compare effect sizes across predictors with different units. You can't compare a raw coefficient for GDP (measured in thousands of dollars) with a raw coefficient for social support (measured on a 0–1 scale). But standardized coefficients express everything in standard-deviation units, making them directly comparable. Social support's β = 0.42 is more than twice GDP's β = 0.18 — social networks matter more than money, at least at the margin captured by this model.
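Converting between raw and standardized coefficients is a single rescaling by the two standard deviations. In the sketch below, the social support SD (0.13) comes from the descriptive statistics above, but the SD of the happiness score is an assumed placeholder because the article does not report it.

```python
def to_raw(beta, sd_x, sd_y):
    """Convert a standardized coefficient to raw units (y units per x unit)."""
    return beta * sd_y / sd_x

def to_standardized(b, sd_x, sd_y):
    """Convert a raw coefficient to a standardized coefficient."""
    return b * sd_x / sd_y

SD_SUPPORT = 0.13    # from the descriptive statistics in the article
SD_HAPPINESS = 1.1   # ASSUMPTION: placeholder, not reported in the article

b_support = to_raw(0.42, sd_x=SD_SUPPORT, sd_y=SD_HAPPINESS)
print(f"raw coefficient: {b_support:.2f} happiness points per unit of social support")
print(f"per 0.1 increase in social support: {0.1 * b_support:.2f} points")
```

The raw number depends entirely on the assumed happiness SD, which is why raw coefficients cannot be ranked against each other but standardized ones can.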
Top 10 Happiest Countries
The happiest country scores 7.8 out of 10 (Finland is a perennial leader). The 10th-ranked country scores around 7.3, so the top tier spans a 0.5-point range. With clustering that tight, the leaders are hard to distinguish statistically; small measurement error or year-to-year fluctuation could shuffle the rankings. If you're comparing #1 vs. #3, you're splitting hairs. But the gap between #1 (7.8) and the global median (around 5.5) is 2.3 points, which is a substantively large difference.
What do the top 10 have in common? High GDP, high life expectancy, high social support, high freedom, low corruption. But here's the critical point: the regression tells you which of those predictors still matter after controlling for the others. If you ranked countries by social support alone, you'd get a similar top 10, because social support is the strongest predictor. That's not a coincidence, but it reflects causal structure only under the assumption that the predictors are exogenous and the model is correctly specified.
Notice that none of the top 10 are low-GDP countries. The highest-ranked low-income nation appears much lower in the distribution. That pattern is consistent with GDP being necessary but not sufficient: you need a baseline level of wealth to support the health, education, and infrastructure that enable social support and freedom. The small positive coefficient also fits a story of diminishing returns past some threshold (say, $25K per capita), although a linear model cannot test for a threshold directly.
Bottom 10 Least Happy Countries
The least happy country scores around 2.9 out of 10 — a 4.9-point gap below the happiest nation. The bottom 10 cluster between 2.9 and 3.8, spanning a 0.9-point range (wider than the top 10's 0.5-point spread). These countries share common patterns: low GDP, short life expectancy, weak social support, restricted freedom, high corruption. War, political instability, and economic collapse appear repeatedly in this group.
Here's what the regression adds: even among the bottom 10, social support and life expectancy predict within-group variance in happiness. One country scores 3.8 while another scores 2.9 — why? The model says it's not just GDP (many bottom-tier nations have similar poverty levels). It's whether people have social networks and access to basic healthcare. A country with GDP of $1,500 and strong community ties scores higher than a country with GDP of $2,000 and social collapse.
The bottom 10 also highlight the limits of regression for causal inference. These countries differ from the top 10 in dozens of unmeasured ways: conflict history, colonial legacy, natural resource dependence, ethnic fragmentation. The regression controls for six measured predictors, but it can't control for everything. Omitted variable bias is severe when comparing countries with radically different contexts. The coefficients are still interpretable as partial associations, but claiming "increasing social support by 0.1 would raise happiness by X points" assumes you can change social support without changing the omitted variables correlated with it. That's a strong assumption.
VIF Multicollinearity Diagnostics
The VIF table shows variance inflation factors for each predictor. VIF measures how much the sampling variance of a predictor's estimated coefficient is inflated by that predictor's correlation with the other predictors in the model. The formula is VIF = 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. VIF = 1 means no multicollinearity (the predictor is orthogonal to the others). VIF = 5 means the coefficient's variance is 5× what it would be without that correlation, so its standard error is about 2.2× larger. VIF > 10 signals serious problems.
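The formula is simple enough to check by hand. The sketch below converts between the auxiliary R² and VIF, and shows the matching inflation of the standard error (the square root of VIF). The function names are made up for illustration; 4.2 is the highest VIF reported in this analysis.

```python
import math

def vif_from_r2(r2):
    """VIF given the R² from regressing one predictor on all the others."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    return 1 / (1 - r2)

def r2_from_vif(vif):
    """Auxiliary R² implied by a given VIF (inverse of the formula above)."""
    return 1 - 1 / vif

for r2 in (0.0, 0.5, 0.8, 0.9):
    vif = vif_from_r2(r2)
    print(f"R² = {r2:.1f} -> VIF = {vif:.1f}, SE inflated by {math.sqrt(vif):.2f}x")

# A VIF of 4.2 implies this share of the predictor's variance is explained
# by the other predictors.
print(f"VIF 4.2 -> auxiliary R² = {r2_from_vif(4.2):.3f}")
```

Read the other way, the usual cutoffs are statements about overlap: VIF = 5 means 80% of a predictor's variance is shared with the others, and VIF = 10 means 90%.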
In the World Happiness data, GDP and healthy life expectancy have the highest VIF values, around 4.2 and 4.0 respectively. That's expected given their r = 0.82 correlation. Social support's VIF is around 3.5, freedom's is 2.8, corruption's is 2.1, generosity's is 1.6. All values are below 5, so multicollinearity is present but not severe enough to invalidate the model. Coefficients are stable, standard errors are only moderately inflated.
What if VIF exceeded 10? You'd have three options: (1) drop one of the correlated predictors (e.g., keep GDP, drop life expectancy), (2) combine correlated predictors into a composite index (e.g., create a "material well-being" factor from GDP + life expectancy), or (3) use ridge regression, which shrinks coefficients to reduce variance at the cost of bias. In this case, none of those steps are necessary — VIF < 5 is acceptable for interpretation.
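To show what option (3) does, here is a minimal ridge sketch for two standardized predictors, again using the article's correlations for GDP and life expectancy. The penalty values are arbitrary, and the penalty is expressed on the correlation-matrix scale, so it does not map one-to-one onto the penalty parameter of any particular library.

```python
def ridge_betas(r_y1, r_y2, r_12, lam):
    """Ridge coefficients for two standardized predictors.

    Solves (R + lam * I) b = r, where R is the 2x2 predictor correlation
    matrix and r holds the predictor-outcome correlations. lam = 0 is OLS.
    """
    a = 1 + lam
    det = a * a - r_12 ** 2
    beta_1 = (a * r_y1 - r_12 * r_y2) / det
    beta_2 = (a * r_y2 - r_12 * r_y1) / det
    return beta_1, beta_2

# Correlations quoted in the article; penalty values are ARBITRARY examples.
for lam in (0.0, 0.1, 0.5):
    b_gdp, b_life = ridge_betas(0.78, 0.77, 0.82, lam)
    print(f"lambda = {lam:.1f}: GDP = {b_gdp:.3f}, life expectancy = {b_life:.3f}")
```

As the penalty grows, both coefficients shrink and move closer together: ridge trades some bias for stability by spreading credit across correlated predictors.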
Generosity has the lowest VIF (1.6), confirming it's nearly orthogonal to the other predictors. That's consistent with its near-zero correlations in the heatmap. Low VIF is good for model stability, but it doesn't make generosity significant — it still has a non-significant coefficient (p > 0.05) because it doesn't correlate with the outcome. Multicollinearity affects precision (standard errors), not bias. A low-VIF predictor with no relationship to the outcome will have a tight confidence interval around zero.
Before You Claim Causation: What This Regression Does and Doesn't Prove
Multiple regression controls for confounding within the model, but it does not prove causation. The standardized coefficient for social support (β = 0.42) tells you that countries with stronger social networks score higher on happiness after controlling for GDP, health, freedom, corruption, and generosity. But "controlling for" only works if you've measured and included all confounders. What if strong social support correlates with low income inequality, or historical stability, or cultural collectivism — variables not in the model? Then the social support coefficient is biased by omitted variable confounding.
To claim that increasing social support causes higher happiness, you'd need experimental evidence: randomly assign countries (or regions within countries) to interventions that strengthen social networks, measure happiness before and after, and compare to control groups. No such experiment exists at the country level. You could run smaller-scale randomized trials (e.g., community-building programs) and test whether participants' happiness increases. Some researchers have done exactly that, and the evidence is encouraging — but that's a different research design than cross-sectional regression on observational data.
Regression is still valuable. It tells you which predictors are associated with the outcome after controlling for other measured variables. That's more informative than bivariate correlations, which confound direct and indirect effects. But it's not a causal claim. When you present regression results, be precise: "Social support is associated with higher happiness, controlling for GDP, health, freedom, corruption, and generosity." Not: "Social support causes happiness." The first statement is justified by the data. The second requires experimental manipulation or a credible natural experiment.
What 156 Countries Tell Us About National Happiness
The World Happiness Report regression reveals three key findings. First, social support is the strongest predictor of national happiness after controlling for wealth, health, and freedom — a one-SD increase in social support is associated with a 0.42-SD increase in happiness. Second, GDP's effect is smaller than its bivariate correlation suggests (β = 0.18 vs. r = 0.78) because much of GDP's relationship with happiness is mediated through life expectancy, social infrastructure, and freedom. Third, generosity and corruption perceptions are not statistically significant once you control for the other predictors — either the effects are small or they're confounded by omitted variables.
These findings have policy implications if you're willing to assume the model captures causal structure (a strong assumption). They suggest that investing in social infrastructure — community programs, accessible healthcare, political freedoms — may yield larger happiness gains per dollar than simply boosting GDP. But that conclusion rests on the assumption that you can change social support without changing the omitted variables correlated with it. In practice, social support is shaped by culture, institutions, and history — factors difficult to manipulate via policy.
Before drawing policy conclusions, ask three questions. First, is the model correctly specified? Have you included all relevant confounders, or is omitted variable bias inflating the social support coefficient? Second, are the relationships linear? The model assumes a one-unit increase in social support has the same effect at all levels — but maybe the effect is larger for countries starting from low baselines. Third, are the predictors exogenous? If happiness affects social support (reverse causation), the coefficient is biased. Cross-sectional data can't resolve that.
What can you trust from this analysis? The descriptive patterns: wealthy countries with strong social networks, long life expectancy, and political freedom score highest on happiness. The partial associations: with GDP and the other predictors in the model, social support still predicts happiness. The multicollinearity diagnostics: VIF < 5 for all predictors, so the model is stable. What should you be skeptical of? Causal claims. Regression with observational data is correlation with controls. It's not proof that intervening on social support will raise happiness. For that, you need randomization.
How to Interpret Your Results: A Checklist
When you run a multiple regression on your own data, follow this checklist to avoid misinterpretation:
1. Check descriptive statistics first. Do your predictors have enough variance to explain anything? If a variable has near-zero SD, it can't predict the outcome. Look for outliers and extreme values that might distort the regression line. Consider transformations (log, square root) for skewed predictors.
2. Examine the correlation matrix. Which predictors correlate most strongly with the outcome? Which predictors correlate with each other (multicollinearity risk)? If two predictors correlate at r > 0.8, expect high VIF and unstable coefficients. Decide whether to drop one, combine them, or accept the multicollinearity and interpret cautiously.
3. Interpret standardized coefficients, not raw coefficients. Standardized coefficients (beta weights) let you compare effect sizes across predictors with different units. A standardized coefficient of 0.42 means a one-SD increase in the predictor is associated with a 0.42-SD increase in the outcome, holding other variables constant. Ask: is a one-SD increase realistic? If not, the coefficient is mathematically correct but practically meaningless.
4. Check statistical significance, but don't worship p-values. A p-value tells you the probability of seeing a coefficient this large (or larger) if the true effect were zero. p < 0.05 is a conventional threshold, but it's arbitrary. A coefficient can be non-significant due to small sample size (low power), high multicollinearity (inflated SE), or genuine absence of effect. Conversely, a coefficient can be significant but trivially small. Always report effect sizes alongside p-values.
5. Inspect VIF for every predictor. VIF < 5 is safe. VIF 5–10 is a yellow flag (inflated standard errors, reduced power). VIF > 10 is a red flag (unstable coefficients, unreliable interpretation). If VIF is high, drop the most collinear predictor or use regularization (ridge/lasso regression).
6. Don't claim causation without experimental evidence. Regression controls for measured confounders, but it can't control for unmeasured ones. If you have observational data, report associations, not causal effects. If you want to claim causation, you need randomization or a credible natural experiment. Be honest about the limits of your design.
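Steps 3 and 5 of the checklist can be sketched end to end in plain Python. Everything below is illustrative: the data are synthetic (generated with a fixed seed, with x2 deliberately built to be collinear with x1), the function names are invented, and a real analysis would normally use a statistics library rather than a hand-rolled solver.

```python
import random
from statistics import mean, pstdev

def zscores(values):
    """Standardize a column to mean 0 and (population) SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_standardized(xs, y):
    """OLS on z-scored columns; returns (standardized betas, R²)."""
    zx, zy = [zscores(col) for col in xs], zscores(y)
    n = len(zy)

    def corr(u, v):
        return sum(p * q for p, q in zip(u, v)) / n

    r_xx = [[corr(a, b) for b in zx] for a in zx]  # predictor correlations
    r_xy = [corr(a, zy) for a in zx]               # predictor-outcome correlations
    betas = solve(r_xx, r_xy)
    r2 = sum(b * r for b, r in zip(betas, r_xy))
    return betas, r2

def vifs(xs):
    """VIF for each predictor: regress it on the others, then 1 / (1 - R²)."""
    out = []
    for j, col in enumerate(xs):
        _, r2 = fit_standardized(xs[:j] + xs[j + 1:], col)
        out.append(1 / (1 - r2))
    return out

# SYNTHETIC data: x2 is built from x1 to create collinearity; x3 is independent.
rng = random.Random(0)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + rng.gauss(0, 0.5) for a in x1]
x3 = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 * a + 0.3 * b + 0.2 * c + rng.gauss(0, 1) for a, b, c in zip(x1, x2, x3)]

betas, r2 = fit_standardized([x1, x2, x3], y)
vif_values = vifs([x1, x2, x3])
for name, beta, vif in zip(("x1", "x2", "x3"), betas, vif_values):
    print(f"{name}: beta = {beta:.2f}, VIF = {vif:.2f}")
print(f"R² = {r2:.2f}")
```

The collinear pair (x1, x2) should show clearly higher VIFs than the independent x3, which mirrors the GDP and life expectancy pattern described earlier.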
Frequently Asked Questions
Why use multiple regression instead of simple correlations for happiness data?
Simple correlations tell you which variables co-vary with happiness, but they confound direct and indirect effects. GDP correlates strongly with happiness (r = 0.78), but part of that is mediated through health and social support. Multiple regression controls for all predictors simultaneously, isolating the unique contribution of each variable. The standardized coefficient for GDP drops to 0.18 when you control for other factors — meaning most of GDP's correlation is explained by what money buys: longer life, better healthcare, stronger social networks.
What does a standardized coefficient mean in the World Happiness regression?
A standardized coefficient tells you the expected difference in happiness (in standard deviations) associated with a one-standard-deviation increase in the predictor, holding all other variables constant. Social support has a coefficient of 0.42. That means a country one SD above average in social support (with the same GDP, freedom, etc.) is expected to score 0.42 standard deviations higher on happiness. This lets you compare effect sizes across predictors with different units.
How do I know if multicollinearity is a problem in my happiness model?
Check the VIF (Variance Inflation Factor) for each predictor. VIF measures how much the variance of a predictor's estimated coefficient is inflated by correlation with other predictors. VIF < 5 is generally safe; VIF > 10 signals serious multicollinearity. In the World Happiness data, GDP and healthy life expectancy have the highest VIF values (around 4.2 and 4.0), which is below the threshold of 5 and acceptable. If VIF exceeds 10, consider dropping one of the correlated predictors or using ridge regression.
Can I claim that social support causes happiness based on this regression?
No. Multiple regression controls for confounding within the model, but it does not prove causation. Countries with strong social support may differ in unmeasured ways (cultural norms, historical stability, etc.) that also affect happiness. To claim causation, you need experimental manipulation or a natural experiment. Regression tells you the predictors are associated with happiness after controlling for other measured variables — that's correlation with controls, not causation.
Which predictor has the strongest effect on national happiness?
Social support shows the largest standardized coefficient (β = 0.42), meaning it has the strongest unique association with happiness after controlling for GDP, freedom, generosity, corruption, and life expectancy. Healthy life expectancy is second (β = 0.31). GDP per capita, despite its strong bivariate correlation, has a relatively modest standardized coefficient (β = 0.18) once you account for the other predictors — most of GDP's effect is mediated through health and social infrastructure.
Next Steps: Run the Analysis on Your Data
Multiple regression is the workhorse of observational research. It controls for confounding, isolates partial effects, and quantifies the unique contribution of each predictor. But it requires careful interpretation. Check VIF for multicollinearity. Report standardized coefficients for comparability. Resist causal claims unless you have experimental evidence. And always ask: what omitted variables might bias these estimates?
MCP Analytics makes it easy to run the same analysis on your own dataset. Upload a CSV with a continuous outcome and multiple predictors. The platform automatically calculates standardized coefficients, VIF diagnostics, correlation matrices, partial regression plots, and residual diagnostics. No coding required. Get results in 60 seconds and export a full report with charts and interpretation guidance.