Executive Summary
Key findings from the elastic net regression on wine quality
Elastic net regression on 1599 wines retains 9 of 11 physicochemical features at the optimal lambda. The strongest predictor is Alcohol. Elastic net achieves an RMSE of 0.647 versus an OLS RMSE of 0.646, a difference of roughly 0.2% attributable to regularization. The model explains 35.8% of the variance in wine quality scores.
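The pipeline behind these numbers can be sketched with scikit-learn's ElasticNetCV. The data below is synthetic (the report's 1599-wine dataset is not reproduced here), so the fitted values will differ from those quoted:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic stand-in for the 1599 x 11 wine matrix (assumption: the real
# data is not available here); two informative features, the rest noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(1599, 11))
y = 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.6, size=1599)

# ElasticNetCV picks lambda (called alpha in scikit-learn) by
# cross-validation; l1_ratio=0.5 weights the L1 and L2 penalties equally.
model = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)
n_kept = int(np.sum(model.coef_ != 0))
print("optimal alpha:", model.alpha_, "features retained:", n_kept)
```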
Wine Quality Distribution
Histogram of wine quality scores across all observations
Wine quality scores range from 3 to 8 with a mean of 5.64 and standard deviation of 0.81. 82.5% of wines score 5 or 6, creating a roughly bell-shaped, slightly right-skewed distribution. The concentration of scores in the middle range (5–6) limits the model's ability to predict extreme quality scores (3–4 or 7–8), which have fewer training examples.
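The quoted summary statistics can be checked from per-score counts. The counts below are an assumption (the report gives only percentages), chosen to be consistent with the figures above:

```python
import numpy as np

# Assumed per-score counts for the 1599 wines (not given in the report).
scores = np.array([3, 4, 5, 6, 7, 8])
counts = np.array([10, 53, 681, 638, 199, 18])
obs = np.repeat(scores, counts)   # expand counts into 1599 observations

mean, std = obs.mean(), obs.std()
share_mid = counts[(scores == 5) | (scores == 6)].sum() / counts.sum()
print(round(mean, 2), round(std, 2), round(100 * share_mid, 1))
# → 5.64 0.81 82.5
```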
Feature Correlation Matrix
Pairwise Pearson correlations among all features and the target
The correlation matrix reveals 6 feature pairs with |r| > 0.5, indicating meaningful multicollinearity. The strongest correlation is between Fixed Acidity and pH (r = -0.683). High inter-feature correlations justify elastic net over OLS: the L2 component of the penalty stabilizes coefficients when predictors share explanatory power. Quality's own correlations with the chemical features are visible in the bottom row.
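A matrix like this comes from NumPy's pairwise Pearson correlation. The sketch below uses synthetic features, with the first two constructed to correlate at about r = -0.69 to mimic the Fixed Acidity / pH relationship (an assumption, not the report's data):

```python
import numpy as np

# Synthetic features: columns 0 and 1 are built to be strongly negatively
# correlated; the remaining columns are independent noise.
rng = np.random.default_rng(1)
n = 1599
fixed_acidity = rng.normal(size=n)
ph = -0.7 * fixed_acidity + rng.normal(scale=0.73, size=n)
X = np.column_stack([fixed_acidity, ph, rng.normal(size=(n, 2))])

R = np.corrcoef(X, rowvar=False)                       # pairwise Pearson r
strong = np.abs(R[np.triu_indices_from(R, k=1)]) > 0.5  # |r| > 0.5 pairs
print(round(R[0, 1], 3), int(strong.sum()))
```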
Regularization Path (Coefficient Shrinkage)
How feature coefficients shrink as the regularization penalty increases
As log(lambda) increases from left to right, the penalty grows stronger and more coefficients shrink toward zero. At the optimal log(lambda) = -3.985, 9 of 11 features survive with non-zero coefficients. Features that remain non-zero across a wide range of lambdas are the most robust predictors of wine quality, while those that drop out early contribute little beyond what other features already capture.
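The path itself can be traced with scikit-learn's enet_path, which returns coefficients over a decreasing grid of penalties (synthetic data here; the real path uses the wine features):

```python
import numpy as np
from sklearn.linear_model import enet_path

# Synthetic data (assumption: the wine features are not reproduced here).
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 11))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

# enet_path evaluates the model along a decreasing grid of penalties;
# coefs has shape (n_features, n_alphas).
alphas, coefs, _ = enet_path(X, y, l1_ratio=0.5, n_alphas=50)
n_nonzero = (coefs != 0).sum(axis=0)   # surviving features at each penalty
print("strongest penalty keeps", n_nonzero[0],
      "features; weakest keeps", n_nonzero[-1])
```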
Cross-Validation Error Curve
CV mean squared error vs log(lambda) with one-standard-error band
Cross-validation MSE reaches its minimum of 0.4252 at log(lambda) = -3.985 (the optimal penalty). The lambda.1se rule selects log(lambda) = -2.217 (MSE = 0.4419), the largest penalty within one standard error of the minimum; this simpler model is preferred when parsimony matters. The standard-error band narrows around the minimum, indicating reliable lambda selection, and widens at extreme values where the model is either under- or over-regularized.
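The lambda.1se rule (a glmnet convention) can be reproduced on top of scikit-learn's ElasticNetCV, whose mse_path_ stores per-fold CV errors. A sketch on synthetic data, assuming the same equal L1/L2 mix:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic data (assumption: the wine dataset is not available here).
rng = np.random.default_rng(7)
X = rng.normal(size=(400, 11))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.6, size=400)

model = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)

mse = model.mse_path_
if mse.ndim == 3:          # shape depends on whether l1_ratio is a list
    mse = mse[0]
mean_mse = mse.mean(axis=1)
se = mse.std(axis=1, ddof=1) / np.sqrt(mse.shape[1])
i_min = int(mean_mse.argmin())

# lambda.1se: the largest penalty whose mean CV error stays within one
# standard error of the minimum (alphas_ is stored in decreasing order).
within = np.where(mean_mse <= mean_mse[i_min] + se[i_min])[0]
alpha_1se = model.alphas_[within[0]]
print("alpha.min:", model.alpha_, "alpha.1se:", alpha_1se)
```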
Elastic Net Coefficients (Surviving Features)
Non-zero elastic net coefficients at the optimal lambda
9 features survive elastic net regularization at the optimal lambda. The strongest predictor is Alcohol with coefficient 0.3005. Five features push quality scores upward; four push it downward. Features not shown were shrunk to exactly zero by the L1 penalty and are not needed to explain wine quality variation in this dataset.
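Extracting and ranking the surviving coefficients looks like this. The feature names are the standard wine-quality columns (an assumption; the report names only a few), and the data is synthetic, so the fitted values will not match the report's:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Standard wine-quality feature names (assumed, not taken from the report).
names = np.array(["fixed acidity", "volatile acidity", "citric acid",
                  "residual sugar", "chlorides", "free sulfur dioxide",
                  "total sulfur dioxide", "density", "pH", "sulphates",
                  "alcohol"])

# Synthetic data: alcohol and volatile acidity carry the signal here.
rng = np.random.default_rng(3)
X = rng.normal(size=(1599, 11))
y = 0.3 * X[:, 10] - 0.2 * X[:, 1] + rng.normal(scale=0.6, size=1599)

model = ElasticNet(alpha=0.05, l1_ratio=0.5).fit(X, y)
kept = model.coef_ != 0
order = np.argsort(-np.abs(model.coef_[kept]))   # rank by |coefficient|
for name, coef in zip(names[kept][order], model.coef_[kept][order]):
    print(f"{name:22s} {coef:+.4f}")
```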
OLS vs Elastic Net — Model Performance
Comparison of RMSE, MAE, and R-squared between OLS and elastic net
Elastic net RMSE = 0.6470 vs OLS RMSE = 0.6456 (difference: +0.0014). Elastic net R-squared = 0.3577 vs OLS R-squared = 0.3606 (difference: -0.0029). In-sample metrics favor OLS, which minimizes training error exactly; the elastic net's regularization pays off in out-of-sample generalization, especially when features are correlated.
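A comparison like this can be computed with scikit-learn's metrics on a held-out split. Synthetic data again (the report's data and its chosen penalty are assumptions here):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption: the wine features are not reproduced here).
rng = np.random.default_rng(4)
X = rng.normal(size=(1599, 11))
y = 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.65, size=1599)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for label, est in [("OLS", LinearRegression()),
                   ("elastic net", ElasticNet(alpha=0.02, l1_ratio=0.5))]:
    pred = est.fit(X_tr, y_tr).predict(X_te)
    results[label] = (mean_squared_error(y_te, pred) ** 0.5,  # RMSE
                      mean_absolute_error(y_te, pred),
                      r2_score(y_te, pred))
for label, (rmse, mae, r2) in results.items():
    print(f"{label:12s} RMSE={rmse:.4f} MAE={mae:.4f} R2={r2:.4f}")
```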
Predicted vs Actual Quality (Elastic Net)
Scatter of predicted vs actual wine quality scores
Each point compares the elastic net's predicted quality score to the actual score. Points on the diagonal are perfect predictions; points above it are over-predictions and points below it are under-predictions. The Pearson correlation between actual and predicted is 0.598. Among the plotted points, 96 are over-predicted by more than 0.5 and 112 are under-predicted by more than 0.5, with clustering around scores 5–6.
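The counting convention behind those figures is simple to state in code. The actual/predicted pairs below are hypothetical stand-ins (the report's predictions are not reproduced here):

```python
import numpy as np

# Hypothetical actual/predicted pairs (assumption: not the report's values).
rng = np.random.default_rng(5)
actual = rng.integers(3, 9, size=1599).astype(float)
predicted = actual + rng.normal(scale=0.65, size=1599)

r = np.corrcoef(actual, predicted)[0, 1]
over = int(np.sum(predicted - actual > 0.5))    # over-predicted by > 0.5
under = int(np.sum(actual - predicted > 0.5))   # under-predicted by > 0.5
print(round(r, 3), over, under)
```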
Residual Distribution
Distribution of elastic net residuals to assess model adequacy
Elastic net residuals have mean ≈ 0 and standard deviation 0.647, and 88.9% of them fall within 1 quality-score unit of zero. Shapiro-Wilk test: W = 0.9903, p < 0.001; with n = 1599, even this mild deviation from normality is statistically significant. A roughly bell-shaped distribution centered at zero supports model adequacy, while fat tails or asymmetry would suggest the model systematically misses extreme-quality wines.
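These diagnostics come from SciPy's Shapiro-Wilk test plus a simple share computation. The residuals below are drawn from a normal distribution with the reported standard deviation (an assumption; the actual residuals are not available here), so the test statistic will differ:

```python
import numpy as np
from scipy import stats

# Stand-in residuals with the reported spread (assumed, not the real ones).
rng = np.random.default_rng(6)
residuals = rng.normal(scale=0.647, size=1599)

w, p = stats.shapiro(residuals)                 # Shapiro-Wilk normality test
within_one = np.mean(np.abs(residuals) <= 1.0)  # share within 1 unit of zero
print(round(w, 4), round(100 * within_one, 1))
```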