Executive Summary
Key findings from the elastic net regression on wine quality
Elastic net regression on 1599 wines retains 9 of 11 physicochemical features at the optimal lambda. The strongest predictor is Alcohol. Elastic net achieves an RMSE of 0.647 versus an OLS RMSE of 0.646, a difference of roughly 0.2% attributable to regularization. The model explains 35.8% of the variance in wine quality scores.
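The pipeline behind these numbers can be sketched with scikit-learn's ElasticNetCV. The data below is synthetic (the report's 1599-wine dataset is not reproduced here), so the fitted values will differ from those quoted:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic stand-in for the 1599 x 11 wine matrix (assumption: the real
# data is not available here); two informative features, the rest noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(1599, 11))
y = 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.6, size=1599)

# ElasticNetCV picks lambda (called alpha in scikit-learn) by
# cross-validation; l1_ratio=0.5 weights the L1 and L2 penalties equally.
model = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)
n_kept = int(np.sum(model.coef_ != 0))
print("optimal alpha:", model.alpha_, "features retained:", n_kept)
```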
Wine Quality Distribution
Histogram of wine quality scores across all observations
Wine quality scores range from 3 to 8 with a mean of 5.64 and standard deviation of 0.81. 82.5% of wines score 5 or 6, creating a roughly bell-shaped, slightly right-skewed distribution. The concentration of scores in the middle range (5–6) limits the model's ability to predict extreme quality scores (3–4 or 7–8), which have fewer training examples.
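The quoted summary statistics can be checked from per-score counts. The counts below are an assumption (the report gives only percentages), chosen to be consistent with the figures above:

```python
import numpy as np

# Assumed per-score counts for the 1599 wines (not given in the report).
scores = np.array([3, 4, 5, 6, 7, 8])
counts = np.array([10, 53, 681, 638, 199, 18])
obs = np.repeat(scores, counts)   # expand counts into 1599 observations

mean, std = obs.mean(), obs.std()
share_mid = counts[(scores == 5) | (scores == 6)].sum() / counts.sum()
print(round(mean, 2), round(std, 2), round(100 * share_mid, 1))
# → 5.64 0.81 82.5
```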
Feature Correlation Matrix
Pairwise Pearson correlations among all features and the target
The correlation matrix reveals 6 feature pairs with |r| > 0.5, indicating meaningful multicollinearity. The strongest correlation is between Fixed Acidity and pH (r = -0.683). High inter-feature correlations justify elastic net over OLS: the L2 component of the penalty stabilizes coefficients when predictors share explanatory power. Quality's own correlations with the chemical features are visible in the bottom row.
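A matrix like this comes from NumPy's pairwise Pearson correlation. The sketch below uses synthetic features, with the first two constructed to correlate at about r = -0.69 to mimic the Fixed Acidity / pH relationship (an assumption, not the report's data):

```python
import numpy as np

# Synthetic features: columns 0 and 1 are built to be strongly negatively
# correlated; the remaining columns are independent noise.
rng = np.random.default_rng(1)
n = 1599
fixed_acidity = rng.normal(size=n)
ph = -0.7 * fixed_acidity + rng.normal(scale=0.73, size=n)
X = np.column_stack([fixed_acidity, ph, rng.normal(size=(n, 2))])

R = np.corrcoef(X, rowvar=False)                       # pairwise Pearson r
strong = np.abs(R[np.triu_indices_from(R, k=1)]) > 0.5  # |r| > 0.5 pairs
print(round(R[0, 1], 3), int(strong.sum()))
```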
Regularization Path (Coefficient Shrinkage)
How feature coefficients shrink as the regularization penalty increases
As log(lambda) increases from left to right, the penalty grows stronger and more coefficients shrink toward zero. At the optimal log(lambda) = -3.985, 9 of 11 features survive with non-zero coefficients. Features that remain non-zero across a wide range of lambdas are the most robust predictors of wine quality, while those that drop out early contribute little beyond what other features already capture.
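The path itself can be traced with scikit-learn's enet_path, which returns coefficients over a decreasing grid of penalties (synthetic data here; the real path uses the wine features):

```python
import numpy as np
from sklearn.linear_model import enet_path

# Synthetic data (assumption: the wine features are not reproduced here).
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 11))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

# enet_path evaluates the model along a decreasing grid of penalties;
# coefs has shape (n_features, n_alphas).
alphas, coefs, _ = enet_path(X, y, l1_ratio=0.5, n_alphas=50)
n_nonzero = (coefs != 0).sum(axis=0)   # surviving features at each penalty
print("strongest penalty keeps", n_nonzero[0],
      "features; weakest keeps", n_nonzero[-1])
```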
Cross-Validation Error Curve
CV mean squared error vs log(lambda) with one-standard-error band
Cross-validation MSE reaches its minimum of 0.4252 at log(lambda) = -3.985 (the optimal penalty). The lambda.1se rule selects log(lambda) = -2.217 (MSE = 0.4419), the largest penalty within one standard error of the minimum; this simpler model is preferred when parsimony matters. The standard-error band narrows around the minimum, indicating reliable lambda selection, and widens at extreme values where the model is either under- or over-regularized.
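The lambda.1se rule (a glmnet convention) can be reproduced on top of scikit-learn's ElasticNetCV, whose mse_path_ stores per-fold CV errors. A sketch on synthetic data, assuming the same equal L1/L2 mix:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic data (assumption: the wine dataset is not available here).
rng = np.random.default_rng(7)
X = rng.normal(size=(400, 11))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.6, size=400)

model = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)

mse = model.mse_path_
if mse.ndim == 3:          # shape depends on whether l1_ratio is a list
    mse = mse[0]
mean_mse = mse.mean(axis=1)
se = mse.std(axis=1, ddof=1) / np.sqrt(mse.shape[1])
i_min = int(mean_mse.argmin())

# lambda.1se: the largest penalty whose mean CV error stays within one
# standard error of the minimum (alphas_ is stored in decreasing order).
within = np.where(mean_mse <= mean_mse[i_min] + se[i_min])[0]
alpha_1se = model.alphas_[within[0]]
print("alpha.min:", model.alpha_, "alpha.1se:", alpha_1se)
```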
Elastic Net Coefficients (Surviving Features)
Non-zero elastic net coefficients at the optimal lambda
9 features survive elastic net regularization at the optimal lambda. The strongest predictor is Alcohol with coefficient 0.3005. Five features push quality scores upward; four push it downward. Features not shown were shrunk to exactly zero by the L1 penalty and are not needed to explain wine quality variation in this dataset.
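Extracting and ranking the surviving coefficients looks like this. The feature names are the standard wine-quality columns (an assumption; the report names only a few), and the data is synthetic, so the fitted values will not match the report's:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Standard wine-quality feature names (assumed, not taken from the report).
names = np.array(["fixed acidity", "volatile acidity", "citric acid",
                  "residual sugar", "chlorides", "free sulfur dioxide",
                  "total sulfur dioxide", "density", "pH", "sulphates",
                  "alcohol"])

# Synthetic data: alcohol and volatile acidity carry the signal here.
rng = np.random.default_rng(3)
X = rng.normal(size=(1599, 11))
y = 0.3 * X[:, 10] - 0.2 * X[:, 1] + rng.normal(scale=0.6, size=1599)

model = ElasticNet(alpha=0.05, l1_ratio=0.5).fit(X, y)
kept = model.coef_ != 0
order = np.argsort(-np.abs(model.coef_[kept]))   # rank by |coefficient|
for name, coef in zip(names[kept][order], model.coef_[kept][order]):
    print(f"{name:22s} {coef:+.4f}")
```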
OLS vs Elastic Net — Model Performance
Comparison of RMSE, MAE, and R-squared between OLS and elastic net
Elastic net RMSE = 0.6470 vs OLS RMSE = 0.6456 (difference: +0.0014). Elastic net R-squared = 0.3577 vs OLS R-squared = 0.3606 (difference: -0.0029). In-sample metrics favor OLS, which minimizes training error exactly; the elastic net's regularization pays off in out-of-sample generalization, especially when features are correlated.
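A comparison like this can be computed with scikit-learn's metrics on a held-out split. Synthetic data again (the report's data and its chosen penalty are assumptions here):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption: the wine features are not reproduced here).
rng = np.random.default_rng(4)
X = rng.normal(size=(1599, 11))
y = 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.65, size=1599)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for label, est in [("OLS", LinearRegression()),
                   ("elastic net", ElasticNet(alpha=0.02, l1_ratio=0.5))]:
    pred = est.fit(X_tr, y_tr).predict(X_te)
    results[label] = (mean_squared_error(y_te, pred) ** 0.5,  # RMSE
                      mean_absolute_error(y_te, pred),
                      r2_score(y_te, pred))
for label, (rmse, mae, r2) in results.items():
    print(f"{label:12s} RMSE={rmse:.4f} MAE={mae:.4f} R2={r2:.4f}")
```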
Predicted vs Actual Quality (Elastic Net)
Scatter of predicted vs actual wine quality scores
Each point compares the elastic net's predicted quality score to the actual score. Points on the diagonal are perfect predictions; points above it are over-predictions and points below it are under-predictions. The Pearson correlation between actual and predicted is 0.598. Among the plotted points, 96 are over-predicted by more than 0.5 and 112 are under-predicted by more than 0.5, with clustering around scores 5–6.
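The counting convention behind those figures is simple to state in code. The actual/predicted pairs below are hypothetical stand-ins (the report's predictions are not reproduced here):

```python
import numpy as np

# Hypothetical actual/predicted pairs (assumption: not the report's values).
rng = np.random.default_rng(5)
actual = rng.integers(3, 9, size=1599).astype(float)
predicted = actual + rng.normal(scale=0.65, size=1599)

r = np.corrcoef(actual, predicted)[0, 1]
over = int(np.sum(predicted - actual > 0.5))    # over-predicted by > 0.5
under = int(np.sum(actual - predicted > 0.5))   # under-predicted by > 0.5
print(round(r, 3), over, under)
```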
Residual Distribution
Distribution of elastic net residuals to assess model adequacy
Elastic net residuals have mean ≈ 0 and standard deviation 0.647, and 88.9% of them fall within 1 quality-score unit of zero. Shapiro-Wilk test: W = 0.9903, p < 0.001; with n = 1599, even this mild deviation from normality is statistically significant. A roughly bell-shaped distribution centered at zero supports model adequacy, while fat tails or asymmetry would suggest the model systematically misses extreme-quality wines.
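These diagnostics come from SciPy's Shapiro-Wilk test plus a simple share computation. The residuals below are drawn from a normal distribution with the reported standard deviation (an assumption; the actual residuals are not available here), so the test statistic will differ:

```python
import numpy as np
from scipy import stats

# Stand-in residuals with the reported spread (assumed, not the real ones).
rng = np.random.default_rng(6)
residuals = rng.normal(scale=0.647, size=1599)

w, p = stats.shapiro(residuals)                 # Shapiro-Wilk normality test
within_one = np.mean(np.abs(residuals) <= 1.0)  # share within 1 unit of zero
print(round(w, 4), round(100 * within_one, 1))
```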