User 136 · Food · Wine · Quality Drivers
Executive Summary

Key findings from random forest feature importance and linear regression analysis

Number of Wines: 1599
Quality Range: 3–8
Top RF Predictor: Sulphates
Model R-Squared: 0.361
Adj R-Squared: 0.356
Top LM Predictor: Alcohol
Interpretation

Across 1599 red wines rated 3–8, the random forest model identifies Sulphates as the single strongest predictor of quality by mean decrease in accuracy. The linear regression model (R² = 0.361, explaining 36.1% of quality variance) instead assigns the largest standardized effect to Alcohol, with volatile acidity acting as the key negative driver. The two methods therefore rank different features first: the random forest favors sulphates, while the regression points to alcohol content and volatile acidity as the dominant physicochemical levers for wine quality.
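The two R² figures in the summary are linked by the standard adjusted R² formula. The following is a minimal sketch, assuming the regression used an intercept and all 11 physicochemical predictors on the 1599 wines; the function name is illustrative and not taken from the report.

```python
def adjusted_r_squared(r_squared, n_samples, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Inputs taken from the summary cards; 11 predictors is an assumption
# based on the report's description of the regression.
adj = adjusted_r_squared(0.361, n_samples=1599, n_predictors=11)
print(round(adj, 3))
```

Because the reported R² is itself rounded to three decimals, the result only matches the reported 0.356 to within rounding.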

Visualization

Quality Score Distribution

Frequency distribution of expert quality scores across 1,599 red wines

Interpretation

The dataset contains 1599 red wine samples rated on a 3–8 quality scale. Quality score 5 is the most common with 681 wines. Mid-range scores of 5 and 6 together account for 82.5% of the dataset, meaning the models primarily differentiate average-quality wines from one another. Scores of 3 and 8 represent edge cases with very few examples.
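The 82.5% figure can be reproduced from the per-score counts. This is a small sketch using the counts listed in the report's profile table; the variable names are illustrative.

```python
# Wines per quality score, as listed in the report's profile table.
counts = {3: 10, 4: 53, 5: 681, 6: 638, 7: 199, 8: 18}

total = sum(counts.values())
mid_share = (counts[5] + counts[6]) / total   # share of scores 5 and 6
edge_share = (counts[3] + counts[8]) / total  # share of scores 3 and 8

print(f"total wines: {total}")
print(f"scores 5-6: {mid_share:.1%}")
print(f"scores 3 and 8: {edge_share:.1%}")
```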

Visualization

Feature Correlation Matrix

Pairwise Pearson correlations across all 11 physicochemical features and quality score

Interpretation

The correlation heatmap shows pairwise Pearson correlations between all 12 variables (11 physicochemical features plus the quality score). The feature most positively correlated with quality is Alcohol (r = 0.476). The strongest negative correlation with quality belongs to Volatile Acidity (r = -0.391), confirming it as a quality-reducing factor. High correlations between predictor pairs (such as total and free sulfur dioxide) indicate multicollinearity, which can make individual regression coefficients less stable and harder to interpret.
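A pairwise Pearson correlation matrix of this kind can be sketched in a few lines. The example below is only an illustration: the three columns hold made-up toy values, not the report's data, and the function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return a nested dict with the correlation of every column pair."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Toy values for illustration only (not taken from the dataset).
toy = {
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0],
    "volatile_acidity": [0.70, 0.88, 0.52, 0.40, 0.35],
    "quality": [5, 5, 6, 7, 7],
}
matrix = correlation_matrix(toy)
for name in toy:
    print(name, round(matrix[name]["quality"], 3))
```

The matrix is symmetric with ones on the diagonal, which is why a heatmap of it mirrors across the diagonal.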

Visualization

Random Forest Feature Importance

Mean decrease in accuracy for each of the 11 physicochemical features

Interpretation

The random forest ranks all 11 physicochemical features by mean decrease in accuracy — how much prediction quality drops when each feature's values are randomly shuffled. Sulphates scores highest with an importance of 54.459, making it the dominant predictor of wine quality in this dataset. Residual Sugar ranks lowest, contributing least incremental predictive power. Features with near-zero importance provide little signal beyond what others already capture.
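The shuffling idea behind mean decrease in accuracy can be sketched as follows. This is a simplified illustration, not the report's implementation: a random forest typically measures the drop on out-of-bag samples per tree, whereas this sketch shuffles columns for a single fixed toy classifier on made-up data. All names here are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction matches the label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)

def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy setup: feature 0 determines the label, feature 1 is unrelated noise.
toy_rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
toy_labels = [1 if row[0] >= 10 else 0 for row in toy_rows]

def toy_predict(row):
    return 1 if row[0] >= 10 else 0

importances = permutation_importance(toy_predict, toy_rows, toy_labels)
print(importances)
```

In this toy case the informative feature gets a positive importance while the ignored feature scores exactly zero, which mirrors how near-zero features in the chart add little signal.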

Visualization

Alcohol Content by Quality Score

Box plot showing alcohol content distribution for each expert quality rating

Interpretation

Box plots show the distribution of alcohol content (% by volume) separately for each quality score. Wines rated 8 have a median alcohol content of 12.2%, compared to 9.9% for wines rated 3. The upward trend in median alcohol across quality scores confirms alcohol as a strong positive driver. The overlap between adjacent quality groups reflects that alcohol alone does not fully determine quality.
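The quantities a box plot draws for each quality score (median, quartiles, interquartile range) can be computed per group as sketched below. The records are made-up toy values for illustration, not the report's data, and the function name is hypothetical.

```python
from collections import defaultdict
import statistics

def box_stats_by_group(records):
    """Compute median, quartiles, and IQR of the values in each group."""
    groups = defaultdict(list)
    for score, value in records:
        groups[score].append(value)
    stats = {}
    for score in sorted(groups):
        values = groups[score]
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        stats[score] = {
            "median": statistics.median(values),
            "q1": q1,
            "q3": q3,
            "iqr": q3 - q1,
        }
    return stats

# Toy (quality score, alcohol %) pairs for illustration only.
toy_records = [
    (3, 9.0), (3, 9.8), (3, 10.6),
    (5, 9.4), (5, 10.0), (5, 10.8),
    (8, 11.4), (8, 12.1), (8, 13.0),
]
stats = box_stats_by_group(toy_records)
for score, s in stats.items():
    print(score, s)
```

Overlapping interquartile ranges between adjacent groups are what the overlap in the plot reflects.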

Visualization

Volatile Acidity by Quality Score

Box plot showing volatile acidity distribution for each expert quality rating

Interpretation

Volatile acidity (acetic acid content) shows a clear decreasing pattern as quality rises. Wines rated 8 have a median volatile acidity of 0.37 g/L, substantially lower than the 0.845 g/L median for wines rated 3. This confirms volatile acidity as a key negative quality driver: higher levels introduce a vinegar-like taste that experts consistently penalize. Low-quality wines display greater spread, suggesting other confounders are also at play.

Visualization

Linear Regression Coefficients

Standardized beta coefficients for all 11 physicochemical predictors

Interpretation

Standardized beta coefficients show each feature's effect on quality in comparable units, regardless of measurement scale. The overall regression model explains 36.1% of quality variance (R² = 0.361). Alcohol has the largest positive effect (β = 0.3645), while Volatile Acidity (β = -0.2403) negatively impacts quality. Bars pointing right indicate quality-boosting properties; bars pointing left indicate quality-reducing ones.
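A standardized coefficient rescales the raw slope by the ratio of the predictor's standard deviation to the outcome's. The sketch below shows this for a single predictor on made-up toy values; the report's model fits 11 predictors jointly, so this is an illustration of the scaling step only, with hypothetical function names.

```python
import statistics

def ols_slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def standardized_beta(xs, ys):
    """Raw slope rescaled to standard-deviation units of x and y."""
    return ols_slope(xs, ys) * statistics.stdev(xs) / statistics.stdev(ys)

def z_scores(values):
    """Center values and scale them to unit standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Toy values for illustration only (not taken from the dataset).
alcohol = [9.4, 9.8, 10.5, 11.2, 12.0]
quality = [5, 5, 6, 7, 7]

beta = standardized_beta(alcohol, quality)
beta_from_z = ols_slope(z_scores(alcohol), z_scores(quality))
print(round(beta, 4), round(beta_from_z, 4))
```

Both routes give the same number: rescaling the raw slope is equivalent to fitting the regression on z-scored variables, which is what makes coefficients comparable across measurement scales.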

Visualization

Alcohol vs Volatile Acidity by Quality Tier

Scatter plot of the two dominant quality drivers, colored by quality tier

Interpretation

This scatter plot maps each wine's alcohol content (x-axis) against volatile acidity (y-axis), with color indicating quality tier: High (7–8), Medium (5–6), or Low (3–4). High-quality wines cluster toward the lower right of the plot, where alcohol is higher and volatile acidity is lower (mean alcohol: 11.5%, mean volatile acidity: 0.406 g/L). Low-quality wines trend toward the upper left, with lower alcohol and higher volatile acidity (mean alcohol: 10.2%, mean volatile acidity: 0.724 g/L). The diagonal separation shows that these two predictors together create visible quality clusters.
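The tier assignment and the tier means quoted above can be approximately reproduced from the per-score counts and means in the report's profile table, using count-weighted averages. This is a sketch with illustrative names; because the table values are rounded, the results match the quoted figures only to within rounding.

```python
def quality_tier(score):
    """Map a quality score to the tier used in the scatter plot."""
    if score >= 7:
        return "High"
    if score >= 5:
        return "Medium"
    return "Low"

# (count, mean alcohol %, mean volatile acidity g/L) per quality score,
# copied from the report's profile table.
profile = {
    3: (10, 9.96, 0.884),
    4: (53, 10.27, 0.694),
    5: (681, 9.9, 0.577),
    6: (638, 10.63, 0.497),
    7: (199, 11.47, 0.404),
    8: (18, 12.09, 0.423),
}

def tier_means(profile):
    """Count-weighted mean alcohol and volatile acidity for each tier."""
    totals = {}
    for score, (count, alcohol, acidity) in profile.items():
        tier = quality_tier(score)
        n, a_sum, v_sum = totals.get(tier, (0, 0.0, 0.0))
        totals[tier] = (n + count, a_sum + count * alcohol, v_sum + count * acidity)
    return {tier: (a / n, v / n) for tier, (n, a, v) in totals.items()}

means = tier_means(profile)
for tier in ("High", "Low"):
    alcohol, acidity = means[tier]
    print(f"{tier}: alcohol {alcohol:.1f}%, volatile acidity {acidity:.3f} g/L")
```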

Data Table

Mean Physicochemical Profile by Quality Score

Mean alcohol, volatile acidity, sulphates, and citric acid by quality rating group

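Quality Score | Count | Mean Alcohol (%) | Mean Volatile Acidity (g/L) | Mean Sulphates | Mean Citric Acid
3 | 10 | 9.96 | 0.884 | 0.57 | 0.171
4 | 53 | 10.27 | 0.694 | 0.596 | 0.174
5 | 681 | 9.9 | 0.577 | 0.621 | 0.244
6 | 638 | 10.63 | 0.497 | 0.675 | 0.274
7 | 199 | 11.47 | 0.404 | 0.741 | 0.375
8 | 18 | 12.09 | 0.423 | 0.768 | 0.391
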
Interpretation

This table summarizes mean values of four key physicochemical properties for each quality score group. Wines rated 8 have the highest mean alcohol content (12.09%), while wines rated 7 show the lowest mean volatile acidity (0.404 g/L), slightly below the 0.423 g/L for wines rated 8. Mean sulphates and citric acid both rise with every step up in quality, though with less dramatic separation than alcohol and volatile acidity. Quality groups with fewer than 5 samples would be excluded, but the smallest group here has 10 wines, so the table covers all 1599 wines.
