Analytics · Retail · Diamonds · Price Drivers
Executive Summary

Top-line finding on the strongest diamond price driver and overall model accuracy

n_observations    500
rf_r2             0
lm_r2             0.9546
lm_rmse_price     1095
median_price      514
mean_carat        0.752
top_importance    18.7
Interpretation

Clarity is the top-ranked price driver, accounting for 18.7% of normalized random forest importance across 500 diamonds, narrowly ahead of carat at 17.8%. A linear regression on log(price) explains 95.5% of the variation in log-price (R² = 0.955). The median retail price in this dataset is $514.

Visualization

Feature Importance Ranking

Random forest importance showing which diamond attributes explain the most price variation

Interpretation

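Clarity is the most important feature, accounting for 18.7% of the random forest's total node-impurity reduction; carat is second at 17.8%. Importance scores are normalized to sum to 100%, so relative rankings are directly comparable. The random forest's reported out-of-bag R² is 0, which means the forest predicts held-out prices no better than the mean price would, so these rankings should be read with caution.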
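The normalization described above can be sketched in a few lines. This is a minimal illustration, not the report's implementation: the function name `normalize_importance` and the raw scores are hypothetical.

```python
def normalize_importance(raw_scores):
    """Rescale raw importance scores so that they sum to 100 (percent)."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive total")
    return {name: 100.0 * score / total for name, score in raw_scores.items()}


# Made-up raw impurity reductions, NOT values from the report.
raw = {"clarity": 3.0, "carat": 2.0, "color": 1.0}
shares = normalize_importance(raw)

# Rank features from most to least important.
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
```

Because every score is divided by the same total, the ranking is unchanged; only the scale becomes a percentage share.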

Visualization

Regression Coefficients

OLS coefficients showing direction and magnitude of each attribute's effect on log(price)

Interpretation

Coefficients come from an OLS regression of log(price) on the encoded features. length_mm has the largest positive coefficient (2.3112), which corresponds to a price increase of approximately 908.7% (exp(2.3112) - 1 ≈ 9.09) per one-unit increase in length_mm, holding the other features fixed. Model R² = 0.955. Positive coefficients raise the predicted price; negative coefficients lower it.
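The conversion from a log(price) coefficient to a percentage price change is worth spelling out. The sketch below assumes the usual log-linear interpretation, where a coefficient b implies a multiplicative effect of exp(b) per one-unit increase; the function name is hypothetical.

```python
import math


def pct_change_from_log_coef(beta, delta=1.0):
    """Percent change in price implied by a log(price) coefficient.

    beta  -- coefficient from a regression with log(price) as the response
    delta -- size of the increase in the predictor (default: one unit)
    """
    return (math.exp(beta * delta) - 1.0) * 100.0


# Coefficient reported for length_mm in this section.
pct = pct_change_from_log_coef(2.3112)
print(f"{pct:.1f}%")
```

For small coefficients the percentage is close to 100 × b, but for a coefficient as large as 2.3112 the exponential form is needed.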

Visualization

Carat vs Price

Scatter of carat vs price by cut grade, showing the carat-price relationship across quality tiers

Interpretation

Scatter plot of carat weight versus retail price, colored by cut grade, sampled to 500 points. Carat and price are strongly correlated (Pearson r = 0.821). Within each cut tier, higher-carat diamonds show wider price dispersion, reflecting the multiplicative interaction between weight and quality.
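For reference, the Pearson correlation quoted above can be computed as in the following sketch. The carat and price values are made up for illustration and are not the report's data; `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only.
carat = [0.3, 0.5, 0.7, 1.0, 1.5]
price = [400, 900, 1800, 4500, 9000]
r = pearson_r(carat, price)
```

Pearson r measures linear association only, so a curved carat-price relationship can yield a value below 1 even when price rises with carat at every point.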

Visualization

Price Distribution by Cut

Box plots of price distributions by cut grade, showing median and spread differences

Interpretation

Box plots show the price distribution within each cut grade across 500 diamonds. Good has the highest median price ($698) and Premium the lowest ($398). Carat weight confounds this comparison: larger diamonds can be cut at any quality level, which widens the price range within each tier.
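One way to see the confounding is to compare medians by cut overall and then within carat bands. The sketch below uses made-up rows and a hypothetical 1-carat cutoff; it is an illustration of the idea, not the report's method.

```python
from collections import defaultdict
from statistics import median


def median_price_by(rows, key):
    """Group rows by key(row) and return the median price of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row["price"])
    return {group: median(prices) for group, prices in groups.items()}


# Made-up rows: the Good stones here happen to be larger on average.
rows = [
    {"cut": "Good", "carat": 1.2, "price": 5000},
    {"cut": "Good", "carat": 0.4, "price": 600},
    {"cut": "Good", "carat": 1.1, "price": 4500},
    {"cut": "Premium", "carat": 0.3, "price": 500},
    {"cut": "Premium", "carat": 0.4, "price": 700},
    {"cut": "Premium", "carat": 1.0, "price": 5200},
]


def carat_band(row):
    """Hypothetical two-band split at 1 carat."""
    return "1ct+" if row["carat"] >= 1.0 else "<1ct"


by_cut = median_price_by(rows, key=lambda r: r["cut"])
by_band_and_cut = median_price_by(rows, key=lambda r: (carat_band(r), r["cut"]))
```

In this toy data Good has the higher overall median, yet within each carat band Premium is at least as expensive, which is the pattern a carat confound produces.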

Visualization

Average Price by Color Grade

Mean price per color grade (D through J), testing the monotonic decline hypothesis

Interpretation

Average retail price by GIA color grade, ordered from D (colorless, most valuable) to J (most color). D-grade diamonds average $2,620 versus $1,077 for J-grade in this dataset. The decline is not strictly monotonic, likely because carat size is confounded with color within each tier.
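A strict-decline check over the ordered grades can be sketched as follows. Only the D and J means come from this report; the E through I values are placeholders invented for illustration, and `monotonic_violations` is a hypothetical helper.

```python
def monotonic_violations(ordered_means):
    """Return (grade, next_grade) pairs where the mean price does not fall."""
    return [
        (grade, next_grade)
        for (grade, mean), (next_grade, next_mean)
        in zip(ordered_means, ordered_means[1:])
        if next_mean >= mean
    ]


# D and J are the reported means; E..I are PLACEHOLDER values, not report data.
means = [
    ("D", 2620), ("E", 2300), ("F", 2400), ("G", 2000),
    ("H", 1700), ("I", 1500), ("J", 1077),
]
violations = monotonic_violations(means)
is_strictly_decreasing = not violations
```

An empty list means the decline is strictly monotonic; any returned pair marks where a worse color grade has an equal or higher mean price.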

Data Table

Model Performance

Model fit metrics for both random forest and linear regression (R-squared, RMSE)

Metric                  Value
N (observations)        500
LM R-squared            0.955
LM RMSE (price, USD)    $1,095
RF OOB R-squared        0.000
Interpretation

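Summary of model fit statistics across 500 diamonds. The linear regression explains 95.5% of the variation in log-price, a strong fit on the log scale. Its RMSE on the price scale is $1,095, about twice the $514 median price, so absolute errors can be large relative to a typical diamond. The random forest's OOB R² is 0.000, meaning it predicts held-out prices no better than the mean.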
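The two regression metrics are measured on different scales: R² on log(price) and RMSE on price in USD. The sketch below, using made-up prices and prediction errors, shows how a high log-scale R² can coexist with a large dollar RMSE. The helper names are hypothetical.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up prices (USD) and log-scale prediction errors, NOT report data.
prices = [400.0, 500.0, 600.0, 5000.0]
log_errors = [0.1, -0.1, 0.1, -0.2]

log_actual = [math.log(p) for p in prices]
log_pred = [la + e for la, e in zip(log_actual, log_errors)]

r2_log = r_squared(log_actual, log_pred)                       # fit on the log scale
rmse_usd = rmse(prices, [math.exp(lp) for lp in log_pred])     # error in dollars
```

In this toy example the single expensive stone dominates the dollar RMSE even though its log-scale error is modest, which is why both metrics are worth reporting together.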
