Executive Summary
OLS regression findings on demographic and preparation score drivers
Across 1000 students, OLS regression explains 25.5% of math score variance (R² = 0.255) using demographic and preparation factors alone. Completing the test preparation course is associated with a 5.6-point lift in math scores. Students on a standard lunch program score approximately 8.6 composite points higher than those on free/reduced lunch, reflecting the socioeconomic gradient captured by the lunch proxy variable.
Math Score Drivers (OLS Coefficients)
Regression coefficients with 95% CI showing each factor's effect on math scores
Each bar shows the OLS coefficient for one predictor level: the expected point change in math score relative to the reference category, holding all other factors constant. The largest effect is Lunch: Standard (+10.9 points). 7 of 12 predictor levels are statistically significant at the 5% level, meaning their 95% confidence intervals exclude zero. Bars whose intervals cross zero indicate no statistically distinguishable effect.
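The significance rule used on this card can be sketched in a few lines. This is a minimal illustration, assuming a normal approximation (critical value 1.96). Only the +10.9 coefficient for Lunch: Standard comes from the report; both standard errors and the second predictor are hypothetical values chosen for illustration.

```python
# Minimal sketch: a coefficient is significant at the 5% level when its
# 95% confidence interval excludes zero. Standard errors are hypothetical.
Z_95 = 1.96  # normal-approximation critical value, reasonable for n = 1000


def confidence_interval(coef, std_err, z=Z_95):
    """Return the (lower, upper) bounds of the confidence interval."""
    margin = z * std_err
    return coef - margin, coef + margin


def is_significant(coef, std_err, z=Z_95):
    """True when the confidence interval does not contain zero."""
    lower, upper = confidence_interval(coef, std_err, z)
    return lower > 0 or upper < 0


# name -> (coefficient, standard error)
predictors = {
    "Lunch: Standard": (10.9, 0.9),        # coefficient from the report, SE assumed
    "Hypothetical predictor": (1.2, 1.5),  # entirely illustrative
}

for name, (coef, se) in predictors.items():
    lower, upper = confidence_interval(coef, se)
    flag = "significant" if is_significant(coef, se) else "not significant"
    print(f"{name}: {coef:+.1f} [{lower:+.2f}, {upper:+.2f}] {flag}")
```

An exact analysis would use the t distribution with the model's residual degrees of freedom, but with about 1,000 observations the difference from 1.96 is negligible.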
Reading Score Drivers (OLS Coefficients)
Regression coefficients with 95% CI showing each factor's effect on reading scores
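Reading score coefficients follow a similar pattern to math, but with test preparation rather than lunch as the dominant factor. The largest predictor effect is Test Prep: None (-7.4 points), which is smaller in magnitude than math's Lunch: Standard coefficient (+10.9 points), so the lunch proxy matters less for reading than for math. 8 of 12 predictor levels reach statistical significance. Confidence intervals that exclude zero indicate effects that are statistically distinguishable from zero at the 5% level, although they do not by themselves guarantee that a difference will replicate.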
Writing Score Drivers (OLS Coefficients)
Regression coefficients with 95% CI showing each factor's effect on writing scores
Writing shows the strongest test prep effect of the three subjects. The largest OLS coefficient is Test Prep: None (-10.1 points), compared with -7.4 points for reading. 9 of 12 predictor levels are statistically significant at the 5% level (95% confidence). Comparing writing coefficients to math and reading reveals whether preparation interventions have uniform or subject-specific impacts.
Test Prep Score Lift Across Subjects
Mean score difference: test prep completed vs. not completed
Students who completed the test preparation course outperform non-completers by an average of 7.6 points across all three subjects. The largest lift is in Writing (9.9 points). These are unadjusted differences in group means, so they include any selection effect (students who chose to complete prep may differ in other ways). The regression coefficients on the previous cards show the prep effect after controlling for other demographic variables.
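The distinction between an unadjusted lift and a regression coefficient can be illustrated with a short sketch. It assumes a hypothetical toy sample (not the report's data) and shows that an OLS model with only a 0/1 prep indicator reproduces the raw difference in group means. The coefficient changes only once other covariates enter the model.

```python
from statistics import mean

# Hypothetical toy records: (completed_prep, writing_score). Not the report's data.
records = [
    (True, 78), (True, 74), (True, 81), (True, 69),
    (False, 66), (False, 71), (False, 62), (False, 65), (False, 70),
]


def unadjusted_lift(rows):
    """Difference in mean score: prep completed minus not completed."""
    completed = [score for done, score in rows if done]
    not_completed = [score for done, score in rows if not done]
    return mean(completed) - mean(not_completed)


def single_dummy_ols_slope(rows):
    """OLS slope of score on a 0/1 prep indicator: cov(x, y) / var(x)."""
    xs = [1.0 if done else 0.0 for done, _ in rows]
    ys = [float(score) for _, score in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    return cov / var


lift = unadjusted_lift(records)
slope = single_dummy_ols_slope(records)
print(f"unadjusted lift: {lift:.2f}, single-dummy OLS slope: {slope:.2f}")
```

The two numbers coincide because, with a single binary regressor, the least-squares slope is exactly the difference between the two group means.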
Composite Score by Parental Education Level
Average composite (math + reading + writing) score by highest parental education
There is a clear education gradient: students whose parents hold a master's degree achieve the highest composite scores, while those whose parents' highest education is high school score lowest. The gap between the top and bottom parental education tiers is approximately 10.5 composite score points. This gradient highlights the socioeconomic dimension of academic performance and is consistent with findings in the broader education research literature.
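A group-level gap of this kind could be computed as in the sketch below. The rows are hypothetical, and the composite is assumed here to be the mean of the three subject scores, which the report does not state explicitly.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (parental_education, math, reading, writing).
rows = [
    ("master's degree", 75, 80, 79),
    ("master's degree", 68, 74, 74),
    ("some college", 66, 70, 68),
    ("high school", 60, 64, 62),
    ("high school", 64, 66, 62),
]


def composite_by_group(data):
    """Average composite score per parental education level."""
    groups = defaultdict(list)
    for level, math, reading, writing in data:
        # Assumption: composite = mean of the three subject scores.
        groups[level].append((math + reading + writing) / 3)
    return {level: mean(scores) for level, scores in groups.items()}


means = composite_by_group(rows)
top = max(means, key=means.get)
bottom = min(means, key=means.get)
gap = means[top] - means[bottom]
print(f"top: {top}, bottom: {bottom}, gap: {gap:.1f} points")
```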
Lunch Program Effect on Composite Score
Average composite score by lunch program participation (SES proxy)
Students on a standard lunch program score an average of 8.6 composite points higher than those on free or reduced-price lunch. Lunch program participation is a well-established proxy for household income in US education data. Students with standard lunch consistently outperform their peers across all three subjects. These are unadjusted group means, but the math regression coefficient for Lunch: Standard (+10.9 points, holding other factors constant) indicates that the socioeconomic effect persists after controlling for test preparation.
OLS Model Fit Summary
R-squared, adjusted R-squared, RMSE, and observation count per outcome
| Outcome | R² | Adjusted R² | RMSE (points) | N |
|---|---|---|---|---|
| Math | 0.255 | 0.246 | 13.08 | 1000 |
| Reading | 0.227 | 0.218 | 12.83 | 1000 |
| Writing | 0.334 | 0.326 | 12.39 | 1000 |
The demographic and preparation variables collectively explain 27.2% of score variance on average across all three subjects (mean R² = 0.272). The average RMSE of 12.77 points means a typical prediction error is about 13 score points. R² ranges from 0.227 (reading) to 0.334 (writing), so the same demographic factors explain a modest share of variance in every subject, with writing somewhat more predictable from demographics than math or reading.
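The summary figures can be cross-checked from the table itself. The sketch below recomputes the mean R², the mean RMSE, and the adjusted R² for each outcome. It assumes the model has 12 predictors, matching the 12 predictor levels shown on the coefficient cards. That count is an inference, not something the fit table states.

```python
# Values copied from the model fit table.
fit = {
    "Math":    {"r2": 0.255, "adj_r2": 0.246, "rmse": 13.08},
    "Reading": {"r2": 0.227, "adj_r2": 0.218, "rmse": 12.83},
    "Writing": {"r2": 0.334, "adj_r2": 0.326, "rmse": 12.39},
}
N_OBS = 1000
N_PREDICTORS = 12  # assumption: the 12 predictor levels on the coefficient cards


def adjusted_r2(r2, n, p):
    """Adjusted R-squared: penalizes R-squared for the number of predictors p."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


mean_r2 = sum(m["r2"] for m in fit.values()) / len(fit)
mean_rmse = sum(m["rmse"] for m in fit.values()) / len(fit)
print(f"mean R2 = {mean_r2:.3f}, mean RMSE = {mean_rmse:.2f}")

for outcome, m in fit.items():
    recomputed = adjusted_r2(m["r2"], N_OBS, N_PREDICTORS)
    print(f"{outcome}: reported adj R2 = {m['adj_r2']:.3f}, recomputed = {recomputed:.3f}")
```

Under that assumption the recomputed adjusted R² values agree with the reported ones to three decimals, which suggests the table is internally consistent.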