Analysis overview and configuration
| Parameter | Value |
|---|---|
| significance_level | 0.05 |
| min_group_size | 5 |
This analysis tests whether math scores differ significantly across five student ethnic groups using the Kruskal-Wallis non-parametric test. The test compares rank distributions across multiple groups and is appropriate when the data may not meet normality assumptions; it can be read as a comparison of medians when the group distributions have similar shapes. Understanding whether meaningful score differences exist between groups is critical for identifying potential equity gaps in educational outcomes.
The analysis confirms statistically significant differences in math scores across ethnic groups. However, the small effect size indicates that while differences are real and reproducible, group membership alone is a
Data preprocessing and column mapping
| Metric | Value |
|---|---|
| Initial Rows | 1,000 |
| Final Rows | 1,000 |
| Rows Removed | 0 |
| Retention Rate | 100% |
This section documents the data cleaning and preparation phase for the Kruskal-Wallis statistical analysis comparing five groups. Full retention means no rows were removed for missing values, duplicates, or outliers, so the hypothesis test runs on all 1,000 observations and keeps its full statistical power.
The complete retention of all observations strengthens the Kruskal-Wallis test results (H=57.08, p<0.001) by maximizing sample size and statistical power. With no rows excluded, the group distributions reflect the raw data without artificial filtering. However, the absence of any data cleaning raises questions about whether missing values, outliers, or data quality issues were genuinely absent or simply not addressed during preprocessing.
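To make the retention figures auditable, the sketch below shows one way the preprocessing summary could be produced. It is a minimal illustration under stated assumptions: the row layout, the `group` and `score` key names, and the rule that groups smaller than `min_group_size` are dropped are all hypothetical, since the source does not describe the actual cleaning steps.

```python
from collections import Counter

MIN_GROUP_SIZE = 5  # value taken from the configuration table


def preprocess(rows, group_key="group", score_key="score",
               min_group_size=MIN_GROUP_SIZE):
    """Drop rows with a missing group or score, then drop undersized groups.

    The key names and the undersized-group rule are assumptions made for
    illustration; they are not documented in the report.
    """
    complete = [
        r for r in rows
        if r.get(group_key) is not None and r.get(score_key) is not None
    ]
    sizes = Counter(r[group_key] for r in complete)
    kept = [r for r in complete if sizes[r[group_key]] >= min_group_size]

    initial, final = len(rows), len(kept)
    summary = {
        "initial_rows": initial,
        "final_rows": final,
        "rows_removed": initial - final,
        "retention_rate": final / initial if initial else 0.0,
    }
    return kept, summary
```

A retention rate of 1.0 from a routine like this would show that the checks ran and found nothing to remove, which is a stronger statement than the checks not having been run at all.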
The lack of train/test split is appropriate for this statistical inference task,
| Finding | Value |
|---|---|
| K-W Test Result | Significant Group Differences Found (p < 0.001) |
| Effect Size | eps2=0.0571 (Small effect) |
| Post-Hoc Pairs | 5/10 significant (Dunn, Bonferroni) |
| Groups Analyzed | 5 |
| Total N | 1000 |
This analysis tested whether five distinct groups exhibit statistically different distributions of a measured outcome variable using the Kruskal-Wallis non-parametric test. The results confirm significant group-level differences across the 1,000 observations analyzed, although the small effect size means group membership explains only a modest share of the variation in the outcome.
The analysis confirms genuine group-level differences in outcome distributions, though the small effect size indicates these differences, while statistically robust, account for modest variance. Group E substantially outperforms other groups, particularly Groups A
Kruskal-Wallis H-test results with H statistic, degrees of freedom, p-value, and effect size
| statistic | degrees_freedom | p_value | effect_size | effect_magnitude | significant | n_total | n_groups |
|---|---|---|---|---|---|---|---|
| 57.08 | 4 | <0.001 | 0.0571 | Small | True | 1000 | 5 |
This section tests whether meaningful differences exist in the distribution of values across five groups using a non-parametric approach. The Kruskal-Wallis H-test is ideal for this dataset because it evaluates rank-based distributions without assuming normality, making it robust for the observed data structure with 1,000 observations across unequal group sizes.
The test confirms statistically significant differences in distributions across the five groups. However, the small effect size (eps2=0.0571) indicates that these differences, while real, account for little of the variance in the outcome. Post-hoc Dunn comparisons show that Group E drives most of the significance: it differs from each of Groups A, B, C, and D, whereas Groups A, B, and C do not differ significantly from one another.
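The reported effect size and p-value can be checked from the H statistic alone. The sketch below assumes the effect size is epsilon squared computed as H / (n - 1); that formula reproduces the reported 0.0571, but the source does not state which formula was used. The p-value uses the closed-form chi-square upper tail, which is valid only for even degrees of freedom (here 4). The function names are illustrative.

```python
import math


def epsilon_squared(h, n_total):
    """Epsilon-squared effect size for Kruskal-Wallis: H / (n - 1)."""
    return h / (n_total - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form for even df only.

    For df = 2m: sf(x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Values taken from the results table above.
H, DF, N_TOTAL = 57.08, 4, 1000

eps2 = epsilon_squared(H, N_TOTAL)
p_value = chi2_sf_even_df(H, DF)

print(f"epsilon squared = {eps2:.4f}")
print(f"p-value         = {p_value:.3e}")
```

The computed p-value is many orders of magnitude below 0.001, which is why a rounded display shows it as 0.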
Group medians with interquartile range (IQR) error bars for visual comparison
This section visualizes the central tendency and spread of each group's distribution using medians and interquartile ranges (IQR). By displaying where the middle 50% of data falls for each group, it provides a non-parametric view of group differences that complements the Kruskal-Wallis test result and enables visual identification of potential pairwise differences before post-hoc testing.
The data reveals a clear upward trend in central tendency from Group A through Group E, with Group E substantially elevated. The Kruskal-Wallis test (H=57.08,
Overlaid distributions per group with density curves to visualize shape differences
This section visualizes the complete distributional characteristics across five groups to identify whether differences stem from location shifts (medians), spread variations, or shape asymmetries. The Kruskal-Wallis test detects all these distributional differences simultaneously, so examining overlaid distributions helps pinpoint the specific nature of group disparities beyond central tendency alone.
The significant Kruskal-Wallis test (p < 0.001) indicates that at least one group's distribution differs from the others. Given the overall dataset symmetry, these differences likely manifest as location shifts (higher/lower central values) rather than shape distortions. The median summary data shows Group E (median=74.
Dunn post-hoc pairwise comparisons with Bonferroni-adjusted p-values and significance indicators
| comparison | group1 | group2 | z_stat | p_value | p_adj | significant | sig_label |
|---|---|---|---|---|---|---|---|
| group A - group B | group A | group B | -1.299 | 0.194 | 1 | False | ns |
| group A - group C | group A | group C | -1.829 | 0.0674 | 0.6741 | False | ns |
| group B - group C | group B | group C | -0.5719 | 0.5674 | 1 | False | ns |
| group A - group D | group A | group D | -3.473 | 5.00e-04 | 0.0051 | True | ** |
| group B - group D | group B | group D | -2.721 | 0.0065 | 0.065 | False | ns |
| group C - group D | group C | group D | -2.482 | 0.0131 | 0.1308 | False | ns |
| group A - group E | group A | group E | -6.174 | <0.001 | <0.001 | True | *** |
| group B - group E | group B | group E | -6.017 | <0.001 | <0.001 | True | *** |
| group C - group E | group C | group E | -6.094 | <0.001 | <0.001 | True | *** |
| group D - group E | group D | group E | -3.925 | 1.00e-04 | 9.00e-04 | True | *** |
Dunn's post-hoc test identifies which specific group pairs differ significantly, following the omnibus Kruskal-Wallis result (H=57.08, p<0.001). The Bonferroni correction controls false positives across all 10 pairwise comparisons by adjusting p-values, ensuring the 5% significance threshold applies to the entire test family rather than individual comparisons.
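The adjusted p-values in the table can be approximately reproduced from the z statistics. The sketch below assumes each raw p-value is a two-sided standard normal tail probability and that the Bonferroni adjustment is min(1, p × 10). These assumptions match the tabulated values up to rounding, but the report does not describe its exact implementation. The dictionary and function names are illustrative.

```python
import math

ALPHA = 0.05  # significance_level from the configuration table

# z statistics copied from the Dunn comparison table.
Z_STATS = {
    ("A", "B"): -1.299,
    ("A", "C"): -1.829,
    ("B", "C"): -0.5719,
    ("A", "D"): -3.473,
    ("B", "D"): -2.721,
    ("C", "D"): -2.482,
    ("A", "E"): -6.174,
    ("B", "E"): -6.017,
    ("C", "E"): -6.094,
    ("D", "E"): -3.925,
}


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


results = {}
for pair, z in Z_STATS.items():
    p = two_sided_p(z)
    p_adj = bonferroni(p, len(Z_STATS))
    results[pair] = (p, p_adj, p_adj < ALPHA)

significant_pairs = [pair for pair, (_, _, sig) in results.items() if sig]

for pair, (p, p_adj, sig) in results.items():
    print(f"{pair[0]} vs {pair[1]}: p={p:.4g}  p_adj={p_adj:.4g}  significant={sig}")
```

Under these assumptions, B vs D and C vs D have raw p-values below 0.05 but adjusted p-values above it, which shows how the correction changes the conclusion for borderline pairs.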
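The Kruskal-Wallis test confirmed overall group differences; Dunn's test reveals Group E is distinctly elevated compared to all others. Among Groups A–D, only the A versus D comparison is significant (adjusted p = 0.0051), while Groups A, B, and C do not differ significantly from one another. The conservative Bonferroni correction means only robust differences survive, reducing false positives but potentially masking subtle effects such as the B versus D and C versus D comparisons, which are significant before adjustment but not after. This pattern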
Rank distributions per group — shows the actual ranks used in the Kruskal-Wallis test
This section displays the rank distributions that form the foundation of the Kruskal-Wallis test. By converting raw values to ranks (1–1000), the test becomes robust to outliers and non-normal distributions. Groups with systematically higher ranks indicate higher underlying values, while similar rank distributions suggest no meaningful group differences.
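The rank conversion and the resulting H statistic can be sketched in a few lines. This is a minimal standard-library illustration of the textbook Kruskal-Wallis formula with average ranks for ties and the usual tie correction; it is not the code that produced the report, and the function names are illustrative.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of groups of numbers."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    total, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        total += rank_sum ** 2 / len(g)
        pos += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Toy data, not the report's scores: three fully separated groups.
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(average_ranks([10, 20, 20, 30]))
print(kruskal_wallis_h(toy_groups))
```

Because only the ordering of values enters the calculation, an extreme score such as the minimum of 0 in Group C contributes only its rank, not its magnitude.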
The rank data shows how each group's observations cluster within the overall ranking that the Kruskal-Wallis test (H = 57.08, p < 0.001) compares. Group E shows systematically higher ranks (median 74.5 on the original scale), while Groups A–C occupy lower rank positions. This rank separation drives the significant test result: a separation this large would be very unlikely if all groups shared the same distribution.
Descriptive statistics per group — median, IQR, mean, SD, min, max
| group | n | median_val | q1 | q3 | iqr | mean_val | sd_val | min_val | max_val |
|---|---|---|---|---|---|---|---|---|---|
| group A | 89 | 61 | 51 | 71 | 20 | 61.63 | 14.52 | 28 | 100 |
| group B | 190 | 63 | 54 | 74 | 20 | 63.45 | 15.47 | 8 | 97 |
| group C | 319 | 65 | 55 | 74 | 19 | 64.46 | 14.85 | 0 | 98 |
| group D | 262 | 69 | 59 | 77 | 18 | 67.36 | 13.77 | 26 | 100 |
| group E | 140 | 74.5 | 64.75 | 85 | 20.25 | 73.82 | 15.53 | 30 | 100 |
This section presents non-parametric descriptive statistics for each of the five groups, with emphasis on medians and interquartile ranges (IQR) rather than means and standard deviations. Since the Kruskal-Wallis test is rank-based, these robust measures are the appropriate summary statistics to interpret alongside the K-W results. Comparing medians across groups reveals the direction and magnitude of differences detected by the statistical test.
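The per-group summary columns can be computed with the standard library as sketched below. The quartile method is an assumption: the report does not say which interpolation rule it used, and the `inclusive` method (linear interpolation between order statistics) is chosen here only as a common default. The function name and toy data are illustrative.

```python
import statistics


def describe(values):
    """Median, quartiles, IQR, mean, SD, min, and max for one group.

    Quartiles use linear interpolation (method="inclusive"); the report's
    actual quartile rule is not stated, so this choice is an assumption.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }


# Toy data, not the report's scores.
summary = describe([1, 2, 3, 4, 5])
print(summary)
```

Applying `describe` to each group's scores yields one table row per group. Different quartile conventions can shift `q1`, `q3`, and `iqr` slightly without affecting the median.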
The Kruskal-Wallis test (H=57.08, p<0.001) confirms statistically significant differences among groups. The median progression from Group A (61) to Group