You have three or more groups and a measurement that is not normally distributed — satisfaction ratings on a 1-5 scale, response times with heavy right skew, or pain levels recorded as ordinal categories. The Kruskal-Wallis test is the non-parametric alternative to one-way ANOVA. It compares groups using ranks instead of means, so it works on skewed data, ordinal scales, and distributions that violate every assumption ANOVA requires. Upload a CSV and find out which groups differ in under 60 seconds.
What Is the Kruskal-Wallis Test?
The Kruskal-Wallis test answers the same fundamental question as ANOVA — are the differences between my groups real, or just noise? — but it does so without assuming your data follows a bell curve. Instead of comparing group means, it ranks every observation from lowest to highest across all groups combined, then checks whether the average ranks differ between groups. If one group consistently ranks higher than the others, the test detects it.
Think of it this way. You survey employee satisfaction across four departments using a 1-5 Likert scale. The Engineering department has a median rating of 4, while Customer Support sits at 2. But this is ordinal data — the distance between a 1 and a 2 is not necessarily the same as the distance between a 4 and a 5. Computing means on this scale is questionable. The Kruskal-Wallis test sidesteps the problem entirely by converting every response to a rank. It does not care whether the gap between "Satisfied" and "Very Satisfied" is the same as between "Neutral" and "Dissatisfied." It only cares about ordering.
The test produces an H-statistic, often reported as a chi-squared statistic because H approximately follows a chi-squared distribution under the null hypothesis. It measures how much the group rank distributions diverge from what you would expect if all groups were identical. A large H means the groups are ranked very differently. The p-value tells you the probability of seeing an H this large under the null hypothesis that all groups come from the same distribution. A p-value below 0.05 means at least one group's distribution differs significantly from the rest.
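The rank mechanics are simple enough to compute by hand. The report itself runs R's kruskal.test(), but a quick illustration in Python with scipy, using made-up data chosen without ties so the hand computation matches the library exactly, shows what H actually measures:

```python
import numpy as np
from scipy.stats import kruskal, rankdata

# Hypothetical scores for three groups (no ties, so no tie correction needed)
groups = [
    np.array([1.0, 2.0, 4.0, 5.0]),
    np.array([3.0, 6.0, 8.0, 9.0]),
    np.array([7.0, 10.0, 11.0, 12.0]),
]

pooled = np.concatenate(groups)
ranks = rankdata(pooled)          # rank every observation across all groups
n = len(pooled)

# H compares each group's mean rank against the grand mean rank (n + 1) / 2
sizes = [len(g) for g in groups]
per_group_ranks = np.split(ranks, np.cumsum(sizes)[:-1])
h = 12.0 / (n * (n + 1)) * sum(
    len(r) * (r.mean() - (n + 1) / 2) ** 2 for r in per_group_ranks
)

h_scipy, p = kruskal(*groups)
print(round(h, 3), round(h_scipy, 3))  # identical when there are no ties
```

The group with values 7-12 pulls its mean rank well above the grand mean, which is exactly the divergence H accumulates.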
When to Use It
The Kruskal-Wallis test is the right choice in three common situations. First, when your data is measured on an ordinal scale — survey ratings, severity levels (low/medium/high/critical), pain scores, or any ranking where the intervals between values are not guaranteed to be equal. ANOVA assumes equal intervals; Kruskal-Wallis does not.
Second, when your continuous data is heavily skewed or has extreme outliers. Response times are a classic example: most tickets are resolved in minutes, but a few take days. The mean is dragged upward by those outliers, making ANOVA misleading. Kruskal-Wallis, working on ranks, is barely affected by outliers because a response time of 2 days and 200 days both just become "the highest ranks."
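That outlier robustness is easy to verify: because the test sees only ranks, making the largest value even more extreme changes nothing. A small demonstration with hypothetical resolution times (Python/scipy for illustration; the report's own code is R):

```python
from scipy.stats import kruskal

# Hypothetical resolution times in minutes for three support tiers
basic      = [30, 45, 60, 90, 2880]    # one ticket took two days
pro        = [10, 20, 25, 40, 55]
enterprise = [5, 8, 12, 15, 22]

h_before, _ = kruskal(basic, pro, enterprise)

# Make the outlier 100x worse: its rank is unchanged, so H is unchanged
basic[-1] = 288000
h_after, _ = kruskal(basic, pro, enterprise)
print(h_before == h_after)
```

A mean-based test like ANOVA would report a very different statistic after that change; Kruskal-Wallis does not move at all.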
Third, when your groups have small sample sizes and you cannot reliably verify the normality assumption that ANOVA requires. With 8-10 observations per group, a Shapiro-Wilk test has low power to detect non-normality, so you might pass the check even when normality is questionable. Kruskal-Wallis does not need that assumption, making it a safer default for small samples.
Real-world examples where Kruskal-Wallis fits naturally: comparing customer satisfaction ratings (1-5 stars) across four product lines, comparing incident response times across three service tiers (Basic, Pro, Enterprise), comparing pain reduction scores across five treatment protocols in a clinical trial, or comparing employee engagement scores across regional offices.
What Data Do You Need?
You need a CSV with at least two columns. The outcome column holds the numeric or ordinal measurement you want to compare — satisfaction score, response time, pain level, revenue, or any value that can be ranked. The predictor column (called predictor_1 in the column mapping) defines your groups — department, treatment, product tier, or region. You can include additional predictor columns (predictor_[N]) if you want to run separate Kruskal-Wallis tests across multiple grouping variables.
The module also accepts two optional parameters. significance_level sets your alpha threshold (default 0.05). min_group_size controls the minimum number of observations required per group — groups below this threshold are flagged or excluded to ensure reliable results.
For best results, aim for at least three groups with five or more observations each. The test works with unequal group sizes, which is an advantage over some parametric alternatives. Very small groups (under five observations) can still produce results but the test loses statistical power, meaning it may miss real differences.
How to Read the Report
Executive Summary
The report opens with a plain-language summary of the key findings — whether significant differences were found, which groups stand out, and what the practical implications are. This is generated by AI after analyzing all the statistical results, so it highlights what matters most for your specific data rather than giving generic guidance.
Analysis Overview
The overview card shows the structure of your analysis at a glance: how many groups were compared, total observations, the outcome variable, and the grouping variable. It also reports the overall H-statistic and p-value, so you can immediately see whether the test found significant differences.
Data Preprocessing
Before running the test, the module validates your data. This card shows any rows that were excluded (missing values, groups below the minimum size threshold) and confirms what data entered the analysis. Transparency matters — you should always know exactly what was analyzed.
Kruskal-Wallis Test Results
This is the core statistical output. The H-statistic (chi-squared), degrees of freedom, and p-value are presented together. The degrees of freedom equal the number of groups minus one. A significant p-value (below your alpha) means at least one group's distribution differs from the others — but it does not tell you which one. For that, you need the post-hoc comparisons.
Group Medians
Since Kruskal-Wallis is a rank-based test, medians are the natural summary statistic — not means. This card shows the median value for each group along with the interquartile range (IQR). Groups with non-overlapping IQRs are likely to differ significantly, while groups with similar medians and wide IQRs probably overlap.
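Computing those summaries is straightforward. A minimal sketch with hypothetical department ratings (numpy's percentile interpolation may differ slightly from R's default quantile type):

```python
import numpy as np

# Hypothetical 1-5 ratings per department
scores = {
    "Engineering": [4, 4, 5, 3, 4, 5],
    "Support":     [2, 1, 3, 2, 2, 3],
}

summary = {}
for name, vals in scores.items():
    q1, med, q3 = np.percentile(vals, [25, 50, 75])
    summary[name] = (med, q3 - q1)       # median and interquartile range

for name, (med, iqr) in summary.items():
    print(f"{name}: median={med}, IQR={iqr}")
```

Engineering's median of 4 versus Support's median of 2, each with a narrow IQR, is the pattern that typically accompanies a significant test result.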
Distribution Comparison
Visual comparison of the distributions across groups. This typically uses box plots or violin plots to show the full shape of each group's data — median, spread, skewness, and outliers. Even before reading the statistical output, these visualizations often make the story obvious. A group whose entire box sits above the others is clearly different; overlapping boxes suggest similarity.
Dunn Post-Hoc Comparisons
When the overall Kruskal-Wallis test is significant, you need to know which specific pairs of groups differ. The Dunn test is the standard post-hoc method for Kruskal-Wallis. It tests every pairwise combination (Group A vs. Group B, Group A vs. Group C, etc.) and adjusts p-values for multiple comparisons using the Bonferroni or Holm method. Pairs with adjusted p-values below your alpha are significantly different. This is analogous to Tukey HSD for ANOVA, but designed for rank-based comparisons.
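The core of Dunn's test is a z-statistic on the difference in mean ranks between two groups. A minimal Python sketch, assuming distinct values (the dunn.test R package the module actually uses adds tie correction and further adjustment methods):

```python
import numpy as np
from itertools import combinations
from scipy.stats import rankdata, norm

# Hypothetical data: three groups, all values distinct
groups = {
    "A": np.array([1.0, 2.0, 4.0, 5.0]),
    "B": np.array([3.0, 6.0, 8.0, 9.0]),
    "C": np.array([7.0, 10.0, 11.0, 12.0]),
}

pooled = np.concatenate(list(groups.values()))
ranks = rankdata(pooled)
n_total = len(pooled)

# Mean rank per group, split back out of the pooled ranking
mean_rank, sizes, start = {}, {}, 0
for name, vals in groups.items():
    k = len(vals)
    mean_rank[name] = ranks[start:start + k].mean()
    sizes[name] = k
    start += k

pairs = list(combinations(groups, 2))
results = {}
for a, b in pairs:
    # Standard error of the mean-rank difference (no tie correction)
    se = np.sqrt(n_total * (n_total + 1) / 12 * (1 / sizes[a] + 1 / sizes[b]))
    z = (mean_rank[a] - mean_rank[b]) / se
    p = 2 * norm.sf(abs(z))
    results[(a, b)] = min(p * len(pairs), 1.0)   # Bonferroni adjustment

for pair, p_adj in results.items():
    print(pair, round(p_adj, 4))
```

With this data only the A vs. C pair survives the Bonferroni correction, which is the typical outcome: a significant omnibus test driven by the two most extreme groups, with the middle group not clearly separable from either.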
Rank Distributions
Since the Kruskal-Wallis test works on ranks, this card shows how ranks are distributed within each group. If one group consistently receives higher ranks than the others, that group has systematically higher values on the outcome measure. This visualization makes the mechanics of the test transparent — you can see exactly what the H-statistic is measuring.
Group Statistics
Detailed descriptive statistics for each group: count, median, mean, standard deviation, min, max, and quartiles. While the Kruskal-Wallis test focuses on ranks and medians, having the full statistical profile helps you interpret the practical significance of any differences found.
When to Use Something Else
If your data is approximately normally distributed and measured on a continuous scale with equal intervals, use ANOVA instead. ANOVA is more statistically powerful than Kruskal-Wallis when its assumptions are met — it is more likely to detect real differences with the same sample size. The Kruskal-Wallis test trades some power for robustness, so use it when you need that robustness, not as a default.
If you only have two groups, use the Mann-Whitney U test. It is the two-group version of Kruskal-Wallis, just as the t-test is the two-group version of ANOVA. The Mann-Whitney gives you a directional result (which group ranks higher) and is simpler to interpret when you only need to compare a single pair.
If your data involves repeated measures on the same subjects — for example, the same patients measured at three time points — use a Friedman test instead. Kruskal-Wallis assumes independent groups, meaning each observation comes from a different subject. Using it on repeated measures violates that assumption and can give misleading results.
If you want to compare groups while controlling for a continuous covariate — for example, comparing satisfaction across departments while accounting for tenure — you need a different approach. Consider a permutation test, which can handle non-normal data with covariates, or rank-transform the outcome and use ANCOVA on the ranks.
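The two most common alternatives have direct scipy equivalents. A quick sketch with made-up data, for orientation only:

```python
from scipy.stats import mannwhitneyu, friedmanchisquare

# Two independent groups -> Mann-Whitney U instead of Kruskal-Wallis
u, p_mw = mannwhitneyu([1, 2, 4, 5], [3, 6, 8, 9], alternative="two-sided")

# Same subjects measured at three time points -> Friedman test
# (each index across the three lists is one subject; hypothetical scores)
t1 = [4, 3, 5, 4]
t2 = [3, 2, 4, 3]
t3 = [2, 2, 3, 1]
chi2, p_fr = friedmanchisquare(t1, t2, t3)

print(round(p_mw, 3), round(p_fr, 3))
```

The Friedman test ranks within each subject rather than across the pooled sample, which is exactly the independence distinction described above.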
The R Code Behind the Analysis
Every report includes the exact R code used to produce the results — reproducible, auditable, and citable. This is not AI-generated code that changes every run. The same data produces the same analysis every time.
The analysis uses kruskal.test() from base R for the omnibus H-test — the same function used in statistics textbooks and peer-reviewed research. For post-hoc pairwise comparisons, it uses dunn.test() from the dunn.test package, which performs Dunn's test with p-value adjustment for multiple comparisons. Descriptive statistics use median(), IQR(), and rank() from base R. Every step is visible in the code tab of your report, so you or a statistician can verify exactly what was done. No black boxes, no proprietary algorithms — just standard, well-documented statistical methods.