Kruskal-Wallis Test vs One-Way ANOVA: When to Use Each
The Kruskal-Wallis test is the non-parametric alternative to one-way ANOVA for three or more independent groups. Here is how each works, when to switch, the H statistic by hand, and the correct post-hoc.
What You'll Learn
- ✓ Choose between one-way ANOVA and Kruskal-Wallis based on assumptions, not reflex.
- ✓ Compute the Kruskal-Wallis H statistic and interpret it against the chi-square distribution.
- ✓ Select the correct post-hoc procedure after a significant Kruskal-Wallis result.
1. Direct Answer: ANOVA or Kruskal-Wallis
One-way ANOVA compares the MEANS of three or more independent groups, assuming the residuals are approximately normal and the group variances are roughly equal (homoscedasticity). The Kruskal-Wallis test is its non-parametric counterpart: it ranks all observations together and compares the MEAN RANKS across groups, so it works when the data are ordinal, skewed, or contain outliers that would distort means and inflate variances. Use ANOVA when its assumptions hold — it has more power and yields interpretable mean differences. Switch to Kruskal-Wallis when normality is doubtful (especially with small, unequal groups) or the response is ordinal. Kruskal-Wallis with exactly two groups is algebraically the Mann-Whitney U test, so it is the natural multi-group extension of that test.
Key Points
- ANOVA compares means; Kruskal-Wallis compares mean ranks.
- Kruskal-Wallis handles ordinal data, skew, and outliers gracefully.
- With k = 2 groups, Kruskal-Wallis reduces to the Mann-Whitney U test.
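The two-group equivalence can be checked numerically. The sketch below is a minimal illustration, not a reference implementation: the two samples are made-up values with no ties, the helper names are hypothetical, and the Mann-Whitney side uses the normal approximation without a continuity correction. Under those assumptions, H should equal the squared z statistic.

```python
def pooled_ranks(groups):
    """Rank all observations together (1 = smallest). Assumes no tied values."""
    pooled = sorted(x for g in groups for x in g)
    rank_of = {x: i + 1 for i, x in enumerate(pooled)}
    return [[rank_of[x] for x in g] for g in groups]


def kruskal_wallis_h(groups):
    """H = [12 / (N(N+1))] * sum(R_i^2 / n_i) - 3(N+1), without tie correction."""
    n_total = sum(len(g) for g in groups)
    weighted = sum(sum(r) ** 2 / len(r) for r in pooled_ranks(groups))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def mann_whitney_z_squared(a, b):
    """Squared z from the normal approximation to U (no continuity correction)."""
    n1, n2 = len(a), len(b)
    r1 = sum(pooled_ranks([a, b])[0])      # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2            # Mann-Whitney U for the first sample
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 * (n1 + n2 + 1) / 12
    return (u1 - mean_u) ** 2 / var_u


# Hypothetical illustrative samples (not from the article).
sample_a = [12, 15, 19, 23]
sample_b = [14, 21, 25, 30, 31]

h = kruskal_wallis_h([sample_a, sample_b])
z2 = mann_whitney_z_squared(sample_a, sample_b)
print(f"H = {h:.4f}, z^2 = {z2:.4f}")
```

Both quantities are compared to chi-square with 1 df, which is why the two tests lead to the same decision in this setting.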
2. One-Way ANOVA in Brief
ANOVA partitions total variation into between-group and within-group pieces. The F statistic is F = MS_between / MS_within, where MS_between = SS_between/(k−1) and MS_within = SS_within/(N−k), with k groups and N total observations. A large F means the group means are spread far apart relative to the noise inside groups. Compare F to the F-distribution with (k−1, N−k) degrees of freedom. The assumptions worth checking are independence (design), normality of residuals (Q-Q plot), and equal variances (Levene’s test). When variances are unequal but data are otherwise normal, Welch’s ANOVA is often a better fix than jumping to a rank test.
Key Points
- F = MS_between / MS_within, df = (k−1, N−k).
- Check normality of residuals and equality of variances.
- Unequal variances with normal data → consider Welch’s ANOVA before Kruskal-Wallis.
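The F computation described in this section can be sketched in a few lines of plain Python. This is a minimal illustration: the function name `one_way_anova_f` is a hypothetical helper, and the data are the exam scores from the worked example in Section 4. It returns only the statistic and its degrees of freedom; a p-value would need an F-distribution routine from a statistics library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, (df_between, df_within)) for a list of independent groups."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = mean(all_values)

    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: scatter of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, (k - 1, n_total - k)


# Exam scores from the three teaching methods in Section 4.
method_a = [55, 60, 62, 58]
method_b = [70, 72, 68, 75]
method_c = [80, 78, 85, 82]

f_stat, df = one_way_anova_f([method_a, method_b, method_c])
print(f"F = {f_stat:.2f} with df = {df}")
```

For these scores F is about 57.0 on (2, 9) degrees of freedom, far above the tabulated 5% critical value of roughly 4.26.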
3. The Kruskal-Wallis H Statistic
Rank every observation across ALL groups combined from 1 (smallest) to N (largest), assigning average ranks to ties. Sum the ranks within each group to get R_i for group i with sample size n_i. The statistic is H = [12 / (N(N+1))] × Σ (R_i² / n_i) − 3(N+1). Under the null hypothesis that all groups share the same distribution, H follows an approximate chi-square distribution with k − 1 degrees of freedom (the approximation is good when each group has at least 5 observations). If there are many ties, divide H by the tie-correction factor 1 − (Σ(t³ − t)) / (N³ − N), where t is the size of each tie group; this inflates H slightly and is worth applying when ties are common, as with ordinal scales.
Key Points
- Rank ALL observations together, then sum ranks within each group.
- H = [12/(N(N+1))] Σ(R_i²/n_i) − 3(N+1), compared to chi-square with k−1 df.
- Apply the tie correction when ranks are heavily tied.
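The ranking, rank sums, and tie correction described above can be sketched as follows. This is an illustrative implementation under stated assumptions: the helper names are hypothetical, the ordinal ratings are made-up example data chosen to contain ties, and the code does not guard against the degenerate case where every observation is identical (the correction factor would be zero).

```python
def average_ranks(values):
    """Return (ranks, tie_sizes); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1              # mean of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis(groups):
    """Return (H, tie-corrected H) for a list of independent groups."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)

    weighted = 0.0
    start = 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])   # rank sum R_i for this group
        weighted += r_i ** 2 / len(g)
        start += len(g)

    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    return h, h / correction


# Hypothetical 1-5 ordinal ratings with many ties (illustrative only).
ratings = [[1, 2, 2, 3], [2, 3, 3, 4], [3, 4, 4, 5]]
h_raw, h_corrected = kruskal_wallis(ratings)
print(f"H = {h_raw:.4f}, tie-corrected H = {h_corrected:.4f}")
```

Because the correction factor is at most 1, the corrected H is never smaller than the uncorrected value.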
4. Worked Example: Three Teaching Methods
Three teaching methods, four students each (N = 12), compared on exam scores. Method A: 55, 60, 62, 58. Method B: 70, 72, 68, 75. Method C: 80, 78, 85, 82. Rank all 12 scores from lowest to highest: A takes ranks 1–4 (55=1, 58=2, 60=3, 62=4), B takes 5–8 (68=5, 70=6, 72=7, 75=8), C takes 9–12 (78=9, 80=10, 82=11, 85=12). Rank sums: R_A = 1+2+3+4 = 10, R_B = 5+6+7+8 = 26, R_C = 9+10+11+12 = 42. H = [12/(12×13)] × (10²/4 + 26²/4 + 42²/4) − 3×13 = (12/156) × (25 + 169 + 441) − 39 = (12 × 635)/156 − 39 ≈ 48.85 − 39 = 9.85. With k − 1 = 2 df, the chi-square critical value at α = 0.05 is 5.99. Since 9.85 > 5.99, reject the null: the methods differ. (No ties here, so no correction is needed.) One caveat: with only 4 observations per group, below the 5-per-group guideline, the chi-square comparison is a rough approximation; an exact or permutation-based test is the stricter check for groups this small.
Key Points
- Pool and rank all 12 scores, then sum ranks per group.
- H = 9.85 with 2 df exceeds the 5.99 critical value → significant.
- Cleanly separated groups produce non-overlapping rank blocks and a large H.
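The arithmetic in this example can be reproduced, and the small-sample concern addressed, with a short script. The sketch below recomputes H, converts it to a chi-square p-value using the closed form exp(−H/2) that holds only for 2 df, and then enumerates every way of splitting ranks 1–12 into three labelled groups of four to get an exact permutation p-value. The helper name and the enumeration approach are illustrative choices, not the article's own procedure.

```python
from itertools import combinations
from math import exp

scores = {
    "A": [55, 60, 62, 58],
    "B": [70, 72, 68, 75],
    "C": [80, 78, 85, 82],
}

pooled = sorted(x for g in scores.values() for x in g)
rank_of = {x: i + 1 for i, x in enumerate(pooled)}   # valid because there are no ties
n_total = len(pooled)


def h_from_rank_groups(rank_groups):
    """Kruskal-Wallis H from groups of ranks (no tie correction)."""
    weighted = sum(sum(r) ** 2 / len(r) for r in rank_groups)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


observed = h_from_rank_groups([[rank_of[x] for x in g] for g in scores.values()])
p_chi2 = exp(-observed / 2)   # chi-square upper tail; this closed form is for 2 df only

# Exact null distribution: every assignment of the 12 ranks to three groups of 4.
all_ranks = set(range(1, n_total + 1))
count = total = 0
for first in combinations(sorted(all_ranks), 4):
    rest = all_ranks - set(first)
    for second in combinations(sorted(rest), 4):
        third = rest - set(second)
        total += 1
        if h_from_rank_groups([first, second, third]) >= observed - 1e-9:
            count += 1
p_exact = count / total

print(f"H = {observed:.2f}, chi-square p = {p_chi2:.4f}")
print(f"exact p = {count}/{total} = {p_exact:.6f}")
```

Only the 6 fully separated arrangements out of 34,650 reach the observed H, so the exact test agrees with the chi-square conclusion here and gives an even smaller p-value.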
5. The Post-Hoc: Use Dunn’s Test, Not Tukey
A significant Kruskal-Wallis result tells you the groups are not all the same — it does not tell you WHICH pairs differ. The correct follow-up is Dunn’s test, which compares the mean ranks of each pair using the rank information already computed, with a multiple-comparison adjustment (Bonferroni or Benjamini-Hochberg) across the pairs. Do NOT use Tukey’s HSD here — Tukey is built on the studentized range of normally distributed means and belongs to ANOVA, not to a rank-based test. Another acceptable option is pairwise Mann-Whitney tests with a Bonferroni correction, though Dunn’s test is preferred because it reuses the joint ranking rather than re-ranking each pair.
Key Points
- Kruskal-Wallis is an omnibus test; it does not identify which pairs differ.
- Use Dunn’s test (with Bonferroni or BH adjustment) for the post-hoc.
- Tukey’s HSD is for ANOVA, not for rank-based tests.
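A minimal sketch of Dunn’s test with a Bonferroni adjustment, applied to the teaching-method scores from Section 4, is shown below. It assumes no tied values (so the tie term in the standard error is omitted), uses the standard normal distribution for two-sided p-values, and the function name `dunn_bonferroni` is a hypothetical helper rather than a library API.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def dunn_bonferroni(groups):
    """Return {(a, b): (z, adjusted_p)} for every pair of groups. Assumes no ties."""
    pooled = sorted(x for g in groups.values() for x in g)
    rank_of = {x: i + 1 for i, x in enumerate(pooled)}   # joint ranking, reused for all pairs
    n_total = len(pooled)
    mean_rank = {
        name: sum(rank_of[x] for x in g) / len(g) for name, g in groups.items()
    }

    pairs = list(combinations(groups, 2))
    results = {}
    for a, b in pairs:
        # Standard error of the difference in mean ranks (no tie term).
        se = sqrt(
            n_total * (n_total + 1) / 12 * (1 / len(groups[a]) + 1 / len(groups[b]))
        )
        z = (mean_rank[a] - mean_rank[b]) / se
        p_raw = 2 * (1 - NormalDist().cdf(abs(z)))
        results[(a, b)] = (z, min(1.0, p_raw * len(pairs)))   # Bonferroni adjustment
    return results


scores = {
    "A": [55, 60, 62, 58],
    "B": [70, 72, 68, 75],
    "C": [80, 78, 85, 82],
}

results = dunn_bonferroni(scores)
for pair, (z, p_adj) in results.items():
    print(pair, f"z = {z:.3f}, adjusted p = {p_adj:.4f}")
```

On these data only the A versus C comparison stays below 0.05 after adjustment; with four observations per group, the adjacent pairs do not reach significance even though their rank blocks do not overlap.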
6. Assumptions and a Common Misreading
Kruskal-Wallis still has assumptions: observations must be independent, and groups should have similar distribution SHAPES if you want to interpret it as a test of medians. If shapes differ wildly (one group skewed left, another right), a significant result means the distributions differ in some way — not necessarily that the medians differ. This is the same subtlety that trips people up with the Mann-Whitney test. Reporting group medians and a boxplot alongside the H statistic keeps the interpretation honest. And note what Kruskal-Wallis does NOT need: normality or equal variances in the ANOVA sense.
Key Points
- Independence is required; similar group shapes are needed for a clean "median" interpretation.
- With very different shapes, a significant H means "distributions differ," not strictly "medians differ."
- Unlike ANOVA, Kruskal-Wallis assumes neither normality nor equal variances.
7. Running It in StatsIQ
Snap a photo of a multi-group comparison and StatsIQ checks the residuals and variances, recommends one-way ANOVA, Welch’s ANOVA, or Kruskal-Wallis, and runs the choice end to end — for the rank test it shows the pooled ranking, rank sums, the H calculation with any tie correction, and a Dunn’s post-hoc table when H is significant. This content is for educational purposes only.
Key Points
- Assumption checks drive the ANOVA-vs-Kruskal-Wallis recommendation.
- Full ranking and H computation shown step by step.
- Dunn’s post-hoc generated automatically after a significant H.
Key Takeaways
- ★ Kruskal-Wallis is the non-parametric counterpart to one-way ANOVA; it compares mean ranks across 3+ independent groups.
- ★ H = [12/(N(N+1))] Σ(R_i²/n_i) − 3(N+1), approximately chi-square with k−1 df.
- ★ With k = 2 groups, Kruskal-Wallis equals the Mann-Whitney U test.
- ★ Post-hoc after Kruskal-Wallis is Dunn’s test (not Tukey, which belongs to ANOVA).
- ★ No normality or equal-variance assumption; independence and similar shapes still matter.
Practice Questions
1. You have 4 groups of 6 ordinal satisfaction ratings each. Which omnibus test, and why?
2. Kruskal-Wallis gives H = 3.1 with 3 groups. Is it significant at α = 0.05?
3. After a significant Kruskal-Wallis with 5 groups, a student runs Tukey’s HSD. What is the problem?
FAQs
Common questions about this topic
Does the Kruskal-Wallis test compare medians?
Strictly, it tests whether the groups come from the same distribution against the alternative that at least one is stochastically larger. It is interpreted as a test of medians only when the groups have similar distribution shapes. If the shapes differ substantially, report it as a difference in distributions and show medians and boxplots for transparency.
How large do the groups need to be for the chi-square approximation?
The chi-square approximation for H is reasonable when each group has at least 5 observations. Below that, use exact distribution tables or a permutation-based version, since the approximation can be off for tiny groups.
When should I use Welch’s ANOVA instead of Kruskal-Wallis?
When your data are approximately normal but the variances are unequal. Welch’s ANOVA corrects for heteroscedasticity while still comparing means, which is usually more informative than switching to ranks. Reach for Kruskal-Wallis when the non-normality itself (skew, outliers, ordinal scale) is the problem.
Why are observations ranked across all groups rather than within each group?
Because the test asks whether the groups occupy different positions in the COMBINED ordering. Ranking within groups would erase exactly the between-group information the test is built to detect. The joint ranking is what makes the rank sums comparable across groups.
Can StatsIQ run a Kruskal-Wallis test and its post-hoc?
Yes. Snap a photo of the grouped data and StatsIQ pools and ranks the observations, computes H with the appropriate tie correction, compares it to the chi-square distribution, and, when H is significant, produces a Dunn’s test table identifying which specific pairs differ.