
Advanced/Intermediate · 20-25 min

Kruskal-Wallis Test vs One-Way ANOVA: When to Use Each

Q: Does Kruskal-Wallis test medians or distributions?

Strictly, it tests the null hypothesis that all groups come from the same distribution against the alternative that at least one group tends to produce larger values than another (is stochastically larger). It can be read as a test of medians only when the groups have similar distribution shapes. If the shapes differ substantially, report the result as a difference in distributions and show medians and boxplots for transparency.

Q: How small can groups be for Kruskal-Wallis?

The chi-square approximation for H is reasonable when each group has at least 5 observations. Below that, use exact distribution tables or a permutation-based version, since the approximation can be off for tiny groups.
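A permutation version can be sketched in a few lines of standard-library Python. This is a Monte Carlo approximation (random shuffles rather than full enumeration), and the helper names `average_ranks`, `h_statistic`, and `permutation_pvalue`, the 10,000 shuffles, and the three small groups are all hypothetical choices for illustration. Because the tie-correction factor depends only on the pooled values, it is identical in every shuffle, so comparing uncorrected H values gives the same p-value.

```python
import random

def average_ranks(values):
    """Rank values 1..N, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # average of ranks i+1..j+1
        i = j + 1
    return ranks

def h_statistic(groups):
    """Uncorrected Kruskal-Wallis H from a list of groups of observations."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

def permutation_pvalue(groups, n_perm=10_000, seed=0):
    """Shuffle the pooled data across groups and count H values >= observed."""
    rng = random.Random(seed)
    observed = h_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if h_statistic(shuffled) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical tiny groups (n = 3 each), too small to trust the chi-square approximation.
groups = [[3.1, 2.8, 3.5], [4.0, 4.4, 3.9], [5.2, 4.8, 5.5]]
h, p = permutation_pvalue(groups)
print(f"H = {h:.2f}, permutation p ~ {p:.4f}")
```

For very small designs, enumerating every possible split of the pooled data instead of shuffling gives the exact p-value rather than an estimate.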

Q: When should I use Welch’s ANOVA instead of Kruskal-Wallis?

When your data are approximately normal but the variances are unequal. Welch’s ANOVA corrects for heteroscedasticity while still comparing means, which is usually more informative than switching to ranks. Reach for Kruskal-Wallis when the non-normality itself (skew, outliers, ordinal scale) is the problem.

Q: Why rank all the data together instead of within groups?

Because the test asks whether the groups occupy different positions in the COMBINED ordering. Ranking within groups would erase exactly the between-group information the test is built to detect. The joint ranking is what makes the rank sums comparable across groups.
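A quick way to see this is to rank the teaching-method scores from the worked example in section 4 both ways. The helper `ranks_of` is a hypothetical name and assumes no tied values, which holds for these twelve scores. Joint ranking separates the groups; ranking within each group gives every group the same rank sum, 1 + 2 + 3 + 4 = 10, so the between-group signal disappears.

```python
# Teaching-method scores from the worked example (section 4).
groups = {"A": [55, 60, 62, 58], "B": [70, 72, 68, 75], "C": [80, 78, 85, 82]}

def ranks_of(values):
    """Rank values 1..n; assumes no ties (true for this data)."""
    position = {v: r for r, v in enumerate(sorted(values), start=1)}
    return [position[v] for v in values]

# Joint ranking: rank all 12 scores together, then look up each group's ranks.
pooled = [x for g in groups.values() for x in g]
joint_rank = dict(zip(pooled, ranks_of(pooled)))
joint_sums = {name: sum(joint_rank[x] for x in g) for name, g in groups.items()}

# Within-group ranking: each group is ranked 1..4 on its own.
within_sums = {name: sum(ranks_of(g)) for name, g in groups.items()}

print("joint:", joint_sums)    # {'A': 10, 'B': 26, 'C': 42}
print("within:", within_sums)  # {'A': 10, 'B': 10, 'C': 10}
```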

Q: Can StatsIQ run Kruskal-Wallis and the post-hoc for me?

Yes. Snap a photo of the grouped data and StatsIQ pools and ranks the observations, computes H with the appropriate tie correction, compares it to the chi-square distribution, and — when significant — produces a Dunn’s test table identifying which specific pairs differ. This content is for educational purposes only.

The Kruskal-Wallis test is the non-parametric alternative to one-way ANOVA for three or more independent groups. Here is how each works, when to switch, the H statistic by hand, and the correct post-hoc.

What You'll Learn

✓Choose between one-way ANOVA and Kruskal-Wallis based on assumptions, not reflex.
✓Compute the Kruskal-Wallis H statistic and interpret it against the chi-square distribution.
✓Select the correct post-hoc procedure after a significant Kruskal-Wallis result.

1. Direct Answer: ANOVA or Kruskal-Wallis

One-way ANOVA compares the MEANS of three or more independent groups, assuming the residuals are approximately normal and the group variances are roughly equal (homoscedasticity). The Kruskal-Wallis test is its non-parametric counterpart: it ranks all observations together and compares the MEAN RANKS across groups, so it works when the data are ordinal, skewed, or contain outliers that would distort means and inflate variances. Use ANOVA when its assumptions hold; it has more power and yields interpretable mean differences. Switch to Kruskal-Wallis when normality is doubtful (especially with small, unequal groups) or the response is ordinal. With exactly two groups, Kruskal-Wallis is equivalent to the two-sided Mann-Whitney U test (H equals the square of the Mann-Whitney z statistic computed without a continuity correction), so it is the natural multi-group extension of that test.

Key Points

•ANOVA compares means; Kruskal-Wallis compares mean ranks.
•Kruskal-Wallis handles ordinal data, skew, and outliers gracefully.
•With k = 2 groups, Kruskal-Wallis reduces to the Mann-Whitney U test.
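The two-group equivalence can be checked numerically. This sketch assumes no ties and uses the Mann-Whitney normal approximation without a continuity correction; the data values and the helper names `kruskal_h` and `mann_whitney_z` are hypothetical. The two printed numbers, H and z², should agree up to floating-point rounding.

```python
import math

def ranks_no_ties(values):
    """Rank values 1..N; assumes all values are distinct."""
    position = {v: r for r, v in enumerate(sorted(values), start=1)}
    return [position[v] for v in values]

def kruskal_h(groups):
    """Kruskal-Wallis H (no tie correction needed when values are distinct)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    rank = dict(zip(pooled, ranks_no_ties(pooled)))
    s = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * s - 3 * (n + 1)

def mann_whitney_z(x, y):
    """Normal-approximation z for Mann-Whitney U, without continuity correction."""
    n1, n2 = len(x), len(y)
    u = sum(1 for a in x for b in y if a > b)  # pairs where the x value is larger
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u

# Hypothetical two-group data with no tied values.
x = [1.2, 3.4, 2.2, 5.0, 4.1]
y = [6.3, 2.9, 7.7, 5.8]
h = kruskal_h([x, y])
z = mann_whitney_z(x, y)
print(round(h, 6), round(z ** 2, 6))  # H and z squared agree
```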

2. One-Way ANOVA in Brief

ANOVA partitions total variation into between-group and within-group pieces. The F statistic is F = MS_between / MS_within, where MS_between = SS_between/(k−1) and MS_within = SS_within/(N−k), with k groups and N total observations. A large F means the group means are spread far apart relative to the noise inside groups. Compare F to the F-distribution with (k−1, N−k) degrees of freedom. The assumptions worth checking are independence (design), normality of residuals (Q-Q plot), and equal variances (Levene’s test). When variances are unequal but data are otherwise normal, Welch’s ANOVA is often a better fix than jumping to a rank test.

Key Points

•F = MS_between / MS_within, df = (k−1, N−k).
•Check normality of residuals and equality of variances.
•Unequal variances with normal data → consider Welch’s ANOVA before Kruskal-Wallis.
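The F computation can be sketched directly from these definitions. As an assumption for comparison, it reuses the teaching-method scores from the worked example in section 4; `one_way_anova_f` is a hypothetical helper, and looking up the F critical value or p-value is left to a table or a statistics library.

```python
def one_way_anova_f(groups):
    """Return F = MS_between / MS_within and its (k-1, N-k) degrees of freedom."""
    all_values = [x for g in groups for x in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group SS: group size times squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group SS: squared deviations of each value from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, (k - 1, n_total - k)

groups = [[55, 60, 62, 58], [70, 72, 68, 75], [80, 78, 85, 82]]
f, df = one_way_anova_f(groups)
print(f"F = {f:.2f} with df = {df}")
```

On these scores the sketch gives F ≈ 57 on (2, 9) df, because the group means (58.75, 71.25, 81.25) are far apart relative to the within-group spread.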

3. The Kruskal-Wallis H Statistic

Rank every observation across ALL groups combined from 1 (smallest) to N (largest), assigning average ranks to ties. Sum the ranks within each group to get R_i for group i with sample size n_i. The statistic is H = [12 / (N(N+1))] × Σ (R_i² / n_i) − 3(N+1). Under the null hypothesis that all groups share the same distribution, H approximately follows a chi-square distribution with k − 1 degrees of freedom (the approximation is good when each group has at least 5 observations). When there are ties, divide H by the tie-correction factor 1 − Σ(t³ − t) / (N³ − N), where t is the size of each tie group. Because this factor is below 1, the correction inflates H slightly; it matters most when ties are common, as with ordinal scales.

Key Points

•Rank ALL observations together, then sum ranks within each group.
•H = [12/(N(N+1))] Σ(R_i²/n_i) − 3(N+1), compared to chi-square with k−1 df.
•Apply the tie correction when ranks are heavily tied.
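The recipe above translates into a short function. This is a sketch in standard-library Python; `average_ranks` and `kruskal_wallis_h` are hypothetical helper names, and the tied ratings are invented to exercise the tie correction. It returns both the uncorrected and the tie-corrected H so the effect of the correction is visible.

```python
from collections import Counter

def average_ranks(values):
    """Rank values 1..N, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # average of ranks i+1..j+1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return (H, tie-corrected H) for a list of independent groups."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])  # rank sum R_i for this group
        total += r_i ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: t is the size of each set of equal values.
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_sum / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h, h / correction

# Hypothetical 1-5 ratings with many ties.
ratings = [[1, 2, 2, 3], [2, 3, 3, 4], [3, 4, 5, 5]]
h, h_corrected = kruskal_wallis_h(ratings)
print(f"H = {h:.3f}, tie-corrected H = {h_corrected:.3f}")
```

For these ratings the correction factor is 1 − 96/1716 ≈ 0.944, so H rises from about 6.26 to about 6.63. With tie-free data the factor is exactly 1 and both values coincide.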

4. Worked Example: Three Teaching Methods

Three teaching methods, four students each (N = 12), exam scores. Method A: 55, 60, 62, 58. Method B: 70, 72, 68, 75. Method C: 80, 78, 85, 82. Rank all 12 from lowest: A scores take ranks 1–4 (55=1, 58=2, 60=3, 62=4), B takes 5–8 (68=5, 70=6, 72=7, 75=8), C takes 9–12 (78=9, 80=10, 82=11, 85=12). Rank sums: R_A = 1+2+3+4 = 10, R_B = 5+6+7+8 = 26, R_C = 9+10+11+12 = 42. H = [12/(12×13)] × (10²/4 + 26²/4 + 42²/4) − 3×13 = [12/156] × (25 + 169 + 441) − 39 = 7620/156 − 39 ≈ 48.85 − 39 = 9.85. With k − 1 = 2 df, the chi-square critical value at α = 0.05 is 5.99. Since 9.85 > 5.99, reject the null: the methods differ. (No ties here, so no correction is needed. Each group has only 4 observations, just below the usual threshold of 5 for the chi-square approximation, but perfectly separated groups give the largest H this design allows, and an exact test reaches the same conclusion.)

Key Points

•Pool and rank all 12 scores, then sum ranks per group.
•H = 9.85 with 2 df exceeds the 5.99 critical value → significant.
•Cleanly separated groups produce non-overlapping rank blocks and a large H.
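The hand calculation can be reproduced in standard-library Python. Two shortcuts here are specific to this example and are stated as assumptions: the scores have no ties, so a simple rank lookup suffices, and with 2 df the chi-square upper-tail probability is exactly exp(−H/2), so no statistics library is needed. Because each group has only 4 observations, the sketch also enumerates every way to split ranks 1 to 12 into three groups of 4 to get an exact p-value; `h_from_rank_sums` is a hypothetical helper name.

```python
import math
from itertools import combinations

# Teaching-method scores from the worked example.
scores = {"A": [55, 60, 62, 58], "B": [70, 72, 68, 75], "C": [80, 78, 85, 82]}
pooled = sorted(x for g in scores.values() for x in g)
rank = {x: r for r, x in enumerate(pooled, start=1)}  # valid only because there are no ties
n_total = len(pooled)
sizes = [len(g) for g in scores.values()]

def h_from_rank_sums(rank_sums, sizes, n_total):
    """H = [12/(N(N+1))] * sum(R_i^2 / n_i) - 3(N+1)."""
    s = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)

observed_sums = [sum(rank[x] for x in g) for g in scores.values()]
h = h_from_rank_sums(observed_sums, sizes, n_total)

# Chi-square with 2 df: P(X > h) = exp(-h / 2). This shortcut holds only for df = 2.
p_chi2 = math.exp(-h / 2)

# Exact null distribution: every split of ranks 1..12 into groups of sizes 4, 4, 4.
all_ranks = set(range(1, n_total + 1))
extreme = total = 0
for a in combinations(sorted(all_ranks), 4):
    rest = sorted(all_ranks - set(a))
    for b in combinations(rest, 4):
        c = set(rest) - set(b)
        total += 1
        if h_from_rank_sums([sum(a), sum(b), sum(c)], sizes, n_total) >= h - 1e-9:
            extreme += 1
p_exact = extreme / total

print(observed_sums, round(h, 2), round(p_chi2, 4), f"{extreme}/{total}", p_exact)
```

Run as written, it should reproduce the rank sums 10, 26, 42 and H ≈ 9.85, with a chi-square p-value of about 0.007. Only 6 of the 34,650 possible splits (the six ways to assign the three rank blocks to the three methods) are as extreme as the observed one, so the exact p-value is about 0.00017 and the conclusion does not hinge on the approximation.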

5. The Post-Hoc: Use Dunn’s Test, Not Tukey

A significant Kruskal-Wallis result tells you the groups are not all the same — it does not tell you WHICH pairs differ. The correct follow-up is Dunn’s test, which compares the mean ranks of each pair using the rank information already computed, with a multiple-comparison adjustment (Bonferroni or Benjamini-Hochberg) across the pairs. Do NOT use Tukey’s HSD here — Tukey is built on the studentized range of normally distributed means and belongs to ANOVA, not to a rank-based test. Another acceptable option is pairwise Mann-Whitney tests with a Bonferroni correction, though Dunn’s test is preferred because it reuses the joint ranking rather than re-ranking each pair.

Key Points

•Kruskal-Wallis is an omnibus test — it does not identify which pairs differ.
•Use Dunn’s test (with Bonferroni or BH adjustment) for the post-hoc.
•Tukey’s HSD is for ANOVA, not for rank-based tests.
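For a pair of groups i and j, Dunn's statistic compares mean ranks as z = (R̄_i − R̄_j) / sqrt[(N(N+1)/12)(1/n_i + 1/n_j)]. The sketch below applies it to the mean ranks from the worked example (10/4, 26/4, 42/4 with N = 12) with a Bonferroni adjustment. It omits the tie adjustment to the variance (that data has no ties), relies on the normal approximation, which is rough with only 4 observations per group, and `dunn_bonferroni` is a hypothetical helper, not a library function.

```python
import math
from itertools import combinations

def dunn_bonferroni(mean_ranks, sizes, n_total):
    """Dunn's z for every pair (no tie adjustment) with Bonferroni-adjusted two-sided p."""
    pairs = list(combinations(mean_ranks, 2))
    results = {}
    for a, b in pairs:
        se = math.sqrt(n_total * (n_total + 1) / 12 * (1 / sizes[a] + 1 / sizes[b]))
        z = (mean_ranks[a] - mean_ranks[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal p-value
        # Bonferroni: multiply by the number of comparisons, capped at 1.
        results[(a, b)] = (z, min(1.0, p * len(pairs)))
    return results

# Mean ranks R_i / n_i from the worked example.
mean_ranks = {"A": 10 / 4, "B": 26 / 4, "C": 42 / 4}
sizes = {"A": 4, "B": 4, "C": 4}
for (a, b), (z, p_adj) in dunn_bonferroni(mean_ranks, sizes, 12).items():
    print(f"{a} vs {b}: z = {z:.2f}, Bonferroni p = {p_adj:.3f}")
```

On these numbers the A vs B and B vs C contrasts give |z| ≈ 1.57 (adjusted p ≈ 0.35), while A vs C gives |z| ≈ 3.14 (adjusted p ≈ 0.005), so only the A vs C pair would be flagged at α = 0.05.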

6. Assumptions and a Common Misreading

Kruskal-Wallis still has assumptions: observations must be independent, and groups should have similar distribution SHAPES if you want to interpret it as a test of medians. If shapes differ wildly (one group skewed left, another right), a significant result means the distributions differ in some way — not necessarily that the medians differ. This is the same subtlety that trips people up with the Mann-Whitney test. Reporting group medians and a boxplot alongside the H statistic keeps the interpretation honest. And note what Kruskal-Wallis does NOT need: normality or equal variances in the ANOVA sense.

Key Points

•Independence is required; similar group shapes are needed for a clean "median" interpretation.
•With very different shapes, a significant H means "distributions differ," not strictly "medians differ."
•No normality or equal-variance assumption like ANOVA.
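A small per-group summary of the kind recommended above can be produced with Python's `statistics` module. The sketch reuses the teaching-method scores from section 4 as an assumption; note that `statistics.quantiles` defaults to its "exclusive" method, and other software may compute quartiles slightly differently for small groups.

```python
import statistics

# Teaching-method scores from the worked example.
scores = {"A": [55, 60, 62, 58], "B": [70, 72, 68, 75], "C": [80, 78, 85, 82]}

summaries = {}
for name, values in scores.items():
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    summaries[name] = {
        "n": len(values),
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
    }

for name, s in summaries.items():
    print(name, s)
```

Printing medians and quartiles next to H (here, medians of 59, 71, and 81) lets readers judge whether the group shapes are similar enough for the median interpretation.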

7. Running It in StatsIQ

Snap a photo of a multi-group comparison and StatsIQ checks the residuals and variances, recommends one-way ANOVA, Welch’s ANOVA, or Kruskal-Wallis, and runs the choice end to end — for the rank test it shows the pooled ranking, rank sums, the H calculation with any tie correction, and a Dunn’s post-hoc table when H is significant. This content is for educational purposes only.

Key Points

•Assumption checks drive the ANOVA-vs-Kruskal-Wallis recommendation.
•Full ranking and H computation shown step by step.
•Dunn’s post-hoc generated automatically after a significant H.

Key Takeaways

★Kruskal-Wallis is the non-parametric one-way ANOVA; it compares mean ranks across 3+ independent groups.
★H = [12/(N(N+1))] Σ(R_i²/n_i) − 3(N+1), approximately chi-square with k−1 df.
★With k = 2 groups, Kruskal-Wallis equals the Mann-Whitney U test.
★Post-hoc after Kruskal-Wallis is Dunn’s test (not Tukey, which belongs to ANOVA).
★No normality or equal-variance assumption; independence and similar shapes still matter.

Practice Questions

1. You have 4 groups of 6 ordinal satisfaction ratings each. Which omnibus test, and why?

Kruskal-Wallis. The response is ordinal, so means are not meaningful and ANOVA's normality assumption cannot hold; ranking is the appropriate summary across the four groups. Ordinal ratings produce many ties, so apply the tie correction to H.

2. Kruskal-Wallis gives H = 3.1 with 3 groups. Is it significant at α = 0.05?

df = k − 1 = 2, critical chi-square = 5.99. Since 3.1 < 5.99, fail to reject — no evidence the groups differ. Do not run post-hoc tests.

3. After a significant Kruskal-Wallis with 5 groups, a student runs Tukey’s HSD. What is the problem?

Tukey’s HSD assumes normally distributed means and the studentized range distribution — it is an ANOVA post-hoc. The correct follow-up is Dunn’s test (or pairwise Mann-Whitney with a Bonferroni correction).



