advanced · intermediate · 20-25 min

Wilcoxon Signed-Rank Test vs Paired t-Test: When to Use Each

Q: Is the Wilcoxon signed-rank test the same as the Mann-Whitney U test?

No. The signed-rank test is for PAIRED (related) data — two measurements on the same units. The Mann-Whitney U test is for two INDEPENDENT groups. They are often confused because both are rank-based non-parametric tests, but the data structure is completely different. Using Mann-Whitney on paired data throws away the pairing and loses power.

Q: Does the Wilcoxon signed-rank test compare medians?

Approximately, and only under an added assumption. Strictly it tests whether the distribution of differences is symmetric about zero. If you are willing to assume the differences are symmetrically distributed, that null is equivalent to "the median difference is zero." Without symmetry it is a test of stochastic tendency, not strictly of the median.

Q: What sample size do I need for the signed-rank test?

It works for small samples, but there is a floor: with n < 6 non-zero pairs you cannot reach two-tailed significance at α = 0.05 no matter how lopsided the data. Even the most extreme outcome, W = 0, has exact two-tailed probability 2/2^n, which is 0.0625 at n = 5 and first drops below 0.05 at n = 6 (0.03125). From about n = 20 upward the normal approximation is accurate.
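The floor follows from the exact null distribution: each of the 2^n sign patterns is equally likely, and only the all-positive and all-negative patterns give W = 0. A minimal Python sketch of that arithmetic (the function name is ours, chosen for illustration, not taken from any library):

```python
def min_two_tailed_p(n: int) -> float:
    """Smallest achievable exact two-tailed p-value with n non-zero pairs."""
    # Two sign patterns (all + or all -) out of 2**n equally likely ones.
    return 2 / 2 ** n


for n in range(4, 9):
    p = min_two_tailed_p(n)
    verdict = "can reach" if p <= 0.05 else "cannot reach"
    print(f"n = {n}: min p = {p:.4f} -> {verdict} alpha = 0.05")
```

The loop shows the switch between n = 5 and n = 6, which is where the critical-value tables start listing a value for α = 0.05 two-tailed.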

Q: Can I just always use the non-parametric test to be safe?

It is a defensible default when you are unsure about normality, since it sacrifices only about 5% power under normality (its asymptotic relative efficiency against the t-test is 3/π ≈ 0.955). But when the differences truly are normal and n is small, the paired t-test extracts a bit more information and gives a directly interpretable mean difference with a confidence interval. Match the test to what you can defend about the data.

Q: How does StatsIQ choose between the two tests for me?

Snap a photo of the paired dataset or problem and StatsIQ evaluates the differences for normality and outliers, recommends the paired t-test or Wilcoxon signed-rank accordingly, and runs the full procedure with the ranking table and effect size shown. It flags ordinal scales and zero differences that change the method. This content is for educational purposes only.

The paired t-test and the Wilcoxon signed-rank test both compare two related measurements. Here is exactly how each works, when the non-parametric version wins, and two fully worked examples.

What You'll Learn

✓Decide between the paired t-test and the Wilcoxon signed-rank test from the data, not habit.
✓Run the signed-rank procedure by hand, including ranking, sign attachment, and the normal approximation.
✓Report an appropriate effect size for the non-parametric case.

1. Direct Answer: Which Paired Test to Use

Use the paired t-test when you compare two related measurements (before/after, left/right, matched pairs) and the DIFFERENCES are approximately normal — it tests whether the mean difference is zero. Use the Wilcoxon signed-rank test, its non-parametric counterpart, when the differences are skewed, have outliers, are ordinal, or your sample is small enough that normality cannot be trusted — it tests whether the distribution of differences is symmetric about zero (in practice, whether the median difference is zero). Both require that pairs are independent of one another. The trade-off is simple: when normality holds, the t-test has slightly more power; when it does not, the t-test can be badly misled by a single outlier while the signed-rank test barely flinches. The signed-rank test keeps about 95% of the t-test’s power under normality, so the cost of choosing it defensively is small.

Key Points

•Paired t-test: tests the mean difference, assumes the differences are roughly normal.
•Wilcoxon signed-rank: tests the median/symmetry of differences, robust to outliers and skew.
•Both assume pairs are independent of each other.

2. How the Paired t-Test Works

Collapse each pair to a single difference d = (after − before). You now have one sample of differences. Compute the mean difference d̄ and the standard deviation of the differences s_d. The test statistic is t = d̄ / (s_d / √n) with df = n − 1. Compare to the t-distribution. The only normality assumption is on the differences — not on the original two measurements, which is a point students routinely get wrong. Check it with a histogram or Q-Q plot of the d values. With n ≥ ~30 the Central Limit Theorem makes the t-test fairly forgiving; with n = 8 a single skewed difference can dominate d̄ and s_d.

Key Points

•Reduce each pair to one difference, then it is a one-sample t-test on the differences.
•t = d̄ / (s_d / √n), df = n − 1.
•Normality applies to the differences, not the raw measurements.
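To make the "one-sample t-test on the differences" idea concrete, here is a minimal sketch using only the Python standard library. The function name paired_t and the five before/after readings are hypothetical, made up purely for illustration.

```python
import math
import statistics


def paired_t(before, after):
    """Return (mean difference, sd of differences, t statistic, df)."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    # Collapse each pair to a single difference d = after - before.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample sd, n - 1 in the denominator
    t = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1


# Hypothetical readings for five units (illustration only).
before = [140, 152, 138, 147, 150]
after = [135, 145, 136, 140, 146]
d_bar, s_d, t, df = paired_t(before, after)
print(f"mean diff = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.2f}, df = {df}")
```

Notice that the raw before and after columns never enter the formula once the differences are formed, which is why the normality check belongs on the differences.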

3. How the Wilcoxon Signed-Rank Test Works

Step 1: compute each difference d. Step 2: discard any pair with d = 0 (a zero carries no directional information) and reduce n accordingly. Step 3: rank the ABSOLUTE differences from smallest to largest, assigning average ranks to ties. Step 4: reattach the sign of each difference to its rank. Step 5: sum the positive ranks (W+) and the negative ranks (W−). The test statistic W is typically the smaller of W+ and W−. For small n compare W to a signed-rank critical-value table and reject when W is at or below the critical value. For n larger than about 20 use the normal approximation: under the null, W+ has mean n(n+1)/4 and standard deviation √[n(n+1)(2n+1)/24], so z = (W+ − n(n+1)/4) / sd. Using W = min(W+, W−) in place of W+ changes only the sign of z, not its size. A continuity correction, which shrinks the numerator by 0.5 toward zero, improves accuracy.

Key Points

•Rank the absolute differences, then reattach signs.
•W is usually the smaller of the positive-rank sum and negative-rank sum.
•Large-sample normal approximation: mean n(n+1)/4, variance n(n+1)(2n+1)/24.
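The five steps and the normal approximation can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not a reference one: the names signed_rank and normal_approx are ours, the approximation applies no tie correction to the variance, and the 24 non-zero differences in the usage lines are made up so the arithmetic is easy to follow.

```python
import math
from statistics import NormalDist


def signed_rank(diffs):
    """Return (W_plus, W_minus, n) after dropping zero differences."""
    nonzero = [d for d in diffs if d != 0]                      # Step 2
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):                                       # Step 3
        j = i
        while (j + 1 < len(order)
               and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]])):
            j += 1
        avg = (i + j) / 2 + 1        # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    # Steps 4-5: split the ranks by the sign of the difference and sum.
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return w_plus, w_minus, len(nonzero)


def normal_approx(w_plus, n):
    """Two-tailed z and p from the large-sample approximation."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    dev = w_plus - mean
    # Continuity correction: pull the deviation 0.5 toward zero.
    z = math.copysign(max(abs(dev) - 0.5, 0.0), dev) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Made-up differences -3, -2, ..., 21: one zero, three tied pairs of |d|.
diffs = list(range(-3, 22))
w_plus, w_minus, n = signed_rank(diffs)
z, p = normal_approx(w_plus, n)
print(f"n = {n}, W+ = {w_plus}, W- = {w_minus}, z = {z:.3f}, p = {p:.5f}")
```

With these numbers n = 24, the null mean is 150 and the standard deviation is exactly 35, so the z value can be checked by hand as (289.5 − 150 − 0.5) / 35.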

4. Worked Example 1: Blood Pressure Before and After (Paired t)

Ten patients have systolic BP measured before and after a 6-week program. The ten differences (after − before, mmHg) are: −8, −12, −5, −15, −3, −10, −7, −9, −6, −11. The differences look roughly symmetric with no wild outlier, so the paired t-test is reasonable. d̄ = −8.6 mmHg. s_d ≈ 3.57. SE = 3.57/√10 ≈ 1.13. t = −8.6 / 1.127 ≈ −7.63 with df = 9 (dividing by the rounded SE of 1.13 gives −7.61, so carry extra digits through the intermediate steps). The two-tailed p-value is far below 0.001, so the program produced a statistically significant mean reduction of about 8.6 mmHg. A 95% CI for the mean change is −8.6 ± 2.262 × 1.13 = (−11.2, −6.0) mmHg, where 2.262 is the two-tailed 5% critical value of t with 9 df. That is a clinically meaningful change, not just a statistically significant one.

Key Points

•Symmetric, outlier-free differences justify the t-test.
•t ≈ −7.63, df = 9, p < 0.001: a clear effect.
•Always report the confidence interval for the mean change, not just the p-value.
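If you want to check the blood-pressure arithmetic, the short script below recomputes it from the ten differences using the standard library. Working from unrounded intermediate values gives t ≈ −7.63, whereas dividing by an SE already rounded to 1.13 gives −7.61. The critical value 2.262 is hard-coded from the text rather than computed, since the standard library has no t-distribution.

```python
import math
import statistics

diffs = [-8, -12, -5, -15, -3, -10, -7, -9, -6, -11]  # after - before, mmHg
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)      # sample sd of the differences
se = s_d / math.sqrt(n)
t = d_bar / se

T_CRIT = 2.262  # two-tailed 5% critical value of t with 9 df, from the text
ci = (d_bar - T_CRIT * se, d_bar + T_CRIT * se)

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, SE = {se:.2f}, t = {t:.2f}")
print(f"95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```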

5. Worked Example 2: Ordinal Pain Scores (Wilcoxon)

Now eight patients rate pain on a 0–10 ordinal scale before and after treatment. Differences (after − before): −2, −1, −4, 0, −3, +1, −2, −5. The scale is ordinal and one pair shows a 0, so the signed-rank test fits better than a t-test. Drop the zero (n becomes 7). Absolute differences: 2, 1, 4, 3, 1, 2, 5. Ranks of absolute values (averaging ties): the two 1’s get rank 1.5 each; the two 2’s get rank 3.5 each; 3 gets 5; 4 gets 6; 5 gets 7. Reattach signs: negatives at ranks 3.5, 1.5, 6, 5, 3.5, 7 and the single positive at rank 1.5. W+ = 1.5 (the lone positive). W− = 26.5. W = 1.5. For n = 7 the critical value at α = 0.05 (two-tailed) is 2, and W = 1.5 ≤ 2, so reject the null: pain dropped significantly. Note the t-test would have been questionable here — the data are ordinal, not interval.

Key Points

•Ordinal data and a zero difference point to the signed-rank test.
•Drop zeros before ranking and reduce n.
•W = 1.5 ≤ critical value 2 (n = 7) → reject the null.
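The pain-score example can be reproduced in a few lines. As an addition to the table lookup in the text, this sketch also enumerates all 2^7 sign patterns to get an exact two-tailed p-value conditional on the observed (tied) ranks. Printed critical-value tables assume no ties, so this enumeration is one reasonable cross-check rather than the only valid approach.

```python
from itertools import product

diffs = [-2, -1, -4, 0, -3, 1, -2, -5]      # after - before, from the example
nonzero = [d for d in diffs if d != 0]      # drop the zero: n = 7

# Average rank of each |d|: mean of the 1-based positions it occupies.
sorted_abs = sorted(abs(d) for d in nonzero)


def avg_rank(value):
    positions = [i + 1 for i, v in enumerate(sorted_abs) if v == value]
    return sum(positions) / len(positions)


ranks = [avg_rank(abs(d)) for d in nonzero]
w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
w_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
w = min(w_plus, w_minus)

# Under the null every sign pattern is equally likely, so count the
# patterns whose positive-rank sum is at least as small as the observed W.
count = 0
for signs in product((0, 1), repeat=len(ranks)):
    if sum(r for s, r in zip(signs, ranks) if s) <= w:
        count += 1
p_two_tailed = min(1.0, 2 * count / 2 ** len(ranks))

print(f"W+ = {w_plus}, W- = {w_minus}, W = {w}, p = {p_two_tailed:.4f}")
```

Only three of the 128 patterns give a positive-rank sum of 1.5 or less (no positives, or a single positive at one of the two rank-1.5 positions), so the exact two-tailed p-value is 6/128 ≈ 0.047, which agrees with the table-based decision to reject.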

6. Ties, Zeros, and Effect Size

Zeros: the standard Wilcoxon procedure discards them, though some implementations (Pratt’s method) keep and rank them. Check which your software uses, because the p-value can shift. Ties in absolute differences: assign average ranks and apply a tie correction to the variance in the normal approximation, subtracting Σ(t³ − t)/48 from n(n+1)(2n+1)/24, where t is the size of each tie group. For effect size, report the matched-pairs rank-biserial correlation r = (W+ − W−) / (W+ + W−), or the standardized r = Z/√N from the normal approximation, where N is the number of non-zero pairs; r around 0.1, 0.3, and 0.5 maps loosely to small, medium, and large. A p-value alone tells you whether an effect is detectable; the effect size tells you whether anyone should care.

Key Points

•Zeros are usually dropped (Wilcoxon) but Pratt’s method keeps them — confirm the convention.
•Average ranks for ties, with a variance correction in the z approximation.
•Report rank-biserial r or Z/√N as the effect size.
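Both effect sizes, and the tie correction to the variance, take only a few lines to compute. The sketch below reuses the numbers from Worked Example 2. The function names are ours, no continuity correction is applied, and with n = 7 the z value is shown only to illustrate the formula, since the normal approximation is not recommended at that sample size.

```python
import math
from collections import Counter


def rank_biserial(w_plus, w_minus):
    """Matched-pairs rank-biserial correlation, between -1 and 1."""
    return (w_plus - w_minus) / (w_plus + w_minus)


def tie_corrected_z(w_plus, abs_diffs):
    """z for W+ with variance reduced by sum(t**3 - t) / 48 over tie groups."""
    n = len(abs_diffs)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    # Groups of size 1 contribute 0, so untied values change nothing.
    var -= sum(t ** 3 - t for t in Counter(abs_diffs).values()) / 48
    return (w_plus - mean) / math.sqrt(var)


# Values from Worked Example 2 (zero difference already dropped).
abs_diffs = [2, 1, 4, 3, 1, 2, 5]
w_plus, w_minus = 1.5, 26.5

r_rb = rank_biserial(w_plus, w_minus)
z = tie_corrected_z(w_plus, abs_diffs)
r_z = z / math.sqrt(len(abs_diffs))
print(f"rank-biserial r = {r_rb:.3f}, z = {z:.3f}, Z/sqrt(N) = {r_z:.3f}")
```

Here the two tie groups of size 2 lower the variance from 35 to 34.75. The rank-biserial value is −25/28 ≈ −0.89, and the negative sign simply reflects that the differences were taken as after − before.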

7. Running the Comparison in StatsIQ

Snap a photo of a paired-data problem and StatsIQ checks the differences for normality, recommends the paired t-test or the Wilcoxon signed-rank test accordingly, then runs the chosen procedure step by step — including the ranking table, sign attachment, and the normal-approximation z with continuity correction. It also reports the matching effect size so your conclusion is about magnitude, not just significance. This content is for educational purposes only.

Key Points

•Automatic normality check on the paired differences.
•Full ranking-and-signs table shown for the non-parametric route.
•Effect size reported alongside the p-value.

Key Takeaways

★Paired t-test assumes normal DIFFERENCES (not normal raw scores); t = d̄/(s_d/√n), df = n−1.
★Wilcoxon signed-rank tests symmetry/median of differences using signed ranks of absolute differences.
★W is usually min(W+, W−); large-sample mean = n(n+1)/4, sd = √[n(n+1)(2n+1)/24].
★Drop zero differences before ranking (standard method); average ranks for ties.
★Signed-rank retains ~95% of t-test power under normality — cheap insurance against outliers.

Practice Questions

1. Six matched differences are 4, 5, −1, 7, 6, 8. Why would you lean toward a paired t-test here?

The differences are interval-scaled, all but one positive, and show no extreme outlier; with no reason to doubt symmetry the t-test is appropriate and slightly more powerful. You would still inspect a Q-Q plot given the small n.

2. In a signed-rank test n = 5 non-zero pairs all have negative differences. What is W+?

W+ = 0, because there are no positive ranks to sum. W = min(W+, W−) = 0, which is the most extreme possible value. Even so, it cannot reject at α = 0.05 two-tailed when n = 5: the exact two-tailed p-value is 2/2^5 = 0.0625. The tables first list a critical value (0) for α = 0.05 two-tailed at n = 6. It is a reminder that very small samples limit the non-parametric test.

3. Your paired differences are right-skewed with one value triple the others. Which test, and why?

Wilcoxon signed-rank. The outlier and skew violate the t-test’s normality assumption and would inflate s_d, shrinking t; the rank-based test caps the outlier’s influence at its rank.
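To see the answer to question 3 in numbers, the sketch below compares two made-up sets of eight differences that are identical except that the largest value is stretched to about triple the others. Both the data and the helper names are hypothetical, and the rank helper assumes no zeros and no ties.

```python
import math
import statistics


def t_stat(diffs):
    """One-sample t statistic on the differences."""
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def w_plus(diffs):
    """Sum of ranks of |d| over positive differences (no zeros, no ties)."""
    ranked = sorted(diffs, key=abs)
    return sum(rank for rank, d in enumerate(ranked, start=1) if d > 0)


# Hypothetical differences; the second list only stretches the largest value.
mild = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, -0.5]
wild = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 13.5, -0.5]

for label, diffs in (("mild", mild), ("wild", wild)):
    print(f"{label}: t = {t_stat(diffs):.2f}, W+ = {w_plus(diffs)}")
```

The outlier raises the mean but inflates s_d even more, so t falls, while W+ is unchanged because the stretched value still holds the top rank.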
