🎯
advanced · intermediate · 20-25 min

Wilcoxon Signed-Rank Test vs Paired t-Test: When to Use Each

The paired t-test and the Wilcoxon signed-rank test both compare two related measurements. Here is exactly how each works, when the non-parametric version wins, and two fully worked examples.

What You'll Learn

  • ✓ Decide between the paired t-test and the Wilcoxon signed-rank test from the data, not habit.
  • ✓ Run the signed-rank procedure by hand, including ranking, sign attachment, and the normal approximation.
  • ✓ Report an appropriate effect size for the non-parametric case.

1. Direct Answer: Which Paired Test to Use

Use the paired t-test when you compare two related measurements (before/after, left/right, matched pairs) and the DIFFERENCES are approximately normal; it tests whether the mean difference is zero. Use the Wilcoxon signed-rank test, its non-parametric counterpart, when the differences are skewed, have outliers, are ordinal, or your sample is small enough that normality cannot be trusted; it tests whether the distribution of differences is symmetric about zero (in practice, whether the median difference is zero). Both require that pairs are independent of one another. The trade-off is simple: when normality holds, the t-test has slightly more power; when it does not, the t-test can be badly misled by a single outlier while the signed-rank test barely flinches. The signed-rank test keeps about 95% of the t-test’s power under normality (its asymptotic relative efficiency is 3/π ≈ 0.955), so the cost of choosing it defensively is small.

Key Points

  • Paired t-test: tests the mean difference, assumes the differences are roughly normal.
  • Wilcoxon signed-rank: tests the median/symmetry of differences, robust to outliers and skew.
  • Both assume pairs are independent of each other.

2. How the Paired t-Test Works

Collapse each pair to a single difference d = (after − before). You now have one sample of differences. Compute the mean difference d̄ and the standard deviation of the differences s_d. The test statistic is t = d̄ / (s_d / √n) with df = n − 1. Compare it to the t-distribution with n − 1 degrees of freedom. The only normality assumption is on the differences, not on the original two measurements, which is a point students routinely get wrong. Check it with a histogram or Q-Q plot of the d values. With n of about 30 or more the Central Limit Theorem makes the t-test fairly forgiving; with n = 8 a single skewed difference can dominate d̄ and s_d.

Key Points

  • Reduce each pair to one difference, then it is a one-sample t-test on the differences.
  • t = d̄ / (s_d / √n), df = n − 1.
  • Normality applies to the differences, not the raw measurements.
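
As a minimal sketch of the calculation described in this section, the Python below reduces each pair to a difference and computes t. The function name paired_t and the sample readings are hypothetical, invented only for illustration; the sketch uses only the standard library and stops at the test statistic rather than computing a p-value.

```python
import math
import statistics


def paired_t(before, after):
    """Return (mean difference, sd of differences, t statistic, df)."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    # Collapse each pair to a single difference d = after - before
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample sd, n - 1 in the denominator
    t = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1


# Hypothetical readings, invented purely for illustration
before = [140, 152, 138, 147, 160]
after = [134, 145, 136, 140, 151]

d_bar, s_d, t, df = paired_t(before, after)
print(f"mean diff = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.2f}, df = {df}")
```

The resulting t would then be compared with a t-distribution on df degrees of freedom, using a table or a statistics package.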

3. How the Wilcoxon Signed-Rank Test Works

Step 1: compute each difference d. Step 2: discard any pair with d = 0 (a zero carries no directional information) and reduce n accordingly. Step 3: rank the ABSOLUTE differences from smallest to largest, assigning average ranks to ties. Step 4: reattach the sign of each difference to its rank. Step 5: sum the ranks of the positive differences (W+) and, separately, the ranks of the negative differences (W−, reported as a positive total). The test statistic W is typically the smaller of W+ and W−. For small n compare W to a signed-rank critical-value table; for n larger than about 20 use the normal approximation: under the null, W+ has mean n(n+1)/4 and standard deviation √[n(n+1)(2n+1)/24], so z = (W+ − n(n+1)/4) / sd. A continuity correction, which moves the numerator 0.5 toward zero, improves accuracy.

Key Points

  • Rank the absolute differences, then reattach signs.
  • W is usually the smaller of the positive-rank sum and negative-rank sum.
  • Large-sample normal approximation: mean n(n+1)/4, variance n(n+1)(2n+1)/24.
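
The five steps can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not a reference implementation: the function names signed_rank and normal_approx_z and the input values are hypothetical, zeros are dropped (the standard convention described above), and the z formula applies the 0.5 continuity correction but no tie correction to the variance.

```python
import math


def signed_rank(diffs):
    """Return (W+, W-, n) with zeros dropped and ties given average ranks."""
    nonzero = [d for d in diffs if d != 0]  # Step 2: drop zeros
    n = len(nonzero)
    # Step 3: indices sorted by absolute difference
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run while the next absolute value is tied
        while end + 1 < n and abs(nonzero[order[end + 1]]) == abs(nonzero[order[start]]):
            end += 1
        avg_rank = (start + end) / 2 + 1  # average of 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    # Steps 4-5: split the rank sum by the sign of each difference
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return w_plus, w_minus, n


def normal_approx_z(w_plus, n):
    """Large-sample z for W+ with a 0.5 continuity correction (no tie correction)."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    gap = w_plus - mean
    corrected = math.copysign(max(abs(gap) - 0.5, 0.0), gap)
    return corrected / sd


# Hypothetical differences, invented for illustration
w_plus, w_minus, n = signed_rank([3, -1, 2, 0, -4, 2])
print(w_plus, w_minus, n)

# Hypothetical large-sample case: W+ = 100 from n = 25 non-zero pairs
z = normal_approx_z(100, 25)
print(round(z, 3))
```

A quick sanity check on any hand calculation: W+ and W− should always add up to n(n+1)/2.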

4. Worked Example 1: Blood Pressure Before and After (Paired t)

Ten patients have systolic BP measured before and after a 6-week program. The ten differences (after − before, mmHg) are: −8, −12, −5, −15, −3, −10, −7, −9, −6, −11. The differences look roughly symmetric with no wild outlier, so the paired t-test is reasonable. d̄ = −8.6 mmHg. s_d ≈ 3.57. SE = 3.57/√10 ≈ 1.13. t = d̄ / SE ≈ −7.63 with df = 9 (computed from the unrounded SE of 1.127; dividing by the rounded 1.13 gives −7.61). The two-tailed p-value is far below 0.001, so the program produced a statistically significant mean reduction of about 8.6 mmHg. A 95% CI for the mean change is −8.6 ± 2.262 × 1.13 = (−11.2, −6.0) mmHg, which is clinically meaningful, not just statistically significant.

Key Points

  • Symmetric, outlier-free differences justify the t-test.
  • t ≈ −7.63, df = 9, p < 0.001: a clear effect.
  • Always report the confidence interval for the mean change, not just the p-value.
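
The arithmetic in this example can be checked with a short standard-library script. It is a sketch that takes the critical value 2.262 (two-tailed, df = 9) as given in the text rather than computing it; carrying full precision through the standard error can shift the last digit of t compared with rounding SE first.

```python
import math
import statistics

# Differences (after - before, mmHg) from Worked Example 1
diffs = [-8, -12, -5, -15, -3, -10, -7, -9, -6, -11]

n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)   # sample standard deviation of the differences
se = s_d / math.sqrt(n)         # standard error of the mean difference
t = d_bar / se

t_crit = 2.262                  # 95% two-tailed critical value for df = 9, taken from the text
ci = (d_bar - t_crit * se, d_bar + t_crit * se)

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, SE = {se:.2f}")
print(f"t = {t:.2f}, df = {n - 1}")
print(f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```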

5. Worked Example 2: Ordinal Pain Scores (Wilcoxon)

Now eight patients rate pain on a 0–10 ordinal scale before and after treatment. Differences (after − before): −2, −1, −4, 0, −3, +1, −2, −5. The scale is ordinal and one pair shows a 0, so the signed-rank test fits better than a t-test. Drop the zero (n becomes 7). Absolute differences: 2, 1, 4, 3, 1, 2, 5. Ranks of absolute values (averaging ties): the two 1’s get rank 1.5 each; the two 2’s get rank 3.5 each; 3 gets 5; 4 gets 6; 5 gets 7. Reattach signs: negatives at ranks 3.5, 1.5, 6, 5, 3.5, 7 and the single positive at rank 1.5. W+ = 1.5 (the lone positive). W− = 26.5. W = 1.5. For n = 7 the critical value at α = 0.05 (two-tailed) is 2, and W = 1.5 ≤ 2, so reject the null: pain dropped significantly. Note the t-test would have been questionable here, because the data are ordinal, not interval.

Key Points

  • Ordinal data and a zero difference point to the signed-rank test.
  • Drop zeros before ranking and reduce n.
  • W = 1.5 ≤ critical value 2 (n = 7) → reject the null.
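
The ranking table for this example can be reproduced with the sketch below. The helper name average_ranks is hypothetical, and the critical value of 2 is taken from the text, not computed.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(values):
        first = sorted_vals.index(v) + 1   # 1-based position of the first copy
        count = sorted_vals.count(v)       # number of tied copies
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


# Differences (after - before) from Worked Example 2
diffs = [-2, -1, -4, 0, -3, 1, -2, -5]

nonzero = [d for d in diffs if d != 0]            # drop the zero, n becomes 7
ranks = average_ranks([abs(d) for d in nonzero])  # rank the absolute differences

w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
w_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
w = min(w_plus, w_minus)

critical_value = 2  # n = 7, alpha = 0.05 two-tailed, as quoted in the text
for d, r in zip(nonzero, ranks):
    print(f"d = {d:+d}  |d| rank = {r}")
print(f"W+ = {w_plus}, W- = {w_minus}, W = {w}, reject = {w <= critical_value}")
```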

6. Ties, Zeros, and Effect Size

Zeros: the standard Wilcoxon procedure discards them, though some implementations (Pratt’s method) keep them in the ranking. Check which your software uses, because the p-value can shift. Ties in absolute differences: assign average ranks and apply a tie correction to the variance in the normal approximation. For effect size, report the matched-pairs rank-biserial correlation r = (W+ − W−) / (W+ + W−), or the standardized r = Z/√N from the normal approximation, where N is the number of non-zero pairs; r around 0.1, 0.3, and 0.5 maps loosely to small, medium, and large. A p-value only indicates whether an effect is detectable; the effect size tells you whether anyone should care.

Key Points

  • Zeros are usually dropped (Wilcoxon) but Pratt’s method keeps them; confirm the convention.
  • Average ranks for ties, with a variance correction in the z approximation.
  • Report rank-biserial r or Z/√N as the effect size.
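
Both effect sizes can be sketched from the rank sums alone. The function names are hypothetical, and the z used here is the plain normal approximation without continuity or tie correction, so the Z/√N value is only illustrative for a sample as small as the n = 7 pain-score example.

```python
import math


def rank_biserial(w_plus, w_minus):
    """Matched-pairs rank-biserial correlation r = (W+ - W-) / (W+ + W-)."""
    return (w_plus - w_minus) / (w_plus + w_minus)


def r_from_z(w_plus, n):
    """Standardized effect size r = Z / sqrt(N), N = number of non-zero pairs."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd   # no continuity or tie correction in this sketch
    return z / math.sqrt(n)


# Rank sums from the pain-score example: W+ = 1.5, W- = 26.5, n = 7
rb = rank_biserial(1.5, 26.5)
rz = r_from_z(1.5, 7)
print(f"rank-biserial r = {rb:.3f}")
print(f"Z / sqrt(N)     = {rz:.3f}")
```

The sign simply reflects the direction of the differences (here, after minus before is mostly negative); the magnitude is what maps to small, medium, or large.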

7. Running the Comparison in StatsIQ

Snap a photo of a paired-data problem and StatsIQ checks the differences for normality, recommends the paired t-test or the Wilcoxon signed-rank test accordingly, then runs the chosen procedure step by step, including the ranking table, sign attachment, and the normal-approximation z with continuity correction. It also reports the matching effect size so your conclusion is about magnitude, not just significance. This content is for educational purposes only.

Key Points

  • Automatic normality check on the paired differences.
  • Full ranking-and-signs table shown for the non-parametric route.
  • Effect size reported alongside the p-value.

Key Takeaways

  • ★ Paired t-test assumes normal DIFFERENCES (not normal raw scores); t = d̄/(s_d/√n), df = n − 1.
  • ★ Wilcoxon signed-rank tests symmetry/median of differences using signed ranks of absolute differences.
  • ★ W is usually min(W+, W−); large-sample mean = n(n+1)/4, sd = √[n(n+1)(2n+1)/24].
  • ★ Drop zero differences before ranking (standard method); average ranks for ties.
  • ★ Signed-rank retains ~95% of t-test power under normality: cheap insurance against outliers.

Practice Questions

1. Six matched differences are 4, 5, −1, 7, 6, 8. Why would you lean toward a paired t-test here?
The differences are interval-scaled, all but one positive, and show no extreme outlier; with no reason to doubt symmetry the t-test is appropriate and slightly more powerful. You would still inspect a Q-Q plot given the small n.
2. In a signed-rank test n = 5 non-zero pairs all have negative differences. What is W+?
W+ = 0, because there are no positive ranks to sum. W = min(W+, W−) = 0, which is the most extreme possible value. Even so, it cannot reach two-tailed significance at α = 0.05 with n = 5: the exact two-tailed p-value is 2/2^5 = 0.0625. The smallest n with a two-tailed critical value at α = 0.05 is n = 6 (critical value 0). A one-tailed test at α = 0.05 would reject (p = 1/32 ≈ 0.031). This is a reminder that very small samples limit the non-parametric test.
3. Your paired differences are right-skewed with one value triple the others. Which test, and why?
Wilcoxon signed-rank. The outlier and skew violate the t-test’s normality assumption and would inflate s_d, shrinking t; the rank-based test caps the outlier’s influence at its rank.

FAQs

Common questions about this topic

No. The signed-rank test is for PAIRED (related) data: two measurements on the same units. The Mann-Whitney U test is for two INDEPENDENT groups. They are often confused because both are rank-based non-parametric tests, but the data structure is completely different. Using Mann-Whitney on paired data throws away the pairing and loses power.

Approximately, and only under an added assumption. Strictly it tests whether the distribution of differences is symmetric about zero. If you are willing to assume the differences are symmetrically distributed, that null is equivalent to "the median difference is zero." Without symmetry it is a test of stochastic tendency, not strictly of the median.

It works for small samples, but there is a floor: with n < 6 non-zero pairs you cannot reach two-tailed significance at α = 0.05 no matter how lopsided the data, because even the smallest possible W (zero) has an exact two-tailed p-value of 2/2^n, which is 0.0625 at n = 5. From about n = 20 upward the normal approximation is accurate.
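
The small-sample floor can be checked by enumerating the exact null distribution. The sketch below assumes no ties and no zeros, so the ranks are exactly 1 to n and each rank is equally likely to carry either sign; the function name exact_two_sided_p is hypothetical, and the two-sided p-value is taken as double the lower tail.

```python
def exact_two_sided_p(w, n):
    """Exact two-sided p-value for integer W = min(W+, W-) with n untied non-zero pairs."""
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of the ranks {1, ..., n} that sum to s
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    # Each of the 2**n sign patterns is equally likely under the null
    lower_tail = sum(counts[: w + 1]) / 2 ** n
    return min(1.0, 2 * lower_tail)


for n in range(4, 9):
    print(n, exact_two_sided_p(0, n))
```

With W = 0 the result is 0.0625 at n = 5 and 0.03125 at n = 6, which is why six non-zero pairs is the minimum for two-tailed significance at 0.05. The same function gives 0.046875 for W = 2 at n = 7, consistent with a tabled critical value of 2.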

It is a defensible default when you are unsure about normality, since it sacrifices only about 5% power under normality. But when the differences truly are normal and n is small, the paired t-test extracts a bit more information and gives a directly interpretable mean difference with a confidence interval. Match the test to what you can defend about the data.

Snap a photo of the paired dataset or problem and StatsIQ evaluates the differences for normality and outliers, recommends the paired t-test or Wilcoxon signed-rank accordingly, and runs the full procedure with the ranking table and effect size shown. It flags ordinal scales and zero differences that change the method. This content is for educational purposes only.
