Wilcoxon Signed-Rank Test vs Paired t-Test: When to Use Each
The paired t-test and the Wilcoxon signed-rank test both compare two related measurements. Here is exactly how each works, when the non-parametric version wins, and two fully worked examples.
What You'll Learn
- Decide between the paired t-test and the Wilcoxon signed-rank test from the data, not habit.
- Run the signed-rank procedure by hand, including ranking, sign attachment, and the normal approximation.
- Report an appropriate effect size for the non-parametric case.
1. Direct Answer: Which Paired Test to Use
Use the paired t-test when you compare two related measurements (before/after, left/right, matched pairs) and the DIFFERENCES are approximately normal; it tests whether the mean difference is zero. Use the Wilcoxon signed-rank test, its non-parametric counterpart, when the differences are skewed, have outliers, are ordinal, or your sample is small enough that normality cannot be trusted; it tests whether the distribution of differences is symmetric about zero (in practice, whether the median difference is zero). Both require that pairs are independent of one another. The trade-off is simple: when normality holds, the t-test has slightly more power; when it does not, the t-test can be badly misled by a single outlier while the signed-rank test barely flinches. The signed-rank test keeps about 95% of the t-test's power under normality (its asymptotic relative efficiency is 3/π ≈ 0.955), so the cost of choosing it defensively is small.
Key Points
- Paired t-test: tests the mean difference, assumes the differences are roughly normal.
- Wilcoxon signed-rank: tests the median/symmetry of differences, robust to outliers and skew.
- Both assume pairs are independent of each other.
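The outlier trade-off described above can be seen in a minimal Python sketch. The eight differences are hypothetical (they are not from this article), and the rank helper assumes no zeros and no tied absolute values so that it stays short. Replacing one difference with a wild value in the same direction as the effect inflates s_d enough to pull t below the df = 7 critical value of 2.365, while the signed ranks do not change at all.

```python
import math
import statistics

def t_statistic(diffs):
    """One-sample t statistic on the paired differences."""
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

def negative_rank_sum(diffs):
    """W- : sum of ranks of |d| carried by negative differences.
    Simplified sketch: assumes no zeros and no tied absolute values."""
    ordered = sorted(diffs, key=abs)
    return sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)

# Hypothetical differences, all positive; the second list swaps 3.1 for an outlier.
clean = [1.2, 2.5, 3.1, 0.8, 2.0, 1.7, 2.9, 1.1]
with_outlier = [1.2, 2.5, 40.0, 0.8, 2.0, 1.7, 2.9, 1.1]

t_clean = t_statistic(clean)
t_outlier = t_statistic(with_outlier)
w_clean = negative_rank_sum(clean)
w_outlier = negative_rank_sum(with_outlier)

print(f"t (clean)        = {t_clean:.2f}")
print(f"t (with outlier) = {t_outlier:.2f}")
print(f"W- clean = {w_clean}, W- with outlier = {w_outlier}")
```

In both data sets every difference is positive, so W− = 0, the most extreme value possible for n = 8; the outlier still holds the top rank and nothing else moves.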
2. How the Paired t-Test Works
Collapse each pair to a single difference d = (after − before). You now have one sample of differences. Compute the mean difference d̄ and the standard deviation of the differences s_d. The test statistic is t = d̄ / (s_d / √n) with df = n − 1. Compare it to the t-distribution with n − 1 degrees of freedom. The only normality assumption is on the differences, not on the original two measurements, which is a point students routinely get wrong. Check it with a histogram or Q-Q plot of the d values. With n of roughly 30 or more the Central Limit Theorem makes the t-test fairly forgiving; with n = 8 a single skewed difference can dominate d̄ and s_d.
Key Points
- Reduce each pair to one difference, then it is a one-sample t-test on the differences.
- t = d̄ / (s_d / √n), df = n − 1.
- Normality applies to the differences, not the raw measurements.
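As a minimal sketch of the computation just described, using only the Python standard library: the function name `paired_t` and the six before/after readings are illustrative assumptions, not data from this article. The function returns the t statistic and degrees of freedom; looking up the p-value still requires a t table or a statistics package.

```python
import math
import statistics

def paired_t(before, after):
    """Return (d_bar, s_d, t, df) for paired samples, with d = after - before."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)      # mean difference
    s_d = statistics.stdev(diffs)       # sample SD of the differences (n - 1 denominator)
    t = d_bar / (s_d / math.sqrt(n))    # t = d_bar / (s_d / sqrt(n))
    return d_bar, s_d, t, n - 1

# Hypothetical readings for six matched pairs.
before = [140, 152, 138, 160, 145, 150]
after = [135, 147, 139, 151, 140, 146]

d_bar, s_d, t, df = paired_t(before, after)
print(f"mean difference = {d_bar}, s_d = {s_d:.3f}, t = {t:.3f}, df = {df}")
```

Note that the raw `before` and `after` lists are never tested for normality; only the collapsed differences enter the statistic.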
3. How the Wilcoxon Signed-Rank Test Works
Step 1: compute each difference d. Step 2: discard any pair with d = 0 (a zero carries no directional information) and reduce n accordingly. Step 3: rank the ABSOLUTE differences from smallest to largest, assigning average ranks to ties. Step 4: reattach the sign of each difference to its rank. Step 5: sum the positive ranks (W+) and the negative ranks (W−). The test statistic W is typically the smaller of W+ and W−. For small n compare W to a signed-rank critical-value table; for n larger than about 20 use the normal approximation: mean = n(n+1)/4, standard deviation = √[n(n+1)(2n+1)/24], so z = (W − n(n+1)/4) / sd. Either rank sum can go in the numerator, since W+ and W− give the same |z| with opposite signs. A continuity correction that shrinks the distance between W and the mean by 0.5 improves accuracy.
Key Points
- Rank the absolute differences, then reattach signs.
- W is usually the smaller of the positive-rank sum and negative-rank sum.
- Large-sample normal approximation: mean n(n+1)/4, variance n(n+1)(2n+1)/24.
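The five steps and the normal approximation can be sketched in plain Python as follows. The function names and the sample differences are illustrative assumptions. The variance here is the uncorrected n(n+1)(2n+1)/24, with no tie adjustment, and the example uses n = 7 only to keep the arithmetic visible; at that size a critical-value table is the better reference.

```python
import math

def average_ranks(values):
    """Rank values from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank(diffs):
    """Wilcoxon signed-rank sums plus the large-sample z (no tie correction)."""
    nonzero = [d for d in diffs if d != 0]                    # Step 2: drop zeros
    n = len(nonzero)
    ranks = average_ranks([abs(d) for d in nonzero])          # Step 3: rank |d|
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)  # Steps 4-5
    w_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    # w never exceeds the mean, so adding 0.5 moves it toward the mean;
    # min(..., 0) keeps the correction from overshooting when w equals the mean.
    z = min(w - mean + 0.5, 0.0) / sd
    return {"n": n, "w_plus": w_plus, "w_minus": w_minus, "w": w, "z": z}

# Hypothetical differences with one zero and one tie in |d|.
result = signed_rank([3, -1, 0, 4, -2, 5, 2, 6])
print(result)
```

A quick sanity check on any hand calculation: W+ and W− must add up to n(n+1)/2, which is 28 for the seven non-zero pairs here.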
4. Worked Example 1: Blood Pressure Before and After (Paired t)
Ten patients have systolic BP measured before and after a 6-week program. The ten differences (after − before, mmHg) are: −8, −12, −5, −15, −3, −10, −7, −9, −6, −11. The differences look roughly symmetric with no wild outlier, so the paired t-test is reasonable. d̄ = −8.6 mmHg. s_d ≈ 3.57. SE = 3.57/√10 = 1.13. t = −8.6 / 1.13 = −7.61 with df = 9 (−7.63 if the intermediate values are not rounded). The two-tailed p-value is far below 0.001, so the program produced a statistically significant mean reduction of about 8.6 mmHg. A 95% CI for the mean change is −8.6 ± 2.262 × 1.13 = (−11.2, −6.0) mmHg, which is clinically meaningful, not just statistically significant.
Key Points
- Symmetric, outlier-free differences justify the t-test.
- t = −7.61, df = 9, p < 0.001: a clear effect.
- Always report the confidence interval for the mean change, not just the p-value.
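The arithmetic in this example can be checked with a short Python sketch. The ten differences and the critical value 2.262 come from the example itself; the variable names are illustrative. Because nothing is rounded along the way, t comes out near −7.63 rather than the hand-rounded −7.61.

```python
import math
import statistics

# Differences (after - before, mmHg) from the worked example.
diffs = [-8, -12, -5, -15, -3, -10, -7, -9, -6, -11]

n = len(diffs)
d_bar = statistics.mean(diffs)          # mean difference
s_d = statistics.stdev(diffs)           # sample SD of the differences
se = s_d / math.sqrt(n)                 # standard error of the mean difference
t_stat = d_bar / se

t_crit = 2.262                          # two-tailed 95% critical value for df = 9 (from the text)
ci = (d_bar - t_crit * se, d_bar + t_crit * se)

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, SE = {se:.2f}, t = {t_stat:.2f}")
print(f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f}) mmHg")
```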
5. Worked Example 2: Ordinal Pain Scores (Wilcoxon)
Now eight patients rate pain on a 0–10 ordinal scale before and after treatment. Differences (after − before): −2, −1, −4, 0, −3, +1, −2, −5. The scale is ordinal and one pair shows a 0, so the signed-rank test fits better than a t-test. Drop the zero (n becomes 7). Absolute differences: 2, 1, 4, 3, 1, 2, 5. Ranks of absolute values (averaging ties): the two 1's get rank 1.5 each; the two 2's get rank 3.5 each; 3 gets 5; 4 gets 6; 5 gets 7. Reattach signs: negatives at ranks 3.5, 1.5, 6, 5, 3.5, 7 and the single positive at rank 1.5. W+ = 1.5 (the lone positive). W− = 26.5. W = 1.5. For n = 7 the critical value at α = 0.05 (two-tailed) is 2, and W = 1.5 ≤ 2, so reject the null: pain dropped significantly. Note the t-test would have been questionable here, because the data are ordinal, not interval.
Key Points
- Ordinal data and a zero difference point to the signed-rank test.
- Drop zeros before ranking and reduce n.
- W = 1.5 ≤ critical value 2 (n = 7), so reject the null.
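Critical-value tables are built for untied ranks, so with the tied ranks in this example it can be reassuring to compute an exact p-value directly. The sketch below is one way to do that, not the article's own method: under the null each of the 7 ranks is equally likely to carry a plus or a minus sign, so it enumerates all 2^7 = 128 sign patterns and counts those at least as extreme as the observed W. The ranks and signs are copied from the example.

```python
from itertools import product

# Ranks of |d| after dropping the zero, in the order -2, -1, -4, -3, +1, -2, -5.
ranks = [3.5, 1.5, 6.0, 5.0, 1.5, 3.5, 7.0]
signs = [-1, -1, -1, -1, +1, -1, -1]

w_plus = sum(r for r, s in zip(ranks, signs) if s > 0)
w_minus = sum(r for r, s in zip(ranks, signs) if s < 0)
w_obs = min(w_plus, w_minus)
total = sum(ranks)                      # n(n+1)/2 = 28

# Count sign patterns whose smaller rank sum is no larger than the observed W.
count = 0
for pattern in product([0, 1], repeat=len(ranks)):
    w_pos = sum(r for r, positive in zip(ranks, pattern) if positive)
    if min(w_pos, total - w_pos) <= w_obs:
        count += 1

p_exact = count / 2 ** len(ranks)
print(f"W+ = {w_plus}, W- = {w_minus}, W = {w_obs}")
print(f"exact two-tailed p = {count}/128 = {p_exact}")   # 6/128 = 0.046875
```

Six patterns qualify (no positive ranks, either of the two 1.5 ranks alone positive, and the three mirror images), which agrees with the table-based decision to reject at α = 0.05.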
6. Ties, Zeros, and Effect Size
Zeros: the standard Wilcoxon procedure discards them, though some implementations (Pratt's method) keep and rank them. Check which your software uses, because the p-value can shift. Ties in absolute differences: assign average ranks and apply a tie correction to the variance in the normal approximation. For effect size, report the matched-pairs rank-biserial correlation r = (W+ − W−) / (W+ + W−), or the standardized r = Z/√N from the normal approximation, where N is the number of non-zero pairs; |r| around 0.1, 0.3, and 0.5 maps loosely to small, medium, and large. A p-value alone tells you only that an effect is detectable; the effect size tells you whether anyone should care.
Key Points
- Zeros are usually dropped (Wilcoxon) but Pratt's method keeps them, so confirm the convention.
- Average ranks for ties, with a variance correction in the z approximation.
- Report rank-biserial r or Z/√N as the effect size.
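Both effect sizes can be sketched in Python using the pain-score example (W+ = 1.5, W− = 26.5, n = 7, two tied pairs of ranks). The tie correction shown, subtracting Σ(t³ − t)/48 from the variance for tie groups of size t, is the usual textbook form and is an addition here, since the text names the correction without giving the formula. Whether to keep the 0.5 continuity correction when z feeds an effect size is a convention that varies, so treat that line as one assumption among several.

```python
import math

# Values from the ordinal pain-score example.
w_plus, w_minus = 1.5, 26.5
n = 7                         # non-zero pairs
tie_sizes = [2, 2]            # two 1's and two 2's among the absolute differences

# Matched-pairs rank-biserial correlation.
r_rank_biserial = (w_plus - w_minus) / (w_plus + w_minus)

# Normal approximation with a tie-corrected variance.
mean = n * (n + 1) / 4
variance = n * (n + 1) * (2 * n + 1) / 24 - sum(t ** 3 - t for t in tie_sizes) / 48
z = (w_plus - mean + 0.5) / math.sqrt(variance)   # W+ is below the mean, so +0.5 moves toward it
r_from_z = z / math.sqrt(n)

print(f"rank-biserial r = {r_rank_biserial:.3f}")
print(f"z = {z:.3f}, r = Z/sqrt(N) = {r_from_z:.3f}")
```

The negative sign simply reflects the direction (after − before): most ranks belong to decreases in pain. Both magnitudes are well past the loose 0.5 "large" benchmark.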
7. Running the Comparison in StatsIQ
Snap a photo of a paired-data problem and StatsIQ checks the differences for normality, recommends the paired t-test or the Wilcoxon signed-rank test accordingly, then runs the chosen procedure step by step, including the ranking table, sign attachment, and the normal-approximation z with continuity correction. It also reports the matching effect size so your conclusion is about magnitude, not just significance. This content is for educational purposes only.
Key Points
- Automatic normality check on the paired differences.
- Full ranking-and-signs table shown for the non-parametric route.
- Effect size reported alongside the p-value.
Key Takeaways
- Paired t-test assumes normal DIFFERENCES (not normal raw scores); t = d̄/(s_d/√n), df = n−1.
- Wilcoxon signed-rank tests symmetry/median of differences using signed ranks of absolute differences.
- W is usually min(W+, W−); large-sample mean = n(n+1)/4, sd = √[n(n+1)(2n+1)/24].
- Drop zero differences before ranking (standard method); average ranks for ties.
- Signed-rank retains ~95% of t-test power under normality, which makes it cheap insurance against outliers.
Practice Questions
1. Six matched differences are 4, 5, −1, 7, 6, 8. Why would you lean toward a paired t-test here?
2. In a signed-rank test n = 5 non-zero pairs all have negative differences. What is W+?
3. Your paired differences are right-skewed with one value triple the others. Which test, and why?
FAQs
Common questions about this topic
The Wilcoxon signed-rank test is not the same as the Mann-Whitney U test. The signed-rank test is for PAIRED (related) data: two measurements on the same units. The Mann-Whitney U test is for two INDEPENDENT groups. They are often confused because both are rank-based non-parametric tests, but the data structure is completely different. Using Mann-Whitney on paired data throws away the pairing and loses power.
The signed-rank test is a test of the median only approximately, and only under an added assumption. Strictly it tests whether the distribution of differences is symmetric about zero. If you are willing to assume the differences are symmetrically distributed, that null is equivalent to "the median difference is zero." Without symmetry it is a test of stochastic tendency, not strictly of the median.
The signed-rank test works for small samples, but there is a floor: with n < 6 non-zero pairs you cannot reach two-tailed significance at α = 0.05 no matter how lopsided the data, because even the smallest possible W (zero, when every difference has the same sign) is not extreme enough in the exact distribution. From about n = 20 upward the normal approximation is accurate.
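The small-sample floor follows from counting sign patterns, as this short Python sketch shows (the function name is illustrative). With n non-zero, untied pairs there are 2^n equally likely sign patterns under the null, and only two of them (all positive, all negative) give the most extreme W of zero, so the smallest attainable two-tailed p-value is 2/2^n.

```python
def min_two_tailed_p(n):
    """Smallest exact two-tailed p-value for n non-zero, untied pairs."""
    return 2 / 2 ** n

for n in range(4, 9):
    p = min_two_tailed_p(n)
    verdict = "can reach" if p < 0.05 else "cannot reach"
    print(f"n = {n}: min p = {p:.5f} ({verdict} alpha = 0.05)")

# First sample size at which two-tailed significance at 0.05 becomes possible.
smallest_n = next(n for n in range(1, 50) if min_two_tailed_p(n) < 0.05)
```

For n = 5 the floor is 2/32 = 0.0625, which is why five pairs can never be significant at the two-tailed 0.05 level; n = 6 gives 2/64 = 0.03125.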
The signed-rank test is a defensible default when you are unsure about normality, since it sacrifices only about 5% power under normality. But when the differences truly are normal and n is small, the paired t-test extracts a bit more information and gives a directly interpretable mean difference with a confidence interval. Match the test to what you can defend about the data.
Snap a photo of the paired dataset or problem and StatsIQ evaluates the differences for normality and outliers, recommends the paired t-test or Wilcoxon signed-rank accordingly, and runs the full procedure with the ranking table and effect size shown. It flags ordinal scales and zero differences that change the method.