Fundamentals · Beginner · 20 min

Two-Sample t-Test Step by Step: Hypotheses, Calculation, and Interpretation With a Worked Example

A complete step-by-step walkthrough of the independent two-sample t-test, from stating hypotheses through calculating the test statistic, finding the p-value, and writing the conclusion. Includes a fully worked numerical example that you can follow along with.

What You'll Learn

  • State the null and alternative hypotheses for a two-sample t-test
  • Calculate the pooled standard error, t-statistic, and degrees of freedom
  • Determine the p-value and make a decision at a given significance level
  • Write a conclusion in context that answers the original research question

1. The Direct Answer: What the Two-Sample t-Test Does

The independent two-sample t-test assesses whether the means of two groups differ by more than random sampling variation would explain. Example: does a new drug lower blood pressure more than a placebo? The test compares the difference between the sample means to the variability within the samples. If the difference is large relative to that variability, the data provide evidence that the population means really differ (rather than reflecting random fluctuation).

The formula: t = (x̄₁ - x̄₂) / √(s²p/n₁ + s²p/n₂), where x̄₁ and x̄₂ are the sample means, s²p is the pooled variance, and n₁ and n₂ are the sample sizes.

The assumptions: both groups are independent (no subject appears in both), the dependent variable is continuous and approximately normally distributed in each group (or n > 30 per group, a common rule of thumb), and the variances in both groups are approximately equal. Check the last assumption with Levene's test; if the variances are unequal, use Welch's t-test instead.

Snap a photo of any t-test problem and StatsIQ identifies the test type, checks the assumptions, calculates the test statistic, determines the p-value, and writes the conclusion, step by step with every formula shown.

Key Points

  • The t-test compares two group means: is the difference real or just random variation?
  • t = (mean difference) / (standard error of the difference). Larger |t| = more evidence of a real difference.
  • Assumptions: independent groups, continuous dependent variable, approximately normal distributions, equal variances
  • If variances are unequal (Levene's p < 0.05), use Welch's t-test instead of the pooled version
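The pooled formula above can be sketched as a small Python function. This is a minimal illustration, not a replacement for a statistics package: the function name `pooled_t_statistic` and the summary statistics passed to it are hypothetical, chosen so the arithmetic is easy to check by hand, and the equal-variance assumption is taken as given.

```python
import math

def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for the equal-variance two-sample t-test."""
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means.
    std_err = math.sqrt(pooled_var / n1 + pooled_var / n2)
    t = (mean1 - mean2) / std_err
    df = n1 + n2 - 2
    return t, df

# Hypothetical summary statistics (not from the article).
t, df = pooled_t_statistic(mean1=10.0, sd1=2.0, n1=8, mean2=8.0, sd2=2.0, n2=8)
print(f"t = {t:.3f}, df = {df}")  # t = 2.000, df = 14
```

With equal SDs of 2 and 8 subjects per group, the pooled variance is 4 and the standard error is exactly 1, so the 2-point mean difference gives t = 2.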

2. Worked Example: Step by Step With Real Numbers

Research question: Does caffeine improve reaction time? Two groups: caffeine (n = 10) and placebo (n = 10). Reaction time is measured in milliseconds (lower = faster).

Data: Caffeine group: x̄₁ = 245 ms, s₁ = 30 ms. Placebo group: x̄₂ = 268 ms, s₂ = 35 ms.

**Step 1: State hypotheses.**
H₀: μ₁ = μ₂ (no difference in mean reaction time between groups)
H₁: μ₁ ≠ μ₂ (there is a difference); this is a two-tailed test.
Significance level: α = 0.05.

**Step 2: Calculate pooled variance.**
s²p = [(n₁-1)s₁² + (n₂-1)s₂²] / (n₁ + n₂ - 2)
s²p = [(9)(900) + (9)(1225)] / 18
s²p = [8100 + 11025] / 18 = 19125 / 18 = 1062.5

**Step 3: Calculate the t-statistic.**
t = (x̄₁ - x̄₂) / √(s²p/n₁ + s²p/n₂)
t = (245 - 268) / √(1062.5/10 + 1062.5/10)
t = -23 / √(106.25 + 106.25)
t = -23 / √212.5
t = -23 / 14.58
t = -1.578

**Step 4: Degrees of freedom and p-value.**
df = n₁ + n₂ - 2 = 10 + 10 - 2 = 18
Using a t-table or calculator with df = 18 and t = -1.578 (two-tailed): p ≈ 0.132

**Step 5: Decision and conclusion.**
p = 0.132 > α = 0.05. Fail to reject H₀.
Conclusion: There is not sufficient evidence at the 0.05 significance level to conclude that caffeine affects mean reaction time. The 23 ms difference between groups could be due to random variation.

StatsIQ shows every one of these steps when you snap a photo of a t-test problem, including the pooled variance calculation that most students find hardest.

Key Points

  • Step 1: H₀ (no difference) vs H₁ (difference exists). Set α (usually 0.05).
  • Step 2: Pooled variance combines both groups' variability: s²p = weighted average of s₁² and s₂²
  • Step 3: t = mean difference ÷ standard error. Larger |t| = stronger evidence against H₀.
  • Step 4: df = n₁+n₂-2. Look up p-value. Step 5: if p < α, reject H₀. If p ≥ α, fail to reject.
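For readers without a t-table at hand, the p-value lookup in Step 4 can be approximated numerically. The sketch below uses only the Python standard library: it integrates the Student's t density with Simpson's rule. The helper names `t_pdf` and `two_tailed_p` and the step count are assumptions made for this illustration; a statistics package would normally do this with a dedicated distribution function.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and +|t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # The density is symmetric, so the central area is twice the 0..|t| area.
    return 1.0 - 2.0 * area_zero_to_t

# Values from the worked example: t = -1.578 with df = 18.
p = two_tailed_p(-1.578, 18)
print(f"p = {p:.3f}")
```

The result should land close to the p ≈ 0.132 read from the table in Step 4, which is above α = 0.05 and so leads to the same fail-to-reject decision.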

3. Writing the Conclusion: What to Say and What Not to Say

The conclusion must include four elements: the decision (reject or fail to reject H₀), the significance level, the test statistic and p-value, and an interpretation in context.

Good conclusion for our example: "An independent samples t-test was conducted to compare reaction times between the caffeine group (M = 245, SD = 30) and the placebo group (M = 268, SD = 35). There was no statistically significant difference between the groups at the .05 level, t(18) = -1.578, p = .132. Caffeine did not significantly improve reaction time in this sample."

What NOT to say: "We accept the null hypothesis." You never accept H₀; you either reject it or fail to reject it. Failing to reject means the evidence was not strong enough to conclude a difference, not that you have proven the groups are equal. The distinction matters because a small sample might simply lack the power to detect a real difference.

Also do not say: "The result is insignificant." Say "not statistically significant." "Insignificant" implies the result does not matter. "Not statistically significant" means the evidence did not reach the threshold, a subtle but important difference. With a larger sample, the same 23 ms difference might reach significance because the standard error would be smaller.

Effect size complements the p-value: Cohen's d = (x̄₁ - x̄₂) / sp = -23 / √1062.5 = -23 / 32.6 = -0.71. By the usual benchmarks this is a medium-to-large effect. The practical difference (23 ms faster) might be meaningful even though it is not statistically significant with n = 10 per group; the study may have been underpowered.

Key Points

  • Report: test type, t-value, df, p-value, group means and SDs, and interpretation in context
  • NEVER say "accept H₀"; say "fail to reject H₀" (insufficient evidence, not proof of no difference)
  • "Not statistically significant" ≠ "no difference." It means the evidence was not strong enough at this sample size.
  • Cohen's d measures effect SIZE independent of sample size: small (0.2), medium (0.5), large (0.8)
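The point that effect size does not depend on sample size, while the t-statistic does, can be illustrated with a short sketch. It reuses the worked example's mean difference and pooled variance and then asks what t would be if hypothetical larger studies produced exactly the same means and SDs. That is an assumption made purely for illustration, and the function name `t_for_equal_groups` is invented here.

```python
import math

mean_diff = 245 - 268            # caffeine minus placebo, in ms
pooled_sd = math.sqrt(1062.5)    # square root of the pooled variance

# Cohen's d does not involve n, so it is the same for every row below.
cohens_d = mean_diff / pooled_sd

def t_for_equal_groups(n_per_group):
    """t-statistic if both groups had n_per_group subjects and the same SDs."""
    std_err = pooled_sd * math.sqrt(2 / n_per_group)
    return mean_diff / std_err

for n in (10, 20, 40, 80):
    print(f"n = {n:>2} per group: d = {cohens_d:.2f}, t = {t_for_equal_groups(n):.2f}")
```

Because the standard error shrinks with the square root of n, quadrupling the group size doubles |t| while d stays at about -0.71.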

4. Common Variations and When to Use Each

Paired (dependent) t-test: when the same subjects are measured twice (before/after, or matched pairs). The formula uses the differences within each pair rather than comparing group means: t = d̄ / (sd / √n), where d̄ is the mean of the differences and sd is the standard deviation of the differences. Use when: pre-test/post-test designs, matched-pair experiments, or when each subject serves as their own control.

Welch's t-test: when the equal variance assumption is violated (Levene's test p < 0.05, or, as a rough rule of thumb, one SD is more than double the other). Welch's does not pool the variances; it calculates the standard error from each group's variance separately and adjusts the degrees of freedom downward. Some statistical software (for example, R's t.test function) defaults to Welch's because it performs well even when variances are equal, while other tools default to the pooled version, so check which one you are getting. If your professor does not specify, Welch's is the safer choice.

One-sample t-test: comparing a single group mean to a known value (not another group). Example: is this class's average test score significantly different from the national average of 75? t = (x̄ - μ₀) / (s / √n). Use when: you have one sample and a hypothesized population mean.

The choice between these three depends on study design: independent groups → independent t-test (or Welch's). Same subjects measured twice → paired t-test. One group vs a known value → one-sample t-test. StatsIQ identifies which variant applies from the problem description and solves accordingly.
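To make the Welch description concrete, here is a minimal sketch applied to the caffeine example from earlier. The degrees-of-freedom line uses the Welch-Satterthwaite approximation, which is the standard adjustment but is not spelled out in this guide; the function name `welch_t` is hypothetical.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's unequal-variance t-test."""
    # Each group's contribution to the squared standard error, kept separate.
    v1 = sd1**2 / n1
    v2 = sd2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Caffeine (245, SD 30, n 10) vs placebo (268, SD 35, n 10).
t, df = welch_t(245, 30, 10, 268, 35, 10)
print(f"Welch t = {t:.3f}, df = {df:.2f}")  # Welch t = -1.578, df = 17.59
```

With equal group sizes the t-statistic matches the pooled version, and only the degrees of freedom drop slightly (17.59 instead of 18), which is why choosing Welch's costs very little when the variances are in fact similar.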

Key Points

  • Paired t-test: same subjects, two measurements. Uses within-pair differences.
  • Welch's t-test: unequal variances. Does not pool; adjusts df downward. Safer default.
  • One-sample t-test: one group vs a known population value (e.g., national average).
  • Study design determines the variant: independent groups, paired/repeated, or single group vs known value.
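The paired and one-sample formulas are closely related: a paired test is a one-sample test on the within-pair differences, compared against zero. The sketch below shows that relationship using Python's standard `statistics` module. The function names and all three data lists are hypothetical, invented only to keep the arithmetic small.

```python
import math
import statistics

def one_sample_t(values, mu0):
    """Return (t, df) for a one-sample t-test of the mean against mu0."""
    n = len(values)
    std_err = statistics.stdev(values) / math.sqrt(n)  # sample SD, n - 1 divisor
    return (statistics.mean(values) - mu0) / std_err, n - 1

def paired_t(first, second):
    """Return (t, df) for a paired t-test on two equal-length measurements."""
    # A paired test is a one-sample test on the within-pair differences vs 0.
    diffs = [a - b for a, b in zip(first, second)]
    return one_sample_t(diffs, 0.0)

# Hypothetical data, invented purely for illustration.
before = [12, 15, 11, 14, 13]
after = [10, 14, 10, 11, 12]
scores = [78, 82, 75, 80, 85]

print(paired_t(before, after))    # differences 2, 1, 1, 3, 1 -> t close to 4.0, df = 4
print(one_sample_t(scores, 75))   # class mean 80 vs national average 75, df = 4
```

Writing the paired test in terms of the one-sample function makes the design choice explicit: once the differences are computed, the two groups collapse into a single sample.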

Key Takeaways

  • t = (x̄₁ - x̄₂) / SE. Larger |t| = stronger evidence against H₀.
  • Pooled variance: s²p = [(n₁-1)s₁² + (n₂-1)s₂²] / (n₁+n₂-2). Weights by sample size.
  • df = n₁ + n₂ - 2 for pooled t-test. Welch's has adjusted (usually lower) df.
  • Never "accept H₀"; only reject or fail to reject. Failing to reject ≠ proving no difference.
  • Cohen's d: effect size independent of sample size. d = 0.2 small, 0.5 medium, 0.8 large.

Practice Questions

1. Group A (n=15): mean = 82, SD = 12. Group B (n=15): mean = 75, SD = 10. Test at α = 0.05 whether the means differ.
Pooled variance: s²p = [(14)(144) + (14)(100)] / 28 = [2016 + 1400] / 28 = 122. t = (82-75) / √(122/15 + 122/15) = 7 / √16.27 = 7 / 4.033 ≈ 1.736. df = 28. Critical t (two-tailed, α=0.05, df=28) ≈ 2.048. Since |1.736| < 2.048, fail to reject H₀. p ≈ 0.093. Not significant at α = 0.05. Cohen's d = 7/√122 = 7/11.05 = 0.63 (a medium effect that might reach significance with a larger n).


FAQs

Common questions about this topic

A z-test is used when the population standard deviation (σ) is known. A t-test is used when σ is unknown and must be estimated from the sample (s). In practice, σ is almost never known, so the t-test is used in virtually all real applications. The z-test appears in textbooks primarily as an introduction to hypothesis testing before the t-test is taught.

Yes. Snap a photo of any t-test problem and StatsIQ identifies the variant (independent, paired, one-sample, Welch's), states the hypotheses, calculates the pooled variance and t-statistic, determines the p-value, computes Cohen's d, and writes the conclusion, all step by step with every formula shown.
