fundamentals · beginner · 20 min

Two-Sample t-Test Step by Step: Hypotheses, Calculation, and Interpretation With a Worked Example

Q: What is the difference between a t-test and a z-test?

A z-test is used when the population standard deviation (σ) is known. A t-test is used when σ is unknown and must be estimated from the sample (s). In practice, σ is almost never known, so the t-test is used in virtually all real applications. The z-test appears in textbooks primarily as an introduction to hypothesis testing before the t-test is taught.

Q: Can StatsIQ solve t-test problems?

Yes. Snap a photo of any t-test problem and StatsIQ identifies the variant (independent, paired, one-sample, Welch's), states the hypotheses, calculates the pooled variance and t-statistic, determines the p-value, computes Cohen's d, and writes the conclusion — all step by step with every formula shown.

A complete step-by-step walkthrough of the independent two-sample t-test — from stating hypotheses through calculating the test statistic, finding the p-value, and writing the conclusion. Includes a fully worked numerical example that you can follow along with.

What You'll Learn

✓ State the null and alternative hypotheses for a two-sample t-test
✓ Calculate the pooled standard error, t-statistic, and degrees of freedom
✓ Determine the p-value and make a decision at a given significance level
✓ Write a conclusion in context that answers the original research question

1. The Direct Answer: What the Two-Sample t-Test Does

The independent two-sample t-test assesses whether the means of two groups differ by more than random sampling variation would explain. Example: does a new drug lower blood pressure more than a placebo? The test compares the difference between the sample means to the variability within the samples. If the difference is large relative to that variability, the data provide evidence that the population means really differ, rather than the gap being random fluctuation.

The formula: t = (x̄₁ - x̄₂) / √(s²p/n₁ + s²p/n₂), where x̄₁ and x̄₂ are the sample means, s²p is the pooled variance, and n₁ and n₂ are the sample sizes.

The assumptions: both groups are independent (no subject appears in both), the dependent variable is continuous and approximately normally distributed in each group (as a rule of thumb, n > 30 per group makes the test reasonably robust to non-normality), and the variances in both groups are approximately equal (check with Levene's test; if variances are unequal, use Welch's t-test instead).

Snap a photo of any t-test problem and StatsIQ identifies the test type, checks the assumptions, calculates the test statistic, determines the p-value, and writes the conclusion, step by step with every formula shown.
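For readers who prefer typeset notation, the formula above can be written out as follows, together with the pooled variance it relies on (the same definition used in the worked example). Factoring s_p out of the square root gives an equivalent form; it is a restatement for convenience, not a different test.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}
  = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
```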

Key Points

• The t-test compares two group means: is the difference real or just random variation?
• t = (mean difference) / (standard error of the difference). Larger |t| = more evidence of a real difference.
• Assumptions: independent groups, continuous DV, approximately normal, equal variances
• If variances are unequal (Levene's p < 0.05), use Welch's t-test instead of the pooled version

2. Worked Example: Step by Step With Real Numbers

Research question: Does caffeine improve reaction time? Two groups: caffeine (n₁ = 10) and placebo (n₂ = 10). Reaction time is measured in milliseconds (lower = faster).

Data: Caffeine group: x̄₁ = 245 ms, s₁ = 30 ms. Placebo group: x̄₂ = 268 ms, s₂ = 35 ms.

**Step 1: State hypotheses.**
H₀: μ₁ = μ₂ (no difference in mean reaction time between groups)
Hₐ: μ₁ ≠ μ₂ (there is a difference). This is a two-tailed test.
Significance level: α = 0.05.

**Step 2: Calculate pooled variance.**
s²p = [(n₁-1)s₁² + (n₂-1)s₂²] / (n₁ + n₂ - 2)
s²p = [(9)(900) + (9)(1225)] / 18
s²p = [8100 + 11025] / 18 = 19125 / 18 = 1062.5

**Step 3: Calculate the t-statistic.**
t = (x̄₁ - x̄₂) / √(s²p/n₁ + s²p/n₂)
t = (245 - 268) / √(1062.5/10 + 1062.5/10)
t = -23 / √(106.25 + 106.25)
t = -23 / √212.5
t = -23 / 14.58
t = -1.578

**Step 4: Degrees of freedom and p-value.**
df = n₁ + n₂ - 2 = 10 + 10 - 2 = 18
Using a t-table or calculator with df = 18 and t = -1.578 (two-tailed): p ≈ 0.132

**Step 5: Decision and conclusion.**
p = 0.132 > α = 0.05. Fail to reject H₀.
Conclusion: There is not sufficient evidence at the 0.05 significance level to conclude that caffeine affects reaction time. The 23 ms difference between groups could be due to random variation.

StatsIQ shows every one of these steps when you snap a photo of a t-test problem, including the pooled variance calculation that most students find hardest.
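The five steps can also be checked in code. The sketch below uses only the Python standard library and works from the summary statistics in the example. The function names (`pooled_t_test`, `t_pdf`, `two_tailed_p`) are illustrative, and the p-value is approximated by numerically integrating the t density with Simpson's rule, which is one simple option rather than how a statistics package would necessarily do it.

```python
import math

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled two-sample t-test from summary statistics.

    Returns (t, df, pooled_variance).
    """
    df = n1 + n2 - 2
    # Step 2: pooled variance, each group's variance weighted by (n - 1)
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Step 3: standard error of the difference, then the t-statistic
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    t = (mean1 - mean2) / se
    return t, df, sp2

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed p-value using Simpson's rule on [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3        # P(0 < T < |t|)
    return 1 - 2 * central         # P(|T| > |t|)

# Caffeine vs placebo, values taken from the worked example
t, df, sp2 = pooled_t_test(245, 30, 10, 268, 35, 10)
p = two_tailed_p(t, df)
print(f"pooled variance = {sp2:.1f}, t = {t:.3f}, df = {df}, p = {p:.3f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

The printed values should agree with the hand calculation to rounding. If SciPy is available, `scipy.stats.ttest_ind_from_stats` with `equal_var=True` is a ready-made alternative to the hand-rolled integration.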

Key Points

• Step 1: H₀ (no difference) vs Hₐ (difference exists). Set α (usually 0.05).
• Step 2: Pooled variance combines both groups' variability: s²p = weighted average of s₁² and s₂²
• Step 3: t = mean difference ÷ standard error. Larger |t| = stronger evidence against H₀.
• Step 4: df = n₁+n₂-2. Look up p-value. Step 5: if p < α, reject H₀. If p ≥ α, fail to reject.

3. Writing the Conclusion: What to Say and What Not to Say

The conclusion must include four elements: the decision (reject or fail to reject H₀), the significance level, the test statistic and p-value, and an interpretation in context.

Good conclusion for our example: "An independent samples t-test was conducted to compare reaction times between the caffeine group (M = 245, SD = 30) and the placebo group (M = 268, SD = 35). There was no statistically significant difference between the groups, t(18) = -1.578, p = .132. Caffeine did not significantly improve reaction time in this sample."

What NOT to say: "We accept the null hypothesis." You never accept H₀. You either reject it or fail to reject it. Failing to reject means the evidence was not strong enough to conclude a difference, not that you have proven the groups are equal. The distinction matters because a small sample might simply lack the power to detect a real difference.

Also do not say: "The result is insignificant." Say "not statistically significant." "Insignificant" implies the result does not matter. "Not statistically significant" means the evidence did not reach the threshold, a subtle but important difference. With a larger sample, the same 23 ms difference might reach significance because the standard error would be smaller.

Effect size complements the p-value: Cohen's d = (x̄₁ - x̄₂) / sp = -23 / √1062.5 = -23 / 32.6 = -0.71. In absolute value this falls between the conventional medium (0.5) and large (0.8) benchmarks, so it is a medium-to-large effect. The practical difference (23 ms faster) might be meaningful even though it is not statistically significant with n = 10 per group. The study may have been underpowered.
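To make the link between effect size and sample size concrete, here is a small sketch. It computes Cohen's d from the example's summary statistics and then shows how t would change if the same means and SDs were observed with more subjects per group. The larger sample sizes are hypothetical, not data from the example, and the function names are illustrative. For equal group sizes n, the standard error is sp·√(2/n), so t = d·√(n/2).

```python
import math

def cohens_d(mean1, mean2, pooled_sd):
    """Standardized mean difference using the pooled standard deviation."""
    return (mean1 - mean2) / pooled_sd

def t_for_equal_groups(d, n_per_group):
    """t-statistic implied by effect size d when both groups have n subjects."""
    return d * math.sqrt(n_per_group / 2)

pooled_sd = math.sqrt(1062.5)          # pooled variance from the worked example
d = cohens_d(245, 268, pooled_sd)
print(f"Cohen's d = {d:.2f}")

# Hypothetical: identical means and SDs, but more subjects per group
for n in (10, 20, 40):
    print(f"n = {n:2d} per group -> t = {t_for_equal_groups(d, n):.2f}")
```

The effect size stays fixed while |t| grows with √n. At n = 20 per group the same 23 ms gap would give |t| of roughly 2.23, which exceeds the two-tailed critical value of about 2.02 for df = 38, whereas at n = 10 it falls short of the critical value of about 2.10 for df = 18.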

Key Points

• Report: test type, t-value, df, p-value, group means and SDs, and interpretation in context
• NEVER say "accept H₀". Say "fail to reject H₀" (insufficient evidence, not proof of no difference)
• "Not statistically significant" ≠ "no difference." It means the evidence was not strong enough at this sample size.
• Cohen's d measures effect SIZE independent of sample size: small (0.2), medium (0.5), large (0.8)

4. Common Variations and When to Use Each

Paired (dependent) t-test: when the same subjects are measured twice (before/after, or matched pairs). The formula uses the differences within each pair rather than comparing group means: t = d̄ / (sd / √n), where d̄ is the mean of the differences, sd is the standard deviation of the differences, and n is the number of pairs. Use when: pre-test/post-test designs, matched-pair experiments, or when each subject serves as their own control.

Welch's t-test: when the equal variance assumption is violated (Levene's test p < 0.05, or as a rough guide one SD is more than double the other). Welch's does not pool the variances. It calculates the standard error from each group's variance separately and adjusts the degrees of freedom downward. Some statistical software defaults to Welch's (R's t.test function does, for example), and it performs well even when variances are equal. If your professor does not specify, Welch's is the safer choice.

One-sample t-test: comparing a single group mean to a known value (not another group). Example: is this class's average test score significantly different from the national average of 75? t = (x̄ - μ₀) / (s / √n). Use when: you have one sample and a hypothesized population mean.

The choice between these variants depends on study design: independent groups → independent t-test (or Welch's). Same subjects measured twice → paired t-test. One group vs a known value → one-sample t-test. StatsIQ identifies which variant applies from the problem description and solves accordingly.
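The three variants can be sketched as short standard-library functions. This is a minimal illustration, not a replacement for a statistics package. The function names are illustrative, the Welch degrees of freedom use the Welch-Satterthwaite approximation, and the paired and one-sample inputs below are made-up numbers chosen only to exercise the formulas.

```python
import math
import statistics

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t and Welch-Satterthwaite df from summary statistics."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2           # each group's variance of the mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)    # standard error without pooling
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

def paired_t(before, after):
    """Paired t-test statistic: t = mean(d) / (sd(d) / sqrt(n))."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

def one_sample_t(mean, sd, n, mu0):
    """One-sample t-test statistic: t = (mean - mu0) / (sd / sqrt(n))."""
    return (mean - mu0) / (sd / math.sqrt(n)), n - 1

# Welch's test on the caffeine/placebo summary statistics from the worked example
t_w, df_w = welch_t(245, 30, 10, 268, 35, 10)
print(f"Welch: t = {t_w:.3f}, df = {df_w:.2f}")

# Hypothetical before/after reaction times (ms) for five subjects
before = [270, 265, 280, 255, 290]
after = [260, 262, 270, 250, 275]
t_p, df_p = paired_t(before, after)
print(f"Paired: t = {t_p:.2f}, df = {df_p}")

# Hypothetical class: mean 78, SD 8, n = 25, compared with a national average of 75
t_1, df_1 = one_sample_t(78, 8, 25, 75)
print(f"One-sample: t = {t_1:.3f}, df = {df_1}")
```

With equal group sizes, Welch's t-statistic matches the pooled one and only the degrees of freedom shrink slightly (about 17.6 instead of 18 for the caffeine data), which is part of why defaulting to Welch's costs little.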

Key Points

• Paired t-test: same subjects, two measurements. Uses within-pair differences.
• Welch's t-test: unequal variances. Does not pool, and adjusts df downward. Safer default.
• One-sample t-test: one group vs a known population value (e.g., national average).
• Study design determines the variant: independent groups, paired/repeated, or single group vs known value.

Key Takeaways

★ t = (x̄₁ - x̄₂) / SE. Larger |t| = stronger evidence against H₀.
★ Pooled variance: s²p = [(n₁-1)s₁² + (n₂-1)s₂²] / (n₁+n₂-2). Weights each group's variance by its degrees of freedom (n - 1).
★ df = n₁ + n₂ - 2 for pooled t-test. Welch's has adjusted (usually lower) df.
★ Never "accept H₀". Only reject or fail to reject. Failing to reject ≠ proving no difference.
★ Cohen's d: effect size independent of sample size. d = 0.2 small, 0.5 medium, 0.8 large.

Practice Questions

1. Group A (n=15): mean = 82, SD = 12. Group B (n=15): mean = 75, SD = 10. Test at α = 0.05 whether the means differ.

Pooled variance: s²p = [(14)(144) + (14)(100)] / 28 = [2016 + 1400] / 28 = 3416 / 28 = 122.
Test statistic: t = (82 - 75) / √(122/15 + 122/15) = 7 / √16.27 = 7 / 4.033 ≈ 1.736.
Degrees of freedom: df = 15 + 15 - 2 = 28. Critical t (two-tailed, α = 0.05, df = 28) ≈ 2.048.
Decision: since |1.736| < 2.048, fail to reject H₀ (p ≈ 0.09). The difference is not statistically significant at α = 0.05.
Effect size: Cohen's d = 7 / √122 = 7 / 11.05 = 0.63, a medium effect, so the difference may reach significance with a larger n.
