Advanced · 35 min

Two-Way ANOVA: Main Effects, Interactions, and Worked Examples

Two-way ANOVA tests two factors simultaneously. This guide walks through main effects, interaction effects, and the F-tests — with numerical examples you can reproduce step-by-step.

What You'll Learn

  • Distinguish one-way from two-way ANOVA and know when to use each.
  • Compute and interpret main effects and interaction effects.
  • Decompose total sum of squares into its components.
  • Read and interpret a two-way ANOVA output table.
  • Recognize when interaction effects change the interpretation of main effects.

1. When You Need Two-Way ANOVA

Two-way ANOVA tests the effect of two categorical independent variables (factors) on one continuous dependent variable. Unlike running two separate one-way ANOVAs, two-way ANOVA also tests whether the factors interact, that is, whether the effect of one factor depends on the level of the other.

Example setup: a researcher tests whether plant growth (dependent variable, continuous) depends on fertilizer type (Factor A: three types) AND sunlight level (Factor B: two levels, full/partial). The design has 3 × 2 = 6 treatment combinations. Two-way ANOVA answers three questions at once: (1) does fertilizer affect growth? (2) does sunlight affect growth? (3) does the fertilizer effect depend on sunlight level (the interaction)?

Why not two one-way ANOVAs? Running separate tests ignores the interaction, inflates Type I error across the multiple tests, and misses variance that the two-factor model captures. Two-way ANOVA is more statistically efficient and more interpretively rich.

Key Points

  • Two-way ANOVA tests effects of two factors simultaneously.
  • Adds an interaction test that one-way ANOVA cannot provide.
  • More statistically efficient than running separate one-way ANOVAs.
  • Requires a continuous outcome and two categorical predictors.
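
The factorial structure above can be sketched in a few lines. A minimal sketch using the fertilizer example; the factor labels are illustrative:

```python
# Sketch of the 3 x 2 factorial design from the fertilizer example.
# A two-way design crosses every level of Factor A with every level of
# Factor B, producing a x b treatment cells.
from itertools import product

fertilizer = ["Type 1", "Type 2", "Type 3"]   # Factor A: 3 levels
sunlight = ["Full", "Partial"]                # Factor B: 2 levels

cells = list(product(fertilizer, sunlight))
print(len(cells))  # 6 treatment combinations
for cell in cells:
    print(cell)
```

Each of the six cells receives its own group of plants; the ANOVA then compares variability between and within these cells.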

2. Main Effects Versus Interaction Effects

Main effect of Factor A: the effect of Factor A averaged across all levels of Factor B. A significant main effect of fertilizer means plants grow differently across fertilizer types, on average, ignoring sunlight.

Main effect of Factor B: the effect of Factor B averaged across all levels of Factor A. A significant main effect of sunlight means plants grow differently across sunlight levels, on average, ignoring fertilizer.

Interaction effect A×B: whether the effect of one factor depends on the level of the other. A significant interaction means fertilizer affects growth differently in full sun than in partial sun. For example, fertilizer 1 might be best in full sun but worst in partial sun.

Interpretation priority rule: always check the interaction first. If the interaction is significant, the main effects alone are misleading; you must describe effects separately at each level of the other factor (simple main effects). If the interaction is not significant, main effects can be interpreted on their own.

Visual shortcut: plot group means on a line graph with one factor on the x-axis and the other shown as different lines. Parallel lines suggest no interaction. Non-parallel lines suggest an interaction (crossing lines indicate a strong interaction; diverging lines indicate a weaker one).

Key Points

  • Main effect of A = effect of A averaged across B levels.
  • Main effect of B = effect of B averaged across A levels.
  • Interaction A×B = whether effect of A depends on level of B.
  • Always check interaction first — it changes how main effects are interpreted.
  • Parallel lines on an interaction plot = no interaction; crossing lines = strong interaction.
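
The parallel-lines intuition can be checked numerically for a 2 × 2 table of cell means. A sketch; `interaction_contrast` is an illustrative helper, not a standard library function:

```python
# Difference-of-differences check for a 2 x 2 table of cell means.
# If Factor A has the same effect at both levels of B, the lines on an
# interaction plot are parallel and this contrast is zero.

def interaction_contrast(a1b1: float, a1b2: float,
                         a2b1: float, a2b2: float) -> float:
    """(A effect at B2) minus (A effect at B1); 0 means parallel lines."""
    return (a2b2 - a1b2) - (a2b1 - a1b1)

# Parallel lines: moving A1 -> A2 adds 4 points at both levels of B.
print(interaction_contrast(10, 14, 14, 18))  # 0
# Crossover: the A effect reverses direction across B.
print(interaction_contrast(10, 14, 16, 12))  # -8
```

A nonzero contrast in the sample does not by itself establish an interaction; the F_AB test decides whether it exceeds what chance would produce.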

3. Sum of Squares Decomposition

Two-way ANOVA partitions total variability into components:

SS_total = SS_A + SS_B + SS_AB + SS_error

- SS_A: variability explained by Factor A (differences between A-level means).
- SS_B: variability explained by Factor B (differences between B-level means).
- SS_AB: variability explained by the interaction (how much cell means differ from what main effects alone predict).
- SS_error: unexplained variability within cells (residual).

Degrees of freedom:

- df_A = (levels of A) − 1
- df_B = (levels of B) − 1
- df_AB = df_A × df_B
- df_error = N − (levels of A × levels of B), where N is the total number of observations
- df_total = N − 1

Mean squares: MS = SS / df for each component.

F-statistics (each main effect and the interaction has its own F-test):

- F_A = MS_A / MS_error, with df_A and df_error
- F_B = MS_B / MS_error, with df_B and df_error
- F_AB = MS_AB / MS_error, with df_AB and df_error

Compare each F to the critical F-value at the chosen alpha (typically 0.05), or use p-values to decide significance.
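
The degrees-of-freedom bookkeeping is easy to get wrong by hand. A minimal helper, assuming a balanced between-subjects design; the function name is illustrative:

```python
# Degrees of freedom for an a x b between-subjects two-way ANOVA,
# following the formulas above. n_total is the total number of observations.

def two_way_df(a_levels: int, b_levels: int, n_total: int):
    """Return (df_A, df_B, df_AB, df_error, df_total)."""
    df_a = a_levels - 1
    df_b = b_levels - 1
    df_ab = df_a * df_b
    df_error = n_total - a_levels * b_levels
    df_total = n_total - 1
    return df_a, df_b, df_ab, df_error, df_total

# 3 fertilizer types x 2 sunlight levels, 5 observations per cell (N = 30)
print(two_way_df(3, 2, 30))  # (2, 1, 2, 24, 29)
```

Note that the component df always sum to df_total: (a − 1) + (b − 1) + (a − 1)(b − 1) + (N − ab) = N − 1.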

Key Points

  • Total SS partitions into SS_A + SS_B + SS_AB + SS_error.
  • Each component has its own degrees of freedom.
  • MS = SS / df for each component.
  • Three F-tests: one for each main effect and one for the interaction.
  • Each F-test uses MS_error as denominator.

4. Worked Example: 3 × 2 Design with 5 Observations per Cell

Research question: does teaching method (Factor A: Lecture, Discussion, Flipped) interact with class size (Factor B: Small, Large) to affect exam score?

Data structure: 3 methods × 2 sizes = 6 cells, 5 students per cell, N = 30 total.

Hypothetical cell means (rows = method, columns = size):

             Small   Large
Lecture        78      72
Discussion     82      74
Flipped        88      71

Row means (main effect A, averaging across B):
  Lecture: (78 + 72)/2 = 75
  Discussion: (82 + 74)/2 = 78
  Flipped: (88 + 71)/2 = 79.5

Grand mean (GM): (75 + 78 + 79.5)/3 = 77.5

Column means (main effect B, averaging across A):
  Small: (78 + 82 + 88)/3 = 82.67
  Large: (72 + 74 + 71)/3 = 72.33

Step 1: compute SS_A (between-row variability, scaled by n per row). Each row has 2 × 5 = 10 observations.
SS_A = 10 × [(75 − 77.5)² + (78 − 77.5)² + (79.5 − 77.5)²] = 10 × [6.25 + 0.25 + 4] = 105
df_A = 3 − 1 = 2
MS_A = 105 / 2 = 52.5

Step 2: compute SS_B (between-column variability). Each column has 3 × 5 = 15 observations.
SS_B = 15 × [(82.67 − 77.5)² + (72.33 − 77.5)²] = 15 × [26.69 + 26.69] ≈ 800.83
df_B = 2 − 1 = 1
MS_B = 800.83 / 1 = 800.83
(Carry the unrounded column means, 82.6667 and 72.3333, through the calculation; rounding them first gives 802.0.)

Step 3: compute SS_AB (interaction). Interaction SS comes from how much each cell mean deviates from what the main effects alone predict:
Expected cell mean = grand mean + (row effect) + (column effect)
For Lecture-Small: 77.5 + (75 − 77.5) + (82.67 − 77.5) = 80.17
Actual cell mean: 78. Deviation: 78 − 80.17 = −2.17; squared ≈ 4.69 (unrounded).
Repeating for all 6 cells, the squared deviations are 4.69, 4.69, 1.36, 1.36, 11.11, and 11.11, which sum to 34.33. Multiply by n per cell (5):
SS_AB = 5 × 34.33 = 171.67
df_AB = 2 × 1 = 2
MS_AB = 171.67 / 2 = 85.83

Step 4: compute SS_error (within-cell variability): the sum of squared deviations of each observation from its cell mean. With the raw data, suppose SS_error = 288.
df_error = N − (3 × 2) = 30 − 6 = 24
MS_error = 288 / 24 = 12

Step 5: compute F-statistics and compare to critical values (α = 0.05):
F_A = MS_A / MS_error = 52.5 / 12 = 4.38, df(2, 24), critical F ≈ 3.40 → significant
F_B = MS_B / MS_error = 800.83 / 12 = 66.74, df(1, 24), critical F ≈ 4.26 → significant
F_AB = MS_AB / MS_error = 85.83 / 12 = 7.15, df(2, 24), critical F ≈ 3.40 → significant

Conclusion: all three tests are significant, including the interaction, so the main effects should not be interpreted on their own. The cell means show why: the flipped classroom is the best method in small classes (88) but the worst in large classes (71), so the effect of teaching method depends on class size. A significant interaction calls for simple main effects, covered in the next section.
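
The between-cell sums of squares can be reproduced from the cell means alone; SS_error needs the raw scores, so it is left out. A plain-Python sketch that carries full precision rather than rounded intermediate values; variable names are illustrative:

```python
# Reproduce the 3 x 2 worked example's SS_A, SS_B, and SS_AB from the
# cell means and the per-cell sample size.

cell_means = {
    ("Lecture", "Small"): 78, ("Lecture", "Large"): 72,
    ("Discussion", "Small"): 82, ("Discussion", "Large"): 74,
    ("Flipped", "Small"): 88, ("Flipped", "Large"): 71,
}
n_per_cell = 5
methods = ["Lecture", "Discussion", "Flipped"]   # Factor A levels
sizes = ["Small", "Large"]                       # Factor B levels

# Balanced design, so the grand mean is the mean of the cell means.
grand = sum(cell_means.values()) / len(cell_means)
row_mean = {m: sum(cell_means[(m, s)] for s in sizes) / len(sizes) for m in methods}
col_mean = {s: sum(cell_means[(m, s)] for m in methods) / len(methods) for s in sizes}

# SS_A: row-mean deviations from the grand mean, scaled by obs per row.
ss_a = len(sizes) * n_per_cell * sum((row_mean[m] - grand) ** 2 for m in methods)
# SS_B: column-mean deviations, scaled by obs per column.
ss_b = len(methods) * n_per_cell * sum((col_mean[s] - grand) ** 2 for s in sizes)
# SS_AB: cell-mean deviations from the additive (main-effects-only) prediction.
ss_ab = n_per_cell * sum(
    (cell_means[(m, s)]
     - (grand + (row_mean[m] - grand) + (col_mean[s] - grand))) ** 2
    for m in methods for s in sizes
)

print("SS_A =", round(ss_a, 2))    # SS_A = 105.0
print("SS_B =", round(ss_b, 2))    # SS_B = 800.83
print("SS_AB =", round(ss_ab, 2))  # SS_AB = 171.67
```

Computing at full precision avoids the rounding drift that creeps in when hand calculations round the column means to two decimals before squaring.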

Key Points

  • Compute row means, column means, and grand mean first.
  • SS_A: row-mean deviations from grand mean, scaled by observations per row.
  • SS_B: column-mean deviations from grand mean, scaled by observations per column.
  • SS_AB: how much cell means deviate from what main effects alone predict.
  • SS_error: within-cell variability (deviation of each observation from its cell mean).
  • Three F-tests share MS_error as denominator.

5. When the Interaction Is Significant: Simple Main Effects

When F_AB is significant, the main effects alone are misleading. You must describe effects at each level of the other factor; these are called simple main effects (or simple effects).

Example where the interaction is significant: imagine a different dataset where the flipped classroom was best in small classes (mean = 92) but worst in large classes (mean = 65), while the other methods were more consistent. Here, averaging across class size would hide that the flipped method is either the best or the worst depending on context. The main effect of teaching method would look small because the strong small-class performance is canceled by the weak large-class performance.

How to report: rather than saying "flipped classrooms produce higher scores," you must say "in small classes, flipped produces the highest scores; in large classes, flipped produces the lowest scores." You are describing the effect of Factor A separately at each level of Factor B.

Statistical procedures: use simple main effect F-tests (a one-way ANOVA within each level of the other factor), often with a corrected error term from the full model. Post-hoc tests (Tukey HSD, Bonferroni) then identify which specific pairs differ within each level.

This is why the interaction test matters. A non-trivial share of two-way ANOVA results have significant interactions, and reporting only the main effects in those cases is factually incorrect.
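
The simple-main-effects computation can be sketched from the worked example's cell means, using the full-model MS_error (12, with df = 24) as the error term. This illustrates the procedure only; a complete analysis would also compute p-values from the F(2, 24) distribution:

```python
# Simple main effect of teaching method (Factor A) within each class
# size (Factor B), using cell means from the worked example and the
# full-model MS_error as the error term.

cell_means = {
    ("Lecture", "Small"): 78, ("Discussion", "Small"): 82, ("Flipped", "Small"): 88,
    ("Lecture", "Large"): 72, ("Discussion", "Large"): 74, ("Flipped", "Large"): 71,
}
methods = ["Lecture", "Discussion", "Flipped"]
n_per_cell, ms_error = 5, 12.0

results = {}
for size in ["Small", "Large"]:
    means = [cell_means[(m, size)] for m in methods]
    level_mean = sum(means) / len(means)
    # SS for the method effect within this one class size.
    ss_simple = n_per_cell * sum((m - level_mean) ** 2 for m in means)
    df_simple = len(methods) - 1                    # a - 1 = 2
    f_simple = (ss_simple / df_simple) / ms_error   # tested against F(2, 24)
    results[size] = round(f_simple, 2)

print(results)  # {'Small': 10.56, 'Large': 0.97}
```

Against the critical value F(2, 24) ≈ 3.40, teaching method matters in small classes but not in large ones, which is exactly the pattern a significant interaction forces you to report level by level.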

Key Points

  • Significant interaction = main effects alone are misleading.
  • Describe Factor A effects separately at each level of Factor B.
  • Use simple main effects F-tests, not just the overall main effect.
  • Post-hoc tests identify specific group differences within each simple main effect.
  • Interaction plots visualize the pattern that the statistics capture.

6. Assumptions, Diagnostics, and Common Pitfalls

Two-way ANOVA assumes:

1. Normality of residuals within each cell (check with Q-Q plots; more important for small n).
2. Homogeneity of variance across cells (check with Levene's test; the ratio of largest to smallest cell variance should be less than 4:1).
3. Independence of observations (a design issue; it cannot be tested statistically and must be satisfied by how the data were collected).
4. An interval or ratio dependent variable.

Diagnostics:

- Plot residuals vs. fitted values (should show random scatter, no funnel shape).
- Q-Q plot of residuals (should be approximately linear).
- Levene's test p-value > 0.05 supports homogeneity.
- Shapiro-Wilk test for normality within cells.

Common pitfalls:

1. Running two one-way ANOVAs instead of a two-way. This misses the interaction and can produce contradictory results.
2. Interpreting main effects when the interaction is significant. Read the interaction first; if it is significant, the main-effect story is wrong or incomplete.
3. Unequal cell sizes (unbalanced design). These require Type III sum of squares (the default in SPSS, but not in R without specification). Type I and Type II SS give different results when cells are unbalanced; Type III is usually what you want.
4. Ignoring cell size requirements. Power depends on cell size, not total N: 30 observations spread over 10 cells (3 per cell) has very low power. Aim for at least 10 per cell for meaningful tests; ideally 20+.
5. Using two-way ANOVA on repeated measures. If the same subjects are measured under multiple conditions, you need repeated-measures ANOVA, which accounts for within-subject correlation. Two-way ANOVA assumes independent observations.
6. Treating a continuous predictor as categorical just to use ANOVA. If one of your "factors" is continuous, ANCOVA or multiple regression is usually more appropriate.
7. Reporting only the F-statistic without an effect size. Report η² (eta squared) or ω² (omega squared): SS_effect / SS_total tells you what proportion of variability the effect explains. A significant F with η² = 0.01 is statistically significant but practically trivial.
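
The 4:1 variance-ratio rule of thumb from assumption 2 can be checked with the standard library. A sketch; the cell scores below are made up for illustration:

```python
# Largest-to-smallest cell-variance ratio check (rule of thumb: a ratio
# under 4:1 supports the homogeneity-of-variance assumption).
from statistics import variance

cells = {
    ("Lecture", "Small"): [74, 78, 80, 79, 79],
    ("Lecture", "Large"): [70, 73, 71, 74, 72],
    ("Discussion", "Small"): [80, 84, 81, 83, 82],
    ("Discussion", "Large"): [72, 75, 74, 76, 73],
}

# Sample variance within each cell.
cell_vars = {cell: variance(scores) for cell, scores in cells.items()}
ratio = max(cell_vars.values()) / min(cell_vars.values())
print(round(ratio, 2), "OK" if ratio < 4 else "heterogeneous")  # 2.2 OK
```

A formal Levene's test is still preferable when available; the ratio check is a quick screen, not a substitute.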

Key Points

  • Check normality of residuals, homogeneity of variance, and independence.
  • Unbalanced designs require Type III sum of squares.
  • At least 10 observations per cell for adequate power; 20+ is better.
  • Repeated measures designs need repeated-measures ANOVA, not standard two-way.
  • Always report effect size (η² or ω²), not just F and p.

Key Takeaways

  • Two-way ANOVA partitions variability into two main effects, one interaction, and error.
  • Each of the three F-tests uses MS_error as the denominator.
  • The interaction test must be checked first — if significant, main effects alone mislead.
  • Interaction plots with non-parallel lines indicate an interaction.
  • With unbalanced cells, specify Type III sum of squares in your software.
  • Report effect size (η² = SS_effect / SS_total) alongside F and p.

Practice Questions

1. A 2×3 two-way ANOVA has 6 observations per cell. What are the degrees of freedom for each component?
df_A (2 levels) = 1, df_B (3 levels) = 2, df_AB = 1×2 = 2, df_error = N − (2×3) = 36 − 6 = 30, df_total = 35.
2. MS_error = 4.5, MS_A = 18, MS_B = 9, MS_AB = 6. Compute the three F-statistics.
F_A = 18/4.5 = 4.0; F_B = 9/4.5 = 2.0; F_AB = 6/4.5 = 1.33.
3. The interaction F is highly significant, but the main effect of Factor A is not. Can you conclude Factor A does not matter?
No. With a significant interaction, Factor A matters differently at different levels of B. Its overall average effect is small (hence non-significant main effect), but within specific B levels it may be large. Report simple main effects of A at each level of B.
4. You ran two separate one-way ANOVAs instead of a two-way. What do you lose?
You lose the interaction test entirely — you cannot detect whether Factor A effects depend on Factor B. You also inflate Type I error across the multiple tests, and you miss variance the combined model would explain. Two-way ANOVA is strictly more informative than two one-way ANOVAs.
5. Cell means for a 2×2 design are: (A1,B1)=10, (A1,B2)=14, (A2,B1)=16, (A2,B2)=12. Does the interaction plot suggest an interaction?
Yes. Plot B on x-axis, connect points by A level. Line for A1: 10 → 14 (rising). Line for A2: 16 → 12 (falling). Lines cross. A strong crossover interaction is suggested. Factor A effect reverses direction depending on B.


FAQs

Common questions about this topic

When should I use two-way ANOVA instead of one-way?

Use two-way if you have two categorical independent variables and want to test their combined effects (including the interaction). Use one-way if you have one categorical independent variable. If you have three categorical IVs, use three-way ANOVA (the same logic extends). If one IV is continuous, consider ANCOVA or multiple regression instead.

What does a notation like "2×3 design" mean?

It describes the number of levels in each factor. 2×3 means Factor A has 2 levels and Factor B has 3 levels, creating 2×3 = 6 treatment combinations (cells). The design also specifies how many observations per cell: "2×3 with n=5 per cell" means 5 observations in each of the 6 cells, 30 observations total.

Do all cells need the same number of observations?

Ideally yes (a balanced design), but it is not required. Balanced designs give simpler, more powerful tests. Unbalanced designs require choosing a sum-of-squares type (Type III is most common for unbalanced designs with interactions). SPSS defaults to Type III; R defaults to Type I, so specify Type III in R using car::Anova() if you have unbalanced data with interactions.

What is the difference between fixed and random effects?

Fixed effects: the levels of your factor are all the levels you care about (e.g., three specific fertilizer types). Random effects: the levels are a random sample from a larger population (e.g., 5 randomly selected schools out of 100). The F-test denominators differ between fixed and random effects models. Most introductory two-way ANOVA uses all fixed effects. Mixed-effects models combine both.

How many observations do I need?

Rule of thumb: at least 10-20 observations per cell. For a 2×3 design, that is 60-120 total observations. For smaller effect sizes, more is needed. G*Power or similar software computes the exact required sample size given desired power (usually 0.80), effect size, and alpha (usually 0.05). Small cell sizes (n < 5) give unreliable results even with significant p-values.

Can StatsIQ run a two-way ANOVA for me?

Yes. Provide the data or describe your design, and StatsIQ will compute cell means, partition the sum of squares, calculate all three F-statistics, check assumptions, compute effect sizes, and generate an interpretation, including simple main effects if the interaction is significant. It also flags assumption violations and suggests corrections. This content is for educational purposes only and does not constitute statistical advice.
