Fundamentals Ā· Intermediate Ā· 25-30 min

Chi-Square Tests Explained: Goodness of Fit and Test of Independence

A clear walkthrough of both chi-square tests — goodness of fit (does a distribution match what we expected?) and test of independence (are two categorical variables related?) — with worked examples, assumptions, and interpretation guidelines.

What You'll Learn

  • Explain when to use a chi-square goodness of fit test vs. a test of independence
  • Calculate the chi-square test statistic from observed and expected frequencies
  • Determine degrees of freedom and interpret the result using a chi-square table or software
  • Check the assumptions required for valid chi-square inference

1. When to Use Chi-Square Tests

Chi-square tests are your go-to tool whenever you are working with categorical data — data that falls into distinct categories rather than continuous measurements. You cannot run a t-test on whether people prefer chocolate, vanilla, or strawberry, but you can run a chi-square test. There are two main flavors.

The goodness of fit test asks: does the distribution of a single categorical variable match what we expected? For example, if a die is fair, we expect each face to come up roughly 1/6 of the time. The goodness of fit test checks whether your observed results are close enough to that expectation or suspiciously far off.

The test of independence asks: are two categorical variables related, or are they independent? For example, is there a relationship between gender and political party preference? The test of independence examines whether the pattern in your two-way table could have occurred by chance if the variables were truly unrelated.

Both tests use the same core formula but set up the expected values differently. The key decision: one variable means goodness of fit, two variables means test of independence.

Key Points

  • Chi-square tests work with categorical (count) data, not continuous measurements
  • Goodness of fit: one categorical variable tested against an expected distribution
  • Test of independence: two categorical variables tested for association in a contingency table
  • Both use the same chi-square statistic formula but differ in how expected values are calculated

2. The Chi-Square Statistic: How It Works

The chi-square statistic measures the overall discrepancy between what you observed and what you would expect under the null hypothesis. The formula is:

X² = Ī£ (O - E)² / E

where O is each observed count and E is each expected count. You calculate (O - E)² / E for every cell in your table and sum them all up.

The logic is intuitive once you see it. If observed counts are close to expected counts, each (O - E)² / E term is small and the total X² is small — consistent with the null hypothesis. If observed counts are far from expected, the terms get large and X² gets large — evidence against the null.

Dividing by E is important because it standardizes the contribution of each cell. A difference of 10 between observed and expected matters much more if the expected count is 20 (a 50% discrepancy) than if the expected count is 500 (only a 2% discrepancy). Without dividing by E, cells with large expected counts would dominate the statistic even if their proportional discrepancy were tiny.

The chi-square statistic is always non-negative (because you square the differences), and it follows a chi-square distribution under the null hypothesis. Larger values provide stronger evidence against the null.

Key Points

  • X² = Ī£ (O - E)² / E — sum over all categories or cells
  • Small X² means observed data is close to expected (consistent with null hypothesis)
  • Dividing by E standardizes each cell's contribution so large cells do not automatically dominate
  • The chi-square statistic is always zero or positive and follows a chi-square distribution under H0
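
The formula above is a one-liner in code. A minimal sketch in Python (the helper name `chi_square_stat` is ours, not a library function), reusing the E = 20 vs. E = 500 illustration from this section:

```python
def chi_square_stat(observed, expected):
    # Sum of (O - E)^2 / E over all categories or cells.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Same absolute difference of 10, very different contributions:
print(chi_square_stat([30], [20]))    # 100/20  = 5.0  (50% discrepancy)
print(chi_square_stat([510], [500]))  # 100/500 = 0.2  (2% discrepancy)
```

Dividing by E is what makes the 50% discrepancy weigh 25 times more than the 2% one, even though both cells miss their expectation by exactly 10.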

3. Goodness of Fit: Worked Example

Suppose you roll a die 300 times and want to test whether it is fair. Under the null hypothesis (fair die), you expect each face to appear 300/6 = 50 times. Your observed counts are: 1 = 42, 2 = 55, 3 = 48, 4 = 63, 5 = 47, 6 = 45.

Calculate X²:

X² = (42-50)²/50 + (55-50)²/50 + (48-50)²/50 + (63-50)²/50 + (47-50)²/50 + (45-50)²/50
   = 64/50 + 25/50 + 4/50 + 169/50 + 9/50 + 25/50
   = 1.28 + 0.50 + 0.08 + 3.38 + 0.18 + 0.50
   = 5.92

Degrees of freedom = number of categories minus 1 = 6 - 1 = 5. Looking up X² = 5.92 with df = 5 in a chi-square table, the critical value at alpha = 0.05 is 11.07. Since 5.92 < 11.07, we fail to reject the null. The die is consistent with being fair — the deviations we observed could reasonably occur by chance.

Notice that the face showing 4 contributed the most to X² (3.38 out of 5.92). If we had observed even more 4s, that single cell could push the total past the critical value. Chi-square tests are sensitive to where the deviation occurs, not just the overall magnitude.

Key Points

  • Expected counts for goodness of fit come from the hypothesized distribution (e.g., 1/6 for a fair die)
  • Degrees of freedom = number of categories - 1
  • Compare X² to the critical value from the chi-square table at your chosen alpha level
  • Individual cell contributions to X² tell you which categories are driving any overall deviation
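
The die example can be checked end to end in a few lines of Python; the critical value 11.07 is the table value quoted above (alpha = 0.05, df = 5):

```python
observed = [42, 55, 48, 63, 47, 45]     # counts for faces 1-6 over 300 rolls
expected = [sum(observed) / 6] * 6      # fair die: 300/6 = 50 per face

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                  # 6 categories - 1 = 5
critical = 11.07                        # chi-square table, alpha = 0.05, df = 5

print(round(chi_sq, 2))                 # 5.92
print("reject H0" if chi_sq > critical else "fail to reject H0")
```

If SciPy is available, `scipy.stats.chisquare(observed)` returns the same statistic along with a p-value, so no printed table is needed.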

4. Test of Independence: Worked Example

Now suppose you survey 200 students and record both their year (freshman, sophomore) and whether they prefer online or in-person classes. The contingency table is:

|           | Online | In-Person | Total |
|-----------|--------|-----------|-------|
| Freshman  | 45     | 55        | 100   |
| Sophomore | 65     | 35        | 100   |
| Total     | 110    | 90        | 200   |

The null hypothesis is that class year and format preference are independent. Expected counts are calculated as E = (row total Ɨ column total) / grand total. For Freshman-Online: (100 Ɨ 110) / 200 = 55. For Freshman-In-Person: (100 Ɨ 90) / 200 = 45. Sophomore-Online: 55. Sophomore-In-Person: 45.

X² = (45-55)²/55 + (55-45)²/45 + (65-55)²/55 + (35-45)²/45
   = 100/55 + 100/45 + 100/55 + 100/45
   = 1.82 + 2.22 + 1.82 + 2.22
   = 8.08

Degrees of freedom = (rows - 1) Ɨ (columns - 1) = (2-1)(2-1) = 1. The critical value at alpha = 0.05 with df = 1 is 3.84. Since 8.08 > 3.84, we reject the null hypothesis. There is statistically significant evidence that class year and format preference are related — sophomores appear to prefer online classes more than freshmen do.

StatsIQ generates practice problems with contingency tables of different sizes so you can build fluency with the expected count formula and the degrees of freedom calculation.

Key Points

  • Expected counts for independence: E = (row total Ɨ column total) / grand total
  • Degrees of freedom = (number of rows - 1) Ɨ (number of columns - 1)
  • Rejecting the null means the two variables are associated — but the test does not tell you the direction or strength, only that independence is unlikely
  • For a 2Ɨ2 table, the chi-square test of independence is equivalent to comparing two proportions
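
The same computation can be sketched in Python, building each expected count from the row and column totals exactly as the formula E = (row total Ɨ column total) / grand total prescribes:

```python
table = [[45, 55],    # freshman:  online, in-person
         [65, 35]]    # sophomore: online, in-person

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

chi_sq = 0.0
for i in range(len(table)):
    for j in range(len(table[0])):
        e = row_totals[i] * col_totals[j] / grand   # expected count for cell (i, j)
        chi_sq += (table[i][j] - e) ** 2 / e

df = (len(table) - 1) * (len(table[0]) - 1)         # (2-1)(2-1) = 1
print(round(chi_sq, 2), df)                          # 8.08 1
```

If SciPy is available, `scipy.stats.chi2_contingency(table, correction=False)` reproduces this value; note that SciPy applies the Yates continuity correction to 2Ɨ2 tables by default, which yields a slightly smaller statistic than the hand calculation.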

5. Assumptions and When Chi-Square Fails

Chi-square tests require several conditions to produce valid results.

The most important is the expected count condition: every expected cell count should be at least 5. When expected counts are too small, the chi-square distribution is a poor approximation and the test becomes unreliable. If you have cells with expected counts below 5, you can either combine categories (merge small groups) or use Fisher's exact test (for 2Ɨ2 tables), which does not rely on the chi-square approximation.

Second, the observations must be independent. Each subject or observation should contribute to exactly one cell in the table. If the same person appears in multiple categories (repeated measures), the chi-square test is not appropriate.

Third, the data must be counts (frequencies), not percentages or proportions. You cannot run a chi-square test on a table of percentages — you need the raw counts. This seems obvious but is a common mistake when students work from summary tables in published papers.

Finally, chi-square tests only detect association — they do not measure its strength or direction. A significant result tells you the variables are probably not independent, but not how strong the relationship is. For that, you need additional measures like Cramer's V (which ranges from 0 to 1 and quantifies effect size), or an examination of the standardized residuals to see which cells are driving the departure from independence.

Key Points

  • All expected cell counts must be at least 5 for the chi-square approximation to be valid
  • Observations must be independent — each subject contributes to exactly one cell
  • Data must be raw counts, not percentages or proportions
  • Chi-square detects association but not strength or direction — use Cramer's V for effect size
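
A short Python sketch ties the section together: it checks the expected count condition and computes Cramer's V for the worked 2Ɨ2 example, using the standard (bias-uncorrected) formula V = sqrt(X² / (n Ā· min(rows-1, cols-1))):

```python
import math

table = [[45, 55], [65, 35]]
row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)

expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Assumption check: every expected cell count should be at least 5.
assumption_ok = all(e >= 5 for row in expected for e in row)

chi_sq = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
             for i in range(len(table)) for j in range(len(table[0])))

# Cramer's V: effect size on a 0-to-1 scale.
k = min(len(table), len(table[0])) - 1
cramers_v = math.sqrt(chi_sq / (n * k))

print(assumption_ok, round(cramers_v, 2))   # True 0.2
```

A V around 0.2 is conventionally read as a small-to-moderate association: the significant test says the variables are probably not independent, and V adds that the relationship is real but not dramatic.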

Key Takeaways

  • X² = Ī£ (O - E)² / E — the fundamental chi-square formula for both test types
  • Goodness of fit df = categories - 1; independence df = (rows - 1)(columns - 1)
  • All expected counts must be at least 5 for valid inference
  • Chi-square is always a right-tailed test — only large values provide evidence against H0
  • For 2Ɨ2 tables with small samples, use Fisher's exact test instead of chi-square

Practice Questions

1. A candy bag claims 25% red, 25% blue, 25% green, 25% yellow. You count 40 red, 30 blue, 35 green, 45 yellow out of 150 candies. Calculate X² and determine if the distribution matches the claim at alpha = 0.05.

Answer: Expected: 150 Ɨ 0.25 = 37.5 for each color. X² = (40-37.5)²/37.5 + (30-37.5)²/37.5 + (35-37.5)²/37.5 + (45-37.5)²/37.5 = 0.167 + 1.500 + 0.167 + 1.500 = 3.333. With df = 3, the critical value at 0.05 is 7.815. Since 3.333 < 7.815, fail to reject — the distribution is consistent with the claimed proportions.
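
The candy calculation can be verified with the same pattern used in the worked examples; a quick Python check:

```python
observed = [40, 30, 35, 45]             # red, blue, green, yellow out of 150
expected = [sum(observed) * 0.25] * 4   # 150 x 0.25 = 37.5 per color

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                  # 4 categories - 1 = 3
critical = 7.815                        # chi-square table, alpha = 0.05, df = 3

print(round(chi_sq, 3), chi_sq < critical)   # 3.333 True
```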
2. In a 3Ɨ2 contingency table with 300 observations, what are the degrees of freedom for a test of independence?

Answer: df = (3-1)(2-1) = 2 Ɨ 1 = 2.


FAQs

Common questions about this topic

When should I use a chi-square test instead of a z-test for proportions?

For comparing two proportions (a 2Ɨ2 table), they are mathematically equivalent — the chi-square statistic equals the square of the z-statistic. Chi-square generalizes to larger tables (3+ categories or 3+ groups) where a single z-test cannot be applied. Use chi-square when you have more than two groups or more than two categories.
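
The equivalence is easy to demonstrate numerically. A sketch in Python using the student survey table from section 4 (45/100 freshmen vs. 65/100 sophomores preferring online):

```python
import math

# Two-proportion z-statistic with a pooled proportion.
x1, n1, x2, n2 = 45, 100, 65, 100
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Chi-square statistic on the same 2x2 table (no continuity correction).
table = [[x1, n1 - x1], [x2, n2 - x2]]
row_t = [sum(r) for r in table]
col_t = [sum(c) for c in zip(*table)]
grand = sum(row_t)
chi_sq = sum((table[i][j] - row_t[i] * col_t[j] / grand) ** 2
             / (row_t[i] * col_t[j] / grand)
             for i in range(2) for j in range(2))

print(round(z ** 2, 4), round(chi_sq, 4))   # 8.0808 8.0808
```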

Can StatsIQ generate practice problems for chi-square tests?

Yes. StatsIQ generates chi-square problems for both goodness of fit and independence tests, including calculating expected counts, finding the test statistic, determining degrees of freedom, and interpreting results in context.
