Chi-Square Test of Independence
Perform a chi-square test of independence to determine whether there is an association between two categorical variables in a contingency table.
Problem Scenario
A public health researcher wants to determine whether exercise frequency is associated with stress level among office workers. She surveys 300 workers and classifies them by exercise frequency (Low, Moderate, High) and stress level (Low, High). The observed counts are: Low Exercise & Low Stress = 30, Low Exercise & High Stress = 70, Moderate Exercise & Low Stress = 55, Moderate Exercise & High Stress = 45, High Exercise & Low Stress = 65, High Exercise & High Stress = 35. Test at alpha = 0.05.
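Given Data
- Low Exercise: 30 Low Stress, 70 High Stress (row total 100)
- Moderate Exercise: 55 Low Stress, 45 High Stress (row total 100)
- High Exercise: 65 Low Stress, 35 High Stress (row total 100)
- Column totals: 150 Low Stress, 150 High Stress; grand total n = 300
- Significance level: alpha = 0.05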
Requirements
- Set up the contingency table and state hypotheses
- Calculate expected frequencies and the chi-square statistic
- Determine degrees of freedom and make a conclusion
Solution
Step 1:
State hypotheses. H_0: Exercise frequency and stress level are independent (no association). H_a: Exercise frequency and stress level are not independent (there is an association).
Step 2:
Calculate expected frequencies using E = (row total x column total) / grand total. Every row total is 100 (30 + 70, 55 + 45, 65 + 35) and both column totals are 150 (30 + 55 + 65 and 70 + 45 + 35), with a grand total of 300. Each cell therefore has E = (100 x 150) / 300 = 50, so all six expected counts are 50.
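The expected counts can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the variable names (`observed`, `row_totals`, `col_totals`, `expected`) are illustrative choices, not part of the original problem.

```python
# Observed counts from the problem statement.
# Keys are exercise levels; each list is [Low Stress, High Stress].
observed = {
    "Low": [30, 70],
    "Moderate": [55, 45],
    "High": [65, 35],
}

# Marginal totals of the contingency table.
row_totals = {level: sum(counts) for level, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# E = (row total x column total) / grand total for every cell.
expected = {
    level: [row_totals[level] * col / grand_total for col in col_totals]
    for level in observed
}

print("Row totals:", row_totals)
print("Column totals:", col_totals)
print("Expected counts:", expected)
```

Because every row total is the same and the two column totals are equal, each expected count comes out to 50, matching the hand calculation.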
Step 3:
Calculate the chi-square contribution for each cell: (O - E)^2 / E.
- Low Ex & Low Stress: (30 - 50)^2 / 50 = 400/50 = 8.0
- Low Ex & High Stress: (70 - 50)^2 / 50 = 400/50 = 8.0
- Mod Ex & Low Stress: (55 - 50)^2 / 50 = 25/50 = 0.5
- Mod Ex & High Stress: (45 - 50)^2 / 50 = 25/50 = 0.5
- High Ex & Low Stress: (65 - 50)^2 / 50 = 225/50 = 4.5
- High Ex & High Stress: (35 - 50)^2 / 50 = 225/50 = 4.5
Step 4:
Sum all contributions: chi-square = 8.0 + 8.0 + 0.5 + 0.5 + 4.5 + 4.5 = 26.0. Degrees of freedom: df = (rows - 1)(columns - 1) = (3 - 1)(2 - 1) = 2.
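As a check on Steps 3 and 4, the per-cell contributions, their sum, and the degrees of freedom can be computed directly. The sketch below takes the expected count of 50 per cell from Step 2 as given; the names `contributions`, `chi_square`, and `df` are illustrative.

```python
# Observed counts: rows are Low, Moderate, High exercise;
# columns are Low Stress, High Stress.
observed = [
    [30, 70],
    [55, 45],
    [65, 35],
]

# Expected counts from Step 2 (50 in every cell).
expected = [
    [50.0, 50.0],
    [50.0, 50.0],
    [50.0, 50.0],
]

# Contribution of each cell: (O - E)^2 / E.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# The test statistic is the sum over all cells.
chi_square = sum(sum(row) for row in contributions)

# df = (rows - 1)(columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print("Contributions:", contributions)
print("Chi-square:", chi_square)
print("Degrees of freedom:", df)
```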
Step 5:
Find the p-value. With chi-square = 26.0 and df = 2, the p-value is extremely small (p < 0.0001). The critical value at alpha = 0.05 with df = 2 is 5.991. Since 26.0 > 5.991, we reject H_0.
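The p-value and critical value can be verified without a statistics table. The sketch below relies on a property specific to df = 2: the chi-square distribution with two degrees of freedom has upper-tail probability P(X > x) = exp(-x / 2), so the critical value is -2 ln(alpha). This shortcut does not hold for other degrees of freedom, and the variable names are illustrative.

```python
import math

chi_square = 26.0  # test statistic from Step 4
alpha = 0.05       # significance level from the problem statement

# Valid for df = 2 only: upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi_square / 2)

# Inverting the same formula gives the critical value at level alpha.
critical_value = -2 * math.log(alpha)

print(f"p-value = {p_value:.2e}")
print(f"critical value = {critical_value:.3f}")
print("Reject H_0:", chi_square > critical_value)
```

If SciPy is available, `scipy.stats.chi2_contingency` applied to the observed table should report the same statistic, p-value, degrees of freedom, and expected counts.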
Step 6:
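Conclusion: At the 0.05 significance level, there is a statistically significant association between exercise frequency and stress level among office workers. The test itself does not indicate direction, but the observed proportions do: 30% of low-exercise workers report low stress, compared with 55% of moderate-exercise and 65% of high-exercise workers. Workers who exercise more tend to report lower stress.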
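The direction of the association comes from the observed proportions, not from the chi-square statistic. A minimal sketch that computes the share of workers reporting low stress in each exercise group (the name `low_stress_share` is illustrative):

```python
# Observed counts as (Low Stress, High Stress) per exercise level.
observed = {
    "Low": (30, 70),
    "Moderate": (55, 45),
    "High": (65, 35),
}

# Proportion of each exercise group that reports low stress.
low_stress_share = {
    level: low / (low + high) for level, (low, high) in observed.items()
}

for level, share in low_stress_share.items():
    print(f"{level} exercise: {share:.0%} report low stress")
```

The share rises from the low-exercise group to the high-exercise group, which is what supports the statement that more exercise goes with lower reported stress.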
Final Answer
Chi-square = 26.0, df = 2, p-value < 0.0001. We reject H_0 at the 0.05 level. There is strong evidence of an association between exercise frequency and stress level. Higher exercise frequency is associated with lower reported stress.
Key Takeaways
- The chi-square test of independence assesses whether two categorical variables are associated. It does not measure the strength or direction of the association.
- All expected cell counts should be at least 5 for the chi-square approximation to be valid. If not, consider combining categories or using Fisher's exact test.
- A significant result tells you that the variables are associated, but it does not imply causation. Other confounding factors may explain the relationship.
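The expected-count condition is easy to check before running the test. The sketch below uses two hypothetical helper functions, `expected_counts` and `meets_minimum_expected`, and a small made-up 2 x 2 table purely to show a case that fails the check; neither the helpers nor that table come from the original problem.

```python
def expected_counts(observed):
    """Return expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_minimum_expected(observed, minimum=5):
    """Check whether every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(observed) for e in row)


# Table from this problem: every expected count is 50.
survey_table = [[30, 70], [55, 45], [65, 35]]
print(meets_minimum_expected(survey_table))  # True

# Hypothetical small sample: expected counts are 2.5 and 7.5 in each row.
small_table = [[2, 8], [3, 7]]
print(meets_minimum_expected(small_table))  # False
```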
Common Errors to Avoid
- Using observed counts instead of expected counts in the chi-square formula. The formula requires (Observed - Expected)^2 / Expected.
- Calculating degrees of freedom incorrectly. For a contingency table, df = (number of rows - 1) x (number of columns - 1), not (number of cells - 1).
- Interpreting a significant chi-square result as proof of a causal relationship. Association does not imply causation.
FAQs
Common questions about this problem type
A test of independence uses a contingency table to assess whether two categorical variables are related. A goodness-of-fit test compares observed frequency counts of a single categorical variable to expected counts from a hypothesized distribution. They use the same test statistic formula but address different research questions.
To measure the strength of the association, use Cramér's V, which ranges from 0 (no association) to 1 (perfect association). It is calculated as V = sqrt(chi-square / (n x min(r - 1, c - 1))), where n is the total sample size and r and c are the number of rows and columns. For this problem, V = sqrt(26 / (300 x 1)) = sqrt(0.0867) = 0.294, indicating a moderate association.
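The Cramér's V calculation for this problem can be sketched as follows, using the chi-square statistic and table dimensions from the solution above. The variable names are illustrative.

```python
import math

chi_square = 26.0   # test statistic from the solution
n = 300             # total number of workers surveyed
rows, cols = 3, 2   # exercise levels x stress levels

# V = sqrt(chi-square / (n x min(r - 1, c - 1))).
cramers_v = math.sqrt(chi_square / (n * min(rows - 1, cols - 1)))

print(round(cramers_v, 3))  # 0.294
```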