Chi-Square Statistic

χ² = Σ (O - E)² / E

The chi-square statistic measures the discrepancy between observed and expected frequencies by summing (O - E)² / E over every category (or every cell of a contingency table). It is used in goodness-of-fit tests (does the data follow a hypothesized distribution?) and tests of independence (are two categorical variables related?). Larger values of χ² indicate greater deviation from what the null hypothesis predicts.
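The formula translates directly into code. The sketch below assumes observed and expected counts are supplied as parallel sequences, one entry per category (or per cell of a flattened contingency table); the function name `chi_square_statistic` is hypothetical, not taken from any library.

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Sum of (O - E)^2 / E over all categories (hypothetical helper)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        # Division by E requires every expected count to be positive
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Two categories, 50 observations, expected 25 each
print(chi_square_statistic([30, 20], [25, 25]))  # 2.0
```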

Variables

χ²=Chi-Square Statistic

The test statistic measuring overall discrepancy between observed and expected counts

O=Observed Frequency

The actual count in each category from the data

E=Expected Frequency

The count expected under the null hypothesis

Example Calculation

Scenario

A die is rolled 60 times. The observed frequencies for faces 1 through 6 are: 8, 12, 10, 14, 7, 9. Test if the die is fair.

Given Data

O: 8, 12, 10, 14, 7, 9
E: 10 for each face (60 / 6 = 10)

Calculation

χ² = (8-10)²/10 + (12-10)²/10 + (10-10)²/10 + (14-10)²/10 + (7-10)²/10 + (9-10)²/10 = 0.4 + 0.4 + 0 + 1.6 + 0.9 + 0.1

Result

χ² = 3.4 with df = 6 - 1 = 5
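The die calculation can be reproduced term by term. This is a minimal sketch assuming a fair die (equal expected counts per face); it prints each face's contribution to χ² alongside the total and the goodness-of-fit degrees of freedom.

```python
observed = [8, 12, 10, 14, 7, 9]       # counts for faces 1 through 6
n, k = sum(observed), len(observed)    # 60 rolls, 6 categories
expected = [n / k] * k                 # fair die: 10 expected per face

# Contribution of each face: (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
df = k - 1                             # goodness-of-fit: categories minus one

print([round(c, 2) for c in contributions])  # [0.4, 0.4, 0.0, 1.6, 0.9, 0.1]
print(round(chi2, 2), df)                    # 3.4 5
```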

Interpretation

With χ² = 3.4 and 5 degrees of freedom, the p-value is approximately 0.64. Since p > 0.05, we fail to reject the null hypothesis at the 5% significance level: there is no significant evidence that the die is unfair.
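The p-value is the upper-tail area of the chi-square distribution with the test's degrees of freedom, P(χ² ≥ observed statistic). In practice a library routine such as SciPy's `scipy.stats.chi2.sf(x, df)` would normally be used; the dependency-free sketch below instead assumes integer degrees of freedom, which allows closed-form expressions for the tail area. The name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df (hypothetical helper)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: Q = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: Q = erfc(sqrt(x/2)) + 2 * phi(sqrt(x)) * sum_{j=1}^{(df-1)/2} x^(j - 1/2) / (2j - 1)!!
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)  # standard normal density at sqrt(x)
    term, total = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + 2.0 * phi * total


# Die example: chi-square = 3.4 with df = 5
print(round(chi2_sf(3.4, 5), 2))  # 0.64
```

The even/odd split avoids a general incomplete-gamma routine; for non-integer degrees of freedom a library implementation is the safer choice.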

When to Use This Formula

  • Testing whether observed categorical data fits an expected distribution (goodness-of-fit)
  • Testing whether two categorical variables are independent (test of independence)
  • Analyzing survey responses across categories
  • Comparing proportions across multiple groups (test of homogeneity)

Common Mistakes

  • Using proportions or percentages instead of raw counts in the formula
  • Applying the chi-square test when expected frequencies are too small (generally, any E < 5)
  • Confusing the degrees of freedom for goodness-of-fit (k - 1, where k is the number of categories) with those for independence ((r - 1)(c - 1), where r and c are the numbers of rows and columns)
  • Reading a large chi-square as showing the direction of an association without examining the residuals; χ² only measures the overall size of the discrepancy
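One way to examine the residuals is to compute the Pearson residual (O - E) / √E for each category: its sign shows whether a category was over- or under-represented, and its square is that category's contribution to χ². This sketch reuses the die data from the example above.

```python
import math

observed = [8, 12, 10, 14, 7, 9]
expected = [10.0] * 6

# Pearson residual: (O - E) / sqrt(E); the sign gives the direction of the departure
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]
for face, r in enumerate(residuals, start=1):
    print(f"face {face}: {r:+.2f}")

# The squared residuals add back up to the chi-square statistic
print(round(sum(r * r for r in residuals), 2))  # 3.4
```

Here face 4 has the largest positive residual and face 5 the largest negative one, which the single value χ² = 3.4 does not reveal.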

FAQs

Common questions about this formula

How are expected frequencies calculated in a test of independence?

For each cell in a contingency table, the expected frequency is E = (row total × column total) / grand total. This formula assumes the two variables are independent, which is the null hypothesis being tested.
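The expected-count rule, the statistic, and the (r - 1)(c - 1) degrees of freedom can be combined in a short sketch. The function names and the 2×3 table below are hypothetical, chosen only to illustrate the arithmetic.

```python
from typing import List, Tuple


def expected_counts(table: List[List[float]]) -> List[List[float]]:
    """E = (row total x column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def independence_test(table: List[List[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_counts(table)
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2x3 table of counts (rows: groups, columns: response categories)
table = [[10, 20, 30],
         [20, 20, 20]]
print(expected_counts(table))  # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
chi2, df = independence_test(table)
print(round(chi2, 4), df)      # 5.3333 2
```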

What conditions must be met to use the chi-square test?

The data must consist of independent observations, every observation must fall into exactly one category, and the expected frequency in each cell should be at least 5. When expected frequencies are too small, consider combining categories or using Fisher's exact test.
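A quick way to check the expected-frequency condition is to list every cell whose expected count falls below 5. This is a minimal sketch with a hypothetical helper name; it takes already-computed expected counts and applies the "at least 5" rule stated above. If cells are flagged, the remedies above apply (for a 2×2 table, SciPy provides `scipy.stats.fisher_exact`).

```python
from typing import List, Tuple


def small_expected_cells(expected: List[List[float]],
                         minimum: float = 5.0) -> List[Tuple[int, int, float]]:
    """Return (row, column, E) for every cell whose expected count is below `minimum`."""
    return [(i, j, e)
            for i, row in enumerate(expected)
            for j, e in enumerate(row)
            if e < minimum]


# Die example: all expected counts are 10, so nothing is flagged
print(small_expected_cells([[10.0] * 6]))  # []

# Hypothetical 2x2 table of expected counts with two small cells
print(small_expected_cells([[2.5, 7.5], [2.5, 7.5]]))  # [(0, 0, 2.5), (1, 0, 2.5)]
```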
