
Hypergeometric Distribution

discrete

The hypergeometric distribution models the number of successes in a sample drawn without replacement from a finite population. Unlike the binomial distribution, where each trial is independent, the hypergeometric accounts for the changing composition of the population as items are drawn. It is used extensively in quality control, ecological capture-recapture studies, and combinatorial probability problems like card-drawing scenarios.

Formula

P(X = k) = [C(K, k) · C(N - K, n - k)] / C(N, n), where C(a, b) = a! / (b!(a - b)!)
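
The formula translates almost directly into code. The following is a minimal Python sketch using only the standard library; the function name `hypergeom_pmf` is an illustrative choice, not taken from any particular package.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for n draws without replacement from N items, K of which are successes."""
    # Outside the support the probability is zero. Checking explicitly also
    # avoids passing a negative argument to comb(), which raises ValueError.
    if k < max(0, n + K - N) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Card example: exactly 2 hearts in a 5-card hand
print(round(hypergeom_pmf(2, 52, 13, 5), 4))
```

Because `math.comb` works with exact integers, the only rounding happens in the final division.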

Mean (Expected Value)

nK/N

Variance

n · (K/N) · (1 - K/N) · (N - n)/(N - 1)
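
One way to sanity-check the mean and variance formulas is to compare them with moments computed directly from the probability mass function. The sketch below does this with exact fractions; the helper names are hypothetical, and it assumes N > 1 so that the variance formula is defined.

```python
from fractions import Fraction
from math import comb

def pmf(k, N, K, n):
    # Exact probability as a Fraction; k is assumed to lie in the support.
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

def moments_by_enumeration(N, K, n):
    """Mean and variance obtained by summing over every feasible k."""
    ks = range(max(0, n + K - N), min(n, K) + 1)
    mean = sum(k * pmf(k, N, K, n) for k in ks)
    var = sum((k - mean) ** 2 * pmf(k, N, K, n) for k in ks)
    return mean, var

def moments_closed_form(N, K, n):
    """Mean nK/N and variance n(K/N)(1 - K/N)(N - n)/(N - 1)."""
    p = Fraction(K, N)
    return n * p, n * p * (1 - p) * Fraction(N - n, N - 1)

N, K, n = 52, 13, 5
print(moments_by_enumeration(N, K, n))
print(moments_closed_form(N, K, n))
```

Both calls should print the same pair of fractions, since the closed forms are exact rather than approximations.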

Parameters

N
Population Size

The total number of items in the population. Must be a positive integer.

K
Number of Success States

The total number of success items in the population. Must satisfy 0 ≤ K ≤ N.

n
Number of Draws

The number of items drawn without replacement from the population. Must satisfy 0 ≤ n ≤ N.

Key Properties

  • Models sampling without replacement from a finite population, so trials are not independent
  • X can range from max(0, n + K - N) to min(n, K)
  • The factor (N - n)/(N - 1) in the variance is called the finite population correction factor; for n > 1 it makes the variance smaller than the corresponding binomial variance
  • As N → ∞ with K/N → p and n held fixed, the hypergeometric converges to the binomial Bin(n, p)
  • When n/N < 0.05 (sample is less than 5% of population), the binomial is a good approximation
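
The range of X stated above can be made concrete with a short sketch. The function name `support` and the second parameter set (10 items, 7 successes, 5 draws) are illustrative assumptions, chosen so that the lower bound is not zero.

```python
def support(N: int, K: int, n: int) -> range:
    """All values of k with P(X = k) > 0."""
    lo = max(0, n + K - N)  # successes forced once the N - K failures run out
    hi = min(n, K)          # cannot exceed the draws or the available successes
    return range(lo, hi + 1)

# Card example: every count from 0 to 5 hearts is possible.
print(list(support(52, 13, 5)))

# Only 3 failures exist here, so 5 draws must contain at least 2 successes.
print(list(support(10, 7, 5)))
```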

Example

A deck of 52 cards contains 13 hearts. You draw 5 cards without replacement. What is the probability of getting exactly 2 hearts?

Here N = 52, K = 13, n = 5, k = 2. P(X = 2) = [C(13, 2) · C(39, 3)] / C(52, 5) = [78 · 9139] / 2598960 = 712842 / 2598960 ≈ 0.2743.

Result: P(X = 2) ≈ 0.2743, or about 27.43%

There is about a 27.43% chance of drawing exactly 2 hearts in a 5-card hand. The expected number of hearts is nK/N = 5 × 13/52 = 1.25, so getting 2 hearts is above average. The most likely individual outcome is 1 heart, with P(X = 1) = [C(13, 1) · C(39, 4)] / C(52, 5) = 1069263 / 2598960 ≈ 0.4114.
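
To put the result for 2 hearts in context, it can help to tabulate the probability of every possible number of hearts. The sketch below does this for the same deck using only the standard library; the variable names are illustrative.

```python
from math import comb

N, K, n = 52, 13, 5  # deck size, hearts in the deck, cards drawn

# P(X = k) for every feasible number of hearts in the hand
dist = {
    k: comb(K, k) * comb(N - K, n - k) / comb(N, n)
    for k in range(0, min(n, K) + 1)
}

for k, p in dist.items():
    print(f"P(X = {k}) = {p:.4f}")

# The mode is the k with the largest probability.
mode = max(dist, key=dist.get)
print("most likely number of hearts:", mode)
```

Listing the whole distribution makes it easy to compare individual outcomes and to confirm that the probabilities sum to 1.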

When to Use

  • When sampling without replacement from a finite population with two categories (defective/non-defective, tagged/untagged, hearts/non-hearts)
  • In quality control when inspecting a batch of items without replacement and counting defectives
  • In ecological studies using capture-recapture methods to estimate population sizes
  • When the sample size is a substantial fraction of the population (n/N > 0.05), making the binomial approximation inaccurate

Common Mistakes

  • Using the binomial distribution when sampling without replacement from a small population. The binomial assumes independence, which is violated without replacement. Use hypergeometric when n/N > 0.05.
  • Getting the parameters confused. N is the population size, K is the number of success items in the population, n is the sample size, and k is the number of successes in the sample.
  • Forgetting the constraints on k: it must satisfy max(0, n + K - N) ≤ k ≤ min(n, K). Not all values from 0 to n are possible.
  • Neglecting the finite population correction when computing the variance. The hypergeometric variance is always less than or equal to the binomial variance for the same n and p = K/N.

FAQs

Common questions about Hypergeometric Distribution

The binomial is a good approximation to the hypergeometric when the sample size is small relative to the population size, specifically when n/N < 0.05 (the 5% rule). In this case, removing items from the population barely changes the composition, so the trials are approximately independent. Set p = K/N and use Bin(n, p). For example, sampling 10 items from a population of 10,000 can safely use the binomial, but sampling 10 from 50 should use the hypergeometric.
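
The two sampling scenarios just mentioned can be compared numerically. The sketch below measures the largest gap between the hypergeometric probabilities and the Bin(n, K/N) probabilities; the 20% success rate (K = 2,000 and K = 10) is an assumption added for illustration, since the text does not specify K.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    # Zero outside the support max(0, n + K - N) <= k <= min(n, K)
    if k < max(0, n + K - N) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def max_abs_error(N, K, n):
    """Largest gap between the hypergeometric pmf and Bin(n, K/N) over k = 0..n."""
    p = K / N
    return max(
        abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
        for k in range(n + 1)
    )

# Assumed for illustration: 20% of the population are successes in both cases.
print(max_abs_error(10_000, 2_000, 10))  # n/N = 0.001, well under the 5% rule
print(max_abs_error(50, 10, 10))         # n/N = 0.2, well over the 5% rule
```

The first gap should be far smaller than the second, which is the practical content of the 5% rule.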

The finite population correction (FPC) factor is (N - n)/(N - 1), which appears in the hypergeometric variance formula. It adjusts for the fact that sampling without replacement from a finite population reduces variability compared to sampling with replacement. When N is much larger than n, the FPC is close to 1 and can be ignored. When n is a substantial fraction of N, the FPC significantly reduces the variance. In the extreme case where n = N (you sample the entire population), the FPC equals 0 and there is no variability at all.
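
A quick calculation shows how the factor behaves across these regimes. This is a small sketch with a hypothetical helper `fpc`; the population and sample sizes are chosen only for illustration.

```python
def fpc(N: int, n: int) -> float:
    """Finite population correction factor (N - n) / (N - 1); requires N > 1."""
    return (N - n) / (N - 1)

# Tiny sample, sizeable sample, and a full census of the population
for N, n in [(10_000, 10), (50, 10), (50, 50)]:
    print(f"N = {N}, n = {n}: FPC = {fpc(N, n):.4f}")
```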

Fisher's exact test uses the hypergeometric distribution to test for association in a 2x2 contingency table. Under the null hypothesis of no association, the cell counts follow a hypergeometric distribution with the row and column totals fixed. The test computes the exact probability of observing data as extreme as (or more extreme than) the actual table. It is preferred over the chi-square test when sample sizes are small or expected frequencies are below 5, because it does not rely on any large-sample approximation.
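
As a rough illustration of that connection, a one-sided p-value can be computed by summing hypergeometric probabilities over tables at least as extreme as the observed one. The sketch below is a simplified version under stated assumptions: the function name is hypothetical, the 2x2 table is made up, and only the upper-tail alternative is handled.

```python
from fractions import Fraction
from math import comb

def fisher_one_sided(a: int, b: int, c: int, d: int) -> Fraction:
    """P(top-left cell >= a) with all margins of the table [[a, b], [c, d]] fixed."""
    N = a + b + c + d  # grand total (population size)
    K = a + b          # first-row total (successes in the population)
    n = a + c          # first-column total (number of draws)
    upper = min(n, K)  # largest value the top-left cell can take
    return sum(
        (
            Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))
            for k in range(a, upper + 1)
        ),
        Fraction(0),
    )

# Made-up table for illustration only: [[3, 1], [1, 3]]
p = fisher_one_sided(3, 1, 1, 3)
print(p, float(p))
```

Using exact fractions keeps the p-value free of rounding error, which matches the "exact" character of the test. A two-sided version needs an additional rule for deciding which tables count as more extreme.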
