Fundamentals · Intermediate · 20-25 min

Geometric vs Negative Binomial Distribution: When to Use Each

Side-by-side comparison of the geometric (trials until first success) and negative binomial (trials until r-th success) distributions — formulas, mean and variance, two worked examples, and the conceptual link to the binomial.

What You'll Learn

  • Distinguish geometric (until first success) from negative binomial (until r-th success).
  • Apply the PMF and moment formulas for both distributions correctly.
  • Choose the right "waiting-for-success" distribution given a problem statement.

1. Direct Answer: Choosing Between Geometric and Negative Binomial

The geometric distribution models the number of independent Bernoulli trials needed to obtain the FIRST success. The negative binomial models the number of trials (or failures) needed to obtain the R-TH success. Both share the underlying Bernoulli framework with constant success probability p across independent trials. The geometric is a special case of the negative binomial with r=1. Use the geometric when the question is "how long until the first arrival of an event"; use the negative binomial when the question is "how long until the third (or fifth, or r-th) arrival." Both distributions are discrete, defined on positive integers (or non-negative integers, depending on convention), and both are right-skewed. The geometric is always right-skewed; the negative binomial approaches symmetry only as r grows large.

Key Points

  • Geometric: trials until first success. Negative binomial: trials until r-th success.
  • Geometric is a special case of negative binomial with r=1.
  • Both require constant p and independence across trials.

2. Geometric Distribution: Formulas and Intuition

There are two conventions, and you must check which one any textbook or software uses. Convention A: X = number of TRIALS until the first success, X ∈ {1, 2, 3, ...}. P(X = k) = (1-p)^(k-1) × p. Mean = 1/p. Variance = (1-p)/p². Convention B: X = number of FAILURES before the first success, X ∈ {0, 1, 2, ...}. P(X = k) = (1-p)^k × p. Mean = (1-p)/p. Variance = (1-p)/p². Both produce identical answers to "what is the probability the first success occurs on trial 4" if you map the conventions consistently (k = 4 under A corresponds to k = 3 under B). The means differ by exactly 1; the variance is the same under both conventions because shifting a variable by a constant does not change its spread. R's pgeom uses convention B; scipy.stats.geom uses convention A. Always check.
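The two conventions can be compared numerically. The following is a minimal sketch using only the Python standard library; the function names and the value p = 0.3 are illustrative assumptions, not taken from any particular textbook or package. It evaluates "first success on trial 4" under each convention and then prints the mean and variance of each so the relationship between them can be read off directly.

```python
def geom_pmf_trials(k: int, p: float) -> float:
    """Convention A: P(first success occurs on trial k), k = 1, 2, 3, ..."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p


def geom_pmf_failures(k: int, p: float) -> float:
    """Convention B: P(exactly k failures before the first success), k = 0, 1, 2, ..."""
    if k < 0:
        return 0.0
    return (1 - p) ** k * p


p = 0.3  # illustrative success probability (an assumption for this sketch)

# "First success on trial 4" is k = 4 under A and k = 3 failures under B.
prob_a = geom_pmf_trials(4, p)
prob_b = geom_pmf_failures(3, p)

# Approximate the moments by summing far enough into the tail
# that the remaining probability mass is negligible.
ks = range(0, 500)
mean_a = sum(k * geom_pmf_trials(k, p) for k in ks)
mean_b = sum(k * geom_pmf_failures(k, p) for k in ks)
var_a = sum((k - mean_a) ** 2 * geom_pmf_trials(k, p) for k in ks)
var_b = sum((k - mean_b) ** 2 * geom_pmf_failures(k, p) for k in ks)

print("P(first success on trial 4):", prob_a, prob_b)
print("means (A, B):", mean_a, mean_b)
print("variances (A, B):", var_a, var_b)
```

The same pattern works as a quick check on any software function: evaluate it at a known point and see which of the two hand-written formulas it matches.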

Key Points

  • Convention A counts trials including the success; mean = 1/p.
  • Convention B counts failures before the success; mean = (1-p)/p.
  • Software differs — check before trusting a computed probability.

3. Negative Binomial Distribution: Formulas

Convention A: X = number of trials to get the r-th success, X ∈ {r, r+1, ...}. P(X = k) = C(k-1, r-1) × p^r × (1-p)^(k-r). Mean = r/p. Variance = r(1-p)/p². Convention B: X = number of failures before the r-th success, X ∈ {0, 1, 2, ...}. P(X = k) = C(k+r-1, r-1) × p^r × (1-p)^k. Mean = r(1-p)/p. Variance = r(1-p)/p². With r=1 the negative binomial reduces to the geometric. The variance grows linearly with r, which makes the negative binomial increasingly diffuse as r grows. There is also an alternative parameterization by mean and dispersion, used in count regression for overdispersed count data, in which r may be any positive real number rather than an integer. Algebraically it is the same family of distributions as convention B, just parameterized differently.
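As a rough illustration of the formulas above, the sketch below implements both PMFs with the standard library's `math.comb` and checks three things numerically: that the two conventions agree after shifting by r, that r = 1 recovers the geometric PMF, and that the summed moments match r/p and r(1-p)/p². The function names and the values r = 3, p = 0.4 are assumptions chosen for the example.

```python
from math import comb


def nbinom_pmf_trials(k: int, r: int, p: float) -> float:
    """Convention A: P(the r-th success occurs on trial k), k = r, r+1, ..."""
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)


def nbinom_pmf_failures(k: int, r: int, p: float) -> float:
    """Convention B: P(exactly k failures before the r-th success), k = 0, 1, ..."""
    if k < 0:
        return 0.0
    return comb(k + r - 1, r - 1) * p ** r * (1 - p) ** k


r, p = 3, 0.4  # illustrative values (assumptions for this sketch)

# j failures before the r-th success means the r-th success is on trial j + r.
shift_ok = all(
    abs(nbinom_pmf_trials(j + r, r, p) - nbinom_pmf_failures(j, r, p)) < 1e-12
    for j in range(50)
)

# With r = 1 the trial-count PMF should equal (1-p)^(k-1) * p.
geom_ok = all(
    abs(nbinom_pmf_trials(k, 1, p) - (1 - p) ** (k - 1) * p) < 1e-12
    for k in range(1, 50)
)

# Moments of the trial-count version, summed far into the tail.
ks = range(0, 400)
mean_trials = sum(k * nbinom_pmf_trials(k, r, p) for k in ks)
var_trials = sum((k - mean_trials) ** 2 * nbinom_pmf_trials(k, r, p) for k in ks)

print(shift_ok, geom_ok)
print(mean_trials, r / p)
print(var_trials, r * (1 - p) / p ** 2)
```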

Key Points

  • Convention A: trials to r-th success; mean = r/p.
  • Convention B: failures before r-th success; mean = r(1-p)/p.
  • Geometric = negative binomial with r=1.
  • Used in count regression to handle Poisson overdispersion.

4. Worked Example 1: Free Throws Until First Make

A player makes 65% of free throws (p=0.65). What is the probability the first make occurs on the third attempt? Using convention A: P(X=3) = (1-0.65)² × 0.65 = 0.35² × 0.65 = 0.1225 × 0.65 = 0.0796. About 8%. What is the probability of needing MORE than 5 attempts to get the first make? P(X > 5) = (1-p)^5 = 0.35^5 = 0.00525. About 0.5%, so the player will almost certainly make at least one in five attempts. Expected number of attempts = 1/p = 1/0.65 ≈ 1.54. So on average the first make comes between the first and second attempt, although the single most likely outcome is a make on the very first attempt (probability 0.65).
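The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch under convention A (trials including the success); the variable names are illustrative. The last line cross-checks the closed-form tail (1-p)^5 against one minus the sum of the first five PMF values.

```python
p = 0.65  # free-throw success probability from the example

# P(first make on attempt 3): two misses, then a make.
p_first_make_on_3 = (1 - p) ** 2 * p      # about 0.0796

# P(more than 5 attempts needed): the first five attempts all miss.
p_more_than_5 = (1 - p) ** 5              # about 0.00525

# Expected number of attempts until the first make.
expected_attempts = 1 / p                 # about 1.54

# Cross-check the tail: 1 - P(X <= 5), summing the PMF for k = 1..5.
tail_check = 1 - sum((1 - p) ** (k - 1) * p for k in range(1, 6))

print(round(p_first_make_on_3, 4), round(p_more_than_5, 5), round(expected_attempts, 2))
print(abs(tail_check - p_more_than_5) < 1e-12)
```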

Key Points

  • Use convention A (trials including the success) when the question asks "what trial number."
  • P(X > k) = (1-p)^k — convenient closed form.
  • Mean = 1/p translates "65% success" into "about 1.5 attempts on average."

5. Worked Example 2: Manufacturing — Time to 5th Defective Unit

A manufacturing line produces units with defect probability p=0.02. You want to know the distribution of the number of inspections needed to find 5 defective units. This is a negative binomial with r=5, p=0.02 (a "success" here is finding a defect). Expected number of inspections = r/p = 5/0.02 = 250. Variance = r(1-p)/p² = 5 × 0.98 / 0.0004 = 12,250. Standard deviation = sqrt(12,250) ≈ 110.7. So you expect to inspect about 250 units to find 5 defects, with a standard deviation of about 111 units. What is the probability of finding the 5th defective unit on exactly the 200th inspection? P(X=200) = C(199, 4) × 0.02^5 × 0.98^195. C(199, 4) = 63,391,251; 0.02^5 = 3.2×10^-9; 0.98^195 ≈ 0.0195. Product ≈ 0.0039, about 0.4%. The small probability on any single specific value reflects how spread out the distribution is when p is small.
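The quantities in this example are easy to mistype by hand, so a short computation helps. The sketch below uses `math.comb` for the binomial coefficient and follows convention A (trials to the r-th success); the variable names are illustrative.

```python
from math import comb, sqrt

r, p = 5, 0.02  # 5 defects wanted, defect probability per unit

mean = r / p                      # expected inspections
variance = r * (1 - p) / p ** 2   # variance of the inspection count
sd = sqrt(variance)               # standard deviation

# P(the 5th defect is found on exactly inspection 200):
# 4 defects somewhere in the first 199 inspections, then a defect on the 200th.
k = 200
coefficient = comb(k - 1, r - 1)
prob_200 = coefficient * p ** r * (1 - p) ** (k - r)

print(mean, variance, round(sd, 1))
print(coefficient, round(prob_200, 4))   # probability is roughly 0.004
```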

Key Points

  • Negative binomial mean r/p scales linearly with r.
  • Variance r(1-p)/p² is very large for small p, making predictions diffuse.
  • Quality-control planning uses negative binomial to schedule inspections.

6. Relationship to the Binomial

The binomial asks "in n fixed trials, how many successes did I get?" The negative binomial asks "to get r successes, how many trials did it take?" The roles of count and trial number are swapped. Both share the Bernoulli setup of independent trials with constant p. The binomial has fixed n and random count; the negative binomial has fixed count r and random trial number. This duality means a problem can sometimes be solved with either distribution, depending on what is treated as fixed and what as random.
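One concrete form of this duality is that "the r-th success arrives by trial n" is the same event as "at least r successes in the first n trials." The sketch below checks that identity numerically; the helper names and the values r = 3, p = 1/6, n = 18 are assumptions chosen for illustration.

```python
from math import comb


def binom_pmf(j: int, n: int, p: float) -> float:
    """P(exactly j successes in n fixed trials)."""
    return comb(n, j) * p ** j * (1 - p) ** (n - j)


def nbinom_pmf_trials(k: int, r: int, p: float) -> float:
    """P(the r-th success occurs on trial k), trial-count convention."""
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)


r, p, n = 3, 1 / 6, 18  # illustrative: third six within 18 die rolls

# Negative binomial view: the waiting time is at most n trials.
p_wait = sum(nbinom_pmf_trials(k, r, p) for k in range(r, n + 1))

# Binomial view: at least r successes among n trials.
p_count = 1 - sum(binom_pmf(j, n, p) for j in range(r))

print(p_wait, p_count)
```

In practice this means a negative binomial CDF can be obtained from a binomial table or function when only the latter is available.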

Key Points

  • Binomial: fixed n trials, random success count.
  • Negative binomial: fixed r successes, random number of trials.
  • Same underlying Bernoulli process; different "what is fixed" choice.

7. Common Confusions and Fixes

Confusion 1: which convention is in use? Always confirm whether your formula or software counts trials or failures. Confusion 2: independence and constant p. If trials are not independent (sampling without replacement) or p changes (learning curve, fatigue), the geometric/negative binomial assumptions fail and you may need the hypergeometric or other alternatives. Confusion 3: confusing "first success" (geometric) with "exactly one success in n trials" (binomial). These are distinct events with different probabilities. Confusion 4: in count regression, "negative binomial" is parameterized by mean μ and dispersion k, and the link to the trials/successes formulation can be obscured. The distribution is the same algebraically.
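For Confusion 4, the link between the two parameterizations can be made explicit. The sketch below assumes the common regression form in which the variance equals μ + μ²/k; under that assumption the failure-count negative binomial has r = k and p = k/(k + μ). Because k need not be an integer, the coefficient is computed with `math.lgamma` rather than `math.comb`. The function names and the values μ = 4, k = 1.5 are illustrative.

```python
from math import exp, lgamma, log


def nb_failures_pmf(y: int, r: float, p: float) -> float:
    """Failure-count PMF with real-valued r > 0, via log-gamma functions."""
    log_coef = lgamma(y + r) - lgamma(r) - lgamma(y + 1)
    return exp(log_coef + r * log(p) + y * log(1 - p))


def mu_k_to_r_p(mu: float, k: float) -> tuple[float, float]:
    """Map (mean, dispersion) to (r, p), assuming variance = mu + mu^2 / k."""
    return k, k / (k + mu)


mu, k = 4.0, 1.5  # illustrative mean and dispersion (assumptions)
r, p = mu_k_to_r_p(mu, k)

# Recover the moments by summing the PMF far into the tail.
ys = range(0, 1000)
mean = sum(y * nb_failures_pmf(y, r, p) for y in ys)
variance = sum((y - mean) ** 2 * nb_failures_pmf(y, r, p) for y in ys)

print(mean, mu)
print(variance, mu + mu ** 2 / k)
```

Other software may define the dispersion as 1/k instead, so the mapping should be checked against the documentation of whichever package is in use.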

Key Points

  • Always confirm trial-counting vs failure-counting convention.
  • Independence and constant p are required assumptions.
  • Distinguish "first success" from "any one success" carefully.

8. Running These Problems in StatsIQ

Snap a photo of any waiting-for-success problem and StatsIQ identifies whether the geometric or negative binomial applies, picks the convention that matches the question (trials vs failures), computes the requested probability or moment, and walks through the C(k-1, r-1) × p^r × (1-p)^(k-r) calculation step by step. The app also flags when independence or constant-p assumptions look implausible.

Key Points

  • Automatic convention detection based on question phrasing.
  • Step-by-step combinatorial calculation shown.
  • Assumption checks for independence and constant probability.

Key Takeaways

  • Geometric (trials until first success): P(X=k) = (1-p)^(k-1) × p; mean = 1/p; variance = (1-p)/p².
  • Negative binomial (trials until r-th success): P(X=k) = C(k-1, r-1) × p^r × (1-p)^(k-r); mean = r/p; variance = r(1-p)/p².
  • Geometric = negative binomial with r=1.
  • Both assume independent Bernoulli trials with constant p.
  • Two conventions (trial count vs failure count) — always confirm which is in use.

Practice Questions

1. A baseball player gets a hit on 30% of at-bats. What is the probability his first hit comes on the 4th at-bat?
Geometric, convention A. P(X=4) = (0.7)³ × 0.3 = 0.343 × 0.3 = 0.1029. About 10%.
2. You roll a die repeatedly. What is the expected number of rolls to get the third six?
Negative binomial with r=3, p=1/6. Mean = r/p = 3/(1/6) = 18 rolls expected.
3. A test has a 95% pass rate. What is the probability the third failure happens by the 50th candidate?
Negative binomial with r=3, p=0.05 (treating failure as the "success" we are counting). Expected trials to 3rd failure = 3/0.05 = 60. P(X ≤ 50) requires summing the PMF or using a software CDF. Equivalently, it is the probability of at least 3 failures among 50 candidates: 1 − P(at most 2 failures) = 1 − 0.5405 ≈ 0.46, meaning a 46% chance of seeing 3 failures within the first 50 candidates.
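The cumulative probability in question 3 has no convenient closed form, so it is usually computed by summing the PMF. A minimal sketch with the standard library follows, using the trial-count convention; variable names are illustrative.

```python
from math import comb

r, p, n = 3, 0.05, 50  # third failure, 5% failure rate, first 50 candidates

# P(X <= 50): sum the PMF over every trial on which the 3rd failure could land.
cdf_50 = sum(
    comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)
    for k in range(r, n + 1)
)

# Same event seen through the binomial: at least 3 failures in 50 candidates.
at_least_3 = 1 - sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(r))

print(round(cdf_50, 4), round(at_least_3, 4))
```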

FAQs

Common questions about this topic

When your count data is overdispersed — variance exceeds the mean, which violates the Poisson assumption that variance equals the mean. The negative binomial introduces a dispersion parameter that lets variance exceed the mean, providing better fit for many real-world count datasets (medical visits, animal counts, accident data). Test for overdispersion with a likelihood ratio test comparing Poisson and negative binomial fits.

Yes — the geometric is the unique discrete memoryless distribution, analogous to the exponential in continuous time. P(X > s + t | X > s) = P(X > t). The probability of needing additional trials does not depend on how many trials you have already conducted without success.
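The memoryless identity can be checked directly from the tail formula P(X > k) = (1-p)^k under the trial-count convention. The values p = 0.2, s = 4, t = 3 below are illustrative assumptions.

```python
def geom_tail(k: int, p: float) -> float:
    """P(X > k) for the trial-count geometric: the first k trials all fail."""
    return (1 - p) ** k


p, s, t = 0.2, 4, 3  # illustrative values

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
conditional = geom_tail(s + t, p) / geom_tail(s, p)

# P(X > t), with no conditioning on past failures.
unconditional = geom_tail(t, p)

print(conditional, unconditional)
```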

Look for the word that follows "until" or "to get." If it is "the first" — geometric. If it is "the r-th" or "the 5th" — negative binomial. If the question is "how many successes in n trials" with n fixed — binomial, not either of the waiting-for-success distributions.

No — the negative binomial is discrete. The continuous analog of waiting for the r-th event in a Poisson process is the gamma distribution. The geometric extends to the exponential in continuous time; the negative binomial extends to the gamma. The discrete/continuous parallel is exact.

As r → ∞ and p → 1 with r(1-p)/p held constant equal to λ, the negative binomial in its failure-count form (convention B) converges to a Poisson with mean λ. In practice the negative binomial generalizes the Poisson by allowing variance to exceed the mean, making it the workhorse distribution for overdispersed count data in regression modeling.
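The limit can be seen numerically by holding the mean fixed and increasing r. The sketch below uses the failure-count PMF, sets p = r/(r + λ) so that r(1-p)/p = λ, and records the largest pointwise gap to the Poisson PMF. The helper names, λ = 2, and the chosen r values are assumptions for illustration.

```python
from math import comb, exp, factorial


def nb_failures_pmf(k: int, r: int, p: float) -> float:
    """P(exactly k failures before the r-th success)."""
    return comb(k + r - 1, r - 1) * p ** r * (1 - p) ** k


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events) for a Poisson with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


lam = 2.0  # illustrative Poisson mean


def max_gap(r: int) -> float:
    """Largest |NB - Poisson| over k = 0..29 with the NB mean pinned to lam."""
    p = r / (r + lam)  # keeps r(1-p)/p equal to lam
    return max(
        abs(nb_failures_pmf(k, r, p) - poisson_pmf(k, lam)) for k in range(30)
    )


gaps = [max_gap(r) for r in (5, 50, 500, 5000)]
print(gaps)  # the gap shrinks as r grows
```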

Snap a photo of any waiting-for-success problem and StatsIQ identifies whether geometric, negative binomial, or another distribution applies, sets up the PMF and moment calculations under the convention matching the problem, and shows the step-by-step arithmetic. The app also catches common mistakes like applying geometric when the trials are not independent or when p changes across trials. This content is for educational purposes only.
