fundamentals · intermediate · 25 min

Poisson Distribution: Formula, When to Use, and Worked Examples for Students

A complete guide to the Poisson distribution — covering the formula, the conditions that make a Poisson distribution appropriate, worked examples for counts of events in time/space, and the relationship to the binomial distribution.

What You'll Learn

  • State the Poisson distribution formula and identify its parameters
  • Recognize the conditions under which a Poisson distribution is appropriate
  • Calculate Poisson probabilities for counts of events in time or space
  • Understand the relationship between the Poisson and binomial distributions

1. The Direct Answer: Poisson Counts Rare Events in Fixed Intervals

The Poisson distribution describes the probability of a given number of events occurring in a fixed interval of time or space, when events occur independently at a constant average rate.

**Formula**: P(X = k) = (λ^k × e^(-λ)) / k!

Where:
- **λ (lambda)** = the average number of events in the interval (the rate parameter)
- **k** = the specific number of events you're calculating the probability for
- **e** ≈ 2.71828 (Euler's number, base of natural log)
- **k!** = k factorial (k × (k-1) × (k-2) × ... × 1)

**Conditions for using Poisson** (all four must be met):
1. **Events occur independently**: one event doesn't affect the probability of another
2. **Events occur at a constant average rate**: λ is stable over the interval
3. **Two events can't occur at exactly the same instant**
4. **The number of events in disjoint intervals is independent**

**Typical Poisson applications**:
- Number of calls to a call center per hour
- Number of customers arriving at a store per 15 minutes
- Number of defects per square meter of fabric
- Number of emails per day
- Number of accidents at an intersection per month
- Number of radioactive decays per second
- Number of goals scored in a soccer match (roughly Poisson)
- Number of mutations per DNA sequence length

**Parameters of the Poisson distribution**:
- **Mean**: μ = λ
- **Variance**: σ² = λ (mean equals variance; this is the signature property of the Poisson)
- **Standard deviation**: σ = √λ

**The 'mean equals variance' property** is diagnostic. If your data's mean approximately equals its variance, Poisson is likely appropriate. If the variance is significantly larger than the mean, your data may be over-dispersed and require a negative binomial distribution. If variance is smaller than mean, the data may be under-dispersed.

Snap a photo of any Poisson distribution problem and StatsIQ identifies the parameters, applies the formula, and computes probabilities step by step. This content is for educational purposes only.
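To make the formula and the 'mean equals variance' property concrete, here is a minimal standard-library sketch. The function name `poisson_pmf`, the rate λ = 12, and the truncation of the infinite sum at k = 99 are illustrative assumptions, not part of the guide.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), straight from the formula."""
    return lam**k * math.exp(-lam) / math.factorial(k)


lam = 12  # illustrative rate: 12 events per interval

# The support is infinite; for lam = 12 the tail beyond k = 99 is negligible,
# so a truncated sum is a reasonable numerical stand-in.
ks = range(100)
total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print(f"sum of probabilities = {total:.6f}")
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

Both the mean and the variance come out numerically equal to λ, which is the property used later as a diagnostic check.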

Key Points

  • P(X = k) = (λ^k × e^(-λ)) / k!. Memorize this formula.
  • λ = average rate of events per interval. Use the rate appropriate to your interval.
  • Mean = variance = λ. This is the diagnostic property of the Poisson distribution.
  • 4 conditions: independent events, constant rate, no simultaneous events, independence across intervals.

2. Worked Examples: Standard Poisson Problems

Let's work through the most common Poisson problem types.

**Example 1: Exactly k events**

A call center receives an average of 12 calls per hour. What is the probability that exactly 10 calls arrive in the next hour?

Given: λ = 12, k = 10

P(X = 10) = (12^10 × e^(-12)) / 10!
= (61,917,364,224 × 0.000006144) / 3,628,800
≈ 380,433 / 3,628,800
≈ 0.1048 or 10.48%

**Using a Poisson table or calculator**: most problems expect you to use tables or a calculator rather than compute e^(-12) by hand. The table provides P(X = k) for various λ and k values.

**Example 2: At most k events (cumulative)**

The same call center: what is the probability that AT MOST 8 calls arrive in the next hour?

P(X ≤ 8) = P(X = 0) + P(X = 1) + P(X = 2) + ... + P(X = 8)

This is a cumulative probability. You sum the individual Poisson probabilities from k = 0 to k = 8, or use a cumulative Poisson table/function.

Using cumulative Poisson with λ = 12: P(X ≤ 8) ≈ 0.1550 or 15.50%

**Example 3: At least k events**

What is the probability that AT LEAST 15 calls arrive in the next hour?

P(X ≥ 15) = 1 - P(X ≤ 14)

Using cumulative Poisson with λ = 12: P(X ≤ 14) ≈ 0.7720

So P(X ≥ 15) = 1 - 0.7720 = 0.2280 or 22.80%

**Example 4: Events in a different interval**

Same call center receives 12 calls per hour on average. What is the probability of exactly 5 calls in the next 30 minutes?

Key insight: you must adjust λ for the different interval. If the rate is 12 per hour, then the rate for 30 minutes = 12 × (30/60) = 6. So new λ = 6, k = 5.

P(X = 5) = (6^5 × e^(-6)) / 5!
= (7,776 × 0.002479) / 120
= 19.28 / 120
≈ 0.1606 or 16.06%

**Example 5: Multiple independent intervals**

Same call center. What is the probability of exactly 12 calls in the first hour AND exactly 10 calls in the second hour (independent hours)?

Because the two intervals are independent:

P(first hour = 12 AND second hour = 10) = P(first = 12) × P(second = 10)

P(first = 12) for λ = 12: from Poisson table ≈ 0.1144
P(second = 10) for λ = 12: from Poisson table ≈ 0.1048

Combined = 0.1144 × 0.1048 ≈ 0.0120 or 1.20%

**Common Poisson calculation traps**:
1. **Forgetting to adjust λ for the interval**: if the rate is per hour but you're asked about 30 minutes, use λ/2. If asked about 2 hours, use 2λ.
2. **Confusing 'at least' and 'at most'**: 'at least 5' means P(X ≥ 5) = 1 - P(X ≤ 4). 'At most 5' means P(X ≤ 5).
3. **Using Poisson when conditions aren't met**: events that aren't independent, or whose rate isn't constant over the interval (e.g., demand that surges during lunch time), don't fit a single Poisson model.
4. **Including k = 0 incorrectly**: P(X ≥ 1) = 1 - P(X = 0) = 1 - e^(-λ). This is a useful shortcut.

**Poisson table vs formula**: For problems with λ ≤ 20, use the formula or a Poisson table. For larger λ, the Poisson approaches the normal distribution, and you can use normal approximation with μ = λ and σ² = λ.

StatsIQ handles Poisson calculations including interval adjustments, cumulative probabilities, and normal approximation for large λ.
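The five call-center examples above can be reproduced with a short standard-library sketch. The helper names `poisson_pmf` and `poisson_cdf` and the variable names are illustrative choices; the cumulative function is a plain sum of the formula, standing in for a Poisson table.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k): sum the individual probabilities from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


rate_per_hour = 12

exactly_10 = poisson_pmf(10, rate_per_hour)        # Example 1
at_most_8 = poisson_cdf(8, rate_per_hour)          # Example 2
at_least_15 = 1 - poisson_cdf(14, rate_per_hour)   # Example 3: complement rule

lam_30_min = rate_per_hour * (30 / 60)             # Example 4: rescale the rate first
exactly_5_in_30_min = poisson_pmf(5, lam_30_min)

# Example 5: independent hours, so the probabilities multiply.
both_hours = poisson_pmf(12, rate_per_hour) * poisson_pmf(10, rate_per_hour)

for label, value in [
    ("P(X = 10)", exactly_10),
    ("P(X <= 8)", at_most_8),
    ("P(X >= 15)", at_least_15),
    ("P(X = 5) in 30 min", exactly_5_in_30_min),
    ("P(12 then 10)", both_hours),
]:
    print(f"{label}: {value:.4f}")
```

Note that the interval adjustment happens before the formula is applied: the rate is rescaled to the 30-minute window, and only then used as λ.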

Key Points

  • Exactly k events: P(X = k) = (λ^k × e^(-λ)) / k!
  • At most k: cumulative sum from 0 to k. Use Poisson tables.
  • At least k: 1 - P(X ≤ k-1). Especially useful for P(X ≥ 1) = 1 - e^(-λ).
  • Always adjust λ for the interval in the question. Rate × interval = correct λ.

3. Relationship to the Binomial Distribution

The Poisson distribution has a crucial relationship to the binomial distribution. Understanding this connection helps you see when Poisson is appropriate and when you should use binomial.

**The binomial distribution** describes k successes in n independent trials, each with probability p of success:

P(X = k) = C(n, k) × p^k × (1-p)^(n-k)

Where C(n, k) = n! / (k! × (n-k)!)

**When binomial becomes Poisson**: When n is large and p is small (events are rare), the binomial distribution is approximated by the Poisson distribution with λ = n × p.

Rule of thumb for the approximation:
- n ≥ 20 AND p ≤ 0.05 → Poisson is a good approximation
- n ≥ 100 AND n × p ≤ 10 → Poisson is usually very accurate
- A small n × p alone is not enough: the approximation needs p itself to be small. With n = 10 and p = 0.5, n × p = 5, but the binomial variance is 2.5 while the Poisson variance would be 5.

**Why the approximation works**: Imagine a binomial with n trials and probability p. Set λ = np (the expected number of successes). As n gets large and p gets small (keeping np constant), the binomial PMF converges to the Poisson PMF.

**Practical example**: A factory has a defect rate of 0.2% per unit. In a batch of 500 units, what is the probability of exactly 2 defects?

**Binomial approach**: n = 500, p = 0.002, k = 2

P(X = 2) = C(500, 2) × 0.002^2 × 0.998^498
= 124,750 × 0.000004 × 0.3690
≈ 0.1841

**Poisson approximation**: λ = np = 500 × 0.002 = 1, k = 2

P(X = 2) = (1^2 × e^(-1)) / 2!
= (1 × 0.3679) / 2
≈ 0.1839

The two answers are virtually identical (0.1841 vs 0.1839). The Poisson is easier to compute and doesn't require large factorials or small numbers raised to large powers.

**When Poisson is the natural choice (not just an approximation)**: Some processes are inherently Poisson, not approximations of binomials:
- **Continuous time processes**: if events can occur at any instant (not fixed 'trials'), the underlying probability model is typically Poisson. Example: number of earthquake aftershocks in a time window.
- **Rate-based processes**: phenomena described by a rate per unit time or space, where the 'trials' aren't discrete. Example: number of typing errors per page.

**When to use each**:

| Situation | Use |
|-----------|-----|
| Fixed number of trials, count successes | Binomial |
| Count events in a time/space interval | Poisson |
| Binomial with small p, large n | Poisson approximation |
| Events at constant rate | Poisson |
| Independent trials with fixed p | Binomial |

**Poisson vs geometric vs exponential**:
- **Poisson**: COUNT of events in a fixed interval
- **Geometric**: NUMBER of trials until the first success
- **Exponential**: TIME between consecutive events

All three are related: they describe different aspects of the same underlying random process. If events occur as a Poisson process with rate λ:
- Number of events per unit interval: Poisson(λ)
- Time to next event: Exponential with rate λ (mean 1/λ)
- Number of unit intervals until the first one containing an event: Geometric(p), where p = 1 - e^(-λ) is the probability of at least one event in an interval

**The 'memoryless' property**: The exponential distribution (time between Poisson events) has the memoryless property: given that no event has occurred by time t, the distribution of the additional time until the next event is the same as the original distribution. This means Poisson processes have no 'memory' of past events.

**Common confusion: Poisson vs binomial in the same problem**: If the problem says 'out of 500 attempts, what's the probability of 3 failures?', this is binomial (fixed n = 500). If the problem says 'failures occur at an average rate of 3 per day; what's the probability of exactly 5 failures tomorrow?', this is Poisson (rate-based, no fixed n). Read the problem carefully to identify which structure applies.

StatsIQ identifies the correct distribution from the problem structure and applies the appropriate formula with full calculation steps.

Key Points

  • Poisson approximates binomial when n is large and p is small. λ = np.
  • Rule of thumb: Poisson works when n ≥ 20 and p ≤ 0.05, or when n ≥ 100 and np ≤ 10. A small np alone is not enough; p must be small.
  • Poisson is the natural model for rate-based processes (events per time/space).
  • Binomial: fixed trials. Poisson: continuous rate-based events.

4. Applications and Diagnostic Checks

Beyond basic probability calculations, the Poisson distribution has specific applications in real-world problems and statistical testing.

**Application 1: Modeling event rates**

**Call center staffing**: if calls arrive as a Poisson process with rate 30/hour, staffing decisions are based on Poisson probabilities:
- Probability that more than 40 calls arrive = probability of being understaffed
- Probability that fewer than 20 arrive = probability of having idle staff

Staffing is optimized to balance the costs of understaffing vs overstaffing.

**Quality control**: if the defect rate is 2 per 100 units, inspecting a batch of 50 units:
- Expected defects: 1 (λ = 2 × 50/100)
- Probability of 0 defects: e^(-1) ≈ 0.368
- Probability of 3+ defects: 1 - P(X ≤ 2) ≈ 0.080

This helps set quality thresholds.

**Epidemiology**: rare disease incidence. If a disease occurs at rate 1 per 10,000 people per year, in a population of 50,000:
- Expected annual cases: 5 (λ = 5)
- Probability of 10+ cases in a year: 1 - P(X ≤ 9) ≈ 0.032 (3.2%)

This is how public health officials assess whether a cluster of cases is suspicious.

**Application 2: Testing if data follow a Poisson distribution**

You can test whether real data follow a Poisson distribution. The diagnostic tools:

1. **Compare mean to variance**: for Poisson, they should be approximately equal. If variance >> mean, use negative binomial. If variance << mean, consider a binomial model, whose variance np(1-p) is always below its mean np.
2. **Chi-square goodness-of-fit test**:
   - Calculate expected frequencies for each k based on Poisson with estimated λ
   - Compare to observed frequencies
   - Chi-square statistic = Σ (O-E)²/E, with degrees of freedom = (number of categories) - 2 when λ is estimated from the data
   - If the test fails to reject, the data are consistent with a Poisson model
3. **Index of dispersion**: ratio of variance to mean. For Poisson, this is 1. For over-dispersed data (common in biology, insurance), this is > 1.
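The index of dispersion from the list above takes only a few lines to compute. This is a minimal sketch on made-up count data: both samples and the function name `dispersion_index` are hypothetical and exist only to show one Poisson-like case and one clearly over-dispersed case.

```python
import statistics


def dispersion_index(counts: list[int]) -> float:
    """Sample variance divided by sample mean; near 1 for Poisson-like counts."""
    return statistics.variance(counts) / statistics.mean(counts)


# Hypothetical data: calls per hour over 12 hours (variance close to the mean).
calls_per_hour = [2, 5, 4, 7, 1, 4, 6, 3, 4, 3, 8, 3]

# Hypothetical data: weekly case counts with clustering (many zeros, a few spikes).
weekly_cases = [0, 0, 1, 0, 12, 0, 0, 9, 0, 1, 0, 15]

for name, data in [("calls_per_hour", calls_per_hour), ("weekly_cases", weekly_cases)]:
    print(f"{name}: mean = {statistics.mean(data):.2f}, "
          f"variance = {statistics.variance(data):.2f}, "
          f"index = {dispersion_index(data):.2f}")
```

With samples this small the index is noisy, so it is best read as a quick screen before a formal goodness-of-fit test rather than as a test in its own right.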
**Application 3: Poisson regression**

When you have count data and want to model how it depends on predictors, Poisson regression is the standard approach:

log(λ) = β₀ + β₁x₁ + β₂x₂ + ...

Examples:
- Number of hospital visits as a function of age, health conditions, etc.
- Number of species in ecological sites as a function of habitat characteristics
- Number of bike trips per day as a function of weather variables

Poisson regression is a generalized linear model (GLM). If over-dispersion is detected, use negative binomial regression instead.

**Application 4: Spatial analysis**

Poisson distributions work for events in space, not just time:
- Number of trees per square meter in a forest
- Number of bacterial colonies per petri dish
- Number of craters per region of the moon
- Number of stars per small volume of space

**Common data patterns that are NOT Poisson**:

1. **Over-dispersed data**: variance > mean. Often occurs when there's clustering (e.g., disease outbreaks, where one case increases probability of others nearby). Use negative binomial instead.
2. **Under-dispersed data**: variance < mean. Less common but occurs when events are constrained to be regular (e.g., customers arriving at a fixed time each hour).
3. **Zero-inflated data**: more zeros than Poisson would predict. Occurs when there's a 'structural zero', e.g., number of fishing trips where many people never fish (structural zero) vs those who fish but happened to have zero trips.
4. **Truncated data**: only observing counts > 0 (or < some threshold). Use a truncated Poisson.
**Poisson in R and Python**:

**R**:
```
dpois(k, lambda)  # P(X = k)
ppois(k, lambda)  # P(X ≤ k)
qpois(p, lambda)  # quantile
rpois(n, lambda)  # random samples
```

**Python (scipy)**:
```
from scipy.stats import poisson

# lambda_ has a trailing underscore because `lambda` is a reserved word in Python
poisson.pmf(k, lambda_)       # P(X = k)
poisson.cdf(k, lambda_)       # P(X ≤ k)
poisson.sf(k, lambda_)        # P(X > k)
poisson.rvs(lambda_, size=n)  # random samples
```

**Mean and variance of sums**: If X ~ Poisson(λ₁) and Y ~ Poisson(λ₂) are independent, then X + Y ~ Poisson(λ₁ + λ₂), so the mean and variance of the sum are both λ₁ + λ₂. This 'summing' property is not unique to the Poisson (independent normal variables, for example, also sum to a normal), but it is very useful for combining data from different time periods or locations.

Example: if a call center receives calls at rate 12/hour in the morning and 8/hour in the afternoon, total calls over a 2-hour morning + 3-hour afternoon period follow Poisson with λ = 2(12) + 3(8) = 48.

StatsIQ handles all Poisson applications including goodness-of-fit tests, regression analysis, and identification of over-dispersion or zero-inflation in real data.
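The sum property can be checked numerically without any simulation. The sketch below convolves the two PMFs from the call-center example and compares the result with Poisson(48). It uses only the standard library; `pmf_of_sum` is a hypothetical helper name, and the comparison stops at k = 100, beyond which both probabilities are negligible.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def pmf_of_sum(k: int, lam1: float, lam2: float) -> float:
    """P(X + Y = k) for independent X ~ Poisson(lam1), Y ~ Poisson(lam2), by convolution."""
    return sum(poisson_pmf(i, lam1) * poisson_pmf(k - i, lam2) for i in range(k + 1))


lam_morning = 2 * 12    # 2 hours at 12 calls/hour
lam_afternoon = 3 * 8   # 3 hours at 8 calls/hour
lam_total = lam_morning + lam_afternoon

# Largest pointwise difference between the convolution and Poisson(lam_total).
max_gap = max(
    abs(pmf_of_sum(k, lam_morning, lam_afternoon) - poisson_pmf(k, lam_total))
    for k in range(101)
)
print(f"lambda_total = {lam_total}, max gap = {max_gap:.2e}")
```

The gap is at the level of floating-point rounding, which is what the identity X + Y ~ Poisson(λ₁ + λ₂) predicts.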

Key Points

  • Diagnostic: mean should equal variance. If variance >> mean, use negative binomial instead.
  • Applications: call center staffing, quality control, epidemiology, ecology, spatial analysis.
  • Poisson regression: log(λ) = β₀ + β₁x₁ + β₂x₂ + ... for count data with predictors.
  • Sum property: independent Poissons with rates λ₁, λ₂ sum to Poisson with rate λ₁ + λ₂.

Key Takeaways

  • Poisson formula: P(X = k) = (λ^k × e^(-λ)) / k!. Memorize cold.
  • Mean = Variance = λ. Diagnostic property of Poisson.
  • Adjust λ for the interval. Rate × interval = correct λ.
  • Poisson approximates binomial when n ≥ 20 and p ≤ 0.05. Use λ = np.
  • Sum of independent Poissons is Poisson with summed rates.

Practice Questions

1. A hospital emergency room sees an average of 4 trauma cases per 8-hour shift. What is the probability of seeing exactly 6 trauma cases in one shift?
Given: λ = 4 (rate per 8-hour shift), k = 6. P(X = 6) = (4^6 × e^(-4)) / 6! = (4096 × 0.01832) / 720 = 75.03 / 720 ≈ 0.1042 or 10.42%. You could also verify using a Poisson table with λ = 4 and k = 6.
2. A call center receives 20 calls per hour on average. What is the probability that at least 1 call arrives in the next 15 minutes?
First, adjust λ for the 15-minute interval: rate = 20/hour, 15 min = 1/4 hour, so λ = 20 × 0.25 = 5. Then P(X ≥ 1) = 1 - P(X = 0) = 1 - (5^0 × e^(-5)) / 0! = 1 - e^(-5) = 1 - 0.00674 = 0.9933 or 99.33%. With λ = 5, there's almost certainly going to be at least one call in 15 minutes.

FAQs

Common questions about this topic

Use Poisson directly when λ is small or moderate (up to about 20). For larger λ, the Poisson approaches the normal distribution with μ = λ and σ² = λ, and the normal approximation works well and keeps improving as λ grows. The practical rule: for λ ≤ 20, use the Poisson formula or tables. For λ > 20, use the normal approximation with a continuity correction (subtract 0.5 at the lower bound, add 0.5 at the upper bound; for example, P(X ≤ k) ≈ P(Z ≤ (k + 0.5 - λ)/√λ)). For very small λ (e.g., 0.5), stick with exact Poisson, because the normal approximation is poor for such small means.
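To see how the continuity-corrected normal approximation behaves at different values of λ, here is a small standard-library sketch. The function names and the (λ, k) pairs are illustrative assumptions; the normal CDF is built from `math.erf`.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """Exact P(X <= k), accumulating terms iteratively to avoid huge factorials."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        total += term
    return total


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(N <= x) for N ~ Normal(mu, sigma^2)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def poisson_cdf_normal_approx(k: int, lam: float) -> float:
    """Approximate P(X <= k) using Normal(lam, lam) with a +0.5 continuity correction."""
    return normal_cdf(k + 0.5, lam, math.sqrt(lam))


for lam, k in [(0.5, 0), (30, 35), (100, 110)]:
    exact = poisson_cdf(k, lam)
    approx = poisson_cdf_normal_approx(k, lam)
    print(f"lambda = {lam}, k = {k}: exact = {exact:.4f}, approx = {approx:.4f}")
```

The small-λ row shows a visibly poor approximation, while the larger rates agree closely with the exact values.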

Yes. Snap a photo of any Poisson problem and StatsIQ identifies the rate parameter λ (adjusting for the interval if needed), applies the formula, and computes the probability. It handles exact, cumulative, and "at least k" formats, as well as related applications like Poisson regression and goodness-of-fit tests. For problems that should use binomial or negative binomial instead, StatsIQ flags the mismatch and suggests the correct distribution.
