Fundamentals · Intermediate · 50–65 minutes

Probability Distributions: The Complete Guide With Decision Tree

A pillar guide to the most important probability distributions in applied statistics — Bernoulli, binomial, Poisson, geometric, normal, t, chi-square, F, uniform, exponential, beta — with worked probability calculations, a decision tree for choosing the right distribution, and a Central Limit Theorem walkthrough that ties them all together.

What You'll Learn

  • Distinguish discrete from continuous probability distributions and identify the parameters that define each
  • Compute probabilities for binomial, Poisson, normal, t, chi-square, and F distributions
  • Apply the Central Limit Theorem to convert sample means into approximately normal distributions
  • Use a distribution decision tree to select the right model from a problem description
  • Recognize when distributions approximate one another (normal-to-binomial, Poisson-to-binomial)

1. Direct Answer: What a Probability Distribution Is

A probability distribution is a mathematical function describing the probability of each possible outcome of a random variable. Discrete distributions (binomial, Poisson, geometric) describe outcomes that take countable values such as integers; continuous distributions (normal, t, chi-square, F, uniform, exponential) describe outcomes on a continuum and use probability density functions where probability is measured by area under the curve. Every distribution is defined by parameters (e.g., the binomial by n and p, the normal by μ and σ) and has known formulas for the mean, variance, and key probabilities. Choosing the right distribution comes down to four questions: discrete or continuous, how many possible outcomes per trial, are trials independent, and what is the underlying generative process. The Central Limit Theorem (CLT) provides a powerful unifying result: regardless of the underlying distribution, the sample mean of a sufficiently large sample is approximately normally distributed — which is why the normal distribution shows up everywhere in inferential statistics.

Key Points

  • Discrete distributions: outcomes take countable values; use probability mass functions (PMFs)
  • Continuous distributions: outcomes on a continuum; use probability density functions (PDFs)
  • Each distribution defined by its parameters (e.g., binomial: n, p; normal: μ, σ; Poisson: λ)
  • Central Limit Theorem: sample means become approximately normal as n grows, regardless of underlying distribution
  • Choice of distribution: discrete/continuous? Trials independent? What is the generative process?

2. Discrete Distribution 1: Binomial (Worked Example)

The binomial distribution models the number of successes in n independent Bernoulli trials, each with success probability p. Use it when each trial has exactly two outcomes (success/failure), the number of trials is fixed, and trials are independent.

P(X = k) = C(n, k) × p^k × (1 − p)^(n − k)

Mean = np; Variance = np(1 − p)

Worked Example. A factory produces parts where 8% are defective. In a random sample of 20 parts, what is the probability that exactly 3 are defective?

P(X = 3) = C(20, 3) × 0.08^3 × 0.92^17
C(20, 3) = 20! / (3! × 17!) = 1,140
P(X = 3) = 1,140 × 0.000512 × 0.2423 = 0.1414

The probability of exactly 3 defective parts is about 14.1%. The expected number of defects is 20 × 0.08 = 1.6. For "at least 3" or "at most 3" calculations, sum the relevant probability mass values. Modern practice uses statistical software (Python scipy.stats.binom, R dbinom, Excel BINOM.DIST); the formula is rarely computed by hand for n > 5.
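In practice these probabilities come from software, as noted above; a short sketch using scipy.stats.binom with the numbers from the worked example:

```python
from scipy.stats import binom

n, p = 20, 0.08
# P(X = 3): probability of exactly 3 defective parts
print(binom.pmf(3, n, p))    # ≈ 0.1414
# P(X <= 3): cumulative, "at most 3"
print(binom.cdf(3, n, p))
# P(X >= 3) = 1 - P(X <= 2), via the survival function
print(binom.sf(2, n, p))
# Expected number of defects
print(binom.mean(n, p))      # 1.6
```

The `sf` (survival function) call avoids the common error of computing P(X ≥ 3) as 1 − P(X ≤ 3).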

Key Points

  • Use binomial when n is fixed, trials independent, two outcomes per trial, constant p
  • P(X = k) = C(n, k) × p^k × (1 − p)^(n − k)
  • Mean = np; Variance = np(1 − p)
  • Common applications: defect rates, survey responses, A/B-testing conversions
  • When np > 10 and n(1 − p) > 10, normal approximation works well

3. Discrete Distribution 2: Poisson (Worked Example)

The Poisson distribution models the number of events in a fixed interval of time or space when events occur independently at a constant average rate λ.

P(X = k) = (λ^k × e^(−λ)) / k!

Mean = λ; Variance = λ

Worked Example. A call center receives an average of 6 calls per hour. What is the probability of receiving exactly 4 calls in a given hour?

P(X = 4) = (6^4 × e^(−6)) / 4! = (1,296 × 0.002479) / 24 = 3.213 / 24 = 0.1339

The probability of exactly 4 calls is about 13.4%. P(X ≥ 4) requires summing P(X = 4) + P(X = 5) + ... — done with software in practice.

The Poisson is also a limiting form of the binomial: when n is very large and p is very small, the binomial approaches a Poisson with λ = np. This approximation is useful for rare-event modeling — defect rates in millions of parts, hospital readmission rates, equipment failure intervals, accident counts on highway segments. A key distinguishing feature: the Poisson has Mean = Variance. If sample data shows variance much larger than the mean (over-dispersion), Poisson is the wrong model — switch to the negative binomial, which allows variance > mean.
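A sketch of the worked example in scipy.stats, plus a check of the binomial-to-Poisson limit (the n and p below are illustrative numbers chosen so that np = 6):

```python
from scipy.stats import binom, poisson

lam = 6
# P(X = 4): exactly 4 calls in an hour
print(poisson.pmf(4, lam))      # ≈ 0.1339
# P(X >= 4) = 1 - P(X <= 3), via the survival function
print(poisson.sf(3, lam))

# Poisson as a limit of the binomial: n large, p small, np = 6
n, p = 10_000, 0.0006
print(binom.pmf(4, n, p))       # very close to the Poisson value
```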

Key Points

  • Use Poisson for counts of events in a fixed interval, constant rate, independent occurrences
  • P(X = k) = (λ^k × e^(−λ)) / k!
  • Mean = Variance = λ — distinguishing feature
  • Approximates binomial when n is large and p is small (λ = np)
  • Variance >> mean (over-dispersion) → switch to negative binomial

4. Continuous Distribution 1: Normal (Worked Example)

The normal distribution is the most important continuous distribution in statistics. It is symmetric, bell-shaped, and completely defined by its mean μ and standard deviation σ. The standard normal Z has μ = 0, σ = 1.

Key properties (the empirical rule):
- 68% of values fall within ±1σ
- 95% of values fall within ±1.96σ (rounded to ±2σ)
- 99.7% of values fall within ±3σ

Probabilities are computed by converting to Z-scores: Z = (X − μ) / σ, then looking up P(Z < z) in a standard normal table or computing with software (Python scipy.stats.norm.cdf, R pnorm, Excel NORM.DIST).

Worked Example. SAT scores are approximately normal with μ = 1,050 and σ = 200. What is the probability a randomly selected student scores above 1,300?

Z = (1,300 − 1,050) / 200 = 250 / 200 = 1.25
P(Z > 1.25) = 1 − P(Z < 1.25) = 1 − 0.8944 = 0.1056

About 10.6% of students score above 1,300.

The normal distribution is the asymptotic limit of many statistics by the Central Limit Theorem. Sample means, sample proportions, and many estimators become approximately normal as sample size grows, even when the underlying data is not normal. This is why z-tests and t-tests are robust at large sample sizes regardless of the data's distribution.
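The SAT example, sketched with scipy.stats.norm, computed both directly and through the Z-score conversion:

```python
from scipy.stats import norm

mu, sigma = 1050, 200
# P(X > 1300) directly, via the survival function
print(norm.sf(1300, loc=mu, scale=sigma))   # ≈ 0.1056

# Same answer via a Z-score and the standard normal
z = (1300 - mu) / sigma                     # 1.25
print(1 - norm.cdf(z))                      # ≈ 0.1056
```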

Key Points

  • Normal distribution: symmetric, bell-shaped, defined by μ and σ
  • Empirical rule: 68% within 1σ, 95% within 2σ, 99.7% within 3σ
  • Z = (X − μ) / σ converts to standard normal for table lookup
  • CLT makes normal central to inferential statistics — sample means become normal as n grows
  • Approximation reasonable when n > 30 (rule of thumb); larger n needed for severely skewed populations

5. Continuous Distributions 2-4: t, Chi-Square, F (Brief Worked Examples)

Three more continuous distributions, each tied to a specific family of tests.

Student's t-distribution. Similar to the normal but with heavier tails; used when σ is unknown and estimated by s. Defined by its degrees of freedom (df). As df grows, t approaches Z. For df = 24 (n = 25), the two-tailed critical value at α = 0.05 is t = 2.064 (vs Z = 1.96 for the normal).

Chi-square distribution. The sum of squared standard normals; right-skewed; defined by df. Used in chi-square tests of independence, goodness-of-fit tests, and variance tests. For df = 5, the upper 5% critical value is χ² = 11.07.

F-distribution. The ratio of two scaled chi-squares; right-skewed; defined by two df parameters (numerator and denominator). Used in ANOVA and regression overall-fit tests. For df = (3, 30), the upper 5% critical value is F = 2.92.

These three are the workhorses of inferential statistics. Their critical values are tabulated in every statistics textbook (and computed instantly by software). Relationships among them: a t-statistic is approximately Z for large df; t² with df_t is an F with df = (1, df_t); and a chi-square divided by its df is an F with that numerator df and infinite denominator df.

Other continuous distributions worth knowing: Uniform (equal probability over an interval; used in random-number generation and Monte Carlo); Exponential (time between Poisson events; right-skewed; memoryless); Gamma (a generalization of the exponential; a sum of exponentials); Beta (bounded between 0 and 1; used for proportions and prior distributions in Bayesian inference); Log-normal (right-skewed, takes only positive values; used for income, stock prices, response times).
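The critical values quoted above, and the t-to-F relationship, can be checked with scipy.stats (`ppf` is the inverse CDF, i.e. the quantile function):

```python
from scipy.stats import chi2, f, t

# Two-tailed t critical value, alpha = 0.05, df = 24
print(t.ppf(0.975, df=24))         # ≈ 2.064
# Upper 5% chi-square critical value, df = 5
print(chi2.ppf(0.95, df=5))        # ≈ 11.07
# Upper 5% F critical value, df = (3, 30)
print(f.ppf(0.95, dfn=3, dfd=30))  # ≈ 2.92

# t converges to Z as df grows (Z critical value is 1.96)
print(t.ppf(0.975, df=10_000))     # ≈ 1.96
# t squared matches F(1, df): both give the same critical value
print(t.ppf(0.975, df=24) ** 2)    # equals f.ppf(0.95, dfn=1, dfd=24)
```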

Key Points

  • t-distribution: heavier tails than normal; converges to normal as df → ∞
  • Chi-square: sum of squared standard normals; right-skewed; df parameter
  • F: ratio of two chi-squares; two df parameters; used in ANOVA
  • Relationships: t² = F(1, df); t → Z as df → ∞
  • Other continuous: uniform, exponential, gamma, beta, lognormal — each fits a specific generative process

6. Distribution Decision Tree: Choosing the Right Model

A practical decision tree for selecting the right distribution from a problem description.

Is the random variable discrete or continuous?

Discrete:
- Two outcomes per trial, fixed n, independent → Binomial
- Two outcomes per trial, trials until first success → Geometric
- Two outcomes per trial, trials until k-th success → Negative Binomial
- Counts in a fixed interval, constant rate → Poisson
- Sampling without replacement from a finite population → Hypergeometric
- Equal probability over a fixed range of integers → Discrete Uniform

Continuous:
- Symmetric, bell-shaped, real-valued → Normal
- Like the normal but heavier tails (σ unknown) → t-distribution
- Sum of squared normals, right-skewed, non-negative → Chi-square
- Ratio of variances → F
- Equal probability over an interval → Uniform
- Time between Poisson events, right-skewed → Exponential
- Generalization of the exponential → Gamma
- Bounded between 0 and 1 → Beta
- Right-skewed and positive (income, response times) → Log-normal

Branch 1 (discrete vs continuous) is usually obvious from the problem context. Branch 2 (which discrete or continuous distribution) requires identifying the generative process. A common rookie mistake is using the normal for skewed or bounded variables (income, response times, proportions). Better: lognormal for income, gamma for response times, beta for proportions.
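The two-branch structure can be sketched as a lookup table; the function name and the process labels below are illustrative, not a library API:

```python
# A minimal sketch of the decision tree as code (illustrative labels only).
def pick_distribution(kind: str, process: str) -> str:
    """Map (discrete/continuous, generative process) to a distribution name."""
    discrete = {
        "fixed trials, two outcomes": "binomial",
        "trials until first success": "geometric",
        "counts per interval": "Poisson",
        "sampling without replacement": "hypergeometric",
    }
    continuous = {
        "symmetric, unbounded": "normal",
        "sigma unknown": "t",
        "time between events": "exponential",
        "bounded 0-1": "beta",
    }
    table = discrete if kind == "discrete" else continuous
    return table.get(process, "unknown -- inspect the generative process")

print(pick_distribution("discrete", "counts per interval"))  # Poisson
print(pick_distribution("continuous", "bounded 0-1"))        # beta
```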

Key Points

  • Discrete vs continuous branch: clear from data type
  • Within branch: identify generative process (independent trials? rate of events? sum of squared normals?)
  • Common error: applying normal to bounded or skewed data
  • Income → lognormal; response times → gamma; proportions → beta
  • Sample means → normal (CLT) regardless of underlying distribution at large n

7. The Central Limit Theorem (CLT)

The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as sample size grows, regardless of the underlying population distribution. Specifically: for a population with mean μ and standard deviation σ, the sample mean x̄ from a sample of size n has approximate distribution N(μ, σ²/n) for sufficiently large n. The conventional rule of thumb is n ≥ 30, though strongly skewed populations may require larger samples.

Worked Example. Daily call durations at a call center have mean μ = 6 minutes and σ = 4 minutes (right-skewed; not normally distributed). What is the probability that a sample of 50 calls has mean duration above 7 minutes?

Sampling distribution of x̄: N(6, 4²/50) = N(6, 0.32) → SE = √0.32 = 0.566
Z = (7 − 6) / 0.566 = 1.768
P(x̄ > 7) = P(Z > 1.768) = 1 − 0.9615 = 0.0385

There is about a 3.9% chance of observing a sample mean above 7 minutes, even though individual calls are not normally distributed.

The CLT is what enables most of inferential statistics. The t-test, z-test, and ANOVA all rely on the sampling distributions of estimators being approximately normal. The CLT also justifies bootstrap methods (which resample and compute means) — the bootstrap distribution of the mean converges to normal even for non-normal data.
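The CLT claim can be checked by simulation. As a stand-in for the (unspecified) call-duration population, the sketch below uses a gamma distribution chosen to have mean 6 and standard deviation 4 — an assumption for illustration, since gamma is right-skewed like the example describes:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# Right-skewed population with mean 6, sd 4:
# gamma with k*theta = 6 and k*theta^2 = 16 gives k = 2.25, theta = 16/6
shape, scale = 2.25, 16 / 6
samples = rng.gamma(shape, scale, size=(100_000, 50))
means = samples.mean(axis=1)            # 100,000 sample means, each n = 50

# Simulated P(xbar > 7) vs the CLT normal approximation
print((means > 7).mean())               # simulation: a bit above 0.0385,
                                        # since some skew survives at n = 50
se = 4 / np.sqrt(50)                    # standard error ≈ 0.566
print(norm.sf(7, loc=6, scale=se))      # CLT approximation ≈ 0.0385
```

The small gap between the two numbers is itself instructive: the n ≥ 30 rule is a convention, and residual skew in the sampling distribution shrinks only as n grows.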

Key Points

  • CLT: sample mean is approximately N(μ, σ²/n) for n large enough, regardless of underlying distribution
  • Rule of thumb: n ≥ 30; more for severely skewed populations
  • Standard error = σ / √n decreases with sample size
  • CLT enables t-tests, z-tests, ANOVA, and bootstrap inference
  • n ≥ 30 is conventional but not magic — always check the underlying distribution shape

8. How StatsIQ Helps With Probability Distributions

Probability distribution problems are core to introductory and intermediate statistics, the AP Statistics curriculum, and many engineering, finance, and biostatistics applications. The decision tree (which distribution to use) and parameter identification (which formula gets which numbers) trip up most students. Snap a photo of any probability problem and StatsIQ identifies the correct distribution, plugs in the parameters, computes the requested probability, and visualizes the relevant area under the curve. For Central Limit Theorem problems, StatsIQ produces both the underlying population distribution and the sampling distribution side by side. This content is for educational purposes only and does not constitute statistical advice.

Key Points

  • Identifies the correct distribution from problem description
  • Plugs in parameters (n, p for binomial; λ for Poisson; μ, σ for normal)
  • Computes exact and cumulative probabilities
  • Visualizes area under curve for normal, t, chi-square, F problems
  • Walks through CLT problems with both population and sampling distribution

9. Common Mistakes to Avoid

Five errors recur. First, applying the normal distribution to skewed or bounded data — income, response times, and proportions all need different distributions. Second, confusing P(X = k) with P(X ≤ k) or P(X ≥ k). Discrete distributions assign probability to specific values; continuous distributions only assign probability to ranges. Third, forgetting the sample-size condition for the Central Limit Theorem. Small samples from skewed populations remain skewed even at the sample-mean level. Fourth, treating the sample standard deviation s as if it were the population σ when computing z-scores. Use the t-distribution when σ is estimated by s. Fifth, applying Poisson when the variance differs sharply from the mean. Variance >> mean (over-dispersion) signals negative binomial; variance << mean signals a different model entirely.
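The fifth error is easy to screen for numerically: the ratio of sample variance to sample mean should be near 1 for Poisson data. A sketch of such a check (the helper name and the simulated datasets are illustrative):

```python
import numpy as np

def overdispersion_check(counts):
    """Variance-to-mean ratio; values near 1 are consistent with Poisson."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(1)
# Genuinely Poisson counts, mean 4
poisson_data = rng.poisson(4, size=2000)
# Negative binomial counts with the same mean 4 but variance 12
negbin_data = rng.negative_binomial(2, 1 / 3, size=2000)

print(overdispersion_check(poisson_data))  # close to 1
print(overdispersion_check(negbin_data))   # well above 1: over-dispersed
```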

Key Points

  • Normal applies only to symmetric, unbounded data — check shape first
  • P(X = k) is exact for discrete; continuous distributions need ranges P(a < X < b)
  • CLT requires sufficient n — more for skewed populations
  • When σ is estimated by s, use t-distribution not z
  • Poisson requires Variance ≈ Mean — over-dispersion → negative binomial

Key Takeaways

  • Discrete distributions: binomial, Poisson, geometric, negative binomial, hypergeometric
  • Continuous distributions: normal, t, chi-square, F, uniform, exponential, gamma, beta, lognormal
  • Binomial: P(X = k) = C(n, k) × p^k × (1 − p)^(n − k); mean = np, variance = np(1 − p)
  • Poisson: P(X = k) = (λ^k × e^(−λ)) / k!; mean = variance = λ
  • Normal: 68% within ±1σ, 95% within ±1.96σ, 99.7% within ±3σ
  • Z-score: Z = (X − μ) / σ for converting to standard normal
  • Central Limit Theorem: sample means are approximately normal at n ≥ 30 regardless of underlying distribution
  • t-distribution converges to normal as df → ∞; relationship t² = F(1, df_t)
  • Chi-square: sum of squared standard normals; right-skewed; df parameter
  • F: ratio of two scaled chi-squares; used in ANOVA
  • Poisson approximates binomial when n large, p small (λ = np)
  • Normal approximates binomial when np > 10 and n(1 − p) > 10

Practice Questions

1. A fair coin is flipped 10 times. What is the probability of exactly 6 heads?
Binomial with n = 10, p = 0.5. P(X = 6) = C(10, 6) × 0.5^6 × 0.5^4 = 210 × 0.015625 × 0.0625 = 0.2051. About 20.5%.
2. A hospital ER receives an average of 4 trauma cases per hour. What is the probability of exactly 2 in a given hour?
Poisson with λ = 4. P(X = 2) = (4² × e^(−4)) / 2! = (16 × 0.01832) / 2 = 0.1465. About 14.7%.
3. IQ scores are normally distributed with μ = 100 and σ = 15. What is the probability of a score between 85 and 115?
Z-scores: (85 − 100)/15 = −1, (115 − 100)/15 = +1. P(−1 < Z < 1) = 0.8413 − 0.1587 = 0.6826. About 68% (the empirical rule).
4. Daily prices follow a non-normal right-skewed distribution with μ = 50 and σ = 12. For a sample of n = 36 days, what is the probability the sample mean exceeds 53?
CLT: x̄ is approximately N(50, 12²/36) = N(50, 4). SE = 2. Z = (53 − 50) / 2 = 1.5. P(Z > 1.5) = 0.0668. About 6.7%. CLT rescues us despite the skew.
5. Why is the Poisson distribution inappropriate when sample variance is twice the sample mean?
The Poisson distribution requires Variance = Mean (= λ). When variance is much larger than mean (over-dispersion), the data does not fit Poisson — switch to a negative binomial distribution which allows variance > mean via a separate shape parameter.
6. A test statistic follows a t-distribution with df = 15. The two-tailed critical value at alpha = 0.05 is 2.131. What would the critical value approach as df → ∞?
As df → ∞, the t-distribution approaches the standard normal Z. The two-tailed alpha = 0.05 critical value for Z is 1.96. So the t-critical (2.131) is slightly larger than Z (1.96) at finite df because of t's heavier tails.

Study with AI

Get personalized help and instant answers anytime.

Download StatsIQ

FAQs

Common questions about this topic

How do I choose the right probability distribution?

Start with two questions: is the variable discrete (countable) or continuous (real-valued)? Then identify the generative process. Discrete with two outcomes and fixed n → binomial. Discrete counts of events in a fixed interval → Poisson. Continuous, symmetric, unbounded → normal. Continuous, σ unknown → t. Sums of squared normals → chi-square. Ratios of variances → F. Bounded between 0 and 1 → beta. The decision tree in this guide covers the most common cases.

When should I avoid the normal distribution?

When the variable is bounded (must be positive, must be between 0 and 1, must be an integer), when the distribution is severely skewed, or when the variable has a known generative process that produces a different distribution. Income is right-skewed (use lognormal). Proportions are bounded between 0 and 1 (use beta). Response times are right-skewed (use gamma or lognormal). Counts are integers (use Poisson or negative binomial). Forcing the normal onto inappropriate data produces wrong probabilities.

What does the Central Limit Theorem say?

The CLT says that as sample size grows, the sampling distribution of the sample mean approaches a normal distribution, regardless of the shape of the underlying population. Specifically, for a population with mean μ and standard deviation σ, the sample mean x̄ has approximate distribution N(μ, σ²/n) for n large enough. The convention n ≥ 30 is a rule of thumb; severely skewed populations may need larger samples. The CLT is what makes inferential statistics work — without it, we could not use z and t tests on non-normal data.

What is the difference between a PMF and a PDF?

A probability mass function (PMF) is for discrete distributions and gives P(X = k) for each possible value k. The total probability mass across all k must sum to 1. A probability density function (PDF) is for continuous distributions and gives a relative likelihood at each point — but P(X = exact value) = 0 for continuous variables, so we always work with intervals: P(a < X < b) is the area under the PDF between a and b. The total area under the PDF must integrate to 1.

When does the Poisson approximate the binomial?

When the binomial has n large and p small, with λ = np held constant. As a rule of thumb, the approximation works well when n ≥ 50 and p ≤ 0.05. It is useful for rare-event modeling: defect counts in millions of parts, equipment failures across long horizons, accident counts on highway segments.

When does the normal approximate the binomial?

When np > 10 AND n(1 − p) > 10. The continuity correction (adding or subtracting 0.5 to discrete values) sharpens the approximation. For example, P(X ≤ 5) in a binomial is approximated by P(Y < 5.5) in a normal with the same mean and variance. Modern practice uses the exact binomial with software when possible — the approximation is mostly a teaching tool now.
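A quick check of the continuity correction (the n and p below are illustrative numbers satisfying np > 10 and n(1 − p) > 10):

```python
import math
from scipy.stats import binom, norm

n, p = 100, 0.3
mu = n * p                                   # 30
sigma = math.sqrt(n * p * (1 - p))           # ≈ 4.58

exact = binom.cdf(25, n, p)                  # exact P(X <= 25)
approx = norm.cdf(25.5, loc=mu, scale=sigma) # normal with continuity correction
print(exact, approx)                         # the two values agree closely
```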

How are the t, chi-square, and F distributions related?

A t-distribution with df is the ratio Z / √(χ²/df), where Z is standard normal and χ² is an independent chi-square. This is why t has heavier tails than Z — the chi-square denominator can take small values that inflate the ratio. An F distribution with df = (df1, df2) is the ratio (χ²₁/df1) / (χ²₂/df2). Squaring a t with df_t produces an F with (1, df_t). And as df → ∞, t → Z and chi-square/df → 1 (by the law of large numbers).

Can StatsIQ solve probability distribution problems?

Yes. Snap a photo of any probability problem and StatsIQ identifies the correct distribution, plugs in the parameters, computes the requested probability or critical value, and visualizes the area under the curve. For Central Limit Theorem problems, StatsIQ produces the population and sampling distributions side by side. This content is for educational purposes only and does not constitute statistical advice.

Related Study Guides

Browse All Study Guides