Discrete vs Continuous Distributions: How to Choose
A practical guide to choosing between discrete and continuous probability distributions: the conceptual difference, four common discrete distributions, four common continuous distributions, decision criteria for each, and a flowchart for matching data to distribution.
What You'll Learn
- Distinguish discrete from continuous random variables
- Identify four common discrete distributions and their uses
- Identify four common continuous distributions and their uses
- Apply decision criteria for distribution choice
- Match data scenarios to appropriate distributions
1. The Conceptual Difference
A discrete random variable takes specific, separable values: counts, integers, categories. Examples: number of customer arrivals per hour (0, 1, 2, ...), number of defective products in a batch, number of heads in 10 coin flips. The variable can only assume specific values; there is no continuum.

A continuous random variable takes any value within a range. Examples: height in cm (170.5, 170.51, 170.515, ...), time to complete a task, temperature, financial returns. Between any two values, infinitely many intermediate values exist.

This distinction drives the form of the probability function. Discrete distributions have a probability mass function (PMF): P(X = k) gives the probability that the variable equals each specific value. Continuous distributions have a probability density function (PDF): f(x) gives density, and probabilities are computed as areas under the curve over intervals, P(a < X < b) = integral of f(x) from a to b. For a continuous distribution, P(X = exactly some specific value) = 0, because the area at a single point is zero.

Mixed distributions exist (e.g., a variable that is sometimes zero and sometimes continuous) but are less common in introductory statistics. The discrete/continuous distinction handles the vast majority of practical cases.
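The same distinction shows up directly in code. Here is a minimal sketch (assuming Python with SciPy; the binomial and normal parameters are purely illustrative) contrasting a PMF, which returns probabilities at individual points, with a PDF, which yields probabilities only through areas:

```python
from scipy import stats

# Discrete: Binomial(n=10, p=0.5) -- the PMF gives P(X = k) directly.
binom = stats.binom(n=10, p=0.5)
print(binom.pmf(5))                       # P(exactly 5 heads in 10 fair flips) ~ 0.246

# Continuous: Normal(mu=170, sigma=10) -- the PDF is a density, not a probability.
norm = stats.norm(loc=170, scale=10)
print(norm.pdf(170.5))                    # density at a point, NOT a probability

# Probabilities for continuous variables come from areas (differences of the CDF).
print(norm.cdf(180) - norm.cdf(160))      # P(160 < X < 180) ~ 0.683
print(norm.cdf(170.5) - norm.cdf(170.5))  # P(X = exactly 170.5) = 0
```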
Key Points
- Discrete: separable values (counts, integers, categories)
- Continuous: any value within a range (heights, times, temperatures)
- Discrete: PMF gives P(X = k) for each specific value
- Continuous: PDF gives density; P(a < X < b) = integral of the density
- P(X = exact value) = 0 for continuous distributions
2. Four Common Discrete Distributions
Binomial. Number of successes in n independent trials with constant success probability p. Parameters: n, p. Mean = np. Variance = np(1-p). Use for fixed-sample-size scenarios with binary outcomes: defects in a batch, conversions in a sample, heads in coin flips.

Poisson. Number of events in a fixed interval at a constant rate. Parameter: lambda. Mean = Variance = lambda. Use for arrival, defect, or count data where events occur independently at a constant rate: customer arrivals per hour, defects per page, calls per minute. Approximates the binomial when n is large and p is small (set lambda = np).

Geometric. Number of trials until the first success. Parameter: p. Mean = 1/p. Use for "time to first success" scenarios with constant probability: number of customer calls before the first sale, number of submissions before acceptance.

Negative Binomial. Number of trials until r successes (or, equivalently, number of failures before r successes). Parameters: r, p. Mean = r/p when counting trials. Use when extending the geometric to multiple required successes, or as an overdispersion-tolerant alternative to the Poisson when the variance is much larger than the mean.
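A short sketch of the four discrete families (assuming Python with SciPy; all parameter values are illustrative), including the Poisson approximation to the binomial with lambda = np:

```python
from scipy import stats

# Binomial: successes in n = 20 trials with p = 0.1 (e.g. defects in a batch).
binom = stats.binom(n=20, p=0.1)
print(binom.mean(), binom.var())        # np = 2.0, np(1-p) = 1.8

# Poisson: counts at a constant rate lambda = 2 per interval.
pois = stats.poisson(mu=2.0)
print(pois.mean(), pois.var())          # mean = variance = lambda = 2.0

# Poisson approximation to the binomial (lambda = np): closer as n grows and p shrinks.
print(binom.pmf(3), pois.pmf(3))        # ~0.190 vs ~0.180

# Geometric: trials until the first success with p = 0.25.
geom = stats.geom(p=0.25)
print(geom.mean())                      # 1/p = 4.0

# Negative binomial: note SciPy counts failures before the r-th success.
nbinom = stats.nbinom(n=3, p=0.25)      # r = 3 successes, p = 0.25
print(nbinom.mean())                    # r(1-p)/p = 9.0 failures on average
```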
Key Points
- Binomial: number of successes in a fixed number of trials n with constant p
- Poisson: count of events in an interval at a constant rate
- Geometric: number of trials until the first success
- Negative Binomial: number of trials until r successes, or an overdispersion-tolerant alternative to Poisson
- All four are countable / integer-valued
3. Four Common Continuous Distributions
Normal (Gaussian). Symmetric bell curve. Parameters: mean (mu), SD (sigma). Use for heights, weights, test scores, measurement errors. Many natural phenomena are approximately normal, and the Central Limit Theorem produces approximately normal sampling distributions for sample means.

Exponential. Time between events in a Poisson process. Parameter: rate (lambda). Mean = 1/lambda. Variance = 1/lambda^2. Use for time between arrivals at a service counter or time between failures of components. Memoryless property: P(X > t + s | X > s) = P(X > t); the remaining wait time does not depend on how long you have already waited.

Log-normal. The log of the variable is normally distributed. Use for variables that are always positive and right-skewed: income, stock prices, file sizes, sound intensity. If Y is log-normal, log(Y) is normal, and many transformations exploit this.

Uniform. Equal probability across a range [a, b]. Parameters: a, b. Mean = (a+b)/2. Variance = (b-a)^2 / 12. Use for random number generation and scenarios with no prior preference for any value in a range. Limited real-world use, but it appears in simulations and Bayesian priors.
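The continuous families in the same style (assuming Python with SciPy; parameter values are illustrative). Note that SciPy parameterizes the exponential by scale = 1/lambda and the log-normal by the SD of log(Y):

```python
import numpy as np
from scipy import stats

# Normal: heights in cm with mu = 170, sigma = 10.
norm = stats.norm(loc=170, scale=10)
print(norm.cdf(180) - norm.cdf(160))            # ~0.683 within one SD

# Exponential: time between arrivals at rate lambda = 0.5 per minute (scale = 1/lambda).
expo = stats.expon(scale=1 / 0.5)
print(expo.mean(), expo.var())                  # 1/lambda = 2.0, 1/lambda^2 = 4.0
# Memoryless check: P(X > t + s | X > s) equals P(X > t).
t, s = 3.0, 5.0
print(expo.sf(t + s) / expo.sf(s), expo.sf(t))  # both ~0.223

# Log-normal: if Y is log-normal, log(Y) is normal; s is the SD of log(Y).
logn = stats.lognorm(s=0.5, scale=np.exp(10))   # log(Y) ~ Normal(10, 0.5)
print(np.isclose(np.log(logn.median()), 10.0))  # median of Y is exp(mu)

# Uniform on [a, b] = [0, 10]: loc = a, scale = b - a.
unif = stats.uniform(loc=0, scale=10)
print(unif.mean(), unif.var())                  # (a+b)/2 = 5.0, (b-a)^2/12 ~ 8.33
```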
Key Points
- Normal: symmetric bell curve; most natural phenomena and the CLT
- Exponential: time between events in a Poisson process (memoryless)
- Log-normal: positive, right-skewed (income, prices, sizes)
- Uniform: equal probability across a range
- All four are real-valued / continuous on their support
4. Decision Flowchart
Step 1: Is the variable a count or a measurement? Counts are discrete; measurements are continuous (or treated as continuous at fine enough scales).

Step 2 (discrete): How are events generated? Fixed trials with binary outcomes: Binomial. Events in a fixed interval at a constant rate: Poisson. Trials until the first success: Geometric. Trials until r successes, or overdispersed counts: Negative Binomial.

Step 3 (continuous): What are the shape and support? Symmetric and unbounded: Normal. Positive and right-skewed: Log-normal. Memoryless time between events: Exponential. Equal probability over a range: Uniform.

Step 4: Run goodness-of-fit checks. Q-Q plots compare observed to theoretical quantiles. Statistical tests (Kolmogorov-Smirnov, Anderson-Darling, Chi-square) test the fit. Visual inspection often reveals deviations that formal tests miss.

Step 5: If no standard distribution fits well, consider transformations (log, square root, Box-Cox) or non-parametric methods that do not require distributional assumptions. Permutation tests and bootstrapping are robust alternatives.
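Steps 4 and 5 can be sketched with SciPy. This is a minimal illustration on simulated right-skewed data (the sample and its parameters are hypothetical): fit a normal, test it with Kolmogorov-Smirnov, then log-transform and test again.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.lognormal(mean=10, sigma=0.5, size=500)   # positive, right-skewed data

# Step 4: goodness-of-fit. KS test against a normal fitted to the data.
# (Fitting parameters from the same data makes the KS p-value optimistic;
# Lilliefors-style corrections or simulation address this in practice.)
mu, sigma = np.mean(sample), np.std(sample, ddof=1)
print(stats.kstest(sample, "norm", args=(mu, sigma)))      # normal fits poorly

# Step 5: a log transform often repairs right skew; test the logged data instead.
logged = np.log(sample)
mu_l, sigma_l = np.mean(logged), np.std(logged, ddof=1)
print(stats.kstest(logged, "norm", args=(mu_l, sigma_l)))  # much better fit

# Q-Q plot for a visual check (requires matplotlib to display):
# import matplotlib.pyplot as plt
# stats.probplot(logged, dist="norm", plot=plt); plt.show()
```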
Key Points
- Step 1: count or measurement?
- Step 2 (discrete): match the generation mechanism to a distribution
- Step 3 (continuous): match shape and support to a distribution
- Step 4: goodness-of-fit checks (Q-Q plot, KS test, visual inspection)
- Step 5: transformations or non-parametric methods as a fallback
5. How StatsIQ Helps With Distribution Choice
Snap a photo of any data sample and StatsIQ identifies the best-fitting distribution from a library of 15+ common distributions, runs goodness-of-fit tests, produces Q-Q plots, and recommends transformations for poor fits. For exam prep, the app generates problems at all complexity levels including distribution-identification questions. StatsIQ also handles applied problems: given the chosen distribution, compute probabilities, percentiles, and sample-size requirements for various inferential goals. This content is for educational purposes only.
Key Points
- Identifies the best-fitting distribution from 15+ candidates
- Runs goodness-of-fit tests
- Produces Q-Q plots and visual diagnostics
- Recommends transformations for poor fits
- Computes probabilities under the chosen distribution
Key Takeaways
- Discrete: separable values (counts, integers)
- Continuous: any value in a range (measurements)
- Discrete distributions have a PMF: P(X = k)
- Continuous distributions have a PDF: f(x), probabilities via integration
- P(X = exact value) = 0 for continuous distributions
- Four common discrete: Binomial, Poisson, Geometric, Negative Binomial
- Four common continuous: Normal, Exponential, Log-normal, Uniform
- Poisson approximates the binomial when n is large and p is small
- Exponential is the continuous analog of the geometric (memoryless)
- Log-normal: log(Y) is normal; positive, right-skewed variables
- CLT produces approximately normal sampling distributions for sample means
- Goodness-of-fit: Q-Q plot, KS test, Anderson-Darling, Chi-square
Practice Questions
1. A company tracks the number of customer service calls per hour. What distribution would you propose?
2. A study measures time between customer arrivals at a service counter. What distribution would you propose?
3. Heights of adult women in a country. What distribution would you propose?
4. Distribution of household income. What distribution would you propose?
5. Number of trials before a basketball player makes a free throw (with constant make probability). What distribution would you propose?
FAQs
Common questions about this topic
Can a variable be partly discrete and partly continuous?
Yes, in some cases. A variable might be mostly zero (a discrete value) and continuously distributed when positive. Example: insurance claim amounts; most policyholders have zero claims, while the rest have continuous claim amounts. This is modeled with a zero-inflated continuous distribution, a mixture model. Software supports these (zero-inflated negative binomial, zero-inflated log-normal, etc.). For introductory statistics, focus on the pure discrete or pure continuous case.
How much does the normality assumption really matter?
Less than students fear, in many cases. The Central Limit Theorem provides robust normality for sample means at reasonable sample sizes. Many statistical methods are robust to moderate violations. Bootstrap methods are distribution-free. However, with small samples or extreme distributions, the assumption matters more. Always plot the data, run normality tests, and consider transformations or non-parametric alternatives when assumptions clearly fail.
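To make the distribution-free idea concrete, here is a minimal bootstrap sketch (assuming Python with NumPy; the exponential sample and sample size are hypothetical): resampling the data with replacement gives a confidence interval for the mean without any normality assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=40)   # small, skewed sample (hypothetical)

# Percentile bootstrap for the mean: resample with replacement, no normality assumed.
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for the mean: ({lo:.2f}, {hi:.2f})")
```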
What is the difference between a distribution and its parameters?
A distribution is the family or shape (normal, Poisson, etc.). Parameters are the specific values that pin down one member of that family. Normal with mean = 0 and SD = 1 is different from Normal with mean = 100 and SD = 15: same family, different parameters. Estimating parameters from data is what most of statistical inference is about: given the family, what are the most likely parameter values?
How do I know whether my data actually follows a given distribution?
You can never know with certainty; real data rarely matches any theoretical distribution perfectly. Statistical tests (Kolmogorov-Smirnov, Anderson-Darling, Shapiro-Wilk for normality) assess whether the data is "close enough" to the proposed distribution. Visual checks (Q-Q plots, histograms) often reveal deviations the tests miss. The pragmatic approach: choose a distribution that fits reasonably well, run sensitivity analyses with alternatives, and use robust methods when the fit is uncertain.
When should I use non-parametric methods instead?
When the distributional assumption is doubtful or untestable due to small sample size. Non-parametric methods (rank tests, permutation tests, bootstrap) make weaker assumptions and produce valid inference under broader conditions. The trade-off: when the parametric assumption is correct, parametric methods are more powerful; when assumptions are violated, non-parametric methods are safer. A common modern workflow is to run both and check for consistency.
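As one example of a distribution-free test, here is a minimal permutation-test sketch (assuming Python with NumPy; the two groups and their sizes are hypothetical): shuffle the group labels many times and see how often a random relabeling produces a difference as large as the observed one.

```python
import numpy as np

rng = np.random.default_rng(1)
group_a = rng.normal(loc=5.0, scale=2.0, size=30)   # hypothetical samples
group_b = rng.normal(loc=6.0, scale=2.0, size=30)

observed = group_b.mean() - group_a.mean()
pooled = np.concatenate([group_a, group_b])

# Permutation test: shuffle group labels and recompute the difference in means.
n_perm = 10_000
diffs = np.empty(n_perm)
for i in range(n_perm):
    perm = rng.permutation(pooled)
    diffs[i] = perm[30:].mean() - perm[:30].mean()

# Two-sided p-value: fraction of relabelings at least as extreme as the observed difference.
p_value = np.mean(np.abs(diffs) >= abs(observed))
print(observed, p_value)
```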
How does StatsIQ help with distribution choice?
Snap a photo of any data sample and StatsIQ identifies the best-fitting distribution from 15+ candidates, runs goodness-of-fit tests, produces Q-Q plots and visual diagnostics, and recommends transformations for poor fits. The app also computes probabilities under the chosen distribution and produces practice problems for exam prep. This content is for educational purposes only.