Central Limit Theorem: Worked Examples and Simulation
A focused guide to the Central Limit Theorem with multiple worked sampling examples at n = 5, 30, and 100 from skewed and uniform populations, demonstrating convergence of the sample mean to normality. Includes the connection to bootstrap methods.
What You'll Learn
- ✓ State the Central Limit Theorem precisely with parameters and conditions
- ✓ Compute sampling distributions of x̄ for various n and underlying distributions
- ✓ Recognize when n is sufficient for the CLT to apply (skewed populations need more)
- ✓ Connect the CLT to standard error formulas used in t-tests and z-tests
- ✓ Apply the CLT to bootstrap inference for non-normal data
1. Direct Answer: What CLT Says
The Central Limit Theorem says that as the sample size n grows, the sampling distribution of the sample mean x̄ approaches a normal distribution with mean μ (the population mean) and standard deviation σ/√n (the standard error), regardless of the shape of the underlying population. The CLT is the bridge that lets us use normal-based inferential procedures (z-tests, t-tests, confidence intervals) even when the underlying data are not normal. The conventional rule of thumb is n ≥ 30 for most distributions, but severely skewed populations may need n closer to 100; the shape of the underlying population determines how fast convergence happens.
Key Points
- • Sample mean x̄ → Normal(μ, σ²/n) as n grows
- • CLT works regardless of underlying distribution shape
- • Standard error = σ/√n
- • Rule of thumb: n ≥ 30 for moderate skew; larger for extreme skew
- • CLT enables z-tests, t-tests, and confidence intervals on non-normal data
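The 1/√n scaling in the key points above can be verified in a few lines; σ = 2 here is an illustrative value, not tied to a particular population.

```python
# Standard error SE = sigma / sqrt(n): quadrupling n halves the SE.
# sigma = 2.0 is an illustrative population standard deviation.
sigma = 2.0
for n in (25, 100, 400):
    se = sigma / n ** 0.5
    print(f"n = {n:3d}  SE = {se:.3f}")
# Each 4x increase in n halves the SE: 0.400, 0.200, 0.100.
```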
2. Worked Example 1: Right-Skewed Population (Exponential)
Consider an exponential population (heavily right-skewed) with rate parameter λ = 0.5, so μ = 1/λ = 2 and σ = 2. We draw samples of various sizes and compute the sample mean.
- At n = 5: the sampling distribution of x̄ has mean 2 and SE = 2/√5 ≈ 0.894. The shape is still noticeably right-skewed at this small sample size; the CLT has not yet fully kicked in.
- At n = 30: mean = 2, SE = 2/√30 ≈ 0.365. The distribution is now much closer to normal, with slight residual right skew. Most CLT-based procedures will work reasonably well here.
- At n = 100: mean = 2, SE = 2/√100 = 0.20. The distribution is essentially normal; the exponential origin is completely "forgotten" at the sample-mean level.
Probability calculation. What is P(x̄ > 2.5) for n = 50? SE = 2/√50 ≈ 0.283, so Z = (2.5 − 2)/0.283 ≈ 1.768 and P(Z > 1.768) = 1 − 0.9615 = 0.039. There is about a 3.9% chance of observing a sample mean above 2.5, even though individual values from the exponential distribution easily exceed 2.5.
Key Points
- • Exponential is heavily right-skewed (mean = standard deviation = 1/λ)
- • At n = 5, the sample-mean distribution still shows skew
- • At n = 30, the distribution is approximately normal for most purposes
- • At n = 100, the distribution is indistinguishable from normal
- • Standard error decreases as 1/√n: quadrupling n halves the SE
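The probability calculation above can be reproduced with only the standard library; the `normal_cdf` helper below is our own wrapper around `math.erf`, not a library function.

```python
# Normal-approximation calculation from the worked example:
# P(xbar > 2.5) for n = 50 draws from a population with mu = 2, sigma = 2.
import math

def normal_cdf(z):
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 2.0, 2.0, 50
se = sigma / math.sqrt(n)        # 2 / sqrt(50) ≈ 0.283
z = (2.5 - mu) / se              # ≈ 1.768
p = 1 - normal_cdf(z)            # ≈ 0.039
print(f"SE = {se:.3f}, Z = {z:.3f}, P(xbar > 2.5) ≈ {p:.3f}")
```

The same helper handles left-tail questions such as the uniform example: use `normal_cdf(z)` directly for P(Z < z).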
3. Worked Example 2: Uniform Population
A uniform distribution between 0 and 1 has μ = 0.5 and σ² = 1/12, so σ ≈ 0.289. The uniform is symmetric (no skew), so CLT convergence is fast.
- At n = 5: SE = 0.289/√5 ≈ 0.129. The sampling distribution is already approximately normal; the uniform parent has no skew to overcome.
- At n = 30: SE = 0.289/√30 ≈ 0.053.
- At n = 100: SE = 0.289/√100 ≈ 0.029.
For symmetric populations, n = 5 to 10 is often sufficient for the CLT approximation to work well in practice. The "n ≥ 30" textbook rule of thumb reflects worst-case skewed populations, not all populations.
Probability calculation. What is P(x̄ < 0.45) for n = 100? SE ≈ 0.029, so Z = (0.45 − 0.5)/0.029 ≈ −1.724 and P(Z < −1.724) ≈ 0.042. There is about a 4.2% chance of observing a sample mean below 0.45, even though about 45% of individual uniform draws fall below 0.45.
Key Points
- • Uniform distribution: no skew → fast CLT convergence
- • For symmetric populations, n = 5–10 is often sufficient
- • For skewed populations, n = 30+ is needed; severely skewed, n = 100+
- • The "n ≥ 30" rule of thumb reflects the worst case, not the typical case
- • The sample-mean distribution narrows rapidly: SE = σ/√n
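The uniform example is easy to verify by simulation, as the guide's title suggests. A quick Monte Carlo check (the seed and replication count are arbitrary choices):

```python
# Monte Carlo check of the uniform worked example:
# estimate P(xbar < 0.45) for samples of n = 100 Uniform(0, 1) draws.
import random
import statistics

random.seed(1)
n, reps = 100, 20_000
below = sum(
    statistics.fmean(random.random() for _ in range(n)) < 0.45
    for _ in range(reps)
)
# Should land close to the normal-approximation answer of about 0.042.
print(f"simulated P(xbar < 0.45) ≈ {below / reps:.3f}")
```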
4. Connection to Standard Error Formulas
The CLT is what justifies the standard error formulas used in inferential statistics.
- Sample mean: SE(x̄) = σ/√n (the CLT directly).
- Sample proportion p̂ from binary data: SE(p̂) = √(p(1−p)/n). This comes from the CLT applied to the binomial: the sample proportion is a sample mean of 0/1 indicator variables. The approximation works when np > 10 and n(1−p) > 10 (the "success-failure condition").
- Difference of two means: SE(x̄_1 − x̄_2) = √(σ²_1/n_1 + σ²_2/n_2). This is the formula behind two-sample t-tests; it follows from the independence of the two samples and the CLT applied to each.
- Regression coefficients: the CLT applied to linear combinations of residuals gives normal-approximate distributions for slope estimates, justifying t-tests on slopes.
Without the CLT, none of these formulas would work for non-normal data. The reason a sample of n = 50 from a heavily skewed population still produces a valid t-test on the mean is precisely that the CLT has made the sample-mean distribution approximately normal.
Key Points
- • SE(x̄) = σ/√n: a direct CLT result
- • SE(p̂) = √(p(1−p)/n): CLT applied to binary data; needs np > 10 and n(1−p) > 10
- • SE(x̄_1 − x̄_2) = √(σ²_1/n_1 + σ²_2/n_2): the two-sample t-test foundation
- • The CLT justifies normal-approximate distributions for regression coefficients
- • Without the CLT, inferential procedures would fail on non-normal data
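The three formulas can be collected into a small sketch; the numeric inputs below are hypothetical, chosen only for illustration:

```python
# The standard-error formulas from this section, evaluated for
# illustrative (hypothetical) numbers.
import math

# Sample mean: SE = sigma / sqrt(n)
se_mean = 2.0 / math.sqrt(50)                  # ≈ 0.283

# Sample proportion: SE = sqrt(p(1-p)/n), with the success-failure check
p, n = 0.30, 100
assert n * p > 10 and n * (1 - p) > 10         # success-failure condition
se_prop = math.sqrt(p * (1 - p) / n)           # ≈ 0.046

# Difference of two means: SE = sqrt(s1^2/n1 + s2^2/n2)
s1, n1, s2, n2 = 15.0, 36, 12.0, 49
se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)   # ≈ 3.03

print(f"SE(mean) = {se_mean:.3f}, SE(prop) = {se_prop:.3f}, "
      f"SE(diff) = {se_diff:.3f}")
```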
5. Bootstrap Methods and CLT
The bootstrap is a resampling technique for estimating sampling distributions when analytical formulas are unavailable or assumptions are violated. The procedure: resample n observations WITH replacement from the original sample, compute the statistic of interest, repeat thousands of times, and use the empirical distribution of bootstrap statistics as the sampling distribution.
Why the bootstrap works: the CLT applies to the bootstrap distribution of x̄ (and many other statistics) under mild conditions. The bootstrap distribution converges to the sampling distribution of the statistic, which by the CLT is approximately normal for large n. This is why bootstrap confidence intervals match analytical confidence intervals when the analytical ones are valid.
The bootstrap is especially useful when:
- The statistic has no closed-form sampling distribution (e.g., median, trimmed mean, ratio of medians).
- The sample size is small but the statistic is asymptotically normal.
- The data has unusual structure (clustered, censored) that standard formulas do not handle.
Limitation: the bootstrap does NOT rescue you from severely skewed small samples where the CLT has not yet kicked in. If n is too small for analytical CLT-based inference, it is too small for bootstrap inference too.
Key Points
- • Bootstrap: resample n observations WITH replacement, compute the statistic, repeat thousands of times
- • The bootstrap distribution → the sampling distribution under mild conditions
- • The CLT justifies bootstrap normality for many statistics
- • Useful when analytical formulas are unavailable
- • Does NOT rescue severely skewed small samples where the CLT has not kicked in
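The resample-with-replacement loop described above can be sketched for a statistic with no closed-form sampling distribution (the median). The data here are simulated, and the seed and replication count are arbitrary choices:

```python
# Percentile-bootstrap sketch: 95% CI for the median of a small
# right-skewed sample. The data are made up (simulated exponential draws).
import random
import statistics

random.seed(7)
data = [random.expovariate(0.5) for _ in range(40)]   # hypothetical sample

boot_medians = []
for _ in range(5000):
    resample = random.choices(data, k=len(data))      # WITH replacement
    boot_medians.append(statistics.median(resample))

boot_medians.sort()
lo = boot_medians[int(0.025 * len(boot_medians))]
hi = boot_medians[int(0.975 * len(boot_medians))]
print(f"sample median = {statistics.median(data):.2f}, "
      f"95% bootstrap CI ≈ ({lo:.2f}, {hi:.2f})")
```

The percentile interval used here is the simplest bootstrap CI; it inherits the CLT's limitation that heavy skew at small n distorts the interval.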
6. How StatsIQ Helps With CLT Problems
CLT problems span every introductory and intermediate statistics course, the AP Statistics exam, and most applied research design discussions. The key skills are recognizing when CLT applies, computing the sampling-distribution parameters (mean μ, SE = σ/√n), and using them for probability or interval calculations. Snap a photo of any CLT problem and StatsIQ identifies the population parameters, computes SE, evaluates the n requirement, and produces the requested probability with the area under the sampling-distribution curve visualized. For multi-part problems involving sample-size planning or bootstrap inference, StatsIQ chains the steps. This content is for educational purposes only and does not constitute statistical advice.
Key Points
- • Identifies population parameters (μ, σ) from the problem
- • Computes SE = σ/√n correctly
- • Evaluates whether n is sufficient given the population shape
- • Produces the final probability with the sampling-distribution area visualized
- • Useful for AP Statistics, intro stats, and applied research design
Key Takeaways
- ★ CLT: x̄ → Normal(μ, σ²/n) as n grows, regardless of the population distribution
- ★ Standard error = σ/√n
- ★ Rule of thumb: n ≥ 30; severely skewed populations need n closer to 100
- ★ Symmetric populations (uniform): CLT convergence at n ≈ 5–10
- ★ Sample proportion: SE = √(p(1−p)/n); needs np > 10 and n(1−p) > 10
- ★ Two-sample SE: SE(x̄_1 − x̄_2) = √(σ²_1/n_1 + σ²_2/n_2)
- ★ Bootstrap inference relies on the CLT for many statistics
- ★ The CLT is why z-tests and t-tests work on non-normal data
- ★ Quadrupling n halves the SE (since SE ∝ 1/√n)
- ★ Without the CLT, inferential statistics could not handle non-normal populations
Practice Questions
1. A right-skewed population has μ = 50, σ = 15. For a sample of n = 36, what is the sampling distribution of x̄?
2. For the population in question 1, P(x̄ > 53)?
3. A binary survey: p = 0.30. For n = 100, what is the sampling distribution of p̂?
4. Why does the CLT need n ≥ 30 for skewed populations but only n ≈ 5–10 for symmetric ones?
FAQs
Common questions about this topic
Does the CLT apply to every distribution?
The CLT applies to any population with finite variance. The Cauchy distribution, along with a few other heavy-tailed distributions, has undefined or infinite variance, so the CLT does NOT apply: sample means do not converge to normal. For most practical purposes, distributions encountered in applied work have finite variance, so the CLT applies; the differences are only in HOW FAST convergence happens (skewness controls the rate).
How large does n need to be?
It depends on the population shape. Symmetric populations: n = 5–10 often suffices. Moderately skewed: n = 30 (the textbook rule). Severely skewed (exponential, lognormal): n = 100+ may be needed for the sampling distribution to be indistinguishable from normal. When in doubt, simulate: take many samples of various sizes from your data, compute their means, and visualize the result.
What is the difference between the standard deviation and the standard error?
The population standard deviation σ measures the variability of individual observations. The standard error SE measures the variability of a statistic (typically a sample mean); it equals σ/√n. SE shrinks as the sample size grows; σ does not. Confidence intervals and t-tests use SE because they describe uncertainty about the parameter estimate, not the variability of individual observations.
How does the bootstrap relate to the CLT?
The bootstrap resamples from the original sample many times and computes the statistic for each resample. By the CLT applied to the bootstrap process, the resulting distribution converges to the sampling distribution of the statistic. Bootstrap confidence intervals are then constructed from the percentiles of the bootstrap distribution. The bootstrap is especially valuable when analytical sampling-distribution formulas are unavailable.
Can StatsIQ solve CLT problems?
Yes. Snap a photo of any sampling-distribution or CLT problem and StatsIQ identifies the population parameters, computes SE, evaluates whether n is sufficient given the population shape, and produces the requested probability or confidence interval with the sampling-distribution curve visualized. This content is for educational purposes only and does not constitute statistical advice.