Statistics Study Guides
46 comprehensive guides to help you master statistics concepts, from fundamentals to exam prep.
AP Statistics Exam Prep
A targeted study guide for the AP Statistics exam. Covers exploring data, sampling and experimentation, probability, and statistical inference with exam-specific strategies.
Introduction to Hypothesis Testing
A beginner-friendly guide to the logic and mechanics of hypothesis testing. Learn how to formulate hypotheses, calculate test statistics, interpret p-values, and draw conclusions.
Regression Analysis Complete Guide
A comprehensive guide to regression analysis, from simple linear regression to multiple regression. Covers model fitting, diagnostics, interpretation of coefficients, and common pitfalls.
Probability Foundations
Build a solid understanding of probability, the mathematical language underlying all of statistics. Covers basic rules, conditional probability, Bayes' theorem, and common counting methods.
Understanding Statistical Distributions
A guide to the most important probability distributions in statistics. Learn the shapes, parameters, and applications of the normal, binomial, t, chi-square, and other key distributions.
ANOVA and Experimental Design
A comprehensive guide to analysis of variance (ANOVA) and the design of experiments. Covers one-way ANOVA, two-way ANOVA, blocking, randomization, and post-hoc comparisons.
Data Visualization and Descriptive Statistics
Learn how to summarize and visualize data effectively. Covers measures of center, spread, shape, graphical displays, and best practices for communicating data clearly.
Bayesian vs Frequentist Statistics
An exploration of the two major paradigms of statistical inference. Frequentist methods rely on long-run frequencies and fixed parameters. Bayesian methods incorporate prior beliefs and update them with data.
What Is a P-Value? Definition, Interpretation, and Examples
Understand what a p-value actually means, how to interpret it correctly, common misconceptions that lead to wrong conclusions, and how p-values connect to hypothesis testing.
What Is Standard Deviation and How Do You Calculate It?
Learn what standard deviation measures, how to calculate it by hand and interpret it, and why it is the most important measure of spread in statistics.
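As a quick illustration of the by-hand calculation this guide describes, here is a minimal Python sketch (the data values are made up for illustration):

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical sample

# By hand: mean, squared deviations, then divide by n - 1 (sample SD)
mean = sum(data) / len(data)
squared_devs = [(x - mean) ** 2 for x in data]
sample_sd = math.sqrt(sum(squared_devs) / (len(data) - 1))
print(round(sample_sd, 3))  # 1.871

# The statistics module agrees with the by-hand result
assert math.isclose(sample_sd, statistics.stdev(data))
```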
Correlation vs Causation: What Is the Difference?
Understand why correlation does not imply causation, learn to identify confounding variables and spurious correlations, and know what study designs can establish causal relationships.
The Central Limit Theorem: Why It Matters, What It Says, and How to Apply It
The Central Limit Theorem (CLT) is the single most important theorem in statistics: it is the reason we can make inferences about populations from samples, and it underpins virtually every confidence interval and hypothesis test you will ever encounter. This guide explains what the CLT actually says, why it works, and how to apply it to real problems.
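A minimal simulation sketch of the idea (the skewed population here is a made-up example, not from the guide): sample means of even a strongly skewed distribution pile up in a roughly normal shape around the population mean.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical population: exponential with mean 1 (strongly skewed)
def sample_mean(n):
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# The CLT says these sample means are approximately normal with
# mean ~ 1 and standard deviation ~ 1 / sqrt(n)
means = [sample_mean(40) for _ in range(2000)]
print(round(statistics.mean(means), 2))   # close to 1
print(round(statistics.stdev(means), 2))  # close to 1/sqrt(40), about 0.16
```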
Confidence Intervals: What They Mean, How to Calculate Them, and What They Do NOT Tell You
A confidence interval gives you a range of plausible values for a population parameter based on sample data, and it is one of the most misinterpreted concepts in all of statistics. This guide explains what confidence intervals actually mean (and what they do not mean), how to calculate them for means and proportions, and how to interpret them correctly on exams and in practice.
P-Values and Statistical Significance: What They Actually Mean
A p-value is the probability of observing data as extreme as (or more extreme than) your sample data, assuming the null hypothesis is true. It is NOT the probability that the null hypothesis is true, NOT the probability that your results are due to chance, and NOT the probability of making an error. Getting this definition right is the foundation of all statistical inference.
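To make the definition concrete, a small sketch (the observed z statistic is a made-up number): the p-value is just the tail area of the null distribution at or beyond the observed statistic.

```python
from statistics import NormalDist

z_observed = 1.96  # hypothetical test statistic from a z test

# Two-sided p-value: probability, assuming H0 is true, of a statistic
# at least this extreme in either direction
p_value = 2 * (1 - NormalDist().cdf(abs(z_observed)))
print(round(p_value, 3))  # 0.05
```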
Chi-Square Tests Explained: Goodness of Fit and Test of Independence
A clear walkthrough of both chi-square tests, goodness of fit (does a distribution match what we expected?) and test of independence (are two categorical variables related?), with worked examples, assumptions, and interpretation guidelines.
Type I and Type II Errors Explained: Power, Sample Size, and the Trade-Off
Understand the two kinds of mistakes in hypothesis testing, how they relate to each other, what statistical power actually means, and how sample size affects your ability to detect real effects.
Sampling Methods Explained: Random, Stratified, Cluster, and When to Use Each
A practical guide to the four major sampling methods (simple random, stratified, cluster, and systematic), covering how each works, when to use it, common mistakes, and how sampling method affects the conclusions you can draw.
Introduction to Logistic Regression: When and Why Linear Regression Fails for Binary Outcomes
A clear introduction to logistic regression for students who understand linear regression and need to extend to binary outcomes, covering why linear regression breaks down for yes/no predictions, how the logit transformation works, and how to interpret odds ratios.
Effect Size Measures: Cohen's d, Eta-Squared, and Why P-Values Are Not Enough
A practical guide to effect size: what it measures, why statistical significance alone is misleading, how to calculate and interpret Cohen's d and eta-squared, and how reporting effect sizes makes your research more honest and more useful.
Multiple Regression: How to Handle Multiple Predictors and Avoid Multicollinearity
A clear guide to multiple regression for students who understand simple regression and need to extend to two or more predictors, covering the model equation, how to interpret each coefficient, what multicollinearity is and why it wrecks your analysis, and how to detect and fix it.
Non-Parametric Tests: When to Use Mann-Whitney, Wilcoxon, and Kruskal-Wallis
A practical guide to the three most common non-parametric tests covering when parametric assumptions fail, how rank-based tests work, and step-by-step procedures for Mann-Whitney U (two independent groups), Wilcoxon signed-rank (paired data), and Kruskal-Wallis (three or more groups).
How to Handle Outliers in Your Data: Detection Methods and Decision Framework
A practical guide to outlier detection and treatment covering how to identify outliers using statistical methods (IQR rule, z-scores) and visual methods (boxplots, scatterplots), the decision framework for keeping, transforming, or removing them, and the consequences of getting it wrong.
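A minimal sketch of the IQR rule mentioned above (made-up data; quartiles come from `statistics.quantiles` with its default exclusive method):

```python
import statistics

data = [12, 13, 14, 15, 15, 16, 17, 18, 40]  # 40 looks suspicious

q1, _, q3 = statistics.quantiles(data, n=4)  # first and third quartiles
iqr = q3 - q1

# IQR rule: flag anything beyond 1.5 * IQR outside the quartiles
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]
print(outliers)  # [40]
```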
A/B Testing Done Right: Experiment Design, Sample Size, and Avoiding False Discoveries
A practical guide to A/B testing covering how to design valid experiments, calculate the sample size you actually need, choose the right statistical test, interpret results without fooling yourself, and avoid the most common mistakes that produce false discoveries in industry A/B tests.
Data Cleaning and Preprocessing: The Unglamorous Work That Determines Whether Your Analysis Is Worth Anything
A practical guide to data cleaning covering how to diagnose data quality problems (missing values, duplicates, inconsistencies, outliers), the decision framework for handling each type, common preprocessing transformations, and why data cleaning is where most analyses go wrong, not in the modeling.
Survival Analysis: Time-to-Event Data, Kaplan-Meier Curves, and Cox Proportional Hazards Regression
A practical guide to survival analysis covering censored data, Kaplan-Meier estimation, log-rank tests, and Cox proportional hazards regression: the core toolkit for analyzing how long it takes for events to occur.
Introduction to Causal Inference: Why Correlation Is Not Enough and What Actually Establishes Causation
A rigorous introduction to causal inference covering why observational correlations mislead, the potential outcomes framework, confounding, DAGs, and the key methods (randomized experiments, difference-in-differences, instrumental variables, and regression discontinuity) that let researchers make causal claims from imperfect data.
Time Series Analysis: How to Decompose Trend, Seasonality, and Noise (and Why Your Forecast Depends on Getting It Right)
A practical guide to time series analysis covering the components of a time series (trend, seasonality, cyclicality, noise), decomposition methods, stationarity and differencing, and the core forecasting models (ARIMA, exponential smoothing) with enough worked examples to actually use them.
Principal Component Analysis (PCA): Reducing Dimensions Without Losing What Matters
A practical guide to PCA covering why high-dimensional data is hard to work with, how PCA finds the directions of maximum variance, the mechanics of eigenvalues and eigenvectors in plain language, how to choose the right number of components, and common mistakes that produce misleading results.
How to Choose the Right Statistical Test: A Decision Flowchart for Every Common Scenario
The most practical guide in statistics: given your data type and research question, which test do you use? A decision flowchart covering t-tests, ANOVA, chi-square, correlation, regression, and non-parametric alternatives, with the criteria for choosing each one.
Two-Sample t-Test Step by Step: Hypotheses, Calculation, and Interpretation With a Worked Example
A complete step-by-step walkthrough of the independent two-sample t-test: from stating hypotheses through calculating the test statistic, finding the p-value, and writing the conclusion. Includes a fully worked numerical example that you can follow along with.
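The calculation steps this guide walks through can be sketched in a few lines (hypothetical data; pooled-variance version, which assumes equal variances):

```python
import math
import statistics

# Hypothetical samples for an independent two-sample t-test
a = [20, 22, 19, 24, 25]
b = [18, 17, 21, 20, 16]

n1, n2 = len(a), len(b)
mean_diff = statistics.mean(a) - statistics.mean(b)
v1, v2 = statistics.variance(a), statistics.variance(b)

# Pooled variance, then the t statistic and its degrees of freedom
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = mean_diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
print(round(t, 2), df)  # 2.45 8
```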
How to Interpret Regression Output: Coefficients, R-Squared, p-Values, and What They Mean
A practical guide to reading regression output from any statistical software, covering what each number means, how to interpret coefficients (slope and intercept), R² and adjusted R², the F-test, coefficient p-values, confidence intervals, and the mistakes students make when interpreting results on exams.
Paired vs Independent t-Test: When to Use Which and Why It Matters for Your Results
A clear guide to choosing between the paired (dependent) t-test and the independent (two-sample) t-test, covering the key distinction (same subjects vs different subjects), how each test calculates the t-statistic differently, why using the wrong test inflates your error rate, and worked examples showing when each applies.
How to Read ANOVA Output: Sum of Squares, Mean Square, F-Statistic, and Post-Hoc Tests
ANOVA tables show up in every statistics course and every research paper that compares group means, but most students stare at the rows and columns without knowing what any of them actually mean. This guide walks through every piece of ANOVA output (sum of squares, degrees of freedom, mean square, the F-statistic, and the p-value) and then explains what to do after a significant result with post-hoc tests.
Z-Scores and the Standard Normal Table: How to Calculate and Look Up Probabilities
Z-scores convert any normally distributed value into a standard scale, and the standard normal table (z-table) turns those scores into probabilities. This guide covers the full workflow: calculating z-scores, reading the z-table correctly, handling left-tail and right-tail areas, working between two z-values, and applying z-scores to real problems involving percentiles and probability.
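The workflow can be sketched with Python's `statistics.NormalDist`, which plays the role of the z-table (the test-scores scenario is made up for illustration):

```python
from statistics import NormalDist

# Hypothetical: scores ~ N(mu = 500, sigma = 100). What is P(X < 650)?
mu, sigma = 500, 100
z = (650 - mu) / sigma              # z = 1.5
left_tail = NormalDist().cdf(z)     # same number a z-table gives
right_tail = 1 - left_tail          # right-tail area
between = NormalDist().cdf(1.5) - NormalDist().cdf(-0.5)  # P(450 < X < 650)

print(round(left_tail, 4))   # 0.9332
print(round(right_tail, 4))  # 0.0668
print(round(between, 4))     # 0.6247
```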
Coefficient of Determination (R²) Formula: 1 - SS_res / SS_tot Explained With Worked Examples
A complete guide to the coefficient of determination (R²), covering the formula 1 - SS_res / SS_tot, how to compute the sum of squared residuals and total sum of squares, what R² means in regression analysis, and how to interpret values from 0 to 1 with worked examples.
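The formula can be computed directly from its two sums of squares; a minimal sketch with made-up observed and fitted values:

```python
# Made-up observed values and fitted values from some regression
y     = [3, 5, 7, 9]
y_hat = [2.8, 5.4, 6.9, 8.9]

mean_y = sum(y) / len(y)
ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))  # residual SS
ss_tot = sum((obs - mean_y) ** 2 for obs in y)                # total SS

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))  # 0.989
```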
Binomial Probability Formula: P(X=k) = C(n,k) × p^k × (1-p)^(n-k) Explained With Worked Examples
A complete guide to the binomial probability formula, covering the four conditions for a binomial experiment, the formula derivation, how to compute binomial coefficients, worked examples for exact and at-least probabilities, and how to use the binomial in hypothesis testing.
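A minimal sketch of the formula using `math.comb` for the binomial coefficient (the coin-flip numbers are illustrative):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical: 10 flips of a fair coin
p_exactly_4 = binom_pmf(4, 10, 0.5)                        # 210/1024
p_at_least_8 = sum(binom_pmf(k, 10, 0.5) for k in range(8, 11))

print(round(p_exactly_4, 4))   # 0.2051
print(round(p_at_least_8, 4))  # 0.0547
```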
Expected Value and Variance of Discrete Random Variables: Formulas and Worked Examples
A complete guide to calculating expected value E(X) and variance Var(X) for discrete random variables, covering the formulas, step-by-step worked examples, the shortcut formula for variance, and applications in probability and statistics courses.
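Both formulas are one-liners once the distribution is written down; a sketch with a made-up discrete distribution:

```python
# Hypothetical discrete distribution: values with their probabilities
values = [0, 1, 2]
probs  = [0.25, 0.50, 0.25]

e_x  = sum(v * p for v, p in zip(values, probs))      # E(X)
e_x2 = sum(v * v * p for v, p in zip(values, probs))  # E(X^2)

# Shortcut formula: Var(X) = E(X^2) - [E(X)]^2
var_x = e_x2 - e_x ** 2
print(e_x, var_x)  # 1.0 0.5
```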
Standard Error vs Standard Deviation: What's the Difference and When to Use Each
A clear explanation of the difference between standard deviation (SD) and standard error (SE): two concepts that are commonly confused but measure completely different things. Covers what each one represents, how they are calculated, when to report each, and the common mistakes students make.
Margin of Error and Sample Size: How to Calculate Each and Why They're Connected
A clear guide to margin of error and sample size calculations, covering what margin of error means, how to calculate it for proportions and means, how to determine the sample size needed for a desired margin of error, and the practical tradeoffs in survey and experiment design.
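Both calculations for a proportion can be sketched in a few lines (the survey numbers are made up; z* = 1.96 is the usual 95% critical value):

```python
import math

# Margin of error for a proportion: ME = z* * sqrt(p_hat * (1 - p_hat) / n)
z_star, p_hat, n = 1.96, 0.5, 1000
me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
print(round(me, 3))  # 0.031, i.e. about plus or minus 3.1 points

# Sample size needed for a desired margin of error E; p = 0.5 is the
# conservative worst case, and we round up to the next whole subject
E = 0.03
n_needed = math.ceil((z_star / E) ** 2 * p_hat * (1 - p_hat))
print(n_needed)  # 1068
```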
Contingency Tables and Two-Way Tables: How to Build, Read, and Test for Association
A practical guide to contingency tables (two-way tables), covering how to construct them from raw data, how to calculate row and column percentages, how to test for association between two categorical variables using the chi-square test, and the common mistakes students make when interpreting two-way tables.
Poisson Distribution: Formula, When to Use, and Worked Examples for Students
A complete guide to the Poisson distribution, covering the formula, the conditions that make a Poisson distribution appropriate, worked examples for counts of events in time/space, and the relationship to the binomial distribution.
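A minimal sketch of the formula (the call-center rate is a made-up example):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # P(X = k) = lam^k * e^(-lam) / k!
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical: a help desk averages 3 calls per hour
p_exactly_5 = poisson_pmf(5, 3)
p_at_most_2 = sum(poisson_pmf(k, 3) for k in range(3))

print(round(p_exactly_5, 4))  # 0.1008
print(round(p_at_most_2, 4))  # 0.4232
```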
Cohen's d Effect Size: Formula, Interpretation, and When to Report It
A complete guide to Cohen's d, covering the formula, how to calculate it from typical study data, the standard interpretation thresholds, and why effect sizes are essential alongside p-values in modern statistical reporting.
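The formula can be sketched directly from two samples (made-up group data; pooled-SD version of d):

```python
import math
import statistics

# Hypothetical two-group study data
group_a = [5, 6, 7, 8, 9]
group_b = [3, 4, 5, 6, 7]

n1, n2 = len(group_a), len(group_b)
m1, m2 = statistics.mean(group_a), statistics.mean(group_b)
s1, s2 = statistics.stdev(group_a), statistics.stdev(group_b)

# Pooled standard deviation, then d = (mean difference) / pooled SD
pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m1 - m2) / pooled_sd
print(round(d, 2))  # 1.26, a large effect by Cohen's thresholds
```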
Pearson vs Spearman vs Kendall Correlation Coefficient: Formulas, Differences, and Which to Use
Three correlation coefficients dominate statistical practice: Pearson, Spearman, and Kendall. They measure different things and work with different data types. Learn the formulas, the assumptions, when each is appropriate, and the worked examples that show exactly when they agree and disagree.
One-Tailed vs Two-Tailed Hypothesis Tests: When to Use Each with Worked Examples
Choosing between a one-tailed and two-tailed hypothesis test is one of the most consequential and most commonly botched decisions in applied statistics. Learn the formal definitions, the conditions under which each is appropriate, the penalty for choosing incorrectly, and worked examples across t-tests, z-tests, and proportion tests.
Normal Distribution Probability: Z-Score to Area Worked Examples (Step-by-Step)
Every intro statistics class revolves around the normal distribution and the z-score. But the worked-example side (given a real-world problem, produce the probability) is where students get stuck. This guide walks through the full translation from real-world question to z-score to area to probability, with examples for left tail, right tail, between-value, and outside ranges.
Linear Regression Assumptions: How to Check Residuals, Homoscedasticity, and Normality
Linear regression gives you coefficients and p-values, but those are only trustworthy if the underlying assumptions hold. This guide walks through the five key assumptions, how to check each using residual plots and diagnostic tests, and what to do when assumptions are violated.