Correlation
Correlation measures the strength and direction of the linear relationship between two quantitative variables. The Pearson correlation coefficient (r) quantifies linear association and ranges from -1 to +1; Spearman's rank correlation (rho), which has the same range, instead captures monotonic relationships. Understanding the distinction between correlation and causation is one of the most important lessons in statistics.
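A minimal sketch of both coefficients in plain Python, assuming equal-length lists of numbers. Pearson's r divides the sum of cross-products of deviations from the means by the square root of the product of the sums of squared deviations. Spearman's rho is Pearson's r computed on ranks, with tied values given the average of their positions. The names `pearson_r`, `average_ranks`, and `spearman_rho` are hypothetical helpers written for illustration, not a library API; for real analyses, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute these coefficients.

```python
import math

def pearson_r(x, y):
    """Pearson's r: sum of cross-deviations over the root of the product of squared deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the (average) ranks of each variable."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data chosen for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))  # 0.775 0.738
```

Computing rho as Pearson's r on average ranks handles ties correctly; the shortcut formula based on squared rank differences is exact only when there are no ties.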
Key Concepts
Study Tips
- Always make a scatterplot before computing a correlation coefficient. Patterns like nonlinear relationships, clusters, or influential outliers can drastically affect r without being obvious from the number alone.
- Remember that r measures only linear association. A dataset with a strong curved relationship can have r near 0. If the scatterplot is curved, consider a transformation or a different model.
- Practice articulating why correlation does not imply causation. Be ready to suggest lurking variables or reverse causation for any given example.
- Understand that squaring the correlation gives R-squared: if r = 0.70, then R-squared = 0.49, meaning about 49% of the variability in one variable is explained by its linear relationship with the other.
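The second and fourth tips can be checked numerically. The sketch below uses hypothetical helpers `pearson_r` and `r_squared_from_fit` in plain Python, with toy data chosen for illustration. It first shows a U-shaped relationship in which y is completely determined by x yet r is exactly 0, then fits a least-squares line and confirms that its R-squared, computed as 1 - SSE/SST, equals r squared.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared_from_fit(x, y):
    """R-squared of the least-squares line y = a + b*x, computed as 1 - SSE/SST."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    sse = sum((c - (intercept + slope * a)) ** 2 for a, c in zip(x, y))  # residual sum of squares
    sst = sum((c - my) ** 2 for c in y)                                  # total sum of squares
    return 1 - sse / sst

# A perfect but curved (U-shaped) relationship: y is fully determined by x, yet r = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [v ** 2 for v in xs]
print(pearson_r(xs, ys))  # 0.0

# For a straight-line fit, R-squared equals r squared.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r ** 2, 6), round(r_squared_from_fit(x, y), 6))  # 0.6 0.6
```

The identity R-squared = r squared relies on a straight-line fit with an intercept and a single predictor.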
Common Mistakes to Avoid
The most consequential mistake is assuming that a strong correlation implies a causal relationship. The association may instead reflect a confounding variable, reverse causation, or coincidence. Students also sometimes apply Pearson's r to monotonic but nonlinear relationships or to ordinal data, where Spearman's rho is more appropriate. Another error is failing to recognize that a single outlier can dramatically inflate or deflate the correlation coefficient, so that removing it can change the conclusion entirely.
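To see how much leverage one point can have, the sketch below starts from eight perfectly linear points and then adds a single outlier at (9, -30). The helper names and the data are hypothetical, chosen for illustration. Pearson's r falls from 1 to about -0.37, reversing its sign, while Spearman's rho, which sees only the outlier's rank, drops to 0.4 but stays positive.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (O(n^2), fine for small demos)."""
    s = sorted(values)
    n = len(s)
    ranks = []
    for v in values:
        first = s.index(v)               # first 0-based position of v in sorted order
        last = n - 1 - s[::-1].index(v)  # last 0-based position of v in sorted order
        ranks.append((first + last) / 2 + 1)
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 2, 3, 4, 5, 6, 7, 8]  # perfectly linear
x_out = x + [9]
y_out = y + [-30]             # one hypothetical outlier appended

print(pearson_r(x, y), spearman_rho(x, y))  # 1.0 1.0
print(round(pearson_r(x_out, y_out), 3), round(spearman_rho(x_out, y_out), 3))  # -0.374 0.4
```

Neither coefficient tells you whether the outlier is a recording error or a genuine observation; that judgment still requires the scatterplot and knowledge of how the data were collected.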
Correlation FAQs
Common questions about correlation
Why doesn't correlation imply causation?
Correlation only tells you that two variables move together; it cannot establish that one causes the other. There are several reasons: a third confounding variable might drive both (e.g., ice cream sales and drownings both increase in summer because of heat), the direction of causality might be reversed, or the association might be purely coincidental. Establishing causation requires controlled experiments or advanced causal inference methods that go beyond simple correlation.
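The ice cream example can be simulated to show a confounder at work. All numbers in the sketch below are made up: a hypothetical temperature series drives both ice cream sales and drownings, with no direct link between the two. The raw correlation between sales and drownings comes out strong, but the partial correlation, computed by correlating the residuals of each series after a straight-line fit on temperature, is close to zero. Partial correlation is a technique swapped in here for illustration; it only adjusts for confounders that were actually measured, so it does not by itself establish or rule out causation.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    """Residuals of y after removing its least-squares linear dependence on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 2000
# Hypothetical model: temperature drives both series; neither affects the other.
temperature = [rng.uniform(10, 35) for _ in range(n)]
ice_cream_sales = [20 * t + rng.gauss(0, 60) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

raw_r = pearson_r(ice_cream_sales, drownings)
# Partial correlation: correlate what is left of each series once temperature is regressed out.
partial_r = pearson_r(residuals(ice_cream_sales, temperature),
                      residuals(drownings, temperature))
print(f"raw r = {raw_r:.2f}, partial r given temperature = {partial_r:.2f}")
```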
When should I use Spearman's rank correlation instead of Pearson's r?
Use Spearman's rank correlation when your data are ordinal (e.g., survey rankings), when the relationship is monotonic but not necessarily linear, or when your data contain outliers that would distort Pearson's r. Spearman's rho is based on the ranks of the data rather than the raw values, making it more robust to skewness and outliers. If your data are roughly bivariate normal and the relationship is linear, Pearson's r is preferred because it is more statistically powerful.
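A quick illustration of the monotonic-but-nonlinear case, again as a plain-Python sketch with hypothetical helper names and toy data. For y = 2^x on x = 1..10, y always increases with x, so the ranks agree perfectly and Spearman's rho is exactly 1, while Pearson's r, which measures closeness to a straight line, is only about 0.8.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (O(n^2), fine for small demos)."""
    s = sorted(values)
    n = len(s)
    return [(s.index(v) + (n - 1 - s[::-1].index(v))) / 2 + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = list(range(1, 11))
y = [2 ** v for v in x]  # strictly increasing, but sharply curved

print(round(pearson_r(x, y), 2), spearman_rho(x, y))  # Pearson about 0.8; Spearman 1.0
```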