Correlation vs Causation: What Is the Difference?
Understand why correlation does not imply causation, learn to identify confounding variables and spurious correlations, and know what study designs can establish causal relationships.
What You'll Learn
- Define correlation and causation and explain the difference
- Identify confounding variables that create spurious associations
- Recognize study designs that can and cannot establish causation
- Apply correlation vs causation reasoning to real-world claims
1. Correlation: What It Means
Correlation is a statistical relationship between two variables: when one changes, the other tends to change in a predictable way. Positive correlation means both variables increase together (height and weight). Negative correlation means one increases as the other decreases (price and quantity demanded). The correlation coefficient (r) ranges from -1 to +1, with values near ±1 indicating strong linear relationships and values near 0 indicating no linear relationship. Correlation measures the strength and direction of a linear association. It does not measure whether one variable causes the other to change.
Key Points
- Correlation measures the strength and direction of a linear relationship between two variables
- r ranges from -1 to +1: closer to ±1 = stronger relationship, 0 = no linear relationship
- Correlation only captures linear associations; nonlinear relationships may exist even when r ≈ 0
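To make the last point concrete, here is a minimal sketch that computes Pearson's r by hand for two small invented datasets: one perfectly linear, one perfectly determined but nonlinear. The helper name `pearson_r` and the data are assumptions chosen for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how the two variables vary together.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how much each variable varies on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # y rises by 2 for every unit of x
parabola = [x ** 2 for x in xs]    # y depends entirely on x, but not linearly

r_linear = pearson_r(xs, linear)
r_parabola = pearson_r(xs, parabola)

print(f"linear:   r = {r_linear:.2f}")
print(f"parabola: r = {r_parabola:.2f}")
```

The parabola is fully determined by x, yet its r comes out at zero because the upward and downward halves cancel. A value of r near 0 rules out a linear pattern, not a relationship.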
2. Causation: What It Requires
Causation means that changes in one variable directly produce changes in the other. Establishing causation requires three conditions: (1) the cause must precede the effect in time (temporal precedence), (2) the two variables must be correlated (association), and (3) there must be no plausible alternative explanations for the association (no confounding). Observational studies can satisfy conditions 1 and 2, but they struggle with condition 3 because there are almost always potential confounders that have not been measured or controlled. This is why randomized controlled experiments are the gold standard for establishing causation: random assignment balances confounders, measured and unmeasured, across groups on average, so they cannot systematically explain a difference in outcomes.
Key Points
- Causation requires temporal precedence, association, AND elimination of alternative explanations
- Observational studies can show association but rarely prove causation
- Randomized controlled experiments are the gold standard because random assignment balances confounders across groups on average
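The effect of random assignment can be seen in a small simulation. The sketch below is illustrative only: the hidden confounder `health`, the true treatment effect of 1.0, and both assignment rules are invented assumptions, not data from any study. It compares a self-selected (observational) group with a coin-flip (randomized) group.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate(assign, n=10_000, true_effect=1.0, seed=0):
    """Return (confounder gap, naive effect estimate) for one assignment rule."""
    rng = random.Random(seed)
    treated_health, control_health = [], []
    treated_outcome, control_outcome = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)  # hidden confounder (hypothetical)
        treated = assign(health, rng)
        # The outcome depends on the confounder AND on the treatment.
        outcome = 2.0 * health + true_effect * treated + rng.gauss(0, 1)
        if treated:
            treated_health.append(health)
            treated_outcome.append(outcome)
        else:
            control_health.append(health)
            control_outcome.append(outcome)
    gap = mean(treated_health) - mean(control_health)
    estimate = mean(treated_outcome) - mean(control_outcome)
    return gap, estimate

def self_selected(health, rng):
    # Observational setting: healthier people are more likely to opt in.
    return health + rng.gauss(0, 1) > 0

def randomized(health, rng):
    # Experimental setting: a coin flip that ignores health entirely.
    return rng.random() < 0.5

obs_gap, obs_estimate = simulate(self_selected)
rct_gap, rct_estimate = simulate(randomized)

print(f"self-selected: health gap {obs_gap:.2f}, estimated effect {obs_estimate:.2f}")
print(f"randomized:    health gap {rct_gap:.2f}, estimated effect {rct_estimate:.2f}")
```

Under self-selection the treated group starts out healthier, so the naive difference in outcomes mixes the treatment effect with the health difference. Under the coin flip the health gap shrinks toward zero and the naive difference lands close to the true effect built into the simulation.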
3. Confounding Variables and Spurious Correlations
A confounding variable is a third variable that influences both the suspected cause and the observed effect, creating a misleading association. Classic example: ice cream sales and drowning deaths are strongly correlated, but ice cream does not cause drowning. The confounder is hot weather, which increases both ice cream consumption and swimming activity. Spurious correlations can also arise with no confounder at all, and they abound in large datasets. Per capita cheese consumption correlates with the number of people who died tangled in their bedsheets. Associations like this one are coincidental artifacts of comparing enough pairs of variables. Always ask: is there a plausible confounding variable, or simple coincidence, that could explain this relationship without a direct causal link?
Key Points
- A confounder affects both variables, creating a false appearance of causation
- Spurious correlations are coincidental associations found in large datasets
- Always ask: what third variable could explain both sides of this relationship?
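The ice cream example can be sketched as a simulation. Everything numeric here is an assumption made up for illustration (the temperature range, the coefficients, the noise levels), and the helper names `pearson_r` and `residuals` are hypothetical. Temperature drives both variables, and neither affects the other.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fit on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]

rng = random.Random(42)
temps = [rng.uniform(10, 35) for _ in range(2000)]        # confounder
ice_cream = [5.0 * t + rng.gauss(0, 10) for t in temps]   # driven by temperature only
drownings = [0.2 * t + rng.gauss(0, 1) for t in temps]    # driven by temperature only

raw_r = pearson_r(ice_cream, drownings)
# Remove the part of each variable explained by temperature, then correlate
# what remains (a partial correlation controlling for temperature).
adjusted_r = pearson_r(residuals(ice_cream, temps), residuals(drownings, temps))

print(f"raw correlation:          {raw_r:.2f}")
print(f"after adjusting for temp: {adjusted_r:.2f}")
```

The raw correlation is strong even though the simulation contains no link between ice cream and drowning. Once temperature is accounted for, the association largely disappears. This adjustment only works because the confounder was measured; an unmeasured confounder cannot be removed this way.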
4. Applying This to Real-World Claims
When you encounter a claim that X causes Y, ask four questions. (1) Is there a correlation? Check if the association is statistically significant and practically meaningful. (2) Is the direction clear? Does X precede Y, or could the relationship be reversed? (3) Are confounders controlled? Was this a randomized experiment or an observational study? (4) Has the result been replicated? A single study, no matter how well designed, is not definitive. News headlines frequently misrepresent correlational findings as causal. A study finding that people who eat breakfast tend to weigh less does not prove breakfast causes weight loss; it may reflect lifestyle differences between breakfast eaters and skippers. StatsIQ helps you practice evaluating these claims through exam-style questions that test your ability to distinguish correlation from causation in study designs.
Key Points
- Check for association, temporal direction, confounder control, and replication before accepting causation
- News headlines frequently overstate correlational findings as causal
- This reasoning is heavily tested on AP Statistics and intro stats exams
Key Takeaways
- "Correlation does not imply causation" is arguably the single most important principle in statistics and the most frequently tested concept on introductory exams
- Randomized controlled trials establish causation by using random assignment to balance confounding variables across groups
- Observational studies can suggest causation when combined with strong theory, dose-response relationships, consistency across studies, and biological plausibility (Bradford Hill criteria)
- Reverse causation occurs when the suspected effect actually causes the suspected cause; for example, illness may cause reduced exercise rather than reduced exercise causing illness
- Simpson's paradox shows that a trend in aggregated data can reverse when the data are separated into subgroups, further illustrating why correlation is not causation
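Simpson's paradox is easier to believe once you see the arithmetic. The sketch below uses illustrative counts chosen so that the reversal appears; the treatment labels "A" and "B" and the subgroup labels "small" and "large" are assumptions for the example.

```python
# (successes, patients) for each treatment within each subgroup.
# Illustrative counts chosen so that the reversal shows up.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    return successes / total

def overall_rate(groups):
    """Success rate after pooling all subgroups together."""
    successes = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return successes / total

for subgroup in ("small", "large"):
    a = rate(*counts["A"][subgroup])
    b = rate(*counts["B"][subgroup])
    print(f"{subgroup:>7}: A = {a:.1%}, B = {b:.1%}")

print(f"overall: A = {overall_rate(counts['A']):.1%}, "
      f"B = {overall_rate(counts['B']):.1%}")
```

Treatment A has the higher success rate within each subgroup, yet B looks better once the subgroups are pooled. The reversal happens because A was used mostly on the harder "large" cases, so subgroup membership acts as a confounder in the aggregated comparison.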
Practice Questions
1. A study finds that students who use tutoring services have lower GPAs than students who do not. A parent concludes that tutoring hurts academic performance. What is wrong with this conclusion?
2. A randomized controlled trial finds that Drug A reduces blood pressure more than a placebo (p < 0.01). Can we conclude Drug A causes blood pressure reduction?
3. Countries with higher chocolate consumption per capita have more Nobel Prize winners. Does chocolate consumption cause Nobel Prizes?
FAQs
Common questions about this topic
Observational studies alone cannot definitively prove causation, but they can provide strong evidence when combined with multiple criteria: consistent findings across many studies, a dose-response relationship, temporal precedence, biological plausibility, and absence of plausible confounders. The link between smoking and lung cancer was established primarily through observational evidence meeting these criteria.
Exams typically present a scenario describing a study and its findings, then ask whether a causal conclusion is justified. You need to identify whether the study was observational or experimental, whether confounders were controlled, and whether the conclusion overstates the evidence. StatsIQ can help you practice these reasoning skills with exam-style questions.