
Advanced / Intermediate · 20-25 min

Relative Risk vs Odds Ratio: 2×2 Table Worked Examples

Q: When is the odds ratio approximately equal to relative risk?

When the outcome is rare in both groups; a common rule of thumb is a risk below about 10%. Then a + b ≈ b and c + d ≈ d, so a/(a+b) ≈ a/b and c/(c+d) ≈ c/d, and the risk ratio and odds ratio nearly coincide. This "rare-disease assumption" is what lets the OR from a case-control study stand in for the relative risk that could not be measured directly.

Q: Why does logistic regression give odds ratios instead of relative risks?

Logistic regression models the log-odds of the outcome as a linear function of predictors, so exponentiating a coefficient yields an odds ratio by construction. To get adjusted relative risks you need a different model, such as log-binomial regression (which models risk directly but can fail to converge) or Poisson regression with robust standard errors. For rare outcomes the logistic OR approximates the RR anyway.
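As a rough check on the "by construction" claim, the sketch below fits a one-predictor logistic model to grouped 2×2 counts with a hand-written Newton-Raphson loop and compares the exponentiated slope with the cross-product ad/bc. The function name `fit_logistic_2x2` and the fixed iteration count are illustrative assumptions, and the counts are the cohort example from Section 5; a real analysis would normally use a statistics library instead.

```python
import math

def fit_logistic_2x2(a, b, c, d, iterations=25):
    """Newton-Raphson fit of logit P(outcome) = b0 + b1 * exposed on grouped counts.

    a, b = exposed with / without the outcome; c, d = unexposed with / without.
    Assumes all four cells are positive.
    """
    groups = [(1.0, a, a + b), (0.0, c, c + d)]  # (exposure indicator, events, total)
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the information matrix
        for x, events, total in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = events - total * p
            w = total * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += information^{-1} * gradient (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

if __name__ == "__main__":
    a, b, c, d = 40, 160, 10, 190
    b0, b1 = fit_logistic_2x2(a, b, c, d)
    print(f"exp(slope)    = {math.exp(b1):.4f}")
    print(f"cross-product = {(a * d) / (b * c):.4f}")
```

With a single binary predictor the fitted slope is the difference in log-odds between the two exposure groups, so the two printed values should agree.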

Q: Does the odds ratio exaggerate the effect?

It can, for common outcomes. Because odds grow much faster than probabilities as the probability approaches 1, an OR sits further from 1 than the corresponding RR whenever the outcome is frequent. A widely cited pitfall is reporting an OR of, say, 2.0 as if risk doubled when the risk in the unexposed group is 40%; the implied RR is only about 1.4. For common outcomes, prefer the RR when the design allows it.

Q: How do I build a confidence interval for an odds ratio?

Work on the log scale because the sampling distribution of a ratio is right-skewed. Compute ln(OR), then SE = √(1/a + 1/b + 1/c + 1/d). The 95% CI is exp[ln(OR) ± 1.96 × SE]. If that interval includes 1, the association is not statistically significant at the 0.05 level. The same log-scale approach applies to relative risk with its own standard error: SE of ln(RR) = √(1/a − 1/(a+b) + 1/c − 1/(c+d)).

Q: Can StatsIQ compute these from a contingency table photo?

Yes. Snap a photo of the 2×2 table; StatsIQ assigns the a/b/c/d cells, detects whether the design supports relative risk, computes the appropriate measure with its log-scale confidence interval, and warns when a common outcome makes the OR a misleading stand-in for the RR. This content is for educational purposes only.

Relative risk and the odds ratio both measure association in a 2×2 table, but they answer different questions and apply to different study designs. Here is how to compute each, when each is valid, and why the odds ratio overstates risk for common outcomes.

What You'll Learn

✓Compute relative risk and the odds ratio from a 2×2 table.
✓Match each measure to the study design that makes it valid.
✓Explain when the odds ratio approximates relative risk and when it misleads.

1. Direct Answer: RR vs OR

Relative risk (RR), also called the risk ratio, is the probability of the outcome in the exposed group divided by the probability in the unexposed group — a direct "how many times more likely" statement. The odds ratio (OR) is the odds of the outcome given exposure divided by the odds given no exposure, computed as the cross-product ad/bc from the 2×2 table. The key practical difference is design: RR requires that you can estimate actual risks, which you can in cohort studies and randomized trials but NOT in case-control studies, where participants are selected by outcome status. In case-control studies only the OR is valid. The OR is also the native effect measure of logistic regression. When the outcome is rare (under about 10%), the OR closely approximates the RR; when the outcome is common, the OR is pushed further from 1 than the RR and overstates the effect.

Key Points

•RR = risk if exposed ÷ risk if unexposed; OR = odds if exposed ÷ odds if unexposed = ad/bc.
•RR needs estimable risks (cohort/RCT); case-control studies can only report OR.
•OR ≈ RR when the outcome is rare; OR overstates RR when the outcome is common.
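To see how quickly the two measures separate, the sketch below converts a fixed odds ratio into the relative risk it implies at several unexposed-group risks. It relies on the algebraic identity RR = OR / (1 − p0 + p0 × OR), where p0 is the risk in the unexposed group. The function name `or_to_rr` and the list of baseline risks are illustrative assumptions, not values from the worked examples.

```python
def or_to_rr(odds_ratio: float, p0: float) -> float:
    """Relative risk implied by an odds ratio when the unexposed risk is p0."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must be strictly between 0 and 1")
    return odds_ratio / (1.0 - p0 + p0 * odds_ratio)

if __name__ == "__main__":
    # Hypothetical baseline risks, chosen only to show the trend.
    for p0 in (0.01, 0.05, 0.10, 0.20, 0.40):
        print(f"unexposed risk {p0:.2f}: OR 2.00 -> RR {or_to_rr(2.0, p0):.2f}")
```

The implied RR stays close to the OR at low baseline risk and drifts toward 1 as the baseline risk grows.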

2. The 2×2 Table Layout

Lay the table out consistently: rows are exposure (exposed / unexposed), columns are outcome (disease / no disease). Cell a = exposed with the outcome, b = exposed without it, c = unexposed with the outcome, d = unexposed without it. Then: risk in the exposed = a/(a+b); risk in the unexposed = c/(c+d); odds in the exposed = a/b; odds in the unexposed = c/d. Getting the cells in the right corners is half the battle — a surprising share of wrong answers come from transposing exposure and outcome. Always label the margins before computing anything.

Key Points

•Rows = exposure, columns = outcome; a/b/c/d in the four cells.
•Risk exposed = a/(a+b); risk unexposed = c/(c+d).
•Odds exposed = a/b; odds unexposed = c/d.
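One way to keep the cells in the right corners is to name them once and derive everything else from those names. The sketch below is a minimal illustration of the layout above; the class name `TwoByTwo` and the counts in the usage example are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TwoByTwo:
    """2x2 table with rows = exposure and columns = outcome."""
    a: int  # exposed, with the outcome
    b: int  # exposed, without the outcome
    c: int  # unexposed, with the outcome
    d: int  # unexposed, without the outcome

    def risk_exposed(self) -> float:
        return self.a / (self.a + self.b)

    def risk_unexposed(self) -> float:
        return self.c / (self.c + self.d)

    def odds_exposed(self) -> float:
        return self.a / self.b

    def odds_unexposed(self) -> float:
        return self.c / self.d

if __name__ == "__main__":
    # Hypothetical counts, used only to exercise the four formulas.
    table = TwoByTwo(a=12, b=88, c=6, d=94)
    print(f"risk exposed   = {table.risk_exposed():.3f}")
    print(f"risk unexposed = {table.risk_unexposed():.3f}")
    print(f"odds exposed   = {table.odds_exposed():.3f}")
    print(f"odds unexposed = {table.odds_unexposed():.3f}")
```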

3. Computing Relative Risk

RR = [a/(a+b)] / [c/(c+d)]. An RR of 1 means no association; RR > 1 means exposure raises risk; RR < 1 means exposure is protective. The interpretation is clean and intuitive: an RR of 2.0 means the exposed group is twice as likely to develop the outcome. Because RR uses the denominators (a+b) and (c+d), you must KNOW the size of the exposed and unexposed groups and follow each group for the outcome. That is exactly why RR works in cohort studies and trials but not in case-control designs, where the investigator fixes the number of cases and controls and the "risk" denominators are artifacts of sampling.

Key Points

•RR = [a/(a+b)] / [c/(c+d)].
•RR = 1 no effect, > 1 harmful, < 1 protective; reads as "times as likely."
•Valid only when the at-risk denominators are real (cohort/RCT).

4. Computing the Odds Ratio

OR = (a/b) / (c/d) = ad / bc, the cross-product ratio. An OR of 1 means no association, with the same directional reading as RR. The OR’s mathematical advantage is that it does not depend on the at-risk denominators, so it stays valid even when cases and controls are sampled separately; that is why case-control studies report ORs. It is also symmetric (the odds ratio of exposure given outcome equals the odds ratio of outcome given exposure), and it is the quantity logistic regression naturally estimates (the exponentiated coefficient is an OR). The cost is interpretability: an OR is a ratio of odds, not of probabilities, so "2.5 times the odds" is not the same plain-English claim as "2.5 times the risk," and the two diverge as the outcome becomes common.

Key Points

•OR = ad/bc, the cross-product of the 2×2 table.
•Independent of at-risk denominators → valid in case-control studies.
•OR is the exponentiated coefficient in logistic regression.
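The symmetry of the cross-product can be checked directly: trading the roles of exposure and outcome (transposing the table) leaves the OR unchanged, while swapping the exposed and unexposed rows gives its reciprocal. The sketch below is a minimal illustration; the helper name `odds_ratio` is an assumption, and the counts are those of the cohort example in Section 5.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Cross-product ratio ad/bc for a 2x2 table laid out as [[a, b], [c, d]]."""
    if b == 0 or c == 0:
        raise ValueError("cells b and c must be non-zero")
    return (a * d) / (b * c)

if __name__ == "__main__":
    a, b, c, d = 40, 160, 10, 190

    original = odds_ratio(a, b, c, d)
    # Transpose: rows become outcome, columns become exposure -> [[a, c], [b, d]]
    transposed = odds_ratio(a, c, b, d)
    # Swap rows: treat the unexposed group as the reference "exposed" row
    rows_swapped = odds_ratio(c, d, a, b)

    print(f"original     = {original:.4f}")
    print(f"transposed   = {transposed:.4f}")
    print(f"rows swapped = {rows_swapped:.4f} (reciprocal of the original)")
```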

5. Worked Example 1: Cohort Study (Both Valid)

A cohort follows 200 smokers and 200 non-smokers for a respiratory outcome. Smokers: 40 develop it, 160 do not (a = 40, b = 160). Non-smokers: 10 develop it, 190 do not (c = 10, d = 190). Risk in smokers = 40/200 = 0.20. Risk in non-smokers = 10/200 = 0.05. RR = 0.20 / 0.05 = 4.0 — smokers are 4 times as likely to develop the outcome. Now the OR = ad/bc = (40 × 190) / (160 × 10) = 7600 / 1600 = 4.75. Both are valid because this is a cohort, but notice the OR (4.75) is larger than the RR (4.0): the outcome here (20% in the exposed) is not rare, so the OR overstates the risk ratio. Report the RR as the headline for a cohort with a common outcome.

Key Points

•Cohort design → compute both; RR is the more interpretable headline.
•RR = 4.0 but OR = 4.75 — the OR is inflated because the 20% outcome is not rare.
•The cross-product ad/bc gives the OR directly.
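The arithmetic in this example can be reproduced in a few lines. The sketch below uses the counts given above (a = 40, b = 160, c = 10, d = 190); the variable names are illustrative.

```python
# Cohort counts from the worked example: smokers vs non-smokers.
a, b = 40, 160   # exposed: with / without the outcome
c, d = 10, 190   # unexposed: with / without the outcome

risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
relative_risk = risk_exposed / risk_unexposed
odds_ratio = (a * d) / (b * c)

print(f"risk exposed   = {risk_exposed:.2f}")
print(f"risk unexposed = {risk_unexposed:.2f}")
print(f"RR = {relative_risk:.2f}")
print(f"OR = {odds_ratio:.2f}")
```

The last two lines print RR = 4.00 and OR = 4.75, matching the hand calculation.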

6. Worked Example 2: Case-Control (OR Only) and the Rare-Disease Case

A case-control study recruits 100 cases (people with a rare cancer) and 100 controls, then asks about a past exposure. Cases: 30 exposed, 70 not. Controls: 10 exposed, 90 not. In the Section 2 layout (rows = exposure, columns = outcome) that gives a = 30, b = 10, c = 70, d = 90. Because the investigator FIXED 100 cases and 100 controls, the column totals are set by the design, so the row-wise "risks" a/(a+b) and c/(c+d) do not reflect true population risk and RR is meaningless. Compute the OR = ad/bc = (30 × 90) / (10 × 70) = 2700 / 700 = 3.86. Since the cancer is rare in the population, this OR is a good approximation of the RR you could not measure directly: the rare-disease assumption in action. Interpretation: the exposure is associated with roughly 3.9 times the odds of disease, and because the disease is rare, about 3.9 times the risk as well.

Key Points

•Case-control designs fix case and control counts → RR is not estimable, OR only.
•OR = ad/bc = 3.86 here.
•For rare outcomes the OR approximates the RR you cannot measure directly.
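A quick way to see why only the OR survives outcome-based sampling is to change the number of controls and watch what moves. The sketch below uses the study's counts in the Section 2 layout (rows = exposure, columns = outcome, so a = 30 exposed cases, b = 10 exposed controls, c = 70 unexposed cases, d = 90 unexposed controls) and a hypothetical variant with 300 controls that keeps the same 10% exposure rate among controls. The function names are illustrative assumptions.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc."""
    return (a * d) / (b * c)

def naive_risk_ratio(a, b, c, d):
    """Row-wise 'risk ratio'; not meaningful when column totals are fixed by design."""
    return (a / (a + b)) / (c / (c + d))

# (a, b, c, d) with rows = exposure, columns = case / control
study_100_controls = (30, 10, 70, 90)
# Hypothetical: triple the controls, same 10% exposed among them
study_300_controls = (30, 30, 70, 270)

for label, cells in (("100 controls", study_100_controls),
                     ("300 controls", study_300_controls)):
    print(f"{label}: OR = {odds_ratio(*cells):.2f}, "
          f"naive 'RR' = {naive_risk_ratio(*cells):.2f}")
```

The OR is the same in both designs, while the naive row-wise ratio changes with the number of controls the investigator chose to recruit.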

7. Confidence Intervals and Interpretation Traps

Both RR and OR are skewed ratios, so confidence intervals are built on the log scale and then exponentiated. For the OR, the standard error of ln(OR) is √(1/a + 1/b + 1/c + 1/d); the 95% CI is exp[ln(OR) ± 1.96 × SE]. If the interval includes 1, the association is not statistically significant at the 0.05 level. The biggest interpretation trap is treating an OR as if it were an RR when the outcome is common; media reports that do this exaggerate effects. A second trap is the same confusion with protective effects: an OR of 0.5 means halved odds, which for a rare outcome is roughly halved risk, but for a common outcome the RR lies between 0.5 and 1 (about 0.67 when the unexposed risk is 50%). When the outcome is common and the design allows it, prefer RR; when only an OR is available, say "odds," not "risk."

Key Points

•CIs use ln(ratio) ± 1.96 × SE; for OR, SE of ln(OR) = √(1/a+1/b+1/c+1/d).
•An interval containing 1 means no significant association.
•Do not call an odds ratio a risk ratio when the outcome is common.
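The log-scale recipe can be sketched as follows. The OR standard error is the formula given above; the RR standard error used here, √(1/a − 1/(a+b) + 1/c − 1/(c+d)), is the standard large-sample formula and is an addition to this section's text. Function names and the default z = 1.96 are illustrative assumptions, and the counts are the cohort example from Section 5.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) using the log-scale normal approximation."""
    estimate = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_est = math.log(estimate)
    return estimate, math.exp(log_est - z * se), math.exp(log_est + z * se)

def relative_risk_ci(a, b, c, d, z=1.96):
    """Return (RR, lower, upper) using the log-scale normal approximation."""
    estimate = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    log_est = math.log(estimate)
    return estimate, math.exp(log_est - z * se), math.exp(log_est + z * se)

if __name__ == "__main__":
    cells = (40, 160, 10, 190)
    for name, fn in (("OR", odds_ratio_ci), ("RR", relative_risk_ci)):
        est, lower, upper = fn(*cells)
        verdict = "excludes 1" if lower > 1 or upper < 1 else "includes 1"
        print(f"{name} = {est:.2f}, 95% CI ({lower:.2f}, {upper:.2f}), {verdict}")
```

Both functions assume all four cells are non-zero; tables with a zero cell need a continuity correction or an exact method, which this sketch does not attempt.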

8. Working These Problems in StatsIQ

Snap a photo of a 2×2 table or an epidemiology problem and StatsIQ labels the cells, identifies the study design, computes the valid measure (RR and OR for cohorts/trials, OR only for case-control), and builds the log-scale confidence interval. It flags the common-outcome case where the OR diverges from the RR so your interpretation stays honest. This content is for educational purposes only.

Key Points

•Automatic cell labeling and study-design detection.
•Computes the valid measure with a log-scale confidence interval.
•Warns when a common outcome makes the OR a poor stand-in for the RR.

Key Takeaways

★RR = [a/(a+b)] / [c/(c+d)]; OR = ad/bc (cross-product).
★Cohort/RCT → RR or OR valid; case-control → OR only (RR not estimable).
★OR ≈ RR when the outcome is rare (<~10%); OR overstates RR for common outcomes.
★OR is the exponentiated logistic-regression coefficient.
★CIs are computed on the log scale; SE of ln(OR) = √(1/a+1/b+1/c+1/d).

Practice Questions

1. A 2×2 cohort table has a = 50, b = 50, c = 20, d = 80. Find RR and OR.

Risk exposed = 50/100 = 0.50; risk unexposed = 20/100 = 0.20; RR = 0.50/0.20 = 2.5. OR = ad/bc = (50×80)/(50×20) = 4000/1000 = 4.0. The OR (4.0) exceeds the RR (2.5) because the 50% outcome is far from rare.

2. Why can a case-control study report an OR but not an RR?

Because the investigator fixes the number of cases and controls, the ratio of cases to controls is set by the design rather than by the population, so the row-wise "risks" a/(a+b) and c/(c+d) are artifacts of sampling. The odds ratio uses the cross-product and does not depend on those at-risk denominators, so it remains valid.

3. An OR of 0.40 for a rare outcome — what does it mean for risk?

It means the exposure is associated with about 0.40 times the odds of the outcome. Because the outcome is rare, the OR approximates the RR, so it also implies roughly 0.40 times the risk — a protective association. For a common outcome you could not make that risk claim from the OR alone.
