Relative Risk vs Odds Ratio: 2×2 Table Worked Examples
Relative risk and the odds ratio both measure association in a 2×2 table, but they answer different questions and apply to different study designs. Here is how to compute each, when each is valid, and why the odds ratio overstates risk for common outcomes.
What You'll Learn
- Compute relative risk and the odds ratio from a 2×2 table.
- Match each measure to the study design that makes it valid.
- Explain when the odds ratio approximates relative risk and when it misleads.
1. Direct Answer: RR vs OR
Relative risk (RR), also called the risk ratio, is the probability of the outcome in the exposed group divided by the probability in the unexposed group — a direct "how many times more likely" statement. The odds ratio (OR) is the odds of the outcome given exposure divided by the odds given no exposure, computed as the cross-product ad/bc from the 2×2 table. The key practical difference is design: RR requires that you can estimate actual risks, which you can in cohort studies and randomized trials but NOT in case-control studies, where participants are selected by outcome status. In case-control studies only the OR is valid. The OR is also the native effect measure of logistic regression. When the outcome is rare (under about 10%), the OR closely approximates the RR; when the outcome is common, the OR is pushed further from 1 than the RR and overstates the effect.
Key Points
- RR = risk if exposed ÷ risk if unexposed; OR = odds if exposed ÷ odds if unexposed = ad/bc.
- RR needs estimable risks (cohort/RCT); case-control studies can only report OR.
- OR ≈ RR when the outcome is rare; OR overstates RR when the outcome is common.
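To make the rare-versus-common contrast concrete, here is a minimal Python sketch that holds the odds ratio fixed at 2.0 and works out the risk ratio it implies at different baseline risks. The function name `implied_rr` and the three baseline risks are illustrative assumptions, not part of the original text.

```python
def implied_rr(odds_ratio: float, baseline_risk: float) -> float:
    """Risk ratio implied by an odds ratio, given the risk in the unexposed group.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    odds_unexposed = baseline_risk / (1.0 - baseline_risk)
    odds_exposed = odds_ratio * odds_unexposed
    # Convert the exposed group's odds back to a probability.
    risk_exposed = odds_exposed / (1.0 + odds_exposed)
    return risk_exposed / baseline_risk


if __name__ == "__main__":
    for p0 in (0.01, 0.10, 0.40):
        rr = implied_rr(2.0, p0)
        print(f"baseline risk {p0:.2f}: OR = 2.00, implied RR = {rr:.2f}")
```

With these inputs the implied RR comes out near 1.98, 1.82, and 1.43: close to the OR when the outcome is rare, and well short of it when 40% of the unexposed group has the outcome.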
2. The 2×2 Table Layout
Lay the table out consistently: rows are exposure (exposed / unexposed), columns are outcome (disease / no disease). Cell a = exposed with the outcome, b = exposed without it, c = unexposed with the outcome, d = unexposed without it. Then: risk in the exposed = a/(a+b); risk in the unexposed = c/(c+d); odds in the exposed = a/b; odds in the unexposed = c/d. Getting the cells in the right corners is half the battle — a surprising share of wrong answers come from transposing exposure and outcome. Always label the margins before computing anything.
Key Points
- Rows = exposure, columns = outcome; a/b/c/d in the four cells.
- Risk exposed = a/(a+b); risk unexposed = c/(c+d).
- Odds exposed = a/b; odds unexposed = c/d.
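One way to keep the cells in the right corners is to name them once and derive everything else from those names. The sketch below assumes the layout described above (rows = exposure, columns = outcome); the class name `TwoByTwo` and the counts are hypothetical.

```python
from typing import NamedTuple


class TwoByTwo(NamedTuple):
    """2x2 table with rows = exposure and columns = outcome (illustrative)."""

    a: int  # exposed, outcome present
    b: int  # exposed, outcome absent
    c: int  # unexposed, outcome present
    d: int  # unexposed, outcome absent

    def margins(self) -> dict:
        """Row and column totals, worth checking before computing anything."""
        return {
            "exposed": self.a + self.b,
            "unexposed": self.c + self.d,
            "outcome": self.a + self.c,
            "no_outcome": self.b + self.d,
        }

    def risk_exposed(self) -> float:
        return self.a / (self.a + self.b)

    def risk_unexposed(self) -> float:
        return self.c / (self.c + self.d)

    def odds_exposed(self) -> float:
        return self.a / self.b

    def odds_unexposed(self) -> float:
        return self.c / self.d


if __name__ == "__main__":
    table = TwoByTwo(a=30, b=70, c=15, d=85)  # made-up counts
    print(table.margins())
    print(f"risk exposed   = {table.risk_exposed():.3f}")
    print(f"risk unexposed = {table.risk_unexposed():.3f}")
    print(f"odds exposed   = {table.odds_exposed():.3f}")
    print(f"odds unexposed = {table.odds_unexposed():.3f}")
```

Using keyword arguments when building the table makes a transposed exposure and outcome much easier to spot.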
3. Computing Relative Risk
RR = [a/(a+b)] / [c/(c+d)]. An RR of 1 means no association; RR > 1 means exposure raises risk; RR < 1 means exposure is protective. The interpretation is clean and intuitive: an RR of 2.0 means the exposed group is twice as likely to develop the outcome. Because RR uses the denominators (a+b) and (c+d), those row totals must be real at-risk populations: defined exposed and unexposed groups that are followed for the outcome. That is exactly why RR works in cohort studies and trials but not in case-control designs, where the investigator fixes the number of cases and controls, so a/(a+b) and c/(c+d) are artifacts of sampling rather than risks.
Key Points
- RR = [a/(a+b)] / [c/(c+d)].
- RR = 1 no effect, > 1 harmful, < 1 protective; reads as "times as likely."
- Valid only when the at-risk denominators are real (cohort/RCT).
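The formula and its three-way reading can be sketched as a small Python function. The names `relative_risk` and `describe_rr`, the error handling, and the counts are assumptions made for this illustration.

```python
import math


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = [a/(a+b)] / [c/(c+d)] for a cohort-style 2x2 table."""
    if a + b == 0 or c + d == 0:
        raise ValueError("each exposure group needs at least one participant")
    risk_unexposed = c / (c + d)
    if risk_unexposed == 0:
        raise ValueError("RR is undefined when no unexposed participant has the outcome")
    return (a / (a + b)) / risk_unexposed


def describe_rr(rr: float) -> str:
    """Plain-English reading of a relative risk."""
    if math.isclose(rr, 1.0):
        return "no association"
    if rr > 1.0:
        return f"exposed group is {rr:.2f} times as likely to have the outcome"
    return f"exposure is protective (RR = {rr:.2f})"


if __name__ == "__main__":
    rr = relative_risk(a=30, b=70, c=15, d=85)  # made-up cohort counts
    print(describe_rr(rr))
```

The function only does the arithmetic. Whether the result means anything still depends on the row totals being real at-risk groups.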
4. Computing the Odds Ratio
OR = (a/b) / (c/d) = ad / bc, the cross-product ratio. An OR of 1 means no association, and values above or below 1 read in the same direction as for RR. The OR’s mathematical advantage is that it does not depend on the at-risk denominators, so it stays valid even when cases and controls are sampled separately; that is why case-control studies report ORs. It is also symmetric: the odds ratio of disease given exposure equals the odds ratio of exposure given disease, since (a/c) / (b/d) = ad / bc as well. And it is the quantity logistic regression naturally estimates (the exponentiated coefficient is an OR). The cost is interpretability: an OR is a ratio of odds, not of probabilities, so "2.5 times the odds" is not the same plain-English claim as "2.5 times the risk," and the two diverge as the outcome becomes common.
Key Points
- OR = ad/bc, the cross-product of the 2×2 table.
- Independent of at-risk denominators → valid in case-control studies.
- OR is the exponentiated coefficient in logistic regression.
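The claim that the OR survives separate sampling of cases and controls can be checked numerically. The sketch below starts from a hypothetical full cohort, then keeps every case but only one in ten non-cases, as a case-control sample might. The counts and function names are assumptions for illustration.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio ad/bc."""
    return (a * d) / (b * c)


def row_proportion_ratio(a: int, b: int, c: int, d: int) -> float:
    """[a/(a+b)] / [c/(c+d)]: a true RR only when the rows are real at-risk groups."""
    return (a / (a + b)) / (c / (c + d))


if __name__ == "__main__":
    # Hypothetical full cohort: 200 exposed, 200 unexposed.
    full = dict(a=20, b=180, c=10, d=190)
    # Same cases, but only 1 in 10 of the non-cases retained.
    sampled = dict(a=20, b=18, c=10, d=19)

    print(f"OR, full cohort:    {odds_ratio(**full):.3f}")
    print(f"OR, sampled:        {odds_ratio(**sampled):.3f}")
    print(f"RR, full cohort:    {row_proportion_ratio(**full):.3f}")
    print(f"'RR' after sampling: {row_proportion_ratio(**sampled):.3f}")
```

Scaling b and d by the same sampling fraction cancels in ad/bc, so the OR is unchanged at about 2.11, while the row-proportion ratio drops from 2.0 to about 1.53 and no longer estimates anything useful.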
5. Worked Example 1: Cohort Study (Both Valid)
A cohort follows 200 smokers and 200 non-smokers for a respiratory outcome. Smokers: 40 develop it, 160 do not (a = 40, b = 160). Non-smokers: 10 develop it, 190 do not (c = 10, d = 190). Risk in smokers = 40/200 = 0.20. Risk in non-smokers = 10/200 = 0.05. RR = 0.20 / 0.05 = 4.0 — smokers are 4 times as likely to develop the outcome. Now the OR = ad/bc = (40 × 190) / (160 × 10) = 7600 / 1600 = 4.75. Both are valid because this is a cohort, but notice the OR (4.75) is larger than the RR (4.0): the outcome here (20% in the exposed) is not rare, so the OR overstates the risk ratio. Report the RR as the headline for a cohort with a common outcome.
Key Points
- Cohort design → compute both; RR is the more interpretable headline.
- RR = 4.0 but OR = 4.75: the OR is inflated because the 20% outcome is not rare.
- The cross-product ad/bc gives the OR directly.
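The arithmetic in this example can be reproduced exactly with Python's `fractions` module, which avoids rounding. The last check uses the identity OR / RR = (1 − risk unexposed) / (1 − risk exposed); it follows from the definitions but is not stated in the text above, so treat it as an added illustration.

```python
from fractions import Fraction

# Counts from the cohort example: smokers 40/160, non-smokers 10/190.
a, b, c, d = 40, 160, 10, 190

risk_exposed = Fraction(a, a + b)      # 1/5
risk_unexposed = Fraction(c, c + d)    # 1/20
rr = risk_exposed / risk_unexposed     # 4
odds_ratio = Fraction(a * d, b * c)    # 19/4, i.e. 4.75

# How much further from 1 the OR sits than the RR.
inflation = odds_ratio / rr            # 19/16
assert inflation == (1 - risk_unexposed) / (1 - risk_exposed)

if __name__ == "__main__":
    print(f"RR = {float(rr):.2f}, OR = {float(odds_ratio):.2f}, OR/RR = {float(inflation):.4f}")
```

The inflation factor depends only on the two risks: the further they are from zero, the wider the gap between the OR and the RR.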
6. Worked Example 2: Case-Control (OR Only) and the Rare-Disease Case
A case-control study recruits 100 cases (people with a rare cancer) and 100 controls, then asks about a past exposure. Cases: 30 exposed, 70 not. Controls: 10 exposed, 90 not. In the standard layout (rows = exposure, columns = outcome) that gives a = 30, b = 10, c = 70, d = 90, with the column totals set by the design. Because the investigator FIXED 100 cases and 100 controls, the row-wise proportions a/(a+b) and c/(c+d) do not reflect true population risk, so RR is meaningless. Compute the OR = ad/bc = (30 × 90) / (10 × 70) = 2700 / 700 = 3.86. Since the cancer is rare in the population, this OR is a good approximation of the RR you could not measure directly: the rare-disease assumption in action. Interpretation: the exposure is associated with roughly 3.9 times the odds of disease, and because the disease is rare, about 3.9 times the risk as well.
Key Points
- Case-control designs fix case and control counts → RR is not estimable, OR only.
- OR = ad/bc = 3.86 here.
- For rare outcomes the OR approximates the RR you cannot measure directly.
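In a case-control study the quantity observed directly is the odds of exposure among cases and among controls. A short sketch, using descriptive variable names rather than cell letters (the names are assumptions for this illustration), shows that the ratio of those exposure odds is the same 2700/700 cross-product:

```python
import math

# Counts from the case-control example.
exposed_cases, unexposed_cases = 30, 70
exposed_controls, unexposed_controls = 10, 90

# Odds of having been exposed, within each outcome group.
exposure_odds_cases = exposed_cases / unexposed_cases
exposure_odds_controls = exposed_controls / unexposed_controls
exposure_odds_ratio = exposure_odds_cases / exposure_odds_controls

# The same number as a cross-product.
cross_product = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

assert math.isclose(exposure_odds_ratio, cross_product)

if __name__ == "__main__":
    print(f"OR = {cross_product:.2f}")
```

Because only within-group exposure odds enter the calculation, the number of cases and controls the investigator chose to recruit does not affect the result.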
7. Confidence Intervals and Interpretation Traps
Both RR and OR are skewed ratios, so confidence intervals are built on the log scale and then exponentiated. For the OR, the standard error of ln(OR) is √(1/a + 1/b + 1/c + 1/d); the 95% CI is exp[ln(OR) ± 1.96 × SE]. If the interval includes 1, the association is not statistically significant. The biggest interpretation trap is treating an OR as if it were an RR when the outcome is common — media reports doing this routinely exaggerate effects. A second trap is sign confusion with protective effects: an OR of 0.5 means halved odds, which for a rare outcome is roughly halved risk, but for a common outcome the RR would be closer to 1 than 0.5. When the outcome is common and the design allows it, prefer RR; when only an OR is available, say "odds," not "risk."
Key Points
- CIs use ln(ratio) ± 1.96 × SE; for OR, SE of ln(OR) = √(1/a+1/b+1/c+1/d).
- An interval containing 1 means no significant association.
- Do not call an odds ratio a risk ratio when the outcome is common.
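The log-scale interval can be sketched in a few lines of Python. The OR standard error is the one given above. The RR standard error, √(1/a − 1/(a+b) + 1/c − 1/(c+d)), is the usual large-sample formula and is an addition here, since the text does not state it. Function names are hypothetical, and the sketch assumes no cell is zero.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% interval


def log_scale_ci(estimate: float, se_log: float, z: float = Z_95) -> tuple:
    """Exponentiate ln(estimate) +/- z * SE to get an interval on the ratio scale."""
    center = math.log(estimate)
    return math.exp(center - z * se_log), math.exp(center + z * se_log)


def or_with_ci(a: int, b: int, c: int, d: int) -> tuple:
    """Odds ratio with its 95% CI; SE of ln(OR) = sqrt(1/a + 1/b + 1/c + 1/d)."""
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low, high = log_scale_ci(estimate, se_log)
    return estimate, low, high


def rr_with_ci(a: int, b: int, c: int, d: int) -> tuple:
    """Relative risk with its 95% CI (SE formula is an assumption added here)."""
    estimate = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    low, high = log_scale_ci(estimate, se_log)
    return estimate, low, high


if __name__ == "__main__":
    # Cohort counts from Worked Example 1.
    for label, result in (("OR", or_with_ci(40, 160, 10, 190)),
                          ("RR", rr_with_ci(40, 160, 10, 190))):
        estimate, low, high = result
        verdict = "includes 1" if low <= 1.0 <= high else "excludes 1"
        print(f"{label} = {estimate:.2f}, 95% CI ({low:.2f}, {high:.2f}), {verdict}")
```

For the cohort example both intervals sit entirely above 1. Small cell counts make the normal approximation rough, so treat intervals from sparse tables with caution.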
8. Working These Problems in StatsIQ
Snap a photo of a 2×2 table or an epidemiology problem and StatsIQ labels the cells, identifies the study design, computes the valid measure (RR and OR for cohorts/trials, OR only for case-control), and builds the log-scale confidence interval. It flags the common-outcome case where the OR diverges from the RR so your interpretation stays honest. This content is for educational purposes only.
Key Points
- Automatic cell labeling and study-design detection.
- Computes the valid measure with a log-scale confidence interval.
- Warns when a common outcome makes the OR a poor stand-in for the RR.
Key Takeaways
- RR = [a/(a+b)] / [c/(c+d)]; OR = ad/bc (cross-product).
- Cohort/RCT → RR or OR valid; case-control → OR only (RR not estimable).
- OR ≈ RR when the outcome is rare (<~10%); OR overstates RR for common outcomes.
- OR is the exponentiated logistic-regression coefficient.
- CIs are computed on the log scale; SE of ln(OR) = √(1/a+1/b+1/c+1/d).
Practice Questions
1. A 2×2 cohort table has a = 50, b = 50, c = 20, d = 80. Find RR and OR.
2. Why can a case-control study report an OR but not an RR?
3. An OR of 0.40 for a rare outcome — what does it mean for risk?
FAQs
Common questions about this topic
When does the odds ratio approximate the relative risk?
When the outcome is rare in both groups; a common rule of thumb is below about 10% prevalence. Then a/(a+b) ≈ a/b and c/(c+d) ≈ c/d, so the risk ratio and odds ratio nearly coincide. This "rare-disease assumption" is what lets case-control studies, which can only compute an OR, stand in for the relative risk that could not be measured directly.
Why does logistic regression give odds ratios rather than relative risks?
Logistic regression models the log-odds of the outcome as a linear function of predictors, so exponentiating a coefficient yields an odds ratio by construction. To get adjusted relative risks you need a different model, such as log-binomial regression or Poisson regression with robust standard errors, which models risk directly but can have convergence issues. For rare outcomes the logistic OR approximates the RR anyway.
Does the odds ratio exaggerate the effect?
It can, for common outcomes. Because odds magnify as probabilities approach 1, an OR sits further from 1 than the corresponding RR whenever the outcome is frequent. A widely cited pitfall is reporting an OR of, say, 2.0 for a 40%-prevalence outcome as if risk doubled, when the actual RR might be closer to 1.5. For common outcomes, prefer the RR when the design allows it.
How do I compute a confidence interval for an odds ratio?
Work on the log scale because ratios are right-skewed. Compute ln(OR), then SE = √(1/a + 1/b + 1/c + 1/d). The 95% CI is exp[ln(OR) ± 1.96 × SE]. If that interval includes 1, the association is not statistically significant at the 0.05 level. The same log-scale approach applies to relative risk with its own standard-error formula.
Can StatsIQ work these problems for me?
Yes. Snap a photo of the 2×2 table; StatsIQ assigns the a/b/c/d cells, detects whether the design supports relative risk, computes the appropriate measure with its log-scale confidence interval, and warns when a common outcome makes the OR a misleading stand-in for the RR.