๐ŸŽฏ
fundamentalsintermediate40-50 min

Conditional Probability and Bayes Theorem: Worked Examples

Conditional probability quantifies how an event's probability changes given new information; Bayes theorem inverts the direction of conditioning. This guide walks through both formulas with 6 worked examples covering medical screening, courtroom probability, and the classic Monty Hall problem.

What You'll Learn

  • โœ“Apply the conditional probability formula P(A|B) = P(AโˆฉB) / P(B)
  • โœ“Use Bayes theorem to invert conditional probabilities
  • โœ“Recognize and avoid the base rate fallacy
  • โœ“Apply Bayes thinking to medical screening, courtroom evidence, and decision problems

1. Direct Answer: The Two Formulas

Conditional probability formula: P(A|B) = P(A and B) / P(B). Read as "the probability of A given that B has occurred." This conditions our universe on B and asks what fraction of B also has A.

Bayes theorem flips the conditioning: P(A|B) = P(B|A) × P(A) / P(B). It lets you compute P(A given B) when you know P(B given A) — which is often easier to determine.

The denominator P(B) can be expanded using the law of total probability: P(B) = P(B|A) × P(A) + P(B|not A) × P(not A). Together, these formulas handle the vast majority of conditional probability and Bayesian reasoning problems on AP Stats exams and in intro probability courses.

Key Points

  • โ€ขP(A|B) = P(A โˆฉ B) / P(B) โ€” the conditional probability formula
  • โ€ขBayes theorem: P(A|B) = P(B|A) ร— P(A) / P(B) โ€” inverts the conditioning
  • โ€ขP(B) is often expanded via total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
  • โ€ขP(A|B) โ‰  P(B|A) in general โ€” confusing them is the most common error
  • โ€ขBayes thinking integrates prior probability (base rate) with new evidence

2. Worked Example 1: Medical Screening Test (Base Rate Trap)

A disease has prevalence 1% in the population. A screening test has 99% sensitivity (correctly detects 99% of disease cases) and 99% specificity (correctly identifies 99% of healthy people as healthy). A patient tests POSITIVE. What is the probability they actually have the disease?

Most people guess 99% (the test is 99% accurate, so a positive must be 99% likely real). The actual answer is shockingly low. Using Bayes theorem:

P(Disease) = 0.01 (prior, base rate)
P(Positive | Disease) = 0.99 (sensitivity)
P(Positive | No Disease) = 0.01 (false positive rate = 1 − specificity)
P(No Disease) = 0.99

P(Positive) = P(Pos|Dis)×P(Dis) + P(Pos|NoDis)×P(NoDis)
            = 0.99 × 0.01 + 0.01 × 0.99 = 0.0099 + 0.0099 = 0.0198

P(Disease | Positive) = (0.99 × 0.01) / 0.0198 = 0.0099 / 0.0198 = 0.50

The correct answer: 50%, not 99%. Even with a 99%-accurate test, when the disease is rare, half of all positive tests are false positives. This is the base rate fallacy — neglecting the prior probability dramatically overestimates the post-test probability.

Key Points

  • โ€ขEven highly accurate tests produce many false positives when the disease is rare
  • โ€ขP(Disease | Positive) depends on disease PREVALENCE, not just test accuracy
  • โ€ขAlways identify the base rate (prior) and the test characteristics separately
  • โ€ขTwo-step expansion: compute P(B) via total probability, then apply Bayes

3. Worked Example 2: Two-Test Confirmation

After the positive screening test (Example 1), the patient takes a CONFIRMATORY test with 99% sensitivity and 99% specificity. The confirmatory test is also positive. Now what is the probability of disease?

The key insight: the patient's prior is no longer 1%. After the first positive test, it shifted to 50%. Plug into Bayes again:

P(Disease) = 0.50 (updated prior)
P(Positive | Disease) = 0.99
P(Positive | No Disease) = 0.01
P(No Disease) = 0.50

P(Positive) = 0.99 × 0.50 + 0.01 × 0.50 = 0.50

P(Disease | Positive) = (0.99 × 0.50) / 0.50 = 0.99

Now the answer IS 99%. With two independent positive tests, the disease probability is very high. This is why screening protocols use confirmatory testing — one positive test is suspicious; two positives are decisive. Updating with Bayes theorem after each new piece of evidence is the rigorous way to think about sequential testing.

Key Points

  • โ€ขAfter each test, the new prior is the previous posterior
  • โ€ขTwo independent confirming tests dramatically increase certainty
  • โ€ขThis is the formal basis of confirmatory testing in medicine
  • โ€ขBayesian updating handles sequential evidence cleanly

4. Worked Example 3: The Three-Card Problem

Three cards in a hat: one has BLUE on both sides, one has WHITE on both sides, one has BLUE on one side and WHITE on the other. You draw a card and see BLUE on the visible side. What is the probability the OTHER side is also BLUE?

Intuitive guess: 50% (the card is either the blue-blue card or the blue-white card; both are equally likely). Wrong. The key insight: there are 3 BLUE FACES across all cards. 2 of them are on the BB card; 1 is on the BW card. Given that you see a blue face, the conditional probability favors the BB card. Formally:

P(BB) = 1/3 (prior — one of three cards)
P(see Blue | BB) = 1 (both faces are blue)
P(see Blue | BW) = 1/2 (one face is blue)
P(see Blue | WW) = 0

P(see Blue) = (1/3)(1) + (1/3)(1/2) + (1/3)(0) = 1/3 + 1/6 = 1/2

P(BB | see Blue) = (1)(1/3) / (1/2) = 2/3

Probability the other side is blue: 2/3, not 1/2. This is a classic Bayesian counter-intuition — the question conditions on the OBSERVED face, and the count of blue faces (not cards) determines the posterior.

Key Points

  • โ€ขConditioning on an observation shifts probability toward outcomes that produce that observation
  • โ€ขCount "ways the evidence could occur" not just "cards that could be drawn"
  • โ€ขMany probability puzzles trap intuition by confusing card counts with face counts
  • โ€ขBayes formalism produces the correct answer mechanically

5. Worked Example 4: Courtroom Evidence (Prosecutor's Fallacy)

A DNA match between a suspect and crime scene evidence is a "1 in 1,000,000" match by random chance. The prosecutor argues: "The probability the defendant is innocent is 1 in 1,000,000." Is this argument valid?

No. It commits the prosecutor's fallacy — confusing P(Match | Innocent) with P(Innocent | Match).

P(Match | Innocent) = 1/1,000,000 (the random match probability)
P(Innocent | Match) requires the prior probability of guilt and the population size.

Suppose the suspect was identified by an unrelated police database search across 5 million records. Then there are roughly 5 expected random matches in the database (5,000,000 × 1/1,000,000). One is the actual perpetrator (if the perpetrator is in the database); the other 4 are innocent matches. So P(Innocent | Match) is approximately 4/5 = 80%, not 1 in 1,000,000.

The correct interpretation requires the prior probability of guilt and the search context. Database trawls have very different evidentiary weight than directed searches based on independent suspicion. This is one of the most consequential applications of Bayesian thinking — many wrongful convictions have been associated with the prosecutor's fallacy.

Key Points

  • โ€ขP(Evidence | Innocent) โ‰  P(Innocent | Evidence) โ€” confusing these is the prosecutor's fallacy
  • โ€ขDatabase trawl matches require correction for the search size
  • โ€ขPrior probability (base rate of guilt) is essential for Bayesian conclusion
  • โ€ขThis affects real legal outcomes โ€” used in landmark wrongful conviction reversals

6. Worked Example 5: Monty Hall Problem

You're on a game show. Three doors: behind one is a car, behind the other two are goats. You pick door 1. The host (who knows what's behind each door) opens door 3 to reveal a goat. The host offers to let you switch to door 2. Should you switch?

Intuitive answer: 50/50. Wrong. Switch — you have a 2/3 chance of winning if you switch, 1/3 if you stay. Formally:

P(Car behind 1) = 1/3 (your initial guess)
P(Car behind 2 or 3) = 2/3 combined

The host opens a goat door deliberately. Door 3 is now a KNOWN goat, so the combined 2/3 probability collapses entirely onto door 2. Mathematically, under the standard assumption that the host picks at random between the two goat doors when the car is behind your door:

P(Host opens 3) = P(opens 3 | Car at 1)×(1/3) + P(opens 3 | Car at 2)×(1/3) + P(opens 3 | Car at 3)×(1/3)
                = (1/2)(1/3) + (1)(1/3) + (0)(1/3) = 1/2

P(Car at 2 | Host opens 3) = P(Host opens 3 | Car at 2) × P(Car at 2) / P(Host opens 3)
                           = 1 × (1/3) / (1/2) = 2/3

The host's deliberate revelation of a goat is INFORMATION that updates the probabilities. The trap is treating the host's action as random; it isn't. This is why Bayes thinking matters — it forces you to ask "given the rules of the host's behavior, what does the new information mean?"

Key Points

  • โ€ขThe host's deliberate (non-random) action carries information
  • โ€ขSwitching wins 2/3 of the time; staying wins 1/3
  • โ€ขThe intuition trap: treating the host's reveal as 50/50 random
  • โ€ขBayes formalism produces the correct answer mechanically

7. Worked Example 6: Defective Product Diagnosis

A factory has 3 production lines. Line A produces 50% of output with a 2% defect rate; Line B produces 30% with a 4% defect rate; Line C produces 20% with a 5% defect rate. A randomly chosen defective unit is found. What is the probability it came from Line C?

P(C) = 0.20
P(Defect | C) = 0.05

P(Defect) = P(D|A)P(A) + P(D|B)P(B) + P(D|C)P(C)
          = 0.02 × 0.50 + 0.04 × 0.30 + 0.05 × 0.20
          = 0.010 + 0.012 + 0.010 = 0.032

P(C | Defect) = (0.05 × 0.20) / 0.032 = 0.010 / 0.032 = 0.3125

Line C contributed 31.25% of defects despite producing only 20% of total output. A higher defect rate on lower volume produces a meaningful but not dominant share of defects. Which line should the QA team investigate first? Often Line C (highest defect rate) — but the answer depends on whether the goal is reducing the total defect count (look at Line B, which contributes the most defects) or improving the worst-quality line (Line C). Bayesian probability informs the decision; it doesn't make the decision for you.

Key Points

  • โ€ขTotal probability law: sum P(D|line) ร— P(line) across all lines
  • โ€ขBayes posterior depends on both the conditional rate and the volume
  • โ€ขHigh defect rate alone doesn't mean highest contribution to total defects
  • โ€ขReal-world quality decisions depend on the goal (rate reduction vs total count reduction)

Key Takeaways

  • โ˜…P(A|B) = P(AโˆฉB) / P(B) โ€” conditional probability formula
  • โ˜…Bayes theorem: P(A|B) = P(B|A) ร— P(A) / P(B)
  • โ˜…Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
  • โ˜…P(A|B) โ‰  P(B|A) โ€” confusing them is the most common error
  • โ˜…Base rate fallacy: ignoring prior probability dramatically overestimates posterior
  • โ˜…Bayesian updating: posterior of one test becomes the prior for the next

Practice Questions

1. A disease has 5% prevalence. A test has 95% sensitivity and 90% specificity. What is the probability of disease given a positive test?
P(D) = 0.05; P(+|D) = 0.95; P(+|not D) = 0.10.
P(+) = 0.95×0.05 + 0.10×0.95 = 0.0475 + 0.095 = 0.1425.
P(D|+) = (0.95×0.05) / 0.1425 = 0.0475 / 0.1425 ≈ 0.333. About 33%.
2. Two boxes: Box A has 3 red and 7 blue marbles; Box B has 6 red and 4 blue marbles. You pick a box at random and draw a red marble. What is the probability it came from Box A?
P(A) = 0.50; P(red|A) = 0.30; P(red|B) = 0.60.
P(red) = 0.30×0.50 + 0.60×0.50 = 0.45.
P(A|red) = (0.30×0.50) / 0.45 = 0.15 / 0.45 ≈ 0.333. About 33%.
3. Why is the base rate fallacy so common in interpreting medical tests?
People hear "99% accurate" and intuit "99% probability of disease given positive." But the base rate (disease prevalence) determines how many false positives the test produces relative to true positives. When prevalence is low, even a small false positive rate produces many false positives. The correct answer requires applying Bayes theorem with both the test characteristics AND the prevalence.
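Both numerical practice answers can be checked with one generic posterior function (a sketch; the function name is mine):

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H | E) via Bayes theorem, with the total-probability denominator."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

q1 = posterior(0.05, 0.95, 0.10)   # disease given positive test: ~0.333
q2 = posterior(0.50, 0.30, 0.60)   # Box A given a red marble: ~0.333
```

Both come out to about 1/3, matching the worked answers above — a useful sanity check that one template covers superficially different problems.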


FAQs

Common questions about this topic

When should you use the conditional probability formula versus Bayes theorem?

Use the conditional probability formula directly when P(A and B) and P(B) are both easy to compute. Use Bayes theorem when you have P(B|A) but want P(A|B) — that is, you have a "forward" conditional and need to invert it. Most real-world applications (medical testing, courtroom evidence, diagnostic reasoning) involve inverting a forward conditional, so Bayes is the typical workhorse.

What is the base rate fallacy?

The base rate fallacy is the tendency to ignore prior probability (base rate) when interpreting new evidence. The classic example: a 99%-accurate test for a 1%-prevalent disease produces only 50% probability of disease given a positive test, not 99%. Most people (including many medical professionals when not paying attention) get this wrong because they focus on the test characteristics and forget the prevalence.

How does Bayesian statistics differ from frequentist statistics?

Frequentist statistics treats parameters as fixed and data as random; it produces p-values and confidence intervals. Bayesian statistics treats parameters as random and updates beliefs based on data; it produces posterior distributions and credible intervals. Bayes theorem is the mathematical bridge — both approaches use it, but Bayesian inference uses it as the primary tool for updating beliefs. Most intro stats courses teach frequentist methods; Bayes theorem appears as a probability tool that supports both.

Why does switching win in the Monty Hall problem?

The intuition trap is treating the host's door reveal as 50/50 random. The host KNOWS where the car is and INTENTIONALLY opens a goat door — that's information. The 2/3 prior probability that the car is behind a non-chosen door collapses entirely onto the remaining unopened door. Most people, even mathematicians, initially get Monty Hall wrong; the formal Bayes approach produces the correct answer mechanically.

How do you avoid the prosecutor's fallacy?

Always ask: "What is P(Evidence | Innocent)?" versus "What is P(Innocent | Evidence)?" These are different. The first is what the test gives you; the second is what you actually want to know. Bayes theorem connects them only if you also know the prior probability of guilt and the population context. Without those, going from P(Evidence | Innocent) to P(Innocent | Evidence) is mathematically wrong.

Can StatsIQ walk through Bayes theorem problems step by step?

Yes. Provide the problem (medical screening, courtroom evidence, sequential testing, defective products) and StatsIQ walks through the prior, conditional probabilities, total probability expansion, and Bayes theorem application step by step. It is especially useful for AP Stats and intro probability, where the formula application is straightforward but identifying the right priors and conditionals is the hard part. This content is for educational purposes only and does not constitute statistical advice.
