41 Probability and Bayes’ Theorem
41.1 What is Probability?
Probability is the numerical measure of the likelihood that an event will occur. It takes a value between 0 (impossible) and 1 (certain) (Gupta 2021; Ross 2020).
The classical formulation, attributed to Pierre-Simon Laplace (1812):
\[ P(A) = \dfrac{\text{Number of favourable outcomes}}{\text{Total number of equally likely outcomes}} \]
Probability theory began with seventeenth-century correspondence between Blaise Pascal and Pierre de Fermat on gambling problems, was systematised by Laplace, and was given a rigorous axiomatic foundation by Andrey N. Kolmogorov in 1933.
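As a quick check of the classical formula, here is a minimal Python sketch (the two-dice experiment is an illustrative choice, not from the text): it enumerates the equally likely outcomes and counts the favourable ones.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the two faces sum to 7 -- the favourable outcomes.
favourable = [o for o in sample_space if sum(o) == 7]

# Classical probability: favourable / total equally likely outcomes.
p_A = Fraction(len(favourable), len(sample_space))
print(p_A)  # 1/6
```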
41.2 Four Approaches to Probability
| Approach | Definition | Strength | Weakness |
|---|---|---|---|
| Classical / Mathematical (Laplace) | Ratio of favourable to total equally likely outcomes | Closed-form for symmetric problems | Requires equally likely outcomes |
| Statistical / Empirical (von Mises) | Limit of relative frequency in repeated trials | Empirical and intuitive | Requires repeated trials |
| Subjective (Ramsey, de Finetti, Savage) | Degree of belief of a rational agent | Applies to one-off events | Personal, hard to verify |
| Axiomatic (Kolmogorov, 1933) | Probability satisfies a set of formal axioms | Mathematical rigour | Abstract; needs measure theory |
41.3 Basic Terminology
| Term | Definition |
|---|---|
| Random experiment | An experiment whose outcome cannot be predicted with certainty |
| Sample space (\(S\)) | Set of all possible outcomes |
| Outcome / sample point | A single element of the sample space |
| Event (\(A\)) | Any subset of the sample space |
| Simple event | Event with a single sample point |
| Compound event | Event made up of more than one sample point |
| Equally likely | Outcomes with the same probability |
| Mutually exclusive | Events that cannot occur simultaneously: \(A \cap B = \emptyset\) |
| Exhaustive events | Events that together cover the entire sample space |
| Independent events | Occurrence of one does not affect the probability of the other |
| Complementary event (\(A^c\)) | All outcomes not in \(A\); \(P(A) + P(A^c) = 1\) |
| Union and intersection | \(A \cup B\) — at least one of \(A, B\); \(A \cap B\) — both occur |
41.4 Kolmogorov’s Axioms of Probability
Kolmogorov’s three axioms anchor modern probability theory (Kolmogorov 1933):
| Axiom | Statement |
|---|---|
| Non-negativity | \(P(A) \geq 0\) for any event \(A\) |
| Normalisation | \(P(S) = 1\) — probability of the sure event is one |
| Countable additivity | For mutually exclusive events \(A_1, A_2, \dots\): \(P(\cup A_i) = \sum P(A_i)\) |
From these three axioms, every other theorem of probability can be derived.
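A small sanity check of the axioms on a finite sample space (a sketch; the fair-die assignment is an assumed example): any valid assignment of probabilities must pass all three tests.

```python
S = {1, 2, 3, 4, 5, 6}                 # sample space: one fair die
P = {outcome: 1 / 6 for outcome in S}  # equally likely assignment

def prob(event):
    """P(A) for an event A, summing the point masses in A."""
    return sum(P[o] for o in event)

assert all(p >= 0 for p in P.values())                 # non-negativity
assert abs(prob(S) - 1) < 1e-12                        # normalisation
A, B = {1, 2}, {5, 6}                                  # disjoint events
assert abs(prob(A | B) - (prob(A) + prob(B))) < 1e-12  # additivity
```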
41.5 Addition Theorem
For any two events \(A\) and \(B\):
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
For mutually exclusive events (\(A \cap B = \emptyset\)):
\[ P(A \cup B) = P(A) + P(B) \]
The general addition theorem extends to three events:
\[ P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C) \]
— the inclusion-exclusion principle.
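Inclusion-exclusion is easy to verify by direct counting. A minimal Python sketch (the multiples-of-2/3/5 events on \(\{1, \dots, 30\}\) are an illustrative choice):

```python
from fractions import Fraction

# Equally likely outcomes: the integers 1..30.
S = set(range(1, 31))
A = {n for n in S if n % 2 == 0}   # multiples of 2
B = {n for n in S if n % 3 == 0}   # multiples of 3
C = {n for n in S if n % 5 == 0}   # multiples of 5

def p(event):
    return Fraction(len(event), len(S))

lhs = p(A | B | C)
rhs = (p(A) + p(B) + p(C)
       - p(A & B) - p(B & C) - p(A & C)
       + p(A & B & C))
assert lhs == rhs
print(lhs)  # 11/15
```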
41.6 Conditional Probability
The conditional probability of \(A\) given \(B\) is the probability that \(A\) occurs, knowing that \(B\) has already occurred:
\[ P(A | B) = \dfrac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \]
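The definition can be read as “restrict the sample space to \(B\) and renormalise”. A minimal sketch showing the two readings agree (the two-dice events are an assumed example):

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two fair dice
A = {o for o in S if sum(o) >= 10}         # sum is at least 10
B = {o for o in S if o[0] == 5}            # first die shows 5

# By the definition: P(A | B) = P(A ∩ B) / P(B).
p_given = Fraction(len(A & B), len(S)) / Fraction(len(B), len(S))

# Equivalently: restrict the sample space to B and count.
p_restricted = Fraction(len(A & B), len(B))

assert p_given == p_restricted == Fraction(1, 3)
```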
41.7 Multiplication Theorem
Rearranging the conditional-probability definition:
\[ P(A \cap B) = P(A) \cdot P(B | A) = P(B) \cdot P(A | B) \]
For independent events: \(P(B | A) = P(B)\), so:
\[ P(A \cap B) = P(A) \cdot P(B) \]
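A quick numerical confirmation of the product rule for two physically unrelated events (the dice events are an illustrative choice):

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two fair dice
A = {o for o in S if o[0] % 2 == 0}        # first die even
B = {o for o in S if o[1] > 4}             # second die is 5 or 6

def p(event):
    return Fraction(len(event), len(S))

# Independent events: the joint probability factorises.
assert p(A & B) == p(A) * p(B)             # 1/2 * 1/3 = 1/6
```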
41.8 Independence vs Mutual Exclusivity — A Common Trap
| Concept | Definition | Implication for \(P(A \cap B)\) |
|---|---|---|
| Independent | \(P(B | A) = P(B)\) | \(P(A \cap B) = P(A) \cdot P(B)\) |
| Mutually exclusive | \(A \cap B = \emptyset\) | \(P(A \cap B) = 0\) |
Two events with positive probabilities can be independent or mutually exclusive but not both — mutual exclusivity says one prevents the other (so they are strongly dependent), while independence says one carries no information about the other.
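The same counting machinery makes the trap concrete: a minimal sketch with two mutually exclusive events of positive probability, which fail the independence test.

```python
from fractions import Fraction

S = set(range(1, 7))                 # one fair die
A = {n for n in S if n % 2 == 0}     # even face
B = {n for n in S if n % 2 == 1}     # odd face: A ∩ B = ∅

def p(event):
    return Fraction(len(event), len(S))

assert p(A & B) == 0                 # mutually exclusive
assert p(A & B) != p(A) * p(B)       # 0 ≠ 1/4: not independent
```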
41.9 Total Probability Theorem
If \(\{B_1, B_2, \dots, B_n\}\) is a partition of the sample space (mutually exclusive and exhaustive), then for any event \(A\):
\[ P(A) = \sum_{i=1}^{n} P(A | B_i) \cdot P(B_i) \]
This is the Total Probability Theorem — a stepping stone to Bayes’ Theorem.
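A minimal sketch of the theorem (the two-machine factory and its defect rates are assumed numbers, purely for illustration):

```python
# Hypothetical factory: machine 1 makes 60% of output with a 2% defect
# rate, machine 2 makes 40% with a 5% defect rate (numbers assumed).
priors = {"machine1": 0.60, "machine2": 0.40}          # P(B_i), a partition
p_defect_given = {"machine1": 0.02, "machine2": 0.05}  # P(A | B_i)

# Total probability: P(A) = sum over i of P(A | B_i) * P(B_i).
p_defect = sum(p_defect_given[b] * priors[b] for b in priors)
print(p_defect)  # ≈ 0.032
```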
41.10 Bayes’ Theorem
The Reverend Thomas Bayes, in an essay published posthumously in 1763, gave a rule for updating prior beliefs in the light of new evidence. Pierre-Simon Laplace later developed the rule into its modern form.
For events \(B_i\) forming a partition and any event \(A\) with \(P(A) > 0\):
\[ P(B_i | A) = \dfrac{P(A | B_i) \cdot P(B_i)}{\sum_{j=1}^{n} P(A | B_j) \cdot P(B_j)} \]
| Term | Name | Working content |
|---|---|---|
| \(P(B_i)\) | Prior | Probability of \(B_i\) before observing \(A\) |
| \(P(A | B_i)\) | Likelihood | Probability of observing \(A\) given \(B_i\) |
| \(P(A)\) | Evidence / Marginal | Probability of \(A\) under all \(B_j\) |
| \(P(B_i | A)\) | Posterior | Updated probability of \(B_i\) after observing \(A\) |
In words: posterior ∝ likelihood × prior. Bayes’ theorem is the backbone of medical diagnosis, spam filtering, machine-learning classification, decision analysis and Bayesian econometrics.
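Since the posterior is just likelihood × prior renormalised over the partition, the theorem fits in a few lines of code. A minimal sketch (the function name is our own, and the numbers reuse the assumed factory example from the total-probability sketch above):

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a partition: returns P(B_i | A) for every B_i.

    priors      -- {label: P(B_i)}, mutually exclusive and exhaustive
    likelihoods -- {label: P(A | B_i)}
    """
    evidence = sum(likelihoods[b] * priors[b] for b in priors)  # P(A)
    return {b: likelihoods[b] * priors[b] / evidence for b in priors}

# Which machine produced a defective item? (assumed numbers from above)
print(posterior({"machine1": 0.60, "machine2": 0.40},
                {"machine1": 0.02, "machine2": 0.05}))
# ≈ {'machine1': 0.375, 'machine2': 0.625}
```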
41.10.1 Worked example — medical test
A disease affects 1 per cent of the population. A test is 99 per cent sensitive (true-positive rate) and 95 per cent specific (true-negative rate). What is the probability that a person who tests positive actually has the disease?
Let \(D\) = has disease, \(D^c\) = does not. Let \(T+\) = positive test.
- \(P(D) = 0.01\), \(P(D^c) = 0.99\).
- \(P(T+|D) = 0.99\), \(P(T+|D^c) = 0.05\).
- \(P(T+) = 0.99 \times 0.01 + 0.05 \times 0.99 = 0.0099 + 0.0495 = 0.0594\).
- \(P(D | T+) = (0.99 \times 0.01) / 0.0594 = 0.0099 / 0.0594 \approx \mathbf{0.167}\).
Despite a positive test, the probability of disease is only about 16.7 per cent — because the disease is rare and false positives are common. The result — counter-intuitive at first — is the base-rate fallacy corrected by Bayes’ Theorem.
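The arithmetic above can be reproduced in a few lines (a sketch mirroring the worked example):

```python
# Reproducing the worked example above.
p_d, p_dc = 0.01, 0.99            # P(D), P(D^c)
sens, fpr = 0.99, 0.05            # P(T+|D), P(T+|D^c) = 1 - specificity

p_pos = sens * p_d + fpr * p_dc   # total probability: P(T+)
p_d_given_pos = sens * p_d / p_pos

print(round(p_pos, 4), round(p_d_given_pos, 3))  # 0.0594 0.167
```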
41.11 Worked Numerical — Two Coins
Two unbiased coins are tossed. The probability of getting at least one head is:
- Sample space \(S = \{HH, HT, TH, TT\}\). \(|S| = 4\).
- Favourable outcomes (at least one head) = \(\{HH, HT, TH\}\). \(|A| = 3\).
- \(P(A) = 3/4 = 0.75\).
Equivalently, \(P(\text{at least one H}) = 1 - P(TT) = 1 - 1/4 = 0.75\).
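The same answer by brute-force enumeration (a minimal sketch):

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))               # {HH, HT, TH, TT}
at_least_one_head = [o for o in S if "H" in o]
assert Fraction(len(at_least_one_head), len(S)) == Fraction(3, 4)

# Complement route: 1 - P(TT).
assert 1 - Fraction(1, len(S)) == Fraction(3, 4)
```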
41.12 Useful Identities
| Identity | Statement |
|---|---|
| Complement | \(P(A^c) = 1 - P(A)\) |
| At least one event | \(P(\cup A_i) = 1 - P(\cap A_i^c)\) |
| Independent events | \(P(A_1 \cap A_2 \cap \dots) = P(A_1) \cdot P(A_2) \cdots\) |
| Conditional rearrangement | \(P(A \cap B) = P(A) \cdot P(B|A)\) |
| Bayes (two-event form) | \(P(B|A) = \dfrac{P(A|B) P(B)}{P(A|B) P(B) + P(A|B^c) P(B^c)}\) |
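The “at least one” identity is often the quickest route in practice. A minimal sketch (the four-rolls-of-a-die problem is an illustrative choice):

```python
# P(at least one) = 1 - P(none) for independent events:
# the chance of at least one six in four rolls of a fair die.
p_six = 1 / 6
p_at_least_one = 1 - (1 - p_six) ** 4
print(round(p_at_least_one, 4))  # 0.5177
```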
41.13 Exam-Pattern MCQs
Match each item in the left column with the correct entry on the right; the solution follows each table.
| | Approach | | Core idea |
|---|---|---|---|
| (i) | Classical / Laplace | (a) | Limit of relative frequency in repeated trials |
| (ii) | Statistical / Empirical | (b) | Probability satisfies formal mathematical axioms |
| (iii) | Subjective | (c) | Ratio of favourable to equally likely outcomes |
| (iv) | Axiomatic / Kolmogorov | (d) | Degree of belief of a rational agent |

Solution: (i)–(c), (ii)–(a), (iii)–(d), (iv)–(b).
| | Concept | | Formula |
|---|---|---|---|
| (i) | Conditional probability \(P(A \mid B)\) | (a) | \(P(A) \cdot P(B \mid A)\) |
| (ii) | Multiplication for general events | (b) | \(P(A) + P(B) - P(A \cap B)\) |
| (iii) | Addition for any two events | (c) | \(P(A \cap B) / P(B)\) |
| (iv) | Independence implies | (d) | \(P(A \cap B) = P(A) \cdot P(B)\) |

Solution: (i)–(c), (ii)–(a), (iii)–(b), (iv)–(d).
| | Term | | Definition |
|---|---|---|---|
| (i) | Mutually exclusive events | (a) | \(P(A \cap B) = P(A) \cdot P(B)\) |
| (ii) | Independent events | (b) | \(A \cap B = \emptyset\), so \(P(A \cap B) = 0\) |
| (iii) | Exhaustive events | (c) | Together cover the entire sample space |
| (iv) | Complement of \(A\) | (d) | All outcomes not in \(A\) |

Solution: (i)–(b), (ii)–(a), (iii)–(c), (iv)–(d).
41.14 Key Takeaways
- Probability lies in [0, 1].
- Four approaches: Classical (Laplace), Statistical / Empirical (von Mises), Subjective (Ramsey, de Finetti, Savage), Axiomatic (Kolmogorov 1933).
- Kolmogorov’s three axioms: non-negativity, normalisation, countable additivity.
- Addition theorem: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\).
- Conditional: \(P(A|B) = P(A \cap B)/P(B)\).
- Multiplication: \(P(A \cap B) = P(A) \cdot P(B|A)\). For independent events: \(P(A) \cdot P(B)\).
- Independent ≠ mutually exclusive: mutually exclusive events with positive probability are strongly dependent (one prevents the other).
- Bayes’ Theorem (1763): \(P(B_i|A) = \dfrac{P(A|B_i) P(B_i)}{\sum_j P(A|B_j) P(B_j)}\). Posterior ∝ Likelihood × Prior.
- Bayes is the foundation of medical diagnosis, spam filters, machine learning, decision analysis, Bayesian econometrics.
- Base-rate fallacy: rare disease + imperfect test → most positives are false positives.
- Useful identity: \(P(\text{at least one}) = 1 - P(\text{none})\).