41 Probability and Bayes’ Theorem
41.1 What is Probability?
Probability is the numerical measure of the likelihood that an event will occur. It takes a value between 0 (impossible) and 1 (certain) (Gupta 2021; Ross 2020).
The classical formulation, attributed to Pierre-Simon Laplace (1812):
\[ P(A) = \dfrac{\text{Number of favourable outcomes}}{\text{Total number of equally likely outcomes}} \]
Probability theory began with seventeenth-century correspondence between Blaise Pascal and Pierre de Fermat on gambling problems, was systematised by Laplace, and was given a rigorous axiomatic foundation by Andrey N. Kolmogorov in 1933.
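As a quick check of the classical formula, here is a minimal Python sketch (the two-dice experiment is an illustrative choice, not from the text): it enumerates the equally likely outcomes and counts the favourable ones.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the two faces sum to 7 -- the favourable outcomes.
favourable = [o for o in sample_space if sum(o) == 7]

# Classical probability: favourable / total equally likely outcomes.
p_A = Fraction(len(favourable), len(sample_space))
print(p_A)  # 1/6
```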
41.2 Four Approaches to Probability
| Approach | Definition | Strength | Weakness |
|---|---|---|---|
| Classical / Mathematical (Laplace) | Ratio of favourable to total equally likely outcomes | Closed-form for symmetric problems | Requires equally likely outcomes |
| Statistical / Empirical (von Mises) | Limit of relative frequency in repeated trials | Empirical and intuitive | Requires repeated trials |
| Subjective (Ramsey, de Finetti, Savage) | Degree of belief of a rational agent | Applies to one-off events | Personal, hard to verify |
| Axiomatic (Kolmogorov, 1933) | Probability satisfies a set of formal axioms | Mathematical rigour | Abstract; needs measure theory |
41.3 Basic Terminology
| Term | Definition |
|---|---|
| Random experiment | An experiment whose outcome cannot be predicted with certainty |
| Sample space (\(S\)) | Set of all possible outcomes |
| Outcome / sample point | A single element of the sample space |
| Event (\(A\)) | Any subset of the sample space |
| Simple event | Event with a single sample point |
| Compound event | Event made up of more than one sample point |
| Equally likely | Outcomes with the same probability |
| Mutually exclusive | Events that cannot occur simultaneously: \(A \cap B = \emptyset\) |
| Exhaustive events | Events that together cover the entire sample space |
| Independent events | Occurrence of one does not affect the probability of the other |
| Complementary event (\(A^c\)) | All outcomes not in \(A\); \(P(A) + P(A^c) = 1\) |
| Union and intersection | \(A \cup B\) — at least one of \(A, B\); \(A \cap B\) — both occur |
41.4 Kolmogorov’s Axioms of Probability
Kolmogorov’s three axioms anchor modern probability theory (Kolmogorov 1933):
| Axiom | Statement |
|---|---|
| Non-negativity | \(P(A) \geq 0\) for any event \(A\) |
| Normalisation | \(P(S) = 1\) — probability of the sure event is one |
| Countable additivity | For mutually exclusive events \(A_1, A_2, \dots\): \(P(\cup A_i) = \sum P(A_i)\) |
From these three axioms, every other theorem of probability can be derived.
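A small sanity check of the axioms on a finite sample space (a sketch; the fair-die assignment is an assumed example): any valid assignment of probabilities must pass all three tests.

```python
S = {1, 2, 3, 4, 5, 6}                 # sample space: one fair die
P = {outcome: 1 / 6 for outcome in S}  # equally likely assignment

def prob(event):
    """P(A) for an event A, summing the point masses in A."""
    return sum(P[o] for o in event)

assert all(p >= 0 for p in P.values())                 # non-negativity
assert abs(prob(S) - 1) < 1e-12                        # normalisation
A, B = {1, 2}, {5, 6}                                  # disjoint events
assert abs(prob(A | B) - (prob(A) + prob(B))) < 1e-12  # additivity
```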
41.5 Addition Theorem
For any two events \(A\) and \(B\):
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
For mutually exclusive events (\(A \cap B = \emptyset\)):
\[ P(A \cup B) = P(A) + P(B) \]
The general addition theorem extends to three events:
\[ P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C) \]
— the inclusion-exclusion principle.
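Inclusion-exclusion is easy to verify by direct counting. A minimal Python sketch (the multiples-of-2/3/5 events on \(\{1, \dots, 30\}\) are an illustrative choice):

```python
from fractions import Fraction

# Equally likely outcomes: the integers 1..30.
S = set(range(1, 31))
A = {n for n in S if n % 2 == 0}   # multiples of 2
B = {n for n in S if n % 3 == 0}   # multiples of 3
C = {n for n in S if n % 5 == 0}   # multiples of 5

def p(event):
    return Fraction(len(event), len(S))

lhs = p(A | B | C)
rhs = (p(A) + p(B) + p(C)
       - p(A & B) - p(B & C) - p(A & C)
       + p(A & B & C))
assert lhs == rhs
print(lhs)  # 11/15
```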
41.6 Conditional Probability
The conditional probability of \(A\) given \(B\) is the probability that \(A\) occurs, knowing that \(B\) has already occurred:
\[ P(A | B) = \dfrac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \]
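The definition can be read as “restrict the sample space to \(B\) and renormalise”. A minimal sketch showing the two readings agree (the two-dice events are an assumed example):

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two fair dice
A = {o for o in S if sum(o) >= 10}         # sum is at least 10
B = {o for o in S if o[0] == 5}            # first die shows 5

# By the definition: P(A | B) = P(A ∩ B) / P(B).
p_given = Fraction(len(A & B), len(S)) / Fraction(len(B), len(S))

# Equivalently: restrict the sample space to B and count.
p_restricted = Fraction(len(A & B), len(B))

assert p_given == p_restricted == Fraction(1, 3)
```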
41.7 Multiplication Theorem
Rearranging the conditional-probability definition:
\[ P(A \cap B) = P(A) \cdot P(B | A) = P(B) \cdot P(A | B) \]
For independent events: \(P(B | A) = P(B)\), so:
\[ P(A \cap B) = P(A) \cdot P(B) \]
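A quick numerical confirmation of the product rule for two physically unrelated events (the dice events are an illustrative choice):

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two fair dice
A = {o for o in S if o[0] % 2 == 0}        # first die even
B = {o for o in S if o[1] > 4}             # second die is 5 or 6

def p(event):
    return Fraction(len(event), len(S))

# Independent events: the joint probability factorises.
assert p(A & B) == p(A) * p(B)             # 1/2 * 1/3 = 1/6
```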
41.8 Independence vs Mutual Exclusivity — A Common Trap
| Concept | Definition | Implication for \(P(A \cap B)\) |
|---|---|---|
| Independent | \(P(B | A) = P(B)\) | \(P(A \cap B) = P(A) \cdot P(B)\) |
| Mutually exclusive | \(A \cap B = \emptyset\) | \(P(A \cap B) = 0\) |
Two events with positive probabilities can be independent or mutually exclusive but not both — mutual exclusivity says one prevents the other (so they are strongly dependent), while independence says one carries no information about the other.
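The same counting machinery makes the trap concrete: a minimal sketch with two mutually exclusive events of positive probability, which fail the independence test.

```python
from fractions import Fraction

S = set(range(1, 7))                 # one fair die
A = {n for n in S if n % 2 == 0}     # even face
B = {n for n in S if n % 2 == 1}     # odd face: A ∩ B = ∅

def p(event):
    return Fraction(len(event), len(S))

assert p(A & B) == 0                 # mutually exclusive
assert p(A & B) != p(A) * p(B)       # 0 ≠ 1/4: not independent
```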
41.9 Total Probability Theorem
If \(\{B_1, B_2, \dots, B_n\}\) is a partition of the sample space (mutually exclusive and exhaustive), then for any event \(A\):
\[ P(A) = \sum_{i=1}^{n} P(A | B_i) \cdot P(B_i) \]
This is the Total Probability Theorem — a stepping stone to Bayes’ Theorem.
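A minimal sketch of the theorem (the two-machine factory and its defect rates are assumed numbers, purely for illustration):

```python
# Hypothetical factory: machine 1 makes 60% of output with a 2% defect
# rate, machine 2 makes 40% with a 5% defect rate (numbers assumed).
priors = {"machine1": 0.60, "machine2": 0.40}          # P(B_i), a partition
p_defect_given = {"machine1": 0.02, "machine2": 0.05}  # P(A | B_i)

# Total probability: P(A) = sum over i of P(A | B_i) * P(B_i).
p_defect = sum(p_defect_given[b] * priors[b] for b in priors)
print(p_defect)  # ≈ 0.032
```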
41.10 Bayes’ Theorem
The Reverend Thomas Bayes, in an essay published posthumously in 1763, gave a rule for updating prior beliefs in the light of new evidence. Pierre-Simon Laplace later developed the rule into its modern form.
For events \(B_i\) forming a partition and any event \(A\) with \(P(A) > 0\):
\[ P(B_i | A) = \dfrac{P(A | B_i) \cdot P(B_i)}{\sum_{j=1}^{n} P(A | B_j) \cdot P(B_j)} \]
| Term | Name | Working content |
|---|---|---|
| \(P(B_i)\) | Prior | Probability of \(B_i\) before observing \(A\) |
| \(P(A | B_i)\) | Likelihood | Probability of observing \(A\) given \(B_i\) |
| \(P(A)\) | Evidence / Marginal | Probability of \(A\) under all \(B_j\) |
| \(P(B_i | A)\) | Posterior | Updated probability of \(B_i\) after observing \(A\) |
In words: posterior ∝ likelihood × prior. Bayes’ theorem is the backbone of medical diagnosis, spam filtering, machine-learning classification, decision analysis and Bayesian econometrics.
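Since the posterior is just likelihood × prior renormalised over the partition, the theorem fits in a few lines of code. A minimal sketch (the function name is our own, and the numbers reuse the assumed factory example from the total-probability sketch above):

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a partition: returns P(B_i | A) for every B_i.

    priors      -- {label: P(B_i)}, mutually exclusive and exhaustive
    likelihoods -- {label: P(A | B_i)}
    """
    evidence = sum(likelihoods[b] * priors[b] for b in priors)  # P(A)
    return {b: likelihoods[b] * priors[b] / evidence for b in priors}

# Which machine produced a defective item? (assumed numbers from above)
print(posterior({"machine1": 0.60, "machine2": 0.40},
                {"machine1": 0.02, "machine2": 0.05}))
# ≈ {'machine1': 0.375, 'machine2': 0.625}
```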
41.10.1 Worked example — medical test
A disease affects 1 per cent of the population. A test is 99 per cent sensitive (true-positive rate) and 95 per cent specific (true-negative rate). What is the probability that a person who tests positive actually has the disease?
Let \(D\) = has disease, \(D^c\) = does not. Let \(T+\) = positive test.
- \(P(D) = 0.01\), \(P(D^c) = 0.99\).
- \(P(T+|D) = 0.99\), \(P(T+|D^c) = 0.05\).
- \(P(T+) = 0.99 \times 0.01 + 0.05 \times 0.99 = 0.0099 + 0.0495 = 0.0594\).
- \(P(D | T+) = (0.99 \times 0.01) / 0.0594 = 0.0099 / 0.0594 \approx \mathbf{0.167}\).
Despite a positive test, the probability of disease is only about 16.7 per cent — because the disease is rare and false positives are common. The result — counter-intuitive at first — is the base-rate fallacy corrected by Bayes’ Theorem.
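The arithmetic above can be reproduced in a few lines (a sketch mirroring the worked example):

```python
# Reproducing the worked example above.
p_d, p_dc = 0.01, 0.99            # P(D), P(D^c)
sens, fpr = 0.99, 0.05            # P(T+|D), P(T+|D^c) = 1 - specificity

p_pos = sens * p_d + fpr * p_dc   # total probability: P(T+)
p_d_given_pos = sens * p_d / p_pos

print(round(p_pos, 4), round(p_d_given_pos, 3))  # 0.0594 0.167
```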
41.11 Worked Numerical — Two Coins
Two unbiased coins are tossed. The probability of getting at least one head is:
- Sample space \(S = \{HH, HT, TH, TT\}\). \(|S| = 4\).
- Favourable outcomes (at least one head) = \(\{HH, HT, TH\}\). \(|A| = 3\).
- \(P(A) = 3/4 = 0.75\).
Equivalently, \(P(\text{at least one H}) = 1 - P(TT) = 1 - 1/4 = 0.75\).
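The same answer by brute-force enumeration (a minimal sketch):

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))               # {HH, HT, TH, TT}
at_least_one_head = [o for o in S if "H" in o]
assert Fraction(len(at_least_one_head), len(S)) == Fraction(3, 4)

# Complement route: 1 - P(TT).
assert 1 - Fraction(1, len(S)) == Fraction(3, 4)
```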
41.12 Useful Identities
| Identity | Statement |
|---|---|
| Complement | \(P(A^c) = 1 - P(A)\) |
| At least one event | \(P(\cup A_i) = 1 - P(\cap A_i^c)\) |
| Independent events | \(P(A_1 \cap A_2 \cap \dots) = P(A_1) \cdot P(A_2) \cdots\) |
| Conditional rearrangement | \(P(A \cap B) = P(A) \cdot P(B|A)\) |
| Bayes (two-event form) | \(P(B|A) = \dfrac{P(A|B) P(B)}{P(A|B) P(B) + P(A|B^c) P(B^c)}\) |
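The “at least one” identity is often the quickest route in practice. A minimal sketch (the four-rolls-of-a-die problem is an illustrative choice):

```python
# P(at least one) = 1 - P(none) for independent events:
# the chance of at least one six in four rolls of a fair die.
p_six = 1 / 6
p_at_least_one = 1 - (1 - p_six) ** 4
print(round(p_at_least_one, 4))  # 0.5177
```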
41.13 Exam-Pattern MCQs
Match each item in the left column with the correct entry on the right; the solution follows each table.
| | Approach | | Core idea |
|---|---|---|---|
| (i) | Classical / Laplace | (a) | Limit of relative frequency in repeated trials |
| (ii) | Statistical / Empirical | (b) | Probability satisfies formal mathematical axioms |
| (iii) | Subjective | (c) | Ratio of favourable to equally likely outcomes |
| (iv) | Axiomatic / Kolmogorov | (d) | Degree of belief of a rational agent |

Solution: (i)–(c), (ii)–(a), (iii)–(d), (iv)–(b).
| | Concept | | Formula |
|---|---|---|---|
| (i) | Conditional probability \(P(A \mid B)\) | (a) | \(P(A) \cdot P(B \mid A)\) |
| (ii) | Multiplication for general events | (b) | \(P(A) + P(B) - P(A \cap B)\) |
| (iii) | Addition for any two events | (c) | \(P(A \cap B) / P(B)\) |
| (iv) | Independence implies | (d) | \(P(A \cap B) = P(A) \cdot P(B)\) |

Solution: (i)–(c), (ii)–(a), (iii)–(b), (iv)–(d).
| | Term | | Definition |
|---|---|---|---|
| (i) | Mutually exclusive events | (a) | \(P(A \cap B) = P(A) \cdot P(B)\) |
| (ii) | Independent events | (b) | \(A \cap B = \emptyset\), so \(P(A \cap B) = 0\) |
| (iii) | Exhaustive events | (c) | Together cover the entire sample space |
| (iv) | Complement of \(A\) | (d) | All outcomes not in \(A\) |

Solution: (i)–(b), (ii)–(a), (iii)–(c), (iv)–(d).
41.14 Key Takeaways
- Probability lies in [0, 1].
- Four approaches: Classical (Laplace), Statistical / Empirical (von Mises), Subjective (Ramsey, de Finetti, Savage), Axiomatic (Kolmogorov 1933).
- Kolmogorov’s three axioms: non-negativity, normalisation, countable additivity.
- Addition theorem: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\).
- Conditional: \(P(A|B) = P(A \cap B)/P(B)\).
- Multiplication: \(P(A \cap B) = P(A) \cdot P(B|A)\). For independent events: \(P(A) \cdot P(B)\).
- Independent ≠ mutually exclusive: mutually exclusive events with positive probability are strongly dependent (one prevents the other).
- Bayes’ Theorem (1763): \(P(B_i|A) = \dfrac{P(A|B_i) P(B_i)}{\sum_j P(A|B_j) P(B_j)}\). Posterior ∝ Likelihood × Prior.
- Bayes is the foundation of medical diagnosis, spam filters, machine learning, decision analysis, Bayesian econometrics.
- Base-rate fallacy: rare disease + imperfect test → most positives are false positives.
- Useful identity: \(P(\text{at least one}) = 1 - P(\text{none})\).