41  Probability and Bayes’ Theorem

41.1 What is Probability?

Probability is the numerical measure of the likelihood that an event will occur. It takes a value between 0 (impossible) and 1 (certain) (Gupta 2021; Ross 2020).

The classical formulation, attributed to Pierre-Simon Laplace (1812), is:

\[ P(A) = \dfrac{\text{Number of favourable outcomes}}{\text{Total number of equally likely outcomes}} \]
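
A quick way to see the classical definition in action is to count outcomes directly. A minimal sketch in Python (the die-rolling event is an arbitrary choice for illustration), using exact fractions:

```python
from fractions import Fraction

# Classical (Laplace) probability: favourable outcomes over
# equally likely total outcomes, for one roll of a fair die.
sample_space = {1, 2, 3, 4, 5, 6}
event = {n for n in sample_space if n % 2 == 0}  # "roll an even number"

p = Fraction(len(event), len(sample_space))
print(p)  # 1/2
```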

Probability theory began with seventeenth-century correspondence between Blaise Pascal and Pierre de Fermat on gambling problems, was systematised by Laplace, and was given a rigorous axiomatic foundation by Andrey N. Kolmogorov in 1933.

41.2 Four Approaches to Probability

Tip: Four Approaches to Probability

| Approach | Definition | Strength | Weakness |
|----------|------------|----------|----------|
| Classical / Mathematical (Laplace) | Ratio of favourable to total equally likely outcomes | Closed-form for symmetric problems | Requires equally likely outcomes |
| Statistical / Empirical (von Mises) | Limit of relative frequency in repeated trials | Empirical and intuitive | Requires repeated trials |
| Subjective (Ramsey, de Finetti, Savage) | Degree of belief of a rational agent | Applies to one-off events | Personal, hard to verify |
| Axiomatic (Kolmogorov, 1933) | Probability satisfies a set of formal axioms | Mathematical rigour | Abstract; needs measure theory |

41.3 Basic Terminology

Tip: Twelve Foundational Terms in Probability

| Term | Definition |
|------|------------|
| Random experiment | An experiment whose outcome cannot be predicted with certainty |
| Sample space (\(S\)) | Set of all possible outcomes |
| Outcome / sample point | A single element of the sample space |
| Event (\(A\)) | Any subset of the sample space |
| Simple event | Event with a single sample point |
| Compound event | Event made up of more than one sample point |
| Equally likely | Outcomes with the same probability |
| Mutually exclusive | Events that cannot occur simultaneously: \(A \cap B = \emptyset\) |
| Exhaustive events | Events that together cover the entire sample space |
| Independent events | Occurrence of one does not affect the probability of the other |
| Complementary event (\(A^c\)) | All outcomes not in \(A\); \(P(A) + P(A^c) = 1\) |
| Union and intersection | \(A \cup B\): at least one of \(A, B\); \(A \cap B\): both occur |

41.4 Kolmogorov’s Axioms of Probability

Kolmogorov’s three axioms anchor modern probability theory (Kolmogorov 1933):

Tip: Three Kolmogorov Axioms

| Axiom | Statement |
|-------|-----------|
| Non-negativity | \(P(A) \geq 0\) for any event \(A\) |
| Normalisation | \(P(S) = 1\): the probability of the sure event is one |
| Countable additivity | For mutually exclusive events \(A_1, A_2, \dots\): \(P(\cup_i A_i) = \sum_i P(A_i)\) |

From these three axioms, every other theorem of probability is derived.
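
On a finite sample space the axioms can be checked mechanically. A minimal sketch, assuming a fair die purely for illustration:

```python
# Any non-negative weights summing to 1 define a valid finite
# probability assignment; here, a fair six-sided die.
probs = {outcome: 1 / 6 for outcome in range(1, 7)}

assert all(p >= 0 for p in probs.values())      # non-negativity
assert abs(sum(probs.values()) - 1) < 1e-12     # normalisation

# Additivity for disjoint events: P({1,2} U {5,6}) = P({1,2}) + P({5,6})
def P(event):
    return sum(probs[o] for o in event)

assert abs(P({1, 2, 5, 6}) - (P({1, 2}) + P({5, 6}))) < 1e-12
```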

41.5 Addition Theorem

For any two events \(A\) and \(B\):

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

For mutually exclusive events (\(A \cap B = \emptyset\)):

\[ P(A \cup B) = P(A) + P(B) \]

The general addition theorem extends to three events:

\[ P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C) \]

— the inclusion-exclusion principle.
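
Since events over a finite sample space are just subsets, both forms of the addition theorem can be verified by exact counting. A sketch, with divisibility events chosen purely for illustration:

```python
from fractions import Fraction

# Events as subsets of a finite sample space; probabilities as
# exact fractions so the identities hold with equality.
S = set(range(1, 101))
A = {n for n in S if n % 2 == 0}   # multiples of 2
B = {n for n in S if n % 3 == 0}   # multiples of 3
C = {n for n in S if n % 5 == 0}   # multiples of 5

def P(event):
    return Fraction(len(event), len(S))

# Two events: P(A u B) = P(A) + P(B) - P(A n B)
assert P(A | B) == P(A) + P(B) - P(A & B)

# Three events: the inclusion-exclusion formula above
assert P(A | B | C) == (P(A) + P(B) + P(C)
                        - P(A & B) - P(B & C) - P(A & C)
                        + P(A & B & C))
```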

41.6 Conditional Probability

The conditional probability of \(A\) given \(B\) is the probability that \(A\) occurs, knowing that \(B\) has already occurred:

\[ P(A | B) = \dfrac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \]
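
The definition can be checked by direct counting. A minimal sketch with two fair dice (the events are arbitrary illustrative choices):

```python
from itertools import product

# Conditional probability by counting over 36 equally likely outcomes.
# A = "sum is at least 10", B = "first die shows 5".
S = list(product(range(1, 7), repeat=2))
A = [(d1, d2) for d1, d2 in S if d1 + d2 >= 10]
B = [(d1, d2) for d1, d2 in S if d1 == 5]
A_and_B = [o for o in A if o in B]              # (5,5) and (5,6)

p_B = len(B) / len(S)                           # 6/36
p_A_and_B = len(A_and_B) / len(S)               # 2/36
print(p_A_and_B / p_B)                          # P(A|B) = 1/3 = 0.333...
```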

41.7 Multiplication Theorem

Rearranging the conditional-probability definition:

\[ P(A \cap B) = P(A) \cdot P(B | A) = P(B) \cdot P(A | B) \]

For independent events: \(P(B | A) = P(B)\), so:

\[ P(A \cap B) = P(A) \cdot P(B) \]
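
The multiplication rule doubles as a test of independence: compare \(P(A \cap B)\) with \(P(A) \cdot P(B)\). A sketch with two dice events that turn out to be independent, though not obviously so:

```python
from itertools import product

# A = "first die is even", B = "sum is 7".
S = list(product(range(1, 7), repeat=2))
A = {(d1, d2) for d1, d2 in S if d1 % 2 == 0}
B = {(d1, d2) for d1, d2 in S if d1 + d2 == 7}

def p(event):
    return len(event) / len(S)

print(p(A & B))     # 3/36 = 0.0833...
print(p(A) * p(B))  # (1/2)(1/6) = 0.0833... -> independent
```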

41.8 Independence vs Mutual Exclusivity — A Common Trap

Tip: Independence vs Mutual Exclusivity

| Concept | Definition | Implication for \(P(A \cap B)\) |
|---------|------------|---------------------------------|
| Independent | \(P(B \mid A) = P(B)\) | \(P(A \cap B) = P(A) \cdot P(B)\) |
| Mutually exclusive | \(A \cap B = \emptyset\) | \(P(A \cap B) = 0\) |

Two events with positive probabilities can be independent or mutually exclusive but not both — mutual exclusivity says one prevents the other (so they are strongly dependent), while independence says one carries no information about the other.

41.9 Total Probability Theorem

If \(\{B_1, B_2, \dots, B_n\}\) is a partition of the sample space (mutually exclusive and exhaustive), then for any event \(A\):

\[ P(A) = \sum_{i=1}^{n} P(A | B_i) \cdot P(B_i) \]

This is the Total Probability Theorem — a stepping stone to Bayes’ Theorem.
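
A minimal numerical sketch, assuming a hypothetical three-part partition (say, three machines and their defect rates; all numbers are made up for illustration):

```python
# P(B_i): the partition's prior probabilities (must sum to 1).
priors = {"B1": 0.5, "B2": 0.3, "B3": 0.2}
# P(A | B_i): probability of a defective item from each machine.
likelihoods = {"B1": 0.02, "B2": 0.03, "B3": 0.05}

# Total probability: P(A) = sum over i of P(A | B_i) * P(B_i)
p_A = sum(likelihoods[b] * priors[b] for b in priors)
print(round(p_A, 3))  # 0.029
```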

41.10 Bayes’ Theorem

The Reverend Thomas Bayes gave a rule for updating prior beliefs in the light of new evidence (published posthumously in 1763). Pierre-Simon Laplace later developed the rule into its modern form.

For events \(B_i\) forming a partition and any event \(A\) with \(P(A) > 0\):

\[ P(B_i | A) = \dfrac{P(A | B_i) \cdot P(B_i)}{\sum_{j=1}^{n} P(A | B_j) \cdot P(B_j)} \]

Tip: Components of Bayes’ Theorem

| Term | Name | Meaning |
|------|------|---------|
| \(P(B_i)\) | Prior | Probability of \(B_i\) before observing \(A\) |
| \(P(A \mid B_i)\) | Likelihood | Probability of observing \(A\) given \(B_i\) |
| \(P(A)\) | Evidence / Marginal | Probability of \(A\) averaged over all \(B_j\) |
| \(P(B_i \mid A)\) | Posterior | Updated probability of \(B_i\) after observing \(A\) |

In words: posterior ∝ likelihood × prior. Bayes’ theorem is the backbone of medical diagnosis, spam filtering, machine-learning classification, decision analysis and Bayesian econometrics.

41.10.1 Worked example — medical test

A disease affects 1 per cent of the population. A test is 99 per cent sensitive (true-positive rate) and 95 per cent specific (true-negative rate). What is the probability that a person who tests positive actually has the disease?

Let \(D\) = has disease, \(D^c\) = does not. Let \(T+\) = positive test.

  • \(P(D) = 0.01\), \(P(D^c) = 0.99\).
  • \(P(T+|D) = 0.99\), \(P(T+|D^c) = 0.05\).
  • \(P(T+) = 0.99 \times 0.01 + 0.05 \times 0.99 = 0.0099 + 0.0495 = 0.0594\).
  • \(P(D | T+) = (0.99 \times 0.01) / 0.0594 = 0.0099 / 0.0594 \approx \mathbf{0.167}\).

Despite a positive test, the probability of disease is only about 16.7 per cent, because the disease is rare and false positives are common. This counter-intuitive result is the base-rate fallacy, which Bayes’ Theorem corrects.
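
The same arithmetic as a short function; `posterior` is a hypothetical helper name implementing the two-event form of Bayes’ Theorem:

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(D | T+) from the prevalence and the test's error rates."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# 1% prevalence, 99% sensitivity, 95% specificity (5% false positives).
print(round(posterior(0.01, 0.99, 0.05), 4))  # 0.1667
```

Raising the prevalence shows how strongly the posterior depends on the base rate: at 10 per cent prevalence the same test gives a posterior of about 0.69.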

41.11 Worked Numerical — Two Coins

Two unbiased coins are tossed. The probability of getting at least one head is:

  • Sample space \(S = \{HH, HT, TH, TT\}\). \(|S| = 4\).
  • Favourable outcomes (at least one head) = \(\{HH, HT, TH\}\). \(|A| = 3\).
  • \(P(A) = 3/4 = 0.75\).

Equivalently, \(P(\text{at least one H}) = 1 - P(TT) = 1 - 1/4 = 0.75\).
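
Both routes can be confirmed by enumeration; a minimal sketch:

```python
from itertools import product

# All 2^2 equally likely outcomes for two fair coins.
S = list(product("HT", repeat=2))              # HH, HT, TH, TT
at_least_one_head = [o for o in S if "H" in o]
print(len(at_least_one_head) / len(S))         # 0.75

# Complement route: P(at least one H) = 1 - P(TT)
print(1 - (1 / 2) * (1 / 2))                   # 0.75
```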

41.12 Useful Identities

Tip: Key Identities

| Identity | Statement |
|----------|-----------|
| Complement | \(P(A^c) = 1 - P(A)\) |
| At least one event | \(P(\cup_i A_i) = 1 - P(\cap_i A_i^c)\) |
| Independent events | \(P(A_1 \cap A_2 \cap \dots) = P(A_1) \cdot P(A_2) \cdots\) |
| Conditional rearrangement | \(P(A \cap B) = P(A) \cdot P(B \mid A)\) |
| Bayes (two-event form) | \(P(B \mid A) = \dfrac{P(A \mid B)\, P(B)}{P(A \mid B)\, P(B) + P(A \mid B^c)\, P(B^c)}\) |
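
The "at least one" identity is especially convenient for independent events, where \(P(\text{none}) = (1 - p)^n\) if each of \(n\) events has the same probability \(p\). A sketch with made-up numbers:

```python
# P(at least one success) = 1 - P(no successes), assuming independence.
n, p = 4, 0.3
p_none = (1 - p) ** n
print(round(1 - p_none, 4))  # 1 - 0.7**4 = 0.7599
```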

41.13 Exam-Pattern MCQs

Q 01
Which of the following is not one of Kolmogorov's three axioms of probability?
  • A. $P(A) \geq 0$ for any event $A$
  • B. $P(S) = 1$
  • C. For mutually exclusive $A_1, A_2$: $P(A_1 \cup A_2) = P(A_1) + P(A_2)$
  • D. $P(A) > 1$ for impossible events
Correct Option: D
Probabilities lie in $[0, 1]$, and an impossible event has probability 0, so option D contradicts the axioms rather than stating one.
Q 02
Match each approach to probability with its core idea:
| Approach | Core idea |
|----------|-----------|
| (i) Classical / Laplace | (a) Limit of relative frequency in repeated trials |
| (ii) Statistical / Empirical | (b) Probability satisfies formal mathematical axioms |
| (iii) Subjective | (c) Ratio of favourable to equally likely outcomes |
| (iv) Axiomatic / Kolmogorov | (d) Degree of belief of a rational agent |
  • A. (i)-(c), (ii)-(a), (iii)-(d), (iv)-(b)
  • B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
  • C. (i)-(b), (ii)-(d), (iii)-(a), (iv)-(c)
  • D. (i)-(d), (ii)-(c), (iii)-(b), (iv)-(a)
Correct Option: A
Q 03
Two unbiased dice are rolled. The probability of getting a sum of 7 is:
  • A. 1/6
  • B. 1/8
  • C. 1/9
  • D. 1/12
Correct Option: A
Out of 36 equally likely outcomes, the sum is 7 in 6 of them (1+6, 2+5, 3+4, 4+3, 5+2, 6+1), so $P = 6/36 = 1/6$.
Q 04
Match the probability concept with its formula:
| Concept | Formula |
|---------|---------|
| (i) Conditional probability $P(A \mid B)$ | (a) $P(A) \cdot P(B \mid A)$ |
| (ii) Multiplication for general events | (b) $P(A) + P(B) - P(A \cap B)$ |
| (iii) Addition for any two events | (c) $P(A \cap B) / P(B)$ |
| (iv) Independence implies | (d) $P(A \cap B) = P(A) \cdot P(B)$ |
  • A. (i)-(c), (ii)-(a), (iii)-(b), (iv)-(d)
  • B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
  • C. (i)-(b), (ii)-(c), (iii)-(d), (iv)-(a)
  • D. (i)-(d), (ii)-(c), (iii)-(a), (iv)-(b)
Correct Option: A
Q 05
Two events $A$ and $B$ have $P(A) = 0.6$, $P(B) = 0.4$. Which of the following is correct if $A$ and $B$ are independent?
  • A. $P(A \cap B) = 0$
  • B. $P(A \cap B) = 0.24$
  • C. $P(A \cap B) = 1$
  • D. $P(A \cup B) = P(A) + P(B)$
Correct Option: B
For independent events, $P(A \cap B) = 0.6 \times 0.4 = 0.24$.
Q 06
A factory produces 60% of items at machine X (with 1% defectives) and 40% at machine Y (with 5% defectives). The probability that a defective item came from machine Y is approximately:
  • A. 0.40
  • B. 0.50
  • C. 0.77
  • D. 0.95
Correct Option: C
Use Bayes’ Theorem: $P(\text{defective}) = 0.6 \times 0.01 + 0.4 \times 0.05 = 0.006 + 0.020 = 0.026$, so $P(Y \mid \text{defective}) = 0.020 / 0.026 \approx 0.77$.
Q 07
Arrange the following components of Bayes' Theorem in the order in which they appear when the formula is read from left to right: (i) Likelihood $P(A|B_i)$ (ii) Posterior $P(B_i|A)$ (iii) Marginal evidence $P(A) = \sum_j P(A|B_j) P(B_j)$ (iv) Prior $P(B_i)$
  • A. (ii), (i), (iv), (iii)
  • B. (i), (ii), (iv), (iii)
  • C. (iv), (iii), (ii), (i)
  • D. (iii), (ii), (i), (iv)
Correct Option: A
Posterior = (Likelihood × Prior) / Evidence.
Q 08
Match each term with its definition:
| Term | Definition |
|------|------------|
| (i) Mutually exclusive events | (a) $P(A \cap B) = P(A) \cdot P(B)$ |
| (ii) Independent events | (b) $A \cap B = \emptyset$, so $P(A \cap B) = 0$ |
| (iii) Exhaustive events | (c) Together cover the entire sample space |
| (iv) Complement of $A$ | (d) All outcomes not in $A$ |
  • A. (i)-(b), (ii)-(a), (iii)-(c), (iv)-(d)
  • B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
  • C. (i)-(c), (ii)-(d), (iii)-(b), (iv)-(a)
  • D. (i)-(d), (ii)-(c), (iii)-(a), (iv)-(b)
Correct Option: A
Important: Quick recall
  • Probability lies in [0, 1].
  • Four approaches: Classical (Laplace), Statistical / Empirical (von Mises), Subjective (Ramsey, de Finetti, Savage), Axiomatic (Kolmogorov 1933).
  • Kolmogorov’s three axioms: non-negativity, normalisation, countable additivity.
  • Addition theorem: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\).
  • Conditional: \(P(A|B) = P(A \cap B)/P(B)\).
  • Multiplication: \(P(A \cap B) = P(A) \cdot P(B|A)\). For independent events: \(P(A) \cdot P(B)\).
  • Independent ≠ mutually exclusive. Mutually exclusive events are strongly dependent (one prevents the other).
  • Bayes’ Theorem (1763): \(P(B_i|A) = \dfrac{P(A|B_i) P(B_i)}{\sum_j P(A|B_j) P(B_j)}\). Posterior ∝ Likelihood × Prior.
  • Bayes is the foundation of medical diagnosis, spam filters, machine learning, decision analysis, Bayesian econometrics.
  • Base-rate fallacy: rare disease + imperfect test → most positives are false positives.
  • Useful identity: \(P(\text{at least one}) = 1 - P(\text{none})\).