37 Measures of Central Tendency
37.1 What is Central Tendency?
A measure of central tendency — also called an average or a measure of location — is a single value that represents the centre of a data set (gupta2021?; elhance2020?).
Three working ideas anchor the concept:
- It summarises a large data set in a single number.
- It is a typical or representative value.
- It enables comparison between two or more data sets.
Three classical averages — arithmetic mean, median, mode — were named by G.U. Yule and M.G. Kendall as the mathematical, positional and commonest-value measures, respectively. Two further averages — geometric mean and harmonic mean — extend the toolkit for special-purpose data.
37.2 Yule’s Criteria — What Makes a Good Average
G. Udny Yule’s six requirements for a good average (gupta2021?):
| Requirement | Working content |
|---|---|
| Rigidly defined | Clear, single, unambiguous formula |
| Easy to understand and compute | Suitable for general use |
| Based on all observations | Reflects the entire data set |
| Suitable for further algebraic treatment | Combinable across groups |
| Affected as little as possible by sampling fluctuations | Stable from sample to sample |
| Not unduly affected by extreme values | Robust to outliers |
No single average meets all six criteria — a fact that explains why the toolkit contains several averages.
37.3 Five Types of Average
| Type | Symbol | Best used for |
|---|---|---|
| Arithmetic Mean | \(\bar X\) | General-purpose averaging of measurement data |
| Median | \(M_d\) | Skewed distributions; ordinal data |
| Mode | \(M_o\) | Categorical / qualitative data; most frequent value |
| Geometric Mean | GM | Rates of growth, ratios, index numbers |
| Harmonic Mean | HM | Averaging rates (speed, time per unit) |
37.4 Arithmetic Mean
The arithmetic mean is the sum of all observations divided by the number of observations.
| Data | Formula |
|---|---|
| Ungrouped (n observations) | \(\bar X = \dfrac{\sum X}{n}\) |
| Discrete frequency distribution | \(\bar X = \dfrac{\sum f X}{\sum f}\) |
| Continuous (grouped) | \(\bar X = \dfrac{\sum f m}{\sum f}\), \(m\) = class mid-point |
| Step-deviation / shortcut | \(\bar X = A + \dfrac{\sum f d}{\sum f} \times h\), where \(d = (m - A)/h\), \(A\) = assumed mean, \(h\) = class width |
| Weighted mean | \(\bar X_w = \dfrac{\sum w X}{\sum w}\) |
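The grouped-data formulas above can be checked numerically. The following is a minimal Python sketch, assuming a made-up frequency table (classes 0-10 to 40-50) and the hypothetical function names `grouped_mean` and `step_deviation_mean`; it only illustrates that the direct and step-deviation methods agree.

```python
def grouped_mean(midpoints, freqs):
    """Direct method: sum(f * m) / sum(f)."""
    total_f = sum(freqs)
    return sum(f * m for f, m in zip(freqs, midpoints)) / total_f


def step_deviation_mean(midpoints, freqs, assumed_mean, width):
    """Shortcut method: A + (sum(f * d) / sum(f)) * h, with d = (m - A) / h."""
    total_f = sum(freqs)
    sum_fd = sum(f * (m - assumed_mean) / width for f, m in zip(freqs, midpoints))
    return assumed_mean + (sum_fd / total_f) * width


# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40, 40-50
midpoints = [5, 15, 25, 35, 45]
freqs = [3, 7, 10, 6, 4]

direct = grouped_mean(midpoints, freqs)
shortcut = step_deviation_mean(midpoints, freqs, assumed_mean=25, width=10)
print(direct, shortcut)  # both methods give the same mean, 760 / 30
```

The choice of assumed mean \(A\) does not change the answer; it only keeps the intermediate numbers small when working by hand.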
37.4.1 Five properties of the arithmetic mean
- The algebraic sum of deviations of observations from the mean is zero: \(\sum (X - \bar X) = 0\).
- The sum of squared deviations from the mean is minimum (least-squares property).
- For two groups with means \(\bar X_1\) and \(\bar X_2\) and sizes \(n_1\) and \(n_2\), the combined mean is \(\bar X_c = \dfrac{n_1 \bar X_1 + n_2 \bar X_2}{n_1 + n_2}\).
- AM is affected by extreme values — its main weakness.
- AM is suitable for further algebraic treatment — its main strength.
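The first three properties can be verified on a small example. This is a sketch in Python using the standard `statistics` module; the two groups of marks and the helper `sum_of_squares` are illustrative assumptions, not data from the text.

```python
from statistics import fmean

# Hypothetical marks for two groups (illustrative data only)
group_1 = [40, 50, 60]        # mean 50
group_2 = [55, 65, 70, 90]    # mean 70

mean_1 = fmean(group_1)
mean_2 = fmean(group_2)


def sum_of_squares(data, centre):
    """Sum of squared deviations of the data about an arbitrary centre."""
    return sum((x - centre) ** 2 for x in data)


# Property 1: deviations from the mean sum to zero
deviation_sum = sum(x - mean_1 for x in group_1)

# Property 2: squared deviations are smallest when taken about the mean
about_mean = sum_of_squares(group_1, mean_1)
about_other = sum_of_squares(group_1, mean_1 + 5)

# Property 3: combined mean from the group sizes and group means
n_1, n_2 = len(group_1), len(group_2)
combined = (n_1 * mean_1 + n_2 * mean_2) / (n_1 + n_2)
pooled = fmean(group_1 + group_2)  # mean of all seven observations together

print(deviation_sum, about_mean, about_other, combined, pooled)
```

The combined-mean formula reproduces the mean of the pooled observations without needing the raw data, which is what "suitable for further algebraic treatment" means in practice.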
37.5 Median
The median is the middle value of an ordered data set — half of the observations lie below it, half above. For an even number of observations, the median is the average of the two middle values.
| Data | Formula |
|---|---|
| Ungrouped, n odd | Middle observation |
| Ungrouped, n even | Average of middle two observations |
| Discrete frequency | The first X-value whose cumulative frequency is \(\geq (N+1)/2\) |
| Continuous (grouped) | \(M_d = L + \dfrac{N/2 - cf}{f} \times h\) |
where \(L\) = lower limit of median class, \(N = \sum f\), \(cf\) = cumulative frequency before median class, \(f\) = frequency of median class, \(h\) = class width.
The median’s main strength is robustness — it is unaffected by extreme values. Its main weakness is that it does not use all the data.
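Two points from this section can be illustrated in code: the cumulative-frequency rule for a discrete frequency distribution, and the median's robustness to an extreme value. The sketch below is in Python; the function name `discrete_median` and all data are assumptions made for illustration.

```python
from statistics import mean, median


def discrete_median(values, freqs):
    """Return the first X-value whose cumulative frequency reaches (N + 1) / 2."""
    n_total = sum(freqs)
    target = (n_total + 1) / 2
    cumulative = 0
    for x, f in sorted(zip(values, freqs)):
        cumulative += f
        if cumulative >= target:
            return x
    raise ValueError("frequency table is empty")


# Hypothetical discrete distribution: N = 21, so the target position is 11
values = [1, 2, 3, 4, 5]
freqs = [2, 5, 8, 4, 2]
md = discrete_median(values, freqs)

# Robustness: replace the largest value with an extreme one
clean = [35, 40, 42, 45, 48]
with_outlier = [35, 40, 42, 45, 480]

print(md)
print(median(clean), median(with_outlier))  # the median does not move
print(mean(clean), mean(with_outlier))      # the mean shifts substantially
```

Only the position of the middle observation matters to the median, so changing the size of an extreme value leaves it unchanged while the mean moves.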
37.5.1 Quartiles, deciles, percentiles
Generalisations of the median divide the ordered data set into four (quartiles), ten (deciles) or a hundred (percentiles) equal parts.
| Measure | Position | Formula (continuous) |
|---|---|---|
| First Quartile \(Q_1\) | 25 % | \(L + \dfrac{N/4 - cf}{f} \times h\) |
| Second Quartile \(Q_2\) = Median | 50 % | \(L + \dfrac{N/2 - cf}{f} \times h\) |
| Third Quartile \(Q_3\) | 75 % | \(L + \dfrac{3N/4 - cf}{f} \times h\) |
| \(i\)-th Decile \(D_i\) | \(10 i\) % | \(L + \dfrac{iN/10 - cf}{f} \times h\) |
| \(j\)-th Percentile \(P_j\) | \(j\) % | \(L + \dfrac{jN/100 - cf}{f} \times h\) |
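All five formulas in the table share one pattern: locate the class in which the cumulative frequency first reaches \(kN/p\), then interpolate within it. A minimal Python sketch of that pattern follows, assuming equal-width classes; the function name `grouped_partition_value` and the frequency table are hypothetical.

```python
def grouped_partition_value(lower_limits, freqs, width, k, parts):
    """Return L + (k * N / parts - cf) / f * h for equal-width classes.

    k / parts selects the position, e.g. (1, 4) for Q1 or (90, 100) for P90.
    """
    n_total = sum(freqs)
    if n_total == 0:
        raise ValueError("frequency table is empty")
    target = k * n_total / parts
    cum_before = 0  # cumulative frequency before the current class (cf)
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum_before + f >= target:
            return lower + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("k / parts must lie between 0 and 1")


# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40, 40-50 (N = 30)
lower_limits = [0, 10, 20, 30, 40]
freqs = [3, 7, 10, 6, 4]

q1 = grouped_partition_value(lower_limits, freqs, 10, 1, 4)
q2 = grouped_partition_value(lower_limits, freqs, 10, 2, 4)   # the median
q3 = grouped_partition_value(lower_limits, freqs, 10, 3, 4)
d4 = grouped_partition_value(lower_limits, freqs, 10, 4, 10)
p90 = grouped_partition_value(lower_limits, freqs, 10, 90, 100)
print(q1, q2, q3, d4, p90)
```

For this table the median class is 20-30 with \(cf = 10\) and \(f = 10\), so \(Q_2 = 20 + \dfrac{15 - 10}{10} \times 10 = 25\).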
37.6 Mode
The mode is the most frequently occurring value, that is, the value with the highest frequency. A distribution may be unimodal (one peak), bimodal (two peaks), or multimodal (several peaks).
| Data | Formula |
|---|---|
| Ungrouped | Value occurring most often |
| Discrete frequency | The X-value with the highest frequency |
| Continuous (grouped) | \(M_o = L + \dfrac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h\) |
where \(L\) = lower limit of modal class, \(f_1\) = frequency of modal class, \(f_0\) = frequency of preceding class, \(f_2\) = frequency of following class, \(h\) = class width.
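The grouped-data mode formula can be sketched in Python as follows. The function name `grouped_mode` and the data are hypothetical, and two choices are assumptions of this sketch rather than rules from the text: a missing neighbouring class is treated as having frequency zero, and ties for the highest frequency are resolved by taking the first such class.

```python
def grouped_mode(lower_limits, freqs, width):
    """Return L + (f1 - f0) / (2 * f1 - f0 - f2) * h for equal-width classes."""
    # Index of the modal class (first class with the highest frequency)
    i = max(range(len(freqs)), key=lambda j: freqs[j])
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                # preceding class (assumed 0 if absent)
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0   # following class (assumed 0 if absent)
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: modal class does not stand out from its neighbours")
    return lower_limits[i] + (f1 - f0) / denominator * width


# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40, 40-50
lower_limits = [0, 10, 20, 30, 40]
freqs = [3, 7, 10, 6, 4]

mode_value = grouped_mode(lower_limits, freqs, 10)
print(mode_value)  # 20 + 3/7 * 10
```

Here the modal class is 20-30 with \(f_1 = 10\), \(f_0 = 7\), \(f_2 = 6\), so the mode is pulled slightly below the class mid-point, towards the larger neighbouring frequency.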
37.6.1 Empirical relationship — Karl Pearson
For a moderately skewed distribution:
\[ \text{Mode} \approx 3 \cdot \text{Median} - 2 \cdot \text{Mean} \]
This is an empirical approximation rather than an exact identity; it allows any one of the three averages to be estimated from the other two.
37.7 Geometric Mean (GM)
The geometric mean of \(n\) positive observations is the \(n\)-th root of their product:
\[ \text{GM} = \sqrt[n]{X_1 \cdot X_2 \cdot \dots \cdot X_n} \quad \text{or} \quad \text{GM} = \text{antilog}\left(\dfrac{\sum \log X}{n}\right) \]
The GM is the appropriate average for ratios, rates of growth, index numbers and compound interest. It is always smaller than the AM of the same data unless all values are equal, in which case the two coincide.
37.7.1 Use case
A firm’s revenue grew by 10 % in year 1, 20 % in year 2, and 30 % in year 3. The average growth rate is the geometric mean of (1.10, 1.20, 1.30) = \(\sqrt[3]{1.10 \times 1.20 \times 1.30} = \sqrt[3]{1.716} \approx 1.197\) — i.e. about 19.7 per cent per year. The arithmetic mean (20 %) overstates this.
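The growth-rate calculation can be reproduced in Python. This sketch computes the GM three ways: by the root-of-product definition, by the logarithmic form (using natural logarithms in place of the base-10 log and antilog, which gives the same result), and with the standard library's `statistics.geometric_mean`. Variable names are illustrative.

```python
import math
from statistics import geometric_mean

growth_factors = [1.10, 1.20, 1.30]  # 10 %, 20 %, 30 % growth as multipliers
n = len(growth_factors)

# Definition: n-th root of the product
gm_root = math.prod(growth_factors) ** (1 / n)

# Logarithmic form: exp(mean of logs), equivalent to antilog(sum(log X) / n)
gm_log = math.exp(sum(math.log(x) for x in growth_factors) / n)

# Standard-library helper (Python 3.8+)
gm_lib = geometric_mean(growth_factors)

average_growth_pct = (gm_root - 1) * 100
print(round(gm_root, 4), round(average_growth_pct, 1))

# Compounding the GM for three years reproduces the actual total growth
total_growth = gm_root ** n
```

Compounding at the arithmetic mean of 20 % would give \(1.20^3 = 1.728\), more than the actual cumulative factor of 1.716, which is why the GM is the right average here.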
37.8 Harmonic Mean (HM)
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals:
\[ \text{HM} = \dfrac{n}{\sum (1 / X_i)} \]
The HM is the appropriate average for rates expressed as quantities per unit of time, such as speed (km/hour), price per kilo, or time per task. The classic example: a car travels 100 km at 40 km/hour and returns at 60 km/hour. Because the two legs cover equal distances, the average speed over the round trip is the HM = \(2 / (1/40 + 1/60) = 2/(5/120) = 48\) km/hour, not the arithmetic mean of 50.
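The round-trip example can be checked from first principles, as total distance divided by total time. The Python sketch below compares that with the HM formula and with `statistics.harmonic_mean`; variable names are illustrative.

```python
from statistics import harmonic_mean

speeds = [40, 60]   # km/hour on the outward and return legs
distance = 100      # km for each leg

# HM formula: n / sum(1 / X)
hm_formula = len(speeds) / sum(1 / s for s in speeds)

# Standard-library helper
hm_lib = harmonic_mean(speeds)

# First principles: total distance / total time
total_time = sum(distance / s for s in speeds)          # 2.5 h + 1.667 h
average_speed = len(speeds) * distance / total_time

print(hm_formula, hm_lib, average_speed)
```

The car spends more time at the slower speed, so the slower leg carries more weight; the HM captures this, while the arithmetic mean of 50 km/hour does not.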
37.9 Inequality of the Three Means
For any set of positive numbers:
\[ \text{AM} \geq \text{GM} \geq \text{HM} \]
with equality only when all the numbers are equal. This AM-GM-HM inequality is widely used in problems and proofs.
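A quick numerical check of the inequality, sketched in Python with the standard `statistics` module. The helper `three_means` and both data sets are hypothetical; a check on two examples illustrates the result but does not prove it.

```python
import math
from statistics import fmean, geometric_mean, harmonic_mean


def three_means(data):
    """Return (AM, GM, HM) for a list of positive numbers."""
    return fmean(data), geometric_mean(data), harmonic_mean(data)


# Unequal values: the inequalities are strict
am, gm, hm = three_means([2, 4, 8])

# Equal values: all three means coincide (up to floating-point rounding)
am_eq, gm_eq, hm_eq = three_means([5, 5, 5])
all_equal = math.isclose(am_eq, gm_eq) and math.isclose(gm_eq, hm_eq)

print(am, gm, hm)
print(all_equal)
```

For the data 2, 4, 8 the three means are \(14/3\), \(4\) and \(24/7\), in decreasing order as the inequality requires.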
37.10 Comparison of Averages
| Property | AM | Median | Mode | GM | HM |
|---|---|---|---|---|---|
| Uses all observations | Yes | No | No | Yes | Yes |
| Affected by extreme values | Strongly | Not | Not | Moderately | Weakly |
| Suitable for further algebra | Yes | Limited | No | Yes | Yes |
| Best for | Symmetric data | Skewed data | Categorical data | Ratios, growth rates | Rates per unit |
| Defined for negative / zero | Yes | Yes | Yes | Only positive | Only positive |
| Computability for open-ended classes | No | Yes | Yes | No | No |
37.11 Worked Numerical
Marks of 10 students: 35, 40, 42, 45, 48, 50, 50, 55, 60, 75.
- \(n = 10\), \(\sum X = 500\), AM = 50.
- Ordered already: middle two are the 5th and 6th values, 48 and 50, so Median = 49.
- 50 occurs twice, all others once: Mode = 50.
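- Pearson approximation check: \(3 \times 49 - 2 \times 50 = 47\), close to but not equal to the observed mode of 50, since the empirical relation is only approximate and ten observations do not form a smooth, moderately skewed distribution.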
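The worked example can be reproduced with Python's standard `statistics` module. This is a short sketch; the variable names are illustrative.

```python
from statistics import mean, median, mode

marks = [35, 40, 42, 45, 48, 50, 50, 55, 60, 75]

am = mean(marks)       # sum 500 over 10 observations
md = median(marks)     # average of the 5th and 6th ordered values
mo = mode(marks)       # 50 is the only value occurring twice

# Pearson's empirical estimate of the mode from the median and mean
pearson_mode = 3 * md - 2 * am

print(am, md, mo, pearson_mode)
```

The gap between the estimated mode (47) and the observed mode (50) shows the size of error to expect when the relation is applied to a small data set.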
37.12 Exam-Pattern MCQs
| | Average | | Best used for |
|---|---|---|---|
| (i) | Arithmetic Mean | (a) | Categorical / most-frequent value |
| (ii) | Geometric Mean | (b) | General-purpose averaging |
| (iii) | Harmonic Mean | (c) | Rates of growth and ratios |
| (iv) | Mode | (d) | Averaging rates per unit of time |
| | Formula | | Measure |
|---|---|---|---|
| (i) | \(L + \dfrac{N/2 - cf}{f} \times h\) | (a) | Mode |
| (ii) | \(L + \dfrac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h\) | (b) | Median |
| (iii) | \(\dfrac{\sum f m}{\sum f}\) | (c) | Arithmetic mean |
| (iv) | \(L + \dfrac{N/4 - cf}{f} \times h\) | (d) | First quartile \(Q_1\) |
| | Property | | Average |
|---|---|---|---|
| (i) | Sum of deviations from this measure is zero | (a) | Mode |
| (ii) | Half the observations lie below it | (b) | Mean |
| (iii) | Most frequently occurring value | (c) | Median |
- Average = single value representing the centre of the data — Yule’s six requirements (rigid, simple, full data, algebra-friendly, stable, robust).
- Five averages: AM, Median, Mode, GM, HM.
- AM: \(\bar X = \sum X / n\); sum of deviations from AM = 0; sum of squared deviations is minimum.
- Combined mean of two groups: \(\bar X_c = (n_1 \bar X_1 + n_2 \bar X_2) / (n_1 + n_2)\).
- Median (continuous): \(L + \dfrac{N/2 - cf}{f} \times h\).
- Mode (continuous): \(L + \dfrac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h\).
- Pearson empirical relation (moderately skewed data): Mode ≈ 3 Median − 2 Mean.
- GM = \(n\)-th root of product = antilog(\(\sum \log X / n\)); use for growth rates, ratios, indices.
- HM = \(n / \sum (1/X)\); use for rates per unit of time (e.g., average speed).
- Inequality (positive numbers): AM ≥ GM ≥ HM.
- Quartiles (\(Q_1, Q_3\)), Deciles, Percentiles — positional generalisations of median.
- Robust to outliers: Median, Mode. Strongly affected by outliers: Mean.