37  Measures of Central Tendency

37.1 What is Central Tendency?

A measure of central tendency — also called an average or a measure of location — is a single value that represents the centre of a data set (gupta2021?; elhance2020?).

Three working ideas anchor the concept:

  • It summarises a large data set in a single number.
  • It is a typical or representative value.
  • It enables comparison between two or more data sets.

Three classical averages — arithmetic mean, median, mode — were named by G.U. Yule and M.G. Kendall as the mathematical, positional and commonest-value measures, respectively. Two further averages — geometric mean and harmonic mean — extend the toolkit for special-purpose data.

37.2 Yule’s Criteria — What Makes a Good Average

G. Udny Yule’s six requirements for a good average (gupta2021?):

Tip: Yule’s Six Requirements

| Requirement | Working content |
|---|---|
| Rigidly defined | Clear, single, unambiguous formula |
| Easy to understand and compute | Suitable for general use |
| Based on all observations | Reflects the entire data set |
| Suitable for further algebraic treatment | Combinable across groups |
| Affected as little as possible by sampling fluctuations | Stable from sample to sample |
| Not unduly affected by extreme values | Robust to outliers |

No single average meets all six criteria — a fact that explains why the toolkit contains several averages.

37.3 Five Types of Average

Tip: Five Standard Averages

| Type | Symbol | Best used for |
|---|---|---|
| Arithmetic Mean | \(\bar X\) | General-purpose averaging of measurement data |
| Median | \(M_d\) | Skewed distributions; ordinal data |
| Mode | \(M_o\) | Categorical / qualitative data; most frequent value |
| Geometric Mean | GM | Rates of growth, ratios, index numbers |
| Harmonic Mean | HM | Averaging rates (speed, time per unit) |

37.4 Arithmetic Mean

The arithmetic mean is the sum of all observations divided by the number of observations.

Tip: Arithmetic Mean Formulas

| Data | Formula |
|---|---|
| Ungrouped (n observations) | \(\bar X = \dfrac{\sum X}{n}\) |
| Discrete frequency distribution | \(\bar X = \dfrac{\sum f X}{\sum f}\) |
| Continuous (grouped) | \(\bar X = \dfrac{\sum f m}{\sum f}\), \(m\) = class mid-point |
| Step-deviation / shortcut | \(\bar X = A + \dfrac{\sum f d}{\sum f} \times h\), where \(d = (m - A)/h\) |
| Weighted mean | \(\bar X_w = \dfrac{\sum w X}{\sum w}\) |
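The direct and step-deviation formulas for grouped data should give the same mean, which a short sketch can confirm. The class limits, frequencies and function names below are hypothetical and chosen only for illustration.

```python
# Hypothetical grouped data (illustration only): class limits and frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 10, 5]


def grouped_mean(classes, freqs):
    """Direct method: sum(f * m) / sum(f), with m the class mid-point."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)


def step_deviation_mean(classes, freqs, assumed_mean, width):
    """Shortcut method: A + (sum(f * d) / sum(f)) * h, with d = (m - A) / h."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    devs = [(m - assumed_mean) / width for m in mids]
    return assumed_mean + sum(f * d for f, d in zip(freqs, devs)) / sum(freqs) * width


direct = grouped_mean(classes, freqs)                   # 1020 / 40 = 25.5
shortcut = step_deviation_mean(classes, freqs, 25, 10)  # 25 + (2 / 40) * 10 = 25.5
print(direct, shortcut)
```

The choice of assumed mean \(A\) does not change the answer; picking a mid-point near the centre simply keeps the deviations \(d\) small for hand calculation.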

37.4.1 Five properties of the arithmetic mean

  • The algebraic sum of deviations of observations from the mean is zero: \(\sum (X - \bar X) = 0\).
  • The sum of squared deviations from the mean is minimum (least-squares property).
  • For two groups with means \(\bar X_1\) and \(\bar X_2\) and sizes \(n_1\) and \(n_2\), the combined mean is \(\bar X_c = \dfrac{n_1 \bar X_1 + n_2 \bar X_2}{n_1 + n_2}\).
  • AM is affected by extreme values — its main weakness.
  • AM is suitable for further algebraic treatment — its main strength.
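The first three properties can be checked numerically. The sketch below uses a small hypothetical data set; the variable and function names are illustrative and not from the source.

```python
data = [4, 8, 15, 16, 23, 42]   # hypothetical observations
mean = sum(data) / len(data)    # 108 / 6 = 18.0

# Property 1: deviations from the mean sum to zero.
sum_of_deviations = sum(x - mean for x in data)


# Property 2: the sum of squared deviations is smallest about the mean.
def sum_sq_dev(centre):
    return sum((x - centre) ** 2 for x in data)


least_squares_holds = all(
    sum_sq_dev(mean) <= sum_sq_dev(mean + shift) for shift in (-5, -1, 0.5, 3)
)


# Property 3: combined mean of two groups from their sizes and means.
def combined_mean(n1, mean1, n2, mean2):
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


group1, group2 = data[:2], data[2:]
combined = combined_mean(
    len(group1), sum(group1) / len(group1),
    len(group2), sum(group2) / len(group2),
)

print(sum_of_deviations, least_squares_holds, combined)  # 0.0 True 18.0
```

The combined mean equals the mean of the pooled data, which is why the AM can be carried across groups without access to the raw observations.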

37.5 Median

The median is the middle value of an ordered data set — half of the observations lie below it, half above. For an even number of observations, the median is the average of the two middle values.

Tip: Median Formulas

| Data | Formula |
|---|---|
| Ungrouped, n odd | Middle observation |
| Ungrouped, n even | Average of middle two observations |
| Discrete frequency | The X-value whose cumulative frequency first reaches or exceeds \((N+1)/2\) |
| Continuous (grouped) | \(M_d = L + \dfrac{N/2 - cf}{f} \times h\) |

where \(L\) = lower limit of median class, \(N = \sum f\), \(cf\) = cumulative frequency before median class, \(f\) = frequency of median class, \(h\) = class width.

The median’s main strength is robustness — it is unaffected by extreme values. Its main weakness is that it does not use all the data.
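For a discrete frequency distribution, the cumulative-frequency rule can be sketched as follows. The distribution is hypothetical, `discrete_median` is an illustrative name, and the function assumes the values are already sorted in ascending order.

```python
from itertools import accumulate
import statistics


def discrete_median(values, freqs):
    """Median of a discrete frequency distribution.

    `values` must be sorted in ascending order. Returns the first value whose
    cumulative frequency reaches or exceeds (N + 1) / 2.
    """
    n = sum(freqs)
    target = (n + 1) / 2
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= target:
            return x
    raise ValueError("empty distribution")


# Hypothetical distribution (illustration only): N = 25, so the target is 13.
values = [1, 2, 3, 4, 5]
freqs = [3, 5, 9, 6, 2]
median_value = discrete_median(values, freqs)

# Cross-check against the ungrouped median of the expanded data.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
check = statistics.median(expanded)

# Robustness: replacing the largest value with an outlier leaves the median alone.
median_with_outlier = discrete_median([1, 2, 3, 4, 500], freqs)
print(median_value, check, median_with_outlier)  # 3 3 3
```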

37.5.1 Quartiles, deciles, percentiles

Generalisations of the median divide the ordered data set into four (quartiles), ten (deciles) or a hundred (percentiles) equal parts.

Tip: Quartiles, Deciles, Percentiles

| Measure | Position | Formula (continuous) |
|---|---|---|
| First Quartile \(Q_1\) | 25 % | \(L + \dfrac{N/4 - cf}{f} \times h\) |
| Second Quartile \(Q_2\) = Median | 50 % | \(L + \dfrac{N/2 - cf}{f} \times h\) |
| Third Quartile \(Q_3\) | 75 % | \(L + \dfrac{3N/4 - cf}{f} \times h\) |
| \(i\)-th Decile \(D_i\) | \(10 i\) % | \(L + \dfrac{iN/10 - cf}{f} \times h\) |
| \(j\)-th Percentile \(P_j\) | \(j\) % | \(L + \dfrac{jN/100 - cf}{f} \times h\) |
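All of these formulas share one pattern: locate the class in which the cumulative frequency first reaches \(kN/\text{parts}\), then interpolate inside it. A minimal sketch of that pattern follows, assuming continuous classes listed in ascending order and \(0 < k < \text{parts}\). The data and the name `partition_value` are hypothetical.

```python
from itertools import accumulate

# Hypothetical grouped data (illustration only).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 10, 5]


def partition_value(classes, freqs, k, parts):
    """k-th partition value when the data are cut into `parts` equal parts.

    Implements L + (k * N / parts - cf) / f * h, assuming 0 < k < parts.
    """
    n = sum(freqs)
    target = k * n / parts
    cum = list(accumulate(freqs))
    for i, running in enumerate(cum):
        if running >= target:
            lower, upper = classes[i]
            cf_before = cum[i - 1] if i > 0 else 0
            return lower + (target - cf_before) / freqs[i] * (upper - lower)
    raise ValueError("empty distribution")


q1 = partition_value(classes, freqs, 1, 4)       # 10 + (10 - 5) / 8 * 10 = 16.25
median = partition_value(classes, freqs, 2, 4)   # 20 + (20 - 13) / 12 * 10, about 25.83
q3 = partition_value(classes, freqs, 3, 4)       # 30 + (30 - 25) / 10 * 10 = 35.0
d4 = partition_value(classes, freqs, 4, 10)      # 20 + (16 - 13) / 12 * 10 = 22.5
p90 = partition_value(classes, freqs, 90, 100)   # 40 + (36 - 35) / 5 * 10 = 42.0
print(q1, median, q3, d4, p90)
```

Passing `k` and `parts` separately mirrors the \(iN/10\) and \(jN/100\) terms in the table, so the median is simply the case `k=2, parts=4`.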

37.6 Mode

The mode is the most frequently occurring value, that is, the value with the highest frequency. A distribution may be unimodal (one peak), bimodal (two peaks) or multimodal (more than two).

Tip: Mode Formulas

| Data | Formula |
|---|---|
| Ungrouped | Value occurring most often |
| Discrete frequency | The X-value with the highest frequency |
| Continuous (grouped) | \(M_o = L + \dfrac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h\) |

where \(L\) = lower limit of modal class, \(f_1\) = frequency of modal class, \(f_0\) = frequency of preceding class, \(f_2\) = frequency of following class.
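The grouped-data formula can be sketched in a few lines. The data are hypothetical, `grouped_mode` is an illustrative name, and two conventions are assumed here rather than taken from the source: a missing neighbouring class counts as frequency 0, and ties are broken by taking the first class with the highest frequency.

```python
# Hypothetical grouped data (illustration only).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 10, 5]


def grouped_mode(classes, freqs):
    """Mode of a continuous distribution: L + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    i = max(range(len(freqs)), key=freqs.__getitem__)   # first modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                   # preceding class (assumed 0 if none)
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0      # following class (assumed 0 if none)
    lower, upper = classes[i]
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


mode_value = grouped_mode(classes, freqs)  # 20 + (12 - 8) / (24 - 8 - 10) * 10, about 26.67
print(round(mode_value, 2))
```

The formula breaks down when \(f_1 = f_0 = f_2\), because the denominator is then zero; the sketch does not guard against that case.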

37.6.1 Empirical relationship — Karl Pearson

For a moderately skewed distribution:

\[ \text{Mode} \approx 3 \cdot \text{Median} - 2 \cdot \text{Mean} \]

This is an approximate empirical relation, not an exact identity. It allows any one of the three averages to be estimated from the other two.

37.7 Geometric Mean (GM)

The geometric mean of \(n\) positive observations is the \(n\)-th root of their product:

\[ \text{GM} = \sqrt[n]{X_1 \cdot X_2 \cdot \dots \cdot X_n} \quad \text{or} \quad \text{GM} = \text{antilog}\left(\dfrac{\sum \log X}{n}\right) \]

The GM is the appropriate average for ratios, rates of growth, index numbers and compound interest. It is always smaller than the AM unless all values are equal, in which case the two coincide.

37.7.1 Use case

A firm’s revenue grew by 10 % in year 1, 20 % in year 2, and 30 % in year 3. The average growth rate is the geometric mean of (1.10, 1.20, 1.30) = \(\sqrt[3]{1.10 \times 1.20 \times 1.30} = \sqrt[3]{1.716} \approx 1.197\) — i.e. about 19.7 per cent per year. The arithmetic mean (20 %) overstates this.
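The revenue example can be reproduced with both forms of the GM formula. This is a minimal sketch; it uses natural logarithms in place of the log/antilog tables, which gives the same result, and the function name is illustrative.

```python
import math


def geometric_mean(values):
    """GM via logarithms: exp(mean(log x)); all values must be positive."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


multipliers = [1.10, 1.20, 1.30]   # growth of 10 %, 20 % and 30 %
gm = geometric_mean(multipliers)
root_form = math.prod(multipliers) ** (1 / len(multipliers))  # n-th root of the product
am = sum(multipliers) / len(multipliers)

print(round(gm, 3), round(root_form, 3), round(am, 3))  # 1.197 1.197 1.2
```

Compounding at the GM for three years reproduces the actual cumulative growth of 1.716, whereas compounding at the AM of 1.20 gives 1.728, which is too high.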

37.8 Harmonic Mean (HM)

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals:

\[ \text{HM} = \dfrac{n}{\sum (1 / X_i)} \]

The HM is the appropriate average for rates expressed as one quantity per unit of another, such as speed (km/hour), price per kilo or time per task, when the quantity in the numerator is the same for every observation (for example, equal distances). The classic example: a car travels 100 km at 40 km/hour and returns over the same 100 km at 60 km/hour. The average speed over the round trip is the HM = \(2 / (1/40 + 1/60) = 2/(5/120) = 48\) km/hour, not the arithmetic mean of 50.
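The round-trip example can be verified from first principles by dividing total distance by total time. The sketch below is illustrative; the function and variable names are not from the source.

```python
def harmonic_mean(values):
    """HM = n / sum(1 / x); all values must be positive."""
    return len(values) / sum(1 / x for x in values)


distance_km = 100
speeds = [40, 60]                                  # km/hour, out and back

total_time = sum(distance_km / s for s in speeds)  # 2.5 h + about 1.67 h
average_speed = len(speeds) * distance_km / total_time
hm = harmonic_mean(speeds)

print(round(average_speed, 2), round(hm, 2))       # 48.0 48.0
```

The slower leg takes more time, so it carries more weight in the true average; the HM captures this, while the AM of 50 does not.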

37.9 Inequality of the Three Means

For any set of positive numbers:

\[ \text{AM} \geq \text{GM} \geq \text{HM} \]

with equality only when all the numbers are equal. This AM-GM-HM inequality is widely used in problems and proofs.
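A quick numerical check of the inequality, using Python's standard `statistics` module. The two data sets are hypothetical: one with unequal values and one with all values equal.

```python
import statistics


def three_means(values):
    """Return (AM, GM, HM) for a list of positive numbers."""
    return (
        statistics.mean(values),
        statistics.geometric_mean(values),
        statistics.harmonic_mean(values),
    )


am, gm, hm = three_means([2, 4, 8])          # about 4.67, 4.0 and 3.43
am_eq, gm_eq, hm_eq = three_means([5, 5, 5])  # all three are 5 (up to rounding)

print(round(am, 2), round(gm, 2), round(hm, 2))
```

A check like this illustrates the inequality but does not prove it.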

37.10 Comparison of Averages

Tip: Comparison of the Five Averages

| Property | AM | Median | Mode | GM | HM |
|---|---|---|---|---|---|
| Uses all observations | Yes | No | No | Yes | Yes |
| Affected by extreme values | Strongly | Not | Not | Moderately | Weakly |
| Suitable for further algebra | Yes | Limited | No | Yes | Yes |
| Best for | Symmetric data | Skewed data | Categorical data | Ratios, growth rates | Rates per unit |
| Defined for negative / zero | Yes | Yes | Yes | Only positive | Only positive |
| Computability for open-ended classes | No | Yes | Yes | No | No |

37.11 Worked Numerical

Marks of 10 students: 35, 40, 42, 45, 48, 50, 50, 55, 60, 75.

  • \(n = 10\), \(\sum X = 500\), AM = 50.
  • Ordered already: middle two are the 5th and 6th values, 48 and 50, so Median = 49.
  • 50 occurs twice, all others once: Mode = 50.
  • Pearson approximation check: \(3 \times 49 - 2 \times 50 = 47\), close to but not equal to the actual mode of 50, because the relation is only approximate and assumes a moderately skewed distribution.
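The figures above can be reproduced with Python's standard `statistics` module. This is a sketch for checking the arithmetic; the variable names are illustrative.

```python
import statistics

marks = [35, 40, 42, 45, 48, 50, 50, 55, 60, 75]

am = statistics.mean(marks)        # sum 500 over 10 observations
median = statistics.median(marks)  # average of the 5th and 6th values
mode = statistics.mode(marks)      # the only repeated value

# Pearson's empirical estimate of the mode from the other two averages.
pearson_mode = 3 * median - 2 * am

print(am, median, mode, pearson_mode)
```

The gap between the estimated mode (47) and the observed mode (50) shows how rough the empirical relation can be on a small sample.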

37.12 Exam-Pattern MCQs

Q 01
Which of the following is not one of Yule's requirements for a good average?
  • A. Rigidly defined
  • B. Based on all observations
  • C. Affected unduly by extreme values
  • D. Suitable for further algebraic treatment
Solution:
Correct Option: C
A good average should not be unduly affected by extreme values.
Q 02
Match each average with the situation in which it is most appropriate:
| Average | Best used for |
|---|---|
| (i) Arithmetic Mean | (a) Categorical / most-frequent value |
| (ii) Geometric Mean | (b) General-purpose averaging |
| (iii) Harmonic Mean | (c) Rates of growth and ratios |
| (iv) Mode | (d) Averaging rates per unit of time |

  • A. (i)-(b), (ii)-(c), (iii)-(d), (iv)-(a)
  • B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
  • C. (i)-(c), (ii)-(d), (iii)-(b), (iv)-(a)
  • D. (i)-(d), (ii)-(a), (iii)-(c), (iv)-(b)
Solution:
Correct Option: A
Q 03
A car travels from town A to town B at 30 km/h and returns at 60 km/h. The average speed over the round trip is:
  • A. 45 km/h
  • B. 40 km/h
  • C. 30 km/h
  • D. 50 km/h
Solution:
Correct Option: B
Use HM: \(2 / (1/30 + 1/60) = 2 / (3/60) = 40\) km/h.
Q 04
Match each formula with the measure it computes (assume continuous frequency distribution):
| Formula | Measure |
|---|---|
| (i) \(L + \dfrac{N/2 - cf}{f} \times h\) | (a) Mode |
| (ii) \(L + \dfrac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h\) | (b) Median |
| (iii) \(\dfrac{\sum f m}{\sum f}\) | (c) Arithmetic mean |
| (iv) \(L + \dfrac{N/4 - cf}{f} \times h\) | (d) First quartile \(Q_1\) |

  • A. (i)-(b), (ii)-(a), (iii)-(c), (iv)-(d)
  • B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
  • C. (i)-(c), (ii)-(d), (iii)-(b), (iv)-(a)
  • D. (i)-(d), (ii)-(c), (iii)-(a), (iv)-(b)
Solution:
Correct Option: A
Q 05
For a moderately skewed distribution, the empirical relationship between mean, median and mode is:
  • A. Mode = 3 Median − 2 Mean
  • B. Mode = 2 Median − 3 Mean
  • C. Mode = 3 Mean − 2 Median
  • D. Mode = 2 Mean − 3 Median
Solution:
Correct Option: A
Karl Pearson's empirical relationship: Mode = 3 Median − 2 Mean.
Q 06
A firm's revenue grew by 21 % in year 1 and 64 % in year 2. The average annual growth rate is:
  • A. 42.5 %
  • B. 21 %
  • C. 41 %
  • D. 64 %
Solution:
Correct Option: C
Use GM of growth multipliers: \(\sqrt{1.21 \times 1.64} = \sqrt{1.9844} \approx 1.41\), i.e. about 41 % per year.
Q 07
Arrange the following from largest to smallest for any set of positive numbers (not all equal): (i) Geometric Mean (ii) Harmonic Mean (iii) Arithmetic Mean
  • A. (iii), (i), (ii)
  • B. (i), (ii), (iii)
  • C. (ii), (i), (iii)
  • D. (i), (iii), (ii)
Solution:
Correct Option: A
AM ≥ GM ≥ HM — equality only if all values are equal.
Q 08
Match each property with the average it characterises uniquely among the three classical averages:
| Property | Average |
|---|---|
| (i) Sum of deviations from this measure is zero | (a) Mode |
| (ii) Half the observations lie below it | (b) Mean |
| (iii) Most frequently occurring value | (c) Median |

  • A. (i)-(b), (ii)-(c), (iii)-(a)
  • B. (i)-(a), (ii)-(b), (iii)-(c)
  • C. (i)-(c), (ii)-(a), (iii)-(b)
  • D. (i)-(c), (ii)-(b), (iii)-(a)
Solution:
Correct Option: A
Important: Quick recall
  • Average = single value representing the centre of the data — Yule’s six requirements (rigid, simple, full data, algebra-friendly, stable, robust).
  • Five averages: AM, Median, Mode, GM, HM.
  • AM: \(\bar X = \sum X / n\); sum of deviations from AM = 0; sum of squared deviations is minimum.
  • Combined mean of two groups: \(\bar X_c = (n_1 \bar X_1 + n_2 \bar X_2) / (n_1 + n_2)\).
  • Median (continuous): \(L + \dfrac{N/2 - cf}{f} \times h\).
  • Mode (continuous): \(L + \dfrac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h\).
  • Pearson empirical relation: Mode = 3 Median − 2 Mean.
  • GM = \(n\)-th root of product = antilog(\(\sum \log X / n\)); use for growth rates, ratios, indices.
  • HM = \(n / \sum (1/X)\); use for rates per unit of time (e.g., average speed).
  • Inequality (positive numbers): AM ≥ GM ≥ HM.
  • Quartiles (\(Q_1, Q_3\)), Deciles, Percentiles — positional generalisations of median.
  • Robust to outliers: Median, Mode. Strongly affected by outliers: Mean.