38  Measures of Dispersion

38.1 What is Dispersion?

A measure of dispersion — also called a measure of spread, scatter or variability — captures the degree to which observations differ from one another or from a central value (gupta2021?; elhance2020?). The mean tells us the centre; dispersion tells us how the data are scattered around the centre.

Two data sets can have the same mean but very different dispersions. Marks of two students across five tests:

  • Student A: 60, 60, 60, 60, 60 — mean 60, dispersion zero (all same).
  • Student B: 40, 50, 60, 70, 80 — mean 60, dispersion high (range 40).

Dispersion turns the bare mean into a useful summary.
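The two students' marks can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and the population standard deviation (`pstdev`, divisor \(n\)) is assumed here as one convenient summary of spread.

```python
from statistics import mean, pstdev

# Marks of the two students across five tests (from the example above).
student_a = [60, 60, 60, 60, 60]
student_b = [40, 50, 60, 70, 80]

for name, marks in (("A", student_a), ("B", student_b)):
    value_range = max(marks) - min(marks)   # largest minus smallest
    print(name, mean(marks), value_range, round(pstdev(marks), 2))
```

Both means come out as 60, while the range is 0 for Student A and 40 for Student B.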

38.2 Why Dispersion Matters

Four working uses (gupta2021?):

  • Test the reliability of an average — a high-dispersion average is less representative.
  • Compare two distributions — different products, regions, time periods.
  • Identify extremes and control variation — Six Sigma, statistical process control.
  • Set up further statistical analysis — variance, regression, hypothesis tests all rest on dispersion.

38.3 Properties of a Good Measure of Dispersion

Adapted from the criteria of Yule and Kendall:

Tip: Six Properties of a Good Measure of Dispersion

| Property | Working content |
|---|---|
| Rigidly defined | Single, clear formula |
| Easy to compute and understand | Suitable for general use |
| Based on all observations | Reflects the whole data set |
| Suitable for further algebraic treatment | Combinable across groups |
| Not unduly affected by extreme values | Robust |
| Independent of unit of measurement (relative measure) | Allows comparison across data sets in different units |

38.4 Absolute vs Relative Measures

Tip: Absolute vs Relative Measures of Dispersion

| Measure | Absolute | Relative |
|---|---|---|
| Range | Range = \(L - S\) | Coefficient of Range = \(\dfrac{L - S}{L + S}\) |
| Quartile Deviation | \(Q.D. = \dfrac{Q_3 - Q_1}{2}\) | Coefficient of QD = \(\dfrac{Q_3 - Q_1}{Q_3 + Q_1}\) |
| Mean Deviation | \(M.D.\) from mean / median | Coefficient of MD = \(\dfrac{M.D.}{\bar X}\) or \(\dfrac{M.D.}{M_d}\) |
| Standard Deviation | \(\sigma\) | Coefficient of Variation \(= \dfrac{\sigma}{\bar X} \times 100\) |

The relative measure is unit-free and is used for comparison across data sets with different magnitudes or different units.

38.5 Range

The range is the difference between the largest and the smallest observation:

\[ \text{Range} = L - S \]

The coefficient of range expresses it relative to the sum:

\[ \text{Coefficient of Range} = \dfrac{L - S}{L + S} \]

Range is the simplest measure but is based on only two values and is highly sensitive to outliers. It is mostly used in quality-control charts (R-chart) and weather statistics.
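As a small illustration of the two formulas, the sketch below computes the range and its coefficient for a list of values. The function name `range_and_coefficient` is hypothetical, and the code assumes \(L + S \neq 0\), as with ordinary positive data.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) for a non-empty sequence.

    Assumes L + S is non-zero, as with ordinary positive data.
    """
    largest, smallest = max(values), min(values)
    value_range = largest - smallest
    return value_range, value_range / (largest + smallest)

marks = [40, 50, 60, 70, 80]            # Student B's marks from section 38.1
rng, coeff = range_and_coefficient(marks)
print(rng, round(coeff, 3))             # 40 0.333
```

Because only the two extreme values enter the calculation, changing any of the middle marks leaves both results unchanged.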

38.6 Quartile Deviation (Semi-Inter-Quartile Range)

The quartile deviation is half the inter-quartile range:

\[ Q.D. = \dfrac{Q_3 - Q_1}{2} \]

The coefficient of QD is unit-free:

\[ \text{Coefficient of QD} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1} \]

QD is based on the middle 50 per cent of observations, the span from \(Q_1\) to \(Q_3\); its insensitivity to extreme values makes it suitable for open-ended distributions (e.g., income brackets with an unbounded top class).
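A short sketch of the calculation follows. Quartile conventions differ between textbooks and software; this example assumes the "exclusive" method of Python's `statistics.quantiles`, which places the quartiles at the \(i(n+1)/4\)-th positions. The function name and the data are illustrative.

```python
from statistics import quantiles

def quartile_deviation(values):
    """Return (Q.D., coefficient of Q.D.) using the 'exclusive' quartile rule."""
    q1, _median, q3 = quantiles(values, n=4, method="exclusive")
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

data = [4, 6, 8, 10, 12]                 # illustrative values
qd, coeff_qd = quartile_deviation(data)
print(qd, coeff_qd)                      # 3.0 0.375
```

Under this rule \(Q_1 = 5\) and \(Q_3 = 11\); a different quartile convention would give slightly different figures for the same data.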

38.7 Mean Deviation

The mean deviation (also called average deviation or mean absolute deviation, MAD) is the arithmetic mean of the absolute deviations from a central value:

\[ M.D._{\bar X} = \dfrac{\sum |X - \bar X|}{n} \quad ; \quad M.D._{M_d} = \dfrac{\sum |X - M_d|}{n} \]

MD is least when computed about the median. The coefficient divides MD by the central value used.
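The claim that MD is smallest about the median can be seen on a small data set. The sketch below uses hypothetical data and a hypothetical helper `mean_deviation`; it is an illustration, not a proof.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Arithmetic mean of absolute deviations from the given central value."""
    return sum(abs(x - centre) for x in values) / len(values)

data = [2, 4, 6, 8, 30]                  # hypothetical data with one large value
md_mean = mean_deviation(data, mean(data))       # about the mean (10)
md_median = mean_deviation(data, median(data))   # about the median (6)

print(md_mean, md_median)                        # 8.0 6.4
print(md_mean / mean(data), md_median / median(data))  # coefficients of MD
```

Here the deviation about the median (6.4) is smaller than the deviation about the mean (8.0), and each coefficient divides by the central value that was actually used.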

38.8 Standard Deviation

The standard deviation — proposed by Karl Pearson (1893) and the most-used measure of dispersion — is the positive square root of the average of squared deviations from the mean:

\[ \sigma = \sqrt{\dfrac{\sum (X - \bar X)^2}{n}} \quad \text{(ungrouped)} \quad ; \quad \sigma = \sqrt{\dfrac{\sum f (X - \bar X)^2}{N}} \quad \text{(frequency)} \]

The step-deviation shortcut for grouped data:

\[ \sigma = h \cdot \sqrt{\dfrac{\sum f d^2}{N} - \left(\dfrac{\sum f d}{N}\right)^2}, \quad d = \dfrac{m - A}{h} \]
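To see that the shortcut agrees with the direct frequency formula, the sketch below applies both to a hypothetical grouped distribution. The mid-points, frequencies, assumed mean \(A = 15\) and class width \(h = 10\) are invented for illustration.

```python
from math import sqrt, isclose

# Hypothetical frequency distribution: class mid-points m and frequencies f.
midpoints = [5, 15, 25, 35, 45]
freqs = [2, 3, 5, 3, 2]
N = sum(freqs)

# Direct method: sigma = sqrt(sum f (m - mean)^2 / N)
mean_m = sum(f * m for f, m in zip(freqs, midpoints)) / N
sigma_direct = sqrt(sum(f * (m - mean_m) ** 2 for f, m in zip(freqs, midpoints)) / N)

# Step-deviation method with assumed mean A and class width h.
A, h = 15, 10
d = [(m - A) / h for m in midpoints]
sum_fd = sum(f * di for f, di in zip(freqs, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
sigma_step = h * sqrt(sum_fd2 / N - (sum_fd / N) ** 2)

print(round(sigma_direct, 3), round(sigma_step, 3))
```

Both methods give about 12.11. The assumed mean is deliberately different from the true mean (25), so the correction term \(\left(\sum f d / N\right)^2\) is non-zero and does real work.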

38.8.1 Properties of standard deviation

Tip: Properties of Standard Deviation

| Property | Working content |
|---|---|
| Always non-negative | \(\sigma \geq 0\); zero only if all values are equal |
| Independent of change of origin | Adding a constant to every value leaves \(\sigma\) unchanged |
| Affected by change of scale | Multiplying every value by \(k\) multiplies \(\sigma\) by \(\lvert k \rvert\) |
| Suitable for further algebra | Used in variance, covariance, regression, ANOVA |
| Affected by extreme values | Squared deviations magnify outlier effect |
| Combined SD of two groups | \(\sigma_c = \sqrt{\dfrac{n_1 (\sigma_1^2 + d_1^2) + n_2 (\sigma_2^2 + d_2^2)}{n_1 + n_2}}\) where \(d_i = \bar X_i - \bar X_c\) |
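The origin, scale and combined-SD properties can be checked numerically. The following is a minimal sketch with illustrative data; `combined_sd` is a hypothetical helper that applies the combined-SD formula with population standard deviations (divisor \(n\)).

```python
from math import sqrt, isclose
from statistics import mean, pstdev

data = [4, 6, 8, 10, 12]
sd = pstdev(data)

shifted = [x + 5 for x in data]          # change of origin
scaled = [-3 * x for x in data]          # change of scale with k = -3
origin_ok = isclose(pstdev(shifted), sd)
scale_ok = isclose(pstdev(scaled), 3 * sd)

def combined_sd(group1, group2):
    """Combined SD of two groups from their sizes, means and SDs."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    mc = (n1 * m1 + n2 * m2) / (n1 + n2)          # combined mean
    d1, d2 = m1 - mc, m2 - mc
    s1, s2 = pstdev(group1), pstdev(group2)
    return sqrt((n1 * (s1**2 + d1**2) + n2 * (s2**2 + d2**2)) / (n1 + n2))

other = [10, 20, 30]                      # hypothetical second group
pooled_ok = isclose(combined_sd(data, other), pstdev(data + other))
print(origin_ok, scale_ok, pooled_ok)
```

All three checks should print `True`: the combined formula reproduces the SD of the pooled observations without needing the raw values of either group.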

38.8.2 Empirical relations among the three measures (normal-like distributions)

For a roughly normal distribution, approximate ratios apply (gupta2021?):

\[ Q.D. : M.D. : \sigma \;\approx\; 10 : 12 : 15 \]

So \(Q.D. \approx \frac{2}{3} \sigma\) and \(M.D. \approx \frac{4}{5} \sigma\). These are approximations rather than exact identities, but they are useful for exam questions.
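For an exactly normal distribution the two ratios can be computed rather than memorised. The sketch below relies on two standard results that are not stated in the text above: \(Q.D. = z_{0.75}\,\sigma\), where \(z_{0.75}\) is the 75th percentile of the standard normal, and \(M.D. = \sqrt{2/\pi}\,\sigma\) about the mean.

```python
from math import pi, sqrt
from statistics import NormalDist

# For an exactly normal distribution with standard deviation sigma:
#   Q.D. = z_0.75 * sigma, where z_0.75 is the 75th percentile of N(0, 1)
#   M.D. (about the mean) = sqrt(2 / pi) * sigma
qd_over_sigma = NormalDist().inv_cdf(0.75)
md_over_sigma = sqrt(2 / pi)

print(round(qd_over_sigma, 4), round(md_over_sigma, 4))            # 0.6745 0.7979
print(round(15 * qd_over_sigma, 2), round(15 * md_over_sigma, 2))  # 10.12 11.97
```

Scaling so that \(\sigma = 15\) gives roughly 10.12 and 11.97, which is where the rounded ratio 10 : 12 : 15 comes from.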

38.9 Variance

The variance is the square of the standard deviation:

\[ \sigma^2 = \dfrac{\sum (X - \bar X)^2}{n} \]

Variance is expressed in the squared unit of the data (for example, marks squared), which is why \(\sigma\) is more often reported. But variance is the workhorse of inferential statistics: the variance of a sum of independent variables equals the sum of their variances, whereas standard deviations do not add in this way.
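The additivity of variance can be illustrated by exact enumeration. The sketch below assumes two independent fair dice (an example not in the text) and compares the variance and SD of their total with those of a single die.

```python
from itertools import product
from math import isclose
from statistics import pstdev, pvariance

die = [1, 2, 3, 4, 5, 6]                           # one fair die
totals = [a + b for a, b in product(die, die)]     # all 36 equally likely sums

var_die, var_total = pvariance(die), pvariance(totals)
print(round(var_die, 4), round(var_total, 4))                 # 2.9167 5.8333
print(round(pstdev(totals), 4), round(2 * pstdev(die), 4))    # 2.4152 3.4157
```

The variance of the total is exactly twice the variance of one die, while the SD of the total (about 2.42) is well below the sum of the two SDs (about 3.42).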

38.10 Coefficient of Variation (CV)

The coefficient of variation is the standard deviation expressed as a percentage of the mean:

\[ CV = \dfrac{\sigma}{\bar X} \times 100 \]

CV is unit-free and is the standard tool to compare the relative variability of two distributions with different units or magnitudes. The series with the lower CV is more consistent; the series with the higher CV is more variable.

38.10.1 Worked example

Two batsmen’s scores in 10 innings: A — mean 50, SD 10. B — mean 40, SD 10. CV(A) = 20 %; CV(B) = 25 %. Although both have SD = 10, B is relatively more inconsistent.
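The comparison above can be written as a few lines of code. This is a minimal sketch; the function name `coefficient_of_variation` is hypothetical, and the means and SDs are those given in the worked example.

```python
def coefficient_of_variation(mean, sd):
    """CV as a percentage: standard deviation relative to the mean."""
    return 100 * sd / mean

cv_a = coefficient_of_variation(mean=50, sd=10)
cv_b = coefficient_of_variation(mean=40, sd=10)
more_consistent = "A" if cv_a < cv_b else "B"
print(cv_a, cv_b, more_consistent)       # 20.0 25.0 A
```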

38.11 Lorenz Curve

The Lorenz Curve, introduced by Max O. Lorenz (1905), is a graphical measure of dispersion: it plots the cumulative percentage of a variable (e.g., income) against the cumulative percentage of the population. Perfect equality is the 45° line of equal distribution; the further the actual curve sags below that line, the greater the inequality.

The Gini coefficient, derived from the Lorenz curve, is the most-cited summary measure of inequality.
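The sketch below shows one way the Lorenz points and the Gini coefficient might be computed for a small list of incomes. The incomes are hypothetical, and the code assumes the usual definition, not spelled out above, that the Gini coefficient equals \(1 - 2 \times\) (area under the Lorenz curve), with the area taken by the trapezoid rule.

```python
def lorenz_points(incomes):
    """Cumulative (population share, income share) points, starting at (0, 0)."""
    ordered = sorted(incomes)
    total, n = sum(ordered), len(ordered)
    points, running = [(0.0, 0.0)], 0
    for i, value in enumerate(ordered, start=1):
        running += value
        points.append((i / n, running / total))
    return points

def gini(incomes):
    """Gini = 1 - 2 * (area under the Lorenz curve), by the trapezoid rule."""
    pts = lorenz_points(incomes)
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area

incomes = [10, 20, 30, 40, 100]          # hypothetical incomes
print(lorenz_points(incomes))
print(round(gini(incomes), 3))           # 0.4
```

With equal incomes the Lorenz points lie on the 45° line and the coefficient is zero; the more the curve sags, the closer the coefficient moves towards one.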

38.12 Box Plot — A Visual Summary

A box plot (John Tukey, 1977) summarises a distribution with five numbers:

Tip: Five-Number Summary in a Box Plot

| Number | Meaning |
|---|---|
| Minimum (or lower whisker) | Smallest non-outlier |
| First quartile \(Q_1\) | 25th percentile |
| Median \(Q_2\) | 50th percentile |
| Third quartile \(Q_3\) | 75th percentile |
| Maximum (or upper whisker) | Largest non-outlier |

Outliers are typically defined as values lying below \(Q_1 - 1.5 \cdot IQR\) or above \(Q_3 + 1.5 \cdot IQR\), where \(IQR = Q_3 - Q_1\).
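The five-number summary and the \(1.5 \cdot IQR\) rule can be sketched as follows. The data and the function name `box_plot_summary` are hypothetical, and the quartiles assume the "exclusive" method of Python's `statistics.quantiles`; other conventions may place the fences slightly differently.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Five-number summary with 1.5 * IQR fences; returns (summary, outliers)."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    outliers = [x for x in ordered if x < low_fence or x > high_fence]
    summary = (inside[0], q1, q2, q3, inside[-1])   # whisker ends are non-outliers
    return summary, outliers

data = [2, 4, 6, 8, 10, 12, 40]          # hypothetical data with one extreme value
print(box_plot_summary(data))            # ((2, 4.0, 8.0, 12.0, 12), [40])
```

Here \(IQR = 8\), so the fences sit at \(-8\) and \(24\); the value 40 is flagged as an outlier and the upper whisker stops at 12.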

38.13 Comparison of Dispersion Measures

Tip: Comparison of Dispersion Measures

| Measure | Uses all data? | Robust to outliers? | Suitable for algebra? | Best for |
|---|---|---|---|---|
| Range | No | No | No | Quick rough check |
| QD | No | Yes | No | Open-ended distributions |
| MD | Yes | Moderate | Limited (absolute value) | Intuitive teaching |
| SD / Variance | Yes | No | Yes | Inferential statistics |
| CV | Yes | No | Yes | Comparison across units |

38.14 Worked Numerical

Five observations: 4, 6, 8, 10, 12.

  • \(\bar X = 8\).
  • Deviations from mean: \(-4, -2, 0, 2, 4\). \(\sum (X - \bar X)^2 = 16 + 4 + 0 + 4 + 16 = 40\).
  • \(\sigma^2 = 40 / 5 = 8\). \(\sigma = \sqrt{8} \approx 2.828\).
  • \(CV = \dfrac{2.828}{8} \times 100 \approx 35.36\,\%\).
  • Range \(= 12 - 4 = 8\); coefficient of range \(= 8 / 16 = 0.5\).
  • Mean deviation about mean = \((4 + 2 + 0 + 2 + 4) / 5 = 2.4\).
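The figures in this worked numerical can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative, and `pvariance` and `pstdev` are used because the text divides by \(n\).

```python
from statistics import mean, pstdev, pvariance

data = [4, 6, 8, 10, 12]
x_bar = mean(data)
variance = pvariance(data)               # divisor n, as in the text
sigma = pstdev(data)
cv = 100 * sigma / x_bar                 # CV expressed as a percentage
value_range = max(data) - min(data)
coeff_range = value_range / (max(data) + min(data))
mean_dev = sum(abs(x - x_bar) for x in data) / len(data)

print(x_bar, variance, round(sigma, 3), round(cv, 2))
print(value_range, coeff_range, mean_dev)
```

The printed values should match the hand calculation: mean 8, variance 8, \(\sigma \approx 2.828\), CV about 35.36 %, range 8, coefficient of range 0.5 and mean deviation 2.4.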

38.15 Exam-Pattern MCQs

Q 01
Which of the following is not a property of a good measure of dispersion?
  • A. Rigidly defined
  • B. Based on all observations
  • C. Heavily affected by extreme values
  • D. Suitable for further algebraic treatment
Correct Option: C
A good measure should not be unduly affected by extreme values.
Q 02
Match each absolute measure of dispersion with its formula:

| Measure | Formula |
|---|---|
| (i) Range | (a) \(\sqrt{\sum (X - \bar X)^2 / n}\) |
| (ii) Quartile deviation | (b) \(\sum \lvert X - \bar X \rvert / n\) |
| (iii) Mean deviation about mean | (c) \(L - S\) |
| (iv) Standard deviation | (d) \((Q_3 - Q_1) / 2\) |

  • A. (i)-(c), (ii)-(d), (iii)-(b), (iv)-(a)
  • B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
  • C. (i)-(b), (ii)-(c), (iii)-(d), (iv)-(a)
  • D. (i)-(d), (ii)-(a), (iii)-(c), (iv)-(b)
Correct Option: A
Q 03
Five marks are 4, 6, 8, 10, 12. Using the divisor \(n\), the standard deviation is approximately:
  • A. 2.0
  • B. 2.83
  • C. 3.16
  • D. 4.0
Correct Option: B
\(\sigma = \sqrt{40/5} = \sqrt{8} \approx 2.83\). Option C (3.16) is what the divisor \(n - 1\) would give.
Q 04
Match each measure with its content:

| Measure | Content |
|---|---|
| (i) Coefficient of Variation | (a) Half the inter-quartile range |
| (ii) Quartile Deviation | (b) \(\sigma\) as a percentage of the mean |
| (iii) Lorenz Curve | (c) Difference between largest and smallest values |
| (iv) Range | (d) Graphical measure of inequality |

  • A. (i)-(b), (ii)-(a), (iii)-(d), (iv)-(c)
  • B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
  • C. (i)-(c), (ii)-(d), (iii)-(b), (iv)-(a)
  • D. (i)-(d), (ii)-(c), (iii)-(a), (iv)-(b)
Correct Option: A
Q 05
For an approximately normal distribution, the empirical ratio of QD : MD : SD is:
  • A. 10 : 12 : 15
  • B. 6 : 8 : 10
  • C. 1 : 1 : 1
  • D. 15 : 12 : 10
Correct Option: A
The standard textbook ratio is 10 : 12 : 15, giving \(QD \approx \dfrac{2}{3}\sigma\) and \(MD \approx \dfrac{4}{5}\sigma\).
Q 06
If a constant 5 is added to every observation in a data set, the standard deviation:
  • A. Increases by 5
  • B. Decreases by 5
  • C. Remains unchanged
  • D. Increases by 25
Correct Option: C
SD is independent of change of origin; adding a constant shifts the data but not the spread.
Q 07
Arrange the following dispersion measures in increasing order of their typical magnitude for a roughly normal distribution: (i) Mean deviation (ii) Quartile deviation (iii) Standard deviation
  • A. (ii), (i), (iii)
  • B. (i), (ii), (iii)
  • C. (iii), (ii), (i)
  • D. (ii), (iii), (i)
Correct Option: A
Approximately QD < MD < SD in the ratio 10 : 12 : 15.
Q 08
Two batsmen have the same standard deviation of 10 runs but different means: A averages 50, B averages 40. Which batsman is more consistent?
  • A. Batsman A, who has the lower coefficient of variation
  • B. Batsman B, who has the higher mean
  • C. Both equally consistent
  • D. Cannot be determined
Correct Option: A
CV(A) = 10/50 = 20 %; CV(B) = 10/40 = 25 %. The lower CV indicates the more consistent series, so A is more consistent.
Important: Quick recall
  • Dispersion = scatter of observations around a central value.
  • Absolute measures: Range, QD, MD, SD, Variance.
  • Relative measures: Coefficient of Range, Coefficient of QD, Coefficient of MD, CV.
  • Range = \(L - S\); Coefficient of Range = \((L - S)/(L + S)\).
  • \(Q.D. = (Q_3 - Q_1)/2\); based on the middle 50 % of observations; suitable for open-ended classes.
  • MD is least about the median.
  • SD = \(\sqrt{\sum(X - \bar X)^2 / n}\). Pearson 1893. Independent of origin, depends on scale.
  • Variance = \(\sigma^2\); variances of independent variables add; SDs do not.
  • CV = (σ / mean) × 100 — unit-free; lower CV → more consistent.
  • Empirical ratio for normal-like distributions: QD : MD : SD ≈ 10 : 12 : 15, so QD ≈ \(\frac{2}{3}\sigma\), MD ≈ \(\frac{4}{5}\sigma\).
  • Lorenz Curve (Lorenz 1905) — graphical inequality measure; Gini coefficient is its summary.
  • Five-number summary (box plot, Tukey 1977): Min, \(Q_1\), Median, \(Q_3\), Max.