38 Measures of Dispersion

38.1 What is Dispersion?

A measure of dispersion — also called a measure of spread, scatter or variability — captures the degree to which observations differ from one another or from a central value (gupta2021?; elhance2020?). The mean tells us the centre; dispersion tells us how the data are scattered around the centre.

Two data sets can have the same mean but very different dispersions. Marks of two students across five tests:

Student A: 60, 60, 60, 60, 60 — mean 60, dispersion zero (all same).
Student B: 40, 50, 60, 70, 80 — mean 60, dispersion high (range 40).

Dispersion turns the bare mean into a useful summary.
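The two students' marks can be checked with a minimal Python sketch. The variable names and the helper `spread` are illustrative and not part of the text.

```python
from statistics import mean

# Marks of the two students across five tests (from the example above).
marks_a = [60, 60, 60, 60, 60]
marks_b = [40, 50, 60, 70, 80]

def spread(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)

for name, marks in (("A", marks_a), ("B", marks_b)):
    print(name, "mean:", mean(marks), "range:", spread(marks))
```

Both means agree, so only the second number separates the two students.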

38.2 Why Dispersion Matters

Four working uses (gupta2021?):

Test the reliability of an average — a high-dispersion average is less representative.
Compare two distributions — different products, regions, time periods.
Identify extremes and control variation — Six Sigma, statistical process control.
Set up further statistical analysis — variance, regression, hypothesis tests all rest on dispersion.

38.3 Properties of a Good Measure of Dispersion

Adapted from the criteria of Yule and Kendall:

Six Properties of a Good Measure of Dispersion

Property	Working content
Rigidly defined	Single, clear formula
Easy to compute and understand	Suitable for general use
Based on all observations	Reflects the whole data set
Suitable for further algebraic treatment	Combinable across groups
Not unduly affected by extreme values	Robust
Independent of unit of measurement (relative measure)	Allows comparison across data sets in different units

38.4 Absolute vs Relative Measures

Absolute vs Relative Measures of Dispersion

Measure	Absolute	Relative
Range	Range = $L - S$	Coefficient of Range = $\dfrac{L - S}{L + S}$
Quartile Deviation	$Q.D. = \dfrac{Q_3 - Q_1}{2}$	Coefficient of QD = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
Mean Deviation	$M.D.$ from mean / median	Coefficient of MD = $\dfrac{M.D.}{\bar X}$ or $\dfrac{M.D.}{M_d}$
Standard Deviation	$\sigma$	Coefficient of Variation $= \dfrac{\sigma}{\bar X} \times 100$

The relative measure is unit-free and is used for comparison across data sets with different magnitudes or different units.

38.5 Range

The range is the difference between the largest and the smallest observation:

\[ \text{Range} = L - S \]

The coefficient of range expresses it relative to the sum:

\[ \text{Coefficient of Range} = \dfrac{L - S}{L + S} \]

Range is the simplest measure but is based on only two values and is highly sensitive to outliers. It is mostly used in quality-control charts (R-chart) and weather statistics.
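A short sketch of both formulas follows, with one extra value appended to illustrate the sensitivity to outliers. The function names and the sample data are assumptions made for illustration, and the coefficient is only defined when $L + S \neq 0$.

```python
def value_range(values):
    """Range = L - S."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Coefficient of Range = (L - S) / (L + S); requires L + S != 0."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

data = [4, 6, 8, 10, 12]
with_outlier = data + [100]  # one extreme value added

print(value_range(data), coefficient_of_range(data))
print(value_range(with_outlier), coefficient_of_range(with_outlier))
```

A single extreme observation moves the range from 8 to 96, although the other five values are unchanged.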

38.6 Quartile Deviation (Semi-Inter-Quartile Range)

The quartile deviation is half the inter-quartile range:

\[ Q.D. = \dfrac{Q_3 - Q_1}{2} \]

The coefficient of QD is unit-free:

\[ \text{Coefficient of QD} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1} \]

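The inter-quartile range $Q_3 - Q_1$ spans the middle 50 per cent of observations, and QD is half of that span. Because it ignores the values beyond the quartiles, QD is insensitive to extreme values and suits open-ended distributions (e.g., income brackets with an unbounded top class).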
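The sketch below computes QD and its coefficient with Python's `statistics.quantiles`. Quartile conventions differ between textbooks and software; the default "exclusive" method used here is assumed to match the $(n+1)/4$ positional rule, and the data are invented for illustration.

```python
from statistics import quantiles

def quartile_deviation(values):
    """Q.D. = (Q3 - Q1) / 2, using the default 'exclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4)
    return (q3 - q1) / 2

def coefficient_of_qd(values):
    """Coefficient of QD = (Q3 - Q1) / (Q3 + Q1)."""
    q1, _, q3 = quantiles(values, n=4)
    return (q3 - q1) / (q3 + q1)

data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
extreme_top = data[:-1] + [2200]  # replace the largest value by an extreme one

print(quartile_deviation(data), coefficient_of_qd(data))
print(quartile_deviation(extreme_top))
```

With eleven observations, $Q_1$ and $Q_3$ are the 3rd and 9th ordered values, so changing the largest value leaves QD untouched.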

38.7 Mean Deviation

The mean deviation (also called average deviation or mean absolute deviation, MAD) is the arithmetic mean of the absolute deviations from a central value:

\[ M.D._{\bar X} = \dfrac{\sum |X - \bar X|}{n} \quad ; \quad M.D._{M_d} = \dfrac{\sum |X - M_d|}{n} \]

MD is least when computed about the median. The coefficient divides MD by the central value used.
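As an illustration, the sketch below computes the mean deviation about the mean and about the median for one small, invented data set. The helper name `mean_deviation` is an assumption.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Arithmetic mean of absolute deviations from the chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)

data = [2, 3, 5, 8, 22]  # mean 8, median 5

md_about_mean = mean_deviation(data, mean(data))
md_about_median = mean_deviation(data, median(data))

# The coefficient divides MD by the central value that was used.
coeff_md_mean = md_about_mean / mean(data)
coeff_md_median = md_about_median / median(data)

print(md_about_mean, md_about_median)
```

Here the deviation about the median (5.0) is smaller than the deviation about the mean (5.6), in line with the minimum property stated above.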

38.8 Standard Deviation

The standard deviation — proposed by Karl Pearson (1893) and the most-used measure of dispersion — is the positive square root of the average of squared deviations from the mean:

\[ \sigma = \sqrt{\dfrac{\sum (X - \bar X)^2}{n}} \quad \text{(ungrouped)} \quad ; \quad \sigma = \sqrt{\dfrac{\sum f (X - \bar X)^2}{N}} \quad \text{(frequency)} \]

The step-deviation shortcut for grouped data, with $m$ the class mid-point, $A$ an assumed mean and $h$ the common class width:

\[ \sigma = h \cdot \sqrt{\dfrac{\sum f d^2}{N} - \left(\dfrac{\sum f d}{N}\right)^2}, \quad d = \dfrac{m - A}{h} \]
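The following sketch checks that the step-deviation shortcut gives the same result as the direct frequency formula. The grouped data, the assumed mean `A = 15` and the class width `h = 10` are invented for illustration; any other choice of `A` should give the same answer.

```python
from math import isclose, sqrt

midpoints = [5, 15, 25, 35, 45]   # class mid-points m
frequencies = [2, 3, 5, 3, 2]     # class frequencies f
N = sum(frequencies)

# Direct formula: sigma = sqrt(sum f (m - mean)^2 / N)
mean_value = sum(f * m for f, m in zip(frequencies, midpoints)) / N
direct = sqrt(sum(f * (m - mean_value) ** 2
                  for f, m in zip(frequencies, midpoints)) / N)

# Step-deviation shortcut with d = (m - A) / h
A, h = 15, 10
d = [(m - A) / h for m in midpoints]
sum_fd = sum(f * di for f, di in zip(frequencies, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(frequencies, d))
shortcut = h * sqrt(sum_fd2 / N - (sum_fd / N) ** 2)

print(direct, shortcut, isclose(direct, shortcut))
```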

38.8.1 Properties of standard deviation

Properties of Standard Deviation

Property	Working content
Always non-negative	$\sigma \geq 0$; zero only if all values are equal
Independent of change of origin	Adding a constant to every value leaves $\sigma$ unchanged
Affected by change of scale	Multiplying every value by $k$ multiplies $\sigma$ by $|k|$
Suitable for further algebra	Used in variance, covariance, regression, ANOVA
Affected by extreme values	Squared deviations magnify outlier effect
Combined SD of two groups	$\sigma_c = \sqrt{\dfrac{n_1 (\sigma_1^2 + d_1^2) + n_2 (\sigma_2^2 + d_2^2)}{n_1 + n_2}}$ where $d_i = \bar X_i - \bar X_c$
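Three of these properties can be checked numerically. The sketch below uses `statistics.pstdev` (the population formula with divisor $n$); the data, the constants and the helper `combined_sd` are illustrative assumptions.

```python
from math import isclose, sqrt
from statistics import mean, pstdev

data = [4, 6, 8, 10, 12]

# Change of origin: adding a constant should leave the SD unchanged.
shifted = [x + 5 for x in data]
# Change of scale: multiplying by k should multiply the SD by |k|.
k = -3
scaled = [k * x for x in data]

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined SD of two groups from their sizes, means and SDs."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    return sqrt((n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2))
                / (n1 + n2))

group_1, group_2 = [4, 6, 8], [10, 12]
from_formula = combined_sd(len(group_1), mean(group_1), pstdev(group_1),
                           len(group_2), mean(group_2), pstdev(group_2))
from_pooled_data = pstdev(group_1 + group_2)

print(isclose(pstdev(shifted), pstdev(data)))
print(isclose(pstdev(scaled), abs(k) * pstdev(data)))
print(isclose(from_formula, from_pooled_data))
```

The last comparison confirms that the combined-SD formula reproduces the SD of the pooled observations without needing the raw data of each group.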

38.8.2 Empirical relations among the three measures (normal-like distributions)

For a roughly normal distribution, approximate ratios apply (gupta2021?):

\[ Q.D. : M.D. : \sigma \;\approx\; 10 : 12 : 15 \]

So $Q.D. \approx \frac{2}{3} \sigma$ and $M.D. \approx \frac{4}{5} \sigma$. These are approximations rather than exact identities, but they are useful for exam questions.
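For an exactly normal distribution the two ratios can be computed rather than remembered. The sketch below assumes a standard normal distribution, for which QD equals its upper quartile by symmetry and the mean deviation about the mean is $\sigma\sqrt{2/\pi}$.

```python
from math import pi, sqrt
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# For N(0, 1), Q1 = -Q3, so Q.D. = (Q3 - Q1) / 2 = Q3.
qd_over_sigma = standard_normal.inv_cdf(0.75)
# Mean absolute deviation of a normal distribution: sigma * sqrt(2 / pi).
md_over_sigma = sqrt(2 / pi)

print(round(qd_over_sigma, 4), round(md_over_sigma, 4))
# Scaled so that SD = 15, for comparison with the 10 : 12 : 15 rule.
print(round(15 * qd_over_sigma, 2), round(15 * md_over_sigma, 2), 15)
```

The computed ratios are about 0.6745 and 0.7979, so 2/3 and 4/5 are convenient roundings.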

38.9 Variance

The variance is the square of the standard deviation:

\[ \sigma^2 = \dfrac{\sum (X - \bar X)^2}{n} \]

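Variance is measured in the square of the data's unit (for example, marks squared), which is why $\sigma$, expressed in the original unit, is more often reported. But variance is the workhorse of inferential statistics: the variance of a sum of independent variables is the sum of their variances, whereas standard deviations do not add in this way.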
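The additivity of variance can be illustrated exactly on a small example. The sketch below pairs every value of one invented variable with every value of another, which makes the two independent by construction, and compares the variance of the sums with the sum of the variances.

```python
from itertools import product
from math import isclose
from statistics import pstdev, pvariance

x = [1, 2, 3, 4]
y = [10, 20, 60]

# Every (x, y) pair occurs exactly once, so x and y are independent.
sums = [a + b for a, b in product(x, y)]

variance_of_sum = pvariance(sums)
sum_of_variances = pvariance(x) + pvariance(y)

print(isclose(variance_of_sum, sum_of_variances))   # variances add
print(pstdev(sums), pstdev(x) + pstdev(y))          # SDs do not
```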

38.10 Coefficient of Variation (CV)

The coefficient of variation is the standard deviation expressed as a percentage of the mean:

\[ CV = \dfrac{\sigma}{\bar X} \times 100 \]

CV is unit-free and is the standard tool to compare the relative variability of two distributions with different units or magnitudes. The series with the lower CV is more consistent; the series with the higher CV is more variable.

38.10.1 Worked example

Two batsmen’s scores in 10 innings: A — mean 50, SD 10. B — mean 40, SD 10. CV(A) = 20 %; CV(B) = 25 %. Although both have SD = 10, B is relatively more inconsistent.
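The comparison reduces to two divisions, sketched below. The function name is an assumption.

```python
def coefficient_of_variation(sd, mean_value):
    """CV = (sd / mean) * 100, expressed as a percentage."""
    return 100 * sd / mean_value

cv_a = coefficient_of_variation(sd=10, mean_value=50)
cv_b = coefficient_of_variation(sd=10, mean_value=40)

more_consistent = "A" if cv_a < cv_b else "B"
print(cv_a, cv_b, more_consistent)
```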

38.11 Lorenz Curve

The Lorenz Curve, introduced by Max O. Lorenz (1905), is a graphical measure of dispersion: it plots the cumulative percentage of the variable (e.g., income) against the cumulative percentage of the population. Perfect equality is the 45° line of equal distribution; the further the actual curve sags below that line, the greater the inequality.

The Gini coefficient, derived from the Lorenz curve, is the most-cited summary measure of inequality.
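A minimal sketch of both ideas follows, assuming non-negative incomes with a positive total. It builds the Lorenz points from ungrouped incomes and takes the Gini coefficient as one minus twice the area under the Lorenz curve, with the area found by the trapezoid rule. The function names and the income figures are invented for illustration.

```python
def lorenz_points(incomes):
    """Return (population share, income share) points, starting at (0, 0)."""
    ordered = sorted(incomes)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, income in enumerate(ordered, start=1):
        running += income
        points.append((i / n, running / total))
    return points

def gini(incomes):
    """Gini coefficient as 1 minus twice the area under the Lorenz curve."""
    points = lorenz_points(incomes)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between adjacent points
    return 1 - 2 * area

print(gini([25, 25, 25, 25]))   # equal incomes: curve lies on the 45° line
print(gini([10, 20, 30, 40]))   # unequal incomes: curve sags below it
```

Equal incomes give a coefficient of zero (up to rounding), and the coefficient grows as the curve moves away from the line of equal distribution.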

38.12 Box Plot — A Visual Summary

A box plot (John Tukey, 1977) summarises a distribution with five numbers:

Five-Number Summary in a Box Plot

Number	Meaning
Minimum (or lower whisker)	Smallest non-outlier
First quartile $Q_1$	25th percentile
Median $Q_2$	50th percentile
Third quartile $Q_3$	75th percentile
Maximum (or upper whisker)	Largest non-outlier

Outliers are typically defined as values outside the interval from $Q_1 - 1.5 \cdot IQR$ to $Q_3 + 1.5 \cdot IQR$, where $IQR = Q_3 - Q_1$.
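The sketch below assembles the five numbers and the outlier fences for an invented data set. It uses `statistics.quantiles` with its default "exclusive" method; other quartile conventions may give slightly different values, and the function name is an assumption.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Five-number summary with whiskers at the extreme non-outliers."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4)  # default "exclusive" method
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "lower_whisker": inside[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "upper_whisker": inside[-1],
        "outliers": outliers,
    }

summary = box_plot_summary([2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 60])
print(summary)
```

For this data set the fences are at -12 and 36, so 60 is flagged as an outlier and the upper whisker stops at 20.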

38.13 Comparison of Dispersion Measures

Comparison of Dispersion Measures

Measure	Uses all data?	Robust to outliers?	Suitable for algebra?	Best for
Range	No	No	No	Quick rough check
QD	No	Yes	No	Open-ended distributions
MD	Yes	Moderate	Limited (absolute value)	Intuitive teaching
SD / Variance	Yes	No	Yes	Inferential statistics
CV	Yes	No	Yes	Comparison across units

38.14 Worked Numerical

Five observations: 4, 6, 8, 10, 12.

$\bar X = 8$.
Deviations from mean: $-4, -2, 0, 2, 4$. $\sum (X - \bar X)^2 = 16 + 4 + 0 + 4 + 16 = 40$.
$\sigma^2 = 40 / 5 = 8$. $\sigma = \sqrt{8} \approx 2.828$.
$CV = (2.828 / 8) \times 100 \approx 35.36$ %.
Range $= 12 - 4 = 8$; coefficient of range $= 8 / 16 = 0.5$.
Mean deviation about mean = $(4 + 2 + 0 + 2 + 4) / 5 = 2.4$.
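The whole calculation can be reproduced in a few lines. This is a sketch using the population formulas (divisor $n$) from this chapter; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, pvariance

data = [4, 6, 8, 10, 12]

x_bar = mean(data)
variance = pvariance(data)            # sum of squared deviations / n
sigma = sqrt(variance)
cv = sigma / x_bar * 100
value_range = max(data) - min(data)
coeff_range = value_range / (max(data) + min(data))
md_about_mean = sum(abs(x - x_bar) for x in data) / len(data)

print(x_bar, variance, round(sigma, 3), round(cv, 2))
print(value_range, coeff_range, md_about_mean)
```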

38.15 Exam-Pattern MCQs

Q 01

Which of the following is not a property of a good measure of dispersion?

A. Rigidly defined
B. Based on all observations
C. Heavily affected by extreme values
D. Suitable for further algebraic treatment

Correct Option: C

A good measure should not be unduly affected by extreme values.

Q 02

Match each absolute measure of dispersion with its formula:

	Measure		Formula
(i)	Range	(a)	$\sqrt{\sum (X - \bar X)^2 / n}$
(ii)	Quartile deviation	(b)	$\sum |X - \bar X| / n$
(iii)	Mean deviation about mean	(c)	$L - S$
(iv)	Standard deviation	(d)	$(Q_3 - Q_1) / 2$

A. (i)-(c), (ii)-(d), (iii)-(b), (iv)-(a)
B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
C. (i)-(b), (ii)-(c), (iii)-(d), (iv)-(a)
D. (i)-(d), (ii)-(a), (iii)-(c), (iv)-(b)

Correct Option: A

Q 03

Marks of a sample are 4, 6, 8, 10, 12. The standard deviation is approximately:

A. 2.0
B. 2.83
C. 3.16
D. 4.0

Correct Option: B

$\sigma = \sqrt{40/5} = \sqrt{8} \approx 2.83$.

Q 04

Match each measure with its content:

	Measure		Content
(i)	Coefficient of Variation	(a)	Half the inter-quartile range
(ii)	Quartile Deviation	(b)	$\sigma$ as a percentage of the mean
(iii)	Lorenz Curve	(c)	Difference between largest and smallest values
(iv)	Range	(d)	Graphical measure of inequality

A. (i)-(b), (ii)-(a), (iii)-(d), (iv)-(c)
B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
C. (i)-(c), (ii)-(d), (iii)-(b), (iv)-(a)
D. (i)-(d), (ii)-(c), (iii)-(a), (iv)-(b)

Correct Option: A

Q 05

For an approximately normal distribution, the empirical ratio of QD : MD : SD is:

A. 10 : 12 : 15
B. 6 : 8 : 10
C. 1 : 1 : 1
D. 15 : 12 : 10

Correct Option: A

The standard textbook ratio is 10 : 12 : 15, giving $QD \approx \dfrac{2}{3}\sigma$ and $MD \approx \dfrac{4}{5}\sigma$.

Q 06

If a constant 5 is added to every observation in a data set, the standard deviation:

A. Increases by 5
B. Decreases by 5
C. Remains unchanged
D. Increases by 25

Correct Option: C

SD is independent of change of origin; adding a constant shifts the data but not the spread.

Q 07

Arrange the following dispersion measures in increasing order of their typical magnitude for a roughly normal distribution: (i) Mean deviation (ii) Quartile deviation (iii) Standard deviation

A. (ii), (i), (iii)
B. (i), (ii), (iii)
C. (iii), (ii), (i)
D. (ii), (iii), (i)

Correct Option: A

Approximately QD < MD < SD in the ratio 10 : 12 : 15.

Q 08

Two batsmen have the same standard deviation of 10 runs but different means: A averages 50, B averages 40. Which batsman is more consistent?

A. A, who has the lower coefficient of variation
B. B, who has the higher mean
C. Both equally consistent
D. Cannot be determined

Correct Option: A

CV(A) = 10/50 = 20 %; CV(B) = 10/40 = 25 %. Lower CV → more consistent. A is more consistent.

Quick recall

Dispersion = scatter of observations around a central value.
Absolute measures: Range, QD, MD, SD, Variance.
Relative measures: Coefficient of Range, Coefficient of QD, Coefficient of MD, CV.
Range = $L − S$; Coefficient of Range = $(L − S)/(L + S)$.
$Q.D. = (Q_3 - Q_1)/2$; half the span of the middle 50 % of observations; suitable for open-ended classes.
MD is least about the median.
SD = $\sqrt{\sum(X − \bar X)^2 / n}$. Pearson 1893. Independent of origin, depends on scale.
Variance = $\sigma^2$; variances of independent variables add; SDs do not.
CV = (σ / mean) × 100 — unit-free; lower CV → more consistent.
Empirical ratio for normal-like distributions: QD : MD : SD ≈ 10 : 12 : 15, so QD ≈ $\frac{2}{3}\sigma$, MD ≈ $\frac{4}{5}\sigma$.
Lorenz Curve (Lorenz 1905) — graphical inequality measure; Gini coefficient is its summary.
Five-number summary (box plot, Tukey 1977): Min, $Q_1$, Median, $Q_3$, Max.
