38 Measures of Dispersion
38.1 What is Dispersion?
A measure of dispersion — also called a measure of spread, scatter or variability — captures the degree to which observations differ from one another or from a central value (gupta2021?; elhance2020?). The mean tells us the centre; dispersion tells us how the data are scattered around the centre.
Two data sets can have the same mean but very different dispersions. Consider the marks of two students across five tests:
- Student A: 60, 60, 60, 60, 60 — mean 60, dispersion zero (all same).
- Student B: 40, 50, 60, 70, 80 — mean 60, dispersion high (range 40).
Dispersion turns the bare mean into a useful summary.
38.2 Why Dispersion Matters
Four working uses (gupta2021?):
- Test the reliability of an average — a high-dispersion average is less representative.
- Compare two distributions — different products, regions, time periods.
- Identify extremes and control variation — Six Sigma, statistical process control.
- Set up further statistical analysis — variance, regression, hypothesis tests all rest on dispersion.
38.3 Properties of a Good Measure of Dispersion
Adapted from the criteria of Yule and Kendall:
| Property | Working content |
|---|---|
| Rigidly defined | Single, clear formula |
| Easy to compute and understand | Suitable for general use |
| Based on all observations | Reflects the whole data set |
| Suitable for further algebraic treatment | Combinable across groups |
| Not unduly affected by extreme values | Robust |
| Independent of unit of measurement (relative measure) | Allows comparison across data sets in different units |
38.4 Absolute vs Relative Measures
| Measure | Absolute | Relative |
|---|---|---|
| Range | Range = \(L - S\) | Coefficient of Range = \(\dfrac{L - S}{L + S}\) |
| Quartile Deviation | \(Q.D. = \dfrac{Q_3 - Q_1}{2}\) | Coefficient of QD = \(\dfrac{Q_3 - Q_1}{Q_3 + Q_1}\) |
| Mean Deviation | \(M.D.\) from mean / median | Coefficient of MD = \(\dfrac{M.D.}{\bar X}\) or \(\dfrac{M.D.}{M_d}\) |
| Standard Deviation | \(\sigma\) | Coefficient of Variation \(= \dfrac{\sigma}{\bar X} \times 100\) |
The relative measure is unit-free and is used for comparison across data sets with different magnitudes or different units.
38.5 Range
The range is the difference between the largest and the smallest observation:
\[ \text{Range} = L - S \]
The coefficient of range expresses it relative to the sum:
\[ \text{Coefficient of Range} = \dfrac{L - S}{L + S} \]
Range is the simplest measure but is based on only two values and is highly sensitive to outliers. It is mostly used in quality-control charts (R-chart) and weather statistics.
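A minimal Python sketch of both formulas, using Student B's marks from section 38.1. The function name `range_and_coefficient` and the extra value 200 are illustrative assumptions, added only to show how a single extreme observation moves the range.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) for a non-empty sequence."""
    largest, smallest = max(values), min(values)
    spread = largest - smallest
    # Coefficient of range = (L - S) / (L + S); undefined if L + S == 0.
    coefficient = spread / (largest + smallest)
    return spread, coefficient


marks_b = [40, 50, 60, 70, 80]         # Student B from section 38.1
marks_with_outlier = marks_b + [200]   # hypothetical extra value

spread, coefficient = range_and_coefficient(marks_b)
spread_out, coefficient_out = range_and_coefficient(marks_with_outlier)

print(spread, round(coefficient, 3))          # 40 0.333
print(spread_out, round(coefficient_out, 3))  # 160 0.667
```

One added observation quadruples the range here, which is the outlier sensitivity described above.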
38.6 Quartile Deviation (Semi-Inter-Quartile Range)
The quartile deviation is half the inter-quartile range:
\[ Q.D. = \dfrac{Q_3 - Q_1}{2} \]
The coefficient of QD is unit-free:
\[ \text{Coefficient of QD} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1} \]
QD is half the width of the interval from \(Q_1\) to \(Q_3\), which holds the middle 50 per cent of observations. Because it ignores the tails, it is insensitive to extreme values and suits open-ended distributions (e.g., income brackets with an unbounded top class).
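The sketch below computes QD and its coefficient with Python's standard `statistics.quantiles`. It assumes the "exclusive" method, which places \(Q_1\) at the \((n+1)/4\)-th ordered item; other quartile conventions can give slightly different values. The data are hypothetical.

```python
from statistics import quantiles


def quartile_deviation(values):
    """Return (QD, coefficient of QD) using the (n + 1)-based quartile rule."""
    q1, _median, q3 = quantiles(values, n=4, method="exclusive")
    qd = (q3 - q1) / 2
    coefficient = (q3 - q1) / (q3 + q1)
    return qd, coefficient


data = [10, 20, 30, 40, 50, 60, 70]            # hypothetical observations
data_extreme = [10, 20, 30, 40, 50, 60, 7000]  # same data, extreme top value

qd, coefficient = quartile_deviation(data)
qd_extreme, _ = quartile_deviation(data_extreme)

print(qd, coefficient)  # 20.0 0.5
print(qd_extreme)       # 20.0
```

Replacing the largest value with 7000 leaves QD unchanged, because neither quartile depends on the extreme observation.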
38.7 Mean Deviation
The mean deviation (also called average deviation or mean absolute deviation, MAD) is the arithmetic mean of the absolute deviations from a central value:
\[ M.D._{\bar X} = \dfrac{\sum |X - \bar X|}{n} \quad ; \quad M.D._{M_d} = \dfrac{\sum |X - M_d|}{n} \]
MD is least when computed about the median. The coefficient divides MD by the central value used.
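As an illustration of the claim that MD is smallest about the median, the following sketch evaluates both versions on a small hypothetical data set. The helper name `mean_deviation` is an assumption made for this example.

```python
from statistics import mean, median


def mean_deviation(values, centre):
    """Arithmetic mean of absolute deviations from the chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)


data = [2, 4, 6, 8, 30]  # hypothetical, skewed by one large value

md_about_mean = mean_deviation(data, mean(data))      # centre = 10
md_about_median = mean_deviation(data, median(data))  # centre = 6

print(md_about_mean, md_about_median)  # 8.0 6.4
```

Dividing each result by the centre used (10 or 6) gives the corresponding coefficient of MD.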
38.8 Standard Deviation
The standard deviation — proposed by Karl Pearson (1893) and the most-used measure of dispersion — is the positive square root of the average of squared deviations from the mean:
\[ \sigma = \sqrt{\dfrac{\sum (X - \bar X)^2}{n}} \quad \text{(ungrouped)} \quad ; \quad \sigma = \sqrt{\dfrac{\sum f (X - \bar X)^2}{N}} \quad \text{(frequency)} \]
The step-deviation shortcut for grouped data, where \(m\) is the class mid-point, \(A\) an assumed mean and \(h\) the common class width:
\[ \sigma = h \cdot \sqrt{\dfrac{\sum f d^2}{N} - \left(\dfrac{\sum f d}{N}\right)^2}, \quad d = \dfrac{m - A}{h} \]
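A hedged check that the step-deviation shortcut agrees with the direct frequency formula. The frequency distribution, the assumed mean `A = 25` and the class width `h = 10` are all hypothetical values chosen for the illustration.

```python
from math import sqrt

midpoints = [5, 15, 25, 35, 45]  # m: mid-points of hypothetical classes 0-10 ... 40-50
frequencies = [3, 5, 8, 3, 1]    # f: hypothetical class frequencies
A, h = 25, 10                    # assumed mean and common class width

N = sum(frequencies)
d = [(m - A) / h for m in midpoints]  # step deviations

sum_fd = sum(f * di for f, di in zip(frequencies, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(frequencies, d))
sigma_step = h * sqrt(sum_fd2 / N - (sum_fd / N) ** 2)

# Direct formula: sqrt(sum f (m - mean)^2 / N)
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / N
sigma_direct = sqrt(
    sum(f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)) / N
)

print(round(sigma_step, 4), round(sigma_direct, 4))  # 10.5357 10.5357
```

Here \(\sum fd = -6\) and \(\sum fd^2 = 24\) with \(N = 20\), so \(\sigma = 10\sqrt{1.2 - 0.09} = 10\sqrt{1.11}\).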
38.8.1 Properties of standard deviation
| Property | Working content |
|---|---|
| Always non-negative | \(\sigma \geq 0\); zero only if all values are equal |
| Independent of change of origin | Adding a constant to every value leaves \(\sigma\) unchanged |
| Affected by change of scale | Multiplying every value by \(k\) multiplies \(\sigma\) by \(\lvert k \rvert\) |
| Suitable for further algebra | Used in variance, covariance, regression, ANOVA |
| Affected by extreme values | Squared deviations magnify outlier effect |
| Combined SD of two groups | \(\sigma_c = \sqrt{\dfrac{n_1 (\sigma_1^2 + d_1^2) + n_2 (\sigma_2^2 + d_2^2)}{n_1 + n_2}}\) where \(d_i = \bar X_i - \bar X_c\) and \(\bar X_c\) is the combined mean |
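The combined-SD formula in the last row can be checked against a direct calculation on the pooled data. In this sketch the two groups are hypothetical and the function name `combined_sd` is an assumption. `statistics.pstdev` is the population SD (divisor \(n\)), matching the definition used in this chapter.

```python
from math import sqrt
from statistics import mean, pstdev


def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined SD of two groups from their sizes, means and SDs."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1, d2 = mean1 - combined_mean, mean2 - combined_mean
    numerator = n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)
    return sqrt(numerator / (n1 + n2))


group_1 = [4, 6, 8]  # hypothetical group
group_2 = [10, 12]   # hypothetical group

from_formula = combined_sd(
    len(group_1), mean(group_1), pstdev(group_1),
    len(group_2), mean(group_2), pstdev(group_2),
)
from_pooled_data = pstdev(group_1 + group_2)

print(round(from_formula, 4), round(from_pooled_data, 4))  # 2.8284 2.8284
```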
38.8.2 Empirical relations among the three measures (normal-like distributions)
For a roughly normal distribution, approximate ratios apply (gupta2021?):
\[ Q.D. : M.D. : \sigma \;\approx\; 10 : 12 : 15 \]
So \(Q.D. \approx \frac{2}{3} \sigma\) and \(M.D. \approx \frac{4}{5} \sigma\). These are approximations rather than exact identities, but they are handy for exam questions.
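To see how close these rules of thumb are, the sketch below evaluates both ratios for an exactly normal distribution. It relies on two standard results not stated in the text: by symmetry, QD of a standard normal equals its third quartile, and its mean deviation about the mean is \(\sqrt{2/\pi}\,\sigma\).

```python
from math import pi, sqrt
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# For N(0, 1): Q1 = -Q3, so QD = (Q3 - Q1) / 2 = Q3.
qd_over_sigma = standard_normal.inv_cdf(0.75)

# Mean absolute deviation of N(0, 1) about its mean (standard result).
md_over_sigma = sqrt(2 / pi)

print(round(qd_over_sigma, 4), round(2 / 3, 4))  # 0.6745 0.6667
print(round(md_over_sigma, 4), 0.8)              # 0.7979 0.8
```

Both ratios land within about 0.01 of the textbook fractions.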
38.9 Variance
The variance is the square of the standard deviation:
\[ \sigma^2 = \dfrac{\sum (X - \bar X)^2}{n} \]
Variance is expressed in squared units of the data, which is why \(\sigma\) is more often reported. But variance is the workhorse of inferential statistics: the variance of a sum of independent variables equals the sum of their variances, whereas standard deviations do not add in this way.
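The additivity of variance can be checked exactly on a small example. The sketch below is an illustration chosen for this purpose, not taken from the text. It enumerates all 36 equally likely outcomes of two independent fair dice and compares the variance of their sum with the sum of the individual variances.

```python
from itertools import product
from statistics import pstdev, pvariance

die = [1, 2, 3, 4, 5, 6]                          # one fair die
sums = [a + b for a, b in product(die, die)]      # all 36 outcomes of two dice

var_die, var_sum = pvariance(die), pvariance(sums)
sd_die, sd_sum = pstdev(die), pstdev(sums)

print(round(var_die, 3), round(var_sum, 3))  # 2.917 5.833
print(round(sd_sum, 3), round(2 * sd_die, 3))  # 2.415 3.416
```

The variance of the sum is exactly twice the variance of one die, while the SD of the sum is smaller than the sum of the two SDs.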
38.10 Coefficient of Variation (CV)
The coefficient of variation is the standard deviation expressed as a percentage of the mean:
\[ CV = \dfrac{\sigma}{\bar X} \times 100 \]
CV is unit-free and is the standard tool to compare the relative variability of two distributions with different units or magnitudes. The series with the lower CV is more consistent; the series with the higher CV is more variable.
38.10.1 Worked example
Two batsmen’s scores in 10 innings: A — mean 50, SD 10. B — mean 40, SD 10. CV(A) = 20 %; CV(B) = 25 %. Although both have SD = 10, B is relatively more inconsistent.
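The same comparison as a short Python sketch, with the function name `coefficient_of_variation` assumed for illustration:

```python
def coefficient_of_variation(sd, mean_value):
    """SD expressed as a percentage of the mean."""
    return 100 * sd / mean_value


cv_a = coefficient_of_variation(sd=10, mean_value=50)
cv_b = coefficient_of_variation(sd=10, mean_value=40)

print(cv_a, cv_b)  # 20.0 25.0
more_consistent = "A" if cv_a < cv_b else "B"
print(more_consistent)  # A
```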
38.11 Lorenz Curve
The Lorenz Curve, introduced by Max O. Lorenz (1905), is a graphical measure of dispersion — it plots the cumulative percentage of variable (e.g., income) against the cumulative percentage of population. Perfect equality is the 45° line of equal distribution; the further the actual curve sags below, the greater the inequality.
The Gini coefficient, derived from the Lorenz curve, is the most-cited summary measure of inequality.
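A minimal sketch of how Lorenz-curve points and a Gini coefficient can be computed from raw incomes. It assumes the common definition \(G = 1 - 2 \times\) (area under the Lorenz curve), with the area taken by the trapezoid rule. This definition, the function names and the income figures are assumptions, not taken from the text.

```python
def lorenz_points(incomes):
    """Cumulative (population share, income share) points, starting at (0, 0)."""
    ordered = sorted(incomes)
    total, n = sum(ordered), len(ordered)
    points, running = [(0.0, 0.0)], 0
    for i, income in enumerate(ordered, start=1):
        running += income
        points.append((i / n, running / total))
    return points


def gini_from_lorenz(points):
    """Gini = 1 - 2 * area under the Lorenz curve (trapezoid rule)."""
    area = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return 1 - 2 * area


gini_equal = gini_from_lorenz(lorenz_points([10, 10, 10, 10]))  # hypothetical
gini_unequal = gini_from_lorenz(lorenz_points([0, 0, 0, 100]))  # hypothetical

print(round(gini_equal, 4), round(gini_unequal, 4))  # 0.0 0.75
```

With equal incomes the Lorenz curve lies on the 45° line and the coefficient is zero. Concentrating all income in one of four people pushes it to 0.75.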
38.12 Box Plot — A Visual Summary
A box plot (John Tukey, 1977) summarises a distribution with five numbers:
| Number | Meaning |
|---|---|
| Minimum (or lower whisker) | Smallest non-outlier |
| First quartile \(Q_1\) | 25th percentile |
| Median \(Q_2\) | 50th percentile |
| Third quartile \(Q_3\) | 75th percentile |
| Maximum (or upper whisker) | Largest non-outlier |
Outliers are typically defined as values outside \(Q_1 - 1.5 \cdot IQR\) to \(Q_3 + 1.5 \cdot IQR\).
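The five numbers and the 1.5 × IQR fences can be sketched as follows. The quartile convention (`method="exclusive"` in `statistics.quantiles`), the function name and the data are assumptions. Plotting libraries may use a different quartile rule and therefore give slightly different whiskers.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Five-number summary with whiskers at the most extreme non-outliers."""
    ordered = sorted(values)
    q1, median, q3 = quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "lower_whisker": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": inside[-1],
        "outliers": outliers,
    }


summary = box_plot_summary([10, 20, 30, 40, 50, 60, 200])  # hypothetical data
print(summary)
```

For this data the fences are \(20 - 60 = -40\) and \(60 + 60 = 120\), so 200 is flagged as an outlier and the upper whisker stops at 60.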
38.13 Comparison of Dispersion Measures
| Measure | Uses all data? | Robust to outliers? | Suitable for algebra? | Best for |
|---|---|---|---|---|
| Range | No | No | No | Quick rough check |
| QD | No | Yes | No | Open-ended distributions |
| MD | Yes | Moderate | Limited (absolute value) | Intuitive teaching |
| SD / Variance | Yes | No | Yes | Inferential statistics |
| CV | Yes | No | Yes | Comparison across units |
38.14 Worked Numerical
Five observations: 4, 6, 8, 10, 12.
- \(\bar X = 8\).
- Deviations from mean: \(-4, -2, 0, 2, 4\). \(\sum (X - \bar X)^2 = 16 + 4 + 0 + 4 + 16 = 40\).
- \(\sigma^2 = 40 / 5 = 8\). \(\sigma = \sqrt{8} \approx 2.828\).
- \(CV = (2.828 / 8) \times 100 \approx 35.36\ \%\).
- Range \(= 12 - 4 = 8\); coefficient of range \(= 8 / 16 = 0.5\).
- Mean deviation about mean = \((4 + 2 + 0 + 2 + 4) / 5 = 2.4\).
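The hand calculation above can be cross-checked with Python's standard `statistics` module. This is only a verification sketch: `pvariance` and `pstdev` use the divisor \(n\), which matches the formulas in this chapter.

```python
from statistics import mean, pstdev, pvariance

data = [4, 6, 8, 10, 12]

x_bar = mean(data)
variance = pvariance(data)   # divisor n
sigma = pstdev(data)
cv = 100 * sigma / x_bar
data_range = max(data) - min(data)
coefficient_of_range = data_range / (max(data) + min(data))
md_about_mean = sum(abs(x - x_bar) for x in data) / len(data)

print(x_bar, variance, round(sigma, 3))  # 8 8 2.828
print(round(cv, 2))                      # 35.36
print(data_range, coefficient_of_range)  # 8 0.5
print(md_about_mean)                     # 2.4
```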
38.15 Exam-Pattern MCQs
| | Measure | | Formula |
|---|---|---|---|
| (i) | Range | (a) | \(\sqrt{\sum (X - \bar X)^2 / n}\) |
| (ii) | Quartile deviation | (b) | \(\sum \lvert X - \bar X \rvert / n\) |
| (iii) | Mean deviation about mean | (c) | \(L - S\) |
| (iv) | Standard deviation | (d) | \((Q_3 - Q_1) / 2\) |
| | Measure | | Content |
|---|---|---|---|
| (i) | Coefficient of Variation | (a) | Half the inter-quartile range |
| (ii) | Quartile Deviation | (b) | \(\sigma\) as a percentage of the mean |
| (iii) | Lorenz Curve | (c) | Difference between largest and smallest values |
| (iv) | Range | (d) | Graphical measure of inequality |
- Dispersion = scatter of observations around a central value.
- Absolute measures: Range, QD, MD, SD, Variance.
- Relative measures: Coefficient of Range, Coefficient of QD, Coefficient of MD, CV.
- Range = \(L - S\); Coefficient of Range = \((L - S)/(L + S)\).
- \(Q.D. = (Q_3 - Q_1)/2\); based on the middle 50 % of observations; suitable for open-ended classes.
- MD is least about the median.
- SD = \(\sqrt{\sum(X - \bar X)^2 / n}\). Pearson 1893. Independent of origin, depends on scale.
- Variance = \(\sigma^2\); the variance of a sum of independent variables is the sum of their variances; SDs do not add this way.
- CV = (σ / mean) × 100 — unit-free; lower CV → more consistent.
- Empirical ratio for normal-like distributions: QD : MD : SD ≈ 10 : 12 : 15, so QD ≈ \(\frac{2}{3}\sigma\), MD ≈ \(\frac{4}{5}\sigma\).
- Lorenz Curve (Lorenz 1905) — graphical inequality measure; Gini coefficient is its summary.
- Five-number summary (box plot, Tukey 1977): Min, \(Q_1\), Median, \(Q_3\), Max.