flowchart TB
D[Measures of Dispersion] --> AB[Absolute]
D --> REL[Relative]
AB --> R[Range]
AB --> QD[Quartile Deviation]
AB --> MD[Mean Deviation]
AB --> SD[Standard Deviation / Variance]
REL --> CR[Coeff. of Range]
REL --> CQ[Coeff. of QD]
REL --> CM[Coeff. of MD]
REL --> CV[Coeff. of Variation]
classDef default fill:#003366,color:#ffffff,stroke:#ffcc00,stroke-width:3px,rx:10px,ry:10px;
39 Measures of Dispersion
39.1 Concept of Dispersion
Dispersion is the extent to which the observations in a data set are scattered around a central value. A measure of central tendency alone is incomplete: two distributions can share the same mean yet differ markedly in spread. Measures of dispersion answer two questions: “how widely are the data points spread?” and “how reliable is the average as a summary?”. They are classified as absolute (range, quartile deviation, mean deviation and standard deviation, all expressed in the original units, plus variance, which is in squared units) and relative (unit-free ratios suitable for comparing different data sets: coefficient of range, of QD, of MD, and of variation).
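To make the "same mean, different spread" point concrete, here is a minimal sketch using Python's standard `statistics` module. The two data sets (`tight` and `wide`) are hypothetical values chosen only for illustration.

```python
from statistics import mean, pstdev

# Two hypothetical data sets with the same mean but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, data in (("tight", tight), ("wide", wide)):
    # pstdev is the population standard deviation (divisor N).
    print(name, "mean =", mean(data), "SD =", round(pstdev(data), 2))
```

Both series average 50, yet the second is far more scattered, so the mean alone would describe it poorly.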
39.2 Properties of a Good Measure
- Easy to understand and compute.
- Based on all observations (ideal).
- Rigidly defined.
- Capable of further algebraic treatment.
- Not unduly affected by extreme values.
- Less affected by sampling fluctuations.
39.3 Absolute Measures
39.3.1 Range
\[\text{Range} = X_{\max} - X_{\min}\]
Coefficient of Range = \(\frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}}\)
Simple but ignores all middle values; very sensitive to outliers.
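As an illustrative sketch, the range and its coefficient can be computed as below. The helper names `value_range` and `coefficient_of_range` are assumptions made for this example, and the coefficient is only meaningful when the maximum and minimum do not sum to zero.

```python
def value_range(data):
    """Range = largest observation minus smallest observation."""
    return max(data) - min(data)

def coefficient_of_range(data):
    """(max - min) / (max + min); assumes max + min is not zero."""
    hi, lo = max(data), min(data)
    return (hi - lo) / (hi + lo)

scores = [5, 9, 12, 15, 22]
print(value_range(scores))           # 22 - 5 = 17
print(coefficient_of_range(scores))  # 17 / 27, about 0.63

# Changing one extreme value changes the range completely,
# while the middle values play no part at all.
with_outlier = [5, 9, 12, 15, 220]
print(value_range(with_outlier))     # 220 - 5 = 215
```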
39.3.2 Quartile Deviation (Semi-Interquartile Range)
\[QD = \frac{Q_3 - Q_1}{2}\]
Coefficient of QD = \(\frac{Q_3 - Q_1}{Q_3 + Q_1}\)
The Interquartile Range (IQR) = Q₃ − Q₁ is used in boxplots.
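A short sketch of these formulas follows, assuming the quartiles are located with the (n + 1)/4 positional rule, which is what `statistics.quantiles` does with its default `method="exclusive"`. Other quartile conventions exist and can give slightly different values for small samples, so treat the numbers as one convention's answer rather than the only one.

```python
from statistics import quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4)  # default method="exclusive"

iqr = q3 - q1                     # interquartile range
qd = iqr / 2                      # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)  # coefficient of QD

print(q1, q3)    # 4.0 6.5
print(iqr, qd)   # 2.5 1.25
print(round(coeff_qd, 3))
```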
39.3.3 Mean Deviation (Average Deviation)
\[MD = \frac{\sum |X - \bar{X}|}{N}\]
Coefficient of MD = MD / Mean (or MD / Median when the deviations are taken from the median).
Uses absolute deviations; less algebraically tractable than SD.
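The formula can be sketched as follows. The function name `mean_deviation` and its `about` parameter are illustrative assumptions; `about` lets the same code take deviations from the mean or from the median.

```python
from statistics import mean, median

def mean_deviation(data, about=mean):
    """Average absolute deviation from a chosen centre (mean by default)."""
    centre = about(data)
    return sum(abs(x - centre) for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]

md_mean = mean_deviation(data)                  # deviations from the mean (5)
md_median = mean_deviation(data, about=median)  # deviations from the median (4.5)
coeff_md = md_mean / mean(data)                 # coefficient of MD about the mean

print(md_mean, md_median, coeff_md)
```

For this particular data set the two versions happen to coincide; in general they differ.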
39.3.4 Variance and Standard Deviation
The most important measure of dispersion is the standard deviation (σ), the root-mean-square deviation of the observations from the arithmetic mean (AM):
\[\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} \quad \text{or} \quad s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}\]
(N for population; n−1 for sample — Bessel’s correction).
Variance = σ². The term “standard deviation” was introduced by Karl Pearson in 1893.
- Always non-negative; zero only if all observations are equal.
- Based on all observations.
- Affected by extreme values.
- Independent of change of origin: adding a constant to every observation leaves σ unchanged.
- Dependent on change of scale: multiplying every observation by k multiplies σ by |k|.
- Most rigorously defined and used in further statistical inference.
- For approximately normal (bell-shaped) distributions, about 68% of the data lie within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ (the Empirical Rule).
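The divisor choice and the origin and scale properties listed above can be checked numerically. This is a minimal sketch using the standard `statistics` module, where `pstdev` divides by N and `stdev` divides by n − 1; the shift of 5 and the factor of −4 are arbitrary illustrative choices.

```python
import math
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(data)  # population SD, divisor N
s = stdev(data)       # sample SD, divisor n - 1 (Bessel's correction)
print(sigma)          # 2.0
print(s > sigma)      # True: the smaller divisor gives a larger value

# Change of origin: adding a constant leaves the SD unchanged.
shifted = [x + 5 for x in data]
print(math.isclose(pstdev(shifted), sigma))     # True

# Change of scale: multiplying by k multiplies the SD by |k|.
scaled = [-4 * x for x in data]
print(math.isclose(pstdev(scaled), 4 * sigma))  # True
```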
39.3.5 Combined SD
For two groups with sizes n₁, n₂, means x̄₁, x̄₂, SDs σ₁, σ₂:
\[\sigma_c = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}\]
where d₁ = x̄₁ − x̄_c, d₂ = x̄₂ − x̄_c, and the combined mean is x̄_c = (n₁x̄₁ + n₂x̄₂)/(n₁ + n₂).
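As a sanity check, the sketch below computes the combined SD from group summaries and compares it with the SD of the pooled raw data. It assumes the group SDs are population SDs (divisor N); the function name `combined_sd` and the two small groups are hypothetical.

```python
import math
from statistics import mean, pstdev

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined SD of two groups from their sizes, means and population SDs."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    variance = (n1 * (sd1**2 + d1**2) + n2 * (sd2**2 + d2**2)) / (n1 + n2)
    return math.sqrt(variance)

group1 = [2, 4, 6]
group2 = [10, 12, 14, 16]

from_summaries = combined_sd(
    len(group1), mean(group1), pstdev(group1),
    len(group2), mean(group2), pstdev(group2),
)
from_pooled = pstdev(group1 + group2)  # SD of all seven values together

print(math.isclose(from_summaries, from_pooled))  # True
```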
39.4 Relative Measures
39.4.1 Coefficient of Variation (CV)
\[CV = \frac{\sigma}{\bar{X}} \times 100\]
Most popular relative measure; expressed as %. Lower CV → more consistent / less variable.
39.4.2 When to Compare Using CV
- Two data sets with different units.
- Different means but possibly similar absolute spread.
- Investment portfolios — risk per unit of return.
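The comparison can be sketched as below, assuming the population SD is used in the numerator. The helper `coefficient_of_variation` and the two series (`heights_cm`, `weights_kg`) are hypothetical; the CV is only meaningful when the mean is positive.

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """CV in percent: population SD divided by the mean, times 100."""
    return pstdev(data) / mean(data) * 100

# Hypothetical series measured in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(round(cv_heights, 2), round(cv_weights, 2))
print("more consistent:", "heights" if cv_heights < cv_weights else "weights")
```

Because the CV is unit-free, the two percentages can be compared directly even though the SDs (in cm and kg) cannot.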
39.5 Lorenz Curve and Gini Coefficient
A Lorenz curve plots the cumulative percentage of the population (x-axis) against the cumulative percentage of the variable (income, wealth, output) on the y-axis. The 45-degree line represents perfect equality; the farther the curve bows below that line, the more unequal the distribution.
The Gini coefficient = \(\frac{\text{Area between Lorenz curve and equality line}}{\text{Total area under equality line}}\). Ranges from 0 (perfect equality) to 1 (perfect inequality).
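The area ratio can be approximated from raw data by building the Lorenz curve point by point and applying the trapezoid rule. This is a sketch under that assumption; the function names `lorenz_points` and `gini` and the income lists are illustrative.

```python
def lorenz_points(values):
    """Cumulative (population share, variable share) points, starting at (0, 0)."""
    ordered = sorted(values)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / n, running / total))
    return points

def gini(values):
    """Area between equality line and Lorenz curve, over area under equality line."""
    pts = lorenz_points(values)
    # Trapezoid rule for the area under the piecewise-linear Lorenz curve.
    area_under_lorenz = sum(
        (x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )
    area_under_equality = 0.5
    return (area_under_equality - area_under_lorenz) / area_under_equality

equal_incomes = [10, 10, 10, 10]
one_holder = [0, 0, 0, 100]

print(gini(equal_incomes))  # 0.0
print(gini(one_holder))     # 0.75
```

With only four units, the most unequal case gives 0.75 rather than 1; computed this way, the maximum is 1 − 1/n, which approaches 1 as the population grows.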
Previous-year questions (PYQs) often test the distinction between variance and standard deviation: variance = σ² is in squared units, while standard deviation = σ is in the original units.
39.6 Practice Questions
Range of 5, 9, 12, 15, 22 is:
Q₁ = 20, Q₃ = 40. Quartile deviation:
If σ = 5, variance is:
Standard deviation was introduced by:
If 5 is added to every observation, the standard deviation will:
If every observation is multiplied by 4, the SD will:
Mean = 50; SD = 10. Coefficient of Variation:
For comparing variability of two data sets with different means and units, the best measure is:
For a normal distribution, approximately what percentage of data lies within ±2σ of the mean?
The (n − 1) divisor in *sample* variance is called:
The most sensitive measure of dispersion to outliers is:
Which measure of dispersion **uses all observations**?
SD of 2, 4, 4, 4, 5, 5, 7, 9 (population) is approximately:
The Gini coefficient ranges between:
A *more consistent* series has:
Mean Deviation is calculated as:
The sum of squared deviations is minimum when measured from:
A Lorenz curve closer to the 45° line indicates:
The most widely used measure of dispersion is:
The Interquartile Range (IQR) is:
39.7 Quick Recall
- Dispersion — spread of data around centre. Absolute vs Relative measures.
- Range = Max − Min; coeff = (Max − Min)/(Max + Min).
- Quartile Deviation = (Q₃ − Q₁)/2; IQR = Q₃ − Q₁.
- Mean Deviation = Σ|X − X̄|/N.
- Standard Deviation (Karl Pearson 1893) = √[Σ(X − X̄)²/N]; Variance = σ². Sample uses n−1 (Bessel’s correction).
- SD properties: non-negative; independent of origin (add constant); proportional to scale (multiply by k).
- Empirical rule for normal: 68/95/99.7 within 1/2/3σ.
- CV = σ/X̄ × 100 — for unit-free comparison; lower CV = more consistent.
- Lorenz curve — cumulative % of variable vs population; Gini ∈ [0,1] — 0 equality, 1 max inequality.
- Σ(X − A)² minimum at A = mean.
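The last recall item can be justified with a short decomposition, sketched here for N observations and an arbitrary origin A:
\[\sum (X - A)^2 = \sum (X - \bar{X})^2 + 2(\bar{X} - A)\sum (X - \bar{X}) + N(\bar{X} - A)^2 = \sum (X - \bar{X})^2 + N(\bar{X} - A)^2\]
The middle term vanishes because \(\sum (X - \bar{X}) = 0\), and the remaining extra term \(N(\bar{X} - A)^2\) is non-negative and zero only when \(A = \bar{X}\), so the sum of squared deviations is smallest when measured from the mean.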