39  Measures of dispersion

39.1 Concept of Dispersion

Dispersion is the extent to which observations in a data set scatter around a central value. A measure of central tendency alone is incomplete: two distributions can share the same mean yet differ markedly in spread. Dispersion measures answer two questions: “how widely are the data points spread?” and “how reliable is the average as a summary?”

Dispersion measures are classified as absolute or relative. Absolute measures are expressed in the original units (or, for variance, their square): range, quartile deviation, mean deviation, standard deviation, variance. Relative measures are unit-free ratios suitable for comparing different data sets: the coefficients of range, of quartile deviation (QD), of mean deviation (MD), and of variation.
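The claim that two distributions can share a mean yet differ in spread can be checked numerically. The sketch below uses two made-up data sets (`tight` and `wide` are illustrative names, not from the text) and Python's standard `statistics` module.

```python
from statistics import mean, pstdev

# Two hypothetical data sets chosen to have the same mean.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

mean_tight, mean_wide = mean(tight), mean(wide)
sd_tight, sd_wide = pstdev(tight), pstdev(wide)   # population SDs

print(mean_tight, mean_wide)   # both means are 50
print(sd_tight, sd_wide)       # the second spread is far larger
```

The mean alone cannot tell the two sets apart; a dispersion measure can.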

39.2 Properties of a Good Measure

Tip: Properties of a Good Measure of Dispersion
  • Easy to understand and compute.
  • Based on all observations (ideal).
  • Rigidly defined.
  • Capable of further algebraic treatment.
  • Not unduly affected by extreme values.
  • Less affected by sampling fluctuations.

39.3 Absolute Measures

39.3.1 Range

\[\text{Range} = X_{\max} - X_{\min}\]

Coefficient of Range = \(\frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}}\)

Simple but ignores all middle values; very sensitive to outliers.
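As a minimal sketch of the two formulas above, the following functions compute the range and its coefficient. The function names and the sample data are illustrative assumptions.

```python
def value_range(data):
    """Absolute range: largest value minus smallest value."""
    return max(data) - min(data)


def coefficient_of_range(data):
    """Relative range: (max - min) / (max + min), a unit-free ratio."""
    hi, lo = max(data), min(data)
    return (hi - lo) / (hi + lo)


scores = [5, 9, 12, 15, 22]              # illustrative data
print(value_range(scores))               # 22 - 5 = 17
print(coefficient_of_range(scores))      # 17 / 27

# A single extreme value changes the range drastically.
with_outlier = scores + [100]
print(value_range(with_outlier))         # 100 - 5 = 95
```

Only the two extreme observations enter the calculation, which is why one outlier moves the result from 17 to 95.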

39.3.2 Quartile Deviation (Semi-Interquartile Range)

\[QD = \frac{Q_3 - Q_1}{2}\]

Coefficient of QD = \(\frac{Q_3 - Q_1}{Q_3 + Q_1}\)

The Interquartile Range (IQR) = Q₃ − Q₁ is used in boxplots.
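The quartile-based measures can be sketched with `statistics.quantiles` from the Python standard library. Note that several conventions exist for locating quartiles, so textbook answers may differ slightly from software output; the data below are illustrative and the default "exclusive" method is assumed.

```python
from statistics import quantiles

data = [10, 20, 30, 40, 50, 60, 70]      # illustrative data

# n=4 splits the data into quartiles; the default method is "exclusive".
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1                            # interquartile range
qd = iqr / 2                             # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)         # coefficient of QD

print(q1, q3)        # 20.0 60.0
print(iqr, qd)       # 40.0 20.0
print(coeff_qd)      # 0.5
```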

39.3.3 Mean Deviation (Average Deviation)

\[MD = \frac{\sum |X - \bar{X}|}{N}\]

Coefficient of MD = MD / Mean (or Median).

Uses absolute deviations; less algebraically tractable than SD.
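A minimal sketch of the mean deviation follows. The helper `mean_deviation` and its optional `center` argument are illustrative assumptions; they let the deviations be taken about the mean (the formula above) or about the median.

```python
from statistics import mean, median


def mean_deviation(data, center=None):
    """Average absolute deviation about `center` (defaults to the mean)."""
    c = mean(data) if center is None else center
    return sum(abs(x - c) for x in data) / len(data)


data = [2, 4, 4, 4, 5, 5, 7, 9]                  # illustrative data

md_mean = mean_deviation(data)                   # about the mean (5)
md_median = mean_deviation(data, median(data))   # about the median (4.5)
coeff_md = md_mean / mean(data)                  # coefficient of MD

print(md_mean, md_median)    # 1.5 1.5
print(coeff_md)              # 0.3
```

For this particular data set the two versions happen to coincide; in general they differ.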

39.3.4 Variance and Standard Deviation

The most important measure of dispersion is the standard deviation (σ), the root-mean-square deviation from the arithmetic mean (AM):

\[\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} \quad \text{or} \quad s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}\]

(N for population; n−1 for sample — Bessel’s correction).

Variance = σ². The term “standard deviation” was introduced by Karl Pearson in 1893.
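The two divisors can be compared side by side. This sketch computes both versions by hand and checks them against the standard library's `pstdev` (population) and `stdev` (sample); the data are illustrative.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]              # illustrative data
n = len(data)
mean = sum(data) / n                         # 5.0
ss = sum((x - mean) ** 2 for x in data)      # sum of squared deviations = 32.0

sigma = sqrt(ss / n)          # population SD: divisor N
s = sqrt(ss / (n - 1))        # sample SD: divisor n - 1 (Bessel's correction)

print(sigma, pstdev(data))    # both are the population SD
print(s, stdev(data))         # both are the sample SD
print(pvariance(data))        # population variance = sigma squared
```

The sample version is always slightly larger because it divides the same sum of squares by a smaller number.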

Tip: Properties of Standard Deviation
  • Always non-negative; zero only if all observations are equal.
  • Based on all observations.
  • Affected by extreme values.
  • Independent of change of origin: adding a constant to every observation does not change σ.
  • Dependent on change of scale: multiplying every observation by k multiplies σ by |k|.
  • Most rigorously defined and used in further statistical inference.
  • For normal (bell-shaped) distributions, about 68% of the data lie within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ (the Empirical Rule). Symmetry alone does not guarantee these percentages.
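The origin, scale, and Empirical Rule properties listed above can be verified with a short sketch. The data and the constants 5 and −4 are illustrative; `NormalDist` is the standard normal distribution from Python's `statistics` module.

```python
from statistics import NormalDist, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]          # illustrative data
base = pstdev(data)

shifted = [x + 5 for x in data]          # change of origin
scaled = [-4 * x for x in data]          # change of scale with k = -4

sd_shifted = pstdev(shifted)             # equals base
sd_scaled = pstdev(scaled)               # equals |k| * base = 4 * base
print(base, sd_shifted, sd_scaled)

# Empirical Rule: share of a normal distribution within k SDs of the mean.
z = NormalDist()                         # mean 0, SD 1
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(k, round(share * 100, 2))      # roughly 68, 95 and 99.7 percent
```

Using a negative k shows why the rule is stated with |k|: the SD cannot become negative.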

39.3.5 Combined SD

For two groups with sizes n₁, n₂, means x̄₁, x̄₂, SDs σ₁, σ₂:

\[\sigma_c = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}\]

where d₁ = x̄₁ − x̄_c, d₂ = x̄₂ − x̄_c, and x̄_c = (n₁x̄₁ + n₂x̄₂)/(n₁ + n₂) is the combined mean.
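A quick way to trust the combined-SD formula is to compare it with the SD computed directly from the pooled observations. The sketch below does this for two small made-up groups; `combined_sd` is an illustrative helper, and population SDs (divisor N) are assumed throughout.

```python
from math import sqrt
from statistics import mean, pstdev


def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined SD of two groups from their sizes, means and population SDs."""
    mean_c = (n1 * mean1 + n2 * mean2) / (n1 + n2)     # combined mean
    d1, d2 = mean1 - mean_c, mean2 - mean_c
    total = n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)
    return sqrt(total / (n1 + n2))


group1 = [2, 4, 6]                   # illustrative data
group2 = [10, 12, 14, 16]

via_formula = combined_sd(
    len(group1), mean(group1), pstdev(group1),
    len(group2), mean(group2), pstdev(group2),
)
direct = pstdev(group1 + group2)     # SD of the pooled observations

print(via_formula, direct)           # the two values agree
```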

39.4 Relative Measures

39.4.1 Coefficient of Variation (CV)

\[CV = \frac{\sigma}{\bar{X}} \times 100\]

Most popular relative measure; expressed as %. Lower CV → more consistent / less variable.

39.4.2 When to Compare Using CV

  • Two data sets with different units.
  • Different means but possibly similar absolute spread.
  • Investment portfolios — risk per unit of return.
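The first case in the list, data sets with different units, can be illustrated with a short sketch. The height and weight figures are made up, and `coefficient_of_variation` is an illustrative helper that assumes a positive mean and uses the population SD.

```python
from statistics import mean, pstdev


def coefficient_of_variation(data):
    """CV in percent: population SD divided by the mean, times 100."""
    return pstdev(data) / mean(data) * 100


heights_cm = [160, 165, 170, 175, 180]   # hypothetical heights
weights_kg = [50, 60, 70, 80, 90]        # hypothetical weights

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

# SDs in cm and kg cannot be compared directly; the unit-free CVs can.
print(round(cv_height, 2), round(cv_weight, 2))
more_consistent = "heights" if cv_height < cv_weight else "weights"
print(more_consistent)
```

The series with the lower CV is the more consistent one, whatever its units.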

39.5 Lorenz Curve and Gini Coefficient

A Lorenz curve plots the cumulative percentage of the population (x-axis) against the cumulative percentage of a variable such as income, wealth, or output (y-axis). The 45-degree line represents perfect equality; the more bowed the curve, the more unequal the distribution.

The Gini coefficient = \(\frac{\text{Area between Lorenz curve and equality line}}{\text{Total area under equality line}}\). Ranges from 0 (perfect equality) to 1 (perfect inequality).
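The area ratio above can be approximated from raw data. The sketch below builds Lorenz curve points and measures the area under the curve with the trapezoid rule; the function names and income figures are illustrative assumptions, and values are assumed non-negative with a positive total.

```python
def lorenz_points(values):
    """(cumulative population share, cumulative value share) pairs."""
    ordered = sorted(values)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini = area between equality line and Lorenz curve / area under line."""
    pts = lorenz_points(values)
    area_under_lorenz = sum(
        (x1 - x0) * (y0 + y1) / 2             # trapezoid rule per segment
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )
    return (0.5 - area_under_lorenz) / 0.5    # area under the 45° line is 0.5


equal = [10, 10, 10, 10]           # everyone has the same income
one_has_all = [0, 0, 0, 100]       # one person holds everything

print(gini(equal))                 # 0.0
print(gini(one_has_all))           # 0.75
```

With a finite number of units n, this discrete version peaks at (n − 1)/n rather than exactly 1; the value 1 is the limiting case for a very large population.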

Note: Distractor warning

Previous-year questions (PYQs) often test whether you confuse variance with standard deviation. Variance = σ², in squared units; standard deviation = σ, in the original units.

flowchart TB
  D[Measures of Dispersion] --> AB[Absolute]
  D --> REL[Relative]
  AB --> R[Range]
  AB --> QD[Quartile Deviation]
  AB --> MD[Mean Deviation]
  AB --> SD[Standard Deviation / Variance]
  REL --> CR[Coeff. of Range]
  REL --> CQ[Coeff. of QD]
  REL --> CM[Coeff. of MD]
  REL --> CV[Coeff. of Variation]
    classDef default fill:#003366,color:#ffffff,stroke:#ffcc00,stroke-width:3px,rx:10px,ry:10px;

39.6 Practice Questions

Q 01 Range Easy

Range of 5, 9, 12, 15, 22 is:

  • A. 5
  • B. 17
  • C. 22
  • D. 63

Solution
Correct Option: B
22 − 5 = **17**.

Q 02 QD Medium

Q₁ = 20, Q₃ = 40. Quartile deviation:

  • A. 10
  • B. 20
  • C. 30
  • D. 60

Solution
Correct Option: A
QD = (40 − 20)/2 = **10**.

Q 03 Variance Easy

If σ = 5, variance is:

  • A. 5
  • B. 10
  • C. 25
  • D. 2.24

Solution
Correct Option: C
Variance = σ² = **25**.

Q 04 Karl Pearson Medium

Standard deviation was introduced by:

  • A. Galton
  • B. Karl Pearson (1893)
  • C. R.A. Fisher
  • D. Bowley

Solution
Correct Option: B
**Karl Pearson (1893)** coined the term "standard deviation".

Q 05 SD properties Medium

If 5 is added to every observation, the standard deviation will:

  • A. Increase by 5
  • B. Decrease by 5
  • C. Remain unchanged
  • D. Become zero

Solution
Correct Option: C
SD is **independent of change of origin**.

Q 06 Scale Medium

If every observation is multiplied by 4, the SD will:

  • A. Remain the same
  • B. Be multiplied by 4
  • C. Be multiplied by 16
  • D. Be divided by 4

Solution
Correct Option: B
SD depends linearly on scale: σ_new = |k| × σ_old.

Q 07 CV Medium

Mean = 50; SD = 10. Coefficient of Variation:

  • A. 5 %
  • B. 10 %
  • C. 20 %
  • D. 50 %

Solution
Correct Option: C
CV = (10/50) × 100 = **20 %**.

Q 08 Compare Medium

For comparing variability of two data sets with different means and units, the best measure is:

  • A. Range
  • B. Variance
  • C. SD
  • D. Coefficient of Variation

Solution
Correct Option: D
**CV** is unit-free, so it suits comparing dispersion across data sets.

Q 09 Empirical Hard

For a normal distribution, approximately what percentage of data lies within ±2σ of the mean?

  • A. 50 %
  • B. 68 %
  • C. 95 %
  • D. 99.7 %

Solution
Correct Option: C
**95 %** lies within ±2σ (Empirical Rule: 68-95-99.7).

Q 10 Bessel Hard

The (n − 1) divisor in *sample* variance is called:

  • A. Bowley's correction
  • B. Bessel's correction
  • C. Pearson's correction
  • D. Fisher's correction

Solution
Correct Option: B
**Bessel's correction** gives an unbiased estimator of population variance.

Q 11 Range vs SD Medium

The measure of dispersion most sensitive to outliers is:

  • A. QD
  • B. MD
  • C. Range
  • D. CV

Solution
Correct Option: C
**Range** depends only on the two extreme values, so it is the most sensitive.

Q 12 All obs Medium

Which measure of dispersion **uses all observations**?

  • A. Range
  • B. QD
  • C. Mean Deviation and SD
  • D. Only Range

Solution
Correct Option: C
**MD and SD** use all observations; range uses only the two extremes and QD only the quartiles.

Q 13 SD compute Medium

SD of 2, 4, 4, 4, 5, 5, 7, 9 (population) is approximately:

  • A. 2
  • B. 2.5
  • C. 3
  • D. 4

Solution
Correct Option: A
Mean = 5; squared deviations = 9, 1, 1, 1, 0, 0, 4, 16 → Σ = 32; Variance = 32/8 = 4; SD = **2**.

Q 14 Gini Hard

The Gini coefficient ranges between:

  • A. −1 and 1
  • B. 0 and 1
  • C. 0 and 100
  • D. −∞ and ∞

Solution
Correct Option: B
Gini ∈ [0, 1]; 0 = perfect equality, 1 = perfect inequality.

Q 15 Consistency Medium

A *more consistent* series has:

  • A. Higher CV
  • B. Lower CV
  • C. Higher mean
  • D. Lower median

Solution
Correct Option: B
Lower CV → lower relative variability → more consistent.

Q 16 MD Medium

Mean Deviation is calculated as:

  • A. Sum of squared deviations / N
  • B. Sum of absolute deviations / N
  • C. Range / 2
  • D. Max − Min

Solution
Correct Option: B
**MD = Σ|X − X̄| / N**.

Q 17 Min Hard

The sum of squared deviations is minimum when measured from:

  • A. Mode
  • B. Median
  • C. AM
  • D. GM

Solution
Correct Option: C
\(\sum (X - A)^2\) is minimum when A = **mean**.

Q 18 Lorenz Medium

A Lorenz curve closer to the 45° line indicates:

  • A. More inequality
  • B. More equality
  • C. No relation
  • D. High variance

Solution
Correct Option: B
Closer to equality line → distribution is more equal.

Q 19 Best measure Medium

The most widely used measure of dispersion is:

  • A. Range
  • B. QD
  • C. Standard Deviation
  • D. MD

Solution
Correct Option: C
**SD**: most rigorously defined; the foundation of statistical inference.

Q 20 IQR Easy

The Interquartile Range (IQR) is:

  • A. Q₃ + Q₁
  • B. Q₃ − Q₁
  • C. (Q₃ + Q₁) / 2
  • D. (Q₃ − Q₁) / 2

Solution
Correct Option: B
**IQR = Q₃ − Q₁**. QD = IQR / 2.

39.7 Quick Recall

Important: Quick recall
  • Dispersion — spread of data around centre. Absolute vs Relative measures.
  • Range = Max − Min; coeff = (Max − Min)/(Max + Min).
  • Quartile Deviation = (Q₃ − Q₁)/2; IQR = Q₃ − Q₁.
  • Mean Deviation = Σ|X − X̄|/N.
  • Standard Deviation (Karl Pearson 1893) = √[Σ(X − X̄)²/N]; Variance = σ². Sample uses n−1 (Bessel’s correction).
  • SD properties: non-negative; independent of origin (add constant); proportional to scale (multiply by k).
  • Empirical rule for normal: 68/95/99.7 within 1/2/3σ.
  • CV = σ/X̄ × 100 — for unit-free comparison; lower CV = more consistent.
  • Lorenz curve — cumulative % of variable vs population; Gini ∈ [0,1] — 0 equality, 1 max inequality.
  • Σ(X − A)² minimum at A = mean.
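
The last point can be justified in two lines. Writing X − A as (X − X̄) + (X̄ − A) and expanding, with N the number of observations as in the formulas above:

\[\sum (X - A)^2 = \sum (X - \bar{X})^2 + 2(\bar{X} - A)\sum (X - \bar{X}) + N(\bar{X} - A)^2\]

Since deviations from the mean sum to zero, the middle term vanishes:

\[\sum (X - A)^2 = \sum (X - \bar{X})^2 + N(\bar{X} - A)^2 \geq \sum (X - \bar{X})^2\]

Equality holds only when A = X̄, so the sum of squared deviations is smallest when measured from the mean.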