47  Hypothesis testing: z-test; t-test; ANOVA; Chi-square test; Mann-Whitney test (U-test); Kruskal-Wallis test (H-test); Rank correlation test

47.1 Concept of Hypothesis Testing

A statistical hypothesis test is a decision rule that uses sample data to choose between two competing hypotheses about a population parameter. The framework, devised by Jerzy Neyman and Egon Pearson (1933) building on R.A. Fisher’s work, sets up a null hypothesis (H₀) that we try to reject and an alternative (H₁) that we want to demonstrate. Tests fall into two families: parametric tests (z, t, F/ANOVA), which assume a distributional form, usually normal, and non-parametric or distribution-free tests (Mann-Whitney U, Kruskal-Wallis H, Wilcoxon, sign test, run test, Spearman rank correlation), which make fewer distributional assumptions. The chi-square test is grouped with the parametric tests in this chapter, although many textbooks class it as non-parametric because it does not assume a normally distributed population.

47.2 Five-Step Procedure

Tip: Five Steps of Hypothesis Testing
  1. State H₀ and H₁ — null and alternative.
  2. Choose level of significance α — typically 0.05 or 0.01.
  3. Select test statistic — z, t, F, χ², U, H, etc.
  4. Compute test statistic and find critical value (or p-value).
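  5. Decide: Reject H₀ if the test statistic falls in the critical region (for example, |statistic| > critical value in a two-tailed test) or, equivalently, if p < α; else fail to reject.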
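
The five steps can be sketched in Python for a two-tailed one-sample z-test. This is a minimal illustration using only the standard library; the function name `z_test_five_steps` and the sample figures (x̄ = 52, μ₀ = 50, σ = 6, n = 36) are invented for the example, and σ is assumed to be known.

```python
from math import sqrt
from statistics import NormalDist


def z_test_five_steps(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test laid out in the five steps."""
    # Step 1: H0: mu = mu0 versus H1: mu != mu0 (two-tailed).
    # Step 2: the level of significance alpha is supplied by the caller.
    # Step 3: the test statistic is z, because sigma is assumed known.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: critical value and p-value from the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: decision rule.
    reject = abs(z) > z_crit
    return z, z_crit, p_value, reject


# Illustrative numbers only (not taken from the text).
z, z_crit, p_value, reject = z_test_five_steps(52.0, 50.0, 6.0, 36)
print(f"z = {z:.2f}, critical = {z_crit:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

With these illustrative numbers z = 2.00 exceeds the critical value of about 1.96, so H₀ is rejected at the 5 % level; the p-value route gives the same decision.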

47.3 Types of Errors

Tip: Type I and Type II Errors

| Decision \ Truth | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I error (α) | Correct (Power = 1 − β) |
| Fail to reject H₀ | Correct (1 − α) | Type II error (β) |

  • Type I (α): reject H₀ when it is true — false positive.
  • Type II (β): fail to reject H₀ when it is false — false negative.
  • Power = 1 − β: probability of detecting a true effect.
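
The two error rates can be checked by simulation. The sketch below assumes normally distributed data with known σ and a two-tailed z-test; the helper `rejection_rate`, the seed, and every parameter value are illustrative choices rather than anything prescribed above.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=4000, seed=1):
    """Share of simulated samples in which a two-tailed z-test rejects H0: mu = mu0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# H0 is true here, so every rejection is a Type I error (rate should be near alpha).
type_i_rate = rejection_rate(true_mu=0.0)
# H0 is false here, so the rejection rate estimates power = 1 - beta.
power = rejection_rate(true_mu=0.5)
type_ii_rate = 1 - power
print(f"Type I rate ~ {type_i_rate:.3f}, power ~ {power:.3f}, beta ~ {type_ii_rate:.3f}")
```

The estimated Type I rate should sit close to the chosen α, while the estimated power depends on how far the true mean lies from μ₀.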

47.4 Levels of Significance and p-values

Tip: α and p-values
  • α = probability of a Type I error that the analyst is willing to accept, chosen before the test (commonly 5 % or 1 %).
  • p-value = probability, computed assuming H₀ is true, of observing a test statistic at least as extreme as the one actually observed.
  • Decision: Reject H₀ if p < α.

47.5 One-tailed vs Two-tailed Tests

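A two-tailed test rejects H₀ when the sample statistic differs significantly from the hypothesised value in either direction (H₁: μ ≠ μ₀), so α is split between the two tails. A one-tailed test rejects only when the statistic is significantly higher (H₁: μ > μ₀) or only when it is significantly lower (H₁: μ < μ₀), so the whole of α sits in one tail.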
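
The effect of the choice of tails can be seen by computing the p-values for one z statistic. This is a small sketch with an assumed z of 1.80; the function name `tail_p_values` is hypothetical.

```python
from statistics import NormalDist


def tail_p_values(z):
    """p-values for the same z statistic under three alternative hypotheses."""
    std_normal = NormalDist()
    upper = 1 - std_normal.cdf(z)        # H1: parameter is higher
    lower = std_normal.cdf(z)            # H1: parameter is lower
    two_tailed = 2 * min(upper, lower)   # H1: parameter differs in either direction
    return {"upper": upper, "lower": lower, "two_tailed": two_tailed}


p = tail_p_values(1.80)  # illustrative z value
for name, value in p.items():
    print(f"{name}: {value:.4f}")
```

For z = 1.80 the upper-tailed p-value is below 0.05 while the two-tailed p-value is above it, so the same data reject H₀ in a one-tailed test and fail to reject it in a two-tailed test at the 5 % level.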

47.6 Parametric Tests

47.6.1 Z-test

Used when:
  • Population standard deviation σ is known, or
  • Sample size is large (n ≥ 30).

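\[Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\]

where x̄ is the sample mean, μ₀ the hypothesised population mean, σ the population standard deviation and n the sample size. When σ is unknown but n is large, the sample standard deviation s is used in its place.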

47.6.2 t-test (Student’s t)

Used when σ is unknown and is estimated by the sample standard deviation s, conventionally for small samples (n < 30). Developed by William Sealy Gosset (pen-name Student, 1908).

Tip: Variants of t-test
  • One-sample t-test — test sample mean against a hypothesised μ₀.
  • Independent (two-sample) t-test — compare means of two independent groups.
  • Paired (matched) t-test — same subjects measured before/after.
  • Degrees of freedom = n − 1 (one-sample, or paired with n pairs) or n₁ + n₂ − 2 (two-sample with pooled variance).
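
The three variants can be sketched with the standard library. The one-sample statistic is t = (x̄ − μ₀) / (s / √n); the two-sample version below assumes equal population variances (pooled variance). Function names and data are invented for illustration, and the resulting t would still be compared with a tabulated critical value for the stated degrees of freedom.

```python
from math import sqrt
from statistics import fmean, variance


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(sample)
    t = (fmean(sample) - mu0) / sqrt(variance(sample) / n)
    return t, n - 1


def pooled_two_sample_t(a, b):
    """Independent two-sample t statistic, assuming equal variances."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (fmean(a) - fmean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def paired_t(before, after):
    """Paired t-test: a one-sample t-test on the differences against 0."""
    diffs = [y - x for x, y in zip(before, after)]
    return one_sample_t(diffs, 0.0)


# Illustrative data only.
group_a = [12, 14, 11, 15, 13]
group_b = [10, 9, 11, 8, 12]
after_b = [12, 10, 13, 9, 14]

t1, df1 = one_sample_t(group_a, mu0=10)
t2, df2 = pooled_two_sample_t(group_a, group_b)
t3, df3 = paired_t(group_b, after_b)
print(f"one-sample t = {t1:.3f} (df = {df1})")
print(f"two-sample t = {t2:.3f} (df = {df2})")
print(f"paired t = {t3:.3f} (df = {df3})")
```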

47.6.3 F-test and ANOVA

ANOVA (Analysis of Variance), developed by R.A. Fisher, tests whether the means of three or more groups differ. It uses the F-statistic, the ratio of between-group variance to within-group variance; a large F suggests that at least one group mean differs from the others.
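
The F-ratio for a one-way layout can be computed directly from the sums of squares. The following is a minimal sketch with made-up data for three groups; `one_way_anova_f` is a hypothetical helper, and the F value would still be compared with a tabulated critical value for (k − 1, N − k) degrees of freedom.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F = MSB / MSW for a one-way ANOVA, with its two degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Illustrative data: three groups of three observations.
f_stat, df_b, df_w = one_way_anova_f([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

Here the between-group mean square is 12 and the within-group mean square is 1, so F = 12 on (2, 6) degrees of freedom.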

Tip: Types of ANOVA
  • One-way ANOVA — one factor with multiple levels.
  • Two-way ANOVA — two factors; tests main effects and interaction.
  • Repeated-measures ANOVA — same subjects measured under several conditions.
  • MANOVA — multiple dependent variables.

47.6.4 Chi-Square Test (χ²)

The χ² test (Karl Pearson 1900) is used for categorical data:

Tip: Three Uses of χ² Test
  • Goodness of fit — does observed distribution match expected?
  • Independence — are two categorical variables independent in a contingency table?
  • Homogeneity — do several populations have the same distribution?

\[\chi^2 = \sum \frac{(O - E)^2}{E}\]

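Degrees of freedom: (rows − 1) × (cols − 1) for independence and homogeneity; k − 1 for goodness of fit with k categories when no parameters are estimated from the data. Here O is the observed frequency and E the expected frequency in each cell.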
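
A test of independence can be sketched by building the expected frequencies from the row and column totals, E = (row total × column total) / grand total. The table values and the function name `chi_square_independence` are illustrative assumptions.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Illustrative 2 x 2 table of observed frequencies.
chi2, df = chi_square_independence([[30, 20], [20, 30]])
print(f"chi-square = {chi2:.2f}, df = {df}")
```

For this illustrative table every expected frequency is 25, giving χ² = 4.0 with 1 degree of freedom, which exceeds the tabulated 5 % critical value of 3.841, so independence would be rejected.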

47.7 Non-Parametric Tests

Non-parametric tests do not assume a specific distribution. They are useful when data is ordinal, distribution is non-normal, or sample size is small.

Tip: Common Non-Parametric Tests

| Non-Parametric Test | Parametric Counterpart | Use |
|---|---|---|
| Mann-Whitney U (Wilcoxon rank-sum) | Independent t-test | Compare two independent groups |
| Wilcoxon signed-rank | Paired t-test | Compare paired observations |
| Kruskal-Wallis H | One-way ANOVA | Compare three or more independent groups |
| Friedman test | Repeated-measures ANOVA | Compare three or more paired groups |
| Spearman rank correlation | Pearson r | Test monotonic association |
| Sign test | t-test (paired) | Median test for paired data |
| Run test |  | Test for randomness |
| Kolmogorov-Smirnov |  | Test for distribution shape |

47.7.1 Mann-Whitney U Test (1947)

Tests whether two independent samples come from the same distribution. Compares the sum of ranks in each group.
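
The rank-sum calculation can be sketched as follows. The convention assumed here is U₁ = R₁ − n₁(n₁ + 1)/2, where R₁ is the rank sum of the first sample, with U taken as the smaller of U₁ and U₂; some textbooks write the formula in an equivalent mirrored form. Data and function names are invented for the example.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def mann_whitney_u(a, b):
    """Return (U, U1, U2) where U = min(U1, U2)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))  # rank the pooled sample
    r1 = sum(ranks[:n1])                      # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2), u1, u2


# Illustrative data only.
u, u1, u2 = mann_whitney_u([3, 5, 8, 10], [4, 6, 7, 12, 14])
print(f"U = {u}, U1 = {u1}, U2 = {u2}")
```

The computed U is then compared with a tabulated critical value for (n₁, n₂), or with a normal approximation when the samples are large.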

47.7.2 Kruskal-Wallis H Test (1952)

Generalisation of Mann-Whitney to three or more groups. Tests whether several independent samples come from the same distribution.
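
A minimal sketch of the H statistic, using the usual formula H = 12 / (N(N + 1)) · Σ Rᵢ² / nᵢ − 3(N + 1) without a correction for ties. The data and helper names are illustrative assumptions; H is commonly referred to a χ² distribution with k − 1 degrees of freedom, which is only an approximation for small groups.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (no tie correction) and its degrees of freedom k - 1."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return h, len(groups) - 1


# Illustrative data: three independent groups.
h_stat, h_df = kruskal_wallis_h([[27, 31, 35], [22, 25, 29], [40, 38, 33]])
print(f"H = {h_stat:.3f}, df = {h_df}")
```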

47.7.3 Spearman Rank Correlation Test

Tests significance of Spearman’s ρ — assesses whether a monotonic relationship exists between two variables based on rank.
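
Spearman's ρ and a common significance statistic can be sketched as below. The shortcut ρ = 1 − 6Σd² / (n(n² − 1)) is exact only when there are no tied ranks, and the statistic t = ρ√((n − 2) / (1 − ρ²)) with n − 2 degrees of freedom is an approximation that is undefined when ρ = ±1. The variable names and data are made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def spearman_rho(x, y):
    """Spearman's rho via the sum of squared rank differences (assumes no ties)."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def spearman_t(rho, n):
    """Approximate t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return rho * sqrt((n - 2) / (1 - rho ** 2))


# Illustrative data only.
hours = [2, 4, 6, 8, 10]
scores = [55, 50, 70, 65, 90]
rho = spearman_rho(hours, scores)
t_stat = spearman_t(rho, len(hours))
print(f"rho = {rho:.2f}, t = {t_stat:.3f}, df = {len(hours) - 2}")
```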

47.8 Sample Size and Power

Power of a test = 1 − β = probability of correctly rejecting a false null. It depends on:
  • Effect size (larger → more power).
  • Sample size n (larger → more power).
  • α (larger → more power, but more Type I risk).
  • Variability (lower σ → more power).
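
These dependencies can be checked numerically. The sketch below assumes an upper-tailed one-sample z-test with known σ, for which power = 1 − Φ(z_α − δ√n / σ), where δ is the true difference μ₁ − μ₀; the function name and the parameter values are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tailed_z(effect, sigma, n, alpha=0.05):
    """Power of an upper-tailed one-sample z-test for a true shift `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = effect * sqrt(n) / sigma  # how far the true mean moves the z statistic
    return 1 - std_normal.cdf(z_crit - shift)


base = power_upper_tailed_z(effect=0.5, sigma=1.0, n=25)
bigger_effect = power_upper_tailed_z(effect=0.8, sigma=1.0, n=25)
bigger_n = power_upper_tailed_z(effect=0.5, sigma=1.0, n=50)
stricter_alpha = power_upper_tailed_z(effect=0.5, sigma=1.0, n=25, alpha=0.01)
noisier = power_upper_tailed_z(effect=0.5, sigma=2.0, n=25)

print(f"base power        : {base:.3f}")
print(f"larger effect     : {bigger_effect:.3f}")
print(f"larger n          : {bigger_n:.3f}")
print(f"smaller alpha     : {stricter_alpha:.3f}")
print(f"larger sigma      : {noisier:.3f}")
```

Power rises with the larger effect and the larger sample, and falls when α is tightened or σ grows, in line with the list above.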

flowchart TB
  HT[Hypothesis Tests] --> PAR[Parametric]
  HT --> NP[Non-parametric]
  PAR --> Z[Z-test]
  PAR --> T[t-test]
  PAR --> AN[ANOVA / F-test]
  PAR --> CH[Chi-square]
  NP --> MW[Mann-Whitney U]
  NP --> KW[Kruskal-Wallis H]
  NP --> WS[Wilcoxon signed-rank]
  NP --> SR[Spearman rank corr]
    classDef default fill:#003366,color:#ffffff,stroke:#ffcc00,stroke-width:3px,rx:10px,ry:10px;

Note: Distractor warning

PYQ trap: a small p-value leads to rejection of H₀; not the other way round. Failure to reject H₀ is not evidence H₀ is true — only insufficient evidence to reject.

47.9 Practice Questions

Q 01 · Null · Easy

A small p-value (p < 0.05) indicates:

  • A. Accept H₀
  • B. Reject H₀ at 5 % level
  • C. No effect
  • D. Power is low

Solution
Correct Option: B
p < α ⇒ **reject H₀**.
Q 02 · Type I · Medium

Type I error is:

  • A. Rejecting a true H₀
  • B. Failing to reject a false H₀
  • C. Computational error
  • D. Sampling error

Solution
Correct Option: A
**Type I = false positive**; α is its probability.
Q 03 · Tests · Medium

Match each test with its use:

| Test | Use |
|---|---|
| (i) z-test | (a) Categorical data (independence) |
| (ii) t-test | (b) Means of 3+ groups (parametric) |
| (iii) ANOVA / F-test | (c) Mean test with small n, σ unknown |
| (iv) Chi-square | (d) Mean test with large n or σ known |

  • A. (i)-(d), (ii)-(c), (iii)-(b), (iv)-(a)
  • B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
  • C. (i)-(c), (ii)-(d), (iii)-(a), (iv)-(b)
  • D. (i)-(b), (ii)-(a), (iii)-(d), (iv)-(c)

Solution
Correct Option: A
z: large n; t: small n; ANOVA: 3+ means; χ²: categorical.
Q 04 · Pearson · Medium

The Chi-square test was introduced by:

  • A. Karl Pearson (1900)
  • B. R.A. Fisher
  • C. Gosset
  • D. Neyman

Solution
Correct Option: A
**Karl Pearson 1900**: chi-square test.
Q 05 · ANOVA · Medium

ANOVA was developed by:

  • A. Karl Pearson
  • B. R.A. Fisher
  • C. Markowitz
  • D. Bayes

Solution
Correct Option: B
**Sir Ronald A. Fisher**: ANOVA, MLE, RCT.
Q 06 · Non-param · Medium

Match each non-parametric test with its parametric counterpart:

| Non-parametric | Parametric |
|---|---|
| (i) Mann-Whitney U | (a) One-way ANOVA |
| (ii) Kruskal-Wallis H | (b) Pearson r |
| (iii) Wilcoxon signed-rank | (c) Independent t-test |
| (iv) Spearman ρ | (d) Paired t-test |

  • A. (i)-(c), (ii)-(a), (iii)-(d), (iv)-(b)
  • B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
  • C. (i)-(b), (ii)-(d), (iii)-(a), (iv)-(c)
  • D. (i)-(d), (ii)-(c), (iii)-(b), (iv)-(a)

Solution
Correct Option: A
U → independent t; H → ANOVA; signed-rank → paired t; Spearman → Pearson.
Q 07 · Power · Hard

Power of a test is:

  • A. α
  • B. 1 − α
  • C. β
  • D. 1 − β

Solution
Correct Option: D
**Power = 1 − β**: probability of correctly rejecting a false H₀.
Q 08 · t-test · Medium

t-test is preferable to z-test when:

  • A. σ is known and n is large
  • B. σ is unknown and n is small (n < 30)
  • C. Data is categorical
  • D. Three groups are compared

Solution
Correct Option: B
t-test: σ unknown, small n.
Q 09 · χ² · Medium

For a 4 × 3 contingency table, the degrees of freedom for χ² test of independence is:

  • A. 12
  • B. 6
  • C. 11
  • D. 9

Solution
Correct Option: B
df = (4 − 1) × (3 − 1) = **6**.
Q 10 · One-tail · Medium

A one-tailed test is appropriate when:

  • A. The alternative hypothesis specifies a direction
  • B. No direction specified
  • C. Sample size is small
  • D. Variance is unknown

Solution
Correct Option: A
One-tailed: H₁ specifies direction (e.g., μ > μ₀).
Q 11 · Gosset · Medium

"Student" of "Student's t-test" was the pen-name of:

  • A. R.A. Fisher
  • B. William Sealy Gosset
  • C. Karl Pearson
  • D. Neyman

Solution
Correct Option: B
**W.S. Gosset** wrote as *Student* (1908) while at Guinness.
Q 12 · F-test · Medium

F-statistic in ANOVA is:

  • A. Between-group variance / within-group variance
  • B. Within-group / between-group
  • C. Sample mean / SD
  • D. Sum of squares of residuals

Solution
Correct Option: A
**F = MSB / MSW**: between-group MS over within-group MS.
Q 13 · Neyman-Pearson · Hard

The framework of formal hypothesis testing is associated with:

  • A. Fisher only
  • B. Neyman and Pearson (1933)
  • C. Bayes
  • D. Kolmogorov

Solution
Correct Option: B
**Jerzy Neyman & Egon Pearson 1933**; building on Fisher.
Q 14 · U-test · Medium

The Mann-Whitney U-test compares:

  • A. Two independent samples (non-parametric)
  • B. Three or more samples
  • C. Paired observations
  • D. Categorical data

Solution
Correct Option: A
U-test: two independent samples; non-parametric counterpart of independent t.
Q 15 · H-test · Medium

Kruskal-Wallis H-test is the non-parametric counterpart of:

  • A. Independent t-test
  • B. One-way ANOVA
  • C. Chi-square test
  • D. Wilcoxon signed-rank

Solution
Correct Option: B
H-test (1952): generalises Mann-Whitney to 3+ groups → like ANOVA.
Q 16 · χ² use · Medium

A chi-square test is **not** used for:

  • A. Goodness of fit
  • B. Independence in contingency table
  • C. Comparing means of two groups
  • D. Homogeneity across populations

Solution
Correct Option: C
For comparing means use t-test or z-test; χ² is for categorical/frequency data.
Q 17 · α · Easy

A 1 % level of significance means:

  • A. 99 % probability of accepting H₀
  • B. 1 % probability of rejecting H₀ when it is true
  • C. 1 % power
  • D. Always reject H₀

Solution
Correct Option: B
α = 1 % → 1 % chance of Type I error.
Q 18 · Power factors · Hard

Power of a test **increases** when:

  • A. Sample size decreases
  • B. Effect size and sample size increase
  • C. α is reduced
  • D. Variance increases

Solution
Correct Option: B
Larger effect, larger n, larger α, smaller σ → higher power.
Q 19 · df t-test · Medium

For an independent two-sample t-test with n₁ = 15 and n₂ = 10, degrees of freedom is:

  • A. 25
  • B. 24
  • C. 23
  • D. 14

Solution
Correct Option: C
df = n₁ + n₂ − 2 = 25 − 2 = **23**.
Q 20 · Failing to reject · Hard

"Failing to reject H₀" means:

  • A. H₀ is true
  • B. Insufficient evidence to reject H₀
  • C. H₁ is true
  • D. Sample is biased

Solution
Correct Option: B
Statistical tests can never *prove* H₀ true; they can only fail to reject it.

47.10 Quick Recall

Important: Quick recall
  • Hypothesis testing framework: Neyman-Pearson (1933) building on Fisher; H₀ (default no effect) vs H₁.
  • Errors: Type I (α, false +), Type II (β, false −). Power = 1 − β.
  • Procedure: H₀ → α → statistic → critical/p-value → decide.
  • Parametric tests: z (σ known or large n), t (Gosset 1908; σ unknown, small n), F/ANOVA (Fisher; 3+ means), χ² (Pearson 1900; categorical — goodness-of-fit, independence, homogeneity).
  • Non-parametric: Mann-Whitney U (vs indep t), Kruskal-Wallis H (vs ANOVA), Wilcoxon signed-rank (vs paired t), Friedman, Spearman ρ, sign, run, KS tests.
  • Decision: reject if p < α.
  • Power ↑ with effect size, n, α; ↓ with σ.
  • Failing to reject ≠ accepting H₀.