flowchart TB
HT[Hypothesis Tests] --> PAR[Parametric]
HT --> NP[Non-parametric]
PAR --> Z[Z-test]
PAR --> T[t-test]
PAR --> AN[ANOVA / F-test]
PAR --> CH[Chi-square]
NP --> MW[Mann-Whitney U]
NP --> KW[Kruskal-Wallis H]
NP --> WS[Wilcoxon signed-rank]
NP --> SR[Spearman rank corr]
classDef default fill:#003366,color:#ffffff,stroke:#ffcc00,stroke-width:3px,rx:10px,ry:10px;
47 Hypothesis testing: z-test; t-test; ANOVA; Chi-square test; Mann-Whitney test (U-test); Kruskal-Wallis test (H-test); Rank correlation test
47.1 Concept of Hypothesis Testing
A statistical hypothesis test is a decision rule that uses sample data to choose between two competing hypotheses about a population parameter. The framework, formalised by Jerzy Neyman and Egon Pearson (1933) building on R.A. Fisher’s work, sets up a null hypothesis (H₀), which the test tries to reject, and an alternative hypothesis (H₁), which the researcher hopes to support. Tests fall into two families: parametric tests (z, t, F/ANOVA, chi-square), which assume a distributional form, usually normal, and non-parametric or distribution-free tests (Mann-Whitney U, Kruskal-Wallis H, Wilcoxon, sign test, run test, Spearman rank correlation), which make fewer distributional assumptions.
47.2 Five-Step Procedure
- State H₀ and H₁ — null and alternative.
- Choose level of significance α — typically 0.05 or 0.01.
- Select test statistic — z, t, F, χ², U, H, etc.
- Compute test statistic and find critical value (or p-value).
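- Decide: reject H₀ if the test statistic falls in the critical region (for a two-tailed test, |test statistic| > critical value) or, equivalently, if p < α; otherwise fail to reject H₀.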
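The five steps above can be sketched in code. This is a minimal illustration only, assuming a two-tailed z-test with known σ; the function name `z_test_two_tailed` and the sample figures (mean 52, μ₀ = 50, σ = 10, n = 100) are made up for the example and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    """Walk through the five steps for H0: mu = mu0 against H1: mu != mu0."""
    # Steps 1-2: the hypotheses are fixed by mu0; alpha is chosen before seeing data.
    # Step 3: z is the test statistic because sigma is assumed known.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: critical value and p-value from the standard normal distribution.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: reject when |z| exceeds the critical value (same decision as p < alpha).
    reject = abs(z) > critical
    return z, critical, p_value, reject


# Hypothetical data, for illustration only.
z, critical, p_value, reject = z_test_two_tailed(52.0, 50.0, 10.0, 100)
print(round(z, 2), round(critical, 2), round(p_value, 4), reject)
```

With these numbers z = 2.0 exceeds the critical value of about 1.96, so H₀ is rejected at α = 0.05; the critical-value rule and the p-value rule always give the same decision.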
47.3 Types of Errors
| Decision / Truth | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I error (α) | Correct (Power = 1 − β) |
| Fail to reject H₀ | Correct (1 − α) | Type II error (β) |
- Type I (α): reject H₀ when it is true — false positive.
- Type II (β): fail to reject H₀ when it is false — false negative.
- Power = 1 − β: probability of detecting a true effect.
47.4 Levels of Significance and p-values
- α = probability of Type I error, chosen before the test (5 %, 1 %).
- p-value = probability, computed assuming H₀ is true, of observing a test statistic at least as extreme as the one actually observed.
- Decision: Reject H₀ if p < α.
47.5 One-tailed vs Two-tailed Tests
A two-tailed test rejects H₀ when the sample statistic differs significantly from the hypothesised value in either direction (H₁: μ ≠ μ₀). A one-tailed test rejects H₀ only in one direction specified in advance (H₁: μ > μ₀, or H₁: μ < μ₀).
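The difference between the two kinds of test shows up directly in the p-value. The sketch below assumes a z statistic and uses an illustrative value of z = 1.8; the helper name `p_values_from_z` is hypothetical.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return (two_tailed, upper_tailed, lower_tailed) p-values for a z statistic."""
    phi = NormalDist().cdf(z)
    upper = 1 - phi                # H1: parameter is higher than hypothesised
    lower = phi                    # H1: parameter is lower than hypothesised
    two = 2 * min(upper, lower)    # H1: parameter differs in either direction
    return two, upper, lower


# Illustrative statistic, not taken from the text.
two, upper, lower = p_values_from_z(1.8)
print(round(two, 4), round(upper, 4))
```

Here the upper-tailed p-value is below 0.05 while the two-tailed p-value is above it, so the same data reject H₀ in a one-tailed test but not in a two-tailed test. This is why the direction must be fixed before looking at the data.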
47.6 Parametric Tests
47.6.1 Z-test
Used when:
- Population standard deviation σ is known, or
- Sample size is large (n ≥ 30).
\[Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\]
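As a worked illustration with made-up numbers (sample mean 103, μ₀ = 100, σ = 15, n = 36; none of these come from the text):

\[Z = \frac{103 - 100}{15 / \sqrt{36}} = \frac{3}{2.5} = 1.2\]

Since |Z| = 1.2 is below the two-tailed 5 % critical value of 1.96, H₀ would not be rejected in this example.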
47.6.2 t-test (Student’s t)
Used when σ is unknown and has to be estimated from the sample, classically with small samples (n < 30). Developed by William Sealy Gosset (pen-name Student, 1908).
- One-sample t-test — test sample mean against a hypothesised μ₀.
- Independent (two-sample) t-test — compare means of two independent groups.
- Paired (matched) t-test — same subjects measured before/after.
- Degrees of freedom = n − 1 (one-sample, or paired with n pairs) or n₁ + n₂ − 2 (two-sample with pooled variance).
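The three variants above differ only in how the standard error and degrees of freedom are formed. The following is a rough sketch using the standard library; the function names and data are hypothetical, and the two-sample version assumes equal variances (pooled variance). It returns the t statistic and degrees of freedom only, to be compared with a t-table value.

```python
from math import sqrt
from statistics import mean, stdev, variance


def one_sample_t(sample, mu0):
    """t statistic and df for H0: population mean equals mu0."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1


def paired_t(before, after):
    """Paired t-test: a one-sample test on the within-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)


def independent_t(x, y):
    """Two-sample t with pooled variance (assumes equal population variances)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical data, for illustration only.
t1, df1 = one_sample_t([12, 14, 15, 13, 16], 12)
t2, df2 = independent_t([12, 14, 15, 13, 16], [10, 11, 13, 12, 9])
t3, df3 = paired_t([10, 12, 11, 13], [12, 13, 13, 16])
print(round(t1, 3), df1, round(t2, 3), df2, round(t3, 3), df3)
```

For the two-sample example, t = 3.0 with 8 degrees of freedom exceeds the two-tailed 5 % table value of 2.306, so H₀ of equal means would be rejected.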
47.6.3 F-test and ANOVA
ANOVA (Analysis of Variance), developed by R.A. Fisher, tests whether the means of three or more groups differ. It uses the F-statistic, the ratio of between-group variance to within-group variance (mean square between / mean square within).
- One-way ANOVA — one factor with multiple levels.
- Two-way ANOVA — two factors; tests main effects and interaction.
- Repeated-measures ANOVA — same subjects measured under several conditions.
- MANOVA — multiple dependent variables.
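For the one-way case, the F-statistic can be computed by hand from the between-group and within-group sums of squares. The sketch below is an illustration under that assumption; the function name `one_way_anova_f` and the three small groups are invented for the example.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Between-group variation: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical data, for illustration only.
f, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f, 2), df_b, df_w)
```

Here F = 13 on (2, 6) degrees of freedom, well above the 5 % table value of about 5.14, so the hypothesis of equal group means would be rejected.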
47.6.4 Chi-Square Test (χ²)
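The χ² test (Karl Pearson, 1900) is used for categorical (frequency) data. Although it is listed with the parametric tests here, it is often classified as non-parametric because it does not assume a normally distributed population. Its three main uses are: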
- Goodness of fit — does observed distribution match expected?
- Independence — are two categorical variables independent in a contingency table?
- Homogeneity — do several populations have the same distribution?
\[\chi^2 = \sum \frac{(O - E)^2}{E}\]
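Degrees of freedom: (rows − 1) × (cols − 1) for independence and homogeneity; k − 1 for goodness of fit with k categories (when no parameters are estimated from the data).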
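For the test of independence, each expected count is (row total × column total) / grand total, and the formula above is summed over all cells. A minimal sketch, with a hypothetical function name and a made-up 2 × 2 table:

```python
def chi_square_independence(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical observed counts, for illustration only.
chi2, df = chi_square_independence([[30, 10], [20, 40]])
print(round(chi2, 3), df)
```

The expected counts here are 20, 20, 30 and 30, giving χ² ≈ 16.67 on 1 degree of freedom, which exceeds the 5 % table value of 3.841.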
47.7 Non-Parametric Tests
Non-parametric tests do not assume a specific population distribution. They are useful when the data are ordinal, the distribution is non-normal, or the sample size is small.
| Non-Parametric Test | Parametric Counterpart | Use |
|---|---|---|
| Mann-Whitney U (Wilcoxon rank-sum) | Independent t-test | Compare two independent groups |
| Wilcoxon signed-rank | Paired t-test | Compare paired observations |
| Kruskal-Wallis H | One-way ANOVA | Compare three or more independent groups |
| Friedman test | Repeated-measures ANOVA | Compare three or more paired groups |
| Spearman rank correlation | Pearson r | Test monotonic association |
| Sign test | t-test (paired) | Median test for paired data |
| Run test | — | Test for randomness |
| Kolmogorov-Smirnov | — | Test for distribution shape |
47.7.1 Mann-Whitney U Test (1947)
Tests whether two independent samples come from the same distribution. Compares the sum of ranks in each group.
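The rank-sum comparison can be sketched as follows. This uses the usual textbook form U₁ = R₁ − n₁(n₁ + 1)/2, which the text above does not state explicitly; the helper names and data are hypothetical, and tied values receive average ranks.

```python
def average_ranks(values):
    """Rank values from 1; tied values share the mean of their positions."""
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(values):
        first = sorted_vals.index(v) + 1
        count = sorted_vals.count(v)
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


def mann_whitney_u(x, y):
    """Return (U, U1, U2), where U = min(U1, U2) is compared with table values."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])               # sum of ranks of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1                  # U1 + U2 always equals n1 * n2
    return min(u1, u2), u1, u2


# Hypothetical data, for illustration only.
u, u1, u2 = mann_whitney_u([3, 5, 8, 10], [4, 6, 9, 11, 12])
print(u, u1, u2)
```

A small U means the two samples' ranks barely overlap; H₀ is rejected when U is at or below the tabulated critical value for the given n₁ and n₂.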
47.7.2 Kruskal-Wallis H Test (1952)
Generalisation of Mann-Whitney to three or more groups. Tests whether several independent samples come from the same distribution.
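A sketch of the H statistic, assuming the standard formula H = 12 / (N(N + 1)) · Σ Rᵢ² / nᵢ − 3(N + 1) without a tie correction (the formula is not given in the text above). Function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1; tied values share the mean of their positions."""
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(values):
        first = sorted_vals.index(v) + 1
        count = sorted_vals.count(v)
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


def kruskal_wallis_h(groups):
    """Return (H, df); H is referred to chi-square with k - 1 degrees of freedom."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)      # rank all observations together
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return h, len(groups) - 1


# Hypothetical data, for illustration only.
h, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 2), df)
```

Here H = 7.2 on 2 degrees of freedom, above the 5 % chi-square table value of 5.991. For groups this small the chi-square approximation is rough, so exact tables are normally preferred.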
47.7.3 Spearman Rank Correlation Test
Tests significance of Spearman’s ρ — assesses whether a monotonic relationship exists between two variables based on rank.
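As a rough sketch, ρ can be computed from rank differences with ρ = 1 − 6Σd² / (n(n² − 1)), and its significance checked with t = ρ√((n − 2) / (1 − ρ²)) on n − 2 degrees of freedom. Both formulas are standard but are not given in the text above; the code assumes no tied values, and the names and data are hypothetical.

```python
from math import sqrt


def spearman_rho(x, y):
    """Spearman's rho from the d-squared formula; assumes no tied values."""
    n = len(x)
    sorted_x, sorted_y = sorted(x), sorted(y)
    rank_x = [sorted_x.index(v) + 1 for v in x]
    rank_y = [sorted_y.index(v) + 1 for v in y]
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def rho_t_statistic(rho, n):
    """t statistic and df for H0: no monotonic association (requires |rho| < 1)."""
    return rho * sqrt((n - 2) / (1 - rho ** 2)), n - 2


# Hypothetical data, for illustration only.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rho = spearman_rho(x, y)
t, df = rho_t_statistic(rho, len(x))
print(round(rho, 2), round(t, 3), df)
```

In this example ρ = 0.8, yet t ≈ 2.31 on 3 degrees of freedom is below the two-tailed 5 % table value of 3.182: with only five pairs, even a strong rank correlation is not statistically significant.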
47.8 Sample Size and Power
Power of a test = 1 − β = probability of correctly rejecting a false null hypothesis. It depends on:
- Effect size (larger → more power).
- Sample size n (larger → more power).
- α (larger → more power, but more Type I risk).
- Variability (lower σ → more power).
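These four dependencies can be checked numerically. The sketch below assumes the simplest case, an upper-tailed z-test with known σ, where power = Φ(effect / (σ/√n) − z₁₋α); the function name `z_test_power` and all input values are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test when the true mean exceeds mu0 by `effect`."""
    z_crit = NormalDist().inv_cdf(1 - alpha)     # rejection threshold under H0
    shift = effect / (sigma / sqrt(n))           # true effect in standard-error units
    return NormalDist().cdf(shift - z_crit)


# Hypothetical settings, for illustration only.
base = z_test_power(effect=5, sigma=10, n=25)
print(round(base, 3))
print(z_test_power(8, 10, 25) > base)              # larger effect size
print(z_test_power(5, 10, 100) > base)             # larger sample
print(z_test_power(5, 10, 25, alpha=0.10) > base)  # larger alpha
print(z_test_power(5, 5, 25) > base)               # lower variability
```

Each comparison prints True, matching the list above. With zero effect the function returns α itself, the probability of rejecting a true H₀.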
PYQ trap: a small p-value leads to rejection of H₀; not the other way round. Failure to reject H₀ is not evidence H₀ is true — only insufficient evidence to reject.
47.9 Practice Questions
A small p-value (p < 0.05) indicates:
Type I error is:
Match each test with its use:
| | Test | | Use |
|---|---|---|---|
| (i) | z-test | (a) | Categorical data — independence |
| (ii) | t-test | (b) | Means of 3+ groups (parametric) |
| (iii) | ANOVA / F-test | (c) | Mean test with small n, σ unknown |
| (iv) | Chi-square | (d) | Mean test with large n or σ known |
The Chi-square test was introduced by:
ANOVA was developed by:
Match each non-parametric test with its parametric counterpart:
| | Non-parametric | | Parametric |
|---|---|---|---|
| (i) | Mann-Whitney U | (a) | One-way ANOVA |
| (ii) | Kruskal-Wallis H | (b) | Pearson r |
| (iii) | Wilcoxon signed-rank | (c) | Independent t-test |
| (iv) | Spearman ρ | (d) | Paired t-test |
Power of a test is:
t-test is preferable to z-test when:
For a 4 × 3 contingency table, the degrees of freedom for χ² test of independence is:
A one-tailed test is appropriate when:
"Student" of "Student's t-test" was the pen-name of:
F-statistic in ANOVA is:
The framework of formal hypothesis testing is associated with:
The Mann-Whitney U-test compares:
Kruskal-Wallis H-test is the non-parametric counterpart of:
A chi-square test is **not** used for:
A 1 % level of significance means:
Power of a test **increases** when:
For an independent two-sample t-test with n₁ = 15 and n₂ = 10, degrees of freedom is:
"Failing to reject H₀" means:
47.10 Quick Recall
- Hypothesis testing framework: Neyman-Pearson (1933) building on Fisher; H₀ (default no effect) vs H₁.
- Errors: Type I (α, false +), Type II (β, false −). Power = 1 − β.
- Procedure: H₀ → α → statistic → critical/p-value → decide.
- Parametric tests: z (σ known or large n), t (Gosset 1908; σ unknown, small n), F/ANOVA (Fisher; 3+ means), χ² (Pearson 1900; categorical — goodness-of-fit, independence, homogeneity).
- Non-parametric: Mann-Whitney U (vs indep t), Kruskal-Wallis H (vs ANOVA), Wilcoxon signed-rank (vs paired t), Friedman, Spearman ρ, sign, run, KS tests.
- Decision: reject if p < α.
- Power ↑ with effect size, n, α; ↓ with σ.
- Failing to reject ≠ accepting H₀.