flowchart TB
S[Sampling Methods] --> PR[Probability]
S --> NP[Non-Probability]
PR --> SRS[SRS]
PR --> SYS[Systematic]
PR --> STR[Stratified]
PR --> CL[Cluster]
PR --> MS[Multi-stage]
NP --> CN[Convenience]
NP --> PU[Purposive]
NP --> QU[Quota]
NP --> SN[Snowball]
classDef default fill:#003366,color:#ffffff,stroke:#ffcc00,stroke-width:3px,rx:10px,ry:10px;
46 Sampling and estimation: Concepts; Methods of sampling - probability and non-probability methods; Sampling distribution; Central limit theorem; Standard error; Statistical estimation
46.1 Population, Sample, Census
A population (universe) is the entire set of units relevant to a research question. A sample is a subset of the population. Sampling is the process of selecting a sample, and the sampling design is the plan for doing so. The alternative, a census, surveys every unit. A census gives complete coverage but is costly, time-consuming, and often impractical; sampling is quicker, cheaper, and, when properly designed, accurate enough. India conducts a decennial Census (last completed in 2011; the 2021 Census was deferred), while routine surveys use sampling. Sampling theory rests on two foundations: the Central Limit Theorem (the distribution of sample means tends to normal as the sample size grows) and the Law of Large Numbers (the sample mean converges to the population mean).
46.2 Why Sample?
- Lower cost.
- Less time.
- Greater detail per unit — better quality.
- Necessary when population is infinite or destructive testing is involved.
- Reliable under proper sampling design.
46.3 Probability Sampling Methods
In probability sampling, every unit has a known, non-zero probability of selection. This is what allows valid statistical inference from the sample to the population.
| Method | Working |
|---|---|
| Simple Random Sampling (SRS) | Every unit has equal chance; with or without replacement |
| Systematic Sampling | Pick every k-th unit after random start; k = N/n |
| Stratified Sampling | Population divided into homogeneous strata; random sample from each |
| Cluster Sampling | Population divided into clusters; some clusters randomly selected; all units in chosen clusters surveyed |
| Multi-stage Sampling | Sample units selected in stages (e.g., state → district → village → household) |
| Probability Proportional to Size (PPS) | Probability of selection proportional to unit size |
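The first four methods in the table can be sketched with Python's standard `random` module. This is only an illustration under stated assumptions: the population of 1,000 unit IDs, the "urban"/"rural" strata, the ten "village" clusters, and all function names are hypothetical and not part of the syllabus text.

```python
import random

def simple_random_sample(units, n, rng):
    """SRS without replacement: every unit has the same chance of selection."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Pick every k-th unit after a random start, with k = N // n."""
    k = len(units) // n
    start = rng.randrange(k)          # random start in 0 .. k-1
    return units[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw an SRS of fixed size from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(42)               # fixed seed so the run is repeatable
population = list(range(1, 1001))     # hypothetical frame of N = 1000 unit IDs

srs = simple_random_sample(population, 50, rng)
sys_sample = systematic_sample(population, 50, rng)   # k = 1000 // 50 = 20

strata = {"urban": population[:600], "rural": population[600:]}
strat = stratified_sample(strata, 25, rng)

clusters = {f"village_{i}": population[i * 100:(i + 1) * 100] for i in range(10)}
clus = cluster_sample(clusters, 2, rng)

print(len(srs), len(sys_sample), len(strat["urban"]), len(clus))
```

Note how the stratified draw touches every stratum, while the cluster draw surveys all units in only the selected clusters.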
46.3.1 Stratified vs Cluster Sampling
| Aspect | Stratified | Cluster |
|---|---|---|
| Group composition | Strata: homogeneous within, heterogeneous between | Clusters: heterogeneous within, homogeneous between |
| Sample | From every stratum | From selected clusters only |
| Efficiency | Higher precision per cost | Lower precision but cheaper |
| Used when | Population is naturally divisible into subgroups | Population is geographically dispersed |
46.4 Non-Probability Sampling Methods
In non-probability sampling, units are selected on bases other than chance. Selection probabilities are unknown, so sampling error cannot be quantified and generalising to the population is riskier.
| Method | Working |
|---|---|
| Convenience sampling | Units chosen because they are easy to access |
| Purposive / Judgement | Researcher selects units based on judgement |
| Quota | Fill predefined quotas of categories (gender, age) |
| Snowball / Chain referral | One respondent refers others — for hidden populations (e.g., immigrants, drug users) |
| Self-selection / Voluntary | Volunteers participate (online polls) |
46.5 Sampling Error and Non-Sampling Error
| Type | Source | Reduced by |
|---|---|---|
| Sampling error | Difference between sample estimate and population parameter; arises from chance | Larger sample; better design |
| Non-sampling error | Errors in measurement, response, processing, non-response | Better instrument and procedures |
46.6 Sampling Distribution
The sampling distribution of a statistic (e.g., sample mean) is the probability distribution of its values across all possible samples of a fixed size from the population. Its standard deviation is called the standard error.
| Statistic | Formula |
|---|---|
| Standard error of mean (σ known) | SE(x̄) = σ/√n |
| Standard error of mean (σ unknown, sample size large) | SE(x̄) = s/√n |
| Standard error of proportion | SE(p̂) = √(p(1−p)/n) |
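As a quick numeric check of the formulas in the table, here is a minimal sketch; the function names `se_mean` and `se_proportion` are illustrative choices, and the proportion example (p = 0.5, n = 400) is an assumed input.

```python
import math

def se_mean(sd, n):
    """Standard error of the sample mean: sd / sqrt(n).
    Pass sigma if it is known, otherwise the sample s (large n)."""
    return sd / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

print(se_mean(20, 100))                    # 2.0
print(round(se_proportion(0.5, 400), 4))   # 0.025
```

Quadrupling n halves the standard error, because n enters under a square root.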
46.7 Central Limit Theorem (CLT) — Revisited
CLT: For random samples of size n from a population with mean μ and finite variance σ², the sampling distribution of x̄ tends to be normal with mean μ and SE σ/√n as n → ∞, regardless of the population’s distribution. Rule of thumb: n ≥ 30.
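A small simulation can make the theorem concrete. The sketch below assumes a deliberately skewed population (exponential with mean 1 and standard deviation 1) and arbitrary choices of n = 40 and 5,000 repetitions; it checks that the sample means centre on μ with spread close to σ/√n.

```python
import random
import statistics

rng = random.Random(0)    # fixed seed for repeatability
n, reps = 40, 5000        # sample size and number of repeated samples (assumed values)

# Exponential population with rate 1: mean = 1, sd = 1, strongly right-skewed.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1.0 / n ** 0.5     # sigma / sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theoretical_se, 3))
```

The simulated standard deviation of the sample means should land near the theoretical standard error of about 0.158, even though the population itself is far from normal.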
46.8 Law of Large Numbers
The Law of Large Numbers states that the sample mean converges in probability to the population mean as the sample size grows. This is the formal foundation for “more data is better”.
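The convergence can be observed with a simulated fair die, whose population mean is 3.5. The die, the seed, and the checkpoints below are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(1)                                # fixed seed for repeatability
rolls = [rng.randint(1, 6) for _ in range(100_000)]   # fair die, population mean 3.5

# Running sample mean at increasing sample sizes.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(statistics.fmean(rolls[:n]), 3))

final_mean = statistics.fmean(rolls)
```

The early means can wander noticeably; the later ones settle close to 3.5.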
46.9 Statistical Estimation
Estimation is the use of sample statistics to infer population parameters. Two forms:
| Form | Working |
|---|---|
| Point estimation | Single value (e.g., x̄ for μ) |
| Interval estimation (Confidence Interval) | A range with stated confidence (e.g., 95 % CI for μ) |
46.9.1 Properties of a Good Estimator
- Unbiasedness — E(θ̂) = θ.
- Consistency — θ̂ → θ as n → ∞.
- Efficiency — minimum variance among unbiased estimators.
- Sufficiency — uses all information about θ in the sample.
- (Gauss-Markov: OLS is BLUE — Best Linear Unbiased Estimator.)
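Unbiasedness can be illustrated with a standard example that is not taken from the text above: estimating a population variance. Assuming a Normal population with σ² = 4 and small samples of n = 5, the divisor-(n − 1) estimator averages close to 4, while the divisor-n estimator averages close to 4 × (n − 1)/n = 3.2.

```python
import random
import statistics

rng = random.Random(7)    # fixed seed for repeatability
n, reps = 5, 20_000       # assumed sample size and number of repetitions
true_var = 4.0            # population is Normal(mean 0, sd 2)

unbiased, biased = [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    unbiased.append(statistics.variance(sample))    # divisor n - 1
    biased.append(statistics.pvariance(sample))     # divisor n

mean_unbiased = statistics.fmean(unbiased)
mean_biased = statistics.fmean(biased)
print(round(mean_unbiased, 2), round(mean_biased, 2))
```

The divisor-n estimator is biased for any fixed n but still consistent, since the factor (n − 1)/n tends to 1 as n grows.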
46.9.2 Confidence Interval
For the population mean, using a large sample (n ≥ 30), with s substituted for σ when σ is unknown:
\[CI = \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]
For small samples (n < 30, σ unknown) drawn from an approximately normal population, use the t-distribution with n − 1 degrees of freedom:
\[CI = \bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}\]
Critical z-values:
- 90 % CI: ±1.645
- 95 % CI: ±1.96
- 99 % CI: ±2.58
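The large-sample interval can be computed with the standard library's `statistics.NormalDist`. This is a minimal sketch: the helper name `z_interval` is hypothetical, and the inputs x̄ = 50, σ = 10, n = 100 are example values.

```python
import math
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided critical value z_{alpha/2} for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def z_interval(xbar, sigma, n, confidence=0.95):
    """Large-sample CI for the mean: xbar +/- z * sigma / sqrt(n)."""
    margin = critical_z(confidence) * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

for level in (0.90, 0.95, 0.99):
    print(level, round(critical_z(level), 3))   # 1.645, 1.96, 2.576

low, high = z_interval(50, 10, 100)
print(round(low, 2), round(high, 2))            # 48.04 51.96
```

The 99 % value prints as 2.576, which rounds to the 2.58 quoted in the list.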
46.10 Sample Size Determination
For estimating a population mean with margin of error E at confidence level (1 − α):
\[n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2\]
For estimating a population proportion:
\[n = \frac{z_{\alpha/2}^2 \cdot p(1-p)}{E^2}\]
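(The product p(1 − p), and hence the required n, is largest at p = 0.5, so p = 0.5 is the conservative choice when no prior estimate of p is available.)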
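Both sample-size formulas can be sketched as follows. The function names are hypothetical, and rounding up with `math.ceil` is an added assumption (a fractional unit cannot be sampled, and rounding down would miss the target margin).

```python
import math
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided critical value z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def n_for_mean(sigma, margin, confidence=0.95):
    """n = (z * sigma / E)^2, rounded up to the next whole unit."""
    return math.ceil((critical_z(confidence) * sigma / margin) ** 2)

def n_for_proportion(margin, p=0.5, confidence=0.95):
    """n = z^2 * p(1 - p) / E^2, rounded up; p = 0.5 is the conservative default."""
    z = critical_z(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(n_for_mean(sigma=10, margin=2))      # 97  (raw value is about 96.04)
print(n_for_proportion(margin=0.05))       # 385 (raw value is about 384.15)
```

Halving the margin of error E roughly quadruples the required sample size, because E is squared in the denominator.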
PYQ trap: Standard error of mean = σ/√n, not σ²/n. Confidence interval involves z (or t) multiplied by SE.
46.11 Practice Questions
A **census** surveys:
In SRS:
Match each method with its description:
| | Method | | Description |
|---|---|---|---|
| (i) | Stratified | (a) | Random selection of clusters |
| (ii) | Cluster | (b) | Random selection from each subgroup |
| (iii) | Systematic | (c) | Every k-th unit selected after random start |
| (iv) | Quota | (d) | Predefined quotas of categories filled |
σ = 20; n = 100. Standard error of the sample mean is:
By the Central Limit Theorem, the sampling distribution of the sample mean is approximately Normal when:
For a 95 % CI for the population mean (large sample), the critical z-value is:
x̄ = 50, σ = 10, n = 100. 95 % CI for μ:
An estimator is *unbiased* if:
An estimator is *consistent* if:
Stratified vs cluster sampling differ chiefly in that:
Which is a *non-probability* method?
Snowball sampling is most suitable for:
Required sample size for estimating μ at 95 % confidence with margin of error E = 2, σ = 10:
Sampling error is reduced primarily by:
By the Gauss-Markov theorem, OLS estimators are:
In systematic sampling with N = 1000 and n = 50, the interval k is:
India conducts a census every:
Standard error of a sample proportion is:
Critical z for 99 % CI:
Quota sampling is:
46.12 Quick Recall
- Population vs Sample; Sampling vs Census. India — decennial census since 1872; last completed 2011.
- Probability methods: SRS, Systematic (k = N/n), Stratified (homogeneous within strata), Cluster (heterogeneous within clusters), Multi-stage, PPS.
- Non-probability methods: Convenience, Purposive, Quota, Snowball, Self-selection.
- Errors: Sampling (chance, reduced by n) vs Non-sampling (measurement / response / processing).
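- CLT: for large n (rule of thumb n ≥ 30), x̄ is approximately Normal with mean μ and standard error σ/√n, whatever the population's distribution.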
- SE of mean = σ/√n; SE of proportion = √(p(1−p)/n).
- Estimation: Point vs Interval (CI). Properties of a good estimator: unbiasedness, consistency, efficiency, sufficiency. Gauss-Markov: OLS is BLUE (Best Linear Unbiased Estimator).
- Critical z: 90 % → 1.645; 95 % → 1.96; 99 % → 2.58.
- Sample size: n = (z σ / E)² for mean; n = z² p(1−p)/E² for proportion.