46  Sampling and estimation: Concepts; Methods of sampling - probability and non-probability methods; Sampling distribution; Central limit theorem; Standard error; Statistical estimation

46.1 Population, Sample, Census

A population (universe) is the entire set of units relevant to a research question. A sample is a subset of the population. Sampling is the process of selecting a sample, and the sampling design is the plan for doing so. The alternative, a census, surveys every unit. A census gives complete coverage but is costly, time-consuming, and often impractical; sampling is quicker, cheaper, and, when properly designed, accurate enough. India conducts a decennial Census (last completed in 2011; the 2021 Census was deferred); routine surveys use sampling. Sampling theory rests on two foundations: the Central Limit Theorem (the distribution of sample means tends to normal as n grows) and the Law of Large Numbers (the sample mean converges to the population mean).

46.2 Why Sample?

Tip: Advantages of Sampling over Census
  • Lower cost.
  • Less time.
  • Greater detail per unit, and often better data quality.
  • Necessary when the population is infinite or when testing is destructive.
  • Reliable under a proper sampling design.

46.3 Probability Sampling Methods

In probability sampling, every unit has a known, non-zero probability of selection. This allows statistical inference from the sample to the population.

Tip: Major Probability Methods

| Method | Working |
|---|---|
| Simple Random Sampling (SRS) | Every unit has equal chance; with or without replacement |
| Systematic Sampling | Pick every k-th unit after random start; k = N/n |
| Stratified Sampling | Population divided into homogeneous strata; random sample from each |
| Cluster Sampling | Population divided into clusters; some clusters randomly selected; all units in chosen clusters surveyed |
| Multi-stage Sampling | Sample units selected in stages (e.g., state → district → village → household) |
| Probability Proportional to Size (PPS) | Probability of selection proportional to unit size |
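The first two methods in the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is a hypothetical list of unit IDs 1 to 1000, the function names are invented for this example, and N is assumed to be an exact multiple of n so that k = N/n is a whole number.

```python
import random

def simple_random_sample(units, n, rng):
    """SRS without replacement: every unit has the same chance of selection."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Every k-th unit after a random start in the first interval (k = N // n)."""
    k = len(units) // n
    start = rng.randrange(k)          # random start among the first k positions
    return units[start::k][:n]

rng = random.Random(42)               # fixed seed so the run is repeatable
population = list(range(1, 1001))     # hypothetical frame, N = 1000

srs = simple_random_sample(population, 50, rng)
sys_sample = systematic_sample(population, 50, rng)

print(sorted(srs)[:5])
print(sys_sample[:5])                 # consecutive IDs differ by k = 20
```

With N = 1000 and n = 50 the interval is k = 20, so the systematic sample is fully determined once the random start is drawn.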

46.3.1 Stratified vs Cluster Sampling

Tip: Stratified vs Cluster

| Aspect | Stratified | Cluster |
|---|---|---|
| Group composition | Homogeneous within, heterogeneous between | Heterogeneous within, homogeneous between |
| Sample | From every stratum | From selected clusters only |
| Efficiency | Higher precision per cost | Lower precision but cheaper |
| Used when | Population is naturally divisible into subgroups | Population is geographically dispersed |
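The "Sample" row is the key operational difference, and a small sketch may make it concrete. The strata, clusters, and function names below are hypothetical and exist only for illustration: stratified sampling draws from every stratum, while cluster sampling keeps every unit of a few randomly chosen clusters.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Draw an independent SRS of the same size from EVERY stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick a few clusters at random and keep ALL units inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

rng = random.Random(7)

# Hypothetical frames: two strata, and ten villages of five households each.
strata = {
    "urban": [f"U{i}" for i in range(60)],
    "rural": [f"R{i}" for i in range(40)],
}
clusters = {f"village_{v}": [f"v{v}_hh{h}" for h in range(5)] for v in range(10)}

strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 3, rng)

print({name: len(units) for name, units in strat.items()})
print({name: len(units) for name, units in clus.items()})
```

Equal allocation per stratum is an assumption made here for brevity; proportional allocation would scale each stratum's sample to its share of the population.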

46.4 Non-Probability Sampling Methods

In non-probability sampling, units are selected on bases other than chance. Inference to the population is riskier because selection probabilities are unknown.

Tip: Major Non-Probability Methods

| Method | Working |
|---|---|
| Convenience sampling | Units chosen because they are easy to access |
| Purposive / Judgement | Researcher selects units based on judgement |
| Quota | Fill predefined quotas of categories (gender, age) |
| Snowball / Chain referral | One respondent refers others; used for hidden populations (e.g., immigrants, drug users) |
| Self-selection / Voluntary | Volunteers participate (online polls) |

46.5 Sampling Error and Non-Sampling Error

Tip: Two Types of Error

| Type | Source | Reduced by |
|---|---|---|
| Sampling error | Difference between sample estimate and population parameter; arises from chance | Larger sample; better design |
| Non-sampling error | Errors in measurement, response, processing, non-response | Better instrument and procedures |

46.6 Sampling Distribution

The sampling distribution of a statistic (e.g., sample mean) is the probability distribution of its values across all possible samples of a fixed size from the population. Its standard deviation is called the standard error.
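For a very small population the sampling distribution can be listed exhaustively. The sketch below uses a made-up population of five values and enumerates all samples of size 2 drawn with replacement (an assumption chosen so that the draws are independent). The mean of the sample means equals the population mean, and their standard deviation is the standard error.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8, 10]   # hypothetical population, N = 5
n = 2                           # fixed sample size

# Every possible ordered sample of size n, drawn with replacement (5**2 = 25).
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mu = mean(population)                  # population mean
sigma = pstdev(population)             # population standard deviation
mean_of_means = mean(sample_means)     # centre of the sampling distribution
standard_error = pstdev(sample_means)  # spread of the sampling distribution

print(len(samples), mean_of_means, mu)
print(standard_error, sigma / sqrt(n))
```

The last line prints the enumerated standard error next to σ/√n; the two agree up to floating-point rounding.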

Tip: Standard Error of Sample Mean

| Statistic | Formula |
|---|---|
| Standard error of mean (σ known) | SE(x̄) = σ/√n |
| Standard error of mean (σ unknown, sample size large) | SE(x̄) = s/√n |
| Standard error of proportion | SE(p̂) = √(p(1−p)/n) |
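The formulas in the table translate directly into code. The helper names below are illustrative only, and the inputs (σ = 20, n = 100, p = 0.5) are example values.

```python
import math

def se_mean(sd, n):
    """Standard error of the sample mean: sd / sqrt(n).
    Pass sigma if it is known, or the sample s for a large sample."""
    return sd / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

print(se_mean(20, 100))          # 2.0
print(se_proportion(0.5, 100))
```

Quadrupling n halves the standard error, because n enters through its square root.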

46.7 Central Limit Theorem (CLT) — Revisited

CLT: For random samples of size n from a population with mean μ and finite variance σ², the sampling distribution of x̄ tends to be normal with mean μ and SE σ/√n as n → ∞, regardless of the population’s distribution. Rule of thumb: n ≥ 30.
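A quick simulation can illustrate the theorem. This sketch assumes an Exponential(1) population, chosen only because it is strongly skewed and has μ = 1 and σ = 1; the seed, n = 30, and the 5000 repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(0)
n, reps = 30, 5000
mu, sigma = 1.0, 1.0   # Exponential(rate=1) has mean 1 and sd 1

# Draw many samples of size n and record each sample mean.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

se = sigma / sqrt(n)              # theoretical standard error
centre = mean(sample_means)       # should sit close to mu
spread = pstdev(sample_means)     # should sit close to se
# Share of sample means within 1.96 SE of mu (about 0.95 if roughly normal).
within = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / reps

print(round(centre, 3), round(spread, 3), round(se, 3), round(within, 3))
```

Even though individual observations are far from normal, the sample means cluster around μ with spread close to σ/√n, and roughly 95 % of them fall within 1.96 standard errors.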

46.8 Law of Large Numbers

The Law of Large Numbers states that the sample mean converges in probability to the population mean as the sample size grows. This is the formal foundation for “more data is better”.
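The convergence can be seen in a small simulation. The example below assumes a fair six-sided die (population mean 3.5) and an arbitrary seed; it prints how far the sample mean is from 3.5 as more rolls are included.

```python
import random
from statistics import mean

rng = random.Random(1)

# Fair six-sided die: the population mean is (1 + 2 + ... + 6) / 6 = 3.5.
rolls = [rng.randint(1, 6) for _ in range(100_000)]

for n in (10, 1_000, 100_000):
    gap = abs(mean(rolls[:n]) - 3.5)
    print(n, round(gap, 4))
```

The gap need not shrink at every step, since convergence is in probability, but for large n it is very likely to be small.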

46.9 Statistical Estimation

Estimation is the use of sample statistics to infer population parameters. Two forms:

Tip: Two Forms of Estimation

| Form | Working |
|---|---|
| Point estimation | Single value (e.g., x̄ for μ) |
| Interval estimation (Confidence Interval) | A range with stated confidence (e.g., 95 % CI for μ) |

46.9.1 Properties of a Good Estimator

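Tip: Four Properties of a Good Estimator
  • Unbiasedness: E(θ̂) = θ.
  • Consistency: θ̂ → θ in probability as n → ∞.
  • Efficiency: minimum variance among unbiased estimators.
  • Sufficiency: uses all information about θ in the sample.
  • (Related result, Gauss-Markov: under the classical assumptions, OLS is BLUE, the Best Linear Unbiased Estimator. BLUE describes OLS; it is not a name for the four properties above.)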
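Unbiasedness can be checked by simulation. The sketch below is an illustration, not part of the original notes: it assumes a Normal(0, 2) population, so σ² = 4, and compares two estimators of σ² from samples of size 5. Dividing by n gives a biased estimator whose expectation is (n−1)σ²/n = 3.2, while dividing by n − 1 gives an unbiased one.

```python
import random
from statistics import mean, pvariance, variance

rng = random.Random(3)
sigma2 = 4.0            # true population variance (sd = 2), an assumed value
n, reps = 5, 20_000

biased, unbiased = [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    biased.append(pvariance(sample))    # divides by n
    unbiased.append(variance(sample))   # divides by n - 1

avg_biased = mean(biased)       # estimates E(theta-hat) for the n divisor
avg_unbiased = mean(unbiased)   # estimates E(theta-hat) for the n - 1 divisor

print(round(avg_biased, 2), round(avg_unbiased, 2), sigma2)
```

The first average should land near 3.2 and the second near 4.0, which is what E(θ̂) = θ means in practice: the estimator is right on average across repeated samples.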

46.9.2 Confidence Interval

For a sample mean from a large sample (n ≥ 30), with σ known or with s used in its place:

\[CI = \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]

For small samples (n < 30, σ unknown) from an approximately normal population, use the t-distribution with n − 1 degrees of freedom:

\[CI = \bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}\]

Critical z-values:
  • 90 % CI: ±1.645
  • 95 % CI: ±1.96
  • 99 % CI: ±2.58
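The large-sample interval can be computed with Python's standard library. This is a minimal sketch: the function name is illustrative, and the inputs (x̄ = 50, σ = 10, n = 100) are example values. It covers only the z-based interval, since the standard library has no t-distribution quantile function.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided z-interval for a mean: xbar ± z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95 %
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_confidence_interval(50, 10, 100)
print(round(low, 2), round(high, 2))   # 48.04 51.96
```

Passing `confidence=0.90` or `confidence=0.99` reproduces the other critical values listed above.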

46.10 Sample Size Determination

For estimating a population mean with margin of error E at confidence level (1 − α):

\[n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2\]
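One way to see where this formula comes from, assuming the margin of error E is the half-width of the z-based confidence interval, is to solve that half-width for n:

\[E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \frac{z_{\alpha/2} \cdot \sigma}{E} \;\Rightarrow\; n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2\]

The same steps with σ replaced by \(\sqrt{p(1-p)}\) give the corresponding formula for a proportion.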

For estimating a population proportion:

\[n = \frac{z_{\alpha/2}^2 \cdot p(1-p)}{E^2}\]

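(The required n is largest at p = 0.5, where p(1 − p) reaches its maximum of 0.25, so p = 0.5 is the conservative choice when no prior estimate of p is available.)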
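Both sample-size formulas can be sketched as small helpers. The function names are illustrative, and rounding up to the next whole unit is an assumption made here so that the margin of error is not exceeded. With σ = 10 and E = 2 the unrounded value is about 96.04, so this sketch returns 97 rather than the rounded-to-nearest 96.

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Two-sided critical z for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def n_for_mean(sigma, margin, confidence=0.95):
    """n = (z * sigma / E)^2, rounded up to a whole unit."""
    return math.ceil((z_value(confidence) * sigma / margin) ** 2)

def n_for_proportion(margin, p=0.5, confidence=0.95):
    """n = z^2 * p(1 - p) / E^2, rounded up; p = 0.5 is the worst case."""
    return math.ceil(z_value(confidence) ** 2 * p * (1 - p) / margin ** 2)

print(n_for_mean(10, 2))          # 97
print(n_for_proportion(0.05))     # 385
```

Halving the margin of error roughly quadruples the required sample size, because E is squared in the denominator.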

flowchart TB
  S[Sampling Methods] --> PR[Probability]
  S --> NP[Non-Probability]
  PR --> SRS[SRS]
  PR --> SYS[Systematic]
  PR --> STR[Stratified]
  PR --> CL[Cluster]
  PR --> MS[Multi-stage]
  NP --> CN[Convenience]
  NP --> PU[Purposive]
  NP --> QU[Quota]
  NP --> SN[Snowball]
    classDef default fill:#003366,color:#ffffff,stroke:#ffcc00,stroke-width:3px,rx:10px,ry:10px;

Note: Distractor warning

PYQ (previous-year question) trap: the standard error of the mean is σ/√n, not σ²/n (σ²/n is the variance of x̄). The half-width of a confidence interval is z (or t) multiplied by the SE.

46.11 Practice Questions

Q 01 (Definition, Easy)

A **census** surveys:

  • A. A random sample
  • B. Every unit of the population
  • C. Cluster only
  • D. 10 % of population
Correct Option: B
**Census** = complete enumeration.
Q 02 (SRS, Easy)

In SRS:

  • A. Every unit has equal chance of selection
  • B. Selection is convenience-based
  • C. Researcher uses judgement
  • D. Snowball referrals are used
Correct Option: A
Equal-probability random selection.
Q 03 (Methods, Medium)

Match each method with its description:

| Method | Description |
|---|---|
| (i) Stratified | (a) Random selection of clusters |
| (ii) Cluster | (b) Random selection from each subgroup |
| (iii) Systematic | (c) Every k-th unit selected after random start |
| (iv) Quota | (d) Predefined quotas of categories filled |

  • A. (i)-(b), (ii)-(a), (iii)-(c), (iv)-(d)
  • B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
  • C. (i)-(c), (ii)-(b), (iii)-(a), (iv)-(d)
  • D. (i)-(d), (ii)-(c), (iii)-(b), (iv)-(a)
Correct Option: A
Stratified — from each subgroup; Cluster — clusters; Systematic — k-th; Quota — categories.
Q 04 (SE, Medium)

σ = 20; n = 100. Standard error of the sample mean is:

  • A. 2
  • B. 0.2
  • C. 20
  • D. 100
Correct Option: A
SE = σ/√n = 20/10 = **2**.
Q 05 (CLT, Medium)

By the Central Limit Theorem, the sampling distribution of the sample mean is approximately Normal when:

  • A. Population is normal only
  • B. n ≥ 30, regardless of population distribution
  • C. n < 5
  • D. σ is unknown
Correct Option: B
**n ≥ 30** rule of thumb; CLT works regardless of population distribution.
Q 06 (CI, Medium)

For a 95 % CI for the population mean (large sample), the critical z-value is:

  • A. 1.645
  • B. 1.96
  • C. 2.33
  • D. 2.58
Correct Option: B
**z_{0.025} = ±1.96** for 95 % CI.
Q 07 (CI compute, Medium)

x̄ = 50, σ = 10, n = 100. 95 % CI for μ:

  • A. [40, 60]
  • B. [48.04, 51.96]
  • C. [45, 55]
  • D. [49, 51]
Correct Option: B
SE = 10/10 = 1; 50 ± 1.96 × 1 = **[48.04, 51.96]**.
Q 08 (Estimator, Medium)

An estimator is *unbiased* if:

  • A. Its variance is minimum
  • B. E(θ̂) = θ
  • C. θ̂ → θ as n → ∞
  • D. It uses all sample information
Correct Option: B
**Expected value equals true parameter**.
Q 09 (Consistency, Medium)

An estimator is *consistent* if:

  • A. Expected value equals parameter
  • B. Variance is minimum among unbiased estimators
  • C. Estimator converges (in probability) to parameter as n → ∞
  • D. Uses all sample data
Correct Option: C
**Consistency** — convergence to true value with sample size.
Q 10 (Strat vs Cluster, Hard)

Stratified vs cluster sampling differ chiefly in that:

  • A. Strata are homogeneous within; clusters are heterogeneous within
  • B. Both same
  • C. Clusters are smaller
  • D. Strata are randomly selected
Correct Option: A
**Stratified** — homogeneous within strata; **Cluster** — heterogeneous within clusters.
Q 11 (Non-prob, Medium)

Which is a *non-probability* method?

  • A. SRS
  • B. Stratified
  • C. Convenience
  • D. Cluster
Correct Option: C
**Convenience** — non-probability; the others are probability.
Q 12 (Snowball, Medium)

Snowball sampling is most suitable for:

  • A. Large general populations
  • B. Hidden or hard-to-reach populations (drug users, undocumented migrants)
  • C. Random surveys
  • D. Census
Correct Option: B
Snowball — chain-referral for hidden populations.
Q 13 (Sample size, Hard)

Required sample size for estimating μ at 95 % confidence with margin of error E = 2, σ = 10:

  • A. 25
  • B. 96
  • C. 100
  • D. 1000

Correct Option: B
n = (1.96 × 10 / 2)² = 9.8² = 96.04 ≈ **96**, the closest option. (Rounding up so that the margin of error is not exceeded would give 97.)
Q 14 (Error, Medium)

Sampling error is reduced primarily by:

  • A. Better questionnaire wording
  • B. Larger sample size and better sampling design
  • C. Better enumerator training
  • D. Switch to census
Correct Option: B
Sampling error ↓ with larger n and better design.
Q 15 (BLUE, Hard)

By the Gauss-Markov theorem, OLS estimators are:

  • A. Best Linear Unbiased Estimators (BLUE)
  • B. Always non-linear
  • C. Biased
  • D. Inefficient
Correct Option: A
**Gauss-Markov: OLS is BLUE** under classical assumptions.
Q 16 (Systematic, Medium)

In systematic sampling with N = 1000 and n = 50, the interval k is:

  • A. 10
  • B. 20
  • C. 50
  • D. 100
Correct Option: B
k = N/n = 1000/50 = **20**.
Q 17 (Census India, Medium)

India conducts a census every:

  • A. 5 years
  • B. 10 years (decennial)
  • C. 2 years
  • D. 25 years

Correct Option: B
First census in 1872; regular decennial censuses since 1881; last completed **2011**.
Q 18 (SE proportion, Hard)

Standard error of a sample proportion is:

  • A. σ/√n
  • B. √(p(1−p)/n)
  • C. σ²/n
  • D. p/n
Correct Option: B
**SE(p̂) = √(p(1−p)/n)**.
Q 19 (99 % z, Medium)

Critical z for 99 % CI:

  • A. 1.645
  • B. 1.96
  • C. 2.33
  • D. 2.58
Correct Option: D
99 % CI → z = **±2.58**.
Q 20 (Quota, Medium)

Quota sampling is:

  • A. A probability method
  • B. A non-probability method analogous to stratified, but with non-random selection within each quota
  • C. Same as census
  • D. Random within strata
Correct Option: B
**Quota = non-probability analogue of stratified**.

46.12 Quick Recall

Important: Quick recall
  • Population vs Sample; Sampling vs Census. India: first census in 1872, regular decennial censuses since 1881; last completed 2011.
  • Probability methods: SRS, Systematic (k = N/n), Stratified (homogeneous within strata), Cluster (heterogeneous within clusters), Multi-stage, PPS.
  • Non-probability methods: Convenience, Purposive, Quota, Snowball, Self-selection.
  • Errors: Sampling (chance; reduced by larger n and better design) vs Non-sampling (measurement / response / processing / non-response).
  • CLT: for large n, x̄ is approximately Normal with mean μ and standard error σ/√n.
  • SE of mean = σ/√n; SE of proportion = √(p(1−p)/n).
  • Estimation: Point vs Interval (CI). Properties of a good estimator: unbiasedness, consistency, efficiency, sufficiency. Gauss-Markov: OLS is BLUE (Best Linear Unbiased Estimator).
  • Critical z: 90 % → 1.645; 95 % → 1.96; 99 % → 2.58.
  • Sample size: n = (z σ / E)² for mean; n = z² p(1−p)/E² for proportion.