45  Sampling and Estimation

45.1 Population, Sample, Census

A population is the complete set of items or elements under investigation. A sample is a subset of the population chosen for actual study. A census surveys every member of the population (kothari2019?; gupta2021?).

Population and sample quantities are distinguished by parallel notation and terminology:

Tip: Population vs Sample

| Concept | Population | Sample |
|---|---|---|
| Mean | \(\mu\) | \(\bar X\) |
| Variance | \(\sigma^2\) | \(s^2\) |
| Proportion | \(P\) | \(p\) |
| Size | \(N\) | \(n\) |
| Term for a summary measure | Parameter | Statistic |

45.2 Why Sample?

Samples are used because of:

  • Cost: surveying the whole population is expensive.
  • Time: a sample can be collected and analysed faster than a census.
  • Practicality: destructive testing makes a census impossible.
  • Accuracy: a well-designed sample can be more accurate than a rushed census.
  • Depth of investigation: each sampled unit can be studied more thoroughly than is feasible for every unit in a census.

45.3 Methods of Sampling

Sampling methods divide into probability (each unit has a known, non-zero chance of selection) and non-probability (selection is by judgement or convenience).

Tip: Probability vs Non-Probability Sampling

| Family | Methods |
|---|---|
| Probability | Simple random, Stratified, Systematic, Cluster, Multi-stage, PPS |
| Non-probability | Convenience, Judgement / Purposive, Quota, Snowball, Volunteer |

45.3.1 Probability sampling — six methods

Tip: Six Probability-Sampling Methods

| Method | How it works | When to use |
|---|---|---|
| Simple Random Sampling (SRS) | Each unit has an equal chance of selection | Homogeneous population |
| Stratified Random Sampling | Population split into strata; sample drawn from each | Heterogeneous population with internally homogeneous strata; reduces variance |
| Systematic Sampling | Every \(k\)-th unit after a random start | Ordered list; large frame |
| Cluster Sampling | Population split into clusters; clusters chosen at random | Wide geography; cost reduction |
| Multi-stage Sampling | Sampling at successive stages | National household surveys |
| Probability Proportional to Size (PPS) | Larger units have a higher chance of selection | Clusters of unequal size |
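
Three of the methods above can be sketched in a few lines of Python. This is a minimal illustration, not a survey-grade implementation: the sampling frame (unit IDs 1 to 100), the two strata, and the function names are all hypothetical, and the stratified draw assumes proportional allocation (the same sampling fraction in every stratum).

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n units without replacement; every unit is equally likely."""
    return rng.sample(frame, n)


def systematic_sample(frame, k, rng):
    """Take every k-th unit after a random start among the first k units."""
    start = rng.randrange(k)
    return frame[start::k]


def stratified_sample(strata, fraction, rng):
    """Draw the same sampling fraction independently from each stratum."""
    sample = []
    for units in strata.values():
        n_h = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(units, n_h))
    return sample


rng = random.Random(42)                            # fixed seed for reproducibility
frame = list(range(1, 101))                        # hypothetical frame: IDs 1..100
strata = {"low": frame[:60], "high": frame[60:]}   # hypothetical strata of size 60 and 40

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample(strata, 0.10, rng)

print(sorted(srs))
print(systematic)
print(sorted(stratified))
```

Each call returns 10 units, but only the stratified draw guarantees that 6 come from the first stratum and 4 from the second.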

45.3.2 Non-probability sampling — five methods

Tip: Five Non-Probability Sampling Methods

| Method | How it works |
|---|---|
| Convenience | Whoever is easiest to access |
| Judgement / Purposive | Researcher’s expertise picks units |
| Quota | Specified number from each subgroup, chosen non-randomly |
| Snowball | Existing respondents refer further respondents |
| Volunteer | Self-selected participants (online polls) |

45.4 Sampling vs Non-Sampling Errors

Tip: Two Sources of Error in Sample Surveys

| Error | Source | Reduced by |
|---|---|---|
| Sampling error | Random variation between sample and population | Larger sample size, better design |
| Non-sampling error | Faulty design, measurement, recording, processing | Better questionnaire, training, editing |

A census has zero sampling error but typically higher non-sampling error than a well-designed sample.

45.5 Sampling Distribution

The sampling distribution of a statistic is the probability distribution of the statistic computed across all possible samples of a given size from the population.

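The most frequently tested example is the sampling distribution of the sample mean. By the Central Limit Theorem, for large \(n\) the sample mean is approximately normally distributed, whatever the shape of the population: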

\[ \bar X \sim N\left( \mu, \dfrac{\sigma^2}{n} \right) \]

The standard deviation of the sampling distribution is the standard error:

\[ \text{SE}(\bar X) = \dfrac{\sigma}{\sqrt{n}} \]

For a proportion: \(\text{SE}(p) = \sqrt{p(1-p)/n}\).
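
The standard-error formula for the mean can be checked by simulation. The sketch below assumes a hypothetical, deliberately skewed population (exponential with mean 1) and draws repeated samples with replacement; the population, the sample size of 50, and the number of replications are illustrative choices only.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility

# Hypothetical skewed population: exponential with mean 1.
population = [rng.expovariate(1.0) for _ in range(50_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)   # population SD (divisor N)

n = 50                 # size of each sample
replications = 4_000   # number of repeated samples

# Mean of each repeated sample (drawn with replacement).
sample_means = [
    statistics.fmean(rng.choices(population, k=n))
    for _ in range(replications)
]

theoretical_se = sigma / n ** 0.5
empirical_se = statistics.stdev(sample_means)

print(f"population mean      = {mu:.4f}")
print(f"mean of sample means = {statistics.fmean(sample_means):.4f}")
print(f"sigma / sqrt(n)      = {theoretical_se:.4f}")
print(f"SD of sample means   = {empirical_se:.4f}")
```

The last two printed values should agree closely, and a histogram of `sample_means` would look roughly bell-shaped even though the population itself is skewed.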

45.6 Estimation

Estimation is the process of using sample statistics to infer unknown population parameters (gupta2021?). Two kinds:

  • Point estimation: a single best-guess value (e.g. \(\bar X\) for \(\mu\)).
  • Interval estimation: a range with an associated confidence level (e.g. 95 % CI).

Tip: Properties of a Good Estimator

| Property | Meaning |
|---|---|
| Unbiasedness | \(E(\hat\theta) = \theta\) |
| Consistency | \(\hat\theta \to \theta\) as \(n \to \infty\) |
| Efficiency | Smallest variance among unbiased estimators |
| Sufficiency | Uses all relevant information in the sample |
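
Unbiasedness can be illustrated with the sample variance. The sketch below assumes a hypothetical normal population with \(\sigma^2 = 4\) and compares the variance estimator with divisor \(n - 1\) against the one with divisor \(n\); the sample size and number of replications are arbitrary illustrative values.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed for reproducibility
true_variance = 4.0      # hypothetical population: normal with sigma = 2
n = 5                    # small samples make the bias easy to see
replications = 20_000

divisor_n_minus_1 = []
divisor_n = []
for _ in range(replications):
    x = [rng.gauss(0.0, 2.0) for _ in range(n)]
    divisor_n_minus_1.append(statistics.variance(x))   # s^2, divisor n - 1
    divisor_n.append(statistics.pvariance(x))          # divisor n

mean_unbiased = statistics.fmean(divisor_n_minus_1)
mean_biased = statistics.fmean(divisor_n)

print(f"average of s^2 (divisor n-1): {mean_unbiased:.3f}")  # near 4.0
print(f"average with divisor n:       {mean_biased:.3f}")    # near 4.0 * (n-1)/n = 3.2
```

Averaged over many samples, the \(n - 1\) version settles near the true variance, while the divisor-\(n\) version underestimates it by the factor \((n-1)/n\).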

45.7 Confidence Intervals

The general form for the population mean \(\mu\), with known \(\sigma\):

\[ \bar X \pm z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}} \]

When \(\sigma\) is unknown and \(n\) is small, replace \(\sigma\) with the sample SD \(s\) and use Student’s t with \(n - 1\) degrees of freedom:

\[ \bar X \pm t_{\alpha/2, n-1} \cdot \dfrac{s}{\sqrt{n}} \]

Tip: Common Confidence Levels and Z-Values

| Confidence | \(\alpha\) | \(z_{\alpha/2}\) |
|---|---|---|
| 90 % | 0.10 | 1.645 |
| 95 % | 0.05 | 1.96 |
| 99 % | 0.01 | 2.58 |
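
The z-values in the table and the known-\(\sigma\) interval can be reproduced with Python's standard library. This is a minimal sketch: the function names are hypothetical, and the inputs (\(\bar X = 120\), \(\sigma = 15\), \(n = 36\)) are made-up illustrative values. The t-based interval is not covered here.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def mean_ci_known_sigma(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population SD is known."""
    margin = z_critical(confidence) * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")   # 1.645, 1.960, 2.576

# Illustrative inputs (assumed, not from the text): mean 120, sigma 15, n = 36.
low, high = mean_ci_known_sigma(120.0, 15.0, 36)
print(f"95% CI: ({low:.2f}, {high:.2f})")                # (115.10, 124.90)
```

The 99 % value prints as 2.576, which the table rounds to 2.58.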

45.8 Sample Size Determination

For estimating \(\mu\) within a margin of error \(E\) at confidence \(1 - \alpha\):

\[ n = \left( \dfrac{z_{\alpha/2} \cdot \sigma}{E} \right)^2 \]

For estimating \(P\):

\[ n = \dfrac{z^2 \cdot P(1-P)}{E^2} \]

When \(P\) is unknown, use \(P = 0.5\) for the most conservative (largest) sample size.
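
Both sample-size formulas translate directly into code. In the sketch below the function names are hypothetical, the critical value comes from the standard normal distribution rather than a rounded table value, and the result is rounded up because a sample size must be a whole number at least as large as the formula requires.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n so that z * sigma / sqrt(n) does not exceed the margin."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil((z * sigma / margin) ** 2)


def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n for a proportion; p = 0.5 gives the most conservative size."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil(z ** 2 * p * (1.0 - p) / margin ** 2)


print(sample_size_for_mean(sigma=5_000, margin=100))   # 9604
print(sample_size_for_proportion(margin=0.05))         # 385
print(sample_size_for_proportion(margin=0.05, p=0.2))  # smaller than 385
```

Because \(P(1-P)\) peaks at \(P = 0.5\), any other assumed proportion yields a smaller required sample.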

45.9 Worked Numerical

A sample of 100 students has mean income ₹50,000 with sample SD ₹5,000.

  • Standard error of mean = \(5,000 / \sqrt{100} = 500\).
  • 95 % confidence interval for \(\mu\): \(50{,}000 \pm 1.96 \times 500 = 50{,}000 \pm 980 =\) (₹49,020, ₹50,980).

To estimate \(\mu\) within ±₹100 at 95 % confidence with \(\sigma\) ≈ ₹5,000:

\[ n = (1.96 \times 5{,}000 / 100)^2 = 98^2 = 9{,}604 \]

45.10 Exam-Pattern MCQs

Note: Eight-question set

Q1. Which of the following is not a probability-sampling method?

A. Simple random sampling B. Stratified random sampling C. Convenience sampling D. Cluster sampling

Answer: C. Convenience sampling is non-probability.


Q2. Match each sampling method with its description:

| Method | Description |
|---|---|
| (i) Simple Random | (a) Population split into strata; random sample from each |
| (ii) Stratified | (b) Every \(k\)-th unit after a random start |
| (iii) Systematic | (c) Each unit has equal chance |
| (iv) Cluster | (d) Population split into clusters; some clusters fully sampled |

A. (i)-(c), (ii)-(a), (iii)-(b), (iv)-(d) B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d) C. (i)-(b), (ii)-(c), (iii)-(d), (iv)-(a) D. (i)-(d), (ii)-(c), (iii)-(a), (iv)-(b)

Answer: A.


Q3. A sample of 400 has mean ₹2,000 and SD ₹100. The standard error of the mean is:

A. ₹0.25 B. ₹5 C. ₹10 D. ₹100

Answer: B. SE = \(100 / \sqrt{400} = 100/20 =\) ₹5.


Q4. Match each property of a good estimator with its meaning:

| Property | Meaning |
|---|---|
| (i) Unbiasedness | (a) Smallest variance among unbiased estimators |
| (ii) Consistency | (b) Uses all relevant information |
| (iii) Efficiency | (c) \(E(\hat\theta) = \theta\) |
| (iv) Sufficiency | (d) \(\hat\theta \to \theta\) as \(n \to \infty\) |

A. (i)-(c), (ii)-(d), (iii)-(a), (iv)-(b) B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d) C. (i)-(b), (ii)-(c), (iii)-(d), (iv)-(a) D. (i)-(d), (ii)-(a), (iii)-(b), (iv)-(c)

Answer: A.


Q5. The 95 % confidence z-value is approximately:

A. 1.645 B. 1.96 C. 2.33 D. 2.58

Answer: B. z = 1.96 for 95 % confidence (two-tailed).


Q6. What sample size is needed to estimate a population proportion within ±5 % at 95 % confidence, when \(P\) is unknown?

A. 96 B. 196 C. 385 D. 1,000

Answer: C. With \(P = 0.5\): \(n = (1.96)^2 \times 0.5 \times 0.5 / (0.05)^2 = 0.9604 / 0.0025 = 384.16\), rounded up to 385.


Q7. Arrange the steps of inferential statistics in correct order:

  (i) Compute confidence interval
  (ii) Determine sample size
  (iii) Collect sample
  (iv) Calculate sample statistic and standard error

A. (ii), (iii), (iv), (i) B. (i), (ii), (iii), (iv) C. (iv), (iii), (ii), (i) D. (iii), (iv), (ii), (i)

Answer: A. Sample-size → Collect → Compute statistic and SE → Confidence interval.


Q8. Match each error with its source / mitigation:

| Error | Source / Mitigation |
|---|---|
| (i) Sampling error | (a) Faulty questionnaire; reduced by training and editing |
| (ii) Non-sampling error | (b) Random variation; reduced by larger \(n\) and better design |

A. (i)-(b), (ii)-(a) B. (i)-(a), (ii)-(b)

Answer: A.

Important: Quick recall
  • Population (size \(N\)) vs Sample (size \(n\)). Parameter (\(\mu\), \(\sigma\), \(P\)) vs Statistic (\(\bar X\), \(s\), \(p\)).
  • Probability sampling: SRS, Stratified, Systematic, Cluster, Multi-stage, PPS.
  • Non-probability sampling: Convenience, Judgement, Quota, Snowball, Volunteer.
  • Census has zero sampling error but typically larger non-sampling error.
  • Standard error of mean = \(\sigma / \sqrt{n}\). SE(p) = \(\sqrt{p(1-p)/n}\).
  • CLT: \(\bar X \sim N(\mu, \sigma^2/n)\) for large \(n\).
  • Properties of good estimator: unbiased, consistent, efficient, sufficient.
  • 95 % CI: \(\bar X \pm 1.96 \cdot \sigma / \sqrt{n}\). (Use t for unknown \(\sigma\) and small \(n\).)
  • Sample size: \(n = (z \sigma / E)^2\). For proportions, conservative \(P = 0.5\) → ≈ 385 for ±5 % at 95 %.