44  Data Collection and Classification

44.1 Meaning of Data

Data are facts and figures that are collected, presented and analysed to draw inferences and aid decision-making (Kothari 2019; Gupta 2021). They form the raw material of research; without good data, even the best methodology can yield wrong conclusions.

Tip: Two Major Classes of Data

| Class | Definition | Example |
|---|---|---|
| Primary data | Collected first-hand by the researcher for the specific problem | Survey responses, interviews, observation |
| Secondary data | Collected earlier by someone else for another purpose | Government reports, journals, books, databases |

44.2 Methods of Primary-Data Collection

Tip: Six Methods of Collecting Primary Data

| Method | Description | Strengths | Weaknesses |
|---|---|---|---|
| Observation | Researcher directly observes events | Natural data; captures non-verbal behaviour | Time-consuming; subjective |
| Interview (personal / telephonic) | Direct verbal exchange | Depth, flexibility | Cost, interviewer bias |
| Questionnaire (mailed / online) | Structured set of questions, self-administered | Wide reach, anonymous | Low response rate |
| Schedule | Questions filled in by an enumerator in person | Higher response rate | Cost |
| Focus group | Group discussion with a moderator | Rich qualitative insight | Groupthink; small sample |
| Experimental data | Data from a controlled experiment | Causal inference | Artificial setting |

44.3 Methods of Secondary-Data Collection

Secondary sources include published sources (books, journals, government reports, NSO, RBI bulletins, World Bank, IMF, OECD, NSSO surveys, Census of India, Economic Survey, EPW, sectoral statistical handbooks) and unpublished sources (firm records, theses, internal reports). Secondary data are cheaper and faster but require careful checking for reliability, suitability and adequacy before use.

44.4 Sources of Indian Economic Data

Tip: Major Indian Sources of Secondary Data

| Source | Content |
|---|---|
| Census of India (decennial) | Population, demography |
| National Sample Survey (NSS) | Consumption, employment, education, health |
| MoSPI / NSO | National Income, IIP, CPI |
| RBI | Money, banking, finance, BoP |
| CMIE | Industrial and economic data |
| Economic Survey (annual) | Government’s annual review |
| Statistical Abstract / India Stat | Aggregated tabular data |
| World Bank, IMF, IFS, UN, OECD | Internationally comparable data |

44.5 Designing a Questionnaire

The quality of a questionnaire determines the quality of the data it yields. Five principles guide its design (Kothari 2019):

Tip: Five Principles of Questionnaire Design

| Principle | Description |
|---|---|
| Clarity | Plain language; no ambiguity |
| Brevity | Avoid unnecessary questions |
| Logical order | Easy to hard, general to specific, related questions grouped |
| Avoid leading questions | Wording should not steer the response |
| Pre-test | Pilot the instrument before mass roll-out |

Question types include closed-ended (dichotomous yes/no, multiple choice, Likert scale, ranking) and open-ended (free-form responses).

44.6 Scales of Measurement — Stevens (1946)

S. S. Stevens (1946) classified measurement into four levels, each adding a property to the one below it:

Tip: Stevens’s Four Scales of Measurement

| Scale | Properties | Permitted operations | Example |
|---|---|---|---|
| Nominal | Identity only | Counting, mode, frequency | Gender, religion, blood group |
| Ordinal | Identity + order | All of the above, plus median, percentile | Rank, level of agreement (Likert) |
| Interval | Adds equal intervals | All of the above, plus mean, SD, addition / subtraction | Temperature in °C, calendar year |
| Ratio | Adds a true zero | All of the above, plus multiplication / division, GM, HM | Income, weight, distance |

The defining test: a nominal scale only labels; an ordinal scale also orders; an interval scale has equal intervals but no true zero; a ratio scale has a true zero, so ratios are meaningful.
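The cumulative nature of the four scales can be illustrated with a minimal Python sketch. The lookup table and the function name `permitted_operations` are hypothetical and chosen only for illustration; the temperature comparison shows why ratios on an interval scale (°C) are not meaningful, while ratios on a ratio scale (kelvin) are.

```python
# Hypothetical lookup: operations that first become meaningful at each level.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

NEW_OPERATIONS = {
    "nominal": ["count", "mode", "frequency"],
    "ordinal": ["median", "percentile"],
    "interval": ["mean", "standard deviation", "addition/subtraction"],
    "ratio": ["multiplication/division", "geometric mean", "harmonic mean"],
}


def permitted_operations(scale):
    """Return every operation permitted at `scale`, including inherited ones."""
    level = SCALES.index(scale)
    ops = []
    for name in SCALES[: level + 1]:
        ops.extend(NEW_OPERATIONS[name])
    return ops


def celsius_to_kelvin(c):
    """Convert degrees Celsius (interval scale) to kelvin (ratio scale)."""
    return c + 273.15


# The arithmetic ratio of Celsius readings is 2.0, but 20 °C is not
# "twice as hot" as 10 °C because 0 °C is not a true zero.
ratio_celsius = 20.0 / 10.0
# On the kelvin scale the same two temperatures differ by only about 3.5 %.
ratio_kelvin = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

print(permitted_operations("ordinal"))
print(round(ratio_celsius, 3), round(ratio_kelvin, 3))
```

Differences, by contrast, survive the conversion: a 10-degree gap in °C is still a 10-unit gap in kelvin, which is exactly the "equal intervals" property.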

44.7 Classification of Data

Classification is the grouping of data into categories on the basis of common characteristics.

Tip: Bases of Classification

| Basis | Categories |
|---|---|
| Geographical | Region, state, district |
| Chronological | Time periods |
| Qualitative / Descriptive | Attributes (gender, religion) |
| Quantitative | Numerical values; further divided into discrete or continuous |

A frequency distribution groups quantitative data into class intervals. Key terms: class limits, class boundaries, class mark, class width, frequency, cumulative frequency, relative frequency.
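As a rough illustration of these terms, the sketch below builds a frequency distribution from a small made-up data set. The function name, the data and the choice of exclusive classes (lower limit included, upper limit excluded) are assumptions made for the example; with exclusive classes the class limits and class boundaries coincide.

```python
def frequency_distribution(values, lower, width, n_classes):
    """Group values into exclusive classes [lo, hi) of equal width."""
    total = len(values)
    rows = []
    cumulative = 0
    for k in range(n_classes):
        lo = lower + k * width          # lower class limit
        hi = lo + width                 # upper class limit (excluded)
        freq = sum(1 for v in values if lo <= v < hi)
        cumulative += freq
        rows.append({
            "class": (lo, hi),
            "class_mark": (lo + hi) / 2,   # mid-point of the class
            "frequency": freq,
            "cumulative": cumulative,      # "less than" cumulative frequency
            "relative": freq / total,      # share of all observations
        })
    return rows


marks = [12, 25, 37, 41, 18, 29, 33, 45, 22, 38, 27, 31]  # made-up data
table = frequency_distribution(marks, lower=10, width=10, n_classes=4)
for row in table:
    print(row)
```

The class width here is 10, the class marks are 15, 25, 35 and 45, and the relative frequencies sum to 1.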

44.8 Tabulation

Tabulation is the systematic arrangement of classified data in rows and columns. A good table has: a clear title, well-marked captions (column headings), stubs (row headings), the body, footnotes and the source. Tabulation aids comparison and is the foundation of further analysis.
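The parts of a table can be made concrete with a small sketch that lays out a two-way table as plain text. The records, the table title and the source line are invented for the example; only the roles (title, captions, stubs, body, source) come from the description above.

```python
from collections import Counter

# Made-up classified records: (region, employment status)
records = [
    ("North", "Employed"), ("North", "Unemployed"), ("South", "Employed"),
    ("South", "Employed"), ("North", "Employed"), ("South", "Unemployed"),
]

counts = Counter(records)
stubs = sorted({region for region, _ in records})      # row headings
captions = sorted({status for _, status in records})   # column headings

lines = ["Table 1: Respondents by region and employment status"]  # title
lines.append("Region".ljust(10) + "".join(c.rjust(12) for c in captions))
for stub in stubs:
    # Body: one cell per (stub, caption) combination
    body = "".join(str(counts[(stub, c)]).rjust(12) for c in captions)
    lines.append(stub.ljust(10) + body)
lines.append("Source: hypothetical survey data")

print("\n".join(lines))
```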

44.9 Diagrammatic and Graphical Presentation

Tip: Common Graphical Devices

| Type | Best for |
|---|---|
| Bar diagram (simple, multiple, sub-divided) | Discrete categorical data |
| Pie chart | Parts of a whole |
| Histogram | Frequency distribution |
| Frequency polygon / curve | Continuous frequency data |
| Ogive (cumulative-frequency curve) | Cumulative frequency, percentile reading |
| Scatter diagram | Bivariate data |
| Line chart | Time series |
| Box plot | Five-number summary |
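The five-number summary that a box plot displays (minimum, first quartile, median, third quartile, maximum) can be computed with Python's standard library. This is a sketch on made-up data; it uses the "inclusive" quartile method of `statistics.quantiles`, and other quartile conventions can give slightly different values.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # made-up observations

# method="inclusive" treats the data as the whole population when
# interpolating the cut points.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

summary = (min(data), q1, median, q3, max(data))
iqr = q3 - q1  # inter-quartile range: the length of the box

print(summary, iqr)
```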

44.10 Editing, Coding and Tabulation

After collection, data must be processed:

  • Editing — checking completeness, accuracy, consistency.
  • Coding — assigning numerical codes to non-numeric responses.
  • Classification — grouping responses.
  • Tabulation — preparing summary tables.

Tabulation may be done by hand, mechanically or electronically; modern practice uses statistical software (SPSS, Stata, R, Python).
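The four processing steps can be sketched end to end in a few lines of Python. The responses, the codebook and the grouping rule below are all hypothetical and serve only to show how each step feeds the next.

```python
from collections import Counter

# Made-up raw responses; None marks a missing answer.
raw = [
    {"id": 1, "gender": "F", "opinion": "Agree"},
    {"id": 2, "gender": "M", "opinion": None},
    {"id": 3, "gender": "F", "opinion": "Strongly agree"},
    {"id": 4, "gender": "M", "opinion": "Disagree"},
    {"id": 5, "gender": "M", "opinion": "Agree"},
]

# Hypothetical codebook for an ordinal agreement item.
CODEBOOK = {
    "Strongly disagree": 1, "Disagree": 2, "Neutral": 3,
    "Agree": 4, "Strongly agree": 5,
}

# 1. Editing: keep only records with a recognised, non-missing response.
edited = [r for r in raw if r["opinion"] in CODEBOOK]

# 2. Coding: attach the numeric code for each verbal response.
coded = [{**r, "code": CODEBOOK[r["opinion"]]} for r in edited]


# 3. Classification: group codes into broader categories.
def classify(code):
    if code >= 4:
        return "Favourable"
    if code <= 2:
        return "Unfavourable"
    return "Neutral"


classified = [{**r, "group": classify(r["code"])} for r in coded]

# 4. Tabulation: count respondents in each (gender, group) cell.
table = Counter((r["gender"], r["group"]) for r in classified)
print(table)
```

Dropping the incomplete record is only one possible editing rule; in practice an editor might instead return to the respondent or flag the case for follow-up.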

44.11 Exam-Pattern MCQs

Note: Eight-question set

Q1. Which of the following is primary data?

A. Census of India figures
B. RBI Bulletin data
C. A market-research survey conducted for this study by the researcher
D. Economic Survey

Answer: C. Primary data are first-hand; the other three are secondary.


Q2. Match each scale of measurement with the operation it permits:

| Scale | Operation permitted |
|---|---|
| (i) Nominal | (a) Multiplication and division (true zero) |
| (ii) Ordinal | (b) Mean, standard deviation, addition |
| (iii) Interval | (c) Mode, frequency only |
| (iv) Ratio | (d) Median, percentile |

A. (i)-(c), (ii)-(d), (iii)-(b), (iv)-(a)
B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
C. (i)-(d), (ii)-(c), (iii)-(b), (iv)-(a)
D. (i)-(b), (ii)-(a), (iii)-(d), (iv)-(c)

Answer: A.


Q3. Which of the following is not a method of collecting primary data?

A. Personal interview
B. Mailed questionnaire
C. Observation
D. Reading a published RBI report

Answer: D. Reading a published RBI report is secondary data collection.


Q4. Match each Indian source with what it provides:

| Source | Content |
|---|---|
| (i) Census of India | (a) Money, banking, finance, BoP |
| (ii) NSS | (b) Population and demography |
| (iii) RBI | (c) Government’s annual review |
| (iv) Economic Survey | (d) Consumption, employment, education |

A. (i)-(b), (ii)-(d), (iii)-(a), (iv)-(c)
B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
C. (i)-(c), (ii)-(a), (iii)-(d), (iv)-(b)
D. (i)-(d), (ii)-(c), (iii)-(b), (iv)-(a)

Answer: A.


Q5. Stevens’s four scales of measurement, in order of increasing mathematical structure, are:

A. Nominal, Ordinal, Interval, Ratio
B. Ratio, Interval, Ordinal, Nominal
C. Ordinal, Nominal, Interval, Ratio
D. Interval, Ratio, Nominal, Ordinal

Answer: A. N-O-I-R: Nominal → Ordinal → Interval → Ratio.


Q6. A Likert scale asking “strongly agree / agree / neutral / disagree / strongly disagree” generates data on which scale?

A. Nominal
B. Ordinal
C. Interval
D. Ratio

Answer: B. The categories have an order but not equal intervals, so the scale is ordinal.


Q7. Arrange the following data-processing steps in correct sequence:

  (i) Tabulation
  (ii) Editing
  (iii) Coding
  (iv) Classification

A. (ii), (iii), (iv), (i)
B. (i), (ii), (iii), (iv)
C. (iv), (iii), (ii), (i)
D. (iii), (i), (ii), (iv)

Answer: A. Edit → Code → Classify → Tabulate.


Q8. Match each diagrammatic device with the data it best presents:

| Device | Best for |
|---|---|
| (i) Histogram | (a) Time series |
| (ii) Pie chart | (b) Five-number summary |
| (iii) Line chart | (c) Continuous frequency distribution |
| (iv) Box plot | (d) Parts of a whole |

A. (i)-(c), (ii)-(d), (iii)-(a), (iv)-(b)
B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d)
C. (i)-(d), (ii)-(c), (iii)-(b), (iv)-(a)
D. (i)-(b), (ii)-(a), (iii)-(d), (iv)-(c)

Answer: A.

Important: Quick recall
  • Data = facts and figures used in research; primary (first-hand) vs secondary (existing).
  • Primary methods: observation, interview, questionnaire, schedule, focus group, experimental data.
  • Major sources of secondary data on India: Census, NSS, MoSPI/NSO, RBI, CMIE, Economic Survey; international sources include the World Bank and IMF.
  • Stevens (1946) four scales: Nominal → Ordinal → Interval → Ratio (mnemonic NOIR).
  • Likert agreement scales are ordinal.
  • Bases of classification: geographical, chronological, qualitative, quantitative.
  • Frequency distribution: class limit, boundary, mark, width, frequency, cumulative, relative.
  • Tabulation parts: title, captions, stubs, body, footnote, source.
  • Diagrams: bar, pie, histogram, frequency polygon, ogive, scatter, line, box plot.
  • Data processing sequence: Edit → Code → Classify → Tabulate.