44 Data Collection and Classification
44.1 Meaning of Data
Data are facts and figures collected, presented and analysed to draw inferences and aid decision-making (Kothari, 2019; Gupta, 2021). They form the raw material of research; without good data, even the best methodology can yield unreliable conclusions.
| Class | Definition | Example |
|---|---|---|
| Primary data | Collected first-hand by the researcher for the specific problem | Survey responses, interviews, observation |
| Secondary data | Collected earlier by someone else for another purpose | Government reports, journals, books, databases |
44.2 Methods of Primary-Data Collection
| Method | Description | Strengths | Weaknesses |
|---|---|---|---|
| Observation | Researcher directly observes events | Natural data; non-verbal | Time-consuming; subjective |
| Interview (personal / telephonic) | Direct verbal exchange | Depth, flexibility | Cost, interviewer bias |
| Questionnaire (mailed / online) | Structured set of questions, self-administered | Wide reach, anonymous | Low response rate |
| Schedule | Questions asked and recorded by an enumerator in person | Higher response rate | Cost |
| Focus group | Group discussion with moderator | Rich qualitative insight | Groupthink; small sample |
| Experimental data | From a controlled experiment | Causal inference | Artificial setting |
44.3 Methods of Secondary-Data Collection
Secondary sources include published sources (books, journals, government reports, NSO and NSSO surveys, RBI bulletins, Census of India, Economic Survey, EPW, sectoral statistical handbooks, and World Bank, IMF and OECD publications) and unpublished sources (firm records, theses, internal reports). Secondary data are cheaper and faster to obtain than primary data, but they must be checked for reliability, suitability and adequacy before use.
44.4 Sources of Indian Economic Data
| Source | Content |
|---|---|
| Census of India (decennial) | Population, demography |
| National Sample Survey (NSS) | Consumption, employment, education, health |
| MoSPI / NSO | National Income, IIP, CPI |
| RBI | Money, banking, finance, BoP |
| CMIE | Industrial and economic data |
| Economic Survey (annual) | Government’s annual review |
| Statistical Abstract / Indiastat | Aggregated tabular data |
| World Bank, IMF (including IFS), UN, OECD | Internationally comparable data |
44.5 Designing a Questionnaire
The quality of a questionnaire largely determines the quality of the data it yields. Five principles apply (Kothari, 2019):
| Principle | What it requires |
|---|---|
| Clarity | Plain language; no ambiguity |
| Brevity | Avoid unnecessary questions |
| Logical order | Easy to hard, general to specific, related questions grouped |
| Avoid leading questions | Wording should not steer the response |
| Pre-test | Pilot the instrument before mass roll-out |
Question types include closed-ended (dichotomous yes/no, multiple choice, Likert scale, ranking) and open-ended (free-form responses).
44.6 Scales of Measurement — Stevens (1946)
S. S. Stevens classified measurement into four levels (Stevens, 1946):
| Scale | Properties | Permitted operations | Example |
|---|---|---|---|
| Nominal | Identity only | Counting, mode, frequency | Gender, religion, blood group |
| Ordinal | Identity + order | + Median, percentile | Rank, level of agreement (Likert) |
| Interval | + Equal intervals | + Mean, SD, addition / subtraction | Temperature in °C, calendar year |
| Ratio | + True zero | + Multiplication / division, GM, HM | Income, weight, distance |
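The defining test: a nominal scale only labels; an ordinal scale also orders; an interval scale adds equal intervals but has no true zero; a ratio scale adds a true zero, so ratios are meaningful. An income of ₹20,000 is twice an income of ₹10,000, whereas 20 °C is not twice as hot as 10 °C.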
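The cumulative structure of the table above can be sketched in a few lines of Python. This is only an illustration: the names `SCALES`, `NEW_STATISTICS`, `permitted_statistics` and `is_permitted` are hypothetical, and the lists simply restate the "Permitted operations" column. The last lines show numerically why a ratio needs a true zero, assuming the usual 273.15 offset between Celsius and Kelvin.

```python
# Stevens's scales in order of increasing mathematical structure.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that become permissible at each level (from the table above).
NEW_STATISTICS = {
    "nominal": ["count", "mode", "frequency"],
    "ordinal": ["median", "percentile"],
    "interval": ["mean", "standard deviation", "addition/subtraction"],
    "ratio": ["multiplication/division", "geometric mean", "harmonic mean"],
}


def permitted_statistics(scale: str) -> list[str]:
    """Return every statistic permitted at `scale`, including inherited ones."""
    level = SCALES.index(scale)  # raises ValueError for an unknown scale
    permitted = []
    for name in SCALES[: level + 1]:
        permitted.extend(NEW_STATISTICS[name])
    return permitted


def is_permitted(statistic: str, scale: str) -> bool:
    """Check whether `statistic` may be computed on data measured at `scale`."""
    return statistic in permitted_statistics(scale)


print(permitted_statistics("ordinal"))
print(is_permitted("mean", "ordinal"))   # False: Likert codes have no equal intervals
print(is_permitted("mean", "interval"))  # True

# Interval vs ratio: the Celsius ratio is an artefact of an arbitrary zero.
celsius_ratio = 20 / 10                        # 2.0, but not meaningful
kelvin_ratio = (20 + 273.15) / (10 + 273.15)   # ratio on a true-zero scale
print(round(kelvin_ratio, 3))                  # 1.035
```

The same two temperatures give a ratio of 2.0 in Celsius and roughly 1.035 in Kelvin, which is why multiplication and division are reserved for ratio-scale data.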
44.7 Classification of Data
Classification is the grouping of data into categories on the basis of common characteristics.
| Basis | Categories |
|---|---|
| Geographical | Region, state, district |
| Chronological | Time periods |
| Qualitative / Descriptive | Attributes (gender, religion) |
| Quantitative | Numerical values; further into discrete or continuous |
A frequency distribution groups quantitative data into class intervals. Key terms: class limits, class boundaries, class mark, class width, frequency, cumulative frequency, relative frequency.
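As a rough illustration of these terms, the sketch below builds a frequency distribution from hypothetical marks of 20 students. The data, the class width of 10 and the exclusive method of forming classes (a value equal to the upper limit falls in the next class) are all assumptions chosen for the example.

```python
# Hypothetical raw marks of 20 students (illustrative data only).
marks = [12, 25, 37, 41, 18, 29, 33, 46, 22, 38,
         15, 27, 31, 44, 49, 36, 24, 19, 42, 35]

lower, width, n_classes = 10, 10, 4   # classes: 10-20, 20-30, 30-40, 40-50

rows = []
cumulative = 0
for k in range(n_classes):
    lo = lower + k * width            # lower class limit
    hi = lo + width                   # upper class limit
    # Exclusive method: lo <= value < hi.
    freq = sum(1 for m in marks if lo <= m < hi)
    cumulative += freq
    rows.append({
        "class": f"{lo}-{hi}",
        "class_mark": (lo + hi) / 2,       # mid-point of the class
        "frequency": freq,
        "cumulative": cumulative,          # "less than" cumulative frequency
        "relative": freq / len(marks),     # share of all observations
    })

for row in rows:
    print(row)
```

With these marks the frequencies are 4, 5, 6 and 5, the cumulative frequencies are 4, 9, 15 and 20, and the relative frequencies sum to 1.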
44.8 Tabulation
Tabulation is the systematic arrangement of classified data in rows and columns. A good table has: a clear title, well-marked captions (column headings), stubs (row headings), the body, footnotes and the source. Tabulation aids comparison and is the foundation of further analysis.
44.9 Diagrammatic and Graphical Presentation
| Type | Best for |
|---|---|
| Bar diagram (simple, multiple, sub-divided) | Discrete categorical data |
| Pie chart | Parts of a whole |
| Histogram | Frequency distribution |
| Frequency polygon / curve | Continuous frequency data |
| Ogive (cumulative-frequency curve) | Cumulative frequency, percentile reading |
| Scatter diagram | Bivariate data |
| Line chart | Time series |
| Box plot | Five-number summary |
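The five-number summary behind a box plot (minimum, first quartile, median, third quartile, maximum) can be computed with Python's standard library. The sketch below uses a hypothetical sample; quartile conventions differ between textbooks and software, so the `"inclusive"` method is one assumption among several possible.

```python
import statistics

# Hypothetical sample (illustrative data only).
data = [7, 15, 4, 9, 2, 11, 5, 8, 4]

ordered = sorted(data)
# The "inclusive" method treats the sample minimum and maximum as the
# 0th and 100th percentiles; other methods can give different quartiles.
q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")

five_number_summary = (ordered[0], q1, q2, q3, ordered[-1])
iqr = q3 - q1   # inter-quartile range: the length of the box

print(five_number_summary)  # (2, 4.0, 7.0, 9.0, 15)
print(iqr)                  # 5.0
```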
44.10 Editing, Coding and Tabulation
After collection, data must be processed:
- Editing — checking completeness, accuracy, consistency.
- Coding — assigning numerical codes to non-numeric responses.
- Classification — grouping responses.
- Tabulation — preparing summary tables.
Tabulation may be done by hand, mechanically or electronically; modern practice uses statistical software (SPSS, Stata, R, Python).
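The four processing steps can be sketched on a handful of hypothetical survey records. Everything here is illustrative: the records, the 1 to 5 codes in `LIKERT_CODES`, and the cut-offs in `classify` are assumptions, not a prescribed scheme.

```python
from collections import Counter

# Hypothetical raw survey responses (illustrative data only).
raw = [
    {"id": 1, "gender": "Female", "satisfaction": "Agree"},
    {"id": 2, "gender": "Male", "satisfaction": "Strongly agree"},
    {"id": 3, "gender": "Female", "satisfaction": ""},   # incomplete record
    {"id": 4, "gender": "Male", "satisfaction": "Disagree"},
    {"id": 5, "gender": "Female", "satisfaction": "Strongly agree"},
]

# 1. Editing: keep only records whose answers are complete.
edited = [r for r in raw if r["gender"] != "" and r["satisfaction"] != ""]

# 2. Coding: assign numerical codes to the non-numeric responses.
LIKERT_CODES = {
    "Strongly disagree": 1, "Disagree": 2, "Neutral": 3,
    "Agree": 4, "Strongly agree": 5,
}
coded = [{**r, "score": LIKERT_CODES[r["satisfaction"]]} for r in edited]


# 3. Classification: group the coded responses into broader categories.
def classify(score: int) -> str:
    if score >= 4:
        return "Favourable"
    if score == 3:
        return "Neutral"
    return "Unfavourable"


classified = [{**r, "attitude": classify(r["score"])} for r in coded]

# 4. Tabulation: a two-way summary table of gender by attitude.
table = Counter((r["gender"], r["attitude"]) for r in classified)
for (gender, attitude), count in sorted(table.items()):
    print(f"{gender:<8}{attitude:<14}{count}")
```

Note that the codes 1 to 5 remain ordinal: they support counts and medians, but averaging them assumes equal intervals that a Likert item does not guarantee.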
44.11 Exam-Pattern MCQs
Q1. Which of the following is primary data?
A. Census of India figures B. RBI Bulletin data C. A market-research survey conducted for this study by the researcher D. Economic Survey
Answer: C. Primary data are first-hand; the other three are secondary.
Q2. Match each scale of measurement with the operation it permits:
| | Scale | | Operation permitted |
|---|---|---|---|
| (i) | Nominal | (a) | Multiplication and division (true zero) |
| (ii) | Ordinal | (b) | Mean, standard deviation, addition |
| (iii) | Interval | (c) | Mode, frequency only |
| (iv) | Ratio | (d) | Median, percentile |
A. (i)-(c), (ii)-(d), (iii)-(b), (iv)-(a) B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d) C. (i)-(d), (ii)-(c), (iii)-(b), (iv)-(a) D. (i)-(b), (ii)-(a), (iii)-(d), (iv)-(c)
Answer: A.
Q3. Which of the following is not a method of collecting primary data?
A. Personal interview B. Mailed questionnaire C. Observation D. Reading a published RBI report
Answer: D. Reading a published RBI report is secondary data collection.
Q4. Match each Indian source with what it provides:
| | Source | | Content |
|---|---|---|---|
| (i) | Census of India | (a) | Money, banking, finance, BoP |
| (ii) | NSS | (b) | Population and demography |
| (iii) | RBI | (c) | Government’s annual review |
| (iv) | Economic Survey | (d) | Consumption, employment, education |
A. (i)-(b), (ii)-(d), (iii)-(a), (iv)-(c) B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d) C. (i)-(c), (ii)-(a), (iii)-(d), (iv)-(b) D. (i)-(d), (ii)-(c), (iii)-(b), (iv)-(a)
Answer: A.
Q5. Stevens’s four scales of measurement, in order of increasing mathematical structure, are:
A. Nominal, Ordinal, Interval, Ratio B. Ratio, Interval, Ordinal, Nominal C. Ordinal, Nominal, Interval, Ratio D. Interval, Ratio, Nominal, Ordinal
Answer: A. N-O-I-R: Nominal → Ordinal → Interval → Ratio.
Q6. A Likert scale asking “strongly agree / agree / neutral / disagree / strongly disagree” generates data on which scale?
A. Nominal B. Ordinal C. Interval D. Ratio
Answer: B. The categories have order but not equal intervals — ordinal.
Q7. Arrange the following data-processing steps in correct sequence:
- (i) Tabulation
- (ii) Editing
- (iii) Coding
- (iv) Classification
A. (ii), (iii), (iv), (i) B. (i), (ii), (iii), (iv) C. (iv), (iii), (ii), (i) D. (iii), (i), (ii), (iv)
Answer: A. Edit → Code → Classify → Tabulate.
Q8. Match each diagrammatic device with the data it best presents:
| | Device | | Best for |
|---|---|---|---|
| (i) | Histogram | (a) | Time series |
| (ii) | Pie chart | (b) | Five-number summary |
| (iii) | Line chart | (c) | Continuous frequency distribution |
| (iv) | Box plot | (d) | Parts of a whole |
A. (i)-(c), (ii)-(d), (iii)-(a), (iv)-(b) B. (i)-(a), (ii)-(b), (iii)-(c), (iv)-(d) C. (i)-(d), (ii)-(c), (iii)-(b), (iv)-(a) D. (i)-(b), (ii)-(a), (iii)-(d), (iv)-(c)
Answer: A.
- Data = facts and figures used in research; primary (first-hand) vs secondary (existing).
- Primary methods: observation, interview, questionnaire, schedule, focus group, experimental data.
- Major sources of Indian economic data: Census, NSS, MoSPI/NSO, RBI, CMIE, Economic Survey; for international comparison: World Bank, IMF.
- Stevens (1946) four scales: Nominal → Ordinal → Interval → Ratio (mnemonic NOIR).
- Likert agreement scales are ordinal.
- Classification basis: geographical, chronological, qualitative, quantitative.
- Frequency-distribution terms: class limits, class boundaries, class mark, class width, frequency, cumulative frequency, relative frequency.
- Tabulation parts: title, captions, stubs, body, footnote, source.
- Diagrams: bar, pie, histogram, frequency polygon, ogive, scatter, line, box plot.
- Data processing sequence: Edit → Code → Classify → Tabulate.