This document discusses how to merge household and individual files from two rounds of a survey, the India Human Development Survey (IHDS). It provides steps to link households and individuals surveyed in both rounds using a linking file, and explains concepts like replacement households, split households, and attrition. The key steps are: 1) Linking the round 2 data to the linking file to get round 1 IDs, 2) Merging this new round 2 file with the round 1 file. The merged files will be a superset containing individuals surveyed in one or both rounds.
3. Relationship between IHDS-I and IHDS-II households
IHDS-I sample (N=41,554)
Replacement households in IHDS-II (N=2,134)
Split households from round 1 (N=5,397)
Reinterview households (N=34,621)
Attrition (N=6,911)
Most important concept in merging two data files
1. Some households in round 1 with no match in round 2 and vice versa
2. Households in round 1 match with more than 1 household in round 2
4. Any questions?
Who was chosen for reinterview?
Recontact rate of 83%? What does it mean?
How were replacement households chosen?
What is a split household?
5. What is needed to merge household files?
1. Round 1 household file – N=41,554
2. Round 2 household file – N=42,152 (Why are there more cases in round 2?)
3. Linking file – N=42,152 – gives Round 1 identification codes for all Round 2 households that were reinterviewed, missing linking codes for 2,134 households that are new
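6. Step 1 – Link round 2 data to linking file to get round 1 ID
* Start from the household linking file, which carries the round 1 IDs
use linkhh, clear
sort STATEID DISTID PSUID HHID HHSPLITID
* Attach the round 2 household data; _mergeR2link records the match status
merge 1:1 STATEID DISTID PSUID HHID HHSPLITID using round2HH, gen(_mergeR2link)
* Sort on the round 1 IDs, ready for the merge with round 1
sort STATEID DISTID PSUID HHID2005 HHSPLITID2005
save round2HH_plus, replace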
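Before moving on to the merge with round 1, it can help to check what step 1 produced. The lines below are a minimal sketch, not part of the original slides; they assume that new (replacement) households are the ones left with a missing HHID2005 after the link.

```stata
* Quick check of the step 1 output (a sketch, not part of the original steps)
use round2HH_plus, clear
* Match status from step 1: 3 = present in both the linking file and round2HH
tab _mergeR2link
* Assumption: new households have no round 1 ID, so HHID2005 is missing for them;
* the slides put the number of new households at 2,134
count if missing(HHID2005)
```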
8. Cases in merged file are a superset
Households surveyed in both rounds N=40,018
Households surveyed in round 1 only (attrition) N=6,911
Households surveyed in round 2 only (replacement) N=2,134
Total N=49,063
Keep only _mergeR1R2==3 for panel analysis (N=40,018)
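A merged household file with the three groups listed above could be produced roughly as follows. This is only a sketch under stated assumptions: round1HH and panelHH are hypothetical file names, the round 1 file is assumed to identify households by STATEID DISTID PSUID HHID HHSPLITID, and those two household IDs are assumed to correspond to HHID2005 and HHSPLITID2005 in the linked round 2 file. Because one round 1 household can match several round 2 households (splits), the sketch uses a 1:m merge.

```stata
* Sketch of step 2 at the household level (file and variable names are assumptions)
use round1HH, clear
* Assumption: round 1 IDs must be renamed to the names carried by the linking file
rename HHID HHID2005
rename HHSPLITID HHSPLITID2005
* One round 1 household may match several round 2 households (splits), hence 1:m
merge 1:m STATEID DISTID PSUID HHID2005 HHSPLITID2005 using round2HH_plus, gen(_mergeR1R2)
* 1 = round 1 only (attrition), 2 = round 2 only (replacement), 3 = both rounds
tab _mergeR1R2
* Keep only households surveyed in both rounds for panel analysis
keep if _mergeR1R2==3
save panelHH, replace
```

The counts shown by tab _mergeR1R2 can be compared with the figures on the slide (6,911, 2,134 and 40,018, which sum to 49,063) before the non-panel cases are dropped.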
10. Relationship between IHDS-I and IHDS-II individuals
IHDS-I sample (N=215,754)
New individuals, new HH (N=9,760)
New Ind in R1 HH (N=43,822)
Reinterview Ind (N=150,995)
HH attrition (N=29,299)
Ind. attrition in reinterview HH (N=35,464)
Most important concept in merging two data files
1. Even reinterview households have new members (births, marriages)
2. Even reinterview households have some members who are no longer there (deaths, marriages, migration)
11. What is needed to merge individual files?
1. Round 1 individual file – N=215,754
2. Round 2 individual file – N=204,568 (Why are there fewer cases in round 2?)
3. Linking file – N=204,568 – gives Round 1 identification codes for all Round 2 individuals who were reinterviewed, missing linking codes for individuals who are new (new members of reinterviewed households and members of the 2,134 new households)
12. Step 1 – Link round 2 data to linking file to get round 1 ID
* Start from the individual linking file, which carries the round 1 IDs
use linkind, clear
sort STATEID DISTID PSUID HHID HHSPLITID PERSONID
* Attach the round 2 individual data; _mergeR2link records the match status
merge 1:1 STATEID DISTID PSUID HHID HHSPLITID PERSONID using round2IND, gen(_mergeR2link)
sort STATEID DISTID PSUID HHID2005 HHSPLITID2005
save round2IND_plus, replace
14. Cases in merged file are a superset
Individuals surveyed in both rounds N=150,988
Individuals surveyed in round 1 only (attrition/death/migration) N=64,766
Individuals surveyed in round 2 only (replacement/new) N=53,580
Total N=269,334
Keep only _mergeR1R2==3 for panel analysis (N=150,988)
16. Same process as individual file linkage
One thing to note: there was no ever-married woman file for 2004-5, so you will be merging with the household file from 2004-5
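18. Merging overwrites variables
Variables with the same name in both rounds are not kept separately after a merge (by default Stata keeps the values from the file in memory)
So if you want to keep variables from round 1 and round 2 separate, you may want to rename all round 1 variables before merging
Typically we use the commands
rename * x*
rename xSTATEID STATEID etc. (restoring the ID names needed for merging)
So xr05 will be age in 2004-5 and r05 will be age in 2011-12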
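The renaming described above might look like this for the household file. It is a sketch only: round1HH and round1HH_x are hypothetical file names, and the list of ID variables is assumed to be the one used as merge keys elsewhere in these slides.

```stata
* Sketch: prefix every round 1 variable with x, then restore the ID names
* (round1HH and round1HH_x are assumed file names, not taken from the slides)
use round1HH, clear
rename * x*
* The ID variables must keep their original names to serve as merge keys
rename xSTATEID STATEID
rename xDISTID DISTID
rename xPSUID PSUID
rename xHHID HHID
rename xHHSPLITID HHSPLITID
save round1HH_x, replace
```

After this step every non-ID round 1 variable carries the x prefix, so it no longer collides with the round 2 variable of the same name when the files are merged.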