Support #21

Merging household and individual data sets, wave 1, 2009-2010

Added by Anita Staneva over 8 years ago. Updated over 4 years ago.

Status: Closed
Priority: High
Assignee: Redmine Admin
Category: Data analysis
Target version:
Start date: 01/20/2012
Due date:
% Done: 50%
Estimated time:

Description

I want to merge the wave 1 data sets into one file. I started by merging the household files, for example a_hhsamp with a_hhresp. I expected to keep _merge values 1 and 3 from the resulting merge, but I only got _merge values 2 and 3, or else the count for value 2 was exactly the number of households in the first household file, which would mean the two files were not merged.
I then tried distributing household-level information to the individual level, using a_hidp as the identifier and following the example you gave in the documents. That merge works, but when I keep _merge values 1 and 3 my sample size increases dramatically and I get duplicate observations.
Next I continued with the individual files, using a_hidp and a_pno together as the unique identifier to match them correctly, but again the resulting merge is not right.
Could you please advise me on how to match the files? Do you have any user do-files that would help us combine all the data sets from wave 1, 2009-2010?
Many thanks
Anita

History

#1 Updated by Redmine Admin over 8 years ago

  • Category set to Data analysis
  • Status changed from New to In Progress
  • Assignee set to Redmine Admin
  • % Done changed from 0 to 50

Anita,

I have tried to reconstruct your example here:

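* master: responding households, loading only the household identifier
use a_hidp using a_hhresp, clear
* using: all enumerated households, bringing in the household response outcome
merge 1:1 a_hidp using a_hhsamp, keepusing(a_ivfho_dv)
* cross-tabulate response outcome by merge status, with row and column totals
table a_ivfho_dv _merge, row col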

-----------------------------------------------------------------------------------------
                                         |                     _merge
              household response outcome | using only (2)     matched (3)           Total
-----------------------------------------+-----------------------------------------------
              f2f - all eligible hh intv |                         21,694          21,694
              f2f - interviews + proxies |                          2,630           2,630
              f2f - interviews + refusal |                          5,708           5,708
                     hh comp + ques only |                            137             137
                     lost capi interview |             21                              21
                     demolished/derelict |            605                             605
                   building not complete |            133                             133
             institution, not private hh |            198                             198
                    no hh member contact |          2,240                           2,240
                unable to locate address |            201                             201
contact made but not with correct people |            526                             526
                     unknown eligibility |            483                             483
                       other non-contact |          3,121                           3,121
                    refus to rsrch cntre |            976                             976
                    refusal to intviewer |         17,183                          17,183
                       language problems |            531                             531
                        other ineligible |         38,921                          38,921
                                         |
                                   Total |         65,139          30,169          95,308
-----------------------------------------------------------------------------------------

The master data set is a_hhresp.dta and the using data set is a_hhsamp.dta.
The households that match (_m==3) are those with a productive interview outcome, while the unmatched households (_m==2) are those with unproductive outcomes.
This fits the description of a_hhsamp as the file with data on all enumerated households and a_hhresp as the file for all responding households.
If we had opened a_hhsamp first and then merged a_hhresp onto it, the results would have been the same, except that the _merge variable would have taken the values 1 and 3 instead of 2 and 3.
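
As a rough sketch of that reversed order (not run against the data here, and assuming the same file and variable names as in the example above):

```stata
* master: all enumerated households, with identifier and response outcome
use a_hidp a_ivfho_dv using a_hhsamp, clear
* using: responding households; all of their variables are brought in
merge 1:1 a_hidp using a_hhresp
* expect only _merge==1 (sample file only) and _merge==3 (matched)
tabulate _merge
* once the counts look right, the indicator is no longer needed
drop _merge
```

With a_hhsamp as the master, keeping _merge values 1 and 3 retains every enumerated household; keep only _merge==3 if you want the responding households alone.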

"Next I continued with the individual files, using a_hidp and a_pno together as the unique identifier to match them correctly, but again the resulting merge is not right."

You can use pidp as the personal identifier on all individual-level data files.
Do you have a specific example here?
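
For the step where household information is distributed to individuals, a minimal sketch could look like the following. It assumes that a_indresp is the individual-level file you are working with (the file name is an assumption, so substitute your own) and that it carries both pidp and a_hidp:

```stata
* a_indresp is an assumed file name; load only the identifiers needed here
use pidp a_hidp using a_indresp, clear
* stops with an error if pidp does not uniquely identify individuals
isid pidp
* many individuals per household in the master, one row per household in using
merge m:1 a_hidp using a_hhresp
* inspect the match counts before dropping anything
tabulate _merge
* keep individuals whose household was found in a_hhresp
keep if _merge == 3
drop _merge
```

Because m:1 requires a_hidp to be unique in the household file, Stata stops with an error if it is not, and the individual records in the master are never duplicated by this kind of merge. Any increase in the number of observations then comes only from unmatched household rows (_merge==2).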

Some more general advice...
The data are released as a set of data files that allows users to construct working data sets for a multitude of purposes. Because the data structure is relatively complex, we recommend that you study the questionnaires and online data documentation and select only the variables you need for a given study purpose. That way, the working data sets remain of a manageable size and there is less scope for confusing variables that have similar names but different meanings in different files.
See also the free course materials from some of our training courses, or the news of forthcoming training courses.

Hth

Jakob Petersen

#2 Updated by Redmine Admin over 8 years ago

  • Status changed from In Progress to Closed

#3 Updated by Gundi Knies over 4 years ago

  • Target version set to M1
