Missing parental data
I'm merging data from the USoc Wave 1 youth questionnaire (a_youth) and the Wave 1 adult individual questionnaire (a_indresp) so that I can consider the role of parents' characteristics in my analysis of young people's occupational aspirations. I'm using the variables a_mnspid (to identify mothers) and a_fnspid (to identify fathers) in a_youth and the variable pidp in a_indresp to match young people with their parents. I'm using the /TABLE subcommand of MATCH FILES in SPSS to carry out a one-to-many match, so that siblings in a_youth (who have the same value for a_mnspid or a_fnspid) all receive the same parental data from a_indresp.
After the matching there is quite a lot of missing data. 360 cases have missing data on mothers: 190 of these are coded -8 for a_mnspid ("natural/adoptive/step mother not in household"), and the remaining 170 have a value for a_mnspid which does not match any of the cross-wave person identifiers (pidp) in a_indresp.
Likewise, 1840 cases have missing data on fathers. 1349 of these are coded -8 for a_fnspid ("natural/adoptive/step father not in household"), and the remaining 491 have a value for a_fnspid which does not match any of the cross-wave person identifiers (pidp) in a_indresp.
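To make the breakdown above concrete, here is a minimal sketch in Python (not the SPSS syntax I actually ran) of the same lookup-style match and of how each young person falls into one of the three groups. The records are made-up toy data, and `some_adult_var` and `match_status` are hypothetical names used only for illustration.

```python
from collections import Counter

# Toy stand-in for a_youth: one row per young person (values are invented).
youth = [
    {"pidp": 101, "a_mnspid": 1, "a_fnspid": 2},
    {"pidp": 102, "a_mnspid": 1, "a_fnspid": 2},   # sibling of 101
    {"pidp": 103, "a_mnspid": -8, "a_fnspid": 3},  # mother not in household
    {"pidp": 104, "a_mnspid": 9, "a_fnspid": -8},  # mother id absent from adult file
]

# Toy stand-in for a_indresp: one row per interviewed adult.
indresp = [
    {"pidp": 1, "some_adult_var": "x"},
    {"pidp": 2, "some_adult_var": "y"},
    {"pidp": 3, "some_adult_var": "z"},
]

# Lookup table keyed on pidp, playing the role of the /TABLE file.
adults = {row["pidp"]: row for row in indresp}


def match_status(parent_id, adults):
    """Classify one parent identifier from the youth file."""
    if parent_id == -8:
        return "not in household"
    if parent_id in adults:
        return "matched"
    return "id not in adult file"


# Tabulate the outcome separately for mothers and fathers.
mother_status = Counter(match_status(y["a_mnspid"], adults) for y in youth)
father_status = Counter(match_status(y["a_fnspid"], adults) for y in youth)

print(dict(mother_status))
print(dict(father_status))
```

Because the adult file is used as a lookup keyed on pidp, siblings 101 and 102 both pick up the same mother and father records, which is the behaviour I want from the one-to-many match.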
My three questions are:
1) How should I interpret a value of -8 for a_mnspid or a_fnspid?
2) Why are there parent identifiers in the youth dataset which do not exist in the main adult dataset?
3) Is there a better way to go about matching data on young people with data on their parents?
Thanks for your help,
#1 Updated by j petersen almost 7 years ago
I don't have access to the data right now to reproduce your results, but can offer a few hints...
The relationships are recorded on the household grid with reference to household and person numbers (these can be found on w_INDALL for all household members). The datasets also hold some cross-wave personal identifiers or pids, see e.g.;
Pids are only assigned to individuals who have been enumerated at some point during the study; a value of -8 represents the opposite case.
w_INDRESP only holds data on adult respondents and is a subset of w_INDALL. Non-response can occur if the person is temporarily away, refuses, or is otherwise unable to take part (see w_IVFIO for interview outcome info). A parent can therefore have an identifier in the youth file without having a record in w_INDRESP.
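One way to check this is to look up the unmatched parent identifiers in w_INDALL rather than w_INDRESP. The following is only a rough sketch in Python with invented toy data: the outcome labels are placeholders (the real w_IVFIO is a coded variable), and `locate_parent` is a hypothetical helper name.

```python
# Toy stand-in for a_indall: every enumerated household member.
# The a_ivfio values here are placeholder labels, not the real codes.
indall = [
    {"pidp": 1, "a_ivfio": "full interview"},
    {"pidp": 2, "a_ivfio": "refusal"},
    {"pidp": 3, "a_ivfio": "full interview"},
]

# Identifiers that actually appear in a_indresp (interviewed adults only).
indresp_ids = {1, 3}

indall_by_id = {row["pidp"]: row for row in indall}


def locate_parent(parent_id, indall_by_id, indresp_ids):
    """Say where a parent identifier from the youth file can be found."""
    if parent_id in indresp_ids:
        return "in indresp"
    if parent_id in indall_by_id:
        # Enumerated in the household but no adult interview record.
        outcome = indall_by_id[parent_id]["a_ivfio"]
        return "in indall only (" + outcome + ")"
    return "not in indall"


# Parent identifiers taken from the youth file (invented values).
for parent_id in [1, 2, 7]:
    print(parent_id, locate_parent(parent_id, indall_by_id, indresp_ids))
```

If most of the unmatched identifiers turn out to be in w_INDALL only, the missing parental data is down to adult non-response rather than a problem with the matching itself.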
Your description sounds fine. The user guide should also have an example of creating a dataset based on relationship variables.