Weighting for youth questionnaire
I have a question about sample weighting. I'm using a sub-sample of children who have experienced their parents separating or divorcing; this information is only available for waves 3, 5 and 7 of USoc. I want to run a regression where the outcome of interest is the child's response to a behavioural question in the youth questionnaire. I am including the mother's characteristics as control variables, e.g. whether she receives child support, education level, marital status, etc. I also merge this with the chmain module, where the mother responds about each individual child identified by childpno. My analysis does not exploit the panel aspect of the data, i.e. I treat each wave as a separate cross-section.
I am thinking of using the cross-sectional weight for each wave from the youth file, W_ythscub_xw. Could you please advise whether this sounds right?
#2 Updated by Olena Kaminska 3 months ago
Thank you for your question. I just want to clarify a few details of your analysis before answering the question.
Does the information about the mother (whether she receives child support, education level, marital status, etc.) and the information from chmain come from the same wave as the information from the youth questionnaire? If it does, and the information is available for all young people who completed the youth questionnaire, then the W_ythscub_xw or W_ythscui_xw weight is the correct one to use.
If the information is available only for young people whose mothers responded, the W_ythscub_xw or W_ythscui_xw weight will still be suitable, but it is worth checking how many of the young people have missing information from their mothers. If the proportion is high, you may want to add an additional nonresponse correction.
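As a rough illustration of the check described above, the sketch below computes the unweighted and weighted share of young people with no mother information. The records, the `weight` field (standing in for the youth cross-sectional weight) and the `mother_responded` flag are all hypothetical; in practice the flag would come from merging the youth file with the mother's interview and chmain data.

```python
# Hypothetical toy records: one row per youth respondent in a single wave.
# "weight" stands in for the youth cross-sectional weight and
# "mother_responded" flags whether mother/chmain data could be merged in.
youth = [
    {"pidp": 1, "weight": 1.0, "mother_responded": True},
    {"pidp": 2, "weight": 1.5, "mother_responded": True},
    {"pidp": 3, "weight": 0.5, "mother_responded": False},
    {"pidp": 4, "weight": 1.0, "mother_responded": True},
    {"pidp": 5, "weight": 1.0, "mother_responded": False},
]


def missing_share(rows, weight_key=None):
    """Share of rows without mother information.

    If weight_key is None every row counts as 1; otherwise rows are
    weighted by the value stored under weight_key.
    """
    def w(row):
        return row[weight_key] if weight_key else 1.0

    total = sum(w(r) for r in rows)
    missing = sum(w(r) for r in rows if not r["mother_responded"])
    return missing / total


unweighted = missing_share(youth)
weighted = missing_share(youth, weight_key="weight")
print(f"unweighted share missing: {unweighted:.2f}")
print(f"weighted share missing:   {weighted:.2f}")
```

If the two shares differ noticeably, the missingness is related to the characteristics the weight already adjusts for, which is one sign that a further correction may be worth the effort.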
Hope this helps,
#3 Updated by Charlotte Edney 3 months ago
Thanks for the prompt and helpful reply.
To clarify, the information about mothers does come from the same wave as the information from the youth questionnaires, but it is only available if the mother responded to the questions (I have merged the indall, indresp, youth and chmain files). There is a considerable amount of non-response, particularly in the chmain files, so I suppose the additional non-response correction would be important; however, I'm not at all familiar with it. Could you suggest which variables I would need to use, or recommend where I could find out more?
#5 Updated by Olena Kaminska 2 months ago
The best variables for nonresponse correction are the ones related to your y variable and to nonresponse (at the same time). But they need to be available for respondents and nonrespondents.
If you think that only one or two variables have a lot of missing information, and there is otherwise little extra missingness beyond what our weights already account for, you may want to consider imputing values for these variables.
Otherwise you can correct for all the missingness (people not responding and missingness in answers) through tailored weighting.
I will be happy to share with you an example of how one can create their own weight. If you are interested, please email your request to firstname.lastname@example.org and mention the reference number of this discussion: num1305.
Finally, in the meantime you can still use the weight mentioned earlier for your analysis, although it is suboptimal.
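For readers unfamiliar with tailored weighting, one simple form is a weighting-class adjustment, sketched below. This is only an illustration under stated assumptions, not the Understanding Society team's own procedure or the example offered above. The records, the `age_band` cell variable and the `complete` flag are hypothetical; the cell variable must be known for every young person, whether or not their mother's information is available. Within each cell, complete cases have their base weight scaled up so that they also represent the incomplete cases in that cell.

```python
from collections import defaultdict

# Hypothetical toy records. "weight" stands in for the youth cross-sectional
# weight, "age_band" is an auxiliary variable known for everyone, and
# "complete" flags cases where the mother's information is available.
rows = [
    {"pidp": 1, "weight": 1.0, "age_band": "10-12", "complete": True},
    {"pidp": 2, "weight": 1.0, "age_band": "10-12", "complete": False},
    {"pidp": 3, "weight": 2.0, "age_band": "13-15", "complete": True},
    {"pidp": 4, "weight": 1.0, "age_band": "13-15", "complete": True},
    {"pidp": 5, "weight": 1.0, "age_band": "13-15", "complete": False},
]


def adjust_weights(rows, cell_key, weight_key="weight", resp_key="complete"):
    """Weighting-class nonresponse adjustment.

    Within each cell, complete cases get base weight multiplied by
    (weighted total of all cases) / (weighted total of complete cases).
    Incomplete cases get weight 0.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for r in rows:
        total[r[cell_key]] += r[weight_key]
        if r[resp_key]:
            responding[r[cell_key]] += r[weight_key]

    # A cell with no complete cases cannot be represented; merge cells first.
    for cell in total:
        if responding[cell] == 0:
            raise ValueError(f"no complete cases in cell {cell!r}")

    return [
        r[weight_key] * total[r[cell_key]] / responding[r[cell_key]]
        if r[resp_key] else 0.0
        for r in rows
    ]


new_weights = adjust_weights(rows, cell_key="age_band")
for r, w in zip(rows, new_weights):
    print(r["pidp"], round(w, 3))
```

The adjusted weights sum to the same total as the base weights within each cell, so the weighted sample size is preserved. With several auxiliary variables, a response propensity model is often used in place of simple cells.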