weights for pooled cross-sections over waves (a)-(f)
I am regressing hourly wage (constructed from w_paygu_dv) on a number of regressors in a pooled cross-section over all six waves. So far, I am using the whole sample, i.e. GPS, EMBS, BHPS and IEMBS. I am not sure which weights to use in this context, given that I want to use all four samples. f_indinui_xw is available for all four samples at wave 6, so do I just go ahead and use that one?
Any piece of advice would be terrific.
Thanks a lot!
#1 Updated by Victoria Nolan over 3 years ago
- Status changed from New to In Progress
- Assignee changed from Victoria Nolan to Nico Ochmann
- % Done changed from 0 to 10
- Private changed from Yes to No
Many thanks for your enquiry. I am passing it on to our weighting team to look into.
Best wishes, Victoria
On behalf of the Understanding Society data user support team
#2 Updated by Peter Lynn about 3 years ago
- Target version set to X M
- % Done changed from 10 to 50
That would be a correct weight to use, in the sense that it will give population representation when using all 4 samples together. But note that your analysis will then only include people who participated at wave 6.
An alternative is for you to derive a new weight variable, which consists of f_indinui_xw for the wave 6 observations, e_indinub_xw for the wave 5 observations, and so on. See this note, which may help: #494.
#3 Updated by Nico Ochmann about 3 years ago
I appreciate your reply very much. I have read the nice little note you coauthored with Olena; it is quite helpful. Let me first restate it to make sure I have understood it properly. It seems to me that although wave 6 has been released, I cannot avoid the additional wrinkle of rescaling, because I am pooling data from all six waves. Given that, I focus on Box 2 of your note. So, following your code, I generate strata_year and psu_year for the years 2009-2015. For the outcome variable, I replace jbstat with paygu_dv and do the same for all seven years. Now, and most importantly, I must derive the new weight variable from f_indinui_xw for wave 6, e_indinub_xw for wave 5, and so on. At this point I am not quite sure how to proceed, and I would certainly appreciate a hint, in one or two lines of code, as to how to combine the original weights f_indinui_xw and e_indinub_xw (let's just stick with the two-wave example) into one new weight variable. I looked at the online 'Intro to USoc using Stata' course, which is an excellent resource, but it does not have any hints on weighting procedures beyond one wave.
If you happen to have any other resources with regard to my issue, please feel free to make any suggestions.
Thank you very much!
#5 Updated by Peter Lynn about 3 years ago
- % Done changed from 60 to 70
When you pool the data, let's assume you add to each record a variable, wave, to indicate from which wave the record came. Then, you could create the weight with syntax like this:
gen newwgt = f_indinui_xw if wave==6
replace newwgt = e_indinub_xw if wave==5
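For readers who want to see the pooling step itself, here is a minimal two-wave sketch. It is only an illustration under assumptions: the two small datasets are made-up stand-ins for the wave 5 and wave 6 individual files, and the wave prefix is stripped before stacking (so the weights appear as indinub_xw and indinui_xw rather than with the e_ and f_ prefixes used above). That is one way of pooling, not necessarily the one intended in the reply.

```stata
* Toy stand-in for a wave 5 file (values are invented for illustration)
clear
input long pidp double e_indinub_xw double e_paygu_dv
1 0.8 1500
2 1.2 2100
end
rename e_* *                 // strip the wave prefix
gen byte wave = 5            // tag each record with its wave
tempfile s5
save `s5'

* Toy stand-in for a wave 6 file
clear
input long pidp double f_indinui_xw double f_paygu_dv
1 0.9 1600
3 1.1 1900
end
rename f_* *
gen byte wave = 6

* Stack the two waves: one row per person-wave
append using `s5'

* Each record takes the cross-sectional weight from its own wave
gen double newwgt = indinui_xw if wave == 6
replace newwgt = indinub_xw if wave == 5

list pidp wave paygu_dv newwgt, sepby(wave)
```

Stripping the prefix means the outcome ends up in a single column (paygu_dv) across waves, while the differently named weight variables stay separate and are merged into newwgt by the wave condition.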
#7 Updated by Nico Ochmann about 3 years ago
Thanks a lot for your reply. I pooled the data for all six waves and added a wave variable to each record. I then went ahead and generated my newwgt variable as follows:
gen newwgt = indinui_xw if wave==6
replace newwgt = indinub_xw if wave==5
replace newwgt = indinub_xw if wave==4
replace newwgt = indinub_xw if wave==3
replace newwgt = indinub_xw if wave==2
replace newwgt = indinus_xw if wave==1
Last but not least, I run logrealhourlywage on x1 x2 [pw=newwgt], cluster(pidp)
Is this reasonable or am I still completely off?
I might have to stop by at your seminar on weighting if I am doing this wrong.
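For reference, the specification described above can be written out as a Stata command. The sketch below runs on simulated data only: the variable names logrealhourlywage, x1, x2, newwgt and pidp come from the post, while all values are randomly generated assumptions. It shows the command form and does not settle whether clustering on pidp (rather than on the survey's primary sampling unit) is the right design choice.

```stata
* Simulated person-wave data: 100 people observed in two waves each
clear
set seed 12345
set obs 200
gen long pidp = ceil(_n/2)               // two records per person
gen byte wave = cond(mod(_n, 2), 5, 6)   // odd rows wave 5, even rows wave 6
gen double x1 = rnormal()
gen double x2 = runiform()
gen double logrealhourlywage = 2 + 0.3*x1 - 0.1*x2 + rnormal(0, 0.5)
gen double newwgt = 0.5 + runiform()     // stand-in for the combined weight

* Weighted pooled regression; standard errors clustered on the person
regress logrealhourlywage x1 x2 [pw=newwgt], vce(cluster pidp)
```

vce(cluster pidp) is the current spelling of the older cluster(pidp) option; both allow for correlation between repeated observations on the same person.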