Creating longitudinal weights
I'm using the harmonised BHPS and USoc data to look at the impact of tax / benefit policy over several years. I would like to include in the final sample people who are in the survey for most, but not necessarily all, waves (e.g. include those who are present for at least 20 out of the 25 waves). People who attrit don't have longitudinal weights, so I need to design my own.
I have been trying to replicate the BHPS/USoc longitudinal weights first, and then plan to do a similar exercise for people in most but not all waves. However, I've been unable to get particularly close to the longitudinal weights in the data. I am using the CHAID process and post-stratifying the data.
1: Do you have more information on how the longitudinal weights are constructed? E.g. what predictor variables are used?
2: Do you think this is the best way to proceed? An alternative approach, which has been discussed in other forum posts, would be to fit a model (e.g. logit) and adjust the cross-sectional weights accordingly.
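As a rough illustration of the alternative in question 2, the sketch below scales a base weight by the inverse of an estimated response propensity. To stay dependency-free it estimates the propensity as the weighted response rate within cells, which is what a logit with a single categorical predictor would fit. The field names (`cell`, `base_weight`, `responded`) and the toy data are hypothetical, and this is not claimed to be the method used to build the released BHPS/USoc weights.

```python
from collections import defaultdict


def nonresponse_adjust(records):
    """Return respondents with base_weight divided by their cell's response propensity."""
    eligible = defaultdict(float)    # weighted count of everyone in the cell
    responding = defaultdict(float)  # weighted count of respondents in the cell
    for r in records:
        eligible[r["cell"]] += r["base_weight"]
        if r["responded"]:
            responding[r["cell"]] += r["base_weight"]

    adjusted = []
    for r in records:
        if not r["responded"]:
            continue  # nonrespondents drop out; respondents are weighted up to cover them
        propensity = responding[r["cell"]] / eligible[r["cell"]]
        adjusted.append({**r, "weight": r["base_weight"] / propensity})
    return adjusted


# Hypothetical toy data: two cells with different response rates.
sample = [
    {"pid": 1, "cell": "young", "base_weight": 1.0, "responded": True},
    {"pid": 2, "cell": "young", "base_weight": 1.0, "responded": False},
    {"pid": 3, "cell": "old", "base_weight": 2.0, "responded": True},
    {"pid": 4, "cell": "old", "base_weight": 2.0, "responded": True},
]
adjusted = nonresponse_adjust(sample)
```

Within each cell the adjusted weights of respondents sum to the base-weight total of everyone who was eligible, so the cell keeps its share of the population after attrition.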
#2 Updated by Olena Kaminska over 1 year ago
If you design your own nonresponse correction I would expect the 'weights' to look different to ours but importantly to lead to largely similar results. If you receive a very different result with your own nonresponse correction I would expect that something went wrong in the process. Please make sure to incorporate the wave 1 weight when designing your own weights.
If you are using harmonized UKHLS and BHPS data over 20 years I assume you are not using longitudinal data but using cross-sectional data through pooling. In this situation you won't achieve much gain from using your own weights in comparison to ours. We describe the method of calculating our weights in the User Guide. Unfortunately at the moment we do not have a list of variables that go into each model, but please note that over 25 years there will be at least 51 models that led to the weights that you are using - as we calculate weights using the information from the previous wave.
In post-stratification, also be careful to match to statistics on the longitudinal population rather than on the cross-sectional population, which is what most government statistics are available for.
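A minimal sketch of one-dimensional post-stratification, assuming the nonresponse-adjusted weights already exist. The strata, the field names (`stratum`, `weight`) and the totals are hypothetical; the point is only that the targets should describe the longitudinal population.

```python
from collections import defaultdict


def post_stratify(records, population_totals):
    """Scale weights so each stratum's weighted total equals its population total."""
    weighted = defaultdict(float)
    for r in records:
        weighted[r["stratum"]] += r["weight"]
    return [
        {**r, "weight": r["weight"] * population_totals[r["stratum"]] / weighted[r["stratum"]]}
        for r in records
    ]


# Hypothetical totals for the longitudinal population (e.g. people in scope at
# wave 1 who are still in scope now), not current cross-sectional totals.
longitudinal_totals = {"men": 100.0, "women": 120.0}
respondents = [
    {"pid": 1, "stratum": "men", "weight": 2.0},
    {"pid": 2, "stratum": "men", "weight": 3.0},
    {"pid": 3, "stratum": "women", "weight": 4.0},
]
calibrated = post_stratify(respondents, longitudinal_totals)
```

Relative weights within a stratum are preserved; only the stratum totals are moved to the targets.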
Also, it largely doesn't matter which method you use to construct the correction - as long as the method is correct and all the parts of the nonresponse and unequal selection probabilities are taken into account it should be fine and yield largely the same results.
Hope this helps,
#4 Updated by Joseph Woods over 1 year ago
Many thanks Olena.
For this analysis I am looking to exploit the longitudinal aspect of the data. Essentially, I'm running each wave through a microsimulation model which allows me to simulate a hypothetical policy reform, and I then follow the impact of this reform on an individual's income over X years to calculate the average impact over this time horizon. (The output dataset is in long format, where each row is a unique adult-year combination.)
As things currently stand, my sample is limited to adults who appear in BHPS wave 1 and have not attrited by the latest USoc wave - these people have non-zero longitudinal weights in the data. Ideally, I would like more flexibility to include additional observations and boost sample size. For instance, if you first appear in BHPS wave 2 and do not attrit, your longitudinal weight in the latest USoc wave is zero. I would like to create my own non-zero weight for this type of person.
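A minimal sketch of the broader sample definition, assuming long-format rows with hypothetical `pid` and `wave` fields. It keeps everyone observed in at least `min_waves` distinct waves (the toy data uses 2 of 3 rather than 20 of 25).

```python
from collections import defaultdict


def select_mostly_present(rows, min_waves):
    """Keep adult-wave rows for people observed in at least min_waves distinct waves."""
    waves_seen = defaultdict(set)
    for row in rows:
        waves_seen[row["pid"]].add(row["wave"])
    keep = {pid for pid, waves in waves_seen.items() if len(waves) >= min_waves}
    return [row for row in rows if row["pid"] in keep]


# Hypothetical long-format data: one row per adult-wave combination.
rows = [
    {"pid": 1, "wave": 1}, {"pid": 1, "wave": 2}, {"pid": 1, "wave": 3},
    {"pid": 2, "wave": 2}, {"pid": 2, "wave": 3},  # joins at wave 2, never attrits
    {"pid": 3, "wave": 1},                         # attrits after wave 1
]
selected = select_mostly_present(rows, min_waves=2)
```

Person 2 is the kind of case described above: absent at wave 1, so carrying a zero released longitudinal weight, but retained under this rule.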
Hopefully this clears up my original post.
#7 Updated by Olena Kaminska about 1 year ago
I see the problem you are facing and I see the reasoning behind it. I will make two further suggestions:
1. If you create your own weights - make sure that you think about the population definition that you want to represent, especially in terms of birth and death. Do you want to represent people who were alive in 1992 but have died since? Keep this in mind when you are describing the dataset and when you are talking about your results.
2. Run the results with the new weights and compare them to the results obtained with our weights. The expectation is that the model values won't change but the confidence intervals /may/ (though don't have to) get smaller - so you may find significant results where the previous results were marginal.
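A minimal sketch of that comparison, assuming the outcome and both weight sets are held per person. The data are hypothetical, and Kish's effective sample size, (sum w)^2 / sum w^2, is used here only as a rough indicator of how much precision each weight set leaves; it is not a substitute for proper design-based standard errors.

```python
def weighted_mean(values, weights):
    """Weighted mean; people with a zero weight do not contribute."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical income changes and two candidate weight sets for the same people.
income_change = [10.0, 20.0, 10.0, 20.0]
released_weights = [1.0, 1.0, 0.0, 0.0]  # last two people have zero released weight
own_weights = [1.0, 1.0, 1.0, 1.0]       # own weights bring them back in

estimate_released = weighted_mean(income_change, released_weights)
estimate_own = weighted_mean(income_change, own_weights)
n_eff_released = kish_effective_n(released_weights)
n_eff_own = kish_effective_n(own_weights)
```

In this toy case the two estimates agree while the effective sample size doubles, which is the pattern to hope for; a large gap between the estimates would be a signal to recheck the new weights.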
Best of luck,