Support #1248

Longitudinal Weights - BHPS and UKHLS

Added by Helen Burkhardt 12 months ago. Updated 11 months ago.

I am working with data from the adult 16+ individual questionnaire, namely the *_indresp data files. I am interested in the subjective health measure and mental health score questions that are consistent across several waves of both the UKHLS and BHPS.

I am interested in developing estimates using a weighted longitudinal sample of any individual who answered one of these health questions during any wave of the BHPS or UKHLS. Thus, the sample would include individuals who only appeared in the BHPS, only appeared in the UKHLS, and appeared in both the BHPS and UKHLS.

I see that longitudinal survey weights are provided by each survey separately, such as the set of longitudinal survey weights for enumerated individuals from the BHPS (_lewght) and another set of longitudinal survey weights for individuals from the UKHLS (_indinub_lw). Is there a set of provided weights that is designed for dealing with the UKHLS and BHPS jointly?

Alternatively, I am interested in constructing some derivative of the provided longitudinal weights for my joint sample. How might I go about designing this set of weights?




#1 Updated by Olena Kaminska 12 months ago


I think what you are doing is a pooled analysis. For this you pool information from different waves rather than studying people over time.

For a usual longitudinal analysis, which could start at any point from wave 2 of UKHLS onwards, you could use the 'ub' weight, which combines BHPS with UKHLS.

But I understand you don't want to study people over time - instead you want to pool information from different time points together.

If you are studying one wave at a time, pick for each person the cross-sectional weight relevant to that wave.
If you are looking at change and are using, for example, 2 waves, pick the longitudinal weight from the last wave in each set of waves.
Please remember to take clustering into account; ignoring it will give you wrong results.

If you are using information from the BHPS period and then from UKHLS, and are combining them, you need to scale the BHPS sample up so that BHPS years contribute the same amount as UKHLS years. Otherwise your results will be dominated by the more recent years.
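
One possible reading of the rescaling advice above is to rescale the weights within each wave so that every wave's weights sum to the same total. The sketch below only illustrates that idea and is not an official procedure: the records, the field names (`pid`, `wave`, `weight`), the wave labels, and the common target total of 1.0 are all made up, and it assumes each wave's weights sum to a positive number.

```python
from collections import defaultdict


def rescale_weights_by_wave(records, target_total=1.0):
    """Return copies of the records with weights rescaled so that each
    wave's weights sum to target_total (an assumed, arbitrary common total)."""
    # First pass: total weight observed in each wave.
    totals = defaultdict(float)
    for rec in records:
        totals[rec["wave"]] += rec["weight"]

    # Second pass: scale every weight by its wave's factor.
    rescaled = []
    for rec in records:
        factor = target_total / totals[rec["wave"]]
        rescaled.append({**rec, "weight": rec["weight"] * factor})
    return rescaled


# Hypothetical pooled records: a small BHPS wave and a larger UKHLS wave.
pooled = [
    {"pid": 1, "wave": "bhps_1", "weight": 1.0},
    {"pid": 2, "wave": "bhps_1", "weight": 3.0},
    {"pid": 1, "wave": "ukhls_2", "weight": 2.0},
    {"pid": 3, "wave": "ukhls_2", "weight": 6.0},
    {"pid": 4, "wave": "ukhls_2", "weight": 12.0},
]
rescaled = rescale_weights_by_wave(pooled)
```

After rescaling, each wave carries the same total weight, so the larger UKHLS wave no longer dominates a pooled estimate simply because it has more respondents. Relative weights within a wave are unchanged.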

If you have any further questions please do not hesitate to ask us.
Thank you,

#2 Updated by Stephanie Auty 12 months ago

  • Private changed from Yes to No
  • % Done changed from 0 to 50
  • Assignee changed from Olena Kaminska to Helen Burkhardt

#3 Updated by Helen Burkhardt 12 months ago

Thanks for the quick response, Olena.

Yes, I am looking to pool information by individual across different time points. I take all non-missing responses from the BHPS and UKHLS and average them by individual. For example, if individual A answered 3 in wave 1 of the BHPS and 4 in wave 2 of the UKHLS, I would calculate an unweighted average of 3.5 for that individual. Ideally, I would calculate a weighted average of both the BHPS and UKHLS responses by individual.
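
The unweighted per-person average described above could be sketched as follows. This is a minimal illustration, not the actual processing code: the `(pid, score)` tuples are made-up data, and using `None` to mark a missing response is an assumed coding chosen for the example.

```python
from collections import defaultdict


def person_means(responses):
    """Average each person's non-missing scores across all waves.

    responses: iterable of (pid, score) pairs, one per person-wave;
    score is None when the response is missing (assumed coding).
    """
    scores = defaultdict(list)
    for pid, score in responses:
        if score is not None:
            scores[pid].append(score)
    # People with no valid score at all never enter `scores`, so they are dropped.
    return {pid: sum(vals) / len(vals) for pid, vals in scores.items()}


# Individual A answered 3 in one wave and 4 in another; B has one missing response.
responses = [("A", 3), ("A", 4), ("B", None), ("B", 2)]
means = person_means(responses)
print(means["A"])  # 3.5
```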

What weights should I use to scale up the BHPS years? Could I use a cross-sectional weight for each wave that has a non-missing response and then average the rescaled responses?



#4 Updated by Olena Kaminska 12 months ago


Averaging weighted scores over time would have a very strange meaning. So, it would be useful to know at this stage what exactly you are trying to estimate:
- What would your averaged measure mean? If it is a characteristic of a person, you shouldn't weight it at this stage; just average the raw answers over time.
- I understand that if you have at least one score per person, you would include this person in your analysis. If this is the case, you may want to use a wave 2 or wave 6 inclusion weight: b_psnenub_li or f_psnenui_li.
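
As a rough sketch of the two points above: the person-level measure is the plain average of raw answers, and a single inclusion weight per person is applied only when estimating a population quantity. The data, the field names (`pid`, `mean_score`, `weight`), and the weight values below are hypothetical; in practice the weight would come from whichever inclusion weight variable is chosen.

```python
def weighted_mean(values, weights):
    """Weighted mean of person-level values, one weight per person."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical person-level data: each person's unweighted average score
# over time, plus one inclusion weight per person.
people = [
    {"pid": "A", "mean_score": 3.5, "weight": 1.0},
    {"pid": "B", "mean_score": 2.0, "weight": 3.0},
]
estimate = weighted_mean(
    [p["mean_score"] for p in people],
    [p["weight"] for p in people],
)
print(estimate)  # 2.375
```

This gives only a point estimate; it does not account for clustering or other design features when computing standard errors.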

One other thing you may want to think about is death: some people died during the period you are looking at, and they may be in your analysis. You would need to take this into account when describing your population. For example, a person who was alive in 1991/2 and died in 1993 would contribute to your analysis.

Hope this helps,

#5 Updated by Stephanie Auty 11 months ago

  • % Done changed from 50 to 70
  • Status changed from New to Feedback
