Support #700

longitudinal weights for small sub-samples

Added by David Bartram over 3 years ago. Updated over 3 years ago.

Some questions re longitudinal weights. Having read “Weighting Strategy for Understanding Society”, I gather that absence from a single wave leads to longitudinal weights of 0 for all later waves. In my case, this means that I am losing more than 10% of the small (sub)sample (n=997) I want to analyze. For context, my analysis starts with people who were non-citizens in Wave 1 and then revisits them in Wave 6 (by which point some have become UK citizens).

To avoid losing so many cases, I’m wondering about replacing the zero values with some reasonable substitute. Three possibilities, perhaps: a) the mean of f_indinus_lw for the sub-sample (as calculated here, 0.7254); b) the cross-sectional individual weight at Wave 6, f_indinui_xw; or c) a cross-sectional individual weight at Wave 1. No doubt each is sub-optimal, but losing >10% of the sample is also sub-optimal. Any comments as to the relative merits of these three options? (Or are they all bad...)

If I could assume that attrition is not related to the response variable, then perhaps I could instead use the cross-sectional weight from Wave 1 for everyone? Response variables are life satisfaction (sclfsato), interest in politics (Vote6), and importance of British identity (britid). Obviously it’s up to me to make some sort of informed choice about this, but I’d be grateful for any comment.

One additional point, perhaps important for context. Because svy doesn’t work with xt- commands, I can’t use subpop, so I have assigned a weight of 0 to all those not in the subpopulation of interest (i.e., all but the 997). Is that the correct approach for using e.g. xtologit? The mean of the weights for the small subpopulation is then no longer 1; do the weights nonetheless ensure that the subsample is (reasonably) representative of the subpopulation?


#1 Updated by David Bartram over 3 years ago

Correction -- the mean of f_indinus_lw for the sub-sample is 1.0347, not 0.7254. (I had been including the zero values in the calculation of the mean -- ugh...) Using only those in the sub-sample who have non-zero values, the mean is 1.0347.

#2 Updated by Victoria Nolan over 3 years ago

  • Category set to Weights
  • Status changed from New to In Progress
  • Assignee set to David Bartram
  • % Done changed from 0 to 10
  • Private changed from Yes to No

Dear David,

Many thanks for your query. I have passed this on to our weighting team who will look into it for you.

Best wishes, Victoria.

On behalf of the Understanding Society User Support Team

#3 Updated by Peter Lynn over 3 years ago

Hello David.

To my mind the best solution would be for you to use a_indinus_xw as a base weight and then multiply it by an adjustment factor for loss from the sample by w6. You would have to calculate this factor by fitting a model (e.g. logit) based on all wave 1 respondents in your subgroup of interest, in which the dep var is a 0/1 indicator of whether they also responded at w6 (and removing from the base any known to have died or emigrated before w6). Predictor variables can be anything relevant observed at w1. This will give you a predicted probability for every w1 respondent of responding at w6. Call this P. You then need to adjust a_indinus_xw by multiplying it by 1/P for all the cases that can be included in your analysis.
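
The steps above could be sketched roughly as follows. This is only an illustration in plain Python, not a definitive implementation: the data are made up, the single predictor `young`, the `in_base` flag and the helper names `fit_logit` and `predict` are all hypothetical, and the logit is fitted unweighted by simple gradient ascent. In practice one would fit the model with a statistics package and use several w1 predictors.

```python
import math

def predict(beta, xi):
    """Predicted probability P of responding at w6 for predictor row xi."""
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, y, iters=2000, lr=0.5):
    """Fit an unweighted logit by gradient ascent; returns [intercept, b1, ...]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(beta, xi)
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical w1 respondents in the subgroup of interest (made-up values).
# base_w stands in for a_indinus_xw; in_base is False for anyone known to
# have died or emigrated before w6, so they are dropped from the model base.
cases = [
    {"base_w": 1.2, "young": 0, "resp_w6": 1, "in_base": True},
    {"base_w": 0.9, "young": 0, "resp_w6": 1, "in_base": True},
    {"base_w": 1.1, "young": 0, "resp_w6": 1, "in_base": True},
    {"base_w": 1.0, "young": 0, "resp_w6": 0, "in_base": True},
    {"base_w": 0.8, "young": 1, "resp_w6": 1, "in_base": True},
    {"base_w": 1.3, "young": 1, "resp_w6": 1, "in_base": True},
    {"base_w": 1.0, "young": 1, "resp_w6": 0, "in_base": True},
    {"base_w": 0.7, "young": 1, "resp_w6": 0, "in_base": True},
    {"base_w": 1.4, "young": 1, "resp_w6": 0, "in_base": False},
]

base = [c for c in cases if c["in_base"]]
beta = fit_logit([[c["young"]] for c in base], [c["resp_w6"] for c in base])

# Adjusted weight = base weight * 1/P, only for cases usable in the analysis.
adjusted = []
for c in base:
    if c["resp_w6"] == 1:
        p = predict(beta, [c["young"]])
        adjusted.append({**c, "p": p, "adj_w": c["base_w"] / p})
```

Cases with a lower predicted probability of responding at w6 get a larger adjustment, so respondents who resemble the people lost from the sample count for more.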

An amended version of your a) might be the second-best option. Instead of just taking the mean, take the mean within groups defined by variables relevant to your analysis. It sounds like you have 800+ cases with a non-zero weight (and 100 or so with zero?), so you have a big enough sample to divide into a good number of groups (10 to 20?).
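
A minimal sketch of that within-group substitution, in plain Python with made-up data. The function name `impute_zero_weights`, the `group` field and the weight values are hypothetical; only the variable name f_indinus_lw comes from the discussion above. Each zero weight is replaced by the mean of the non-zero weights in the same group.

```python
from collections import defaultdict

def impute_zero_weights(cases, group_key="group", weight_key="f_indinus_lw"):
    """Return weights with each zero replaced by the mean of the
    non-zero weights in the same group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c in cases:
        if c[weight_key] > 0:
            sums[c[group_key]] += c[weight_key]
            counts[c[group_key]] += 1

    result = []
    for c in cases:
        w = c[weight_key]
        if w == 0:
            g = c[group_key]
            if counts[g] == 0:
                # No donor cases: the group needs merging with another one.
                raise ValueError(f"no non-zero weights in group {g!r}")
            w = sums[g] / counts[g]
        result.append(w)
    return result

# Hypothetical sub-sample: two groups, each with one zero-weight case.
sample = [
    {"group": "A", "f_indinus_lw": 0.8},
    {"group": "A", "f_indinus_lw": 1.2},
    {"group": "A", "f_indinus_lw": 0.0},
    {"group": "B", "f_indinus_lw": 2.0},
    {"group": "B", "f_indinus_lw": 1.0},
    {"group": "B", "f_indinus_lw": 0.0},
]

new_weights = impute_zero_weights(sample)
```

The function raises an error when a group has no non-zero weights, which is a prompt to merge that group with a neighbouring one rather than leave the case at zero.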

Do not use approach b), as that would be very distorting due to the inclusion of the new boost sample in the "ui" weights, but not in your analysis (i.e. all ethnic minorities and immigrants will be greatly weighted down - much more than they should be).

And I wouldn't recommend c) either, as I would doubt that 5 years of attrition is ignorable.


#4 Updated by Victoria Nolan over 3 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 10 to 80

#5 Updated by David Bartram over 3 years ago

Thank you Peter -- that's a very helpful response.

#6 Updated by Victoria Nolan over 3 years ago

  • Status changed from Feedback to Closed
  • % Done changed from 80 to 100
