Dear Support

I have two questions regarding strata formation and clustering in analysis:

1. After reading Lynn (2009), "Sample Design for Understanding Society" (and consulting both your online class and the manual), I am left with a question regarding the enumeration of strata in the variable 'strata'. This surely comes down to a lapse in my understanding of the survey design.
For the GPS, Lynn details the stratification of postal codes into 103 distinct strata (12 regions X 3 SEG-bands X 3 pop. density bands). However, when tabulating strata for GPS members in wave 1 of UKHLS, I see 1200 strata. Where is the disconnect?
I also find a discrepancy between strata_bh and the characterization given in Lynn (2006) "Quality Profile: British Household Panel Survey". Lynn details 82 minor strata while the strata_bh variable takes 75 values at wave 1 of BHPS.

2. Should I specify two levels of clustering when studying individuals (say, in the Stata svy environment)?
If my interest is in individuals (adult respondents), then for the GPS my current understanding of the structure is: 1. Postal codes are translated into sectors, which are sorted into 103 strata. 2. PSUs are drawn (first clustering level) with proportionate probability. 3. Addresses/delivery points are drawn at random from each PSU (second level of clustering?), with a correction for multiple households at the same address.

I would be thankful for any help you could provide
Andreas W. Andersen


Regarding 2: my question implies that I want definitive advice on how to cluster in practice. What I meant is: are there, theoretically/strictly speaking, two levels of clustering?
I see from various examples of Stata "svyset" commands that you often apply only one level of clustering (PSU), and I fully intend to do so myself.

#5 Updated by Olena Kaminska 4 months ago

Dear Andreas,

Thank you for your questions.

1. The strata variable is correct and correctly reflects the sample design. You can trust it. The details of the stratification design are probably somewhere in the documentation.
2. Unless you use multilevel analysis or pooled analysis, you should use only the PSU variable as your cluster variable. Geographies higher than the PSU do not matter, as they did not influence the clustering in our sample design. Technically, though, we do have waves nested within individuals, nested within households, nested within PSUs. In this situation, accounting for clustering within PSUs (in other words, the highest level of clustering) also accounts for clustering at the lower levels. Statistical textbooks on survey sampling cover this in more detail.
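
To illustrate point 2, here is a minimal Python sketch of the standard "ultimate cluster" (with-replacement) variance approximation for a weighted total. It is not necessarily the exact computation any particular software or our team uses. The function name, the record layout and the toy numbers are all made up for illustration. The point it shows is that only the weighted PSU totals within each stratum enter the formula, so how individuals are grouped into households inside a PSU does not change the result.

```python
from collections import defaultdict


def ultimate_cluster_variance(records):
    """Approximate variance of a weighted total under stratified PSU sampling.

    records: iterable of (stratum, psu, household, weight, y) tuples.
    The household id is carried along but deliberately never used:
    only weighted PSU totals enter the estimator.
    """
    # Weighted total of y for each PSU, grouped by stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, _household, weight, y in records:
        psu_totals[stratum][psu] += weight * y

    variance = 0.0
    for stratum, totals in psu_totals.items():
        n_h = len(totals)  # number of sampled PSUs in this stratum
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than 2 PSUs")
        mean_h = sum(totals.values()) / n_h
        squared_dev = sum((z - mean_h) ** 2 for z in totals.values())
        # Between-PSU variation within the stratum, scaled by n_h / (n_h - 1).
        variance += n_h / (n_h - 1) * squared_dev
    return variance


# Hypothetical toy data: (stratum, psu, household, weight, y).
records = [
    (1, "A", "h1", 1.0, 1.0),
    (1, "A", "h1", 1.0, 2.0),
    (1, "A", "h2", 1.0, 3.0),
    (1, "B", "h3", 1.0, 2.0),
    (1, "B", "h4", 1.0, 2.0),
    (2, "C", "h5", 1.0, 4.0),
    (2, "C", "h6", 1.0, 6.0),
    (2, "D", "h7", 1.0, 6.0),
]

# Same people and PSUs, but every person placed in a different household.
regrouped = [
    (s, p, f"other{i}", w, y) for i, (s, p, _h, w, y) in enumerate(records)
]

variance_original = ultimate_cluster_variance(records)
variance_regrouped = ultimate_cluster_variance(regrouped)
print(variance_original, variance_regrouped)
```

Both calls return the same value, because the household labels never reach the formula. Within-PSU correlation (between household members, or between waves of the same person) is already reflected in how much the PSU totals vary.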

Hope this helps,

