BMI variables in the youth panel - potential errors?
I've come across something strange with the BMI data in the youth panel (available in waves b, d and f). In each wave where the data are available, ypbmi_dv ranges from around 8 to around 90. A BMI of 8 (or 90!) would be impossible, so I went back to the height and weight data and found that BMI values are present even when there is no reported height and weight. The only coding is -9 for missing; there is no 'wildcard' option.
Furthermore, even for rows where height and weight are reported, BMI is not consistent. For example, a respondent with a BMI of 9.8 is 157cm tall and weighs 41.3kg; by the standard BMI calculation, this respondent's BMI should be about 16.8. I've done a few more spot checks and can't unpick what's going on with the BMI variable. Can you shed any light on this situation?
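For reference, the spot check above can be reproduced with a few lines of Python. This is only a minimal sketch of the standard formula (weight in kg divided by height in metres squared); the function name `bmi` is made up for illustration and is not part of any Understanding Society code.

```python
def bmi(weight_kg, height_cm):
    """Standard BMI: weight in kilograms divided by height in metres squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


# The respondent quoted above: 157 cm tall, 41.3 kg.
recomputed = bmi(41.3, 157)
print(f"Recomputed BMI: {recomputed:.2f}")
```

For this respondent the recomputed value is about 16.76, well above the 9.8 stored in the BMI variable.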
It's quite urgent I'm afraid - I work for the Department of Health and Social Care, and this task is part of a high priority workstream on inequalities in childhood obesity. I'd be really grateful if you could help!
#1 Updated by Alita Nandi almost 2 years ago
- Category set to Derived variables
- Status changed from New to In Progress
- Assignee set to Gundi Knies
- Target version set to X M
- % Done changed from 0 to 10
- Private changed from Yes to No
Many thanks for your enquiry. The Understanding Society team is looking into it and we will get back to you as soon as we can.
#2 Updated by Gundi Knies almost 2 years ago
- Assignee changed from Gundi Knies to Sian Hughes
- % Done changed from 10 to 50
Information about height and weight is collected in a paper-and-pencil questionnaire which children/adolescents fill out privately. It is, unfortunately, a tricky instrument. Please take a look at the questionnaire: there is the option to report height and weight in imperial or metric units, but some respondents choose to provide the information in both units. The information is not necessarily consistent, which may be due to poor mathematical skills or a typo. Sometimes respondents enter their total height in inches in the "inches" field, instead of reporting feet in the "feet" field and the additional inches in the "inches" field. Or they report "feet" in fractions. Last but not least, the programme that extracts responses from the paper-and-pencil questionnaire following fieldwork may misinterpret the handwriting.
To understand how we derive youth BMI you need to consult the online variable documentation. See, for example: https://www.understandingsociety.ac.uk/documentation/mainstage/dataset-documentation/wave/2/datafile/b_youth/variable/b_ypbmi_dv.
It says the variable _ypbmi_dv is derived from two variables: _ypwghtkg_dv and _yphghtcm_dv. You then need to read the descriptions of these input variables. For example:
The variable note to b_ypwghtkg_dv reads: "Weight in kilograms. Derived from scanned freetext information provided in B_YPHLWTK. Equals B_YPHLWTK divided over 10, or B_YPHLWTK if implied B_YPHLWTK_DV is less than 10 or less than 14 and if implied Body Mass Index (BMI) less than 8 (also see B_YPBMI_DV)."
The variable note to b_yphghtcm_dv reads: "Height in cm. Derived from B_YPHLHTF_DV and B_YPHLHTI_DV, converted to metric units, and B_YPHLHTC. B_YPHLHTC is set to missing [-9], or imputed from non-missing B_YPHLHTF, if B_YPHLHTC greater than 231 (i.e., 7ft 7 inches). If B_YPHLHTC is less than 90 and B_YPHLHTF is missing, height assumed to be reported in inches only."
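As a rough illustration only, the height note above could be read as the rules sketched below. This is one possible reading, not the derivation code used by Understanding Society. The function name `height_cm_sketch`, its arguments, the choice to use the feet/inches fields whenever feet are reported, and treating a missing inches value as 0 are all assumptions made for the example.

```python
MISSING = -9          # missing-value code used in the data
CM_PER_INCH = 2.54


def height_cm_sketch(feet, inches, cm):
    """One possible reading of the b_yphghtcm_dv variable note (assumption)."""
    # Assumption: if feet are reported, use the imperial fields.
    if feet != MISSING:
        extra_inches = inches if inches != MISSING else 0  # assumption
        return (feet * 12 + extra_inches) * CM_PER_INCH
    # Feet are missing from here on; an inches-only report without a cm
    # value is not covered by the note and is ignored in this sketch.
    if cm == MISSING:
        return MISSING
    if cm > 231:
        # Above 231 cm (7ft 7in) with no feet to impute from: set to missing.
        return MISSING
    if cm < 90:
        # Below 90 with feet missing: value assumed to be in inches.
        return cm * CM_PER_INCH
    return cm


print(height_cm_sketch(5, 2, MISSING))          # feet and inches reported
print(height_cm_sketch(MISSING, MISSING, 62))   # "cm" value assumed to be inches
print(height_cm_sketch(MISSING, MISSING, 250))  # implausible cm, no feet
```

Under these assumptions the first two calls both give roughly 157.5 cm, and the third returns the missing code -9.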
You will see that we give preference to imperial units over metric units but resolve some conflicts for the derived variables "weight in kg" and "height in cm". We do not change the underlying non-edited information from the paper-and-pencil questionnaire. BMI is capped at 8 at the bottom but not at the top. Users may wish to exclude cases where the BMI is outside the range they consider realistic.
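A user-side filter along these lines might look like the sketch below. The bounds of 12 and 50 are arbitrary placeholders chosen only to make the example run; they are not recommended thresholds. The helper name `usable_bmi` and the sample values are likewise illustrative (9.8, 16.7 and 90 echo figures mentioned in this thread, 22.4 is made up).

```python
MISSING = -9  # missing-value code used in the data


def usable_bmi(value, low, high):
    """True if the BMI is non-missing and inside the user-chosen range."""
    return value != MISSING and low <= value <= high


# Illustrative values only.
sample = [9.8, 16.7, 22.4, 90.0, MISSING]

# Placeholder bounds: replace with whatever range you consider realistic.
kept = [v for v in sample if usable_bmi(v, low=12.0, high=50.0)]
print(kept)
```

With these placeholder bounds only 16.7 and 22.4 are kept; the missing code and the two extreme values are dropped.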
We very much welcome user feedback on our derived variables. Would you recommend that we impose different cut-offs for youth BMI? Is there a reference to an authoritative institution recommending any upper and lower thresholds that we could base a decision on? Many thanks!