
Can methods that deal with missing data reduce bias or increase precision in longitudinal studies?

Jonathan Sterne, Margaret May, Jon Heron, Ross Harris. ALSPAC / MRC Health Services Research Collaboration, Department of Social Medicine, University of Bristol.


Presentation Transcript


  1. Can methods that deal with missing data reduce bias or increase precision in longitudinal studies? Jonathan Sterne, Margaret May, Jon Heron, Ross Harris • ALSPAC / MRC Health Services Research Collaboration, Department of Social Medicine, University of Bristol

  2. Outline • Missing data in the ALSPAC study • Commonly used methods for dealing with missing data • Valid methods to deal with missing data • Example applications • Issues and concluding remarks

  3. ALSPAC • Avon Longitudinal Study of Parents and Children • Birth cohort study of ~13,000 children and their parents, based in south-west England, established by Prof Jean Golding and colleagues ~1990 • Designed to determine ways in which the individual’s genotype combines with environmental pressures to influence health and development • Children now aged 14-15; 5-year core support recently agreed by MRC/Wellcome

  4. ALSPAC data • Self-completion questionnaires • Hands-on assessments • Data from external sources • Biological samples • DNA

  5. Maintaining Response • Handling non-response: • Two reminder letters • Telephone call • Visits • Maintaining study profile: • Newsletters • Media coverage • Discovery club for children

  6. Response rates • Child-based:

  7. Missing data in ALSPAC • An inevitable problem in analyses that use data from multiple time points • i.e. the analyses for which the cohort was designed • Analyses based on children with complete data (“available case analyses”) can typically use 50% or fewer of the children in the cohort • Social background is strongly associated with the probability that data are missing

  8. Analyst's dilemma • Exclude subjects with missing data? • Omit covariates with missing data? • Deal with missing data?

  9. Consequences of missing data • Bias - those with complete data may differ from those with incomplete data • Estimation based on the subset with complete data (“available cases”) may give a biased estimate of the population parameter of interest • Loss of precision/power • Missing data reduce the sample size
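The loss of precision on this slide can be made concrete with a rough calculation. As a sketch only, assume that data are missing completely at random and that the standard error of an estimate scales as 1/√n (both are assumptions for illustration, not statements from the slides); f is a hypothetical symbol for the fraction of the cohort with complete data.

```latex
\mathrm{SE}_{\text{available}} \approx \frac{\mathrm{SE}_{\text{full}}}{\sqrt{f}},
\qquad
f = 0.5 \;\Rightarrow\; \frac{1}{\sqrt{0.5}} \approx 1.41
```

Under these assumptions, an analysis that can use only half the cohort would have standard errors roughly 41% larger than one using every child, before any bias is considered.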

  10. Classification of missing data • Model for distribution of missingness (DoM) • Introduced by Rubin (1976) • Sets of variables: Z with missing data, X with complete data • MCAR: missing completely at random • Probability of Z missing is not related to either X or the true value of Z • MAR: missing at random • Probability of Z missing is not related to unobserved values of Z, but is related to observed values of X • MNAR: missing not at random • Probability of Z missing still depends on unobserved values of Z even after allowing for dependence on X • Statistical analyses cannot deal with this
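The three mechanisms can be illustrated with a small simulation. This is a minimal sketch, not the authors' analysis: the linear model for Z, the logistic forms for the probability of missingness, and all coefficients are assumptions chosen for illustration. It compares the complete-case mean of Z and the complete-case slope of Z on X with their true values (0 and 0.5).

```python
import math
import random


def slope(xs, zs):
    """Least-squares slope of z on x."""
    n = len(xs)
    mx = sum(xs) / n
    mz = sum(zs) / n
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxz / sxx


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def complete_case_summary(n=50_000, seed=1):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]       # fully observed X
    z = [0.5 * xi + rng.gauss(0, 1) for xi in x]  # Z: true mean 0, true slope 0.5

    # Probability that Z is missing under each mechanism (assumed forms).
    prob_missing = {
        "MCAR": lambda xi, zi: 0.3,                         # unrelated to X or Z
        "MAR": lambda xi, zi: logistic(-1 + 1.5 * xi),      # depends on observed X
        "MNAR": lambda xi, zi: logistic(-1 + 1.5 * zi),     # depends on Z itself
    }

    out = {}
    for name, p in prob_missing.items():
        kept = [(xi, zi) for xi, zi in zip(x, z) if rng.random() >= p(xi, zi)]
        xs, zs = zip(*kept)
        out[name] = {"mean_z": sum(zs) / len(zs), "slope": slope(xs, zs)}
    return out


results = complete_case_summary()
for name, r in results.items():
    print(f"{name}: mean of observed Z = {r['mean_z']:.3f}, "
          f"slope of Z on X = {r['slope']:.3f}")
```

In this setup the complete cases recover both quantities under MCAR. Under MAR the observed mean of Z is shifted, but the slope of Z on X is still recovered because missingness depends only on X. Under MNAR both are distorted.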

  11. Simple “ad hoc” missing data methods • Available case analysis • unbiased if data MCAR, but inefficient • Mean imputation • association attenuated • Last value carried forward (for repeated measures) • distorts trends over time • Missing category indicator • always biased (see Vach and Blettner, AJE 1991) • Single imputation from model for missing data • distorts standard errors
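The attenuation caused by mean imputation can be seen in a short simulation. This is a sketch under assumed conditions (a linear relationship with true slope 0.5, and 40% of Z removed completely at random); the function name and parameters are hypothetical.

```python
import random


def slope(xs, zs):
    """Least-squares slope of z on x."""
    n = len(xs)
    mx = sum(xs) / n
    mz = sum(zs) / n
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxz / sxx


def mean_imputation_demo(n=50_000, p_missing=0.4, seed=2):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [0.5 * xi + rng.gauss(0, 1) for xi in x]  # true slope of Z on X is 0.5
    missing = [rng.random() < p_missing for _ in range(n)]  # MCAR by construction

    x_obs = [xi for xi, m in zip(x, missing) if not m]
    z_obs = [zi for zi, m in zip(z, missing) if not m]

    # Mean imputation: every missing Z is replaced by the observed mean.
    z_bar = sum(z_obs) / len(z_obs)
    z_imputed = [z_bar if m else zi for zi, m in zip(z, missing)]

    return {
        "available_case": slope(x_obs, z_obs),
        "mean_imputed": slope(x, z_imputed),
    }


demo = mean_imputation_demo()
print(f"available-case slope: {demo['available_case']:.3f}")
print(f"mean-imputed slope:   {demo['mean_imputed']:.3f}")
```

Because the imputed values carry no information about X, the slope after mean imputation shrinks toward roughly (1 - p_missing) times the true slope, while the available case slope stays close to 0.5 in this MCAR setting.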

  12. Valid methods to deal with data that are missing at random (MAR) • Likelihood-based (EM algorithm) • Multiple imputation • derive predictive distributions for the missing values • use these to produce multiple complete datasets • use standard methods for analysis • combine results to get valid parameter estimates and standard errors • This is not “making up data”!! • Efficient, robust methods that use weighted estimation
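The "combine results" step of multiple imputation is usually done with Rubin's rules; the slides do not say which rule the authors used, so treat the following as a sketch of the standard approach. The function name `pool_rubin` and the example numbers are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and squared standard errors.

    Returns the pooled estimate and its standard error using Rubin's rules:
    total variance = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # average sampling variance
    between = statistics.variance(estimates)   # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log hazard ratios and variances from m = 5 imputed datasets.
est = [0.42, 0.47, 0.39, 0.45, 0.44]
var = [0.010, 0.011, 0.010, 0.012, 0.011]
pooled_est, pooled_se = pool_rubin(est, var)
print(f"pooled estimate = {pooled_est:.3f}, pooled SE = {pooled_se:.3f}")
```

The between-imputation term is what separates multiple imputation from single imputation: it carries the uncertainty about the missing values into the final standard error.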

  13. Multiple imputation in practice • Very rapid software development in recent years • Two flavours of MI: • methods based on the multivariate normal distribution (good theoretical foundation, problems with categorical variables) • “chained equations” (little theoretical foundation, good for categorical variables, becoming widely used) • Few guidelines for analysts • Highly complex models in typical situations • Very difficult to report methods in adequate detail, in applied papers
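The "chained equations" idea can be sketched for two continuous variables: start from mean imputation, then cycle through the variables, regressing each on the other and redrawing its missing values. This is a deliberately simplified illustration, not the software the authors refer to: it handles only two variables, uses linear regression, and does not draw the regression parameters from their posterior as a full implementation would. All names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Return intercept, slope and residual SD of a least-squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / (n - 2))


def redraw(target, predictor, is_missing, rng):
    """Fit target on predictor where target was observed; redraw missing targets."""
    xs = [p for p, m in zip(predictor, is_missing) if not m]
    ys = [t for t, m in zip(target, is_missing) if not m]
    a, b, sd = fit_line(xs, ys)
    return [a + b * p + rng.gauss(0, sd) if m else t
            for t, p, m in zip(target, predictor, is_missing)]


def chained_imputation(u, v, n_cycles=10, seed=0):
    """Return one completed copy of (u, v); None marks a missing value."""
    rng = random.Random(seed)
    miss_u = [val is None for val in u]
    miss_v = [val is None for val in v]
    mean_u = sum(val for val in u if val is not None) / miss_u.count(False)
    mean_v = sum(val for val in v if val is not None) / miss_v.count(False)
    cur_u = [mean_u if m else val for val, m in zip(u, miss_u)]
    cur_v = [mean_v if m else val for val, m in zip(v, miss_v)]
    for _ in range(n_cycles):
        cur_u = redraw(cur_u, cur_v, miss_u, rng)  # update u given current v
        cur_v = redraw(cur_v, cur_u, miss_v, rng)  # update v given current u
    return cur_u, cur_v


# Simulated example: two correlated variables, each about 20% missing.
rng = random.Random(42)
u_full = [rng.gauss(0, 1) for _ in range(200)]
v_full = [0.8 * ui + rng.gauss(0, 0.5) for ui in u_full]
u = [None if rng.random() < 0.2 else ui for ui in u_full]
v = [None if rng.random() < 0.2 else vi for vi in v_full]
u_imp, v_imp = chained_imputation(u, v)
```

Calling `chained_imputation` several times with different seeds gives multiple completed datasets, each of which would be analysed with standard methods before the results are combined.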

  14. ART-LINC Example 1: predicting mortality in HIV-1 infected people treated with antiretroviral therapy in low-income countries • Data from the ART-LINC collaboration • 2,725 patients with active follow-up in 14 treatment programmes in Africa, Asia and South America • Prognostic model for patients starting antiretroviral therapy • Estimate mortality hazard ratio according to whether patients had AIDS at baseline, using methods for missing data • This information was missing for 649 patients (24%)

  15. […cut…]

  16. Example 2: prognostic value of anaemia in HIV-1 infected people treated with antiretroviral therapy in developed countries • Prognostic model already developed • Want to include anaemia, but this is missing about 30% of the time • Haemoglobin is strongly associated with other prognostic variables, in particular CD4 cell count • Can we (a) reduce bias (b) increase precision by using missing data methods?

  17. […cut…]

  18. Example 3: what can we gain from missing data methods in ALSPAC? • Estimate: • the prevalence of wheeze at different times • associations between wheeze and maternal asthma • Possible analyses: • restrict to cases with complete data at all time points • restrict to cases with complete data at a single time point • impute using information measured at a particular time • impute using longitudinal information

  19. […cut…]

  20. Example 3: what can we gain through using missing data methods? • Most dramatic changes were between available case analyses • Prevalence estimates based on missing data methods were plausibly less biased • Surprisingly small changes in standard errors for associations when missing data methods were used • We know very little about when estimates of associations are likely to be biased because of missing data

  21. Concluding Remarks • Analyses restricted to patients with no missing data are widely used, but are biased when data are not missing completely at random (MCAR) and result in a loss of statistical power • Use of missing data methods may reduce bias and increase precision. However, we pay a price in model complexity • There is increasing usage of these methods, but they require great care and can produce misleading results • Guidelines for conduct and reporting are needed • Never, ever present analyses using missing data methods without also presenting the available case analyses

  22. Final message • It’s better to make the measurement than to try to use statistical methods to compensate for the fact that it is missing
