MICE for multiple imputation of missing values

Patrick Royston, MRC Clinical Trials Unit, London. 11th London Stata Users’ Meeting, 17-18 May 2005.


  1. MICE for multiple imputation of missing values Patrick Royston MRC Clinical Trials Unit, London 11th London Stata Users’ Meeting 17-18 May 2005

  2. Outline • What is multiple imputation? • Types of missing data • Multiple imputation with the MICE method • Example: Fetal growth study • Passive imputation • Coping with categorical variables • Notes and conclusions

  3. What is multiple imputation (MI)? • Context: Multiple regression (in general) • Replace missing values with “plausible” substitutes • Based on distribution of given data • Inject the right amount of randomness to reflect uncertainty • Do this several times, create m > 1 datasets • Analyse datasets individually, but identically • Combine the estimates, get confidence intervals using Rubin’s rules (micombine)
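
The combining step on this slide can be sketched in a few lines. This is a minimal Python illustration of Rubin's rules for a single scalar estimate, not the micombine implementation; the function name `pool_rubin` and the example numbers are made up for the illustration.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules.

    Returns the pooled estimate, its standard error and the degrees of
    freedom of the reference t distribution.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m > 1 estimates with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b        # total variance
    if b == 0:
        df = math.inf              # no between-imputation variation
    else:
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, math.sqrt(t), df


# Hypothetical results from m = 3 imputed datasets (estimate, squared SE).
est, se, df = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(est, se, df)
```

A confidence interval would then be est ± t(df) × se; the t quantile is left out here because the Python standard library does not provide it.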

  4. Types of missing data: The Holy Triad • MCAR (missing completely at random) • MAR (probability of missingness does not depend on unobserved information) • MNAR (probability of missingness does depend on unobserved information) Will not be considering MNAR data here - data will be assumed MAR at worst
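
To make the three mechanisms concrete, here is a small simulation sketch in Python. Everything in it is an assumption made for illustration (the variables x and y, the missingness probabilities, the seed); it is not taken from the talk. The variable x is always observed, y is subject to missingness, and the mean of the observed y values is reported under each mechanism.

```python
import random
from statistics import mean


def observed_mean(y, x, mechanism, rng):
    """Mean of the y values that remain observed under a missingness mechanism."""
    kept = []
    for xi, yi in zip(x, y):
        if mechanism == "MCAR":
            p_missing = 0.4                       # same for every case
        elif mechanism == "MAR":
            p_missing = 0.8 if xi > 0 else 0.1    # depends on observed x only
        elif mechanism == "MNAR":
            p_missing = 0.8 if yi > 0 else 0.1    # depends on the unobserved y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_missing:
            kept.append(yi)
    return mean(kept)


rng = random.Random(2005)
x = [rng.gauss(0, 1) for _ in range(20000)]
y = [xi + rng.gauss(0, 1) for xi in x]            # true mean of y is about 0

results = {name: observed_mean(y, x, name, rng) for name in ("MCAR", "MAR", "MNAR")}
for name, value in results.items():
    print(name, round(value, 3))
```

Under MCAR the observed mean stays close to zero. Under MAR and MNAR it is pulled below zero, but only in the MAR case can the observed x be used to correct for this, which is what an imputation model tries to do.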

  5. Multiple imputation with MICE • MICE = “multiple imputation by chained equations” (van Buuren et al Stat Med 1999) • The MICE approach has three components: • Univariate – implemented in uvis • Multivariate – implemented in ice • Multiple – implemented in ice • ice = imputation by chained equations

  6. Univariate imputation with uvis • Suppose we have variables x1, x2, …, xk on n cases • Suppose the variable to be imputed is x1 • x1 has some observations “missing at random” • x2, …, xk are complete (no missing data) • Regress x1 on x2, …, xk • Draw β* from the posterior distribution of the regression coefficients (or use bootstrap – boot option) • Use prediction-matching to estimate missing x1 • Predict all x1 values using β*(x2, …, xk)^T • Find the non-missing prediction nearest to the missing-value prediction and impute using the corresponding value of x1 • Or, predict missing values of x1 from the posterior predictive distribution of x1 (draw option)
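
The prediction-matching steps above can be sketched as follows. This is a simplified Python illustration, not the uvis code: it assumes a single complete predictor x2, uses the bootstrap variant to perturb the regression coefficients, and marks missing values with None. The names `fit_line` and `impute_pmm` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0   # flat line if the sample has one x value
    return my - slope * mx, slope


def impute_pmm(x1, x2, rng):
    """Impute missing x1 (None) from complete x2 by prediction-matching."""
    obs = [i for i, v in enumerate(x1) if v is not None]
    # Bootstrap the observed cases so the coefficients carry sampling uncertainty.
    boot = [rng.choice(obs) for _ in obs]
    a, b = fit_line([x2[i] for i in boot], [x1[i] for i in boot])
    pred = [a + b * v for v in x2]          # predictions for all cases
    completed = list(x1)
    for i, v in enumerate(x1):
        if v is None:
            # Donor: observed case whose prediction is nearest to this one.
            donor = min(obs, key=lambda j: abs(pred[j] - pred[i]))
            completed[i] = x1[donor]
    return completed


rng = random.Random(1)
x2 = [float(i) for i in range(10)]
x1 = [2 * v + 1 for v in x2]
x1[3] = None
x1[7] = None
completed = impute_pmm(x1, x2, rng)
print(completed)
```

Because each imputed value is copied from a donor, every imputation is a value that was actually observed, so the imputed data stay within the range of the real measurements.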

  7. Univariate imputation with uvis • Syntax: uvis regression_cmd yvar xvarlist [if exp] [in range] [weight], gen(newvarname) [ boot draw seed(#) ] • Quite general - regression_cmd may be regress, logit, ologit or mlogit for different types of yvar

  8. Multiple imputation with ice • Variables x1, …, xk may have missing data • Eliminate cases with all variables missing • Initialise – fill in all missing values at random • Apply uvis to x1 regressing on x2, …, xk • Replace missing values in x1 • Repeat for x2 , …, xk on other x’s (cycle 1) • Repeat for about 10 cycles • Repeat whole process m times • gives m imputed datasets with complete observations
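
The cycle described above can be sketched for the simplest case of two variables that both have missing values. This Python sketch is an illustration under stated assumptions, not the ice implementation: the univariate step is a bootstrap regression with prediction-matching, missing values are marked with None, and the names `draw_imputations` and `chained_imputation` and the example data are hypothetical.

```python
import random


def draw_imputations(target, predictor, missing, rng):
    """One univariate step: regress target on predictor, then prediction-match."""
    obs = [i for i in range(len(target)) if i not in missing]
    boot = [rng.choice(obs) for _ in obs]              # bootstrap observed cases
    n = len(boot)
    mx = sum(predictor[i] for i in boot) / n
    my = sum(target[i] for i in boot) / n
    sxx = sum((predictor[i] - mx) ** 2 for i in boot)
    sxy = sum((predictor[i] - mx) * (target[i] - my) for i in boot)
    slope = sxy / sxx if sxx > 0 else 0.0
    pred = [my + slope * (p - mx) for p in predictor]
    for i in missing:
        donor = min(obs, key=lambda j: abs(pred[j] - pred[i]))
        target[i] = target[donor]                      # replace the missing value


def chained_imputation(x1, x2, m=5, cycles=10, seed=0):
    """Return m completed datasets as lists of (x1, x2) rows."""
    rng = random.Random(seed)
    # Eliminate cases with all variables missing.
    rows = [(a, b) for a, b in zip(x1, x2) if a is not None or b is not None]
    miss1 = {i for i, (a, _) in enumerate(rows) if a is None}
    miss2 = {i for i, (_, b) in enumerate(rows) if b is None}
    obs1 = [a for a, _ in rows if a is not None]
    obs2 = [b for _, b in rows if b is not None]
    datasets = []
    for _ in range(m):                                 # repeat whole process m times
        # Initialise: fill in missing values at random from the observed ones.
        c1 = [a if a is not None else rng.choice(obs1) for a, _ in rows]
        c2 = [b if b is not None else rng.choice(obs2) for _, b in rows]
        for _ in range(cycles):
            draw_imputations(c1, c2, miss1, rng)       # x1 on x2
            draw_imputations(c2, c1, miss2, rng)       # x2 on x1
        datasets.append(list(zip(c1, c2)))
    return datasets


x1 = [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 8.0]
x2 = [1.1, None, 3.2, 3.9, None, None, 7.2, 7.9]
datasets = chained_imputation(x1, x2, m=3, cycles=10, seed=42)
for d in datasets:
    print(d)
```

Note that each regression uses the current imputed values of the predictor, which is what links the univariate steps into a chain. With more than two variables the same loop would regress each variable on all the others, which needs a multiple-regression routine that this dependency-free sketch leaves out.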

  9. Multiple imputation with ice • Syntax: ice varlist using filename[.dta] [if exp] [in range] [weight], [m(#) cmd(cmdlist) cycles(#) boot draw seed(#) dryrun eq(eqlist) passive(passivelist) noshoweq substitute(sublist) other_options] • The options shown in red on the slide are new with ice (cf. mvis) – I will illustrate some aspects of these today

  10. Example: Fetal size data • Ultrasound study of fetal growth (Lyn Chitty) • n = 649 singleton pregnancies • Many measurements – will concentrate on ac (abdominal circumference), hc (head circ.), ml (mandible length) and gest. age (ga) • Gestational age range 12-42 weeks • Rank correlations: all  0.95 • Missing: ac 6%, hc 8%, ml 75%, ga 0% • ml ‘unreliable’ after 28 weeks • Wish to see what ml might look like > 28 wks • Heteroscedasticity – log transformations used

  11. Mandible length after 28 weeks … ?

  12. Close relationships – accurate imputation

  13. Multiple imputation • Prediction equations for lnac, lnhc, lnml • MFP modelling on ga, otherwise linear:

  14. Creating one imputation with ice eq(lnac:lnhc ga_1 ga_2, lnhc:lnac ga_3 ga_4, lnml:lnac lnhc

  15. Result for log ML

  16. Result for log ML – random draws from posterior distribution (draw option)

  17. Suppose ga had had missing values – introducing the passive() option

  18. Coping with categorical variables – using passive() with substitute() No good – all the prediction equations are illogical!

  19. Notes and Conclusions • MICE method is very flexible – but demands thought when creating the imputation model • Strongly recommend mastering the eq(), passive() and substitute() options • Can deal with interactions using passive() • Choice of m is important • may need to be (much) larger than 5 • See Royston (2004, SJ 4:227-41) for discussion • ice software is available (?on CD) • or send email to pr@ctu.mrc.ac.uk
