Lecture 20. Missing Data and random effect modelling.
Lecture Contents:
What is missing data?
Simple ad-hoc methods.
Types of missing data (MCAR, MAR, MNAR).
Principled methods.
Multiple imputation.
Methods that respect the random effect structure.
Thanks to James Carpenter (LSHTM) for many slides!!
When it comes to analysis, whether we adopt a frequentist or a Bayesian approach the likelihood is central.
In these slides, for convenience, we discuss issues from a frequentist perspective, although often we use appropriate Bayesian computational strategies to approximate frequentist analyses.
In a Bayesian analysis, prior belief is combined with the likelihood to give the posterior: Prior Belief + Likelihood.
Missing data introduce an element of ambiguity into statistical analysis, which is different from the traditional sampling imprecision. While sampling imprecision can be reduced by increasing the sample size, this will usually only increase the number of missing observations! As discussed in the preceding sections, the issues surrounding the analysis of incomplete datasets turn out to centre on assumptions and computation.
This is bad practice because:
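The missing data mechanism is described by Pr(R | Yo, Ym), where R indicates which values are observed, Yo is the observed data and Ym is the missing data.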
1. The chance of non-response to questions about income usually depends on the person's income.
2. Someone may not be at home for an interview because they are at work.
3. The chance of a subject leaving a clinical trial may depend on their response to treatment.
4. A subject may be removed from a trial if their condition is insufficiently controlled.
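The three mechanisms named in the lecture contents (MCAR, MAR, MNAR) can be illustrated with a small simulation. This is a minimal sketch and not part of the original slides: the function name, sample size, coefficients and the logistic form of the response probability are all assumptions chosen for illustration. It compares the complete-case mean of y with the mean from the full data under each mechanism.

```python
import numpy as np

def simulate_missingness(n=20000, seed=0):
    """Return the mean of y in the full data and among responders under each mechanism."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                              # fully observed covariate
    y = 1.0 + 0.8 * x + rng.normal(scale=0.6, size=n)   # variable that may be missing

    def expit(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Probability that y is observed (hypothetical choices):
    p_observed = {
        "MCAR": np.full(n, 0.6),              # unrelated to x and y
        "MAR": expit(0.5 - 1.5 * x),          # depends only on the observed x
        "MNAR": expit(0.5 - 1.5 * (y - 1.0)), # depends on the possibly missing y
    }

    means = {"full": y.mean()}
    for name, p in p_observed.items():
        observed = rng.random(n) < p
        means[name] = y[observed].mean()      # complete-case estimate
    return means

if __name__ == "__main__":
    for name, value in simulate_missingness().items():
        print(f"{name:5s} mean of y: {value:.3f}")
```

In this set-up the complete-case mean should be close to the full-data mean under MCAR and biased under MAR and MNAR. Under MAR the bias could be removed by conditioning on x; under MNAR it cannot be removed without modelling the mechanism.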
What this means is
either generate statistical information about each missing value, e.g. distributional information: given what we have observed, the missing observation has a normal distribution with mean a and variance b, where the parameters can be estimated from the data;
and/or generate information about the missing value mechanism.
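As an illustration of "distributional information" about a missing value, the sketch below estimates a and b from the complete cases with a simple linear regression and then draws each missing value from the resulting normal distribution. This is an assumed, simplified set-up (a single hypothetical covariate x, a function name invented here) rather than the method used in the lecture. It also ignores the uncertainty in the estimated parameters, which a proper imputation procedure would take into account.

```python
import numpy as np

def impute_normal(x, y, rng):
    """Draw each missing y from N(a_i, b), with a_i = b0 + b1 * x_i.

    b0, b1 and the residual variance b are estimated from the complete cases.
    Missing values of y are coded as NaN; x is assumed fully observed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)

    X = np.column_stack([np.ones(x.size), x])
    beta, *_ = np.linalg.lstsq(X[~miss], y[~miss], rcond=None)

    resid = y[~miss] - X[~miss] @ beta
    b = resid @ resid / (np.sum(~miss) - 2)   # residual variance
    a = X[miss] @ beta                        # conditional mean for each missing value

    completed = y.copy()
    completed[miss] = rng.normal(loc=a, scale=np.sqrt(b))
    return completed, a, b

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=200)
    y[:40] = np.nan                           # make the first 40 values missing
    completed, a, b = impute_normal(x, y, rng)
    print("estimated residual variance b:", round(b, 3))
```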
A full statistical model is written down for the complete data.
Analysis (whether frequentist or Bayesian) is based on the likelihood.
Assumptions must be made about the missing data mechanism:
If it is assumed MCAR or MAR, no explicit model is needed for it.
Otherwise this model must be included in the overall formulation.
Such likelihood analyses require some form of integration (averaging) over the missing data. Depending on the setting this can be done implicitly or explicitly, directly or indirectly, analytically or numerically. The statistical information on the missing data is contained in the model. Examples of this would be the use of linear mixed models under MAR in SAS PROC MIXED or MLwiN.
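The integration over the missing data, and the reason no explicit model for the mechanism is needed under MAR, can be written out as follows. The parameter symbols are assumptions introduced here for illustration: θ for the complete-data model and ψ for the missingness mechanism, with R, Yo and Ym as in Pr(R | Yo, Ym).

```latex
% Joint density of what is actually observed: the observed data Y_o and the indicators R
f(Y_o, R \mid \theta, \psi)
  = \int f(Y_o, Y_m \mid \theta)\, \Pr(R \mid Y_o, Y_m, \psi)\, dY_m .

% Under MAR, \Pr(R \mid Y_o, Y_m, \psi) = \Pr(R \mid Y_o, \psi), which does not involve Y_m, so
f(Y_o, R \mid \theta, \psi)
  = \Pr(R \mid Y_o, \psi) \int f(Y_o, Y_m \mid \theta)\, dY_m .
```

If θ and ψ are distinct parameters, the first factor does not involve θ, so likelihood inference about θ can be based on the integral alone. Under MNAR the factorisation fails and the mechanism has to stay inside the integral.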
We will examine this in the practical.
In the practical we will consider two approaches:
Model-based MCMC estimation of a multivariate response model.
Generating multiple imputations from this model (using MCMC) that can then be used to fit further models using any estimation method.
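Once a model has been fitted to each imputed dataset, the results need to be combined. The slides do not say how; the usual choice is Rubin's rules, sketched below for a single scalar parameter. The function name and the input numbers are hypothetical, and the sketch assumes each fit returns a point estimate and its squared standard error.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine estimates from m imputed datasets using Rubin's rules.

    estimates: point estimate of the parameter from each imputed dataset
    variances: squared standard error of that estimate from each dataset
    Returns the pooled estimate and its total variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size

    q_bar = q.mean()                      # pooled point estimate
    within = u.mean()                     # average within-imputation variance
    between = q.var(ddof=1)               # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total

if __name__ == "__main__":
    # Hypothetical results from m = 5 imputed datasets
    est, var = pool_rubin([1.02, 0.95, 1.10, 0.99, 1.05],
                          [0.040, 0.042, 0.039, 0.041, 0.040])
    print(f"pooled estimate {est:.3f}, standard error {np.sqrt(var):.3f}")
```

The between-imputation term is what carries the extra uncertainty due to the missing data; analysing a single imputed dataset as if it were complete would omit it.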
James Carpenter has developed MLwiN macros that perform multiple imputation using MCMC.
These build around the MCMC features in the practical but run an imputation model independent of the actual model of interest.
See www.missingdata.org.uk for further details including variants of these slides and WinBUGS practicals.