## Multiple Imputation


Outline

- Multiple Imputation (MI)
- How to impute (i.e. how to fill in values)
- How to analyze and draw inferences
- How many times to impute
- Alternatives to MI
- Applications
- Software

Multiple Imputation

- Idea: replace each missing item with 2 or more acceptable values, representing a distribution of possibilities (Rubin, 1987).
- This results in m complete datasets (each one is analyzed using standard methods, and estimated parameters are averaged).
- Multiple imputations can often be generated from simple modifications of existing single-imputation methods such as hot-deck or regression imputation.

Dataset with m imputations

[Figure: a data matrix of N units in the survey by k variables, with each missing item replaced by m imputed values]

Multiple imputation is most useful when the fraction of missing values is not excessive and m is modest (say 2 to 10).

Each row vector of imputations has length m, where:

- model for 1st imputation = …
- model for 2nd imputation = …
- …
- model for mth imputation = …

Advantages:

- Allows the use of standard complete-data methods
- Can incorporate the data collector’s knowledge to reflect uncertainty about the imputed values (sampling variability and uncertainty about the reasons for nonresponse)
- Increases the efficiency of estimation relative to a single random imputation
- Provides valid inferences (including variance estimates) under an assumed model for nonresponse
- Allows one to study sensitivity to various nonresponse models

Disadvantages:

- More work is needed to generate multiple imputations; this is often not difficult using an existing single-imputation scheme
- More space is needed to store the data
- More work is required to analyze the data; this is not serious when m is modest and is often easy with standard statistical programs

How to fill in the values:

- BAYESIAN PERSPECTIVE (Rubin, 1987): draw multiple imputations to simulate the Bayesian posterior distribution of the missing values, that is, the conditional distribution of the missing data given the observed data,

$$p\left(Y_{\text{inc,mis}} \mid Y_{\text{inc,obs}},\, I\right)$$

where obs = set of observed values

mis = set of missing values

inc = set of units included in the sample

I = an indicator for inclusion


- Impose a probability model on the complete data and the nonresponse mechanism (e.g., a normal regression or loglinear model)
- Create imputations through a 2-step Bayesian process:
  - Specify prior distributions and draw the unknown model parameters from their posterior distribution given the observed data, and
  - Draw the missing data from their conditional distribution given the observed data and the drawn parameters; repeating both steps m times gives m independent imputations
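A minimal sketch of the two-step process for the simplest case. It assumes a single normally distributed variable, ignorable (missing at random) nonresponse, and the standard noninformative prior p(μ, σ²) ∝ 1/σ²; none of these choices come from the slides. Step 1 draws σ² and μ from their posterior, step 2 draws the missing values given those parameters, and both steps are repeated for each of the m imputations. The function name `impute_normal_bayes` is hypothetical.

```python
import math
import random
import statistics


def impute_normal_bayes(y_obs, n_mis, m, seed=None):
    """Hypothetical helper: m imputations of n_mis values under a normal model.

    Assumptions (illustrative only): y is univariate normal, nonresponse is
    ignorable, and the prior is p(mu, sigma^2) proportional to 1 / sigma^2.
    """
    rng = random.Random(seed)
    n1 = len(y_obs)
    ybar = statistics.fmean(y_obs)
    s2 = statistics.variance(y_obs)  # sample variance with divisor n1 - 1

    imputations = []
    for _ in range(m):
        # Step 1: draw the parameters from their posterior distribution.
        chi2 = rng.gammavariate((n1 - 1) / 2, 2.0)  # chi-square with n1 - 1 df
        sigma2 = (n1 - 1) * s2 / chi2
        mu = rng.gauss(ybar, math.sqrt(sigma2 / n1))
        # Step 2: draw the missing values given the drawn parameters.
        imputations.append([rng.gauss(mu, math.sqrt(sigma2)) for _ in range(n_mis)])
    return imputations
```

For example, `impute_normal_bayes([4.1, 5.0, 5.6, 4.8, 6.2], n_mis=2, m=3, seed=1)` returns three lists of two imputed values each. Redrawing μ and σ² for every imputation is what lets the spread across the m imputations reflect parameter uncertainty, not just residual noise.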


- This requires deriving the posterior distribution. In simple problems, closed-form solutions exist.
- In more complex applications, rely on special computational techniques such as Markov chain Monte Carlo (MCMC)
- Other possibilities: the approximate Bayesian bootstrap (Rubin, 1987)
- Modeling propensity scores to form sampling groups (Lavori et al., 1995)

Approx. Bayesian Bootstrap (ABB)

- Draw $n_1$ values randomly with replacement from the $n_1$ observed values $Y_{\text{obs}}$ to form $Y^*_{\text{obs}}$ (i.e., create a hot deck)
- Draw the $n_0 = n - n_1$ components of $Y_{\text{mis}}$ randomly with replacement from $Y^*_{\text{obs}}$
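The two ABB steps can be sketched in Python as below. This assumes the respondents and nonrespondents belong to one adjustment cell within which nonresponse is ignorable; `abb_impute` is a hypothetical name used only for illustration.

```python
import random


def abb_impute(y_obs, n_mis, m, seed=None):
    """Hypothetical helper: m approximate Bayesian bootstrap imputations.

    Assumes y_obs and the n_mis missing values come from one adjustment cell
    in which nonresponse is ignorable.
    """
    rng = random.Random(seed)
    n1 = len(y_obs)
    imputations = []
    for _ in range(m):
        # Step 1: draw n1 values with replacement from Y_obs to form Y*_obs.
        y_star = rng.choices(y_obs, k=n1)
        # Step 2: draw the n0 missing values with replacement from Y*_obs.
        imputations.append(rng.choices(y_star, k=n_mis))
    return imputations
```

Resampling $Y^*_{\text{obs}}$ first, rather than drawing the missing values directly from $Y_{\text{obs}}$, is the step that propagates uncertainty about the distribution of Y into the m imputations; a plain hot deck repeated m times omits it.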

See Rubin (1987) for details on:

- Bayesian Bootstrap (BB), p. 44
- Approximate Bayesian Bootstrap (ABB), p. 124

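Inference on combined estimates:

- The estimate is the average of the m repeated complete-data estimates $\hat{Q}_1, \dots, \hat{Q}_m$:

$$\bar{Q}_m = \frac{1}{m}\sum_{j=1}^{m} \hat{Q}_j$$

- Let $\bar{U}_m = \frac{1}{m}\sum_{j=1}^{m} \hat{U}_j$ be the average of the m repeated complete-data variances $\hat{U}_j$ (within-imputation variance), and let

$$B_m = \frac{1}{m-1}\sum_{j=1}^{m} \left(\hat{Q}_j - \bar{Q}_m\right)^2$$

be the variance between imputations.

- The total variance combines the two, with the between-imputation component inflated by $(1 + 1/m)$ to account for the finite number of imputations:

$$T_m = \bar{U}_m + \left(1 + \frac{1}{m}\right) B_m$$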
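A short sketch of these combining rules, taking the m complete-data point estimates and their m complete-data variances as input. The helper name `combine_rubin` is hypothetical; PROC MIANALYZE performs this step in SAS.

```python
import statistics


def combine_rubin(estimates, variances):
    """Hypothetical helper: combine m complete-data results (Rubin, 1987).

    Returns (q_bar, u_bar, b, t): combined estimate, within-imputation
    variance, between-imputation variance, and total variance.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = statistics.fmean(estimates)   # average of the m point estimates
    u_bar = statistics.fmean(variances)   # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance, divisor m - 1
    t = u_bar + (1 + 1 / m) * b           # total variance
    return q_bar, u_bar, b, t
```

With estimates `[10.0, 12.0, 11.0]` and variances `[1.0, 1.2, 0.8]`, the combined estimate is 11.0, the within and between variances are both 1.0, and the total variance is 1 + (4/3)(1.0) ≈ 2.33.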


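- Confidence intervals and significance tests can be computed using a t reference distribution, $(Q - \bar{Q}_m)/\sqrt{T_m} \sim t_{\nu_m}$, where $\bar{Q}_m$ is the combined estimate and $T_m$ the total variance, with

$$\nu_m = (m - 1)\left(1 + \frac{1}{r_m}\right)^2$$

degrees of freedom, where $r_m = (1 + m^{-1})B_m / \bar{U}_m$ is the relative increase in variance due to nonresponse, $B_m$ being the between-imputation variance and $\bar{U}_m$ the average within-imputation variance (Rubin, 1987, Ch. 3)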
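A sketch of the degrees-of-freedom calculation, given the average within-imputation variance, the between-imputation variance, and m. The helper name `rubin_df` is hypothetical. A 95% interval is then the combined estimate ± $t_{\nu_m, 0.975}\sqrt{T_m}$, where the critical value can come from a t table or, for instance, `scipy.stats.t.ppf(0.975, nu_m)`.

```python
import math


def rubin_df(u_bar, b, m):
    """Hypothetical helper: return (r_m, nu_m) for Rubin's t reference distribution."""
    if m < 2:
        raise ValueError("need at least two imputations")
    if b == 0:
        # No between-imputation variability: nu_m is infinite (normal reference).
        return 0.0, math.inf
    r_m = (1 + 1 / m) * b / u_bar          # relative increase in variance due to nonresponse
    nu_m = (m - 1) * (1 + 1 / r_m) ** 2    # degrees of freedom
    return r_m, nu_m
```

For example, with an average within-imputation variance of 1.0, a between-imputation variance of 0.5, and m = 5, the relative increase is 0.6 and the degrees of freedom are 4(1 + 1/0.6)² ≈ 28.4. Small between-imputation variance pushes ν toward infinity, so the t reference approaches the normal.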

How many imputations needed?

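- Rubin (1987, p. 114) shows that the relative efficiency (in units of variance) of an estimator based on m imputations, compared with one based on infinitely many, is

$$\left(1 + \frac{\gamma}{m}\right)^{-1}$$

where $\gamma$ is the rate of missing information for the quantity being estimated.

- For small $\gamma$, m = 2 or 3 is nearly fully efficient: for example, with $\gamma = 0.1$ and m = 3 the efficiency is $1/(1 + 0.1/3) \approx 0.97$.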
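The efficiency formula is simple to tabulate. The sketch below prints it for a few illustrative rates of missing information; the chosen values of γ and m are arbitrary, and `relative_efficiency` is a hypothetical helper name.

```python
def relative_efficiency(gamma, m):
    """Efficiency of an m-imputation estimator relative to infinitely many: (1 + gamma/m)^-1."""
    return 1.0 / (1.0 + gamma / m)


# Rows: rate of missing information gamma; columns: m = 2, 3, 5, 10.
for gamma in (0.1, 0.3, 0.5):
    print(gamma, [round(relative_efficiency(gamma, m), 3) for m in (2, 3, 5, 10)])
```

Even at γ = 0.5, moving from m = 5 to m = 10 raises the efficiency only from about 0.91 to about 0.95, which is why a modest m is usually considered adequate.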

Problems

- Difficulties with the MI variance estimator are discussed by Binder & Sun (1996), Fay (1996), and others
- It gives inconsistent variance estimates under some simple conditions (improper imputation)
- Kott (1995) observes that sampling weights must be used for both point and variance estimation for the imputations to satisfy the conditions for being proper
- Wang and Robins (1998) explore large-sample properties of MI estimators

Alternatives

- Advances have been made in drawing efficient and asymptotically valid inferences from single imputations
- Shao (2002) and Rao (2000, 2005): jackknife variance estimator for hot-deck imputation in which donors are selected with replacement, with selection probability proportional to the sampling weights
- Kalton & Kish (1984), Fay (1996): fractionally weighted imputation, which uses more than one donor for a recipient

Fractionally Weighted Imputation

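- Idea: reduce imputation variance relative to single imputation
- Fractional hot-deck imputation replaces each missing value with a set of imputed values and assigns a weight to each (Kim & Fuller, 2004), i.e., a missing $y_i$ is replaced by $M$ imputed values $y^*_{i1}, \dots, y^*_{iM}$ with fractional weights $w^*_{ij}$ satisfying $\sum_{j=1}^{M} w^*_{ij} = 1$
- Each imputed value receives a “fraction” of the original observation weight, $w_i w^*_{ij}$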
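A simplified sketch of fractional hot-deck imputation. It assumes donors are drawn uniformly at random, without replacement, from all respondents, and that each of the M donor values gets the equal fractional weight 1/M; Kim & Fuller (2004) describe more refined donor selection and weighting that this does not reproduce. The names `fractional_hot_deck` and `weighted_mean` are hypothetical.

```python
import random


def fractional_hot_deck(y, w, n_donors, seed=None):
    """Hypothetical helper: return (value, weight) rows after fractional imputation.

    y: values, with None marking a missing item; w: sampling weights.
    Each missing y_i becomes n_donors rows carrying weight w_i / n_donors,
    so the fractional weights for each recipient sum to 1.
    Assumes uniform donor selection from all respondents (a simplification).
    """
    rng = random.Random(seed)
    donors = [v for v in y if v is not None]
    rows = []
    for y_i, w_i in zip(y, w):
        if y_i is not None:
            rows.append((y_i, w_i))
        else:
            for d in rng.sample(donors, n_donors):
                rows.append((d, w_i / n_donors))
    return rows


def weighted_mean(rows):
    """Weighted mean over (value, weight) rows."""
    total_w = sum(wt for _, wt in rows)
    return sum(v * wt for v, wt in rows) / total_w
```

Because each recipient's weight is split rather than copied, the total weight of the imputed file equals the total of the original sampling weights, and averaging over several donors reduces the imputation variance compared with using a single donor.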

Multiple Imputation Applications

- SAS has recently developed a procedure for multiple imputation (first available in Version 8.1)
- The workflow requires the use of both:
  - PROC MI
  - PROC MIANALYZE

MI Applications

- Multiple imputation inference involves three distinct phases

1. The missing data are filled in m times to generate m complete data sets (PROC MI).

2. The m complete data sets are analyzed using standard statistical procedures (PROC REG, PROC GLM, etc.).

3. The results from the m complete data sets are combined to produce inferential results (PROC MIANALYZE).
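The same three phases can be mirrored outside SAS. The sketch below is a Python illustration, not the behavior of PROC MI or PROC MIANALYZE; it assumes a single variable, a normal imputation model with a noninformative prior, and the sample mean as the complete-data analysis. All function names are hypothetical.

```python
import math
import random
import statistics


def impute_once(y, rng):
    """Phase 1 (one of m): fill None values with draws from a normal model (assumed)."""
    obs = [v for v in y if v is not None]
    n1 = len(obs)
    ybar, s2 = statistics.fmean(obs), statistics.variance(obs)
    sigma2 = (n1 - 1) * s2 / rng.gammavariate((n1 - 1) / 2, 2.0)  # posterior draw of sigma^2
    mu = rng.gauss(ybar, math.sqrt(sigma2 / n1))                   # posterior draw of mu
    return [v if v is not None else rng.gauss(mu, math.sqrt(sigma2)) for v in y]


def analyze(y_complete):
    """Phase 2: a standard complete-data analysis (sample mean and its variance)."""
    return statistics.fmean(y_complete), statistics.variance(y_complete) / len(y_complete)


def combine(results):
    """Phase 3: Rubin's rules for the combined estimate and total variance."""
    m = len(results)
    q = [est for est, _ in results]
    u = [var for _, var in results]
    return statistics.fmean(q), statistics.fmean(u) + (1 + 1 / m) * statistics.variance(q)


def multiple_imputation(y, m=5, seed=None):
    rng = random.Random(seed)
    completed = [impute_once(y, rng) for _ in range(m)]   # m complete data sets
    return combine([analyze(d) for d in completed])
```

For toy input such as `multiple_imputation([1.0, 2.0, None, 4.0, 5.0, None, 3.0], m=5, seed=11)`, the function returns the combined mean and its total variance. Keeping the three phases as separate functions mirrors the division of labor between imputation, analysis, and combining procedures.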

Three Imputation Mechanisms:

(Choice depends on the type of missing data pattern)

- Regression Method - A regression model is fitted for each variable with missing values, with previous variables as covariates. (Monotone missing)
- Propensity Score Method - Observations are grouped based on propensity scores, and an approximate Bayesian bootstrap imputation is applied to each group. (Monotone missing)
- MCMC (Markov chain Monte Carlo) Method - Constructs a Markov chain long enough for the distribution of the elements to stabilize to a stationary distribution (arbitrary missing patterns, under MAR)
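A sketch of the regression method for the simplest monotone pattern: one fully observed covariate x and one partially observed variable y. It assumes a normal linear model, a noninformative prior, and that y is missing at random given x. It follows the general proper-imputation scheme of Rubin (1987) (draw σ², then β, then the missing values) and is not SAS's implementation. `regression_impute` is a hypothetical name; NumPy is assumed to be available.

```python
import numpy as np


def regression_impute(x, y, rng):
    """Hypothetical helper: one proper regression imputation of missing y (NaN).

    Assumptions: x fully observed (monotone pattern), y = b0 + b1*x + e with
    e ~ N(0, sigma^2), noninformative prior, y missing at random given x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])
    X_obs, y_obs = X[obs], y[obs]
    n1, k = X_obs.shape

    # Least-squares fit on the observed cases.
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = xtx_inv @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat
    sigma2_hat = resid @ resid / (n1 - k)

    # Draw sigma^2, then beta ~ N(beta_hat, sigma2_star * (X'X)^-1).
    sigma2_star = sigma2_hat * (n1 - k) / rng.chisquare(n1 - k)
    chol = np.linalg.cholesky(xtx_inv)  # L with L @ L.T == (X'X)^-1
    beta_star = beta_hat + np.sqrt(sigma2_star) * (chol @ rng.standard_normal(k))

    # Draw each missing y from the predictive distribution given the drawn parameters.
    y_imp = y.copy()
    mis = ~obs
    y_imp[mis] = X[mis] @ beta_star + np.sqrt(sigma2_star) * rng.standard_normal(int(mis.sum()))
    return y_imp
```

Calling this m times with the same `np.random.default_rng(...)` generator produces m imputed versions of y. With more variables in a monotone pattern, the same step would be applied variable by variable, each regressed on the variables before it.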

Multiple Imputation Applications

- See handout of SAS code and output
- Examples of the MI procedure can be shown using a data set that contains measurements on men running during a P.E. course at N.C. State University
- Three variables of interest:
  - Oxygen intake per minute (ml/kg body weight)
  - Runtime (time in minutes to run 1.5 miles)
  - RunPulse (heart rate while running)

Conclusions:

- Multiple imputation is a method of replacing missing values which has some theoretical advantages over other methods
- Software is becoming more common to handle multiple imputation and the code is relatively simple

Software

Commercial:

- SAS PROC MI
- SOLAS for Missing Data Analysis (http://www.statsolusa.com/)

Free:

- MIX - Software for multiple imputation

http://www.stat.psu.edu/~jls/misoftwa.html

References

- Binder, D.A., and Sun, W. (1996). Frequency valid multiple imputation for surveys with a complex design. Proceedings of the Section on Survey Research Methods, ASA, 281-286.
- Fay, R.E. (1996). Alternative paradigms for the analysis of imputed survey data. JASA, 91, 490-498.
- Kalton, G., and Kish, L. (1984). Some efficient random imputation methods. Communications in Statistics, A13, 1919-1939.
- Kim, J., and Fuller, W.A. (2004). Fractional hot deck imputation. Biometrika, 91, 559-578.
- Kott, P.S. (1995). A paradox of multiple imputation. Proceedings, 384-389.
- Lavori, P.W., Dawson, R., and Shera, D. (1995). A multiple imputation strategy for clinical trials with truncation of patient data. Statistics in Medicine, 14, 1913-1925.
- Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: John Wiley & Sons, Inc.
- SAS Manual Version 8.1, Chapter 11
- Shao, J. (2002). Resampling methods for variance estimation in complex surveys with a complex design. In Survey Nonresponse. Edited by Groves, R.M., et al. New York: John Wiley & Sons, Inc., 303-314.
