Roberta Harnett, MAR 550, October 30, 2007

Statistical Methods for Missing Data



Roberta Harnett MAR 550 October 30, 2007. Statistical Methods for Missing Data. Outline. When do we see missing data? Types of missing data Traditional approaches Deletion Substitution Modern Approaches Maximum likelihood and Bayes Software. Missing Data.



Roberta Harnett

MAR 550

October 30, 2007

Statistical Methods for Missing Data


Outline

When do we see missing data?

Types of missing data

Traditional approaches

Deletion

Substitution

Modern Approaches

Maximum likelihood and Bayes

Software


Missing Data

Medical studies, nonresponse in surveys or censuses, dropouts in clinical trials, censored data

Loss of information, power

Bias in results due to differences in missing and observed data

Complicated analysis with standard software


Types of missing data

MCAR

MAR

MNAR


MCAR

Missing Completely at Random

Probability that x_i is missing does not depend on its own value or on the values of other variables

Missingness on x_i may still be associated with missingness on other variables


MAR

Missing at Random

Missingness does not depend on the value of x_i after controlling for other observed variables

This is not great, but we can deal with it


MNAR

Missing Not at Random

Not MCAR or MAR (anything else)

BAD!!

Model missingness
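
The three mechanisms above can be illustrated with a small simulation. This is only a sketch under assumed settings: the variables `x` and `y`, the logistic missingness probabilities, and the function name `complete_case_means` are all hypothetical and do not come from the presentation. It compares the mean of `y` among observed cases with the mean of the full data.

```python
import math
import random
import statistics


def complete_case_means(n=20000, seed=0):
    """Mean of y among observed cases under three hypothetical mechanisms."""
    rng = random.Random(seed)

    def logistic(z):
        return 1.0 / (1.0 + math.exp(-z))

    all_y = []
    observed = {"MCAR": [], "MAR": [], "MNAR": []}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)        # always observed
        y = x + rng.gauss(0.0, 1.0)    # subject to missingness
        all_y.append(y)
        # MCAR: a fixed 30% chance of being missing, unrelated to x or y.
        if rng.random() > 0.3:
            observed["MCAR"].append(y)
        # MAR: the chance of being missing depends only on the observed x.
        if rng.random() > logistic(2.0 * x):
            observed["MAR"].append(y)
        # MNAR: the chance of being missing depends on the value of y itself.
        if rng.random() > logistic(2.0 * y):
            observed["MNAR"].append(y)
    full_mean = statistics.mean(all_y)
    return full_mean, {k: statistics.mean(v) for k, v in observed.items()}


full_mean, means = complete_case_means()
```

In this setup the complete-case mean stays close to the full-data mean only under MCAR. Under MAR the bias could in principle be removed by conditioning on `x`; under MNAR that is not possible without a model for the missingness itself.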


Traditional Approaches

Deletion

List-wise

Unbiased if data are MCAR, but loses power

Alternatives are really replacements for list-wise

Pair-wise (also called “unwise”) deletion

Leads to different sample sizes for different parts of analysis

Can be a disaster
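
The difference between the two deletion strategies can be seen on a toy table. The sketch below uses made-up data (the column names `a`, `b`, `c` and all values are hypothetical) and counts how many rows each strategy keeps.

```python
from itertools import combinations

# Hypothetical data set; None marks a missing value.
rows = [
    {"a": 1.0, "b": 2.0, "c": 3.0},
    {"a": 2.0, "b": None, "c": 5.0},
    {"a": 3.0, "b": 4.0, "c": None},
    {"a": None, "b": 6.0, "c": 7.0},
    {"a": 5.0, "b": 7.0, "c": 9.0},
    {"a": 6.0, "b": 8.0, "c": None},
]
cols = ["a", "b", "c"]


def listwise(rows, cols):
    """Keep only rows with no missing value in any column."""
    return [r for r in rows if all(r[c] is not None for c in cols)]


def pairwise_counts(rows, cols):
    """Number of usable rows for each pair of columns."""
    return {
        (a, b): sum(1 for r in rows if r[a] is not None and r[b] is not None)
        for a, b in combinations(cols, 2)
    }


complete_rows = listwise(rows, cols)   # 2 rows survive
pair_n = pairwise_counts(rows, cols)   # (a, b): 4, (a, c): 3, (b, c): 3
```

List-wise deletion keeps one consistent sample of 2 rows, while pair-wise deletion uses 4, 3, and 3 rows for the three pairs, so each entry of a correlation matrix would rest on a different subsample.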


Traditional cont…

Single Imputation

Hot deck

Census Bureau

vs. Cold deck

Mean substitution

Regression substitution

Stochastic regression substitution
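
The three substitution methods listed above can be compared in a short sketch. Everything here is an assumption made for illustration: a single predictor `x`, a simulated outcome `y` with 30% of values missing completely at random, and the helper names `fit_line` and `impute`.

```python
import random
import statistics


def fit_line(pairs):
    """Least-squares line y = a + b*x and residual SD from complete pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in pairs)
    return a, b, (rss / (len(pairs) - 2)) ** 0.5


def impute(xs, ys, method, rng):
    """Return a completed copy of ys; None marks a missing value."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    y_mean = statistics.mean([y for _, y in pairs])
    a, b, resid_sd = fit_line(pairs)
    out = []
    for x, y in zip(xs, ys):
        if y is not None:
            out.append(y)
        elif method == "mean":
            out.append(y_mean)                      # mean substitution
        elif method == "regression":
            out.append(a + b * x)                   # predicted value only
        elif method == "stochastic":
            out.append(a + b * x + rng.gauss(0.0, resid_sd))  # plus noise
        else:
            raise ValueError(method)
    return out


rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
true_ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
ys = [y if rng.random() > 0.3 else None for y in true_ys]  # MCAR, 30% missing

variances = {
    m: statistics.variance(impute(xs, ys, m, rng))
    for m in ("mean", "regression", "stochastic")
}
```

In this simulation mean substitution shrinks the variance of `y` the most, regression substitution shrinks it less, and adding a random residual brings it closest to the variance of the data before values were removed.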


Modern Methods

  • Maximum Likelihood

    • EM algorithm

      • Estimate parameters

        • Listwise deletion, add some error

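      • (E): Predict missing data from the current parameter estimates

      • (M): Re-estimate parameters by maximizing the likelihood of the completed data. Repeat until the estimates converge.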

    • NORM (http://www.stat.psu.edu/~jls/misoftwa.html)
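
The EM steps above can be sketched for one simple case. This is not the algorithm as implemented in NORM; it is a minimal illustration assuming a bivariate normal model in which `x` is fully observed and `y` is missing for some rows. The function name `em_bivariate_normal` and the simulated data are hypothetical. Starting values come from the complete cases, as in list-wise deletion.

```python
import random


def em_bivariate_normal(xs, ys, tol=1e-10, max_iter=1000):
    """EM estimates of mean(y), cov(x, y), var(y); None marks a missing y."""
    n = len(xs)
    mu_x = sum(xs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n

    # Starting values from the complete cases only.
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    k = len(obs)
    obs_mx = sum(x for x, _ in obs) / k
    mu_y = sum(y for _, y in obs) / k
    s_xy = sum((x - obs_mx) * (y - mu_y) for x, y in obs) / k
    s_yy = sum((y - mu_y) ** 2 for _, y in obs) / k

    for _ in range(max_iter):
        # E-step: expected y and y^2 for missing rows, given x and current estimates.
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        sum_y = sum_yy = sum_xy = 0.0
        for x, y in zip(xs, ys):
            if y is None:
                ey = mu_y + beta * (x - mu_x)
                ey2 = ey * ey + resid_var
            else:
                ey, ey2 = y, y * y
            sum_y += ey
            sum_yy += ey2
            sum_xy += x * ey
        # M-step: maximum likelihood estimates from the expected statistics.
        new_mu_y = sum_y / n
        new_s_xy = sum_xy / n - mu_x * new_mu_y
        new_s_yy = sum_yy / n - new_mu_y ** 2
        delta = max(abs(new_mu_y - mu_y), abs(new_s_xy - s_xy), abs(new_s_yy - s_yy))
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if delta < tol:
            break
    return mu_y, s_xy, s_yy


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
full_ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
ys = [y if x <= 0.5 else None for x, y in zip(xs, full_ys)]  # MAR: depends on x only

observed_ys = [y for y in ys if y is not None]
complete_case_mean = sum(observed_ys) / len(observed_ys)
mu_y, s_xy, s_yy = em_bivariate_normal(xs, ys)
```

In this simulated example the true mean of `y` is 1. The complete-case mean is pulled well below it because `y` is missing whenever `x` is large, while the EM estimate recovers a value close to 1.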


Modern Methods

Multiple Imputation

Simple and general – works for any type of analysis

Validity of method depends on how imputation is carried out

Should reasonably predict missing data, but should also reflect uncertainty in predictions

Using a “sensible” imputation model


“Random Imputation”

  • Predict missing values, then add error component drawn randomly from residual distribution of the variable

  • Repeat several times to improve error estimates


Multiple Imputation

Use Bayesian arguments to impute data:

Parametric model for data

Ignorable missing data

Non-ignorable missing data

Apply prior for unknown model parameters

Simulate m independent draws from the distribution of Y_mis given Y_obs

Calculate values explicitly or through MCMC


MI procedure

Simulate a random draw of unknown parameters from observed-data posterior

Simulate a random draw of missing values from conditional predictive distribution

Repeat, obtaining new parameter estimates from the “complete” data set, until the draws stabilize

Do 3-5 times total (Rubin)

MCMC: data augmentation algorithm of Tanner and Wong (1987)
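
The first two steps of the procedure can be sketched for the simplest possible case, where no MCMC is needed because both draws can be made directly. The assumptions here are not from the presentation: a single normally distributed variable, values missing completely at random, and a standard noninformative prior on the mean and variance. The function name `draw_imputations` and the data are hypothetical.

```python
import random
import statistics


def draw_imputations(y_obs, n_missing, m=5, seed=0):
    """Return m completed data sets for a univariate normal sample."""
    rng = random.Random(seed)
    n = len(y_obs)
    y_bar = statistics.mean(y_obs)
    s2 = statistics.variance(y_obs)
    completed = []
    for _ in range(m):
        # Step 1: draw the parameters from their observed-data posterior.
        # sigma^2 ~ (n - 1) * s2 / chi-square(n - 1); chi-square via a gamma draw.
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        mu = rng.gauss(y_bar, (sigma2 / n) ** 0.5)
        # Step 2: draw the missing values given the drawn parameters.
        y_mis = [rng.gauss(mu, sigma2 ** 0.5) for _ in range(n_missing)]
        completed.append(list(y_obs) + y_mis)
    return completed


y_obs = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]   # made-up observed values
completed = draw_imputations(y_obs, n_missing=3, m=5)
```

Because the parameters are redrawn for every imputation, the imputed values differ between data sets by more than residual noise alone, which is what lets the later analysis reflect uncertainty about the parameters.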


Parameter Estimates

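  • Calculate the parameter estimate $Q_l$ from each of the m imputed data sets ($l = 1, \dots, m$)

  • The estimate of Q is just the average of the m values: $\bar{Q} = \frac{1}{m} \sum_{l=1}^{m} Q_l$

  • The variance of $\bar{Q}$ is $T = (1 + m^{-1}) B + U$

    • where U is the mean within-imputation variance and B is

      $B = \frac{1}{m-1} \sum_{l=1}^{m} (Q_l - \bar{Q})^2$

      the between-imputation variability.

    • As m → ∞, T → B + U, and B no longer needs to be corrected for a low number of imputations.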
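
The combining rules above are short enough to write out directly. The sketch below assumes each imputed data set has already been analyzed to give an estimate and its variance; the function name `pool` and the example numbers are hypothetical. B is computed with the m − 1 divisor that is standard for these rules.

```python
import statistics


def pool(estimates, variances):
    """Combine m estimates and their variances into one estimate and total variance."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)     # pooled estimate
    u_bar = statistics.mean(variances)     # mean within-imputation variance, U
    b = statistics.variance(estimates)     # between-imputation variance, B (divisor m - 1)
    t = (1.0 + 1.0 / m) * b + u_bar        # total variance, T
    return q_bar, t


q_bar, t = pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
# Here U = 0.5 and B = 1.0, so T = (1 + 1/3) * 1.0 + 0.5.
```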


MI

Imputation is computationally distinct from analysis

Problem if assumptions of imputation are not compatible with analysis assumptions

Loss of power if imputation makes fewer assumptions than analysis

“Superefficient” if imputation is based on more (valid) assumptions than analysis


MI

Inconsistent if imputation makes invalid assumptions that are not included in analysis

Ex: interaction terms

Imputation needs to preserve features of data that will be included in analysis


ABB

Approximate Bayesian Bootstrap (Rubin, 1987)

Fancier version of Hot deck imputation
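
A minimal sketch of the approximate Bayesian bootstrap for a single variable follows. It assumes the usual two-stage description: first resample the observed values with replacement to form a donor pool, then draw the imputations with replacement from that pool. The function name `abb_impute` and the data are hypothetical.

```python
import random


def abb_impute(y_obs, n_missing, rng):
    """One set of imputations via the approximate Bayesian bootstrap."""
    # Stage 1: bootstrap the observed values to create a donor pool.
    donor_pool = rng.choices(y_obs, k=len(y_obs))
    # Stage 2: draw the imputed values with replacement from that pool.
    return rng.choices(donor_pool, k=n_missing)


rng = random.Random(0)
y_obs = [2.0, 3.5, 4.0, 5.5, 7.0]   # made-up observed values
imputations = [abb_impute(y_obs, n_missing=3, rng=rng) for _ in range(5)]
```

A plain hot deck would draw donors directly from `y_obs`. The extra resampling stage adds variability between imputations, so repeated imputations also reflect uncertainty about the distribution of the observed values.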


Comparison of Methods

Removing entries with missing data vs. MI

Imputing once vs. MI

Number of imputations

Efficiency is $(1 + \lambda/m)^{-1}$, where λ is the fraction of missing information

MI vs. EM
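
The efficiency formula for the number of imputations is easy to tabulate. In the sketch below, `lam` stands for the fraction of missing information and `m` for the number of imputations; the function name and the chosen values are for illustration only.

```python
def relative_efficiency(lam, m):
    """Efficiency of m imputations relative to infinitely many: (1 + lam/m)^-1."""
    return 1.0 / (1.0 + lam / m)


table = {
    (lam, m): round(relative_efficiency(lam, m), 3)
    for lam in (0.1, 0.3, 0.5)
    for m in (3, 5, 10, 20)
}
# With half of the information missing, 5 imputations give 1 / 1.1, about 0.909.
```

The gain from going beyond a handful of imputations is small unless the fraction of missing information is large, which is consistent with the 3-5 imputations mentioned earlier.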


Nonignorable nonresponse

Ignorable if data are MAR

MI can be used when there is nonignorable nonresponse

Missing-data mechanism


Programs

  • For S-PLUS: www.stat.psu.edu/~jls/misoftwa.html

  • For R:

    • Amelia (II) (surveys and time-series data)

    • Norm (for multivariate normal data)

  • SOLAS (tested by Allison, 2000)

    • For Windows


References

  • Little, R.J.A. and Rubin, D.B. (1987) Statistical Analysis with Missing Data. J. Wiley & Sons, New York.

  • Schafer, J.L. (1999) Multiple imputation: a primer. Statistical Methods in Medical Research, 8, 3-15.

  • Barnard, J. and Meng, X. (1999) Applications of multiple imputation in medical studies: from AIDS to NHANES. Statistical Methods in Medical Research, 8, 17-36.

  • http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/Missing.html

  • http://www.stat.psu.edu/~jls/mifaq.html#em

  • Allison, P.D. (2000) Multiple Imputation for Missing Data: A Cautionary Tale. Sociological Methods and Research, 28 (3), 301-309.


MI Example (Tu et al., 1993)

AIDS survival time with reporting-delay

(1) Survival-time model

(2) Reporting-lag model using available information

(3) Multiply impute delayed cases using model from step 2

(4) Compute estimates of survival-time model parameters

(5) Combine estimates using repeated-imputation rules


Milwaukee Parental Choice Program (MPCP)

Effects of school choice on achievement tests (public vs. private schools)

School vouchers to attend “choice” schools, participating private schools

Only households with income less than 1.75 times the poverty line could participate


Milwaukee Parental Choice Program (MPCP)

Randomized block design

Outcome variables were scores from the Iowa Test of Basic Skills (ITBS)

Maximum of 4 years observed (1990-1994)

Higher levels of missingness than in typical medical study

Pattern in missing data was not monotone