
Missing Data: Problems & Prospects

Missing Data: Problems & Prospects. Daniel A. Newman, University of Illinois. Overview: Missing Data Levels (item-level, scale-level, and person-level); Missing Data Problems (bias/poor external validity, low power); Missing Data Mechanisms (MCAR, MAR, MNAR); Missing Data Techniques.



Presentation Transcript


  1. Missing Data: Problems & Prospects • Daniel A. Newman • University of Illinois

  2. Overview • Missing Data Levels • Item-level, Scale-level, and Person-level • Missing Data Problems • Bias/Poor External Validity, Low Power • Missing Data Mechanisms • MCAR, MAR, MNAR • Missing Data Techniques • Listwise & Pairwise Deletion, Single Imputation • Maximum Likelihood and Multiple Imputation • Sensitivity Analysis

  3. Missing Data Levels • Item-Level Missingness • Answering only j out of J possible items on a scale (i.e., leaving a few items blank) • Scale-Level Missingness • Answering zero items from a scale (i.e., omitting an entire scale or an entire construct) • Person-Level Missingness • Failure to return the survey • In the aggregate, this is called response rate

  4. Missing Data Levels • Complete Data

  5. Missing Data Levels • Incomplete Data

  6. Missing Data Levels • Item-Level Missing Data • Scale-Level Missing Data • Person-Level Missing Data

  10. Missing Data Levels • Missing data levels are nested • Item-level missingness can aggregate into scale-level missingness • Scale-level missingness can aggregate into person-level missingness • Choice of appropriate missing data technique can depend upon level of missingness • Person-level missingness can be far more problematic, because you have no information about the nonrespondent
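The nesting of the three levels can be sketched in a few lines of Python. This is only an illustration under assumed names: the two scales, their item labels, and the helper functions are hypothetical and do not come from the slides. A blank item is coded as `None`, and a person who returned nothing is an empty record.

```python
from typing import Dict, List, Optional

# Hypothetical survey layout (not from the slides): two scales, three items each.
SCALES: Dict[str, List[str]] = {
    "satisfaction": ["sat1", "sat2", "sat3"],
    "commitment": ["com1", "com2", "com3"],
}

Response = Dict[str, Optional[int]]  # None marks an unanswered item


def scale_status(response: Response, items: List[str]) -> str:
    """Classify one scale for one person by how many of its J items were answered."""
    answered = sum(response.get(item) is not None for item in items)
    if answered == len(items):
        return "complete"
    if answered == 0:
        return "scale-level missing"
    return "item-level missing"


def is_person_level_missing(response: Response) -> bool:
    """Scale-level missingness on every scale aggregates into person-level missingness."""
    return all(
        scale_status(response, items) == "scale-level missing"
        for items in SCALES.values()
    )


def response_rate(responses: List[Response]) -> float:
    """Proportion of sampled persons who returned any data."""
    returned = sum(not is_person_level_missing(r) for r in responses)
    return returned / len(responses)


sample: List[Response] = [
    {"sat1": 4, "sat2": 5, "sat3": 3, "com1": 2, "com2": 2, "com3": 1},  # complete
    {"sat1": 4, "sat2": None, "sat3": 3, "com1": 2, "com2": 2, "com3": 1},  # one item blank
    {"sat1": 4, "sat2": 5, "sat3": 3},  # commitment scale skipped entirely
    {},  # survey never returned
]

for person in sample:
    statuses = {name: scale_status(person, items) for name, items in SCALES.items()}
    print(statuses, "| person-level missing:", is_person_level_missing(person))
print("response rate:", response_rate(sample))
```

The last record carries no information at all, which is why person-level missingness is the hardest case: there is nothing observed from which to adjust.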

  11. Practical Advice (Newman, 2009)

  12. Missing Data Problems • Missing data reduce the sample size (low N) • More Sampling Error • Lower Statistical Power • Systematically missing data can lead to systematic over- or under-estimation of effect sizes • Bias in Parameter Estimates (mean, SD, corr.)

  13. Parameter Estimates • Sample Estimate • Population Parameter

  14. Purpose of Data Analysis • Data → Parameter Estimates • Data → Hypothesis Tests (p-values, standard errors)

  15. Missing Data Problems • Missing Data → Parameter Estimates: Bias • Missing Data → Hypothesis Tests (p-values, standard errors): Low Power

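  16. Sampling Distribution • [Figure: sampling distribution of r, with its standard error marked]

  17. Sampling Distribution • [Figure: biased vs. unbiased sampling distributions of r]

  18. Sampling Distribution • [Figure: sampling distributions of r with a smaller and a larger standard error]

  19. Sampling Distribution • [Figure: sampling distributions of r with a smaller and a larger standard error, the .05 critical value (crit.05), and the Type II error region]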

  20. Missing Data Problems Two Major Missing Data Problems: • Bias in Effect Size estimates • Errors of Statistical Inference (p < .05?) • Low Power • Systematically missing data can create Inaccurate Standard Errors (and p-values)
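The link between a smaller N, a larger standard error, and lower power can be made concrete with a rough calculation. The sketch below uses the Fisher z approximation for a correlation, which is a standard textbook approximation rather than anything stated in the slides; the population correlation of .20 and the sample sizes of 200 and 100 are hypothetical values chosen for illustration.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(rho: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-tailed power to detect a population correlation rho with n cases.

    Uses the Fisher z approximation, in which the transformed r has a
    standard error of 1 / sqrt(n - 3).
    """
    normal = NormalDist()
    std_error = 1.0 / sqrt(n - 3)
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = atanh(rho) / std_error  # expected value of the test statistic
    upper_tail = 1 - normal.cdf(z_crit - shift)
    lower_tail = normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Hypothetical scenario: 200 people sampled, but half are lost to missing data.
power_full = correlation_power(rho=0.20, n=200)
power_half = correlation_power(rho=0.20, n=100)
print(f"power with N = 200: {power_full:.2f}")
print(f"power with N = 100: {power_half:.2f}")
```

Under these assumptions, losing half the sample leaves the effect size untouched but sharply raises the chance of a Type II error. This covers only the low-power problem; it says nothing about bias, which depends on the missingness mechanism.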

  21. Missing Data Mechanisms • Missing Data can be missing: • Randomly • Systematically • But what does “Systematic” mean?

  22. Missing Data Mechanisms 1) Random • Missing Completely at Random (MCAR) 2) Systematic (Rubin, 1976) • “Missing at Random” (MAR) • Missing Not at Random (MNAR)

  23. Missing Data Mechanisms • MCAR – p(missing) is unrelated to all variables, observed and unobserved: p(missing|complete data) = p(missing) • MAR – p(missing) is related to observed variables [observed data] only: p(missing|complete data) = p(missing|observed data) • MNAR – p(missing) is related to the unobserved/missing variables [missing data]: p(missing|complete data) ≠ p(missing|observed data) • (see Schafer & Graham, 2002)

  24. Missing Data Mechanisms

  25. Missing Data Mechanisms • MCAR – Rmiss_Y is not related to X or Y • MAR – Rmiss_Y is related to X, but is not related to Y after controlling for X • MNAR – Rmiss_Y is related to Y • [Diagram: three path models relating X, Y, and Rmiss_Y, labeled MCAR, MAR, and MNAR]
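A small simulation may help make the three mechanisms tangible. Everything in this sketch is an assumption made for illustration: the population correlation of .50 between X and Y, the logistic form of the missingness model, its coefficients, and the function names. Only Y is allowed to be missing, and the simulation can check the mechanism only because it keeps the Y values that a real study would never see.

```python
import random
from math import exp, sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def partial(r_ab, r_ac, r_bc):
    """Correlation of a and b after controlling for c."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def simulate(mechanism, n=20000, seed=1):
    """Generate X, Y, and a missingness indicator Rmiss_Y under one mechanism."""
    rng = random.Random(seed)
    xs, ys, rmiss = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 0.5 * x + sqrt(0.75) * rng.gauss(0, 1)  # assumed corr(X, Y) = .50
        # What drives missingness on Y: nothing (MCAR), X (MAR), or Y itself (MNAR).
        driver = {"MCAR": 0.0, "MAR": x, "MNAR": y}[mechanism]
        p_missing = 1 / (1 + exp(-(-1.0 + 1.5 * driver)))  # assumed logistic model
        xs.append(x)
        ys.append(y)
        rmiss.append(1 if rng.random() < p_missing else 0)
    r_rx = pearson(rmiss, xs)
    r_ry = pearson(rmiss, ys)
    r_xy = pearson(xs, ys)
    return {
        "rmiss_x": r_rx,
        "rmiss_y": r_ry,
        "rmiss_y_given_x": partial(r_ry, r_rx, r_xy),
    }


results = {m: simulate(m) for m in ("MCAR", "MAR", "MNAR")}
for name, stats in results.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

In this setup the MAR indicator correlates with X but is essentially unrelated to Y once X is controlled, while the MNAR indicator stays related to Y even after controlling for X.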

  26. Missing Data Mechanisms • Some missing data techniques (e.g., listwise deletion) assume missing data are MCAR • Some missing data techniques (e.g., maximum likelihood, multiple imputation) assume MAR • It is impossible to test whether missing data are MAR vs. MNAR, because we would need to compare observed values of Y against unobserved values of Y, and unobserved values of Y are unknown

  27. Missing Data Mechanisms Practically speaking … • In the real world, MCAR almost never happens • One exception: “planned missingness” (Graham et al., 2006) • Most missingness falls on a continuum between MAR and MNAR

  28. Missing Data Mechanisms • [Diagram: three path models relating X, Y, and Rmiss_Y, labeled MCAR, MAR, and MNAR]

  30. Missing Data Mechanisms Practically speaking … • Even though the MAR assumption may not be strictly met in practice, missing data techniques based on this assumption (e.g., Max. Likelihood, Mult. Imputation) can still provide less-biased, more powerful estimates

  31. Missing Data Mechanisms Practically speaking … • An MNAR mechanism can begin to approximate an MAR mechanism if the researcher incorporates more observed variables (i.e., “auxiliary variables”) • (Collins et al., 2001; Graham, 2003)

  32. Missing Data Mechanisms • [Diagram: three path models relating X, Y, and Rmiss_Y, labeled MCAR, MAR, and MNAR]

  33. Missing Data Techniques 1) Listwise Deletion 2) Pairwise Deletion 3) Ad Hoc Single Imputation 4) Multiple Imputation 5) Maximum Likelihood • (EM algorithm, FIML) 6) Sensitivity Analysis

  35. Missing Data Techniques Listwise Deletion – deleting all cases (persons) for whom any data are missing, then proceeding with the analysis • This procedure converts item-level and scale-level missingness into person-level missingness!

  36. Missing Data Techniques • Incomplete Data

  37. Missing Data Techniques • Incomplete Data Listwise Deletion:

  38. Missing Data Techniques Listwise Deletion • Unbiased under MCAR • But biased under systematic missingness (MAR & MNAR) • [Mean is biased, SD is biased, Correlation is biased] • Amount of bias depends on amount of missing data and strength of missingness mechanism (from completely random to strongly systematic) • Lowest power • Smallest N
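The bias of listwise deletion under systematic missingness can be illustrated with simulated data. This is a sketch under stated assumptions, not the analysis behind the slides: the population values (mean 0, SD 1, correlation .50), the sample size, and the deliberately sharp MAR rule (Y is missing whenever the observed X exceeds 0.5) are all hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def listwise_delete(rows):
    """Keep only cases (persons) with no missing value on any variable."""
    return [row for row in rows if all(value is not None for value in row.values())]


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def summarize(rows):
    """N, mean of Y, SD of Y, and the X-Y correlation for fully observed rows."""
    xs = [row["x"] for row in rows]
    ys = [row["y"] for row in rows]
    return {"n": len(rows), "mean_y": mean(ys), "sd_y": stdev(ys), "r_xy": pearson(xs, ys)}


rng = random.Random(7)
complete = []
for _ in range(20000):
    x = rng.gauss(0, 1)
    y = 0.5 * x + sqrt(0.75) * rng.gauss(0, 1)  # assumed corr(X, Y) = .50
    complete.append({"x": x, "y": y})

# Assumed MAR mechanism: Y is missing whenever the observed X is above 0.5.
incomplete = [
    {"x": row["x"], "y": None if row["x"] > 0.5 else row["y"]} for row in complete
]

full = summarize(complete)
cc = summarize(listwise_delete(incomplete))  # complete cases only
for label, stats in (("complete data", full), ("after listwise deletion", cc)):
    print(label, {key: round(value, 2) for key, value in stats.items()})
```

With this mechanism the complete-case mean of Y, its SD, and the X-Y correlation all come out too low, and N shrinks as well. Replacing the rule with a purely random one (MCAR) would remove the bias but not the loss of cases.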

  39. Missing Data Techniques Pairwise Deletion – calculating summary estimates (e.g., means, SDs, correlations) using all available cases (persons) who provided data relevant to each estimate, then proceeding with analysis based on these summary estimates • Different correlations are based on different (partly overlapping) subsamples!

  40. Missing Data Techniques • Incomplete Data Pairwise Deletion: • Mean & SD of X1 based on • Mean & SD of Y based on

  41. Missing Data Techniques • Incomplete Data Pairwise Deletion: • Correlation of X2 & X3 based on • Correlation of X3 & Y based on

  42. Missing Data Techniques Pairwise Deletion • Unbiased under MCAR • But still biased under MAR & MNAR • Usually less biased than listwise deletion • Sometimes covariance matrix is not positive definite • Different correlations represent different population mixtures

  43. Missing Data Techniques Pairwise Deletion • More power than listwise • Different Ns for different correlations, so no single N makes sense for the whole corr. matrix • minimum N => SEs too big • mean N => some SEs too big, some SEs too small • harmonic mean N => same problem as mean N
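The "no single N" problem is easy to see on a toy data set. The eight cases and three variables below are invented for illustration (they are not the data pictured in the slides), with `None` marking a missing value; each correlation is computed from whichever cases have both variables observed.

```python
from itertools import combinations
from math import sqrt
from statistics import harmonic_mean, mean

# Hypothetical data for eight persons; None marks a missing value.
data = {
    "x1": [1, 2, 3, 4, 5, 6, None, None],
    "x2": [2, 1, 4, None, 6, 5, 7, 8],
    "y": [1, 3, None, 4, 5, None, 6, 8],
}


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def pairwise_complete(a, b):
    """Return the cases that have observed values on both variables."""
    return [(u, v) for u, v in zip(a, b) if u is not None and v is not None]


pairwise_n = {}
pairwise_r = {}
for left, right in combinations(data, 2):
    pairs = pairwise_complete(data[left], data[right])
    pairwise_n[(left, right)] = len(pairs)
    pairwise_r[(left, right)] = pearson([p[0] for p in pairs], [p[1] for p in pairs])

# Listwise deletion would keep only the cases observed on every variable.
listwise_n = sum(all(v is not None for v in row) for row in zip(*data.values()))

ns = list(pairwise_n.values())
print("pairwise Ns:", pairwise_n)
print("pairwise rs:", {pair: round(r, 2) for pair, r in pairwise_r.items()})
print("listwise N:", listwise_n)
print("minimum N:", min(ns), "| mean N:", round(mean(ns), 2),
      "| harmonic mean N:", round(harmonic_mean(ns), 2))
```

Each correlation rests on a different, partly overlapping subsample, so any one N attached to the whole matrix misstates the standard error of at least some entries.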

  44. Missing Data Techniques Ad hoc single imputation – replacing each missing datum with a “good guess” • Mean imputation (i.e., mean(across persons)) • Hot deck imputation • Regression imputation

  45. Missing Data Techniques Ad hoc single imputation • Mean imputation (i.e., mean(across persons)) – replacing each missing datum with the group mean for the corresponding variable • Hot deck imputation - replacing each missing datum with a value from a “donor” who has similar scores on other variables • Regression imputation - replacing each missing datum with a predicted value based on a multiple regression equation derived from observed cases

  46. Missing Data Techniques Ad hoc single imputation • Mean imputation (i.e., mean(across persons)) – underestimates variance and correlation • Hot deck imputation – using “donors” increases error—worse than regression imputation • Regression imputation – using predicted values underestimates variance and can bias the correlation
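The effects of mean imputation and regression imputation on the variance and the correlation can be checked by simulation. The sketch below is illustrative only: the population correlation of .50, the 40% MCAR missingness on Y, and the sample size are assumptions, and hot deck imputation is left out to keep the example short.

```python
import random
from math import sqrt
from statistics import mean, stdev


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


rng = random.Random(11)
xs, ys = [], []
for _ in range(20000):
    x = rng.gauss(0, 1)
    xs.append(x)
    ys.append(0.5 * x + sqrt(0.75) * rng.gauss(0, 1))  # assumed corr(X, Y) = .50

# Assumed MCAR mechanism: each Y value has a 40% chance of being missing.
observed = [rng.random() >= 0.4 for _ in ys]
obs_x = [x for x, keep in zip(xs, observed) if keep]
obs_y = [y for y, keep in zip(ys, observed) if keep]

# Mean imputation: every missing Y is replaced by the observed mean of Y.
mean_y = mean(obs_y)
y_mean_imp = [y if keep else mean_y for y, keep in zip(ys, observed)]

# Regression imputation: every missing Y is replaced by its predicted value
# from a regression of Y on X estimated in the observed cases.
slope = pearson(obs_x, obs_y) * stdev(obs_y) / stdev(obs_x)
intercept = mean_y - slope * mean(obs_x)
y_reg_imp = [
    y if keep else intercept + slope * x for x, y, keep in zip(xs, ys, observed)
]

results = {
    "complete data": (stdev(ys), pearson(xs, ys)),
    "mean imputation": (stdev(y_mean_imp), pearson(xs, y_mean_imp)),
    "regression imputation": (stdev(y_reg_imp), pearson(xs, y_reg_imp)),
}
for label, (sd_y, r_xy) in results.items():
    print(f"{label}: SD(Y) = {sd_y:.2f}, r(X, Y) = {r_xy:.2f}")
```

In this setup both methods shrink the SD of Y. Mean imputation also pulls the correlation toward zero, whereas regression imputation pushes it upward because the imputed points lie exactly on the regression line.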

  47. Complete Data

  48. Incomplete Data

  49. Mean Imputation
