On the presence of missing values

haile

Presentation Transcript


  1. On the presence of missing values • JENA GRADUATE ACADEMY • Dr. Friedrich Funke

  2. Learning objectives • What are missing values • How do I basically treat missing data • Why are data missing • How do I detect (the systematics of) missingness • How do I treat missing data - revisited

  3. Basic Types of Missing Values • Unit-nonresponse (drop-out, attrition etc.) • Item-nonresponse • Missing Values by design

  4. Something is Missing - Why worry? • Missing values are almost everywhere • Inefficiency (lack of power) • Bias of estimation (!!!) Missing value analysis can support our understanding of the data!

  5. Missing value management (examples) • Deletion: listwise deletion (complete cases analysis); pairwise deletion (available data analysis); both are unwise deletion • Imputation: mean imputation; conditional mean (regression); hot deck/cold deck; maximum likelihood (EM, FIML); multiple imputation

  6. Deletion • Listwise deletion: most common way of dealing with missing data (implicitly in SPSS); conservative (»At least I do nothing wrong«); can result in zero cases • Pairwise deletion: estimates each moment with all available non-missing cases; appears to use all information in the data; covariance matrices can become non-positive-definite
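The difference between the two deletion strategies can be sketched in a few lines. The toy data frame below is hypothetical and not taken from the slides; the sketch relies on the fact that pandas' `DataFrame.cov()` excludes missing values pair by pair, which corresponds to pairwise deletion.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three variables, each with some missing entries.
df = pd.DataFrame({
    "x": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "y": [2.0, np.nan, 3.0, 5.0, 4.0, 7.0],
    "z": [np.nan, 1.0, 2.0, 2.0, 3.0, np.nan],
})

# Listwise deletion: keep only the rows without any missing value.
complete_cases = df.dropna()
n_listwise = len(complete_cases)
cov_listwise = complete_cases.cov()

# Pairwise deletion: each covariance uses all rows in which both
# variables are observed (DataFrame.cov drops NaN pair by pair).
cov_pairwise = df.cov()

# Number of cases that actually enter each pairwise estimate.
observed = df.notna().astype(int)
n_pairwise = observed.T @ observed

# Smallest eigenvalue of the pairwise matrix; a negative value would
# mean the matrix is not positive definite.
min_eigenvalue = np.linalg.eigvalsh(cov_pairwise.to_numpy()).min()

print(n_listwise)
print(n_pairwise)
```

Only two of the six rows survive listwise deletion here, while every pairwise estimate rests on a different subset of cases. That mismatch is what can make a pairwise covariance matrix non-positive-definite.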

  7. Mean imputation • [Author's note: text to be redone]
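Mean imputation replaces every missing value with the mean of the observed values. The following minimal sketch uses simulated data (all numbers are assumptions for illustration) to show a well-known side effect: the mean is preserved, but the variance shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=50.0, scale=10.0, size=1000)

# Delete about 40% of the values completely at random.
missing = rng.random(y.size) < 0.4
y_obs = np.where(missing, np.nan, y)

# Mean imputation: every missing value receives the observed mean.
observed_mean = np.nanmean(y_obs)
y_imp = np.where(missing, observed_mean, y_obs)

var_observed = np.nanvar(y_obs, ddof=1)   # variance of the observed values
var_imputed = np.var(y_imp, ddof=1)       # variance after mean imputation

print(var_observed, var_imputed)
```

Because all imputed values sit exactly at the mean, they add nothing to the sum of squares while increasing the sample size, so `var_imputed` is always smaller than `var_observed`.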

  8. Regression imputation • Actually a form of conditional mean imputation • Very elegant, if you add residuals (stochastic regression imputation; residuals with mean = 0 and variance equal to the residual variance)
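A minimal sketch of both variants, assuming a single predictor `x`, simulated data, and normally distributed residuals (none of these details come from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=n)

# y is missing with a probability that depends only on the observed x.
p_missing = 1.0 / (1.0 + np.exp(-x))
missing = rng.random(n) < p_missing
obs = ~missing

# Fit the regression of y on x using the complete cases.
slope, intercept = np.polyfit(x[obs], y[obs], deg=1)
pred = intercept + slope * x
resid_sd = np.std(y[obs] - pred[obs], ddof=2)

# Deterministic regression imputation: fill in the conditional mean.
y_det = np.where(missing, pred, y)

# Stochastic regression imputation: add a residual drawn with mean 0
# and variance equal to the estimated residual variance.
noise = rng.normal(loc=0.0, scale=resid_sd, size=n)
y_sto = np.where(missing, pred + noise, y)

print(np.var(y_det[missing]), np.var(y_sto[missing]))
```

The deterministic version puts all imputed values exactly on the regression line, which overstates the correlation between `x` and `y`. Adding residuals restores the scatter around the line.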

  9. Hot deck imputation • fills in missing values on incomplete records using values from similar, but complete records of the same dataset (hot deck of punch cards)
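One simple variant is a random hot deck within classes: a record counts as "similar" if it belongs to the same class. The sketch below assumes this definition of similarity; the function `hot_deck`, the column names, and the data are hypothetical.

```python
import numpy as np
import pandas as pd

def hot_deck(frame, value_col, class_col, rng):
    """Fill missing values with a randomly drawn donor value from the same class."""
    out = frame.copy()
    for _, idx in frame.groupby(class_col).groups.items():
        values = frame.loc[idx, value_col]
        donors = values.dropna().to_numpy()      # complete records in this class
        holes = values.index[values.isna()]      # incomplete records in this class
        if len(donors) > 0 and len(holes) > 0:
            out.loc[holes, value_col] = rng.choice(donors, size=len(holes))
    return out

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "income": [10.0, 12.0, np.nan, 30.0, np.nan, 34.0],
})

imputed = hot_deck(df, "income", "group", rng)
print(imputed)
```

A cold deck works the same way, except that the donors come from an external dataset rather than from `frame` itself.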

  10. Cold deck imputation • fills in missing values on incomplete records using values from similar, but complete records of an external dataset • e.g. historical imputation

  11. Maximum Likelihood Approaches • Simple idea, but computationally complex • Loosely speaking, for a fixed set of data and underlying probability model, maximum likelihood picks the values of the model parameters that make the data "more likely" than any other values of the parameters would make them.
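In symbols, and using notation that is assumed here rather than taken from the slides ($\theta$ for the model parameters, $Y_{\text{obs}}$ and $Y_{\text{mis}}$ for the observed and missing parts of the data), the idea can be written as follows. With incomplete data, the likelihood is based on the observed data only, with the missing part integrated out. This form is appropriate when the missingness mechanism is ignorable.

```latex
\begin{align}
\hat{\theta} &= \arg\max_{\theta} \; L(\theta \mid Y_{\text{obs}}), \\
L(\theta \mid Y_{\text{obs}}) &= \int f(Y_{\text{obs}}, Y_{\text{mis}} \mid \theta)\, dY_{\text{mis}}
\end{align}
```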

  12. Multiple Imputation • Combination of several random imputations and integration • Workflow: incomplete data → imputed data (e.g. m = 10) → separate analyses → data integration
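The slides do not say how the separate analyses are integrated. A common choice is Rubin's rules, sketched below under that assumption. The function name `pool_rubin` and the five estimates and variances are made up for illustration (m = 5 instead of 10 to keep the example short).

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m point estimates and their sampling variances with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = estimates.size
    q_bar = estimates.mean()                 # pooled point estimate
    within = variances.mean()                # average within-imputation variance
    between = estimates.var(ddof=1)          # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total

# Hypothetical results from m = 5 separate analyses of imputed datasets.
est = [10.0, 11.0, 9.0, 10.5, 9.5]
var = [1.0, 1.2, 0.8, 1.0, 1.0]

pooled_est, pooled_var = pool_rubin(est, var)
print(pooled_est, pooled_var)
```

The between-imputation term is what a single imputation cannot provide: it carries the uncertainty about the imputed values into the final standard error.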

  13. Learning objectives • What are missing values ✓ • How do I basically treat missing data ✓ • Why are data missing • How do I detect (the systematics of) missingness • How do I treat missing data - revisited

  14. Missingness is a probabilistic phenomenon Dataset (data matrix) MV »mechanism« (indicator matrix)

  15. Typology of missingness distributions • MCAR: missing completely at random • MAR: missing at random • MNAR: missing not at random, non-ignorable (eq. 1 and 2 are violated; missingness depends on the missing values themselves)
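The equations referred to on this slide are not part of the transcript. For orientation, the three mechanisms are usually defined as follows. The notation is an assumption: $R$ is the missingness indicator, $Y_{\text{obs}}$ and $Y_{\text{mis}}$ are the observed and missing parts of the data, and $\xi$ stands for the parameters of the missingness mechanism.

```latex
\begin{align}
\text{MCAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \xi) = P(R \mid \xi) \\
\text{MAR:}\quad  & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \xi) = P(R \mid Y_{\text{obs}}, \xi) \\
\text{MNAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \xi) \text{ still depends on } Y_{\text{mis}}
\end{align}
```

MCAR is the special case of MAR in which missingness does not depend on the observed data either.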

  16. Typology of missingness distributions • X: completely observed • Y: variable with some missings • R: missingness • ξ: missingness »mechanism« • MCAR: missingness is independent of the empirical data [Diagram with nodes X, ξ, Y, R]

  17. Typology of missingness distributions • X: completely observed • Y: variable with some missings • R: missingness • ξ: missingness »mechanism« • MAR: missingness is related to the observed data [Diagram with nodes X, ξ, Y, R]

  18. Typology of missingness distributions • X: completely observed • Y: variable with some missings • R: missingness • ξ: missingness »mechanism« • MNAR: missingness is related to the missing data as well [Diagram with nodes X, ξ, Y, R]
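The three mechanisms can be simulated directly. In the sketch below, `x` is completely observed, `y` has missing values, and each `r_*` array marks missing entries with `True`. The logistic form of the mechanism and all coefficients are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)                          # completely observed
y = 0.8 * x + rng.normal(scale=0.6, size=n)     # true mean of y is 0

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Missingness indicators (True = y is missing).
r_mcar = rng.random(n) < 0.5                    # independent of x and y
r_mar = rng.random(n) < sigmoid(2.0 * x)        # depends on observed x only
r_mnar = rng.random(n) < sigmoid(2.0 * y)       # depends on y itself

def complete_case_mean(r):
    """Mean of y among the cases where y is observed."""
    return y[~r].mean()

def regression_adjusted_mean(r):
    """Mean of y predicted from x, with the regression fitted on observed cases."""
    slope, intercept = np.polyfit(x[~r], y[~r], deg=1)
    return (intercept + slope * x).mean()

for name, r in [("MCAR", r_mcar), ("MAR", r_mar), ("MNAR", r_mnar)]:
    print(name, complete_case_mean(r), regression_adjusted_mean(r))
```

Under MCAR the complete-case mean stays close to the true value of 0. Under MAR it is biased, but using `x` removes the bias. Under MNAR both estimates remain biased, because the information needed for the correction sits in the missing values themselves.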

  19. Typology of missingness distributions • MCAR: missing completely at random • MAR: missing at random • MNAR: non-ignorable [Three diagrams side by side, one per mechanism, each with nodes X, ξ, Y, R]

  20. Examples for MNAR • We are interested in income, but managers refuse to answer • We are interested in prejudice, but the racists skip that scale • We are interested in depression scores, but the depressed are too tired to complete the questionnaire [Diagram with nodes X, ξ, Y, R]

  21. Now you can answer the question: • Does this rule of thumb make sense? • »If up to 5% of my data are missing, I don't have a problem. If 50% are missing, I am lost.« NO! The amount of missingness is much less important than the reason for missingness!
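A small simulation illustrates the point. It compares 50% of values missing completely at random with only 10% missing, where the 10% are exactly the highest values (a deliberately extreme MNAR case). The setup is an assumption for illustration and not an example from the slides.

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(loc=0.0, scale=1.0, size=100_000)

# 50% of the values missing completely at random.
miss_mcar = rng.random(y.size) < 0.5

# Only 10% missing, but exactly the highest values of y.
miss_mnar = y > np.quantile(y, 0.9)

# Bias of the complete-case mean relative to the full-data mean.
bias_mcar = y[~miss_mcar].mean() - y.mean()
bias_mnar = y[~miss_mnar].mean() - y.mean()

print(bias_mcar, bias_mnar)
```

Losing half of the data at random costs precision but leaves the mean essentially unbiased, whereas losing a tenth of the data for the wrong reason shifts the estimate noticeably.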

  22. Example: Perfect situation

  23. Amount of missingness [Figure: two panels, 10% missing and 90% missing; legend: missing, present]

  24. Mechanism of missingness • MAR: missingness depends mainly on X; solvable • MNAR/NI: missingness depends mainly on Y; big trouble ahead

  25. Mechanism of missingness Biased Median

  26. Imputation with MCAR • Although 90% are missing, model-based imputation can reproduce the data. • Even under MCAR, mean imputation is evil!

  27. Imputation with MAR • Under MAR, model-based imputation can reproduce the data. • Mean imputation is evil!

  28. Take home message • Missing values do not only decrease efficiency/power • They can (severely) bias the parameter estimates • Listwise and especially pairwise deletion are unwise deletion • Naïve unconditional imputation is evil • Understand the missingness „mechanism“ • Under MCAR, relax • Under MAR, model-based imputation is no alchemy • Best practice: PREVENTION
