replacing missing values n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Replacing Missing Values PowerPoint Presentation
Download Presentation
Replacing Missing Values

Loading in 2 Seconds...

play fullscreen
1 / 12
wynonna

Replacing Missing Values - PowerPoint PPT Presentation

131 Views
Download Presentation
Replacing Missing Values
An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Replacing Missing Values Jukka Parviainen Tik-61.181 Special Course in Information Technology 27.10.1999

  2. Agenda • Motivation • Objectives • Meaning for the conclusions • Origin of missing values (MV) • Detection of missing values • Replacing missing values • Examples Jukka Parviainen

  3. References • Pyle, DP for DM, chapter 8 • Hair, Anderson, Tatham, Black: Multivariate Data Analysis • Bishop: NN for PR Jukka Parviainen

  4. …missing values? Jukka Parviainen

  5. Motivation • There are always MVs in a real data set • MVs may have an impact on modeling, in fact, they can destroy it! • MVs contain also information!!! • Hint for the modeler: Avoid-Detect-Replace-Understand Jukka Parviainen

  6. “Definitions” • Missing value - not captured in the data set: errors in feeding, transmission, ... • Empty value - no value in the population • Outlier, out-of-range value Jukka Parviainen

  7. Objectives • Controlled and understood by the modeler • “Least harm”, no “new” information into a data set • statistical estimation of MVs not the primary issue, but DM • KISS - speed and simplicity • PIE-I/O - training+testing+execution Jukka Parviainen

  8. Origin and Detection • Missing data process • Degree of randomness • nonrandom • missing at random • missing completely at random • Detecting missing value patterns • number of MVs in each variable/case • compare MVP to complete sets Jukka Parviainen

  9. Replacing missing values • Randomness of MVs? • Methods • Use the complete data • Delete variable(s)/case(s) • Imputation methods... • Model based (ML, Bayes) • Use robust models Jukka Parviainen

  10. Imputation methods • Process of estimating MVs based on valid values of other variables / cases • Techniques: • distribution characteristics from all available valid values • replacing: case, mean substitution, cold deck, regression imputation Jukka Parviainen

  11. Examples • Polls, Questionnaires • Planning more than essential • human factors! • small amounts of data • Data from steel plant • Information system • errors, default values • lots of data Jukka Parviainen

  12. Questions • Does software applications help or hide the effect of missing values? (SPSS Clementine) • Execution/prediction phase of DM process? • What to do with alpha variables? Jukka Parviainen