Replacing Missing Values

1 / 12

# Replacing Missing Values - PowerPoint PPT Presentation

##### Replacing Missing Values

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
##### Presentation Transcript

1. Replacing Missing Values Jukka Parviainen Tik-61.181 Special Course in Information Technology 27.10.1999

2. Agenda • Motivation • Objectives • Meaning for the conclusions • Origin of missing values (MV) • Detection of missing values • Replacing missing values • Examples Jukka Parviainen

3. References • Pyle, DP for DM, chapter 8 • Hair, Anderson, Tatham, Black: Multivariate Data Analysis • Bishop: NN for PR Jukka Parviainen

4. …missing values? Jukka Parviainen

5. Motivation • There are always MVs in a real data set • MVs may have an impact on modeling, in fact, they can destroy it! • MVs contain also information!!! • Hint for the modeler: Avoid-Detect-Replace-Understand Jukka Parviainen

6. “Definitions” • Missing value - not captured in the data set: errors in feeding, transmission, ... • Empty value - no value in the population • Outlier, out-of-range value Jukka Parviainen

7. Objectives • Controlled and understood by the modeler • “Least harm”, no “new” information into a data set • statistical estimation of MVs not the primary issue, but DM • KISS - speed and simplicity • PIE-I/O - training+testing+execution Jukka Parviainen

8. Origin and Detection • Missing data process • Degree of randomness • nonrandom • missing at random • missing completely at random • Detecting missing value patterns • number of MVs in each variable/case • compare MVP to complete sets Jukka Parviainen

9. Replacing missing values • Randomness of MVs? • Methods • Use the complete data • Delete variable(s)/case(s) • Imputation methods... • Model based (ML, Bayes) • Use robust models Jukka Parviainen

10. Imputation methods • Process of estimating MVs based on valid values of other variables / cases • Techniques: • distribution characteristics from all available valid values • replacing: case, mean substitution, cold deck, regression imputation Jukka Parviainen

11. Examples • Polls, Questionnaires • Planning more than essential • human factors! • small amounts of data • Data from steel plant • Information system • errors, default values • lots of data Jukka Parviainen

12. Questions • Does software applications help or hide the effect of missing values? (SPSS Clementine) • Execution/prediction phase of DM process? • What to do with alpha variables? Jukka Parviainen