
Coping with Missing Data for Active Learning

Coping with Missing Data for Active Learning. 02-750 Automation of Biological Research, jgc@cs.cmu.edu. What is Missing? In active learning the category label is missing, and we can query an oracle, mindful of cost. What else can be missing? Features: we may not have enough for prediction.


Presentation Transcript


  1. Coping with Missing Data for Active Learning 02-750 Automation of Biological Research jgc@cs.cmu.edu

  2. What is Missing? • In active learning the category label is missing, and we can query an oracle, mindful of cost • What else can be missing? • Features: we may not have enough for prediction • Feature combinations: beyond those the classifier is able to generate automatically (e.g. XOR, ratios) • Values of features: Not all instances have values for all their features. • Feature relevance: Some features are noisy or irrelevant • Feature redundancy: e.g. high feature co-variance

  3. Reducing the Feature Space • Feature selection • Subsample features using IG, MI, … • Well studied, e.g. Yang & Pedersen ICML 1997 • Wrapper methods • Inefficient but accurate, less studied • Feature projection (to lower dimensions) • LDA, SVD, LSI • Slow, well studied, e.g. Falluchi et al 2009 • Kernel functions on feature sub-spaces
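As a rough illustration of feature selection by information gain (IG), the sketch below ranks discrete features by how much they reduce label entropy and keeps the top k. It is a minimal sketch, not the procedure from Yang & Pedersen: the function names (`entropy`, `information_gain`, `select_top_k`) and the toy data are made up for this example, and features are assumed to be discrete and fully observed.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Toy data (hypothetical): "f_good" determines the label, "f_noise" is unrelated to it.
columns = {
    "f_good": [0, 0, 1, 1],
    "f_noise": [0, 1, 0, 1],
}
labels = ["a", "a", "b", "b"]
top = select_top_k(columns, labels, k=1)
print(top)
```

Scoring each feature on its own is cheap, but it cannot detect features that are only useful in combination (the XOR case from slide 2), which is part of why wrapper methods can be more accurate at higher cost.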

  4. Missing Feature Values • Active learning of features • Not as extensively studied as active instance learning (see Saar-Tsechansky et al., 2007) • Determines which feature values to seek for given instances, or which features across the board • Can be combined with active instance learning • But what if there is no oracle? • Impossible to get feature values • Too costly or too time-consuming • Do we ignore instances with missing features?

  5. Missing Data

  6. How to Cope with Missing Features • ML training assumes feature completeness • Filter out features that are mostly missing • Filter out instances with missing features • Impute values for missing features • Radically change ML algorithms • When do we do each of the above? • With lots of data and few missing features… • With sparse training data and few missing… • With sparse data and mostly missing features…
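The two filtering options can be sketched in a few lines. This is only an illustration under stated assumptions: missing values are encoded as `None`, the 0.5 threshold for "mostly missing" is an arbitrary choice, and the helper names and toy rows are hypothetical.

```python
def missing_fraction(values):
    """Fraction of entries in a column that are missing (encoded as None)."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_features(rows, max_missing=0.5):
    """Drop feature columns whose missing fraction exceeds max_missing.

    Returns the filtered rows and the indices of the columns that were kept.
    """
    n_features = len(rows[0])
    keep = [
        j for j in range(n_features)
        if missing_fraction([r[j] for r in rows]) <= max_missing
    ]
    return [[r[j] for j in keep] for r in rows], keep


def drop_incomplete_instances(rows):
    """Keep only the instances that have a value for every feature."""
    return [r for r in rows if all(v is not None for v in r)]


# Toy data (hypothetical): the middle feature is missing in 3 of 4 instances.
rows = [
    [1.0, None, 3.0],
    [2.0, None, None],
    [3.0, None, 1.0],
    [4.0, 5.0, 2.0],
]

# Filtering instances first leaves a single complete row.
instances_only = drop_incomplete_instances(rows)

# Filtering the mostly-missing feature first leaves three complete rows.
filtered_rows, kept_columns = drop_sparse_features(rows, max_missing=0.5)
features_then_instances = drop_incomplete_instances(filtered_rows)

print(len(instances_only), kept_columns, len(features_then_instances))
```

On this toy data the order matters: dropping the sparse feature before dropping incomplete instances preserves more training data, which is relevant to the "sparse training data" cases listed on the slide.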

  7. Missing Feature Imputation • How do we estimate missing feature values? • Infer the mean value across all instances • Infer the mean value in neighborhood • Apply a classifier with other features as input and missing feature value as y (label) • How do we know if it makes a difference? • Sensitivity analysis (extrema, perturbations) • Train without instances with missing features vs. with instances with imputed values for missing features
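The first two estimators can be sketched as follows. This is a minimal sketch, assuming missing values are encoded as `None`, that only feature `j` has gaps, and that "neighborhood" means the k nearest instances by Euclidean distance over the remaining features; the slide does not fix any of these choices, and the function names and toy data are hypothetical.

```python
import math


def impute_global_mean(rows, j):
    """Fill missing values of feature j with the mean over all observed values."""
    observed = [r[j] for r in rows if r[j] is not None]
    mean = sum(observed) / len(observed)
    return [[mean if (i == j and v is None) else v for i, v in enumerate(r)]
            for r in rows]


def impute_neighborhood_mean(rows, j, k=1):
    """Fill missing values of feature j with the mean over the k nearest donors.

    Donors are instances where feature j is observed; distance is Euclidean
    over all other features, which are assumed to be complete.
    """
    donors = [r for r in rows if r[j] is not None]
    result = []
    for r in rows:
        filled = list(r)
        if filled[j] is None:
            def distance(d, target=filled):
                return math.sqrt(sum(
                    (a - b) ** 2
                    for i, (a, b) in enumerate(zip(target, d)) if i != j
                ))
            nearest = sorted(donors, key=distance)[:k]
            filled[j] = sum(d[j] for d in nearest) / len(nearest)
        result.append(filled)
    return result


# Toy data (hypothetical): feature 1 is missing for the last instance.
rows = [
    [1.0, 10.0],
    [1.2, 12.0],
    [5.0, 50.0],
    [5.1, None],
]

by_global_mean = impute_global_mean(rows, j=1)
by_neighborhood = impute_neighborhood_mean(rows, j=1, k=1)
print(by_global_mean[3][1], by_neighborhood[3][1])
```

Here the global mean (24.0) and the nearest-neighbor estimate (50.0) differ widely, which is the kind of gap a sensitivity analysis would flag. The third option on the slide follows the same pattern, with a trained classifier or regressor in place of the neighbor average.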

  8. More on Missing Values • Missing Completely at Random (MCAR) • It is generally impossible to prove MCAR or MAR • Missing at Random (MAR) • Statisticians assume MAR as default • Missing values that depend on observables • Imputation via classification/regression • Missing values that depend on unobservables • Missing depending on the value itself
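A small simulation can make the distinction concrete. The sketch below is an illustration only: the data-generating process, the drop probabilities, and all variable names are invented for this example. It compares the mean of the surviving values under three mechanisms: MCAR, missingness that depends on an observed covariate `g`, and missingness that depends on the value itself.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

g = [rng.randint(0, 1) for _ in range(n)]    # fully observed binary covariate
x = [rng.gauss(2.0 * gi, 1.0) for gi in g]   # feature that will lose values
true_mean = mean(x)

# MCAR: every value is dropped with the same probability.
mcar_observed = [xi for xi in x if rng.random() > 0.5]

# Depends on an observable: values are dropped mostly when g == 1.
mar_kept = [(gi, xi) for gi, xi in zip(g, x)
            if not (gi == 1 and rng.random() < 0.8)]
mar_observed = [xi for _, xi in mar_kept]

# Depends on the value itself: large values of x are mostly dropped.
mnar_observed = [xi for xi in x if not (xi > 1.0 and rng.random() < 0.8)]

# Imputation from the observable: fill each gap with the observed mean of its
# g group. The mean of the completed data then equals the frequency-weighted
# average of the group means.
group_mean = {k: mean([xi for gi, xi in mar_kept if gi == k]) for k in (0, 1)}
mar_imputed_mean = mean([group_mean[gi] for gi in g])

print("true mean             ", round(true_mean, 2))
print("MCAR, observed only   ", round(mean(mcar_observed), 2))
print("on g, observed only   ", round(mean(mar_observed), 2))
print("on g, group-imputed   ", round(mar_imputed_mean, 2))
print("on value, observed    ", round(mean(mnar_observed), 2))
```

In this setup the observed-only mean stays close to the truth under MCAR, is biased when missingness depends on `g` but is recovered by imputing from `g`, and is biased with no such fix available when missingness depends on the value itself.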

  9. Imputation – Example [From: Fan 2008] • How to impute the missing SCL for patient #5? • Sample mean: (3.8 + 0.6 + 1.1 + 1.3)/4 = 1.7 • By age: (3.8 + 0.6)/2 = 2.2 • By sex: 1.1 • By education: 1.3 • By race: (3.8 + 0.6 + 1.3)/3 = 1.9 • By ADL: (1.1 + 1.3)/2 = 1.2 • Who is/are in the same “slice” as #5?
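The arithmetic on this slide can be reproduced directly. The patient table itself is not part of the transcript, so the sketch below uses only the SCL values that the slide lists for each slice; which patients those values belong to is not assumed.

```python
# SCL values that share each attribute with patient #5, as listed on the slide.
slices = {
    "sample mean": [3.8, 0.6, 1.1, 1.3],
    "age": [3.8, 0.6],
    "sex": [1.1],
    "education": [1.3],
    "race": [3.8, 0.6, 1.3],
    "ADL": [1.1, 1.3],
}

# Each candidate imputation is the mean SCL within the slice, to one decimal.
estimates = {name: round(sum(v) / len(v), 1) for name, v in slices.items()}

for name, value in estimates.items():
    print(f"{name:12s} {value}")

# Spread between the lowest and highest candidate estimate.
spread = round(max(estimates.values()) - min(estimates.values()), 1)
print("spread", spread)
```

The candidates range from 1.1 to 2.2 depending on which attribute defines the slice, so the choice of slice changes the imputed value by a factor of two.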

  10. Further Reading • Saar-Tsechansky & Provost: http://www.springerlink.com/content/k5m57475n1658723/fulltext.pdf • Yang, Y., Pedersen, J.O. A Comparative Study on Feature Selection in Text Categorization. ICML 1997, pp. 412-420 • Gelman chapter: http://www.stat.columbia.edu/~gelman/arm/missing.pdf • Applications in biomed: Lavori, P., R. Dawson and D. Shera (1995) “A Multiple Imputation Strategy for Clinical Trials with Truncation of Patient Data.” Statistics in Medicine 14: 1913-1925.
