
Graph-based Iterative Hybrid Feature Selection



Presentation Transcript


  1. Graph-based Iterative Hybrid Feature Selection Erheng Zhong† Sihong Xie† Wei Fan‡ Jiangtao Ren† Jing Peng# Kun Zhang$ †Sun Yat-sen University ‡IBM T. J. Watson Research Center #Montclair State University $Xavier University of Louisiana

  2. Where we are • Supervised Feature Selection • Unsupervised Feature Selection • Semi-supervised Feature Selection • Hybrid: use supervised selection to include key features, then improve the set with a semi-supervised approach

  3. Supervised Feature Selection Only feature 2 will be selected, but feature 1 is also useful! With so few labeled examples, the selector sees a biased sample: the sample selection bias problem.

  4. Toy example (1) Labeled data A(1,1,1,1;red) B(1,-1,1,-1;blue) Unlabeled data C(0,1,1,1;red) D(0,-1,1,1;red) Both features 2 and 4 are correlated with the class based on A and B alone, so supervised feature selection picks them; the sketch below checks this.
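
The selection on A and B can be verified directly. Below is a minimal sketch in Python, assuming a simple |correlation|-with-label scorer as a stand-in for the supervised selector (the slides do not name a specific one):

```python
import numpy as np

# Labeled points from the toy example; class encoded as +1 (red) / -1 (blue)
X = np.array([[1.0, 1.0, 1.0, 1.0],     # A, red
              [1.0, -1.0, 1.0, -1.0]])  # B, blue
y = np.array([1.0, -1.0])

# Score each feature by |Pearson correlation| with the label;
# constant features (std = 0) get a score of 0 instead of NaN.
scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) if X[:, j].std() > 0 else 0.0
          for j in range(X.shape[1])]
print(scores)  # [0.0, 1.0, 0.0, 1.0] -> features 2 and 4 (indices 1, 3) win
```

Only the two labeled points enter the scores, so feature 1 looks constant and is discarded: exactly how the bias arises.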

  5. Semi-supervised Feature Selection

  6. Toy example (2) A semi-supervised approach, “spectral-based feature selection”: features are ranked by their smoothness across data points and their consistency with the label information. Feature 2 will be selected if only one feature is desired.
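
For illustration, here is a simplified Laplacian-score-style ranking in the same spirit: features are scored by smoothness over an RBF similarity graph built from all points. The label-consistency term the slide mentions is omitted for brevity, and the bandwidth sigma is an arbitrary choice:

```python
import numpy as np

def spectral_scores(X, sigma=1.0):
    """Laplacian-score-style ranking: lower = smoother over the graph."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))   # RBF similarity graph
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W                            # unnormalized graph Laplacian
    scores = []
    for j in range(X.shape[1]):
        f = X[:, j] - X[:, j].mean()
        denom = f @ D @ f
        scores.append((f @ L @ f) / denom if denom > 0 else np.inf)
    return scores

# All four toy points (labeled A, B and unlabeled C, D) build the graph
X = np.array([[1, 1, 1, 1], [1, -1, 1, -1],
              [0, 1, 1, 1], [0, -1, 1, 1]], dtype=float)
print(spectral_scores(X))  # the lowest-scoring (0-based) feature is preferred
```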

  7. Solution → Hybrid • Labeled data insufficient → sample selection bias → supervised methods fail • Unlabeled data indistinct → data from different classes are not separated → semi-supervised methods fail

  8. Hybrid Feature Selection [IteraGraph_FS]

  9. Toy example (3)

  10. Properties of feature selection • The distance between any two examples is approximately the same in a high-dimensional feature space. [Theorem 3.1] • Feature selection can obtain a more distinguishable distance measure, which leads to a better confidence estimate. [Theorem 3.2]

  11. Theorems 3.1 and 3.2 3.1 As dimensionality increases → the nearest neighbor approaches the farthest neighbor 3.2 A more distinguishable similarity measure → a better classification confidence matrix
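
The concentration effect in Theorem 3.1 is easy to observe empirically. The short simulation below (uniform random data and a random query point, both arbitrary choices) shows the relative gap between the farthest and nearest neighbor shrinking as the dimensionality d grows:

```python
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000, 10000]:
    X = rng.random((500, d))             # 500 random points in [0, 1]^d
    q = rng.random(d)                    # a random query point
    dist = np.linalg.norm(X - q, axis=1)
    # Relative contrast (max - min) / min shrinks as d grows,
    # i.e. the nearest neighbor approaches the farthest one.
    print(f"d={d:>5}  relative contrast = "
          f"{(dist.max() - dist.min()) / dist.min():.3f}")
```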

  12. Semi-supervised Feature Selection • Graph-based [Label Propagation] • Expand the labeled set by adding the unlabeled data whose predicted labels have high confidence (the top s%). • Perform feature selection on the new labeled set. (A sketch of this loop follows below.)
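
Putting the slide's steps together, here is a minimal sketch of the iteration. Assumed details not fixed by the slide: Zhou-style normalized label propagation on an RBF graph, a |correlation| stand-in for the supervised selector, and the helper names propagate, corr_select, and iterative_hybrid_fs, all hypothetical rather than the paper's actual IteraGraph_FS components:

```python
import numpy as np

def propagate(W, Y, alpha=0.9, iters=50):
    """Zhou-style label propagation: F <- alpha * S @ F + (1 - alpha) * Y."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))          # symmetrically normalized graph
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y
    return F / F.sum(axis=1, keepdims=True)  # rows as class confidences

def corr_select(X, y, idx, k):
    """Placeholder supervised selector: top-k features by |correlation|."""
    s = []
    for j in range(X.shape[1]):
        f, t = X[idx, j], y[idx].astype(float)
        s.append(abs(np.corrcoef(f, t)[0, 1]) if f.std() and t.std() else 0.0)
    return np.argsort(s)[::-1][:k]

def iterative_hybrid_fs(X, y, labeled, k=2, s=0.5, rounds=3, sigma=1.0):
    labeled, y = list(labeled), y.copy()
    feats = corr_select(X, y, labeled, k)         # initial supervised selection
    for _ in range(rounds):
        Xs = X[:, feats]                          # graph on selected features
        sq = ((Xs[:, None] - Xs[None]) ** 2).sum(-1)
        W = np.exp(-sq / (2 * sigma ** 2))
        np.fill_diagonal(W, 0.0)
        Y = np.zeros((len(X), 2))
        Y[labeled, y[labeled]] = 1.0              # one-hot seed labels
        F = propagate(W, Y)
        unl = [i for i in range(len(X)) if i not in labeled]
        if not unl:
            break
        top = np.argsort(-F[unl].max(axis=1))[:max(1, int(s * len(unl)))]
        for t in top:                             # add the top-s% confident
            i = unl[t]
            y[i] = F[i].argmax()
            labeled.append(i)
        feats = corr_select(X, y, labeled, k)     # re-select on expanded set
    return feats

# Toy data: A, B labeled (classes 0/1); C, D unlabeled (label entries unused)
X = np.array([[1, 1, 1, 1], [1, -1, 1, -1],
              [0, 1, 1, 1], [0, -1, 1, 1]], dtype=float)
y = np.array([0, 1, 0, 0])
print(iterative_hybrid_fs(X, y, labeled=[0, 1]))  # 0-based feature indices
```

Because features are re-selected each round, the graph itself improves as the labeled set grows, which is the iterative part of the method.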

  13. Confidence and Margin (Lemma 3.2)

  14. Selection Strategy Comparison (Theorem 3.3)

  15. Experiments setup • Data Set • Handwritten Digit Recognition Problem • Biomedical and Gene Expression Data • Text Documents [Reuters-21578] • Compared Approaches • Supervised feature selection: SFFS • Semi-supervised approach: sSelect [SDM07]

  16. Data Set -- Description

  17. Feature Quality Study

  18. Conclusions • Labeled information → critical features and better confidence estimates • Unlabeled data → improve the chosen feature set • Flexible: can incorporate many feature selection methods that aim to reveal the relationships between data points.
