
Margin Based Sample Weighting for Stable Feature Selection


Presentation Transcript


  1. Margin Based Sample Weighting for Stable Feature Selection Yue Han, Lei Yu State University of New York at Binghamton

  2. Outline • Introduction • Related Work • Hypothesis-Margin Feature Space Transformation • Margin Based Sample Weighting • Experimental Study • Conclusion and Future Work

  3. Introduction
  • Notation: p = number of features, n = number of samples; high-dimensional data has p >> n.
  • Illustration: a document-term matrix, where samples (rows) are documents, features (columns) are terms (in other domains, genes or proteins), and C is the class label:

         T1   T2   …   TN    C
    D1   12    0   …    6    Sports
    D2    3   10   …   28    Travel
    …     …    …   …    …    …
    DM    0   11   …   16    Jobs

  • Pipeline: high-dimensional data → feature selection (filter or wrapper) → dimension-reduced data → learning model.
  • Feature selection:
    • Alleviates the effect of the curse of dimensionality;
    • Enhances generalization capability;
    • Speeds up the learning process;
    • Improves model interpretability.
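To make the filter-style pipeline concrete, here is a minimal sketch using scikit-learn; the chi-squared scorer and k = 2 are illustrative assumptions, not choices from the talk.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Toy document-term matrix: rows (samples) are documents,
# columns (features) are term counts.
X = np.array([[12,  0,  6, 1],
              [ 3, 10, 28, 0],
              [ 0, 11, 16, 2]])
y = np.array(["Sports", "Travel", "Jobs"])    # class label C

# Filter-style selection: score each feature, keep the top k.
selector = SelectKBest(score_func=chi2, k=2)
X_reduced = selector.fit_transform(X, y)      # dimension-reduced data
print(selector.get_support(indices=True))     # indices of the kept terms
```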

  4. Cont'd
  • Given unlimited sample size, the feature selection results from two training sets D1 and D2 drawn from D would be the same.
  • In practice the size of D is limited (n << p for high-dimensional data), so the feature selection results from D1 and D2 differ, and increasing the number of samples can be very costly or impractical.
  • Stability of feature selection: the insensitivity of the result of a feature selection algorithm to variations in the training set.
  • Motivation: identifying characteristic markers to explain the observed phenomena.

  5. Related Work
  • Bagging-based Ensemble Feature Selection (Saeys et al., ECML 2007):
    • Draw different bootstrapped samples of the same training set;
    • Apply a conventional feature selection algorithm to each sample;
    • Aggregate the feature selection results (see the sketch below).
  • Group-based Stable Feature Selection (Yu et al., KDD 2008, KDD 2009):
    • Explore the intrinsic feature correlations;
    • Identify groups of correlated features;
    • Select relevant feature groups.
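A sketch of the bagging-based ensemble idea, assuming a scikit-learn-style scoring function; aggregating by mean rank is one common choice rather than the only one in Saeys et al., and `score_features` and `n_bags` are illustrative names.

```python
import numpy as np

def ensemble_rank(X, y, score_features, n_bags=20, seed=None):
    """Bagging-based ensemble feature selection: rank features on each
    bootstrap sample, then aggregate by mean rank (lower = better)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    ranks = np.zeros((n_bags, p))
    for b in range(n_bags):
        idx = rng.integers(0, n, size=n)            # bootstrap sample
        scores = score_features(X[idx], y[idx])     # higher score = better
        ranks[b] = np.argsort(np.argsort(-scores))  # 0 = top-ranked feature
    return ranks.mean(axis=0)                       # aggregated ranking
```

Here `score_features` could be any univariate relevance scorer returning one score per feature.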

  6. Hypothesis-Margin Feature Space Transformation
  • Introduce the concept of the hypothesis-margin feature space;
  • Propose a framework of margin based sample weighting for stable feature selection;
  • Develop an efficient algorithm under the proposed framework.
  (Figure: a framework of margin based sample weighting for stable feature selection.)

  7. Hypothesis-Margin Feature Space Transformation
  • For a sample X, its nearest neighbor of the same class is the near-hit and its nearest neighbor of a different class is the near-miss.
  • X', the image of X in the HM feature space, captures the local profile of feature importance for all features at X.
  • Multiple nearest neighbors can be used to compute the HM of a sample.
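A minimal sketch of the transformation with a single nearest neighbor per class, assuming the Relief-style per-feature decomposition of the hypothesis margin, x'_j = |x_j − nearmiss(x)_j| − |x_j − nearhit(x)_j|; the exact form and any normalization used in the talk are assumptions here.

```python
import numpy as np

def hm_transform(X, y):
    """Map each sample X to X' in the hypothesis-margin (HM) feature space.

    Assumed Relief-style per-feature form:
        x'_j = |x_j - nearmiss(x)_j| - |x_j - nearhit(x)_j|
    Each class is assumed to contain at least two samples.
    """
    n, _ = X.shape
    X_hm = np.empty_like(X, dtype=float)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                                   # exclude the sample itself
        hit = np.where(y == y[i], d, np.inf).argmin()   # nearest same-class sample
        miss = np.where(y != y[i], d, np.inf).argmin()  # nearest other-class sample
        X_hm[i] = np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return X_hm
```

Averaging over k nearest hits and misses extends this to multiple neighbors (the experiments later use k = 10).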

  8. Cont'd
  (Figure: hypothesis-margin based feature space transformation: (a) the original feature space, (b) the hypothesis-margin (HM) feature space.)

  9. Margin Based Sample Weighting
  • Samples differ in their local profiles of feature importance (i.e., in the HM feature space).
  • Measure the average distance of X' to all other samples in the HM feature space; a greater average distance indicates a higher outlying degree.
  • Overall time complexity: O(n^2 q), where n is the number of samples and q is the dimensionality of D.
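A sketch of the weighting step built on `hm_transform` above. The outlying degree follows the slide's definition (average distance in HM space); the specific mapping from outlying degree to sample weight, which down-weights outlying samples, is an assumption.

```python
import numpy as np

def margin_based_weights(X_hm):
    # Outlying degree: average distance of each transformed sample to
    # all other samples in the HM feature space (O(n^2 q) overall).
    n = X_hm.shape[0]
    dists = np.linalg.norm(X_hm[:, None, :] - X_hm[None, :, :], axis=2)
    outlying = dists.sum(axis=1) / (n - 1)
    # Assumed mapping: larger outlying degree -> smaller weight.
    w = 1.0 / (1.0 + outlying)
    return w * n / w.sum()   # normalize so weights average to 1
```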

  10. Experimental Study
  • Stability metrics: feature ranking, feature subset selection, feature correlation.
  • Stability of a feature selection algorithm is measured as the average pair-wise similarity of the feature selection results produced by the same algorithm on different training sets.
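For the feature-subset case, one common instantiation of this average pairwise similarity is the Jaccard index between selected subsets; using Jaccard here is an assumption, since the slide does not name the specific similarity function.

```python
from itertools import combinations

def subset_stability(subsets):
    """Average pairwise Jaccard similarity of selected-feature subsets,
    one subset per training-set variation."""
    pairs = list(combinations([set(s) for s in subsets], 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
```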

  11. Cont'd
  • Experimental Setup:
    • SVM-RFE: 10 percent of the remaining features eliminated at each iteration.
    • En-RFE: 20 bootstrapped training sets used to construct the ensemble.
    • IW-RFE: k = 10 nearest neighbors for the hypothesis-margin transformation.
    • 10 times shuffling and 10-fold cross-validation to generate 100 datasets.
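A sketch of the SVM-RFE baseline configuration using scikit-learn; the synthetic data, `n_features_to_select`, and the linear SVM's `C` are illustrative placeholders, not values from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

# Stand-in high-dimensional data (n << p), just for illustration.
X_train, y_train = make_classification(n_samples=100, n_features=500,
                                       random_state=0)

# SVM-RFE: retrain a linear SVM and eliminate 10% of the remaining
# features (step=0.1) at each iteration, matching the setup above.
rfe = RFE(estimator=LinearSVC(C=1.0, dual=False),
          n_features_to_select=50,    # placeholder target size
          step=0.1)
rfe.fit(X_train, y_train)
selected = rfe.get_support(indices=True)   # surviving feature indices
```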

  12. Results: consistent improvement in the stability of feature selection results, as measured by the different stability metrics.

  13. Results: the different feature selection algorithms lead to similarly good classification results.

  14. Conclusion and Future Work
  • Conclusion:
    • Introduced the concept of the hypothesis-margin feature space;
    • Proposed the framework of margin based sample weighting for stable feature selection;
    • Developed an efficient algorithm under the framework.
  • Future work:
    • Investigate alternative methods of sample weighting based on the HM feature space;
    • Explore strategies to combine margin based sample weighting with group-based stable feature selection.

  15. Questions? Thank you!
