
Variance Reduction for Stable Feature Selection


Presentation Transcript


  1. Variance Reduction for Stable Feature Selection Department of Computer Science • 10/27/10 Presenter: Yue Han Advisor: Lei Yu

  2. Outline • Introduction and Motivation • Background and Related Work • Preliminaries • Publications • Theoretical Framework • Empirical Framework : Margin Based Instance Weighting • Empirical Study • Planned Tasks

  3. Outline • Introduction and Motivation • Background and Related Work • Preliminaries • Publications • Theoretical Framework • Empirical Framework : Margin Based Instance Weighting • Empirical Study • Planned Tasks

  4. Introduction and Motivation: Feature Selection Applications. [Figure: a term-document matrix with documents D1 … DM as rows, terms T1 … TN as count-valued feature columns, and a class column C (Sports, Travel, Jobs); also, pixels as features in image data, and genes or proteins as features with samples as instances in microarray data.]

  5. Introduction and Motivation: Feature Selection from High-dimensional Data. p: # of features; n: # of samples; high-dimensional data: p >> n. A feature selection algorithm (mRMR, SVM-RFE, Relief-F, F-statistics, etc.) reduces the high-dimensional data to low-dimensional data before learning models (classification, clustering, etc.) are built for knowledge discovery. • Curse of dimensionality: effects on distance functions, in optimization and learning, and in Bayesian statistics. • Feature selection: alleviates the effect of the curse of dimensionality; enhances generalization capability; speeds up the learning process; improves model interpretability.

  6. Introduction and Motivation: Stability of Feature Selection. [Figure: one feature selection method applied to several variations of the training data yields several feature subsets; are they consistent or not?] Stability of feature selection: the insensitivity of the result of a feature selection algorithm to variations in the training set. The stability of learning algorithms was first examined by Turney in 1995; the stability of feature selection was relatively neglected until recently, when it attracted interest from data mining researchers.

  7. Introduction and Motivation: Motivation for Stable Feature Selection. Given an unlimited sample size, feature selection results from two samples D1 and D2 of the same domain would be identical; with a limited sample size (n << p for high-dimensional data), the results from D1 and D2 differ. Challenge: increasing the number of samples can be very costly or impractical. Experts from biology and biomedicine are interested in: • not only the prediction accuracy but also the consistency of feature subsets; • validating stable genes or proteins that are less sensitive to variations in the training data; • biomarkers that explain the observed phenomena.

  8. Outline • Introduction and Motivation • Background and Related Work • Preliminaries • Publications • Theoretical Framework • Empirical Framework : Margin Based Instance Weighting • Empirical Study • Planned Tasks

  9. Background and Related Work: Feature Selection Methods. [Flowchart: original set → subset generation → subset evaluation (goodness of subset) → stopping criterion; if not met, loop back to subset generation; if met, result validation.] • Search strategies: complete search; sequential search; random search. • Evaluation criteria: filter model; wrapper model; embedded model. • Representative algorithms: Relief, SFS, MDLM, etc.; FSBC, ELSA, LVW, etc.; BBHFS, Dash-Liu's, etc. A filter-model sketch follows below.
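
As a concrete illustration of the filter model above, here is a minimal sketch (not from the slides) that scores each feature independently with the F-statistic and keeps the top k, in the spirit of the Relief-F and F-statistics filters named later in the talk; the scikit-learn calls and toy data are assumptions for illustration only.

```python
# Minimal filter-model sketch: score features independently, keep the k best.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Toy high-dimensional data: p >> n, as on the previous slides.
X, y = make_classification(n_samples=100, n_features=1000,
                           n_informative=20, random_state=0)
selector = SelectKBest(score_func=f_classif, k=10).fit(X, y)
print("selected feature indices:", np.flatnonzero(selector.get_support()))
```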

  10. Background and Related Work: Stable Feature Selection. • Comparison of feature selection algorithms w.r.t. stability (Davis et al., Bioinformatics, vol. 22, 2006; Kalousis et al., KAIS, vol. 12, 2007): quantify stability in terms of consistency of subsets or weights; algorithms vary in stability while performing equally well for classification; choose the one best in both stability and accuracy. • Bagging-based ensemble feature selection (Saeys et al., ECML 2007): draw different bootstrapped samples of the same training set; apply a conventional feature selection algorithm to each; aggregate the feature selection results. • Group-based stable feature selection (Yu et al., KDD 2008; Loscalzo et al., KDD 2009): explore the intrinsic feature correlations; identify groups of correlated features; select relevant feature groups.

  11. Background and Related Work: Margin-based Feature Selection. Sample margin: how far an instance can travel before it hits the decision boundary. Hypothesis margin: how far the hypothesis can travel before it hits an instance (the distance between the hypothesis and the opposite hypothesis of an instance). Representative algorithms: Relief, Relief-F, G-flip, Simba, etc. In these algorithms the margin is used for feature weighting or feature selection (a totally different use from the one in our study).

  12. Outline • Introduction and Motivation • Background and Related Work • Preliminaries • Publications • Theoretical Framework • Empirical Framework : Margin Based Instance Weighting • Empirical Study • Planned Tasks

  13. Publications • Yue Han and Lei Yu. An Empirical Study on Stability of Feature Selection Algorithms. Technical Report from Data Mining Research Laboratory, Binghamton University, 2009. • Yue Han and Lei Yu. Margin Based Sample Weighting for Stable Feature Selection. In Proceedings of the 11th International Conference on Web-Age Information Management (WAIM2010), pages 680-691, Jiuzhaigou, China, July 15-17, 2010. • Yue Han and Lei Yu. A Variance Reduction Framework for Stable Feature Selection. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM2010), Sydney, Australia, December 14-17, 2010, To Appear. • Lei Yu, Yue Han and Michael E. Berens. Stable Gene Selection from Microarray Data via Sample Weighting. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2010, Major Revision Under Review.

  14. Outline • Introduction and Motivation • Background and Related Work • Preliminaries • Publications • Theoretical Framework • Empirical Framework : Margin Based Instance Weighting • Empirical Study • Planned Tasks

  15. Theoretical Framework: Bias-Variance Decomposition of Feature Selection Error. Training data: $D$; data space: $\mathcal{D}$; feature selection result: $r(D)$; true feature selection result: $r^*$. Expected loss (error): $E_D[\|r(D) - r^*\|^2]$. Bias: $\|E_D[r(D)] - r^*\|$. Variance: $E_D[\|r(D) - E_D[r(D)]\|^2]$. Bias-variance decomposition of feature selection error: $E_D[\|r(D) - r^*\|^2] = \|E_D[r(D)] - r^*\|^2 + E_D[\|r(D) - E_D[r(D)]\|^2]$, i.e., error = bias$^2$ + variance. • Reveals the relationship between accuracy (the opposite of loss) and stability (the opposite of variance); • Suggests a better trade-off between the bias and variance of feature selection.
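
A quick numeric sanity check of this decomposition, as a hedged sketch: the "feature selection results" below are simulated draws around a shifted mean rather than the output of a real algorithm, but they show the expected loss splitting exactly into squared bias plus variance.

```python
import numpy as np

rng = np.random.default_rng(0)
r_star = np.zeros(10)                                      # "true" result r*
r_D = rng.normal(loc=0.3, scale=0.5, size=(100_000, 10))   # simulated r(D) draws

loss = np.mean(np.sum((r_D - r_star) ** 2, axis=1))        # E||r(D) - r*||^2
bias_sq = np.sum((r_D.mean(axis=0) - r_star) ** 2)         # ||E r(D) - r*||^2
var = np.mean(np.sum((r_D - r_D.mean(axis=0)) ** 2, axis=1))
print(round(loss, 3), round(bias_sq + var, 3))             # agree up to rounding
```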

  16. Theoretical Framework: Variance Reduction via Importance Sampling. Feature selection (weighting) → Monte Carlo estimator. Relevance score of a feature: $\mu = E_p[s(X)] = \int s(x)\,p(x)\,dx$. Monte Carlo estimator: $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} s(x_i)$, $x_i \sim p$. Variance of the Monte Carlo estimator: $\mathrm{Var}(\hat{\mu}) = \sigma^2/n$. Impact factors: the feature selection algorithm and the sample size; increasing the sample size is impractical and costly, which motivates importance sampling. With a good importance sampling function $h(x)$, the estimator becomes $\hat{\mu}_h = \frac{1}{n}\sum_{i=1}^{n} \frac{p(x_i)}{h(x_i)} s(x_i)$, $x_i \sim h$. Intuition behind $h(x)$: draw more instances from important regions and fewer from other regions. Intuition behind instance weights: increase the weights of instances from important regions and decrease the weights of instances from other regions → instance weighting.
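
To make the variance-reduction argument concrete, here is a small importance-sampling sketch under assumed toy choices (p = N(0,1), a score concentrated in the tail, h = N(3,1)); it is not the dissertation's estimator, only an illustration of why oversampling important regions and reweighting by p/h lowers variance.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
s = lambda x: (x > 3.0).astype(float)     # score lives in an "important region"

x_p = rng.standard_normal(100_000)        # plain Monte Carlo: draw from p
plain = s(x_p)

x_h = rng.normal(3.0, 1.0, 100_000)       # importance sampling: draw from h
iw = s(x_h) * norm.pdf(x_h) / norm.pdf(x_h, loc=3.0, scale=1.0)  # weight p/h

print(plain.mean(), iw.mean())            # both estimate E_p[s(X)] ~ 0.00135
print(plain.var(), iw.var())              # per-sample variance drops sharply
```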

  17. Outline • Introduction and Motivation • Background and Related Work • Preliminaries • Publications • Theoretical Framework • Empirical Framework : Margin Based Instance Weighting • Empirical Study • Planned Tasks

  18. Empirical Framework: Overall Framework. • Challenges: • how to produce weights for instances from the point of view of feature selection stability; • how to present weighted instances to conventional feature selection algorithms. The framework: margin based instance weighting for stable feature selection.

  19. Empirical Framework: Margin Vector Feature Space. [Figure: original space and margin vector feature space, with each instance's nearest hit and nearest miss marked.] For each instance $x$, the hypothesis margin $\theta(x) = \frac{1}{2}(\|x - M(x)\| - \|x - H(x)\|)$, with nearest hit $H(x)$ and nearest miss $M(x)$, captures the local profile of feature relevance for all features at $x$. • Instances exhibit different profiles of feature relevance; • Instances influence feature selection results differently. A sketch of the transformation follows below.
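
A sketch of the transformation, assuming Euclidean nearest hit/miss and the coordinate-wise margin |x - M(x)| - |x - H(x)| as the instance's margin vector; the helper name and these conventions are illustrative, not the dissertation's exact code.

```python
import numpy as np

def margin_vector(X, y, i):
    """Per-feature hypothesis margin of instance i (its margin vector)."""
    dist = np.linalg.norm(X - X[i], axis=1)
    dist[i] = np.inf                                      # exclude the instance itself
    hit = np.argmin(np.where(y == y[i], dist, np.inf))    # nearest same-class instance
    miss = np.argmin(np.where(y != y[i], dist, np.inf))   # nearest other-class instance
    return np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
```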

  20. Empirical Framework: An Illustrative Example. [Figure: hypothesis-margin based feature space transformation; (a) original feature space, (b) margin vector feature space.]

  21. Empirical Framework: Margin Based Instance Weighting Algorithm. • Review: variance reduction via importance sampling draws more instances from important regions and fewer from other regions. • Each instance exhibits a different profile of feature relevance and influences feature selection results differently, so the algorithm weights instances by their outlying degree in the margin vector feature space: higher outlying degree → lower weight; lower outlying degree → higher weight (a sketch of the weighting step follows below).
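
Building on the margin_vector sketch above, here is one hedged reading of the weighting step: take the outlying degree as an instance's average distance to its k nearest neighbors in the margin vector feature space and let the weight decay exponentially with it. The exact formulas are in the WAIM 2010 and TCBB papers, so k, the decay, and the distance choice here are assumptions.

```python
import numpy as np

def instance_weights(margins, k=5, lam=1.0):
    """margins: (n, d) array of margin vectors, one row per instance."""
    D = np.linalg.norm(margins[:, None, :] - margins[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)                      # ignore self-distance
    od = np.sort(D, axis=1)[:, :k].mean(axis=1)      # outlying degree per instance
    return np.exp(-od / lam)                         # higher OD -> lower weight
```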

  22. Empirical Framework: Algorithm Illustration. • Time complexity analysis: dominated by the instance weighting step, which computes all pairwise instance distances in $O(n^2 d)$ time for $n$ instances and $d$ features; • Efficient for high-dimensional data with small sample size (n << d).

  23. Outline • Introduction and Motivation • Background and Related Work • Preliminaries • Publications • Theoretical Framework • Empirical Framework : Margin Based Instance Weighting • Empirical Study • Planned Tasks

  24. Empirical Study: Subset Stability Measures. [Figure: one feature selection method applied to variations of the training data yields several feature subsets; are they consistent or not?] Stability of feature selection is measured by the average pairwise similarity of the $l$ selection results: $S = \frac{2}{l(l-1)} \sum_{i=1}^{l-1} \sum_{j=i+1}^{l} sim(R_i, R_j)$. • Feature subset similarity: Jaccard index; nPOGR; SIMv; Kuncheva index $sim(A, B) = \frac{r - k^2/d}{k - k^2/d}$, where $r = |A \cap B|$, $k$ is the subset size, and $d$ is the total number of features. • Feature ranking: Spearman rank correlation coefficient. • Feature weighting: Pearson correlation coefficient. Sketches of the two subset measures follow below.
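
Sketches of the two subset-stability quantities named on this slide, the Kuncheva index and the average pairwise similarity; both formulas are standard, only the function names are mine.

```python
from itertools import combinations

def kuncheva_index(a, b, d):
    """Similarity of two size-k feature subsets drawn from d total features."""
    k = len(a)
    r = len(set(a) & set(b))
    chance = k * k / d                   # overlap expected by chance alone
    return (r - chance) / (k - chance)

def average_pairwise_stability(subsets, d):
    """Average Kuncheva index over all pairs of selection results."""
    pairs = list(combinations(subsets, 2))
    return sum(kuncheva_index(a, b, d) for a, b in pairs) / len(pairs)
```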

  25. Empirical Study: Experiments on Synthetic Data. Synthetic data generation: feature values come from two multivariate normal distributions (one per class), with 100 groups of 10 features each; within a group, the covariance matrix is a 10×10 matrix with 1 along the diagonal and 0.8 off the diagonal. The class label is a weighted sum of all feature values under an optimal feature weight vector. Training data: 100 instances, 50 from each class, evaluated leave-one-out; test data: 5000 instances. Method in comparison: SVM-RFE, recursively eliminating 10% of the remaining features per iteration until 10 features remain. Measures: variance, bias, error; subset stability (Kuncheva index); accuracy (SVM). A sketch of the generator follows below.
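
A hedged sketch of this generator: 100 groups of 10 correlated features with the stated block covariance, 50 training instances per class, and a label from a weighted sum of feature values. The group means and the weight vector are assumptions, since the slide does not give them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, gsize = 100, 10
cov = np.full((gsize, gsize), 0.8) + 0.2 * np.eye(gsize)   # 1 on diag, 0.8 off

def draw(n, mean):
    """n instances whose 100 feature groups share one block covariance."""
    return np.hstack([rng.multivariate_normal(np.full(gsize, mean), cov, n)
                      for _ in range(n_groups)])

X = np.vstack([draw(50, 0.5), draw(50, -0.5)])   # 100 training instances
w = rng.normal(size=X.shape[1])                  # stand-in "optimal" weight vector
y = np.sign(X @ w)                               # label: weighted sum of features
```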

  26. Empirical Study: Experiments on Synthetic Data. • Observations: • error is equal to the sum of bias and variance for both versions of SVM-RFE; • error is dominated by bias during early iterations and by variance during later iterations; • IW SVM-RFE exhibits significantly lower bias, variance and error than SVM-RFE when the number of remaining features approaches 50.

  27. Empirical Study: Experiments on Synthetic Data. • Conclusion: variance reduction via margin based instance weighting achieves a better bias-variance tradeoff, increased subset stability, and improved classification accuracy.

  28. Empirical Study: Experiments on Real-world Data. Experiment setup: microarray data, 10-fold cross-validation (train on 9 folds, test on the remaining fold). Methods in comparison: SVM-RFE; ensemble SVM-RFE; instance weighting SVM-RFE. [Figure: 20-ensemble SVM-RFE runs SVM-RFE on 20 bootstrapped copies of the training data and aggregates the 20 feature subsets into one; a sketch follows below.] Measures: variance; subset stability; accuracies (KNN, SVM).
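
For reference, a minimal sketch of the 20-ensemble SVM-RFE baseline: run RFE with a linear SVM on bootstrapped copies of the training set, then aggregate per-run rankings by average rank. The aggregation rule and the scikit-learn choices are assumptions standing in for the exact scheme of Saeys et al.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

def ensemble_svm_rfe(X, y, n_select=10, n_runs=20, seed=0):
    rng = np.random.default_rng(seed)
    rankings = []
    for _ in range(n_runs):
        idx = rng.integers(0, len(X), size=len(X))    # bootstrap sample
        rfe = RFE(LinearSVC(max_iter=5000),
                  n_features_to_select=n_select,
                  step=0.1)                           # drop 10% per iteration
        rankings.append(rfe.fit(X[idx], y[idx]).ranking_)
    avg_rank = np.mean(rankings, axis=0)              # aggregate by mean rank
    return np.argsort(avg_rank)[:n_select]            # best-ranked features
```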

  29. Empirical Study: Experiments on Real-world Data. • Observations: • the methods are non-discriminative during early iterations; • the variance of SVM-RFE increases sharply as the number of remaining features approaches 10; • IW SVM-RFE shows a significantly slower rate of increase. Note: 40 iterations, starting from about 1000 features until 10 features remain.

  30. Empirical Study: Experiments on Real-world Data. • Observations: • both the ensemble and instance weighting approaches improve stability consistently; • the ensemble's improvement is not as significant as instance weighting's; • as the number of selected features increases, the stability score decreases because of the larger correction factor.

  31. Empirical Study: Experiments on Real-world Data. • Conclusions: • instance weighting improves the stability of feature selection without sacrificing prediction accuracy; • it performs much better than the ensemble approach and is more efficient; • it leads to significantly increased stability at only a slight extra cost in time.

  32. Outline • Introduction and Motivation • Background and Related Work • Preliminaries • Publications • Theoretical Framework • Empirical Framework : Margin Based Instance Weighting • Empirical Study • Planned Tasks

  33. Planned Tasks: Overall Framework. [Diagram: the theoretical framework of feature selection stability feeds an empirical instance weighting framework (margin-based instance weighting and an iterative approach), to be evaluated with representative FS algorithms (SVM-RFE, Relief-F, F-statistics, HHSVM), various real-world data sets (gene data, text data), and state-of-the-art weighting schemes, toward the relationship between feature selection stability and classification accuracy.]

  34. Planned Tasks: Listed Tasks. A. Extensive study of the instance weighting framework: A1. extension to various feature selection algorithms; A2. study on datasets from different domains. B. Development of algorithms under the instance weighting framework: B1. development of instance weighting schemes; B2. an iterative approach for margin based instance weighting. C. Investigation of the relationship between stable feature selection and classification accuracy: C1. how bias-variance properties of feature selection affect classification accuracy; C2. study of various factors for the stability of feature selection.

  35. Thank you and Questions?
