
Consensus Group Stable Feature Selection






Presentation Transcript


  1. Consensus Group Stable Feature Selection
     Steven Loscalzo, Dept. of Computer Science, Binghamton University
     Lei Yu, Dept. of Computer Science, Binghamton University
     Chris Ding, Dept. of Computer Science and Engineering, University of Texas at Arlington
     The 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

  2. Overview
     • Background and motivation
     • Proposed consensus feature group framework
     • Finding consensus groups
     • Feature selection from consensus groups
     • Experimental study
     • Conclusion

  3. Feature Selection Stability
     All Training Data → Sampling → Feature Selection → Model Building → Acc %

     Sample    | Selected Features | Acc %
     Sample 1  | F   = {f2, f5}    | 92%
     Sample 2  | F'  = {f4, f10}   | 91%
     …         | …                 | …
     Sample k  | F'' = {f5, f11}   | 93%

     Accuracy is stable across training samples, but the selected feature subsets differ widely.
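Accuracy alone does not expose this instability, so the selected subsets themselves have to be compared. A minimal sketch in Python, assuming stability is measured as the average pairwise Jaccard similarity between the selected subsets (an illustrative choice; the paper's exact stability measure may differ):

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of selected features."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

def selection_stability(subsets):
    """Average pairwise Jaccard similarity over all selected subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# The three samples from the slide: similar accuracy, very different subsets.
print(selection_stability([{"f2", "f5"}, {"f4", "f10"}, {"f5", "f11"}]))  # ~0.11, i.e. unstable
```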

  4. Motivation
     • Need for stable feature selection
       – Gives confidence to lab tests
       – Uncovers "truly" relevant information
     • Utility of feature groups
       – Models feature interaction
       – If information about a single feature is lacking, another feature in its group may be well studied

  5. Dense Feature Group Framework
     • Dense feature groups can provide stability and accuracy [Yu, Ding, Loscalzo, KDD-08]
     • Dense group stable feature selection framework:
       – Map features as points in sample space
       – Apply kernel density estimation to locate dense feature groups
       – Select the top relevant groups from among the dense groups
     • Limitations of this framework:
       – Unreliable density estimation in high-dimensional spaces
       – Restricts the selection of relevant groups to dense groups
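A rough sketch of the dense-group idea: features are treated as points in sample space, a Gaussian kernel density is evaluated at each feature, and a feature's neighbours within the bandwidth form its group. The actual DGF algorithm in [Yu, Ding, Loscalzo, KDD-08] is more involved (e.g., iterative mode seeking), so the bandwidth and grouping rule here are assumptions for illustration only:

```python
import numpy as np

def dense_feature_groups(X, bandwidth=1.0, top=3):
    """X: (n_samples, n_features). Return groups around the `top` densest features."""
    F = X.T                                                # features as points in sample space
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    density = np.exp(-d2 / (2 * bandwidth ** 2)).sum(1)   # Gaussian kernel density (unnormalised)
    centres = np.argsort(-density)[:top]                  # densest features first
    return [np.where(d2[c] <= bandwidth ** 2)[0] for c in centres]

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 12))
X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=30)            # feature 3 nearly duplicates feature 0
print(dense_feature_groups(X))                            # features 0 and 3 land in one group
```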

  6. Consensus Feature Group Framework
     • Consensus feature groups are an ensemble of feature grouping results
     • Select relevant groups from the whole spectrum of consensus groups
     • Challenges:
       – Base algorithm for the ensemble: dense group finder (DGF) [Yu, Ding, Loscalzo, KDD-08]
       – Aggregating the feature grouping results

  7. Group Aggregation
     [Figure: feature groups found on each data sub-sample (sub-samples 1-3) are combined into consensus feature groups]
     • 3 aggregation ideas:
       – Heuristics (reference set)
       – Cluster based [Fern, Brodley, ICML-03]
       – Instance based [Fern, Brodley, ICML-03]
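A minimal sketch of the instance-based aggregation idea [Fern, Brodley, ICML-03] as it applies here: count, across the grouping results from the different sub-samples, how often each pair of features lands in the same group. The function name and normalisation are illustrative:

```python
import numpy as np

def co_occurrence(groupings, n_features):
    """groupings: one grouping per sub-sample, each a list of feature-index groups."""
    W = np.zeros((n_features, n_features))
    for grouping in groupings:
        for group in grouping:
            for i in group:
                for j in group:
                    W[i, j] += 1
    return W / len(groupings)      # W[i, j] = fraction of sub-samples where i and j co-occur

# Toy example in the spirit of the slide: three sub-samples, slightly different groups.
groupings = [
    [[0, 4], [1, 2, 3]],    # sub-sample 1
    [[3, 4], [0, 2], [1]],  # sub-sample 2
    [[2, 4], [3], [0, 1]],  # sub-sample 3
]
print(co_occurrence(groupings, n_features=5).round(2))
```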

  8. The CGS Algorithm
     [Figure: data D is partitioned into sub-samples D1…Dt; DGF produces one feature grouping per sub-sample; instance co-occurrence is measured and hierarchical clustering yields the consensus feature groups]

     CGS: The Consensus Group Stable Feature Selection Algorithm
       for i = 1 to t do
           Construct training partition Di from D
           Run DGF on Di
       for every pair of features Xi and Xj in D
           Update Wi,j := frequency with which Xi and Xj appear together in the grouping results
       Create consensus groups CG1, CG2, …, CGL via hierarchical clustering of all features based on Wi,j
       for i = 1 to L do
           Obtain a representative feature Xi from CGi
           Measure the relevance of Xi and set it as the relevance of CGi
       Rank CG1, CG2, …, CGL and return the top k
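A sketch of the clustering, representative-selection, and ranking steps, taking a co-occurrence matrix W (as built on the previous slide) as input. The choice of representative (the member with the highest average within-group co-occurrence) and of relevance measure (absolute correlation with the class label) are assumptions for illustration, not necessarily those used by CGS:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cgs_rank_groups(W, X, y, n_groups=3, k=2):
    """Cluster features by co-occurrence W, then rank the consensus groups by relevance."""
    D = 1.0 - W                                              # co-occurrence -> distance
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    labels = fcluster(Z, t=n_groups, criterion="maxclust")   # consensus groups CG1..CGL
    ranked = []
    for g in np.unique(labels):
        members = np.where(labels == g)[0]
        rep = members[np.argmax(W[np.ix_(members, members)].mean(1))]  # representative feature
        relevance = abs(np.corrcoef(X[:, rep], y)[0, 1])     # group relevance := representative's
        ranked.append((relevance, list(members)))
    return sorted(ranked, reverse=True)[:k]                  # top-k consensus groups

# Toy demo: features {0,1} and {2,3} co-occur strongly; the label follows feature 4.
W = np.array([[1.0, 0.9, 0.0, 0.0, 0.1],
              [0.9, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.8, 0.1],
              [0.0, 0.0, 0.8, 1.0, 0.2],
              [0.1, 0.0, 0.1, 0.2, 1.0]])
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = (X[:, 4] > 0).astype(float)
print(cgs_rank_groups(W, X, y))
```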

  9. Experimental Setup
     Setting:
     • 10 random shuffles of the data, each with 10-fold cross-validation (9/10 folds training, 1/10 fold testing)
     • Results shown are averages across 10 folds × 10 shuffles
     Algorithms:
     • CGS – sub-samples t = 10
     • DRAGS [Yu, Ding, Loscalzo, KDD-08] – top dense-group-based feature selection
     • SVM-RFE [Guyon et al., ML-02] – recursively eliminates features based on the weights found after training an SVM
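A minimal sketch of this evaluation protocol in scikit-learn, using RepeatedKFold for the 10 shuffles × 10 folds and RFE with a linear SVM as a stand-in for SVM-RFE; the dataset, classifier, and number of selected features are placeholders, not those from the paper:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=50, n_informative=10, random_state=0)

# SVM-RFE stand-in: recursively drop the features with the smallest SVM weights.
model = make_pipeline(RFE(LinearSVC(dual=False), n_features_to_select=10, step=5),
                      LinearSVC(dual=False))

cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=0)   # 10 shuffles x 10-fold CV
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"mean accuracy over {len(scores)} splits: {scores.mean():.3f}")
```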

  10. Stability Results
      [Figures: stability of selected features; stability of selected groups]

  11. Accuracy Results

  12. Conclusion
      • Proposed the consensus group stable feature selection framework
        – Stable
        – Accurate
      • Future directions:
        – Apply different ensemble techniques
        – Incorporate new group-finding algorithms

  13. References
      Fern, X. Z. and Brodley, C. E. Random projection for high dimensional data clustering: a cluster ensemble approach. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), 186-192, 2003.
      Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. Gene selection for cancer classification using support vector machines. Machine Learning, 46:389-422, 2002.
      Yu, L., Ding, C., and Loscalzo, S. Stable feature selection via dense feature groups. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-08), 803-811, 2008.
