
Forward Semi-Supervised Feature Selection



Presentation Transcript


  1. Forward Semi-Supervised Feature Selection Jiangtao Ren, Zhengyuan Qiu, Wei Fan, Hong Cheng, and Philip S. Yu

  2. Feature Selection • Challenges of high-dimensional data • Curse of dimensionality • Noise • Objectives of feature selection • Improving the performance of the predictors • Providing more cost-effective predictors • Better understanding of the underlying process that generated the data
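The cost of noisy, high-dimensional data can be seen in a small synthetic sketch (not from the slides; the nearest-centroid classifier, the Gaussian data, and all names here are illustrative assumptions): as irrelevant features are added around one informative feature, test accuracy drops.

```python
import numpy as np

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    # Predict the class whose training-set mean (centroid) is closest.
    classes = np.unique(y_tr)
    centroids = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y_te).mean())

rng = np.random.default_rng(0)
n_tr, n_te = 20, 200                    # examples per class
y_tr = np.repeat([0, 1], n_tr)
y_te = np.repeat([0, 1], n_te)
# One informative feature: class means 0 and 3, unit variance.
sig_tr = np.concatenate([rng.normal(0, 1, n_tr), rng.normal(3, 1, n_tr)])[:, None]
sig_te = np.concatenate([rng.normal(0, 1, n_te), rng.normal(3, 1, n_te)])[:, None]

accs = []
for n_noise in (0, 50, 500):
    # Pad with irrelevant N(0, 1) features that carry no class information.
    X_tr = np.hstack([sig_tr, rng.normal(0, 1, (2 * n_tr, n_noise))])
    X_te = np.hstack([sig_te, rng.normal(0, 1, (2 * n_te, n_noise))])
    accs.append(nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te))
    print(n_noise, "noise features -> accuracy", accs[-1])
```

Selecting only the informative feature would undo this degradation, which is exactly the objective stated above.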

  3. Supervised / unsupervised learning • Supervised learning • Uses labeled data only • Unsupervised learning • Uses unlabeled data only

  4. Challenges of traditional feature selection methods • Most feature selection methods are supervised • Lack of labeled data • Class labels are obtained manually • Class labels are expensive to obtain • Data bias • Challenges: • The training dataset may not reflect the distribution of the real data • A model constructed on the training set may not be suitable for unseen data

  5. Abundance of unlabeled data • Easy to obtain • Requires no manual labeling • Can reflect the distribution of the real data

  6. Then… How to use unlabeled data effectively?

  7. Forward Semi-Supervised Feature Selection • Basic idea • Randomly select unlabeled data with predicted labels • Form a new training set • Perform feature selection on the new training set • Repeat for several iterations • Add the most frequently selected feature to the result feature subset

  8. Forward Semi-Supervised Feature Selection • Flow of one iteration: train the classifier on the current feature subset and predict labels for the unlabeled data • Randomly select unlabeled examples with their predicted labels • Form the new training set • Run SFFS on it to select the best features • Across iterations, add the most frequently selected feature to the feature subset
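The loop above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: a nearest-centroid classifier stands in for the classifiers the paper uses, resubstitution accuracy on the enlarged training set stands in for the wrapper criterion inside SFFS, and every function name and parameter here is hypothetical.

```python
import numpy as np

def centroid_predict(X_tr, y_tr, X_te, feats):
    # Train a nearest-centroid classifier restricted to the given feature
    # subset and predict labels for X_te (stand-in for any base classifier).
    Xl, Xt = X_tr[:, feats], X_te[:, feats]
    classes = np.unique(y_tr)
    cent = np.array([Xl[y_tr == c].mean(axis=0) for c in classes])
    d = ((Xt[:, None, :] - cent[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def forward_semi_sup_fs(X_lab, y_lab, X_unlab, n_select, n_iter=5, frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    selected, remaining = [], list(range(X_lab.shape[1]))
    for _ in range(n_select):
        votes = {}
        for _ in range(n_iter):
            # Predict labels for a random sample of the unlabeled data using a
            # classifier trained on the currently selected features (all
            # features before the first is chosen -- an assumption here).
            idx = rng.choice(len(X_unlab), size=max(1, int(frac * len(X_unlab))),
                             replace=False)
            feats = selected if selected else remaining
            y_pred = centroid_predict(X_lab, y_lab, X_unlab[idx], feats)
            # Form the new training set: labeled data plus the sampled
            # unlabeled data with predicted labels.
            X_new = np.vstack([X_lab, X_unlab[idx]])
            y_new = np.concatenate([y_lab, y_pred])
            # One forward-selection step: the candidate whose addition gives
            # the best accuracy on the new training set gets a vote.
            best_f = max(remaining, key=lambda f: float(
                (centroid_predict(X_new, y_new, X_new, selected + [f]) == y_new).mean()))
            votes[best_f] = votes.get(best_f, 0) + 1
        # Add the most frequently selected feature to the result subset.
        winner = max(votes, key=votes.get)
        selected.append(winner)
        remaining.remove(winner)
    return selected
```

Called as `forward_semi_sup_fs(X_lab, y_lab, X_unlab, n_select=k)`, it returns the indices of the k chosen features; the voting over several random samples is what makes the selection robust to errors in the predicted labels.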

  9. Forward semi-supervised feature selection

  10. Experiment • Datasets • UCI • Classifiers • NaiveBayes, NNge, and k-NN • Comparison • FULL, SFFS, and SLS. SLS: Z. Zhao and H. Liu, "Semi-supervised Feature Selection via Spectral Analysis", SIAM International Conference on Data Mining (SDM-07), April 26–28, 2007, Minneapolis, Minnesota.

  11. Empirical Results

  12. Conclusion • The proposed algorithm works in an iterative procedure • Unlabeled examples receive labels from the classifier constructed on the currently selected feature subset • A joint dataset is formed from the labeled data and randomly selected unlabeled data with predicted labels • Experimental results show that the proposed approach can obtain higher accuracy than other supervised and semi-supervised feature selection algorithms in many cases
