Variance Reduction for Stable Feature Selection


Variance Reduction for Stable Feature Selection

Department of Computer Science

10/27/10

Presenter: Yue Han

Advisor: Lei Yu



Outline

  • Introduction and Motivation

  • Background and Related Work

  • Preliminaries

    • Publications

    • Theoretical Framework

    • Empirical Framework: Margin-Based Instance Weighting

    • Empirical Study

  • Planned Tasks




Introduction and Motivation: Feature Selection Applications

[Figure: two application examples — (1) text categorization: a document-term matrix with documents D1 … DM as rows, terms T1 T2 … TN as columns, and class labels C such as Sports, Travel, Jobs; (2) image analysis and bioinformatics: samples described by very many features (pixels, or genes/proteins).]


Introduction and Motivation: Feature Selection from High-dimensional Data

p: # of features; n: # of samples. High-dimensional data: p >> n.

[Diagram: knowledge discovery on high-dimensional data — High-Dimensional Data → Feature Selection Algorithm (MRMR, SVM-RFE, Relief-F, F-statistics, etc.) → Low-Dimensional Data → Learning Models (classification, clustering, etc.).]

  • Curse of dimensionality:

  • effects on distance functions;

  • in optimization and learning;

  • in Bayesian statistics.

  • Feature selection:

  • alleviates the effect of the curse of dimensionality;

  • enhances generalization capability;

  • speeds up the learning process;

  • improves model interpretability.


Introduction and Motivation: Stability of Feature Selection

[Diagram: the same feature selection method applied to several variations of the training data produces several feature subsets — consistent or not?]

Stability of feature selection: the insensitivity of the result of a feature selection algorithm to variations in the training set.

The stability of feature selection was relatively neglected in the past and has recently attracted interest from researchers in data mining.

[Diagram: analogously, variations of the training data presented to the same learning algorithm produce different learning models.]

The stability of learning algorithms was first examined by Turney in 1995.
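To make the stability issue concrete, the following is a minimal sketch (not from the slides) that selects the top-k features by F-statistic on two bootstrap resamples of the same small high-dimensional dataset and compares the selected subsets with the Jaccard index; the dataset, selector, and parameters are illustrative.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n, p, k = 50, 2000, 20                       # few samples, many features
X = rng.normal(size=(n, p))
y = (X[:, :5].sum(axis=1) > 0).astype(int)   # only the first 5 features carry signal

def top_k_features(X, y, k):
    """Indices of the k highest-scoring features under an F-statistic filter."""
    selector = SelectKBest(f_classif, k=k).fit(X, y)
    return set(np.flatnonzero(selector.get_support()))

subsets = []
for seed in (1, 2):
    idx = np.random.default_rng(seed).integers(0, n, size=n)   # bootstrap resample
    subsets.append(top_k_features(X[idx], y[idx], k))

jaccard = len(subsets[0] & subsets[1]) / len(subsets[0] | subsets[1])
print(f"Jaccard similarity of the two selected subsets: {jaccard:.2f}")
```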


Introduction and Motivation: Motivation for Stable Feature Selection

[Figure: two subsamples D1 and D2 (samples × features) drawn from a dataset D.]

Given an unlimited sample size of D: the feature selection results from D1 and D2 are the same.

When the size of D is limited (n << p for high-dimensional data): the feature selection results from D1 and D2 are different.

Challenge: increasing the number of samples could be very costly or impractical.

  • Experts in biology and biomedicine are interested in:

  • not only the prediction accuracy but also the consistency of feature subsets;

  • validating stable genes or proteins that are less sensitive to variations in the training data;

  • biomarkers that explain the observed phenomena.




Background and Related Work: Feature Selection Methods

[Diagram: the general feature selection process — the original feature set feeds subset generation; each candidate subset is scored by subset evaluation (goodness of subset); if the stopping criterion is not met, generation continues, otherwise the result is validated. A minimal sketch of this loop follows this slide.]

  • Search strategies:

  • complete search;

  • sequential search;

  • random search.

  • Evaluation criteria:

  • filter model;

  • wrapper model;

  • embedded model.

  • Representative algorithms:

  • Relief, SFS, MDLM, etc.;

  • FSBC, ELSA, LVW, etc.;

  • BBHFS, Dash-Liu's, etc.
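A minimal sketch (not from the slides) of the generate-evaluate-stop loop above, using greedy sequential forward search with cross-validated accuracy as a wrapper-style goodness measure; the classifier and stopping size are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def sequential_forward_selection(X, y, max_features=5):
    """Greedy wrapper: repeatedly add the feature that most improves CV accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:    # stopping criterion
        def goodness(f):                                 # subset evaluation
            cols = selected + [f]
            clf = LogisticRegression(max_iter=1000)
            return cross_val_score(clf, X[:, cols], y, cv=3).mean()
        best = max(remaining, key=goodness)              # subset generation (next candidate)
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(sequential_forward_selection(X, y))
```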



Background and Related WorkStable Feature Selection

  • Comparison of Feature Selection Algorithms w.r.t. Stability

  • (Davis et al. Bioinformatics, vol. 22, 2006; Kalousis et al. KAIS, vol. 12, 2007)

  • Quantify stability in terms of the consistency of the selected subsets or feature weights;

  • Algorithms vary in stability but perform equally well for classification;

  • Choose the algorithm that is best in both stability and accuracy.

  • Bagging-based Ensemble Feature Selection

  • (Saeys et al. ECML07)

  • Different bootstrapped samples of the same training set;

  • Apply a conventional feature selection algorithm;

  • Aggregate the feature selection results.

  • Group-based Stable Feature Selection

  • (Yu et al. KDD08; Loscalzo et al. KDD09)

  • Explore the intrinsic feature correlations;

  • Identify groups of correlated features;

  • Select relevant feature groups.


Background and Related Work: Margin-based Feature Selection

Sample margin: how far an instance can travel before it hits the decision boundary.

Hypothesis margin: how far the hypothesis can travel before it hits an instance (the distance between the hypothesis and the opposite hypothesis of an instance).

Representative algorithms: Relief, Relief-F, G-flip, Simba, etc.

In this prior work, the margin is used for feature weighting or feature selection (a totally different use from the one in our study).
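As a concrete illustration (not from the slides), here is a minimal sketch of the Relief-style hypothesis margin of an instance, computed from its nearest hit (nearest neighbor of the same class) and nearest miss (nearest neighbor of the opposite class); the data and names are illustrative.

```python
import numpy as np

def hypothesis_margin(X, y, i):
    """theta(x_i) = 1/2 * (||x_i - nearmiss(x_i)|| - ||x_i - nearhit(x_i)||)."""
    dists = np.linalg.norm(X - X[i], axis=1)
    dists[i] = np.inf                              # exclude the instance itself
    nearhit = np.min(dists[y == y[i]])             # nearest same-class distance
    nearmiss = np.min(dists[y != y[i]])            # nearest opposite-class distance
    return 0.5 * (nearmiss - nearhit)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(2, 1, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
print([round(hypothesis_margin(X, y, i), 2) for i in range(3)])
```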





Publications

  • Yue Han and Lei Yu. An Empirical Study on Stability of Feature Selection Algorithms. Technical Report from Data Mining Research Laboratory, Binghamton University, 2009.

  • Yue Han and Lei Yu. Margin Based Sample Weighting for Stable Feature Selection. In Proceedings of the 11th International Conference on Web-Age Information Management (WAIM2010), pages 680-691, Jiuzhaigou, China, July 15-17, 2010.

  • Yue Han and Lei Yu. A Variance Reduction Framework for Stable Feature Selection. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM2010), Sydney, Australia, December 14-17, 2010, To Appear.

  • Lei Yu, Yue Han and Michael E. Berens. Stable Gene Selection from Microarray Data via Sample Weighting. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2010, Major Revision Under Review.




Theoretical Framework: Bias-Variance Decomposition of Feature Selection Error

Training data: $D$; data space: $\mathcal{D}$; feature selection result: $r(D)$; true feature selection result: $r^*$.

Expected loss (error): $\mathrm{Err} = E_D\big[\|r(D) - r^*\|^2\big]$

Bias: $\mathrm{Bias} = \|E_D[r(D)] - r^*\|^2$

Variance: $\mathrm{Var} = E_D\big[\|r(D) - E_D[r(D)]\|^2\big]$

Bias-variance decomposition of feature selection error: $\mathrm{Err} = \mathrm{Bias} + \mathrm{Var}$

  • Reveals the relationship between accuracy (opposite of loss) and stability (opposite of variance);

  • Suggests a better trade-off between the bias and variance of feature selection.
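A minimal sketch (not from the slides) of how these quantities can be estimated empirically: draw many training sets, compute a feature-scoring result r(D) for each, and compare against a chosen r*. A simple correlation-based score stands in for the actual feature selection algorithm, and the data generator and r* are illustrative; error = bias + variance then holds by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, runs = 20, 30, 200
r_star = np.zeros(p); r_star[:3] = 1.0         # chosen "true" feature relevance profile

def feature_scores(X, y):
    """Stand-in feature selection result r(D): absolute feature-class correlation."""
    return np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

results = []
for _ in range(runs):                           # resample training sets D from the data space
    X = rng.normal(size=(n, p))
    y = X @ r_star + rng.normal(scale=0.5, size=n)
    results.append(feature_scores(X, y))
R = np.array(results)

mean_r = R.mean(axis=0)
bias = np.sum((mean_r - r_star) ** 2)
variance = np.mean(np.sum((R - mean_r) ** 2, axis=1))
error = np.mean(np.sum((R - r_star) ** 2, axis=1))
print(f"bias={bias:.3f}  variance={variance:.3f}  error={error:.3f}  bias+var={bias + variance:.3f}")
```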


Theoretical Framework: Variance Reduction via Importance Sampling

Feature selection (weighting) → Monte Carlo estimator: a feature's relevance score computed from a finite training sample can be viewed as a Monte Carlo estimate of its expectation over the whole data space.

Variance of the Monte Carlo estimator: its impact factors are the feature selection algorithm and the sample size, and increasing the sample size is impractical and costly.

Importance sampling: a good importance sampling function $h(x)$ reduces the variance of the estimator.

Intuition behind $h(x)$:

  • draw more instances from important regions;

  • draw fewer instances from other regions.

Intuition behind instance weights:

  • increase the weights of instances from important regions;

  • decrease the weights of instances from other regions.

→ Instance weighting
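A minimal, generic sketch (not from the slides) of variance reduction via importance sampling for a Monte Carlo estimate of $E_p[f(x)]$: samples drawn from a proposal $h$ concentrated on the important region are reweighted by $p(x)/h(x)$, which keeps the estimate unbiased while lowering its variance; the densities and $f$ are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
f = lambda x: (x > 2.0).astype(float)          # "important region" is the rare tail x > 2
n, runs = 500, 200

plain, weighted = [], []
for _ in range(runs):
    x = rng.normal(0, 1, n)                    # plain Monte Carlo from the target N(0, 1)
    plain.append(f(x).mean())
    z = rng.normal(2.5, 1, n)                  # proposal h concentrated on the important region
    w = norm.pdf(z, 0, 1) / norm.pdf(z, 2.5, 1)   # importance weights p(z) / h(z)
    weighted.append(np.mean(f(z) * w))

print(f"true value       : {1 - norm.cdf(2.0):.5f}")
print(f"plain MC         : mean={np.mean(plain):.5f}  var={np.var(plain):.2e}")
print(f"importance sampl.: mean={np.mean(weighted):.5f}  var={np.var(weighted):.2e}")
```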




Empirical Framework: Overall Framework

  • Challenges:

  • how to produce weights for instances from the point of view of feature selection stability;

  • how to present weighted instances to conventional feature selection algorithms (see the sketch after this slide).

Margin-Based Instance Weighting for Stable Feature Selection
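On the second challenge, one common way to present weighted instances to a conventional selector (a sketch under assumptions, not necessarily the mechanism used in this work) is to pass per-instance weights to the underlying learner, e.g. a linear SVM whose coefficient magnitudes then rank the features; the weights below are placeholders for margin-based instance weights.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 100))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
instance_weights = rng.uniform(0.5, 1.5, size=len(y))    # placeholder for margin-based weights

# Weighted linear SVM: sample_weight scales each instance's contribution to the loss.
clf = SVC(kernel="linear").fit(X, y, sample_weight=instance_weights)
ranking = np.argsort(-np.abs(clf.coef_.ravel()))          # features ranked by |coefficient|
print("top 10 features:", ranking[:10])
```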


Empirical Framework: Margin Vector Feature Space

[Figure: each instance in the original space is mapped into the margin vector feature space via its nearest hit (nearest neighbor of the same class) and nearest miss (nearest neighbor of the opposite class).]

For each instance, the hypothesis margin, decomposed over the per-feature distances to its nearest hit and nearest miss, captures the local profile of feature relevance for all features at that instance.

  • Instances exhibit different profiles of feature relevance;

  • Instances influence feature selection results differently.
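A minimal sketch of one plausible construction (an assumption in the spirit of Relief/Simba margins, not necessarily the exact formula used in this work): each instance is mapped to a margin vector whose j-th component is |x_j − nearmiss(x)_j| − |x_j − nearhit(x)_j|, giving a local per-feature relevance profile.

```python
import numpy as np

def margin_vectors(X, y):
    """Map each instance to its hypothesis-margin vector (one component per feature)."""
    n = len(X)
    M = np.zeros_like(X, dtype=float)
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf
        hit = X[np.where(y == y[i], dists, np.inf).argmin()]    # nearest same-class instance
        miss = X[np.where(y != y[i], dists, np.inf).argmin()]   # nearest opposite-class instance
        M[i] = np.abs(X[i] - miss) - np.abs(X[i] - hit)         # per-feature margin components
    return M

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (15, 4)), rng.normal(1.5, 1, (15, 4))])
y = np.array([0] * 15 + [1] * 15)
print(margin_vectors(X, y)[:3].round(2))
```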


Empirical Framework: An Illustrative Example

[Figure: hypothesis-margin based feature space transformation — (a) original feature space; (b) margin vector feature space.]


Empirical Framework: Margin-Based Instance Weighting Algorithm

  • Review — variance reduction via importance sampling:

  • draw more instances from important regions;

  • draw fewer instances from other regions.

Each instance exhibits a different profile of feature relevance and influences the feature selection result differently.

Instance weighting by outlying degree in the margin vector feature space:

  • higher outlying degree → lower weight;

  • lower outlying degree → higher weight.

Outlying degree and weighting: see the sketch after this slide.
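A minimal sketch of one plausible instantiation (an assumption, not the exact formulas from this work): the outlying degree of an instance is its average distance to all other instances in the margin vector feature space, and weights decay with the outlying degree and are normalized; it reuses margin_vectors from the earlier sketch.

```python
import numpy as np
from scipy.spatial.distance import cdist

def instance_weights(M):
    """Weights from outlying degrees in the margin vector feature space M (n x d).

    Outlying degree = average distance to all other margin vectors;
    weight = exponential decay in the outlying degree, normalized to mean 1.
    (Illustrative choices, not the exact formulas from the thesis.)
    """
    D = cdist(M, M)                                   # pairwise distances between margin vectors
    outlying = D.sum(axis=1) / (len(M) - 1)           # average distance to the other instances
    w = np.exp(-outlying / outlying.mean())           # higher outlying degree -> lower weight
    return w * len(w) / w.sum()                       # normalize so the weights average to 1

# Example: reuse margin_vectors(X, y) from the previous sketch.
# M = margin_vectors(X, y); w = instance_weights(M)
```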


Empirical Framework: Algorithm Illustration

  • Time complexity analysis:

  • dominated by the instance weighting step;

  • efficient for high-dimensional data with a small sample size (n << d).




Empirical Study: Subset Stability Measures

[Diagram: the same feature selection method applied to variations of the training data yields several feature subsets — consistent or not?]

Stability measures for different forms of feature selection output:

  • Feature subsets: Jaccard index; nPOGR; SIMv.

  • Feature ranking: Spearman rank correlation coefficient.

  • Feature weighting: Pearson correlation coefficient.

Average pairwise similarity over $K$ selection results: $\bar{S} = \frac{2}{K(K-1)}\sum_{i<j}\mathrm{sim}(S_i, S_j)$.

Kuncheva index for two subsets of size $k$ sharing $r$ features, out of $n$ features in total: $\mathrm{KI} = \frac{rn - k^2}{k(n - k)}$, which corrects for the overlap expected by chance. A sketch follows this slide.
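A minimal sketch (not from the slides) of the Kuncheva index and its average pairwise form as defined above; the example subsets are illustrative.

```python
from itertools import combinations

def kuncheva_index(a, b, n_features):
    """Kuncheva index between two feature subsets of equal size k out of n_features."""
    a, b = set(a), set(b)
    k, r = len(a), len(a & b)
    return (r * n_features - k * k) / (k * (n_features - k))

def average_pairwise_stability(subsets, n_features):
    """Average Kuncheva index over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)

subsets = [{0, 1, 2, 5}, {0, 1, 3, 5}, {0, 2, 4, 5}]    # e.g. subsets from different training folds
print(round(average_pairwise_stability(subsets, n_features=100), 3))
```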


Empirical Study: Experiments on Synthetic Data

Synthetic data generation:

  • Feature values: drawn from two multivariate normal distributions; 100 feature groups with 10 features each, where each group's covariance matrix is a 10 × 10 matrix with 1 along the diagonal and 0.8 off the diagonal.

  • Class label: a weighted sum of all feature values with an optimal feature weight vector.

Training data: 500 training sets, each with 100 instances (50 from each of the two classes).

Test data: 5,000 instances (leave-one-out test data).

Method in comparison — SVM-RFE: recursively eliminate 10% of the remaining features in each iteration until 10 features remain.

Measures:

  • variance, bias, error;

  • subset stability (Kuncheva index);

  • classification accuracy (SVM).
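A minimal sketch (under assumptions) of generating one such training set: the slide describes feature values drawn from two multivariate normal distributions and labels from a weighted sum, while this sketch simplifies to a single normal with the stated block covariance and a thresholded weighted sum; the weight vector is illustrative.

```python
import numpy as np

n_groups, group_size = 100, 10
p = n_groups * group_size                        # 1000 features in total

# Block-diagonal covariance: within each 10-feature group, 1 on the diagonal and 0.8 off it.
block = np.full((group_size, group_size), 0.8)
np.fill_diagonal(block, 1.0)
cov = np.kron(np.eye(n_groups), block)

rng = np.random.default_rng(0)
w_opt = np.zeros(p); w_opt[:group_size] = 1.0    # illustrative "optimal" feature weight vector

def make_training_set(n=100):
    """One synthetic training set: features with block covariance, labels from a weighted sum."""
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = (X @ w_opt > 0).astype(int)              # class label thresholds the weighted sum
    return X, y

X, y = make_training_set()
print(X.shape, y.mean())                         # roughly balanced classes
```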


Empirical Study: Experiments on Synthetic Data

  • Observations:

  • error equals the sum of bias and variance for both versions of SVM-RFE;

  • error is dominated by bias during early iterations and by variance during later iterations;

  • IW SVM-RFE exhibits significantly lower bias, variance, and error than SVM-RFE when the number of remaining features approaches 50.


Empirical Study: Experiments on Synthetic Data

  • Conclusion — variance reduction via margin-based instance weighting yields:

  • a better bias-variance trade-off;

  • increased subset stability;

  • improved classification accuracy.


Empirical Study: Experiments on Real-world Data

Experiment setup:

  • Microarray data with 10-fold cross-validation (training data / test data splits).

  • Methods in comparison: SVM-RFE; Ensemble SVM-RFE; Instance Weighting SVM-RFE.

  • 20-Ensemble SVM-RFE: 20 bootstrapped versions of each training set, a feature subset selected from each, and the results aggregated into a single feature subset (see the sketch after this slide).

  • Measures: variance; subset stability; classification accuracies (KNN, SVM).
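A minimal sketch (not from the slides; details such as aggregation by selection frequency are assumptions) of bootstrap-ensemble SVM-RFE built from scikit-learn's RFE; the data and sizes are illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

def ensemble_svm_rfe(X, y, n_select=10, n_bootstrap=20, seed=0):
    """Run SVM-RFE on bootstrapped training sets and aggregate by selection frequency."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_bootstrap):
        idx = rng.integers(0, len(y), size=len(y))                 # bootstrap resample
        rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_select, step=0.1)
        rfe.fit(X[idx], y[idx])
        counts += rfe.support_                                     # votes for selected features
    return np.argsort(-counts)[:n_select]                          # most frequently selected

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 200))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
print(ensemble_svm_rfe(X, y))
```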


Empirical Study: Experiments on Real-world Data

  • Observations:

  • the compared methods are non-discriminative during early iterations;

  • SVM-RFE increases sharply as the number of remaining features approaches 10;

  • IW SVM-RFE shows a significantly slower rate of increase.

Note: 40 iterations, starting from about 1,000 features until 10 features remain.


Empirical Study: Experiments on Real-world Data

  • Observations:

  • both the ensemble and instance weighting approaches improve stability consistently;

  • the improvement from the ensemble approach is not as significant as that from instance weighting;

  • as the number of selected features increases, the stability score decreases because of the larger correction factor.


Empirical Study: Experiments on Real-world Data

  • Conclusions — instance weighting:

  • improves the stability of feature selection without sacrificing prediction accuracy;

  • performs much better than the ensemble approach and is more efficient;

  • leads to significantly increased stability at a slight extra cost in time.




Planned Tasks: Overall Framework

[Diagram: the theoretical framework of feature selection stability supports an empirical instance weighting framework (margin-based instance weighting, an iterative approach, and state-of-the-art weighting schemes), to be evaluated with representative feature selection algorithms (SVM-RFE, Relief-F, F-statistics, HHSVM) on various real-world data sets (gene data, text data), and used to study the relationship between feature selection stability and classification accuracy.]


Planned Tasks: Listed Tasks

A. Extensive Study on the Instance Weighting Framework

  A1. Extension to Various Feature Selection Algorithms

  A2. Study on Datasets from Different Domains

B. Development of Algorithms under the Instance Weighting Framework

  B1. Development of Instance Weighting Schemes

  B2. Iterative Approach for Margin-Based Instance Weighting

C. Investigation of the Relationship between Stable Feature Selection and Classification Accuracy

  C1. How Bias-Variance Properties of Feature Selection Affect Classification Accuracy

  C2. Study on Various Factors for the Stability of Feature Selection



Thank you

and

Questions?

