Feature Selection and Bioinformatics Applications

Isabelle Guyon

Part I

INTRODUCTION

Objectives

[Diagram: Input x → Predictor f(x) → Output y.]

  • Reduce the number of features as much as possible without significantly degrading prediction performance.
  • Possibly improve prediction performance.
  • Gain insight.
Applications

[Chart: application domains arranged by number of training examples versus number of inputs, both on log scales from 10 to 10^5: High Energy Physics, Market Analysis, OCR / HWR, Machine Vision, Text Categorization, System Diagnosis, Genomics, Proteomics, Bioinformatics.]

This talk:
  • Simple is beautiful, but some (moderate) sophistication is needed.
  • “Classical statistics” is pessimistic: it advocates the simplest methods to overcome the curse of dimensionality.
  • Modern statistical methods from soft computing and machine learning provide the necessary additional sophistication while still defeating the curse of dimensionality.
Part II

PROBLEM STATEMENT

Correlation Analysis

Labels {yk} and feature values {xik}, k = 1…num_patients.

[Heatmap: top 25 positively correlated features (genes) and top 25 negatively correlated features (genes), annotated with the per-class means (m-, m+), the standard deviations (s-, s+), and the flipped labels {-yk}.]

38 training ex. (27 ALL, 11 AML); 34 test ex. (20 ALL, 14 AML).

Golub et al., Science, Vol. 286, 15 Oct. 1999.
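
A minimal sketch of this kind of univariate ranking (the NumPy setup is an assumption, not from the slides: an expression matrix X of shape (num_patients, num_genes), labels y in {-1, +1}, and a signal-to-noise score built from the per-class means and standard deviations shown above):

    import numpy as np

    def signal_to_noise_scores(X, y):
        """Score each gene by (m+ - m-) / (s+ + s-), using the per-class means and standard deviations."""
        pos, neg = X[y == +1], X[y == -1]
        mu_p, mu_n = pos.mean(axis=0), neg.mean(axis=0)
        s_p, s_n = pos.std(axis=0), neg.std(axis=0)
        return (mu_p - mu_n) / (s_p + s_n + 1e-12)  # small epsilon guards against zero spread

    def top_correlated_genes(X, y, k=25):
        """Indices of the k most positively and the k most negatively scored genes."""
        order = np.argsort(signal_to_noise_scores(X, y))
        return order[-k:][::-1], order[:k]

The two returned index lists correspond to the two blocks of 25 genes in the heatmap above.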

Yes, but ...

[Figure: feature distributions annotated with per-class means (m-, m+) and standard deviations (s-, s+).]

I.I.D. Features

[Scatter plot: two independent, identically distributed features.]

I.I.D. Features

[Scatter plot: two independent, identically distributed features, with the class means m- and m+ marked.]

Smaller Win

[Scatter plot: a pair of features giving a modest improvement in class separation over either feature alone.]

Bigger Win

[Scatter plot: a pair of features giving a large improvement in class separation over either feature alone.]

Explanation:

F1: The peak of interest

F2: The best local estimate of the baseline.
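
A toy numerical illustration of this explanation (the numbers and the NumPy setup are invented for the sketch): the baseline estimate F2 is useless on its own and F1 is weak on its own, but subtracting F2 from F1 cancels the shared baseline and exposes the peak.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    baseline = rng.normal(0.0, 2.0, n)                       # shared nuisance (baseline drift)
    label = rng.integers(0, 2, n)                            # 0 = control, 1 = case
    f2 = baseline                                            # best local estimate of the baseline
    f1 = baseline + 0.3 * label + rng.normal(0.0, 0.05, n)   # peak of interest riding on the baseline

    print(np.corrcoef(f1, label)[0, 1])       # weak: the baseline variance swamps the signal
    print(np.corrcoef(f2, label)[0, 1])       # near zero: the baseline alone is "useless"
    print(np.corrcoef(f1 - f2, label)[0, 1])  # strong: the difference cancels the baseline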

Two “Useless” Features

[Scatter plot: two features that are individually uninformative but separate the classes when taken together.]

Axis projections do not help in finding good features.

Higher dimension problem

Even two-dimensional projections may not help in finding good features.

Part III

ALGORITHMS

Main Goal

[Diagram: Predictor f(x) mapping input features to an output.]

Main goal:

- Rank subsets of useful features.

Sub-goals:

- Eliminate useless features (distracters).

- Rank useful features.

- Eliminate redundant features.

Filters and Wrappers

  • Main goal: rank subsets of useful features.
  • Danger of overfitting: greedy search often works better.

[Diagram, filter: All features → Filter → Feature subset → Predictor.]

[Diagram, wrapper: All features → multiple feature subsets, each evaluated by the Predictor.]
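
A rough sketch of the two strategies; score_fn, fit_predictor, error, and candidate_subsets are placeholders for whatever scoring function, learning machine, error estimate, and subset search are actually used:

    import numpy as np

    def filter_selection(X, y, score_fn, k):
        """Filter: rank all features once with a predictor-independent score, keep the top k."""
        scores = score_fn(X, y)                       # e.g. a univariate correlation score
        return np.argsort(np.abs(scores))[::-1][:k]

    def wrapper_selection(X, y, candidate_subsets, fit_predictor, error):
        """Wrapper: score each candidate feature subset with the predictor itself."""
        best_subset, best_err = None, np.inf
        for subset in candidate_subsets:              # e.g. subsets proposed by a greedy search
            model = fit_predictor(X[:, subset], y)
            err = error(model, X[:, subset], y)       # ideally a cross-validated estimate
            if err < best_err:
                best_subset, best_err = subset, err
        return best_subset

Restricting candidate_subsets to those proposed by a greedy search is one way to limit the overfitting risk noted above.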

Nested Subset Methods

Nested subset methods perform a greedy search:

- At each step, add or remove a single feature so as to best improve (or least degrade) the cost function (both variants are sketched below).

- Backward elimination:

Start with all features and progressively remove them (never add). Example: RFE (Guyon, Weston, et al., 2002).

- Forward selection:

Start with an empty set and progressively add features (never remove). Example: Gram-Schmidt orthogonalization (Stoppiglia et al., 2003; Rivals & Personnaz, 2003).
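
The two greedy variants share the same skeleton. In this sketch, cost_J stands in for the cost function J evaluated on a candidate feature set (lower taken to be better) and features is any sequence of feature indices:

    def backward_elimination(features, cost_J, target_size):
        """Start from all features; at each step remove the single feature whose
        removal leaves the lowest cost J."""
        selected = list(features)
        while len(selected) > target_size:
            drop = min(selected, key=lambda f: cost_J([g for g in selected if g != f]))
            selected.remove(drop)
        return selected

    def forward_selection(features, cost_J, target_size):
        """Start from the empty set; at each step add the single feature that
        yields the lowest cost J."""
        selected, remaining = [], list(features)
        while len(selected) < target_size and remaining:
            add = min(remaining, key=lambda f: cost_J(selected + [f]))
            selected.append(add)
            remaining.remove(add)
        return selected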

Backward elimination: RFE

Improve (or least degrade) the cost function J(t):

  • Exact or approximate difference calculation ΔJ = J(feat+1) - J(feat).
  • RFE with a linear predictor f(x) = w.x + b: eliminate the feature with the smallest wi² (Guyon, Weston, et al., 2002).
  • Zero-norm / multiplicative updates (MU): rescale the inputs by |wi| at each iteration (Weston, Elisseeff, et al., 2003).
  • Non-linear RFE and non-linear MU: estimate (ΔJ)i ≈ αᵀH(i)α.
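
A minimal sketch of linear RFE as described above; fit_linear is a placeholder for whatever linear predictor is used (the published method uses a linear SVM) and is assumed to return the weight vector w and the bias b:

    import numpy as np

    def rfe_ranking(X, y, fit_linear, n_keep=1):
        """Recursive Feature Elimination with a linear predictor f(x) = w.x + b:
        refit on the surviving features and drop the one with the smallest wi^2."""
        surviving = np.arange(X.shape[1])
        while len(surviving) > n_keep:
            w, b = fit_linear(X[:, surviving], y)   # e.g. a linear SVM
            weakest = int(np.argmin(w ** 2))        # feature with the smallest squared weight
            surviving = np.delete(surviving, weakest)
        return surviving

In practice several features are often removed per iteration to save retraining; the one-at-a-time loop is kept here for clarity.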
Forward selection: Gram-Schmidt

Feature ranking in the context of others:

  • Vanilla (linear) GS: at every iteration, project the remaining features onto the null space of the features already selected, then select the feature most correlated with the target (sketched below).
  • Relief (Kira and Rendell, 1992):
  • GS-Relief combination (Guyon, 2003).
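
A minimal sketch of the vanilla linear GS step described above, assuming centered NumPy arrays X (patterns by features) and y:

    import numpy as np

    def gram_schmidt_selection(X, y, n_select):
        """Orthogonal forward selection: pick the feature most correlated with the
        target, then project the target and the remaining features onto the
        null space (orthogonal complement) of the chosen feature."""
        X = X.astype(float).copy()
        y = y.astype(float).copy()
        selected = []
        for _ in range(n_select):
            norms = np.linalg.norm(X, axis=0) + 1e-12
            corr = np.abs(X.T @ y) / norms          # correlation with the (deflated) target
            if selected:
                corr[selected] = -np.inf            # never pick the same feature twice
            best = int(np.argmax(corr))
            selected.append(best)
            u = X[:, best] / norms[best]            # unit vector along the chosen feature
            X = X - np.outer(u, u @ X)              # project remaining features onto its null space
            y = y - u * (u @ y)                     # deflate the target the same way
        return selected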
Part IV

EXPERIMENTS

Mass Spectrometry Experiments

In collaboration with Biospect Inc., 2003

Data from Cancer Research, Adam et al., 2002.

[Diagram: time-of-flight (TOF) mass spectrometer.]

- EVMS prostate cancer data: 326 samples (167 cancer, 159 control).

- Preprocessing including m/z range 200-10000 and baseline removal.

- Split into 3 equal parts; 3 experiments, each training on 2/3 of the data and testing on the remaining 1/3 (sketched below).

- Forty-four methods tried.
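
A sketch of that evaluation protocol; run_method is a placeholder for any one of the forty-four feature selection / classification pipelines and is assumed to return a test error:

    import numpy as np

    def three_split_evaluation(X, y, run_method, seed=0):
        """Split the samples into 3 equal parts and run 3 experiments,
        each training on 2/3 of the data and testing on the held-out 1/3."""
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(y)), 3)
        errors = []
        for i in range(3):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(3) if j != i])
            errors.append(run_method(X[train], y[train], X[test], y[test]))
        return float(np.mean(errors)), errors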

Method Comparison: 100 Features

...

Non-linear multivariate > Linear multivariate > Linear univariate

Method Comparison: 7 Features

...

Non-linear multivariate > Linear multivariate > Linear univariate

Part V

CONCLUSION

Experimental Results

In spite of the risk of overfitting ...

  • Subset selection methods can outperform single feature ranking by correlation with the target.
  • Non-linear feature selection can outperform linear feature selection.


… in prediction performance and number of features.

Which method works best?

See the results of the NIPS 2003 competition.

Presentation on December 19th.

See also:

JMLR special issue:

www.jmlr.org/papers/special/feature.html

I. Guyon and A. Elisseeff, editors, March 2003.

Workshop website:

www.clopinet.com/isabelle/Projects/NIPS2003

Acknowledgements: Masoud Nikravesh
