RESULTS OF THE NIPS 2006

MODEL SELECTION GAME

Isabelle Guyon, Amir Saffari, Gideon Dror,

Gavin Cawley, Olivier Guyon,

and many other volunteers, see http://www.agnostic.inf.ethz.ch/credits.php

Part I

INTRODUCTION

Model selection
  • Selecting models (neural net, decision tree, SVM, …).
  • Selecting hyperparameters (number of hidden units, weight decay/ridge, kernel parameters, …); a sketch follows this list.
  • Selecting variables or features (dimensionality reduction).
  • Selecting patterns (data cleaning, data reduction, e.g. by clustering).
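To make the hyperparameter bullet concrete, here is a minimal MATLAB sketch of selection by validation error, written in the style of the CLOP tutorial in Part II; the shrinkage grid, the Xtrain/Xvalid split, and the vresu.X/vresu.Y output convention are illustrative assumptions, not part of the challenge protocol.

% Hedged sketch: pick the kernel ridge shrinkage with the lowest validation error.
% kridge/data/train/test are CLOP calls (see the CLOP tutorial in Part II);
% the grid and the Xtrain/Xvalid split are made up for illustration.
shrinkages = [0.001 0.01 0.1 1];
best_err = inf;
for s = shrinkages
    model = kridge({'degree=3', sprintf('shrinkage=%g', s)});
    [resu, model] = train(model, data(Xtrain, Ytrain));
    vresu = test(model, data(Xvalid, Yvalid));
    err = mean(sign(vresu.X) ~= vresu.Y);   % assumes outputs in vresu.X, targets in vresu.Y
    if err < best_err, best_err = err; best_s = s; end
end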
Performance prediction challenge

How good are you at predicting

how good you are?

  • Practically important in pilot studies.
  • Good performance predictions render model selection trivial.
Model Selection Game

Find which model works best

in a well controlled environment.

  • A given “sandbox”: the CLOP Matlab® toolbox.
  • Focus only on devising model selection strategy.
  • Same datasets as the performance prediction challenge, but “reshuffled”.
  • Two $500 prizes offered.
Agnostic Learning vs. Prior Knowledge challenge

When everything else fails,

ask for additional domain knowledge…

  • Two tracks:
    • Agnostic learning: Preprocessed datasets in a nice “feature-based” representation, but no knowledge about the identity of the features.
    • Prior knowledge: Raw data, sometimes not in a feature-based representation. Information given about the nature and structure of the data.
Game rules
  • Start date: October 1st, 2006.
  • End date: December 1st, 2006.
  • Duration: 2 months.
  • Submit in Agnostic track only.
  • Optionally use CLOP or Spider.
  • Five last complete entries of each entrant ranked:
    • Total ALvsPK challenge entrants: 22.
    • Total ALvsPK development entries: 546.
    • Number of game ranked participants: 10.
    • Number of game ranked submissions: 39.
Datasets

Dataset  Domain          Type           Features  Training ex.  Validation ex.  Test ex.
ADA      Marketing       Dense          48        4147          415             41471
GINA     Digits          Dense          970       3153          315             31532
HIVA     Drug discovery  Dense          1617      3845          384             38449
NOVA     Text classif.   Sparse binary  16969     1754          175             17537
SYLVA    Ecology         Dense          216       13086         1308            130858

http://www.agnostic.inf.ethz.ch

Agnostic track on Dec. 1st 2006
  • Yellow: used a CLOP model
  • CLOP prize winner: Juha Reunanen (both ave. rank and ave. BER)
  • Best ave. BER still held by Reference (Gavin Cawley) with the_bad.
Part II

PROTOCOL and SCORING

Protocol
  • Data split: training/validation/test.
  • Data proportions: 10/1/100.
  • Online feed-back on validation data.
  • Validation label release: one month before the end of the challenge (not yet released).
  • Final ranking on test data using the five last complete submissions for each entrant.
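For concreteness, a hedged MATLAB sketch of a 10/1/100 split (the random permutation and index arithmetic are illustrative; the organizers fixed the actual challenge splits):

% Hedged sketch of a 10/1/100 training/validation/test split.
n = size(X, 1);
u = n / 111;                                 % one proportion "unit"
perm = randperm(n);
itrain = perm(1 : round(10*u));              % 10 units for training
ivalid = perm(round(10*u)+1 : round(11*u));  % 1 unit for validation
itest  = perm(round(11*u)+1 : end);          % 100 units for testing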
Performance metrics
  • Balanced Error Rate (BER): average of error rates of positive class and negative class.
  • Area Under the ROC Curve (AUC).
  • Guess error (for the performance prediction challenge only):

dBER = abs(testBER – guessedBER)
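A minimal MATLAB sketch of these metrics (assuming labels and predictions in {-1,+1}; y, yhat, f, testBER and guessedBER are illustrative variable names, and the AUC formula ignores ties):

% Balanced Error Rate: average of the two per-class error rates.
err_pos = mean(yhat(y == +1) ~= +1);   % error rate on the positive class
err_neg = mean(yhat(y == -1) ~= -1);   % error rate on the negative class
BER = (err_pos + err_neg) / 2;

% AUC via the rank-sum (Wilcoxon/Mann-Whitney) identity, for real-valued scores f.
[~, order] = sort(f);
ranks = zeros(size(f)); ranks(order) = 1:numel(f);
npos = sum(y == +1); nneg = sum(y == -1);
AUC = (sum(ranks(y == +1)) - npos*(npos+1)/2) / (npos * nneg);

% Guess error (performance prediction challenge only):
dBER = abs(testBER - guessedBER);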

CLOP
  • CLOP = Challenge Learning Object Package.
  • Based on the Spider package developed at the Max Planck Institute.
  • Two basic abstractions:
    • Data object
    • Model object

http://www.agnostic.inf.ethz.ch/models.php

CLOP tutorial

At the Matlab prompt:

D = data(X, Y);                          % wrap inputs X and labels Y in a data object
hyper = {'degree=3', 'shrinkage=0.1'};   % hyperparameters as a cell array of strings
model = kridge(hyper);                   % kernel ridge regression model
[resu, model] = train(model, D);         % train; returns results and the trained model
tresu = test(model, testD);              % apply the trained model to test data
model = chain({standardize, kridge(hyper)});  % chain preprocessing and classifier
Model grouping

for k = 1:10
    base_model{k} = chain({standardize, naive});  % ten identical standardize + naive Bayes chains
end
my_model = ensemble(base_model);                  % combine the base models into an ensemble

Part III

RESULT ANALYSIS

What did we expect?
  • Learn about new competitive machine learning techniques.
  • Identify competitive methods of performance prediction, model selection, and ensemble learning (theory put into practice).
  • Drive research in the direction of refining such methods (on-going benchmark).
Method comparison (PPC)

Agnostic track: no significant improvement so far.

[Figure: guess error dBER vs. test BER for the performance prediction challenge methods.]

LS-SVM

Gavin Cawley, July 2006

LogitBoost

Roman Lutz, July 2006

CLOP models (best entrant)

Juha Reunanen, cross-indexing-7

Dataset  CLOP models selected
ADA      2*{sns,std,norm,gentleboost(neural),bias}; 2*{std,norm,gentleboost(kridge),bias}; 1*{rf,bias}
GINA     6*{std,gs,svc(degree=1)}; 3*{std,svc(degree=2)}
HIVA     3*{norm,svc(degree=1),bias}
NOVA     5*{norm,gentleboost(kridge),bias}
SYLVA    4*{std,norm,gentleboost(neural),bias}; 4*{std,neural}; 1*{rf,bias}

sns = shift’n’scale, std = standardize, norm = normalize (some details of hyperparameters not shown)
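As an illustration, the HIVA entry above (3*{norm,svc(degree=1),bias}) could be rebuilt in CLOP with the ensemble idiom from the “Model grouping” slide; this is a hedged sketch, since the exact hyperparameters of the winning models are not shown on this slide.

% Hedged reconstruction of 3*{norm,svc(degree=1),bias} for HIVA.
for k = 1:3
    base_model{k} = chain({normalize, svc({'degree=1'}), bias});  % normalize + linear SVC + bias post-processor
end
my_model = ensemble(base_model);  % combine the three chains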

CLOP models (2nd best entrant)

Hugo Jair Escalante Balderas, BRun2311062

Dataset  CLOP models selected
ADA      {sns, std, norm, neural(units=5), bias}
GINA     {norm, svc(degree=5, shrinkage=0.01), bias}
HIVA     {std, norm, gentleboost(kridge), bias}
NOVA     {norm, gentleboost(neural), bias}
SYLVA    {std, norm, neural(units=1), bias}

sns = shift’n’scale, std = standardize, norm = normalize (some details of hyperparameters not shown)

Note: the entry Boosting_1_001_x900 gave better results, but was older.

Danger of overfitting (PPC)

[Figure: validation BER (dashed line) and test BER (full line) vs. time in days (0 to 160) for ADA, GINA, HIVA, NOVA and SYLVA; BER axis from 0 to 0.5.]

Two best CLOP entrants (game)

[Figure: average test BER over time for Juha Reunanen and H. Jair Escalante.]

Statistically significant difference for 3/5 datasets.

Top ranking methods
  • Performance prediction:
    • CV with many splits, 90% train / 10% validation (see the sketch after this list)
    • Nested CV loops
  • Model selection:
    • Performance prediction challenge:
      • Use of a single model family
      • Regularized risk / Bayesian priors
      • Ensemble methods
      • Nested CV loops, computationally efficient with virtual leave-one-out (VLOO)
    • Model selection game:
      • Cross-indexing
      • Particle swarm
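To make the first bullet concrete, here is a hedged MATLAB sketch of performance prediction by repeated 90%/10% splits, reusing the CLOP calls from the tutorial; the number of splits, the choice of model, and the vresu.X/vresu.Y output convention are assumptions, not any entrant's actual code.

% Hedged sketch: guess the BER by repeated 90%/10% train/validation splits.
n_splits = 100;
n = size(X, 1);
bers = zeros(n_splits, 1);
for s = 1:n_splits
    perm = randperm(n);
    ntr = floor(0.9 * n);
    itr = perm(1:ntr); iva = perm(ntr+1:end);
    model = chain({standardize, kridge({'degree=3', 'shrinkage=0.1'})});
    [resu, model] = train(model, data(X(itr,:), Y(itr)));
    vresu = test(model, data(X(iva,:), Y(iva)));
    yhat = sign(vresu.X); ytrue = vresu.Y;   % assumed output/target fields
    ep = mean(yhat(ytrue == +1) ~= +1);
    en = mean(yhat(ytrue == -1) ~= -1);
    bers(s) = (ep + en) / 2;                 % balanced error rate on this split
end
guessedBER = mean(bers);                     % the performance prediction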
Part IV

COMPETE NOW

in the

PRIOR KNOWLEDGE TRACK

ADA

ADA is the marketing database

  • Task: Discover high-revenue people from census data. Two-class problem.
  • Source: Census Bureau, “Adult” database from the UCI machine-learning repository.
  • Features: 14 original attributes, including age, workclass, education, marital status, occupation, and native country. Continuous, binary, and categorical features.
GINA

GINA is the digit database

  • Task: Handwritten digit recognition. Separate the odd from the even digits. Two-class problem with heterogeneous classes.
  • Source: MNIST database formatted by LeCun and Cortes.
  • Features: 28x28 pixel map.
HIVA

HIVA is the HIV database

  • Task: Find compounds active against the HIV/AIDS infection. Reduced to a two-class problem (active vs. inactive), but the original labels are provided (active, moderately active, and inactive).
  • Data source: National Cancer Institute.
  • Data representation: the compounds are represented by their 3D molecular structure.
NOVA

Subject: Re: Goalie masks
Lines: 21

Tom Barrasso wore a great mask, one time, last season. He unveiled it at a game in Boston. It was all black, with Pgh city scenes on it. The “Golden Triangle” graced the top, along with a steel mill on one side and the Civic Arena on the other. On the back of the helmet was the old Pens’ logo, the current (at the time) Pens logo, and a space for the “new” logo. A great mask done in by a goalie’s superstition.

Lori

NOVA is the text classification database

  • Task: Classify newsgroup emails into politics or religion vs. other topics.
  • Source: the 20-Newsgroup dataset from the UCI machine-learning repository.
  • Data representation: raw text, with an estimated vocabulary of 17,000 words.
SYLVA

SYLVA is the ecology database

  • Task: Classify forest cover types into Ponderosa pine vs. everything else.
  • Source: US Forest Service (USFS).
  • Data representation: forest cover type for 30 x 30 meter cells, encoded with 108 features (elevation, hill shade, wilderness type, soil type, etc.).
How to enter?
  • Enter results on any dataset in either track until March 1st, 2007 at http://www.agnostic.inf.ethz.ch.
  • Only “complete” entries (covering all 5 datasets) will be ranked; the 5 last ones count.
  • Seven prizes:
    • Best overall agnostic entry.
    • Best overall prior knowledge entry.
    • Best prior knowledge result in each dataset (5 prizes).
    • Best paper.
Conclusions
  • Lower participation volume than in previous challenges:
    • Higher entry level.
    • Other on-going competitions.
  • Top methods in the agnostic track same as before:
    • LS-SVMs and boosted logistic trees.
  • Top-ranking entries closely followed by CLOP entries, showing great advances in model selection.
  • To do: upgrade CLOP with LS-SVMs and LogitBoost.
Open problems

Bridge the gap between theory and practice…

  • What are the best estimators of the variance of CV?
  • What should k be in k-fold?
  • Are other cross-validation methods better than k-fold (e.g. bootstrap, 5x2 CV)?
  • Are there better “hybrid” methods?
  • What search strategies are best?
  • More than 2 levels of inference?