
MILESTONE RESULTS Mar. 1st, 2007

Agnostic Learning vs. Prior Knowledge challenge

Isabelle Guyon, Amir Saffari, Gideon Dror, Gavin Cawley, Olivier Guyon, and many other volunteers, see http://www.agnostic.inf.ethz.ch/credits.php

Thanks



Agnostic Learning vs. Prior Knowledge challenge

When everything else fails, ask for additional domain knowledge…

  • Two tracks:

    • Agnostic learning: Preprocessed datasets in a nice “feature-based” representation, but no knowledge about the identity of the features.

    • Prior knowledge: Raw data, sometimes not in a feature-based representation. Information given about the nature and structure of the data.



Part I

DATASETS



Datasets

| Dataset | Domain | Type | Features | Training examples | Validation examples | Test examples |
|---------|--------|------|----------|-------------------|---------------------|---------------|
| ADA | Marketing | Dense | 48 | 4147 | 415 | 41471 |
| GINA | Digits | Dense | 970 | 3153 | 315 | 31532 |
| HIVA | Drug discovery | Dense | 1617 | 3845 | 384 | 38449 |
| NOVA | Text classif. | Sparse binary | 16969 | 1754 | 175 | 17537 |
| SYLVA | Ecology | Dense | 216 | 13086 | 1308 | 130858 |

http://www.agnostic.inf.ethz.ch



ADA

ADA is the marketing database

  • Task: Discover high-revenue people from census data. Two-class problem.

  • Source: Census Bureau, “Adult” database from the UCI machine-learning repository.

  • Features: 14 original attributes including age, workclass, education, marital status, occupation, and native country. Continuous, binary, and categorical features.
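For illustration only, a two-class target like ADA's can be derived by thresholding the income attribute of the UCI Adult data; the column names, threshold, and rows below are assumptions, not the organizers' actual preprocessing.

```python
# Hypothetical sketch: binarize the Adult income attribute into a +/-1 target
# (high revenue vs. the rest). Not the challenge's actual preprocessing code.
import pandas as pd

df = pd.DataFrame({
    "age":    [39, 50, 23],
    "income": [">50K", "<=50K", "<=50K"],   # made-up rows in the Adult format
})
df["target"] = df["income"].eq(">50K").map({True: 1, False: -1})
print(df["target"].tolist())                # [1, -1, -1]
```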



GINA

GINA is the digit database

  • Task: Handwritten digit recognition. Separate the odd from the even digits. Two-class problem with heterogeneous classes.

  • Source: MNIST database formatted by LeCun and Cortes.

  • Features: 28x28 pixel map.
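A minimal sketch of how the odd-vs-even target could be derived from standard 0-9 MNIST digit labels (the labeling convention here is an assumption, not the organizers' code):

```python
import numpy as np

digits = np.array([0, 3, 5, 8, 2, 7])       # hypothetical MNIST digit labels
labels = np.where(digits % 2 == 1, 1, -1)   # +1 for odd digits, -1 for even
print(labels)                               # [-1  1  1 -1 -1  1]
```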



HIVA

HIVA is the HIV database

  • Task: Find compounds active against the HIV/AIDS infection. The task was reduced to a two-class problem (active vs. inactive), but the original labels (active, moderately active, and inactive) are provided.

  • Data source: National Cancer Institute.

  • Data representation: The compounds are represented by their 3D molecular structure.



NOVA

Example of a raw NOVA document (newsgroup post):

Subject: Re: Goalie masks
Lines: 21
Tom Barrasso wore a great mask, one time, last season. He unveiled it at a game in Boston. It was all black, with Pgh city scenes on it. The "Golden Triangle" graced the top, along with a steel mill on one side and the Civic Arena on the other. On the back of the helmet was the old Pens' logo, the current (at the time) Pens logo, and a space for the "new" logo. A great mask done in by a goalie's superstition.
Lori

NOVA is the text classification database

  • Task: Classify newsgroup emails into politics or religion vs. other topics.

  • Source: The 20-Newsgroup dataset from the UCI machine-learning repository.

  • Data representation: The raw text, with an estimated vocabulary of 17,000 words.
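As a hedged illustration of how such raw text can be turned into a sparse binary bag-of-words representation in the spirit of the agnostic-track NOVA features (the vectorizer settings and sample documents are assumptions, not the challenge's preprocessing):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Tom Barrasso wore a great mask one time last season",   # hockey post
    "the senate debate touched on religion and politics",    # politics/religion post
]
vectorizer = CountVectorizer(binary=True)   # 1 if a word occurs in the document, else 0
X = vectorizer.fit_transform(docs)          # sparse binary document-term matrix
print(X.shape, len(vectorizer.vocabulary_))
```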



SYLVA

SYLVA is the ecology database

  • Task: Classify forest cover types into Ponderosa pine vs. everything else.

  • Source: US Forest Service (USFS).

  • Data representation: Forest cover type for 30 x 30 meter cells encoded with 108 features (elevation, hill shade, wilderness type, soil type, etc.).



Part II

PROTOCOL and SCORING



Protocol

  • Data split: training/validation/test.

  • Data proportions: 10/1/100.

  • Online feedback on validation data (first phase).

  • Validation labels released in February, 2007.

  • Challenge prolonged until August 1st, 2007.

  • Final ranking on test data using the last five complete submissions of each entrant.
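For concreteness, a sketch of a 10/1/100 split; the actual splits were fixed by the organizers, so the function and random seed below are purely illustrative.

```python
import numpy as np

def split_10_1_100(n_examples, seed=0):
    """Random training/validation/test split in 10/1/100 proportions (illustrative only)."""
    idx = np.random.default_rng(seed).permutation(n_examples)
    n_train = round(n_examples * 10 / 111)
    n_valid = round(n_examples * 1 / 111)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]

# ADA has 4147 + 415 + 41471 = 46033 examples in total (see the table above).
train, valid, test = split_10_1_100(46033)
print(len(train), len(valid), len(test))    # 4147 415 41471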



Performance metrics

  • Balanced Error Rate (BER): average of error rates of positive class and negative class.

  • Area Under the ROC Curve (AUC).

  • Guess error (for the performance prediction challenge only):

    dBER = abs(testBER – guessedBER)
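A minimal sketch of these metrics, assuming ±1 labels and real-valued decision scores; the function names are illustrative and this is not the organizers' scoring code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def balanced_error_rate(y_true, y_pred):
    """Average of the error rates on the positive and the negative class."""
    pos, neg = y_true == 1, y_true == -1
    err_pos = np.mean(y_pred[pos] != 1)     # error rate on the positive class
    err_neg = np.mean(y_pred[neg] != -1)    # error rate on the negative class
    return 0.5 * (err_pos + err_neg)

def guess_error(test_ber, guessed_ber):
    """dBER = abs(testBER - guessedBER), performance prediction challenge only."""
    return abs(test_ber - guessed_ber)

y_true = np.array([1, 1, -1, -1, -1])            # made-up labels
scores = np.array([0.9, 0.2, 0.1, 0.4, 0.3])     # made-up decision values
y_pred = np.where(scores > 0.5, 1, -1)
print(balanced_error_rate(y_true, y_pred))       # 0.25
print(roc_auc_score(y_true, scores))             # AUC from the same scores
```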



Ranking

  • Compute an overall score:

    • For each dataset, regardless of the track, rank all the entries with “test BER”. Score=entry_rank/max_rank.

    • Overall_score=average score over datasets.

  • Keep only the last five complete entries of each participant, regardless of track.

  • Individual dataset ranking: For each dataset, make one ranking for each track using “test BER”.

  • Overall ranking: Rank the entries separately in each track with their overall score. Entries having “prior knowledge” results for at least one dataset are entered in the “prior knowledge” track.
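A toy sketch of this scoring scheme; the entry names and BER values are invented and this is not the organizers' implementation.

```python
import pandas as pd

# Hypothetical last complete entries with their test BER per dataset.
entries = pd.DataFrame([
    {"entry": "e1", "dataset": "ADA",  "test_ber": 0.17},
    {"entry": "e2", "dataset": "ADA",  "test_ber": 0.19},
    {"entry": "e1", "dataset": "GINA", "test_ber": 0.03},
    {"entry": "e2", "dataset": "GINA", "test_ber": 0.05},
])

# Per dataset, rank every entry (both tracks pooled) by test BER,
# then normalize: score = entry_rank / max_rank.
entries["rank"] = entries.groupby("dataset")["test_ber"].rank(method="min")
entries["score"] = entries["rank"] / entries.groupby("dataset")["rank"].transform("max")

# Overall score = average normalized score over the datasets (lower is better).
print(entries.groupby("entry")["score"].mean().sort_values())
```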



Part III

RESULT ANALYSIS



Challenge statistics

  • Date started: October 1st, 2006.

  • Milestone (NIPS 06): December 1st, 2006

  • Milestone: March 1st, 2007

  • End: August 1st, 2007

  • Total duration: 10 months.

  • Last five complete entries ranked (Aug. 1st):

    • Total ALvsPK challenge entrants: 37.

    • Total ALvsPK development entries: 1070.

    • Total ALvsPK complete entries: 90 prior + 167 agnos.

    • Number of ranked participants: 13 (prior), 13 (agnos).

    • Number of ranked submissions: 7 prior + 12 agnos


Learning curves

Best entry performance, IJCNN06 challenge

[Figure: best BER (0 to 0.5) vs. time in days (0 to 160) for ADA, GINA, HIVA, NOVA, and SYLVA.]


Learning curves

Best entry performance, IJCNN07 challenge

[Figure: best BER (0 to 0.3) vs. time in months (0 to 10) for ADA, GINA, HIVA, NOVA, and SYLVA.]



BER distribution

[Figure: BER distributions for the agnostic learning and prior knowledge tracks.]

The black vertical line indicates the best ranked entry (only the last 5 entries of each participant were ranked). Beware of overfitting!



Final AL results

Agnostic learning best ranked entries as of August 1st, 2007

Best ave. BER still held by Reference (Gavin Cawley) with “the bad”. Note that the best entry for each dataset is not necessarily the best entry overall.



Method comparison (PPC)

Agnostic track: no significant improvement so far.

[Figure: dBER vs. test BER.]



LS-SVM

Gavin Cawley, July 2006



LogitBoost

Roman Lutz, July 2006



Final PK results

Prior knowledge best ranked entries as of August 1st, 2007

Best ave. BER held by Reference (Gavin Cawley) with “interim all prior”.

Louis Duclos-Gosselin is second on ADA with Neural Network13, and S. Joshua Swamidass is second on HIVA, but they do not appear in the table because they did not submit complete entries.

The overall entry ranking is performed with the overall score (average rank over all datasets). The best performing complete entry may not contain all the best performing entries on the individual datasets.

We indicate the ranks of the “prior” entries only for individual datasets.



AL vs. PK, who wins?

We compare the best results of the ranked entries for entrants who entered both tracks. If the Agnostic Learning BER is larger than the Prior Knowledge BER, a "1" is shown in the table. The sign test is not powerful enough to reveal a significant advantage of PK or AL.
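For reference, the sign test on such paired comparisons can be run as below; the win counts are hypothetical and only show the mechanics.

```python
from scipy.stats import binomtest

pk_wins, n_comparisons = 14, 20   # hypothetical: PK beat AL in 14 of 20 comparisons
result = binomtest(pk_wins, n_comparisons, p=0.5, alternative="two-sided")
print(result.pvalue)              # ~0.12 here, so no significant advantage at the 5% level
```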



Progress?

  • On ADA and NOVA, the best results obtained by the participants are in the agnostic track! But it is possible to do better with prior knowledge: on ADA, the PK winner has a worse AL entry, and on NOVA the best PK reference entry yields the best results.

  • On GINA and SYLVA, significantly better results are achieved in the prior knowledge track and all but one participant who entered both tracks did better with PK.

  • On HIVA, experts achieve significantly better results with prior knowledge, but non-experts entering both tracks do worse in the PK track.



Conclusion

  • PK wins, but not by a huge margin. Improving performance using PK is not that easy!

  • AL using fairly simple low-level features is a fast way of getting hard-to-beat results.

  • The website will remain open for post-challenge entries http://www.agnostic.inf.ethz.ch.

