KDD-Cup 2004


Presentation Transcript


  1. KDD-Cup 2004 Chairs: Rich Caruana & Thorsten Joachims Web Master: Lars Backstrom Cornell University

  2. KDD-Cup Tasks • Goal: Optimize learning for different performance metrics • Task 1: Particle Physics • Accuracy • Cross-Entropy • ROC Area • SLAC Q-Score • Task 2: Protein Matching • Squared Error • Average Precision • Top 1 • Rank of Last

  3. Competition Participation • Timeline • April 28: tasks and datasets available • July 14: submission of predictions • Participation • 500+ registrants/downloads • 102 teams submitted predictions • Physics: 65 submissions • Protein: 59 submissions • Both: 22 groups • Demographics • Registrations from 49 Countries (including .com) • Winners from China, Germany, India, New Zealand, USA • Winners half from companies, half from universities

  4. Task 1: Particle Physics • Data contributed by Charles Young et al., SLAC (Stanford Linear Accelerator) • Binary classification: distinguishing B from B-Bar particles • Balanced: 50-50 B/B-Bar • 78 features (most real-valued) describing the track • Some missing values • Train: 50,000 cases • Test: 100,000 cases

  5. Task 1: Particle Physics Metrics • 4 performance metrics: • Accuracy: had to specify a threshold • Cross-Entropy: probabilistic predictions • ROC Area: only the ordering is important • SLAC Q-Score: domain-specific performance metric from SLAC • Participants submitted separate predictions for each metric • About half of participants submitted different predictions for different metrics • The winner submitted four sets of predictions, one for each metric • Performance was calculated using the PERF software we provided to participants
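
A rough sketch of how three of the four physics metrics can be computed from predicted scores and 0/1 labels (plain NumPy/scikit-learn, not the PERF software itself; the function name and toy data are illustrative, and the domain-specific SLAC Q-score is omitted):

```python
# Illustrative only -- not the PERF software distributed to participants.
import numpy as np
from sklearn.metrics import accuracy_score, log_loss, roc_auc_score

def physics_metrics(y_true, y_score, threshold=0.5):
    """y_true: 0/1 labels; y_score: predicted probabilities in [0, 1]."""
    acc = accuracy_score(y_true, (y_score >= threshold).astype(int))  # needs a threshold
    cxe = log_loss(y_true, np.clip(y_score, 1e-15, 1 - 1e-15))        # cross-entropy
    roc = roc_auc_score(y_true, y_score)                              # only ordering matters
    return {"ACC": acc, "CXE": cxe, "ROC": roc}

# Toy usage on a balanced random sample
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1_000)
p = rng.random(1_000)
print(physics_metrics(y, p))
```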

  6. Determining the Winners • For each performance metric: • Calculate performance using the same PERF software available to participants • Rank participants by performance • Honorable mention for the participant ranked first • Overall winner is the participant with the best average rank across all metrics
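
A minimal sketch of this winner-selection rule, assuming per-metric scores are already computed (the data layout and function name are my own):

```python
# Sketch: rank teams on each metric, then order by average rank (1 = best).
from scipy.stats import rankdata

def overall_ranking(scores, higher_is_better):
    """scores: dict metric -> {team: score}; higher_is_better: dict metric -> bool."""
    teams = sorted(next(iter(scores.values())))
    avg_rank = {t: 0.0 for t in teams}
    for metric, per_team in scores.items():
        vals = [per_team[t] for t in teams]
        # flip the sign of maximized metrics so that rank 1 is always best
        ranks = rankdata([-v if higher_is_better[metric] else v for v in vals])
        for t, r in zip(teams, ranks):
            avg_rank[t] += r / len(scores)
    # honorable mention per metric = rank-1 team; overall winner = best average rank
    return sorted(avg_rank.items(), key=lambda kv: kv[1])
```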

  7. and the winners are…

  8. Task 1: Physics Winners Christophe Lambert (Golden Helix Inc.): 3rd place overall (out of 65) Lalit Wangikar et al. (Inductis Inc.): 2nd place overall, HM Acc David Vogel et al. (MEDai Inc./University of Central Florida): 1st place overall, HM ROC, HM Cross-Entropy, HM SLQ

  9. Bootstrap Analysis of Results • How much does the selection of winners depend on the specific test set (100k cases)? • Algorithm: • Repeat many times: • Take a 100k bootstrap sample (with replacement) from the test set • Evaluate performance on the bootstrap sample and re-rank participants • What is the probability of winning/placing?
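
A minimal sketch of the bootstrap procedure described on this slide (prediction layout, function names, and the "higher score is better" convention are my assumptions):

```python
# Resample the test set with replacement, re-score every team, and tally wins.
import numpy as np

def bootstrap_win_prob(y_true, team_preds, score_fn, n_boot=1000, seed=0):
    """y_true: NumPy array of test labels; team_preds: dict team -> prediction vector."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    wins = {t: 0 for t in team_preds}
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # bootstrap sample (with replacement)
        perf = {t: score_fn(y_true[idx], p[idx]) for t, p in team_preds.items()}
        wins[max(perf, key=perf.get)] += 1      # assumes a higher score is better
    return {t: w / n_boot for t, w in wins.items()}   # estimated P(team ranks first)
```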

  10. Physics Winners: Bootstrap Analysis • 1000 bootstrap samples

  11. Physics: Full Table of Results

  12. Task 2: Protein Matching • Data contributed by Ron Elber, Cornell University • Finding homologous proteins (structural similarity) • 74 real-valued features describing match between two proteins • Data comes in blocks • Unbalanced: typically < 10 homologs (+) per block of 1000 • Train: 153 Proteins (145,751 cases) • Test: 150 Proteins (139,658 cases)

  13. Task 2: Protein Matching Metrics • Four performance metrics: • Mean Squared Error: probabilistic predictions • Mean Average Precision: only the ordering within each block is important • Mean Top 1: is the best predicted match in each block a true homolog? • Mean Rank of Last: finding all homologs • Again, participants submitted separate predictions for each metric • Again, about half of participants submitted multiple sets of predictions • 19 of the top 20 participants submitted multiple sets of predictions • Optimizing to each metric separately helped more on Protein than on Physics
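
A hedged sketch of these block-wise metrics (my own formulation; it assumes predictions are already grouped into per-protein blocks and that every block contains at least one homolog):

```python
# Block-wise protein metrics, averaged over blocks; metric keys are my labels.
import numpy as np
from sklearn.metrics import average_precision_score

def protein_metrics(blocks):
    """blocks: list of (y_true, y_score) pairs, one pair per protein block."""
    sqe, apr, top1, rkl = [], [], [], []
    for y, s in blocks:
        y, s = np.asarray(y), np.asarray(s)
        ranked = y[np.argsort(-s)]                        # labels ordered best match first
        sqe.append(np.mean((y - s) ** 2))                 # squared error of probabilities
        apr.append(average_precision_score(y, s))         # ordering within the block
        top1.append(float(ranked[0] == 1))                # is the top prediction a homolog?
        rkl.append(int(np.nonzero(ranked)[0].max()) + 1)  # rank of the last homolog
    return {"Squared Error": np.mean(sqe), "Average Precision": np.mean(apr),
            "Top 1": np.mean(top1), "Rank of Last": np.mean(rkl)}
```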

  14. Task 2: Protein Winners Katharina Morik et al. (University of Dortmund): HM Rank of Last David Vogel et al. (MEDai Inc. / University of Central Florida): 3rd place overall, HM Top1 Yan Fu et al. (Inst. of Comp. Tech., Chinese Academy of Sci.): 2nd place overall, HM Squared Error, HM Average Precision Bernhard Pfahringer (University of Waikato): 1st place overall

  15. Protein Winners: Bootstrap Analysis • 10,000 bootstrap samples

  16. Protein: Full Table of Results

  17. Does Optimizing to Each Metric Help? • About half of participants submitted different predictions for each metric • Among winners: • Some evidence that top performers benefit from optimizing to each metric • Some metrics incompatible: e.g., optimizing to APR hurts RMS

  18. Did Groups Effectively Optimize to Different Measures? • Score the predictions submitted for one measure using the other measures. • [Tables: Physics (ACC, CXE, ROC, SLQ) and Protein (APR, RKL, RMS, TOP1); rows = measure tested on, columns = measure submitted for; each cell gives a +count,-count of cross-submissions that scored better or worse than the group's own submission for the tested measure.] • Net totals: Physics +67,-125 • Biology (Protein) +82,-204
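
A rough sketch of this cross-evaluation as I read it (data layout, names, and the "higher score is better" convention are assumptions): score the predictions a group submitted for measure A with measure B, and count whether they beat or trail that group's own submission for B.

```python
# Count, for every ordered pair of measures, how many groups gained (+) or lost (-)
# when their submission for one measure was scored with another measure.
def cross_metric_counts(y_true, submissions, metrics):
    """submissions: dict group -> {metric name: prediction vector}
       metrics:     dict metric name -> callable(y_true, preds) -> score (higher = better)"""
    counts = {}
    for a in metrics:                      # predictions submitted for measure a ...
        for b in metrics:                  # ... evaluated with measure b
            if a == b:
                continue
            plus = minus = 0
            for subs in submissions.values():
                if a not in subs or b not in subs:
                    continue               # group did not submit for both measures
                own = metrics[b](y_true, subs[b])
                cross = metrics[b](y_true, subs[a])
                plus += cross > own
                minus += cross < own
            counts[(a, b)] = (plus, minus)
    return counts
```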

  19. Did Groups Effectively Optimize to Different Measures? • How often did a submission for another measure perform better? • Do not count screw-ups and invalid predictions • Count only those predictions where the rank stays within a given window (x-axis) • Count only the groups in the top 40 (plots: Physics, Protein)

  20. Did Good Groups Benefit More than Bad Groups? • How often did a submission for another measure perform better? • Do not count screw-ups and invalid predictions • Count only those predictions where the rank stays within a window of 10 • Count only the groups in the top k (x-axis) (plots: Physics, Protein)

  21. How Big is the Benefit? • How much does swapping predictions change the rank? • Count only those predictions where the rank stays within a given window (x-axis) • Count only the groups in the top 40 (plots: Physics, Protein)

  22. How Much did Predictions Differ Between Groups? • Fit MDS to the Euclidean distance between prediction vectors • Top 30 groups (plots: MDS of Physics predictions, RMSE; MDS of Protein predictions, APR)
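
A small sketch of this kind of MDS plot using scikit-learn and matplotlib (matrix layout and names are assumed; this is not the original analysis code):

```python
# Embed each group's prediction vector in 2-D, preserving Euclidean distances.
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

def plot_prediction_mds(pred_matrix, labels, title="MDS of prediction vectors"):
    """pred_matrix: (n_groups, n_test_cases) array of predicted scores."""
    xy = MDS(n_components=2, random_state=0).fit_transform(pred_matrix)
    plt.scatter(xy[:, 0], xy[:, 1])
    for (x, y), name in zip(xy, labels):
        plt.annotate(name, (x, y), fontsize=8)   # label each group's point
    plt.title(title)
    plt.show()
```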

  23. The Easy, the Difficult, and the Impossible • How often do the competitors agree on a classification? • X-axis: number of competitors • Y-axis: percentage of test examples that x competitors classified correctly (plots: Physics accuracy, top 10; Physics accuracy, top 30)

  24. The Easy and the Impossible • How often does everybody agree? • X-axis: number of competitors from the top • Y-axis: percentage of test examples everybody classified correctly / incorrectly (plots: Physics accuracy, everybody incorrect; Physics accuracy, everybody correct)
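
A minimal sketch of the agreement analysis behind these two slides (prediction layout and function name are mine):

```python
# For the top-k groups, count how many of them classify each test case correctly,
# then summarize how often exactly x groups, all groups, or no group is right.
import numpy as np

def agreement_curves(y_true, top_k_preds):
    """top_k_preds: (k, n_test) array of 0/1 class predictions from the top k groups."""
    k = top_k_preds.shape[0]
    correct = (top_k_preds == np.asarray(y_true)).sum(axis=0)       # per test case
    frac_exactly_x = np.array([(correct == x).mean() for x in range(k + 1)])
    return frac_exactly_x, (correct == k).mean(), (correct == 0).mean()
```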

  25. How to Win KDD-Cup 2005: Collaborate • Ensemble that averages predictions of best participants

  26. How to Win KDD-Cup 2005: Collaborate • Ensemble that averages predictions of best participants
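
The collaboration idea on the two slides above amounts to simple prediction averaging; a minimal sketch (names are illustrative):

```python
# Average the prediction vectors of the best-ranked participants and evaluate the
# result with the same metrics as any single entry.
import numpy as np

def average_ensemble(pred_vectors):
    """pred_vectors: list of (n_test,) arrays of predicted scores from top teams."""
    return np.mean(np.stack(pred_vectors), axis=0)

# e.g. ensemble = average_ensemble([preds_team_1, preds_team_2, preds_team_3])
# and then score `ensemble` with ROC area, accuracy at a threshold, etc.
```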

  27. Lessons Learned • Use a WWW site for organizing the competition • Data and all results are still available online • Approx. 400 new registrations since the end of the competition (used in courses, papers, research) • Registration process that provides anonymity but allows tracking • Selection of suitable tasks • Sample size large enough that the evaluation is statistically reliable • But small enough to be tractable for most methods • Two tasks: one traditional, one that required non-standard techniques • Well-defined evaluation criteria, if possible • Automation where possible • Provide evaluation software for download (PERF software) • Automatic format and plausibility checking of submissions • Crucial team members: • Web Master++: Lars Backstrom (Cornell) • Data Providers: Charles Young (SLAC), Ron Elber (Cornell) • PERF: Alex Niculescu (Cornell), Filip Radlinski (Cornell), Claire Cardie (Cornell), … • Participants who found bugs: Chinese Academy of Sciences, University of Dortmund • Who is interested in the results? • Data providers get connected with Data Mining experts • The Data Mining community • Regulate exploitation by winners: the “Vogel Effect” • Affiliated with a conference, program, organization ...?

  28. Closing • Data and all results available online: http://kodiak.cs.cornell.edu/kddcup • PERF software download: http://www.cs.cornell.edu/~caruana • Thanks to: • Web Master++: Lars Backstrom (Cornell) • Physics Data: Charles Young (SLAC) • Protein Data: Ron Elber (Cornell) • PERF: Alex Niculescu (Cornell), Filip Radlinski (Cornell), Claire Cardie (Cornell), … • Thanks to participants who found bugs in the PERF software: • Chinese Academy of Sciences • University of Dortmund • And of course, thanks to everyone who participated!

  29. The Contest Goes On (plots: Physics, Protein)
