Predictive modeling competitions

Predictive modeling competitions: making data science a sport. Anthony Goldbloom, CEO, Kaggle. E-mail: anthony.goldbloom@kaggle.com. Twitter: @antgoldbloom. Photo by mikebaird, www.flickr.com/photos/mikebaird. Outline: Motivation, Why compete?, How it works, R on Kaggle.

Presentation Transcript


  1. Predictive modeling competitions: making data science a sport. Anthony Goldbloom, CEO, Kaggle. E-mail: anthony.goldbloom@kaggle.com. Twitter: @antgoldbloom. Photo by mikebaird, www.flickr.com/photos/mikebaird

  2. Motivation • Why compete? • How it works • R on Kaggle • The Heritage Health Prize

  3. Global competitions. Predicting HIV viral load [chart labels: state of the art 70%; 1½ weeks 70.8%; competition closes 77%]

  4. Crowdsourcing: mismatch between those with data and those with the skills to analyse it

  5. Countless approaches. Hard to know which will work

  6. Additional slides Not MIT, not SAS … UoL?

  7. Tourism Forecasting Competition [chart: forecast error (MASE); labels: existing model, Aug 9, 2 weeks later, 1 month later, competition end]

  8. Chess Ratings Competition [chart: error rate (RMSE); labels: existing model (ELO), Aug 4, 1 month later, 2 months later, today]

  9. Our User Base

  10. Users apply different techniques • neural networks • logistic regression • support vector machines • decision trees • ensemble methods • AdaBoost • Bayesian networks • genetic algorithms • random forests • Monte Carlo methods • principal component analysis • Kalman filters • evolutionary fuzzy modeling

  11. Motivation • Why compete? • How it works • R on Kaggle • The Heritage Health Prize

  12. Why Participants Compete • More fun than Sudoku • Clean, real-world data • Professional reputation & experience • Interactions with experts in related fields • Prizes

  13. Motivation • Why compete? • How it works • R on Kaggle • The Heritage Health Prize

  14. Competitions are judged based on predictive accuracy

  15. Competition Mechanics Competitions are judged on objective criteria
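
  The two error metrics named on slides 7 and 8 (MASE and RMSE) are examples of such objective criteria. The sketch below is illustrative only: it assumes plain Python lists of numbers, uses the standard non-seasonal definition of MASE (forecast error scaled by the in-sample error of a naive one-step forecast), and the function names `rmse` and `mase` are hypothetical. It is not Kaggle's scoring code.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mase(train, actual, predicted):
    """Mean absolute scaled error (non-seasonal form).

    The forecast's mean absolute error is divided by the mean absolute
    error of a naive "repeat the previous value" forecast on the
    training series, so a score below 1 beats the naive forecast.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if len(train) < 2:
        raise ValueError("training series needs at least two points")
    naive_errors = [abs(train[i] - train[i - 1]) for i in range(1, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("training series is constant; MASE is undefined")
    forecast_mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return forecast_mae / scale


if __name__ == "__main__":
    # Made-up numbers purely to show the calls; not competition data.
    print(rmse([0, 0], [3, 4]))
    print(mase([1, 2, 4, 7], [8, 9], [7, 11]))
```

  Because both metrics depend only on the submitted predictions and the held-back answers, every entry can be ranked on the same footing, with lower scores being better.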

  16. Motivation • Why compete? • How it works • R on Kaggle • The Heritage Health Prize

  17. R on Kaggle

  18. R on Kaggle among academics

  19. R on Kaggle among Americans

  20. Who Uses R and How

  21. Motivation • Why compete? • How it works • R on Kaggle • The Heritage Health Prize

  22. Mmm… how do I put this into R?

  23. Some SQL Magic

  24. Gives us a flat record

  25. Voila, an entry!

  26. What could the world’s best analysts find in your data? E-mail: anthony.goldbloom@kaggle.com. Phone: +61438400053. Photo by gidzy, www.flickr.com/photos/gidzy
