
Mebi 591D – BHI Kaggle Class


Presentation Transcript


  1. Mebi 591D – BHI Kaggle Class: Baselines
     http://winter2014-mebi591d-kaggleclass.weebly.com/

  2. Baseline (I.)
     • What is a baseline for?
       (a) a reasonable first approach to your problem
       (b) meant to be quick and to get your system running
       (c) allows you to see improvements
     • What should be included? (see the sketch after this slide)
       (a) your system should be able to take in any test set and output your predictions
       (b) you should be able to give evaluation scores on any test set presented
       (c) you should be able to visualize which instances your errors occur in
     Due in 4 weeks: start early!
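A minimal sketch of such a baseline harness in Python. The majority-class predictor, the function names, and the plain-accuracy metric are illustrative assumptions rather than anything prescribed by the course; swap in whatever model and metric your task needs.

    from collections import Counter

    def train_majority_baseline(train_labels):
        # Illustrative baseline model: always predict the most frequent training label.
        return Counter(train_labels).most_common(1)[0][0]

    def predict(majority_label, test_instances):
        # (a) take in any test set and output one prediction per instance
        return [majority_label for _ in test_instances]

    def evaluate(gold_labels, predicted_labels):
        # (b) report an evaluation score on any test set (plain accuracy here)
        correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
        return correct / len(gold_labels)

    def error_instances(test_instances, gold_labels, predicted_labels):
        # (c) keep the misclassified instances so errors can be inspected later
        return [(x, g, p)
                for x, g, p in zip(test_instances, gold_labels, predicted_labels)
                if g != p]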

  3. Baseline (II.)
     • Examples of how to use your baseline
     • Case 1: named-entity recognition task; choose a sequential CRF implementation
     • Baseline: use unigram features (see the sketch after this slide)
     • Further experiments: bigram features, POS tags, etc.
     • Change to a two-step classification, change the tagging scheme
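As an illustration of Case 1, a unigram-feature CRF baseline might look like the sketch below. It assumes the sklearn-crfsuite package as the sequential CRF implementation and uses made-up toy sentences; neither is specified by the slides.

    import sklearn_crfsuite

    def unigram_features(sentence):
        # Baseline feature set: only the current (lower-cased) token.
        # Later experiments add bigrams, POS tags, etc. without changing the pipeline.
        return [{"word": token.lower()} for token in sentence]

    # Hypothetical toy sentences with BIO labels, for illustration only.
    train_sents = [["Aspirin", "relieves", "pain"], ["Patient", "took", "Tylenol"]]
    train_tags = [["B-DRUG", "O", "O"], ["O", "O", "B-DRUG"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit([unigram_features(s) for s in train_sents], train_tags)

    test_sents = [["She", "takes", "Aspirin"]]
    print(crf.predict([unigram_features(s) for s in test_sents]))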

  4. Baseline (II.)
     • Case 2: predict the stock market price
     • Baseline: HMM using the previous stock price at the same time (a simplified sketch follows this slide)
     • Further experiments: add derivative features, add features from the news
     • Can try several other classifiers
     • Can use some kind of boosting algorithm
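To keep Case 2 concrete without implementing a full HMM, the sketch below substitutes a simple persistence predictor (the previous price is the prediction for the next step) on a made-up price series. It only stands in for the "previous stock price, same time" baseline idea; the HMM itself and news-derived features would be later experiments.

    import math

    prices = [101.2, 100.8, 102.5, 103.1, 102.9, 104.0]   # toy series, not real data

    predictions = prices[:-1]   # predict the price at time t from the price at t-1
    gold = prices[1:]

    # RMS error of the persistence baseline; later models should beat this number.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in zip(predictions, gold)) / len(gold))
    print(f"persistence-baseline RMSE: {rmse:.3f}")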

  5. Evaluation Metrics (I.)
     • RECAP from last time: never evaluate on the test set while building your system. Why?
       • You are cheating!
       • You end up overfitting to mistakes and noise (the system won't generalize)
     • Use a development set or cross-validation (see the sketch after this slide)
       • A development set is another set you split out, just like the test set (~10%)
       • Used to evaluate
       • Used for tuning parameters
     • Cross-validation sets
       • Split the data into N pieces, use N-1 pieces for training and 1 for testing, then repeat N times to get variation in the scores
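Both splitting strategies are short one-liners if scikit-learn is available (an assumption here; any equivalent split logic works):

    from sklearn.model_selection import train_test_split, KFold

    X = list(range(100))      # stand-in instances
    y = [i % 2 for i in X]    # stand-in labels

    # Development set: hold out ~10% of the training data; the test set stays untouched.
    X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.1, random_state=0)

    # N-fold cross-validation: train on N-1 folds, evaluate on the remaining fold, N times.
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    for fold, (train_idx, eval_idx) in enumerate(kf.split(X)):
        print(f"fold {fold}: {len(train_idx)} training / {len(eval_idx)} evaluation instances")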

  6. Evaluation Metrics (II.)
     • Multi-class categorization
       • Precision, recall, F1-score
       • ROC curve / AUC
     • Why may these not measure things well?
       • Class imbalance!
       • Use micro- and macro-averaged definitions (see the sketch after this slide)
     • Numeric predictions
       • RMS error
       • Nearest-neighbor error
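The micro/macro distinction is easy to check numerically. The sketch below assumes scikit-learn's metrics and a small made-up label set: macro averaging weights every class equally, so it exposes the poor performance on rare classes that micro averaging can hide under class imbalance.

    import math
    from sklearn.metrics import precision_recall_fscore_support, mean_squared_error

    gold = ["A", "A", "A", "A", "B", "C"]   # imbalanced toy labels
    pred = ["A", "A", "A", "A", "A", "A"]   # a lazy system that always says "A"

    # Micro pools all decisions; macro averages per-class scores equally.
    for avg in ("micro", "macro"):
        p, r, f1, _ = precision_recall_fscore_support(gold, pred, average=avg, zero_division=0)
        print(f"{avg}: P={p:.2f} R={r:.2f} F1={f1:.2f}")

    # Numeric predictions: RMS error.
    rmse = math.sqrt(mean_squared_error([1.0, 2.0, 3.0], [1.1, 1.9, 3.4]))
    print(f"RMSE: {rmse:.3f}")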

  7. Error analysis
     • Good to see where your system makes errors so you can introduce better features (or a better model)
     • Good to see where you are getting false positives and false negatives
     • Confusion matrices for classification are helpful (see the sketch after this slide)
       • An (n labels) x (n labels) matrix where rows and columns represent gold labels and system predictions
       • The numbers in the matrix are counts
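A minimal confusion-matrix sketch, assuming scikit-learn and made-up labels: rows are gold labels and columns are system predictions, so the off-diagonal counts are the errors worth inspecting instance by instance.

    from sklearn.metrics import confusion_matrix

    gold = ["DRUG", "DRUG", "DISEASE", "O", "O", "DISEASE"]
    pred = ["DRUG", "O", "DISEASE", "O", "DRUG", "DISEASE"]

    labels = ["DRUG", "DISEASE", "O"]          # fixes the row/column order
    cm = confusion_matrix(gold, pred, labels=labels)

    print(labels)
    print(cm)    # rows: gold, columns: predicted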

  8. Tasks
     • Decide on a strategy (next week)
       • What your baseline is
       • How the work will be divided
       • What resources you will use
     • Baseline system (4 weeks)
       • Includes a prediction module
       • Includes an evaluation module
       • Be able to visualize your errors for error analysis
