
A Robust "Black Box" Technique for Pattern Classification



  1. MIT Lincoln Laboratory 244 Wood Street Lexington, MA 02420-9108 A Robust "Black Box" Technique for Pattern Classification Kelly Wallenstein, John Weatherwax, Virginia Hafer MIT Lincoln Laboratory Presented to Group 32 August 2007

  2. Outline • Problem Introduction • Classification Methods • Single Multidimensional Gaussian • Decision Tree • AdaBoost • “Real World” Application: Email Spam • Conclusions

  3. Pattern Classification • Provide a set of data and classes • Features: size, shape, color, etc. • “Train” the computer to associate certain features with particular classes • Introduce new data; have the computer sort it into classes based on its features • Reduce the frequency of misclassification • Workflow: Training (Data, Classes → Learn), then Testing (New Data → Classify → Reduce Error); a minimal sketch of this loop follows below
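
The train/classify/score loop above can be illustrated in a few lines of code. This is only a minimal sketch, assuming a toy two-class dataset of Gaussian "clouds" and a stand-in nearest-centroid rule; neither is taken from the original slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class "clouds": two overlapping Gaussian blobs in 2-D (stand-in data).
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(1.5, 1.0, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

# Split into training and testing sets.
idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]

# "Train": here just a nearest-centroid rule, standing in for a real classifier.
centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])

# "Classify" the new data and measure the misclassification rate.
diffs = X[test][:, None, :] - centroids[None, :, :]   # (n_test, n_classes, n_dims)
pred = np.linalg.norm(diffs, axis=2).argmin(axis=1)
print("test error:", np.mean(pred != y[test]))
```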

  4. Data Sets “Clouds” “Four-Spiral”

  5. Outline • Problem Introduction • Classification Methods • Single Multidimensional Gaussian • Decision Tree • AdaBoost • “Real World” Application: Email Spam • Conclusions

  6. Method 1: Simple Gaussian Quadratic Classification • Probability density: p(x) = (2π)^(−d/2) |Σ|^(−1/2) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ)) • Discriminant function: d(x) = ln p(x | Blue) − ln p(x | Green) • Training: estimate μ, Σ for each class from the data • Testing: evaluate d(x) on new data; d(x) > 0 → Blue, d(x) < 0 → Green
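
A minimal sketch of this classifier, assuming two classes labelled Blue/Green, equal class priors, and NumPy; the function names are illustrative, not from the original slides.

```python
import numpy as np
from numpy.linalg import inv, slogdet

def fit_gaussian(X):
    """Estimate the mean vector and covariance matrix from training data."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def log_density(x, mu, Sigma):
    """Log of the multivariate Gaussian density (additive constant dropped)."""
    diff = x - mu
    return -0.5 * (diff @ inv(Sigma) @ diff + slogdet(Sigma)[1])

def discriminant(x, blue_params, green_params):
    """d(x) = ln p(x | Blue) - ln p(x | Green); > 0 -> Blue, < 0 -> Green
    (equal class priors assumed)."""
    return log_density(x, *blue_params) - log_density(x, *green_params)

# Usage sketch:
#   blue_params  = fit_gaussian(X_blue)      # training
#   green_params = fit_gaussian(X_green)
#   label = "Blue" if discriminant(x_new, blue_params, green_params) > 0 else "Green"
```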

  7. Method 1: Results • Clouds: Error = 0.25 • Four-Spiral: Error = 0.30

  8. Outline • Problem Introduction • Classification Methods • Single Multidimensional Gaussian • Decision Tree • AdaBoost • “Real World” Application: Email Spam • Conclusions

  9. Method 2: Decision Tree

  10. Method 2: Decision Tree
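
A hedged sketch of a decision-tree classifier on a synthetic "clouds"-style dataset, using scikit-learn; the data generation and the tree depth are assumptions, not values from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Synthetic two-class "clouds": overlapping Gaussian blobs (stand-in data).
X = np.vstack([rng.normal(0.0, 1.0, (500, 2)), rng.normal(1.5, 1.0, (500, 2))])
y = np.array([0] * 500 + [1] * 500)

# Hold out half of the data for testing.
idx = rng.permutation(len(y))
train, test = idx[:500], idx[500:]

# Limit the depth so the tree does not simply memorize the training set.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X[train], y[train])

print("training error:", 1 - tree.score(X[train], y[train]))
print("testing error: ", 1 - tree.score(X[test], y[test]))
```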

  11. Method 2: Decision Tree • Clouds: Error ≈ 0.14 (vs. 0.25 for the Gaussian) • Four-Spiral: Error ≈ 0.12 (vs. 0.30)

  12. Method 2: Results • Clouds: Error ≈ 0.14 (vs. 0.25 for the Gaussian) • Four-Spiral: Error ≈ 0.12 (vs. 0.30)

  13. Outline • Problem Introduction • Classification Methods • Single Multidimensional Gaussian • Decision Tree • AdaBoost • “Real World” Application: Email Spam • Conclusions

  14. Method 3: AdaBoost • A “meta” algorithm: it wraps an existing classification algorithm, the “weak learner” • 1. Train on a portion of the available data to create a classifier • 2. Test it on the remaining data • 3. Select a new portion of data to train on, after giving higher weight to previously misclassified instances • 4. Train/test again to create a second classifier • 5. Continue creating more classifiers (“boosts”) with the re-weighted data • 6. Use “majority voting” to combine the results of all the classifiers (see the sketch below)
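
A minimal sketch of this boosting loop. The slide describes re-training on re-weighted portions of the data; the version below uses explicit per-sample weights with scikit-learn decision stumps as the weak learner, and the labels are assumed to be -1/+1 (these are implementation choices, not details from the slides).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_boosts=50):
    """Discrete AdaBoost with depth-1 trees ("stumps") as the weak learner.
    Labels y must be -1 or +1."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # start with uniform weights
    learners, alphas = [], []
    for _ in range(n_boosts):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)           # train the weak learner
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)  # weighted error
        alpha = 0.5 * np.log((1 - err) / err)      # this learner's voting weight
        w *= np.exp(-alpha * y * pred)             # up-weight misclassified points
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    """Weighted "majority vote" over all of the boosted weak learners."""
    votes = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
    return np.sign(votes)
```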

  15. Method 3: AdaBoost • Weak learners → strong learner • Trains on increasingly hard-to-classify instances • Avoids overfitting, or “memorizing the data” • Image: Polikar, Robi. “Ensemble Based Systems in Decision Making.” IEEE Circuits and Systems Magazine, 3rd quarter 2006: 31.

  16. Method 3: AdaBoost with Decision Tree • Clouds: Error ≈ 0.14 (vs. 0.14 for the decision tree alone, 0.25 for the Gaussian)

  17. Method 3: AdaBoost with Decision Tree • Four-Spiral: Error ≈ 0.06 (vs. 0.12 for the decision tree alone, 0.30 for the Gaussian)

  18. Method Comparison • Clouds: Gaussian 0.25, Decision Tree 0.14, AdaBoost 0.14 • Four-Spiral: Gaussian 0.30, Decision Tree 0.12, AdaBoost 0.06 • Can we improve upon these results?

  19. Bayes Error • The theoretical minimum error: no classifier can do better than the Bayes error • Arises from the fundamental overlap between the class distributions
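
When the class densities are known exactly, the Bayes error can be estimated numerically. Below is a small Monte Carlo sketch for two equally likely Gaussian classes; the distributions are illustrative, not the actual "clouds" parameters from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
n = 200_000

# Two known, equally likely class densities (illustrative parameters).
mean0, mean1, cov = np.array([0.0, 0.0]), np.array([1.5, 0.0]), np.eye(2)

# Draw samples from the true mixture.
labels = rng.integers(0, 2, size=n)
X = np.where(labels[:, None] == 0,
             rng.multivariate_normal(mean0, cov, size=n),
             rng.multivariate_normal(mean1, cov, size=n))

# The Bayes rule picks the class with the larger posterior; with equal priors
# that is simply the larger class-conditional density.
pred = (multivariate_normal(mean1, cov).pdf(X) >
        multivariate_normal(mean0, cov).pdf(X)).astype(int)

# Even this optimal rule misclassifies overlapping points: that rate is the Bayes error.
print("estimated Bayes error:", np.mean(pred != labels))
```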

  20. Gaussian Mixture Model • Models each class-conditional density as a mixture of Gaussians • Training: estimate μᵢ, Σᵢ (and mixture weights) for each class from the data • Testing: evaluate d(x) on new data; d(x) > 0 → Blue, d(x) < 0 → Green
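
A hedged sketch of the class-conditional mixture classifier: fit one GaussianMixture per class with scikit-learn and compare log-likelihoods. The number of mixture components is an assumption, not a value from the slides.

```python
from sklearn.mixture import GaussianMixture

def fit_class_mixtures(X_blue, X_green, n_components=2):
    """Estimate the component means, covariances, and mixture weights
    (mu_i, Sigma_i, w_i) for each class via the EM algorithm."""
    gmm_blue = GaussianMixture(n_components=n_components, random_state=0).fit(X_blue)
    gmm_green = GaussianMixture(n_components=n_components, random_state=0).fit(X_green)
    return gmm_blue, gmm_green

def discriminant(X, gmm_blue, gmm_green):
    """d(x) = ln p(x | Blue) - ln p(x | Green); > 0 -> Blue, < 0 -> Green
    (equal class priors assumed)."""
    return gmm_blue.score_samples(X) - gmm_green.score_samples(X)
```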

  21. Gaussian Mixture Model: Results • Clouds errors: Gaussian 0.25, Decision Tree 0.14, AdaBoost 0.14, Gaussian Mixture Model 0.10 • Perfect knowledge of the distributions achieves the Bayes error

  22. Outline • Problem Introduction • Classification Methods • Single Multidimensional Gaussian • Decision Tree • AdaBoost • “Real World” Application: Email Spam • Conclusions

  23. Email Spam Dataset • 4601 emails (39.4% spam, 60.6% non-spam) • 57 features (plus a class label): • Word/character frequency measures (“money,” “free,” “credit,” “$,” etc.) • Lengths of runs of consecutive capital letters • Number of capital letters in the e-mail • Data from a sample email: • 0,0,0,0,1.16,0,0,0,0,0,0,0.58,0,0,0,1.16,0,1.16,1.16,0,1.75,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.133,0,0.667,0,0,1.131,5,69,1 • Spam data set compiled at the University of California, Irvine in 1999
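
A sketch of how a boosted classifier might be run on this dataset. The local file name spambase.data, the train/test split, and the number of boosting rounds are assumptions about the setup, not details taken from the slides.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# UCI Spambase: 57 numeric features per row, last column is the label (1 = spam).
data = np.loadtxt("spambase.data", delimiter=",")   # assumed local file name
X, y = data[:, :-1], data[:, -1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# scikit-learn's AdaBoost uses depth-1 decision trees (stumps) as its default
# weak learner; 200 boosting rounds is an illustrative choice.
clf = AdaBoostClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

print("test error:", 1 - clf.score(X_test, y_test))
```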

  24. Email Spam: Method Comparison • Gaussian: 0.32 • Decision Tree: 0.22 • AdaBoost: 0.17

  25. Conclusions • Important to avoid “overfitting” • Compare testing error vs. training error • AdaBoost is a useful “black box” algorithm • Improves performance on various types of datasets without overfitting • Doesn’t require knowledge of how the data was generated (statistically, physically, etc.) • Can achieve near-optimal results

  26. Backups

  27. ROC Curves (Have something before this slide to introduce the ROC stuff)

  28. Email Spam: AdaBoost Error = 0.17

  29. Email Spam: ROC Curve
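
A minimal sketch of producing the ROC curve from classifier scores with scikit-learn, continuing from the Spambase sketch above (clf, X_test, and y_test are the names assumed there).

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Continuous scores (the AdaBoost margin) and the true 0/1 labels.
scores = clf.decision_function(X_test)
fpr, tpr, _ = roc_curve(y_test, scores)

# Trace detection rate vs. false-alarm rate as the decision threshold varies.
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.2f}")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate (detection rate)")
plt.legend()
plt.show()
```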
