Active Learning to Classify Email


Presentation Transcript


  1. Active Learning to Classify Email • 4/22/05

  2. What’s the problem? • How will I ever sort all these new emails?

  3. What’s the problem? • To get an idea of what mail I have received, I will need to sort these new messages. • A great solution would be to sort just a few myself and have my computer sort the rest for me. • To make it really accurate, the assistant could even pick which messages I should sort manually, so that it can learn to do the best job possible. (Active Learning)

  4. What’s the solution? • To solve this problem, we need a way to choose the most informative training examples. • This requires some way of sorting emails by how informative they are for classification.

  5. Email Classification • So, what do we know about email classification? • SVM and Naïve Bayes significantly outperform many other methods. (Brutlag 2000, Kiritchenko 2001) • Both SVM and Naïve Bayes are suitable for the “online” learning that this problem requires. (Cauwenberghs 2000) • Classifier accuracy varies more between users than between algorithms. (Kiritchenko 2001) • SVM performs better for users with more email in each folder. (Brutlag 2000) • Users with more email, such as in our example problem, tend to have more email in each folder than other users. (Klimt 2004) • Thus, we have chosen SVM as the basis for this research.

  6. “Bag-of-Words” Model • [Diagram: email data → “bag of words” → SVM → classification decision]
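
The “bag of words” step on this slide can be illustrated with a minimal sketch. The tokenizer, the helper names `bag_of_words` and `to_vector`, and the two sample emails are assumptions made for illustration; the slides do not say how the emails were tokenized or weighted.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into tokens, and count each token."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


def to_vector(bag, vocabulary):
    """Turn a bag of words into a fixed-length count vector."""
    return [bag.get(term, 0) for term in vocabulary]


# Hypothetical sample emails, not taken from the Enron corpus.
emails = ["Meeting moved to Friday", "Friday lunch? Lunch is on me"]
bags = [bag_of_words(e) for e in emails]

# The vocabulary is every distinct token, in a stable (sorted) order.
vocabulary = sorted(set().union(*bags))
vectors = [to_vector(b, vocabulary) for b in bags]
```

Each vector in `vectors` has one entry per vocabulary term, which is the kind of fixed-length input an SVM expects. Word order is discarded, which is what makes it a "bag".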

  7. Multiple SVMs • Using separate SVMs for each section • [Diagram: email data → SVMs → LLSF → classification decision]

  8. Active Learning with SVM • In general, examples closer to the decision boundary hyperplane will cause larger displacement of that boundary. (Schohn and Cohn 2000, Tong 2001)
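
A minimal sketch of this selection rule, assuming a linear SVM that has already been trained and is described by a weight vector `w` and bias `b`. The function names and the toy pool are hypothetical.

```python
import math


def distance_to_hyperplane(x, w, b):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))


def pick_closest(unlabeled, w, b):
    """Return the index of the unlabeled example nearest the boundary."""
    return min(
        range(len(unlabeled)),
        key=lambda i: distance_to_hyperplane(unlabeled[i], w, b),
    )


# Toy linear model and pool of unlabeled points (illustrative values only).
w, b = [1.0, -1.0], 0.0
pool = [[3.0, 0.0], [1.0, 0.8], [0.0, 2.5]]
query_index = pick_closest(pool, w, b)
```

Here the second point lies closest to the boundary, so it is the one the user would be asked to label next.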

  9. What if our prediction is right? • Labeling the closer example: [figure] • Labeling the farther example: [figure]

  10. And if our prediction is wrong? • Picking the closer example: [figure] • Picking the farther example: [figure]

  11. Incorporating Diversity • In this example, the instance near the top is intuitively more likely to be informative. • This is known as “diversity” (Brinker 2003).
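
One way to fold diversity into batch selection, sketched loosely in the spirit of Brinker (2003): trade off closeness to the boundary against similarity to examples already chosen for the batch. The cosine similarity measure, the `trade_off` parameter, and all names below are assumptions for illustration, not details taken from the slides.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_diverse_batch(pool, margins, batch_size, trade_off=0.5):
    """Greedily pick examples that are near the boundary AND unlike
    the examples already picked. Lower score is better."""
    chosen = []
    while len(chosen) < min(batch_size, len(pool)):
        best, best_score = None, None
        for i, x in enumerate(pool):
            if i in chosen:
                continue
            # Highest similarity to anything already in the batch.
            redundancy = max((cosine(x, pool[j]) for j in chosen), default=0.0)
            score = trade_off * abs(margins[i]) + (1 - trade_off) * redundancy
            if best_score is None or score < best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen


# Toy pool: the first two points are nearly identical (illustrative values).
pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
margins = [0.10, 0.12, 0.30]  # signed SVM outputs; smaller magnitude = closer
batch = select_diverse_batch(pool, margins, batch_size=2)
```

With `trade_off=1.0` the rule reduces to pure closest-to-boundary selection and would pick the two near-duplicates; with the default it skips the duplicate in favor of the more distant but dissimilar third point.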

  12. Active Learning with SVM • But what about when you have multiple SVMs (like one-vs-rest)? (Yan 2003)
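
The slide leaves this question open. One common heuristic for one-vs-rest setups, shown here only as an assumption and not necessarily the approach of Yan (2003) or of this talk, is to query the example whose two highest-scoring classes are hardest to tell apart. The names and scores are hypothetical.

```python
def ambiguity(scores):
    """Gap between the best and second-best one-vs-rest SVM outputs.
    A small gap means the classifiers disagree about the label."""
    ranked = sorted(scores, reverse=True)
    return ranked[0] - ranked[1]


def pick_most_ambiguous(pool_scores):
    """Return the index of the example with the smallest top-two gap."""
    return min(range(len(pool_scores)), key=lambda i: ambiguity(pool_scores[i]))


# One row per unlabeled email, one column per folder's SVM (toy values).
pool_scores = [
    [1.4, -0.9, -1.1],   # clearly folder 0
    [0.2, 0.1, -1.3],    # folder 0 or folder 1? hard to say
    [-0.5, -0.8, 0.9],   # clearly folder 2
]
query_index = pick_most_ambiguous(pool_scores)
```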

  13. The Enron Corpus • 150+ users • 200,000 emails

  14. Initial Results • Trained on 10%, Tested on 90%

  15. Chrono-Diverse Algorithm • The way a user sorts email changes over time. • Pick training data that are maximally different from previous data with respect to time.
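
A minimal sketch of the idea, assuming "maximally different with respect to time" means choosing the unlabeled email whose timestamp is farthest from every email already labeled. The slides do not give the exact measure, so this distance and the function names are assumptions.

```python
def chrono_diversity(timestamp, labeled_timestamps):
    """Time gap between an email and its nearest already-labeled email."""
    return min(
        (abs(timestamp - t) for t in labeled_timestamps),
        default=float("inf"),  # nothing labeled yet: every email is "far"
    )


def pick_chrono_diverse(pool_timestamps, labeled_timestamps):
    """Return the index of the unlabeled email farthest in time
    from all previously labeled emails."""
    return max(
        range(len(pool_timestamps)),
        key=lambda i: chrono_diversity(pool_timestamps[i], labeled_timestamps),
    )


# Timestamps in arbitrary units, e.g. days since the first email (toy values).
labeled = [0, 100]
pool = [10, 48, 95, 130]
query_index = pick_chrono_diverse(pool, labeled)
```

The email at time 48 sits in the largest unexplored gap between labeled emails, so it is selected.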

  16. Combination Algorithm • Combine strengths of Standard and Chrono-Diverse. • Take a weighted combination of their results. • Adjust weighting with parameter lambda.
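
The weighted combination could look like the sketch below. The min-max normalization and the convention that `lam=1.0` means "Standard only" and `lam=0.0` means "Chrono-Diverse only" are assumptions; the slides state only that the two results are weighted by a parameter lambda.

```python
def normalize(values):
    """Rescale values to the range [0, 1] (all zeros if they are all equal)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_scores(margins, time_gaps, lam):
    """Blend boundary closeness with time diversity. Higher is better."""
    closeness = [1.0 - m for m in normalize([abs(m) for m in margins])]
    spread = normalize(time_gaps)
    return [lam * c + (1 - lam) * s for c, s in zip(closeness, spread)]


def pick_combined(margins, time_gaps, lam=0.5):
    """Return the index of the example with the best blended score."""
    scores = combined_scores(margins, time_gaps, lam)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy values: email 0 is nearest the boundary, email 2 is farthest in time.
margins = [0.1, 0.5, 0.9]
time_gaps = [5, 40, 60]
```

At the extremes, `pick_combined(margins, time_gaps, lam=1.0)` follows the Standard rule and `lam=0.0` follows the Chrono-Diverse rule; intermediate values can favor an email that is reasonably good on both criteria.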

  17. Results • Trained on 10%, Tested on 90%

  18. Parameter Tuning

  19. Conclusions • State-of-the-art algorithm for active learning with text classification performs horribly on email data! • Choosing emails for time diversity works very well. • Combining the two works best.

  20. Future Work • Improve the efficiency of SVM or find a better alternative • Determine when using chronological diversity performs best and worst • Adapt the algorithm to online classification
