
Text Classification Using Stochastic Keyword Generation

Cong Li, Ji-Rong Wen and Hang Li. Microsoft Research Asia. August 22nd, 2003.


Presentation Transcript


  1. Text Classification Using Stochastic Keyword Generation Cong Li, Ji-Rong Wen and Hang Li Microsoft Research Asia August 22nd, 2003

  2. Outline • Introduction • Text Classification Using Stochastic Keyword Generation • Experimental Results • Conclusion and Future Work

  3. Introduction • Supervised Text Classification • Question: how can additional data be used in training to improve performance? • New Text Classification Problem • Summaries of texts, which are more indicative of the contents, are available in training • Note: summaries are not available at classification time • Example: classification at a help desk

  4. Example • Email • When getting emails I get a notice that an email has been received but when I try to view the message it is blank. I have also tried to run the repair program off the install disk but that it did not take care of the problem. • Categories • Empty Outlook Message • Cannot Open Word File • Summary • receive emails; some emails have no subject and message body

  5. Outline • Introduction • Text Classification Using Stochastic Keyword Generation • Experimental Results • Conclusion and Future Work

  6. New Text Classification Problem • Spaces • Users’ emails: space X • Categories: space Y • Engineers’ summaries (for training): space S • Assumption • Summaries are much easier to classify than the emails themselves

  7. Text Classification Using SKG • Conventional Text Classification: email x ∈ X → classification → category y ∈ Y (Empty Outlook Message) • Text Classification Using SKG: email x ∈ X → SKG → probability vector of x (email 0.75, receive 0.68, subject 0.45, body 0.45, …) → classification → category y ∈ Y (Empty Outlook Message) • Example email x: When getting emails I get a notice that an email has been received but when I try to view the message it is blank. I have also tried to run the repair program off the install disk but that it did not take care of the problem.

  8. Stochastic Keyword Generation • Generating Keywords from a Given Text • Stochastic Keyword Generation (SKG) • Generate keywords and their conditional probabilities of occurrence given the text • Example • Input text: When getting emails I get a notice that an email has been received but when I try to view the message it is blank. I have also tried to run the repair program off the install disk but that it did not take care of the problem. • Output of Stochastic Keyword Generation: emails 0.75, receive 0.68, subject 0.45, body 0.45
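
The idea on this slide can be sketched in code. The transcript does not preserve the authors' actual per-keyword model, so the example below is only an illustration under stated assumptions: it fits one Bernoulli naive Bayes model per keyword on (text, summary) pairs and reports the probability that the keyword would occur in the summary of a new text. The function names `train_skg` and `keyword_probabilities`, the whitespace tokenizer, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, words treated as a set.
    return set(text.lower().split())


def train_skg(pairs, keywords):
    """Collect per-keyword counts from (text, summary) training pairs.

    For each keyword, a text is a positive example when the keyword
    occurs in its summary. Naive Bayes is an assumed stand-in here,
    not necessarily the model used in the talk.
    """
    docs = [(tokenize(text), tokenize(summary)) for text, summary in pairs]
    vocab = set().union(*(words for words, _ in docs))
    models = {}
    for kw in keywords:
        n = {True: 0, False: 0}
        counts = {True: defaultdict(int), False: defaultdict(int)}
        for words, summary_words in docs:
            label = kw in summary_words
            n[label] += 1
            for w in words:
                counts[label][w] += 1
        models[kw] = (n, counts)
    return vocab, models


def keyword_probabilities(text, vocab, models, alpha=1.0):
    """Return {keyword: P(keyword occurs in the summary | text)}."""
    words = tokenize(text) & vocab
    probs = {}
    for kw, (n, counts) in models.items():
        total = n[True] + n[False]
        log_post = {}
        for label in (True, False):
            # Smoothed class prior.
            lp = math.log((n[label] + alpha) / (total + 2 * alpha))
            # Bernoulli likelihood over the whole vocabulary.
            for w in vocab:
                p_w = (counts[label][w] + alpha) / (n[label] + 2 * alpha)
                lp += math.log(p_w if w in words else 1.0 - p_w)
            log_post[label] = lp
        # Normalize the two log scores into a probability.
        m = max(log_post.values())
        z = sum(math.exp(v - m) for v in log_post.values())
        probs[kw] = math.exp(log_post[True] - m) / z
    return probs


# Toy data (hypothetical), loosely modeled on the help-desk example.
PAIRS = [
    ("i cannot view my email message it is blank", "email message body empty"),
    ("email arrives but message is blank", "email empty body"),
    ("word file will not open", "cannot open word file"),
    ("cannot open my word document", "word file open error"),
]
VOCAB, MODELS = train_skg(PAIRS, ["email", "word"])
PROBS = keyword_probabilities("my email message is blank", VOCAB, MODELS)
```

On this toy data the new text scores high for the keyword `email` and low for `word`, which is the kind of probability vector the slide describes.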

  9. SKG Model (diagram; input: new text x)

  10. Model for Each Keyword (diagram; input: new text x)

  11. Learning Using SKG (diagram; stages: SKG, then classification)
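
The two-stage flow named on this slide (SKG first, then classification) might look roughly like the sketch below. The slide's details are not preserved in the transcript, so everything here is an assumption: `toy_skg` is a hypothetical stand-in for a trained SKG model, and the classifier is a plain multiclass perceptron rather than the SVM or perceptron with margins used in the experiments.

```python
def toy_skg(text):
    # Hypothetical stand-in for a trained SKG model: maps a text to a
    # keyword probability vector. A real model would be learned from
    # (text, summary) pairs.
    words = set(text.lower().split())
    return {
        "email": 0.9 if "email" in words else 0.1,
        "word": 0.9 if "word" in words else 0.1,
    }


def score(weights, category, vector):
    # Dot product between a category's weights and a sparse vector.
    w = weights[category]
    return sum(w.get(k, 0.0) * v for k, v in vector.items())


def predict(weights, vector):
    # Ties are broken by category name so the result is deterministic.
    return max(sorted(weights), key=lambda c: score(weights, c, vector))


def train_perceptron(vectors, labels, epochs=20):
    """Plain multiclass perceptron over sparse dict features."""
    weights = {y: {} for y in set(labels)}
    for _ in range(epochs):
        for vector, y in zip(vectors, labels):
            guess = predict(weights, vector)
            if guess != y:
                for k, v in vector.items():
                    weights[y][k] = weights[y].get(k, 0.0) + v
                    weights[guess][k] = weights[guess].get(k, 0.0) - v
    return weights


def learn_with_skg(skg, texts, labels):
    # Stage 1: map each training text to its keyword probability vector.
    vectors = [skg(t) for t in texts]
    # Stage 2: train the classifier on those vectors, not on raw words.
    return train_perceptron(vectors, labels)


def classify_with_skg(skg, weights, text):
    # Summaries are not needed here: only the text goes through SKG.
    return predict(weights, skg(text))


TEXTS = ["email is blank", "word file broken",
         "blank email message", "cannot open word"]
LABELS = ["Empty Outlook Message", "Cannot Open Word File",
          "Empty Outlook Message", "Cannot Open Word File"]
WEIGHTS = learn_with_skg(toy_skg, TEXTS, LABELS)
RESULT = classify_with_skg(toy_skg, WEIGHTS, "my email shows nothing")
```

The point of the design is that the classifier only ever sees keyword probability vectors, so the same transformation applies at training and at classification time even though summaries exist only in training.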

  12. Outline • Introduction • Text Classification Using Stochastic Keyword Generation • Experimental Results • Conclusion and Future Work

  13. Data in Experiments • Data from the Microsoft Help Desk • 2517 texts from 52 categories • About 10000 unique words in texts • About 1500 unique words in summaries • Stopword removal was applied, but not stemming • Training/Test Split • 5-fold cross validation
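
For readers unfamiliar with the evaluation protocol, a 5-fold split over 2517 texts can be sketched as follows. The slides do not say whether the folds were shuffled or stratified by category, so the seeded shuffle and the helper name `k_fold_splits` are assumptions made only for illustration.

```python
import random


def k_fold_splits(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Assumption: items are shuffled once with a fixed seed and dealt
    round-robin into folds; no stratification by category.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


SPLITS = list(k_fold_splits(2517, k=5))
```

Each text appears in exactly one test fold, and each fold holds 503 or 504 texts since 2517 is not divisible by 5.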

  14. Experimental Settings • Classifiers • Linear SVM (Platt 1998; Dumais et al. 1998) • Perceptron algorithm with margins (PAM) (Li et al. 2002) • Methods • Text classification using SKG • Methods for comparison: • Prior • Texts for training • Summaries for training • Texts plus summaries for training • Deterministic keyword generation (DKG)

  15. Experimental Results

  16. SKG versus DKG

  17. Discussion • email x ∈ X: When getting emails I get a notice that an email has been received but when I try to view the message it is blank. I have also tried to run the repair program off the install disk but that it did not take care of the problem. • summary: receive emails; some emails have no subject and message body • Flow: email → SKG → probability vector of x (email 0.75, receive 0.68, subject 0.45, body 0.45, …) → classification → category y ∈ Y (Empty Outlook Message)

  18. Outline • Introduction • Text Classification Using Stochastic Keyword Generation • Experimental Results • Conclusion and Future Work

  19. Conclusion and Future Work • Conclusion • Text classification using SKG significantly outperforms the methods that do not use it • Future Work • Theoretical analysis of the problem and the proposed method • Application in different settings

  20. Thank You
