
Statistical Learning Methods in Natural Language Processing

Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所). Statistical Learning Methods in Natural Language Processing. Hang Li, Microsoft Research Asia, Nov. 29, 2002. Talk outline: MDL Principle; Lexical Knowledge Acquisition Using MDL Principle; Text Mining Using MDL Principle; Information Extraction based on Active Learning.


Presentation Transcript


  1. Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所) • Statistical Learning Methods in Natural Language Processing • Hang Li, Microsoft Research Asia • Nov. 29, 2002

  2. Talk Outline • MDL Principle • Lexical Knowledge Acquisition Using MDL Principle • Text Mining Using MDL Principle • Information Extraction based on Active Learning

  3. 1. MDL Principle

  4. Statistical Learning and Prediction [Diagram: Learning System, Prediction System]

  5. Elements of Statistical Estimation • Model • Strategy (Criterion) • Algorithm

  6. Example of Model Estimation • Data • Model 1: Bernoulli Model • Model 2: Mixture Model • Question: What Criterion Should We Employ?

  7. Shannon’s Information Theory [Diagram labels: Data, Distribution, Code Length, Average] • Probability Distribution = Compact Code

  8. Minimum Description Length Principle • MDL Principle: Selecting Model with Minimum Code Length • Minimum Code Length: [formula not preserved in the transcript]
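The formula on this slide is missing from the transcript. For orientation only, the two-part form of MDL that matches the symbols on the trade-off slide (L, Lm, Ld, M) is commonly written as follows. The exact expression on the original slide is an assumption, and the (k/2) log n term is the usual approximation for a model with k free parameters fitted to n data points, not necessarily the one used in the talk.

```latex
\hat{M} = \arg\min_{M} L(M), \qquad L(M) = L_m(M) + L_d(x^n \mid M)

L_d(x^n \mid M) = -\sum_{i=1}^{n} \log_2 P_{\hat{\theta}}(x_i), \qquad
L_m(M) \approx \frac{k}{2} \log_2 n
```

Here L_m is the code length of the model itself and L_d is the code length of the data when encoded with that model.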

  9. Example of Description Length [Worked example for Data under a Bernoulli model; figures not preserved in the transcript]

  10. MDL: Trade-off Relationship [Plot: code length L against model M, with curves Lm, Ld, and Lm+Ld]
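The trade-off between Lm and Ld can be illustrated with a small sketch. It compares a zero-parameter fair-coin code against a Bernoulli model with one fitted parameter. The function name, the toy data, and the (1/2) log2 n parameter cost are assumptions for illustration, not taken from the slides.

```python
import math

def bernoulli_code_length(bits, fit_parameter=True):
    """Return (model_bits, data_bits) for a two-part code of a 0/1 sequence.

    Assumption: one fitted parameter costs 0.5 * log2(n) bits; the fixed
    fair-coin model has no parameters and therefore costs 0 model bits.
    """
    n = len(bits)
    ones = sum(bits)
    if fit_parameter:
        p = ones / n                      # maximum-likelihood estimate
        model_bits = 0.5 * math.log2(n)
    else:
        p = 0.5                           # fixed fair coin
        model_bits = 0.0
    data_bits = 0.0
    for count, prob in ((ones, p), (n - ones, 1.0 - p)):
        if count:                         # skip symbols that never occur
            data_bits -= count * math.log2(prob)
    return model_bits, data_bits

# Hypothetical data: one skewed sequence, one balanced sequence.
skewed = [1] * 18 + [0] * 2
balanced = [1, 0] * 10

for name, data in (("skewed", skewed), ("balanced", balanced)):
    fitted = sum(bernoulli_code_length(data, fit_parameter=True))
    fixed = sum(bernoulli_code_length(data, fit_parameter=False))
    print(f"{name}: fitted={fitted:.2f} bits, fixed={fixed:.2f} bits")
```

On the skewed sequence the fitted model pays for its parameter and still yields a shorter total code. On the balanced sequence the extra parameter buys nothing, so the simpler model wins.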

  11. 2. Lexical Knowledge Acquisition

  12. Lexical Knowledge Acquisition [Diagram: training data (fly arg1 bird; fly arg1 swallow; fly arg1 bee; fly arg1 bird) → Learning System → Knowledge → Prediction System; query: fly arg1 crow ?]

  13. Problems • Model ? • Criterion (Strategy) ? ← MDL • Algorithm ?

  14. Case Slot Model • Word-based Model • Class-based Model

  15. Example Partition [Figure: several partitions of the noun set {crow, swallow, bug, bird, eagle, bee, insect}; the individual groupings are not recoverable from the transcript]

  16. Example Thesaurus [Tree: ANIMAL at the root; INSECT and BIRD below it; leaf words swallow, crow, bug, bee, insect, eagle, bird]

  17. Tree Cut [Figure: a cut through the thesaurus tree with nodes ANIMAL, INSECT, BIRD and leaf words swallow, crow, bug, bee, insect, eagle, bird]
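The slides do not preserve the model formula. In the formulation of the Li and Abe paper listed in the references, a tree cut is a set of thesaurus nodes that partitions the leaf nouns, and the probability of a class is shared equally among its member nouns. A hedged restatement, with v the verb, r the case slot, and Γ the tree cut (this notation is an assumption about what the slide showed):

```latex
P(n \mid v, r) = \frac{1}{|C|}\, P(C \mid v, r) \quad \text{for } n \in C,\; C \in \Gamma,
\qquad \sum_{C \in \Gamma} P(C \mid v, r) = 1
```

A cut made only of leaf nodes recovers the word-based model, and a cut higher in the tree gives a coarser class-based model with fewer parameters.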

  18. Efficient Algorithm: Dynamic Programming
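The slide names only "dynamic programming". A minimal sketch of one bottom-up procedure in that spirit follows: for each subtree, keep either the single node or the best cuts of its children, whichever has the shorter description length. The assignment of leaf words to BIRD and INSECT, the observation counts, and the per-class parameter cost of (1/2) log2 |S| are assumptions for illustration.

```python
import math

# Thesaurus shape follows the slides; which leaf sits under which class is assumed.
CHILDREN = {
    "ANIMAL": ["BIRD", "INSECT"],
    "BIRD": ["swallow", "crow", "eagle", "bird"],
    "INSECT": ["bug", "bee", "insect"],
}
# Hypothetical counts of each noun seen as arg1 of "fly".
COUNTS = {"swallow": 0, "crow": 2, "eagle": 2, "bird": 4,
          "bug": 0, "bee": 2, "insect": 0}

def leaves(node):
    """All leaf words dominated by a node (a leaf dominates itself)."""
    kids = CHILDREN.get(node)
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaves(kid)]

def description_length(cut, sample_size):
    """Two-part code length in bits for the classes in `cut`."""
    param_bits = 0.5 * math.log2(sample_size) * len(cut)
    data_bits = 0.0
    for node in cut:
        members = leaves(node)
        freq = sum(COUNTS[word] for word in members)
        if freq > 0:
            # Each member word gets an equal share of the class probability.
            data_bits -= freq * math.log2(freq / (sample_size * len(members)))
    return param_bits + data_bits

def find_mdl_cut(node, sample_size):
    """Return the cut below `node` with the shorter description length."""
    kids = CHILDREN.get(node)
    if not kids:
        return [node]
    below = [n for kid in kids for n in find_mdl_cut(kid, sample_size)]
    if description_length([node], sample_size) < description_length(below, sample_size):
        return [node]
    return below

SAMPLE_SIZE = sum(COUNTS.values())
mdl_cut = find_mdl_cut("ANIMAL", SAMPLE_SIZE)
print(mdl_cut)
```

Because the description length is a sum over the classes in the cut, each subtree can be decided independently, which is what makes the recursion valid and keeps the cost linear in the size of the thesaurus.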

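  19. Experimental Results • Penn Treebank Data • Object of Eat [Figure: thesaurus tree with nodes TOP, <entity>, <abstraction>, <life_form>, <object>, <quantity>, <time>, <plant>, <animal>, <substance>, <artifact>, <solid>, <fluid>, <food>; probabilities shown: 0.11, 0.10, 0.08, 0.39; the node each value belongs to is not recoverable from the transcript]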

  20. Demo: Lexical Knowledge Acquisition

  21. 3. Text Mining

  22. Text Mining • Questionnaire Data: Car Brand Images • Closed Answer • Open Answer • How to mine Open Answers?

  23. Rule Analysis • Accurate extraction of image characteristics for individual car types • Characteristics of Car A: Comfortable; Less luxury; Easy to drive; expensive • Characteristics of Car B: Not reliable; Fast & Stylish

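  24. Using MDL [Figure: a sequence of answers labeled T N N T N T N N T T, coded as 1001010011, is split by whether word w occurs (w=1) or not (w=0) into the subsequences 10111 and 01000]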
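The transcript does not give the scoring formula behind this figure. As a stand-in, the sketch below measures how many bits are saved when the label sequence is coded separately for answers that contain a word and answers that do not, using a simple two-part Bernoulli code. The word-occurrence mask is hypothetical, chosen only so that the split reproduces the two subsequences on the slide; the function names are also assumptions.

```python
import math

def description_length(labels):
    """Two-part code length in bits for a string of '0'/'1' labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    ones = labels.count("1")
    data_bits = 0.0
    for count in (ones, n - ones):
        if count:
            data_bits -= count * math.log2(count / n)
    model_bits = 0.5 * math.log2(n)   # assumed cost of one fitted parameter
    return data_bits + model_bits

def split_by_word(labels, has_word):
    """Separate labels by whether the answer contains the word."""
    with_word = "".join(l for l, w in zip(labels, has_word) if w == "1")
    without_word = "".join(l for l, w in zip(labels, has_word) if w == "0")
    return with_word, without_word

def mdl_gain(labels, has_word):
    """Bits saved by coding the two groups separately (positive is better)."""
    with_word, without_word = split_by_word(labels, has_word)
    return description_length(labels) - (
        description_length(with_word) + description_length(without_word)
    )

labels = "1001010011"      # label sequence from the slide
has_word = "1100010011"    # hypothetical occurrence mask for word w

split = split_by_word(labels, has_word)
gain = mdl_gain(labels, has_word)
print(split, round(gain, 3))
```

A word whose presence separates the labels well yields a positive gain, which is one plausible reading of why high-scoring conditions in the rule tables also have high frequency ratios.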

  25. Ex.1 Rules for Car A

      Condition                  Score    Freq./Total Freq.
      `for ordinary people’      4.459    7/7
      `X, LTD’                   4.523    6/6
      `simplicity’               3.017    4/6
      `traditional’              3.030    4/4
      `Japan’                    3.061    3/4
      `common-people’            3.093    3/3
      `middle-class’             3.126    3/3
      `earnest’                  3.159    2/2
      `class&common people’      3.194    2/2
      `general’                  1.919    4/8

  26. Ex.2 Rules for Car B

      Condition                  Score    Freq./Total Freq.
      `outdoor’                  10.325   5/5
      `Z, LTD’                   8.132    4/4
      `mobility’                 6.527    8/14
      `fast’                     5.694    3/3
      `run’                      5.057    2/2
      `work’                     3.341    2/2
      `road’                     3.380    2/2
      `enjoyableness’            3.420    2/2
      `boring’                   3.438    3/4
      `sporty’                   1.891    2/3

  27. Graphical User Interface [Screenshot labels: Rule Analysis, Car A, Search Function, Displaying Data]

  28. 4. Information Extraction

  29. Information Extraction • One Setting: Information Extraction = Classification • Our Goal: • Help user to build task-specific information extraction system • Minimize user efforts in data annotation • Solution: Active Perceptron

  30. Active Perceptron [Diagram: Text Data → Feature Extraction → Active Learning with Perceptron with Margin → Extraction Model] • Perceptron with Margin Using Active Learning
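The slides do not spell out the update rule or the query strategy. A minimal sketch of one common combination is given below: a perceptron that updates whenever an example falls inside the margin, and pool-based active learning that asks for the label of the example closest to the current hyperplane. The toy data, the oracle, and all function and parameter names are assumptions, not the Dr. What implementation.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_margin_perceptron(examples, dim, margin=1.0, rate=1.0, epochs=100):
    """Update on every example whose functional margin y * (w . x) is <= margin."""
    w = [0.0] * dim
    for _ in range(epochs):
        updated = False
        for x, y in examples:
            if y * dot(w, x) <= margin:
                w = [wi + rate * y * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:          # every example is outside the margin
            break
    return w

def active_learn(pool, oracle, dim, seed_size=4, budget=20):
    """Label a small seed set, then repeatedly query the least certain example."""
    labeled = [(x, oracle(x)) for x in pool[:seed_size]]
    unlabeled = list(pool[seed_size:])
    w = train_margin_perceptron(labeled, dim)
    for _ in range(budget):
        if not unlabeled:
            break
        query = min(unlabeled, key=lambda x: abs(dot(w, x)))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))   # the "annotation" step
        w = train_margin_perceptron(labeled, dim)
    return w, labeled

def make_pool(size=200, gap=0.5, seed=0):
    """Hypothetical separable 2-D points with a constant bias feature."""
    rng = random.Random(seed)
    pool = []
    while len(pool) < size:
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if abs(a - b) >= gap:
            pool.append((a, b, 1.0))
    return pool

def oracle(x):
    """Stand-in for the human annotator: +1 if the first feature is larger."""
    return 1 if x[0] > x[1] else -1

pool = make_pool()
weights, labeled = active_learn(pool, oracle, dim=3)
accuracy = sum(oracle(x) * dot(weights, x) > 0 for x in pool) / len(pool)
print(f"labels requested: {len(labeled)}, pool accuracy: {accuracy:.2f}")
```

Querying the example nearest the decision boundary is only one possible selection rule. It is used here because it needs nothing beyond the perceptron's own score.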

  31. Related Work • Perceptron with Margin (Krauth and Mezard) • Performance: comparable with SVM in text classification • Easy to implement • Much faster than SVM • Active Learning • Parsing (Tang et al.) • FSA learning (Angluin)

  32. Case Study: Site Question Answering • Site Question Answering: Dr. What • Answering “what is X” questions at microsoft.com • Information Extraction • Extract definitions from web pages

  33. Case Study: Dr. What [Pipeline diagram: about 3300 paragraphs from MS Search of several keywords; annotation of about 400 paragraphs by active learning (Active Perceptron); Extraction Model; 150000 web pages downloaded from microsoft.com; definitions of about 10000 terms]

  34. Demo: Active Perceptron and Dr. What

  35. Dr. What
      • Performance of Perceptron with Margin: 3300 examples (2640 for training, 660 for testing); Precision: 70.03%; Recall: 38.09%
      • Performance of Active Learning: reaches the optimal performance with annotation of 400 examples
      • Performance of Dr. What: human evaluation on the definitions of 2800 extracted terms; Top 1 Precision: 76.97%; Top 3 Precision: 77.78%
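The slide reports precision and recall for the Perceptron with Margin but no combined score. For readers who want a single figure, the F1 value implied by those two numbers can be worked out as follows; this is a derived value, not one stated in the talk.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.7003 \times 0.3809}{0.7003 + 0.3809}
    \approx 0.493
```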

  36. Summary of Talk • Lexical Knowledge Acquisition Using MDL • Text Mining Using MDL • Information Extraction based on Active Learning

  37. References • Hang Li and Naoki Abe, Generalizing Case Frames Using a Thesaurus and the MDL Principle, Computational Linguistics 24(2), 217-244 (1998). • Hang Li and Kenji Yamanishi, Mining from Open Answers in Questionnaire Data, Proc. of ACM-KDD’01, 443-449 (2001).
