
Semi-Supervised Learning & Summary



  1. Semi-Supervised Learning & Summary Advanced Statistical Methods in NLP Ling 572 March 8, 2012

  2. Roadmap • Semi-supervised learning: • Motivation & perspective • Yarowsky’s model • Co-training • Summary

  3. Semi-supervised Learning

  7. Motivation • Supervised learning: • Works really well • But need lots of labeled training data • Unsupervised learning: • No labeled data required, but • May not work well, may not learn desired distinctions • E.g. Unsupervised parsing techniques • Fits data, but doesn’t correspond to linguistic intuition

  12. Solution • Semi-supervised learning: • General idea: • Use a small amount of labeled training data • Augment with large amount of unlabeled training data • Use information in unlabeled data to improve models • Many different semi-supervised machine learners • Variants of supervised techniques: • Semi-supervised SVMs, CRFs, etc • Bootstrapping approaches • Yarowsky’s method, self-training, co-training

  13. Label the First Use of “Plant” • Biological Example: There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered. • Industrial Example: The Paulus company was founded in 1938. Since those days the product range has been the subject of constant expansions and is brought up continuously to correspond with the state of the art. We’re engineering, manufacturing and commissioning worldwide ready-to-run plants packed with our comprehensive know-how. Our product range includes pneumatic conveying systems for carbon, carbide, sand, lime and many others. We use reagent injection in molten metal for the…

  16. Word Sense Disambiguation • Application of lexical semantics • Goal: Given a word in context, identify the appropriate sense • E.g. plants and animals in the rainforest • Crucial for real syntactic & semantic analysis • Correct sense can determine • Available syntactic structure • Available thematic roles, correct meaning, etc.

  17. Disambiguation Features • Key: What are the features?

  18. Disambiguation Features • Key: What are the features? • Part of speech • Of word and neighbors • Morphologically simplified form • Words in neighborhood • Question: How big a neighborhood? • Is there a single optimal size? Why? • (Possibly shallow) Syntactic analysis • E.g. predicate-argument relations, modification, phrases • Collocation vs. co-occurrence features • Collocation: words in a specific relation: predicate-argument, 1 word +/- • Co-occurrence: bag of words
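The collocation vs. co-occurrence distinction on this slide can be made concrete with a short feature extractor. This is an illustrative sketch: the function and feature names are mine, not from the slides.

```python
from collections import Counter

def wsd_features(tokens, i, window=3):
    """Illustrative sketch: collocation features record a word in a
    specific position relative to the target; co-occurrence features
    are an unordered bag of nearby words."""
    feats = {}
    # Collocation features: 1 word to the left / right of the target
    if i > 0:
        feats["word-1=" + tokens[i - 1]] = 1
    if i + 1 < len(tokens):
        feats["word+1=" + tokens[i + 1]] = 1
    # Co-occurrence features: bag of words within +/- window, order ignored
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    bag = Counter(tokens[j] for j in range(lo, hi) if j != i)
    for w, c in bag.items():
        feats["bow=" + w] = c
    return feats

tokens = "plants and animals in the rainforest".split()
feats = wsd_features(tokens, 0)   # target word: "plants"
```

Note how changing `window` changes only the co-occurrence features, which is exactly the "how big a neighborhood?" question the slide raises.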

  22. WSD Evaluation • Ideally, end-to-end evaluation with WSD component • Demonstrate real impact of technique in system • Difficult, expensive, still application specific • Typically, intrinsic, sense-based • Accuracy, precision, recall • SENSEVAL/SEMEVAL: all words, lexical sample • Baseline: • Most frequent sense • Topline: • Human inter-rater agreement: 75-80% fine-grained; 90% coarse-grained
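The most-frequent-sense baseline on this slide is simple enough to sketch directly (the sense labels below are hypothetical, for illustration only):

```python
from collections import Counter

def mfs_baseline_accuracy(train_senses, test_senses):
    """Sketch of the most-frequent-sense baseline: always predict the
    sense seen most often in the training data."""
    mfs = Counter(train_senses).most_common(1)[0][0]
    return sum(1 for s in test_senses if s == mfs) / len(test_senses)

# Hypothetical sense labels for "plant"
train = ["factory", "flora", "factory"]
test = ["factory", "flora", "factory", "factory"]
acc = mfs_baseline_accuracy(train, test)   # 3 of 4 correct -> 0.75
```

Any proposed WSD system has to beat this number to be interesting, which is why the slide pairs it with the human-agreement topline.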

  25. Minimally Supervised WSD • Yarowsky’s algorithm (1995) • Bootstrapping approach: • Use small labeled seedset to iteratively train • Builds on 2 key insights: • One Sense Per Discourse • Word appearing multiple times in text has same sense • Corpus of 37,232 bass instances: always single sense • One Sense Per Collocation • Local phrases select single sense • Fish -> Bass1 • Play -> Bass2
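The one-sense-per-discourse constraint can be sketched as a small relabeling pass: within each discourse, every occurrence of the target word takes the majority sense assigned so far. The function name and data layout here are my own, for illustration.

```python
from collections import Counter, defaultdict

def one_sense_per_discourse(instances):
    """Sketch of the one-sense-per-discourse constraint: relabel all
    occurrences in a discourse with the majority sense assigned so far.
    `instances` is a list of (discourse_id, sense) pairs; occurrences
    not yet labeled carry None."""
    by_doc = defaultdict(list)
    for doc, sense in instances:
        if sense is not None:
            by_doc[doc].append(sense)
    majority = {doc: Counter(senses).most_common(1)[0][0]
                for doc, senses in by_doc.items()}
    return [(doc, majority.get(doc, sense)) for doc, sense in instances]

tagged = [("d1", "fish"), ("d1", None), ("d1", "fish"),
          ("d2", "music"), ("d2", None)]
result = one_sense_per_discourse(tagged)
```

This is the step that lets a few confidently labeled instances propagate a label to unlabeled occurrences of the same word in the same text.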

  33. Yarowsky’s Algorithm • Training Decision Lists • 1. Pick Seed Instances & Tag • 2. Find Collocations: Word Left, Word Right, Word +K • (A) Calculate Informativeness on Tagged Set, Order Rules by it (Yarowsky uses the log-likelihood ratio of the senses given each collocation) • (B) Tag New Instances with Rules • (C) Apply 1 Sense/Discourse • (D) If Still Unlabeled, Go To 2 • 3. Apply 1 Sense/Discourse • Disambiguation: First Rule Matched
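The decision-list core of the algorithm can be sketched as follows: score each collocation feature by a smoothed log-likelihood ratio, order rules by its magnitude, and disambiguate by the first matching rule. The smoothing constant `alpha` and the toy senses 'A'/'B' are my assumptions, not from the slides.

```python
import math
from collections import defaultdict

def train_decision_list(labeled, alpha=0.1):
    """Sketch of decision-list training: score each collocation f by the
    smoothed log-likelihood ratio log P(senseA|f)/P(senseB|f), in the
    spirit of Yarowsky (1995), and sort rules by its magnitude."""
    counts = defaultdict(lambda: {"A": 0.0, "B": 0.0})
    for feats, sense in labeled:
        for f in feats:
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        llr = math.log((c["A"] + alpha) / (c["B"] + alpha))
        rules.append((abs(llr), f, "A" if llr > 0 else "B"))
    rules.sort(reverse=True)           # most informative rule first
    return rules

def disambiguate(rules, feats, default="A"):
    """Disambiguation as on the slide: first rule matched wins."""
    for _, f, sense in rules:
        if f in feats:
            return sense
    return default

seed = [({"fish", "river"}, "A"),      # bass-1: the fish
        ({"play", "guitar"}, "B"),     # bass-2: the instrument
        ({"fish", "caught"}, "A")]
rules = train_decision_list(seed)
sense = disambiguate(rules, {"caught", "fish"})
```

In the full algorithm, `disambiguate` would be applied to the unlabeled pool at step (B), the newly tagged instances would re-enter training, and the loop would repeat until nothing new can be labeled.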

  34. Yarowsky Decision List

  35. Iterative Updating

  39. Sense Choice With Collocational Decision Lists • Create Initial Decision List • Rules Ordered by Informativeness (log-likelihood ratio) • Check nearby Word Groups (Collocations) • Biology: “Animal” within ±2-10 words • Industry: “Manufacturing” within ±2-10 words • Result: Correct Selection • 95% on Pair-wise tasks

  40. Self-Training • Basic approach: • Start off with small labeled training set • Train a supervised classifier with the training set • Apply new classifier to residual unlabeled training data • Add ‘best’ newly labeled examples to labeled training • Iterate

  44. Self-Training • Simple – right? • Devil in the details: • Which instances are ‘best’ to add? • Highest confidence? • Probably accurate, but • Probably add little new information to classifier • Most different? • Probably adds information, but • May not be accurate • Use most different, highly confident instances
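The self-training loop described above can be sketched generically over any classifier with `fit`/`predict_proba`. Everything here is illustrative: the tiny `NearestCentroid` stand-in, the confidence threshold, and the per-round batch size are my assumptions, not from the slides.

```python
import math

class NearestCentroid:
    """Tiny stand-in classifier (illustrative only): fit stores the
    per-class mean of 1-D points; predict_proba is a softmax over
    negative distances to the class means."""
    def fit(self, X, y):
        sums, cnts = {}, {}
        for x, lab in zip(X, y):
            sums[lab] = sums.get(lab, 0.0) + x
            cnts[lab] = cnts.get(lab, 0) + 1
        self.centroids = {lab: sums[lab] / cnts[lab] for lab in sums}
        self.classes = sorted(self.centroids)
    def predict_proba(self, X):
        out = []
        for x in X:
            scores = [math.exp(-abs(x - self.centroids[c]))
                      for c in self.classes]
            z = sum(scores)
            out.append([s / z for s in scores])
        return out
    def predict(self, X):
        return [max(self.classes, key=lambda c: -abs(x - self.centroids[c]))
                for x in X]

def self_train(clf, X_lab, y_lab, X_unlab, rounds=5,
               threshold=0.8, per_round=2):
    """Sketch of the self-training loop on the slides: train, label the
    residual unlabeled data, move the most confident (yet different)
    instances into the training set, and iterate."""
    X_lab, y_lab, X_unlab = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        if not X_unlab:
            break
        clf.fit(X_lab, y_lab)
        probs = clf.predict_proba(X_unlab)
        scored = sorted(((max(p), i) for i, p in enumerate(probs)),
                        reverse=True)
        picked = sorted((i for conf, i in scored[:per_round]
                         if conf >= threshold), reverse=True)
        if not picked:
            break
        for i in picked:               # pop from the end to keep indices valid
            x = X_unlab.pop(i)
            y_lab.append(clf.predict([x])[0])
            X_lab.append(x)
    clf.fit(X_lab, y_lab)
    return clf

clf = self_train(NearestCentroid(), [0.0, 10.0], ["low", "high"],
                 [0.5, 1.0, 9.0, 9.5])
```

A real system would also implement the "most different but confident" selection discussed above, e.g. by penalizing candidates too similar to the existing training set before ranking by confidence.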

  47. Co-Training • Blum & Mitchell, 1998 • Basic intuition: “Two heads are better than one” • Ensemble classifier: • Uses results from multiple classifiers • Multi-view classifier: • Uses different views of data – feature subsets • Ideally, views should be: • Conditionally independent • Individually sufficient – enough information to learn

  49. Co-training Set-up • Create two views of data: • Typically partition feature set by type • E.g. predicting speech emphasis • View 1: Acoustics: loudness, pitch, duration • View 2: Lexicon, syntax, context • Some approaches use learners of different types • In practice, views may not truly be conditionally independent • But often works pretty well anyway

  50. Co-training Approach • Create small labeled training data set • Train two (supervised) classifiers on current training data • Using different views • Use the two classifiers to label residual unlabeled instances • Select ‘best’ newly labeled data to add to training data • Add instances labeled by C1 to the training data for C2, and vice versa • Iterate
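The co-training loop above can be sketched with instances that are (view1, view2) pairs and one classifier per view; each classifier's confident labels feed the other's training set. The toy `fit_view` classifier, the confidence cutoff, and the data are all illustrative assumptions, not from the slides.

```python
def fit_view(xs, ys):
    """Toy per-view classifier (illustrative stand-in): nearest class
    mean on one feature; confidence = margin between the two distances."""
    sums, cnts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        cnts[y] = cnts.get(y, 0) + 1
    cents = sorted((sums[y] / cnts[y], y) for y in sums)
    def predict(x):
        dists = sorted((abs(x - c), y) for c, y in cents)
        return dists[0][1], dists[1][0] - dists[0][0]   # (label, margin)
    return predict

def co_train(seed_X, seed_y, unlab, rounds=3, cutoff=2.0):
    """Sketch of the co-training loop: instances labeled confidently by
    C1 join C2's training data, and vice versa."""
    train1 = list(zip(seed_X, seed_y))
    train2 = list(zip(seed_X, seed_y))
    unlab = list(unlab)
    for _ in range(rounds):
        c1 = fit_view([x[0] for x, _ in train1], [y for _, y in train1])
        c2 = fit_view([x[1] for x, _ in train2], [y for _, y in train2])
        remaining = []
        for x in unlab:
            l1, m1 = c1(x[0])
            l2, m2 = c2(x[1])
            if m1 >= cutoff:           # C1 confident: give label to C2
                train2.append((x, l1))
            elif m2 >= cutoff:         # C2 confident: give label to C1
                train1.append((x, l2))
            else:
                remaining.append(x)
        unlab = remaining
    c1 = fit_view([x[0] for x, _ in train1], [y for _, y in train1])
    c2 = fit_view([x[1] for x, _ in train2], [y for _, y in train2])
    return c1, c2

seed_X = [(0.0, 0.0), (10.0, 10.0)]
seed_y = ["A", "B"]
unlab = [(0.5, 1.0), (9.5, 9.0), (1.0, 0.5), (9.0, 9.5)]
c1, c2 = co_train(seed_X, seed_y, unlab)
```

The cross-feeding is the key design choice: each classifier benefits from labels the other derived from an independent view, which is why the conditional-independence assumption on the previous slide matters.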
