
MaxEnt POS Tagging


aneko

Presentation Transcript


  1. MaxEnt POS Tagging Shallow Processing Techniques for NLP Ling570 November 21, 2011

  2. Roadmap • MaxEnt POS Tagging • Features • Beam Search • vs Viterbi • Named Entity Tagging

  3. MaxEnt Feature Template • Words: • Current word: w_0 • Previous word: w_{-1} • Word two back: w_{-2} • Next word: w_{+1} • Next next word: w_{+2} • Tags: • Previous tag: t_{-1} • Previous tag pair: t_{-2} t_{-1} • How many features? 5|V| + |T| + |T|^2
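The feature count on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name and the vocabulary and tagset sizes are hypothetical, not taken from the lecture.

```python
def template_feature_count(vocab_size: int, num_tags: int) -> int:
    """Upper bound on features from the template: 5|V| + |T| + |T|^2."""
    word_features = 5 * vocab_size      # w_-2, w_-1, w_0, w_+1, w_+2
    tag_features = num_tags             # t_-1
    tag_pair_features = num_tags ** 2   # t_-2 t_-1
    return word_features + tag_features + tag_pair_features


# Hypothetical sizes, chosen only for illustration.
print(template_feature_count(vocab_size=50_000, num_tags=45))  # 252070
```

The word features dominate: with these assumed sizes, the five word positions account for 250,000 of the 252,070 features.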

  4. Representing Orthographic Patterns • How can we represent morphological patterns as features? • Character sequences • Which sequences? Prefixes/suffixes • e.g. suffix(w_i)=ing or prefix(w_i)=well • Specific characters or character types • Which? • is-capitalized • is-hyphenated

  5. MaxEnt Feature Set

  6. Examples • well-heeled: rare word

  7. Examples • well-heeled: rare word JJ prevW=about:1 prev2W=stories:1 nextW=communities:1 next2W=and:1 pref=w:1 pref=we:1 pref=wel:1 pref=well:1 suff=d:1 suff=ed:1 suff=led:1 suff=eled:1 is-hyphenated:1 preT=IN:1 pre2T=NNS-IN:1
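A feature extractor that reproduces the features listed for "well-heeled" might look like the sketch below. The function name, the argument layout, and the affix limit of four characters are assumptions inferred from the example. Like the slide, it emits affix and shape features rather than a current-word feature.

```python
def extract_features(words, i, prev_tag, prev2_tag, max_affix=4):
    """Return the names of the active binary features for words[i]."""
    w = words[i]
    feats = []
    # Context-word features; positions outside the sentence are skipped.
    if i >= 1:
        feats.append(f"prevW={words[i - 1]}")
    if i >= 2:
        feats.append(f"prev2W={words[i - 2]}")
    if i + 1 < len(words):
        feats.append(f"nextW={words[i + 1]}")
    if i + 2 < len(words):
        feats.append(f"next2W={words[i + 2]}")
    # Prefixes and suffixes of length 1..max_affix (assumed limit).
    longest = min(max_affix, len(w))
    for n in range(1, longest + 1):
        feats.append(f"pref={w[:n]}")
    for n in range(1, longest + 1):
        feats.append(f"suff={w[-n:]}")
    # Character-type features.
    if "-" in w:
        feats.append("is-hyphenated")
    if w[:1].isupper():
        feats.append("is-capitalized")
    # Tag-history features.
    feats.append(f"preT={prev_tag}")
    feats.append(f"pre2T={prev2_tag}-{prev_tag}")
    return feats


words = ["stories", "about", "well-heeled", "communities", "and"]
for name in extract_features(words, 2, prev_tag="IN", prev2_tag="NNS"):
    print(f"{name}:1")
```

During training, `prev_tag` and `prev2_tag` would be read from the gold annotation; at test time they have to come from the tagger's own earlier decisions.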

  8. Finding Features • In training, where do features come from? • Where do features come from in testing?

  9. Finding Features • In training, where do features come from? • Where do features come from in testing? • tag features come from classification of prior word

  10. Sequence Labeling

  11. Sequence Labeling • Goal: Find most probable labeling of a sequence • Many sequence labeling tasks • POS tagging • Word segmentation • Named entity tagging • Story/spoken sentence segmentation • Pitch accent detection • Dialog act tagging

  12. Solving Sequence Labeling

  13. Solving Sequence Labeling • Direct: Use a sequence labeling algorithm • E.g. HMM, CRF, MEMM

  14. Solving Sequence Labeling • Direct: Use a sequence labeling algorithm • E.g. HMM, CRF, MEMM • Via classification: Use classification algorithm • Issue: What about tag features?

  15. Solving Sequence Labeling • Direct: Use a sequence labeling algorithm • E.g. HMM, CRF, MEMM • Via classification: Use classification algorithm • Issue: What about tag features? • Features that use class labels – depend on classification • Solutions:

  16. Solving Sequence Labeling • Direct: Use a sequence labeling algorithm • E.g. HMM, CRF, MEMM • Via classification: Use classification algorithm • Issue: What about tag features? • Features that use class labels – depend on classification • Solutions: • Don’t use features that depend on class labels (loses info)

  17. Solving Sequence Labeling • Direct: Use a sequence labeling algorithm • E.g. HMM, CRF, MEMM • Via classification: Use classification algorithm • Issue: What about tag features? • Features that use class labels – depend on classification • Solutions: • Don’t use features that depend on class labels (loses info) • Use other process to generate class labels, then use

  18. Solving Sequence Labeling • Direct: Use a sequence labeling algorithm • E.g. HMM, CRF, MEMM • Via classification: Use classification algorithm • Issue: What about tag features? • Features that use class labels – depend on classification • Solutions: • Don’t use features that depend on class labels (loses info) • Use other process to generate class labels, then use • Perform incremental classification to get labels, use labels as features for instances later in sequence
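The third solution, incremental classification, can be sketched as a left-to-right loop that feeds each predicted tag back in as a feature for later words. Everything named here is hypothetical: `greedy_tag` is an illustrative helper, and `toy_classify` is a hand-written stand-in for a trained MaxEnt classifier.

```python
from typing import Callable, List


def greedy_tag(words: List[str],
               classify: Callable[[List[str], int, str, str], str]) -> List[str]:
    """Tag left to right, using earlier predictions as tag features."""
    tags: List[str] = []
    for i in range(len(words)):
        prev_tag = tags[-1] if i >= 1 else "BOS"
        prev2_tag = tags[-2] if i >= 2 else "BOS"
        tags.append(classify(words, i, prev_tag, prev2_tag))
    return tags


def toy_classify(words, i, prev_tag, prev2_tag):
    """Hypothetical rule-based stand-in for a trained classifier."""
    w = words[i]
    if w in {"an", "the"}:
        return "D"
    if w == "like":
        return "P" if prev_tag == "V" else "V"
    if prev_tag in {"BOS", "D"}:
        return "N"
    if prev_tag == "N":
        return "V"
    return "N"


print(greedy_tag("time flies like an arrow".split(), toy_classify))
# ['N', 'V', 'P', 'D', 'N']
```

This greedy loop commits to one tag per word, so an early mistake propagates into every later tag feature.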

  19. HMM Trellis • [Figure: trellis for "<s> time flies like an arrow": start state BOS at <s>, then one column per word with candidate tag states N, V, P, D] • Adapted from F. Xia

  20. Viterbi • Initialization: • Recursion: • Termination:
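The equations for the three steps did not survive in this transcript. As a reminder of what they compute, here is a minimal sketch of the standard textbook Viterbi algorithm for a bigram HMM. It is not necessarily the slide's notation, and the probability tables are toy numbers invented for illustration.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag path and its probability."""
    # Initialization: start probability times emission of the first word.
    delta = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    backptr = [{}]
    # Recursion: extend the best path into each tag at each position.
    for i in range(1, len(words)):
        delta.append({})
        backptr.append({})
        for t in tags:
            best_prev = max(tags, key=lambda p: delta[i - 1][p] * trans_p[p][t])
            delta[i][t] = (delta[i - 1][best_prev] * trans_p[best_prev][t]
                           * emit_p[t].get(words[i], 0.0))
            backptr[i][t] = best_prev
    # Termination: pick the best final tag and follow the back-pointers.
    last = max(tags, key=lambda t: delta[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(backptr[i][path[-1]])
    path.reverse()
    return path, delta[-1][last]


# Toy parameters (hypothetical; emissions cover only the two words used).
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"time": 0.5, "flies": 0.2}, "V": {"time": 0.1, "flies": 0.4}}

print(viterbi(["time", "flies"], tags, start_p, trans_p, emit_p))
```

Viterbi needs only the best score per tag at each position because a bigram HMM conditions on a single previous tag.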

  21. Decoding • Goal: Identify highest probability tag sequence

  22. Decoding • Goal: Identify highest probability tag sequence • Issues: • Features include tags from previous words • Not immediately available

  23. Decoding • Goal: Identify highest probability tag sequence • Issues: • Features include tags from previous words • Not immediately available • Uses tag history • Just knowing highest probability preceding tag insufficient

  24. Decoding • Approach: Retain multiple candidate tag sequences • Essentially search through tagging choices

  25. Decoding • Approach: Retain multiple candidate tag sequences • Essentially search through tagging choices • Which sequences?

  26. Decoding • Approach: Retain multiple candidate tag sequences • Essentially search through tagging choices • Which sequences? • All sequences?

  27. Decoding • Approach: Retain multiple candidate tag sequences • Essentially search through tagging choices • Which sequences? • All sequences? • No. Why not?

  28. Decoding • Approach: Retain multiple candidate tag sequences • Essentially search through tagging choices • Which sequences? • All sequences? • No. Why not? • How many sequences?

  29. Decoding • Approach: Retain multiple candidate tag sequences • Essentially search through tagging choices • Which sequences? • All sequences? • No. Why not? • How many sequences? • Branching factor: N (# tags); Depth: T (# words) • N^T

  30. Decoding • Approach: Retain multiple candidate tag sequences • Essentially search through tagging choices • Which sequences? • All sequences? • No. Why not? • How many sequences? • Branching factor: N (# tags); Depth: T (# words) • N^T • Top K highest probability sequences
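To make the size of the search space concrete, the count is the number of tags raised to the power of the sentence length. The sketch below uses a hypothetical helper name, and the 45-tag, 20-word case is an assumed example rather than a figure from the lecture.

```python
def num_tag_sequences(num_tags: int, num_words: int) -> int:
    """Branching factor N, depth T: N to the power T complete sequences."""
    return num_tags ** num_words


# The four tags of the running example over the five-word sentence.
print(num_tag_sequences(4, 5))  # 1024

# Hypothetical larger case: a 45-tag tagset and a 20-word sentence.
print(num_tag_sequences(45, 20))
```

Even the small example has over a thousand complete taggings, which is why only the top K sequences are kept.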

  31. Breadth-First Search • [Figure: search tree for "<s> time flies like an arrow"; root node BOS at <s>]

  32. Breadth-First Search • [Figure: BOS expanded to candidate tags N and V for "time"]

  33. Breadth-First Search • [Figure: each "time" node expanded to N and V for "flies", giving 4 paths]

  34-36. Breadth-First Search • [Figure: each "flies" node expanded to P and V for "like", giving 8 paths]

  37. Breadth-first Search • Is breadth-first search efficient?

  38. Breadth-first Search • Is it efficient? • No, it tries everything
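The cost of trying everything shows up directly in the size of the frontier. This is a minimal sketch with a hypothetical helper that, for simplicity, allows every tag at every word.

```python
def breadth_first_paths(words, tags):
    """Expand every partial tag path by every tag, one word at a time."""
    frontier = [()]  # a single empty path, standing for BOS
    sizes = []
    for _ in words:
        frontier = [path + (t,) for path in frontier for t in tags]
        sizes.append(len(frontier))
    return frontier, sizes


paths, sizes = breadth_first_paths("time flies like an arrow".split(), ["N", "V"])
print(sizes)  # [2, 4, 8, 16, 32]
```

The frontier doubles at every word even with only two candidate tags, and nothing is ever discarded.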

  39. Beam Search • Intuition: • Breadth-first search explores all paths

  40. Beam Search • Intuition: • Breadth-first search explores all paths • Lots of paths are (pretty obviously) bad • Why explore bad paths?

  41. Beam Search • Intuition: • Breadth-first search explores all paths • Lots of paths are (pretty obviously) bad • Why explore bad paths? • Restrict to (apparently best) paths • Approach: • Perform breadth-first search, but

  42. Beam Search • Intuition: • Breadth-first search explores all paths • Lots of paths are (pretty obviously) bad • Why explore bad paths? • Restrict to (apparently best) paths • Approach: • Perform breadth-first search, but • Retain only k ‘best’ paths thus far • k: beam width
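A minimal sketch of this approach follows, assuming a classifier that returns a tag distribution given the word position and the tag history of one candidate path. The function names and the toy probabilities are hypothetical. Scores are kept as log probabilities so that long sentences do not underflow.

```python
import math


def beam_search(words, tags, tag_probs, k=3):
    """Breadth-first search that keeps only the k best paths per word."""
    beam = [(0.0, [])]  # (log probability, tag history)
    for i in range(len(words)):
        candidates = []
        for logp, history in beam:
            probs = tag_probs(words, i, history)
            for t in tags:
                p = probs.get(t, 0.0)
                if p > 0.0:
                    candidates.append((logp + math.log(p), history + [t]))
        # Prune: retain only the k highest-scoring paths so far.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:k]
    return beam


def toy_tag_probs(words, i, history):
    """Hypothetical stand-in for P(tag | word, tag history)."""
    prev = history[-1] if history else "BOS"
    w = words[i]
    if w == "time":
        return {"N": 0.7, "V": 0.3}
    if w == "flies":
        return {"V": 0.6, "N": 0.4} if prev == "N" else {"N": 0.8, "V": 0.2}
    if w == "like":
        return {"P": 0.7, "V": 0.3} if prev == "V" else {"V": 0.8, "P": 0.2}
    if w == "an":
        return {"D": 1.0}
    return {"N": 0.9, "V": 0.1}


words = "time flies like an arrow".split()
for logp, path in beam_search(words, ["N", "V", "P", "D"], toy_tag_probs, k=3):
    print(round(math.exp(logp), 4), path)
```

With k=1 this reduces to greedy left-to-right tagging. A larger k trades time for a smaller chance of pruning the path that would have scored best overall.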

  43. Beam Search, k=3 • [Figure: search tree for "<s> time flies like an arrow"; root node BOS at <s>]

  44. Beam Search, k=3 • [Figure: BOS expanded to candidate tags N and V for "time"]

  45. Beam Search, k=3 • [Figure: each "time" node expanded to N and V for "flies", giving 4 paths]

  46-47. Beam Search, k=3 • [Figure: only 3 of the 4 "flies" paths are expanded, each to P and V for "like", giving 6 paths]

  48. Beam Search • W = {w_1, w_2, …, w_n}: test sentence

  49. Beam Search • W = {w_1, w_2, …, w_n}: test sentence • s_{ij}: jth highest prob. sequence up to & inc. word w_i
