
CS 479, section 1: Natural Language Processing



Presentation Transcript


  1. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 479, section 1: Natural Language Processing. Lecture #25: MEMMs II, Named Entity Recognition. Thanks to Dan Klein of UC Berkeley for some of the materials used in this lecture.

  2. Announcements
     • Reading Report #10: due now
     • Optional Bayes Nets Exercises
       • In place of Mid-term exam question #4
       • Due: Friday
     • Reading Report #11: M&S 11.1-11.3.1, 11.3.3
       • Due: Monday
     • Project #3
       • Early: Monday
       • Due: Wednesday

  3. Objectives
     • Gain further insight into MEMMs
     • Understand another application: Named Entity Tagging
     • Reinforce the distinction between joint and conditional models
     • Understand the label bias problem in local conditional models
     • Think about other sequence labeling problems

  4. Feature Templates
     Your templates do not extract the current tag. The current tag is provided behind the scenes.
     • Emission:
       • Feature: < w0=future, t0=JJ >
       • Feature template: < w0, t0 >

     public class CurrentWordExtractor implements
         FeatureTemplate<LocalContext<String, String>> {
       […]
       public String getName() {
         return "CURWORD";
       }
       […]
       public void extract(LocalContext<String, String> inputDatum,
                           Counter<String> outputVector) {
         // Emit one indicator feature keyed on the current word, e.g. "CURWORD:future"
         outputVector.incrementCount(getName() + ":" +
             inputDatum.getCurrentWord());
       }
     }

  5. Feature Templates
     • Edges or transitions:
       • Feature: < t-1=DT, t0=JJ >
       • Feature template: < t-1, t0 >
       • Higher order: < t-2, t-1, t0 >

     public class PreviousTagExtractor implements
         FeatureTemplate<LocalContext<String, String>> {
       […]
       public String getName() {
         return "PREVIOUS_TAG";
       }
       […]
       public void extract(LocalContext<String, String> inputDatum,
                           Counter<String> outputVector) {
         // Emit one indicator feature keyed on the previous tag, e.g. "PREVIOUS_TAG:DT"
         outputVector.incrementCount(getName() + ":" +
             inputDatum.getPreviousTag());
       }
     }

  6. Feature Templates
     • We discussed others:
       • Feature: < w0 ends in “ly”, t0=RB >
       • Feature template: < 2-letter suffix(w0), t0 >
     • Also possible:
       • Mixed feature templates: < t-1, w0, t0 >
     (A sketch of a suffix template in the same style follows below.)
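A minimal sketch of the suffix template above, written in the same style as the CurrentWordExtractor and PreviousTagExtractor snippets. The class name WordSuffixExtractor, the suffixLength field, and the constructor are illustrative additions of mine; FeatureTemplate, LocalContext, Counter, getCurrentWord(), and the single-argument incrementCount() are assumed to behave as in the snippets shown on the earlier slides.

     public class WordSuffixExtractor implements
         FeatureTemplate<LocalContext<String, String>> {

       // Length of the suffix to extract (2 for the < 2-letter suffix(w0), t0 > template).
       private final int suffixLength;

       public WordSuffixExtractor(int suffixLength) {
         this.suffixLength = suffixLength;
       }

       public String getName() {
         return "SUFFIX" + suffixLength;
       }

       public void extract(LocalContext<String, String> inputDatum,
                           Counter<String> outputVector) {
         String word = inputDatum.getCurrentWord();
         // Fall back to the whole word when it is shorter than the requested suffix.
         String suffix = word.length() >= suffixLength
             ? word.substring(word.length() - suffixLength)
             : word;
         // Emit one indicator feature, e.g. "SUFFIX2:ly" for "quickly".
         outputVector.incrementCount(getName() + ":" + suffix);
       }
     }

As with the other templates, the current tag is not part of the extracted string; it is paired with these features behind the scenes.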

  7. MEMMs
     • Local model:
     • Mathematically:
     • Example local context (at training time):
     • Example feature vector:

  8. MEMMs
     • Local model:
     • Mathematically:
     • Example local context (at training time):
     • Example feature vector:
     (The standard form of the local model is sketched below.)
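The local-model formula appeared on these slides as a figure. A standard way to write it, consistent with the maxent-tagger framing above (the symbols \theta for the weight vector and f for the feature vector are my notation, not taken from the slide), is

     P(t_i \mid t_{i-1}, w_{1:n}, i)
       = \frac{\exp\big(\theta \cdot f(t_i, t_{i-1}, w_{1:n}, i)\big)}
              {\sum_{t'} \exp\big(\theta \cdot f(t', t_{i-1}, w_{1:n}, i)\big)}

where f collects the outputs of the feature templates (CURWORD, PREVIOUS_TAG, suffixes, and so on) for the candidate tag in its local context. The sequence model then chains these locally normalized factors:

     P(t_{1:n} \mid w_{1:n}) = \prod_{i} P(t_i \mid t_{i-1}, w_{1:n}, i)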

  9. Decoding
     • Decoding with MaxEnt taggers:
       • Like decoding in HMMs
       • Same techniques: Viterbi, beam search, posterior decoding
     • Viterbi algorithm (HMMs):
     • Viterbi algorithm (MEMMs):
       (The two recurrences are sketched below.)
     • Beam search is effective! Why?
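The two Viterbi recurrences appeared on the slide as figures. In standard notation (\delta for the best-path score, again my notation rather than the slide's), they can be sketched as

     HMM:   \delta_i(t) = \max_{t'} \; \delta_{i-1}(t') \, P(t \mid t') \, P(w_i \mid t)

     MEMM:  \delta_i(t) = \max_{t'} \; \delta_{i-1}(t') \, P(t \mid t', w_{1:n}, i)

The MEMM recurrence simply replaces the transition-times-emission product with the single locally normalized conditional, which is why the same Viterbi and beam-search machinery carries over unchanged.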

  10. New Problem • Named Entity Recognition

  11. Named Entity Recognition

  12. Named Entity Recognition
     • Decision point: the state (label) for the word “Grace”
     • Local context
     • Feature weights
     • Which features are dynamic?

  13. Other Applications
     • Word breaking
     • Sentence breaking
     • Phrasal “chunking” / shallow parsing
     • Information extraction
     • Dialog act mark-up
     • …

  14. Label Bias Problem
     • An issue with conditional local models worth knowing about: “label bias”
     • A MaxEnt tagger’s local scores can be near one without having both good “transitions” and “emissions”
     • This means that evidence often doesn’t flow properly through the MEMM (as a sequence model); a short note on why follows below
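A compact way to see the problem (my gloss, not text from the slide): because each local factor is normalized over the next tag,

     \sum_{t_i} P(t_i \mid t_{i-1}, w_{1:n}, i) = 1 \quad \text{for every previous tag } t_{i-1},

every state passes on all of its probability mass to its successors no matter how poorly those successors match the observation. A state with few plausible successors, or whose successor distribution barely depends on the word, therefore gets local scores near one essentially for free, and the observation at that position has little power to veto a path that reached that state.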

  15. Label Bias Problem
     • Search space:
     • Think: probabilistic garden path

  16. CRF Taggers
     • Newer, higher-powered, discriminatively trained sequence models:
       • Conditional Random Fields (CRFs): normalize globally! (sketched below)
       • Voted perceptrons
       • Maximum Margin Markov Networks (M3Ns)
     • Do not decompose training into independent local regions
       • Con: can be deathly slow to train – require repeated inference on the training set
       • Pro: avoid label bias
     • Differences tend not to be too important for POS tagging. Why?
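A sketch of what “normalize globally” means, contrasted with the MEMM factorization above (\theta, f, and Z(w) are my notation; the slide itself showed no formula):

     P(t_{1:n} \mid w_{1:n})
       = \frac{\exp\Big(\sum_i \theta \cdot f(t_i, t_{i-1}, w_{1:n}, i)\Big)}{Z(w_{1:n})},
     \qquad
     Z(w_{1:n}) = \sum_{t'_{1:n}} \exp\Big(\sum_i \theta \cdot f(t'_i, t'_{i-1}, w_{1:n}, i)\Big)

There is one partition function Z(w_{1:n}) per sentence rather than one per position, so no state is forced to hand all of its mass to its successors (avoiding label bias), but computing Z and its gradient requires running forward-backward over every training sentence at every optimization step, which is where the training cost on the next slide comes from.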

  17. CRFs are Slow to Train!
     • POS: 45 labels, 1,000,000 words = over a week
     • NER: 11 labels, 200,000 words = 2 hours
     • Using Mallet in the NLP Lab (2008) on new hardware

  18. Reminder: Conditional vs. Joint
     • Joint model, conditional query: (sketched below)
     • In practice, we restrict the search over possible tags for w:
     • Conditional model (with discriminative training):
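The “joint model, conditional query” formula appeared as a figure. In standard notation (assumed, not copied from the slide), a joint model such as an HMM answers the conditional query by renormalizing over tag sequences:

     P(t_{1:n} \mid w_{1:n})
       = \frac{P(w_{1:n}, t_{1:n})}{\sum_{t'_{1:n}} P(w_{1:n}, t'_{1:n})}

Restricting the search over possible tags for w typically means considering, at each position, only the tags observed with that word in training (a tag dictionary). A conditional model such as an MEMM or CRF instead parameterizes P(t_{1:n} | w_{1:n}) directly and is trained discriminatively to separate good tag sequences from bad ones for the observed words.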

  19. Joint vs. Conditional Pairs
     • Maximum Entropy Classifier
     • Will cover general “factor graphs” next fall in CS 401R

  20. Next Time • Parsing!
