Gain insight into MEMMs for Named Entity Tagging, explore feature templates & decoding methods, and understand label bias issues. Dive into dynamic features and other applications like POS tagging and CRFs. Learn about CRF taggers and the advantages and limitations of different discriminative models.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
CS 479, Section 1: Natural Language Processing
Lecture #25: MEMMs II, Named Entity Recognition
Thanks to Dan Klein of UC Berkeley for some of the materials used in this lecture.
Announcements
• Reading Report #10 due now
• Optional Bayes Nets exercises
  • In place of mid-term exam question #4
  • Due: Friday
• Reading Report #11: M&S 11.1–11.3.1, 11.3.3
  • Due: Monday
• Project #3
  • Early: Monday
  • Due: Wednesday
Objectives
• Gain further insight into MEMMs
• Understand another application: named entity tagging
• Reinforce the distinction between joint and conditional models
• Understand the label bias problem in local conditional models
• Think about other sequence labeling problems
Feature Templates
Note: your templates do not extract the current tag. The current tag is provided behind the scenes.
• Emission:
  • Feature: < w0=future, t0=JJ >
  • Feature template: < w0, t0 >

public class CurrentWordExtractor implements
    FeatureTemplate<LocalContext<String, String>> {
  […]
  public String getName() {
    return "CURWORD";
  }
  […]
  public void extract(LocalContext<String, String> inputDatum,
                      Counter<String> outputVector) {
    outputVector.incrementCount(getName() + ":" +
        inputDatum.getCurrentWord());
  }
}
Feature Templates
• Edges or transitions:
  • Feature: < t-1=DT, t0=JJ >
  • Feature template: < t-1, t0 >
  • Higher order: < t-2, t-1, t0 >

public class PreviousTagExtractor implements
    FeatureTemplate<LocalContext<String, String>> {
  […]
  public String getName() {
    return "PREVIOUS_TAG";
  }
  […]
  public void extract(LocalContext<String, String> inputDatum,
                      Counter<String> outputVector) {
    outputVector.incrementCount(getName() + ":" +
        inputDatum.getPreviousTag());
  }
}
Feature Templates
• We discussed others:
  • Feature: < w0 ends in "ly", t0=RB >
  • Feature template: < 2-letter suffix(w0), t0 >
• Also possible:
  • Mixed feature templates: < t-1, w0, t0 >
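A mixed template like < t-1, w0, t0 > can follow the same extractor pattern as the templates above. The sketch below is self-contained for illustration: the `Counter` and `LocalContext` classes here are simplified stand-ins for the project's classes, not the actual course API.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the project classes (assumptions, not the real API).
class Counter<K> {
    private final Map<K, Double> counts = new HashMap<>();
    public void incrementCount(K key) { counts.merge(key, 1.0, Double::sum); }
    public double getCount(K key) { return counts.getOrDefault(key, 0.0); }
}

class LocalContext<W, T> {
    private final W currentWord;
    private final T previousTag;
    public LocalContext(W word, T prevTag) { currentWord = word; previousTag = prevTag; }
    public W getCurrentWord() { return currentWord; }
    public T getPreviousTag() { return previousTag; }
}

// Mixed template < t-1, w0, t0 >: conjoins the previous tag with the current word.
// (As noted above, the current tag t0 is supplied behind the scenes.)
public class PrevTagCurrentWordExtractor {
    public String getName() { return "PREVTAG_CURWORD"; }

    public void extract(LocalContext<String, String> inputDatum,
                        Counter<String> outputVector) {
        outputVector.incrementCount(getName() + ":" +
            inputDatum.getPreviousTag() + "_" + inputDatum.getCurrentWord());
    }

    public static void main(String[] args) {
        Counter<String> features = new Counter<>();
        new PrevTagCurrentWordExtractor()
            .extract(new LocalContext<>("future", "DT"), features);
        // The conjoined feature fired exactly once.
        System.out.println(features.getCount("PREVTAG_CURWORD:DT_future"));
    }
}
```

Conjoined templates like this trade sparsity for discriminative power: they fire on far fewer training examples than < w0, t0 > alone, but capture interactions the separate templates miss.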
MEMMs
• Local model: at each position i, a maxent model conditions on the previous tag and the observed words:
  P(t_i | t_{i-1}, w_{1:n}) = exp(θ · f(t_{i-1}, t_i, w_{1:n}, i)) / Σ_{t'} exp(θ · f(t_{i-1}, t', w_{1:n}, i))
• Example local context (at training time): previous tag t-1 = DT, current word w0 = "future", current tag t0 = JJ
• Example feature vector: { CURWORD:future_JJ = 1, PREVIOUS_TAG:DT_JJ = 1 }
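The local model can be sketched directly: score each candidate tag by summing the weights of its fired features, then normalize over tags. The weights and feature names below are invented for illustration, not learned values.

```java
import java.util.HashMap;
import java.util.Map;

// Local MEMM distribution: P(t0 | context) = exp(theta . f(context, t0)) / Z(context),
// where Z sums over all candidate tags. Weights are made up for this sketch.
public class LocalMaxentDemo {
    static double score(Map<String, Double> theta, String word, String prevTag, String tag) {
        // Fire the two templates from the slides: < w0, t0 > and < t-1, t0 >.
        return theta.getOrDefault("CURWORD:" + word + "_" + tag, 0.0)
             + theta.getOrDefault("PREVIOUS_TAG:" + prevTag + "_" + tag, 0.0);
    }

    public static void main(String[] args) {
        Map<String, Double> theta = new HashMap<>();
        theta.put("CURWORD:future_JJ", 1.5);
        theta.put("CURWORD:future_NN", 1.0);
        theta.put("PREVIOUS_TAG:DT_JJ", 0.5);

        String[] tags = {"JJ", "NN"};
        double z = 0.0;                                // local partition function
        for (String t : tags) z += Math.exp(score(theta, "future", "DT", t));

        Map<String, Double> prob = new HashMap<>();
        for (String t : tags) prob.put(t, Math.exp(score(theta, "future", "DT", t)) / z);

        System.out.printf("P(JJ)=%.3f P(NN)=%.3f%n", prob.get("JJ"), prob.get("NN"));
        // -> P(JJ)=0.731 P(NN)=0.269
    }
}
```

Note that the normalization is local: Z is computed per position, given the previous tag, which is exactly the property that gives rise to the label bias problem discussed below.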
Decoding
• Decoding with MaxEnt taggers:
  • Like decoding in HMMs
  • Same techniques: Viterbi, beam search, posterior decoding
• Viterbi algorithm (HMMs): δ_i(t) = max_{t'} δ_{i-1}(t') · P(t | t') · P(w_i | t)
• Viterbi algorithm (MEMMs): δ_i(t) = max_{t'} δ_{i-1}(t') · P(t | t', w_{1:n}, i)
• Beam search is effective! Why? The local distributions tend to be sharply peaked, so only a few hypotheses need to be kept at each position.
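The MEMM Viterbi recurrence can be sketched compactly in log space. Here the local log-probabilities come from a hand-filled lookup table rather than a trained model; the table keys and values are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Viterbi over local conditional scores:
//   delta[i][t] = max_{t'} ( delta[i-1][t'] + log P(t | t', w, i) )
public class MemmViterbi {
    // Toy local log-scores keyed by "prevTag word tag"; unseen combinations get -10.
    static double logP(Map<String, Double> table, String prev, String word, String tag) {
        return table.getOrDefault(prev + " " + word + " " + tag, -10.0);
    }

    static String[] decode(String[] words, String[] tags, Map<String, Double> table) {
        int n = words.length, k = tags.length;
        double[][] delta = new double[n][k];
        int[][] back = new int[n][k];
        for (int t = 0; t < k; t++)
            delta[0][t] = logP(table, "<S>", words[0], tags[t]);   // start symbol
        for (int i = 1; i < n; i++)
            for (int t = 0; t < k; t++) {
                delta[i][t] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double s = delta[i - 1][p] + logP(table, tags[p], words[i], tags[t]);
                    if (s > delta[i][t]) { delta[i][t] = s; back[i][t] = p; }
                }
            }
        // Follow back-pointers from the best final state.
        int best = 0;
        for (int t = 1; t < k; t++) if (delta[n - 1][t] > delta[n - 1][best]) best = t;
        String[] out = new String[n];
        for (int i = n - 1; i >= 0; i--) { out[i] = tags[best]; best = back[i][best]; }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> table = new HashMap<>();
        table.put("<S> the DT", -0.1);
        table.put("DT future NN", -0.3);
        table.put("DT future JJ", -0.9);
        String[] result = decode(new String[]{"the", "future"},
                                 new String[]{"DT", "NN", "JJ"}, table);
        System.out.println(String.join(" ", result));   // -> DT NN
    }
}
```

The only structural difference from HMM Viterbi is inside `logP`: the MEMM's local score may condition on any feature of the input, not just an emission probability for the current word.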
New Problem • Named Entity Recognition
Named Entity Recognition
• Decision point: the state (tag) for the word "Grace"
• Which features are dynamic?
• [Figure: feature weights and the local context around "Grace" — not recoverable from the source]
Other Applications • Word breaking • Sentence breaking • Phrasal “chunking” / shallow parsing • Information extraction • Dialog act mark-up • …
Label Bias Problem
• An issue with locally normalized conditional models worth knowing about: "label bias"
• A MaxEnt tagger's local scores can be near one without having both good "transitions" and "emissions": because each state's outgoing distribution must sum to one, a state with few successors passes its probability mass along regardless of the observation
• As a result, evidence often does not flow properly through the MEMM (as a sequence model)
Label Bias Problem
• Search space: [figure of the tagger's search lattice — not recoverable from the source]
• Think: probabilistic garden path
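The effect can be seen numerically: a state with a single outgoing transition passes along all of its probability mass no matter what the observation says, because its local distribution must sum to one. A toy sketch (all numbers invented):

```java
// Label bias in miniature: state A has one successor, so P(next | A, word) = 1.0
// for ANY word -- the observation cannot penalize a path that runs through A.
public class LabelBiasDemo {
    public static void main(String[] args) {
        // Out of state A (one outgoing arc): the local distribution normalizes to 1
        // regardless of the emission evidence.
        double pOutOfA = 1.0;

        // Out of state B (two arcs), the word actually shifts the distribution.
        double scoreB1 = Math.exp(2.0), scoreB2 = Math.exp(0.5);
        double pOutOfB = scoreB1 / (scoreB1 + scoreB2);

        System.out.printf("A -> next: %.3f, B -> best: %.3f%n", pOutOfA, pOutOfB);
        // -> A -> next: 1.000, B -> best: 0.818
    }
}
```

This is the "probabilistic garden path": once decoding commits to a low-branching region of the lattice, later evidence cannot push the score of that path down relative to alternatives.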
CRF Taggers
• Newer, higher-powered discriminatively trained sequence models:
  • Conditional Random Fields (CRFs): normalize globally!
  • Voted perceptrons
  • Maximum Margin Markov Networks (M3Ns)
• These do not decompose training into independent local regions
  • Con: can be deathly slow to train – they require repeated inference on the training set
  • Pro: avoid label bias
• The differences tend not to be too important for POS tagging
  • Why?
CRFs are Slow to Train!
• POS: 45 labels, 1,000,000 words → over a week
• NER: 11 labels, 200,000 words → 2 hours
• (Using Mallet in the NLP Lab, 2008, on new hardware)
Reminder: Conditional vs. Joint
• Joint model, conditional query: argmax_t P(t | w) = argmax_t P(t, w) / P(w) = argmax_t P(t, w), since P(w) does not depend on t
• In practice, we restrict the search over possible tags for w
• Conditional model (with discriminative training): parameterize P(t | w) directly
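The distinction in code: a joint model stores P(t, w) and answers tag queries by maximizing the joint, while a conditional model would parameterize P(t | w) directly. A toy sketch with made-up probabilities for a single word:

```java
import java.util.HashMap;
import java.util.Map;

// Conditional query against a joint model:
//   argmax_t P(t | w) = argmax_t P(t, w), since P(w) is constant in t.
public class JointVsConditional {
    public static void main(String[] args) {
        Map<String, Double> joint = new HashMap<>();   // toy P(t, w) for one fixed w
        joint.put("NN", 0.006);
        joint.put("JJ", 0.002);

        // P(w) = sum over tags of P(t, w): needed only to report the conditional.
        double pW = 0.0;
        for (double p : joint.values()) pW += p;

        String best = null;
        for (Map.Entry<String, Double> e : joint.entrySet())
            if (best == null || e.getValue() > joint.get(best)) best = e.getKey();

        System.out.printf("best=%s P(best|w)=%.2f%n", best, joint.get(best) / pW);
        // -> best=NN P(best|w)=0.75
    }
}
```

Note that `best` is found without ever dividing by P(w); the normalizer only matters if you want the conditional probability itself, which is why joint models can answer argmax queries directly.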
Joint vs. Conditional Pairs
• Conditional counterpart of a joint classifier: the Maximum Entropy classifier
• [Table of joint/conditional model pairs — not recoverable from the source]
• General "factor graphs" will be covered next fall in CS 401R
Next Time • Parsing!