Gain insight into MEMMs for Named Entity Tagging, explore feature templates & decoding methods, and understand label bias issues. Dive into dynamic features and other applications like POS tagging and CRFs. Learn about CRF taggers and the advantages and limitations of different discriminative models.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
CS 479, Section 1: Natural Language Processing
Lecture #25: MEMMs II, Named Entity Recognition
Thanks to Dan Klein of UC Berkeley for some of the materials used in this lecture.
Announcements
• Reading Report #10 due now
• Optional Bayes Nets exercises
  • In place of mid-term exam question #4
  • Due: Friday
• Reading Report #11: M&S 11.1–11.3.1, 11.3.3
  • Due: Monday
• Project #3
  • Early: Monday
  • Due: Wednesday
Objectives
• Gain further insight into MEMMs
• Understand another application: named entity tagging
• Reinforce the distinction between joint and conditional models
• Understand the label bias problem in local conditional models
• Think about other sequence labeling problems
Feature Templates
Note: your templates do not extract the current tag. The current tag is provided behind the scenes.
• Emission:
  • Feature: < w0=future, t0=JJ >
  • Feature template: < w0, t0 >

public class CurrentWordExtractor implements
    FeatureTemplate<LocalContext<String, String>> {
  […]
  public String getName() {
    return "CURWORD";
  }
  […]
  public void extract(LocalContext<String, String> inputDatum,
                      Counter<String> outputVector) {
    outputVector.incrementCount(getName() + ":" +
        inputDatum.getCurrentWord());
  }
}
Feature Templates
• Edges or transitions:
  • Feature: < t-1=DT, t0=JJ >
  • Feature template: < t-1, t0 >
  • Higher order: < t-2, t-1, t0 >

public class PreviousTagExtractor implements
    FeatureTemplate<LocalContext<String, String>> {
  […]
  public String getName() {
    return "PREVIOUS_TAG";
  }
  […]
  public void extract(LocalContext<String, String> inputDatum,
                      Counter<String> outputVector) {
    outputVector.incrementCount(getName() + ":" +
        inputDatum.getPreviousTag());
  }
}
Feature Templates
• We discussed others:
  • Feature: < w0 ends in "ly", t0=RB >
  • Feature template: < 2-letter suffix(w0), t0 >
• Also possible:
  • Mixed feature templates: < t-1, w0, t0 >
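A mixed template like < t-1, w0, t0 > can follow the same extractor pattern as the templates above. The sketch below is self-contained for illustration: the `Counter` and `LocalContext` classes here are simplified stand-ins for the project's classes, not the actual course API.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the project classes (assumptions, not the real API).
class Counter<K> {
    private final Map<K, Double> counts = new HashMap<>();
    public void incrementCount(K key) { counts.merge(key, 1.0, Double::sum); }
    public double getCount(K key) { return counts.getOrDefault(key, 0.0); }
}

class LocalContext<W, T> {
    private final W currentWord;
    private final T previousTag;
    public LocalContext(W word, T prevTag) { currentWord = word; previousTag = prevTag; }
    public W getCurrentWord() { return currentWord; }
    public T getPreviousTag() { return previousTag; }
}

// Mixed template < t-1, w0, t0 >: conjoins the previous tag with the current word.
// (As noted above, the current tag t0 is supplied behind the scenes.)
public class PrevTagCurrentWordExtractor {
    public String getName() { return "PREVTAG_CURWORD"; }

    public void extract(LocalContext<String, String> inputDatum,
                        Counter<String> outputVector) {
        outputVector.incrementCount(getName() + ":" +
            inputDatum.getPreviousTag() + "_" + inputDatum.getCurrentWord());
    }

    public static void main(String[] args) {
        Counter<String> features = new Counter<>();
        new PrevTagCurrentWordExtractor()
            .extract(new LocalContext<>("future", "DT"), features);
        // The conjoined feature fired exactly once.
        System.out.println(features.getCount("PREVTAG_CURWORD:DT_future"));
    }
}
```

Conjoined templates like this trade sparsity for discriminative power: they fire on far fewer training examples than < w0, t0 > alone, but capture interactions the separate templates miss.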
MEMMs
• Local model: at each position i, a maxent model conditions on the previous tag and the observed words:
  P(t_i | t_{i-1}, w_{1:n}) = exp(θ · f(t_{i-1}, t_i, w_{1:n}, i)) / Σ_{t'} exp(θ · f(t_{i-1}, t', w_{1:n}, i))
• Example local context (at training time): previous tag t-1 = DT, current word w0 = "future", current tag t0 = JJ
• Example feature vector: { CURWORD:future_JJ = 1, PREVIOUS_TAG:DT_JJ = 1 }
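The local model can be sketched directly: score each candidate tag by summing the weights of its fired features, then normalize over tags. The weights and feature names below are invented for illustration, not learned values.

```java
import java.util.HashMap;
import java.util.Map;

// Local MEMM distribution: P(t0 | context) = exp(theta . f(context, t0)) / Z(context),
// where Z sums over all candidate tags. Weights are made up for this sketch.
public class LocalMaxentDemo {
    static double score(Map<String, Double> theta, String word, String prevTag, String tag) {
        // Fire the two templates from the slides: < w0, t0 > and < t-1, t0 >.
        return theta.getOrDefault("CURWORD:" + word + "_" + tag, 0.0)
             + theta.getOrDefault("PREVIOUS_TAG:" + prevTag + "_" + tag, 0.0);
    }

    public static void main(String[] args) {
        Map<String, Double> theta = new HashMap<>();
        theta.put("CURWORD:future_JJ", 1.5);
        theta.put("CURWORD:future_NN", 1.0);
        theta.put("PREVIOUS_TAG:DT_JJ", 0.5);

        String[] tags = {"JJ", "NN"};
        double z = 0.0;                                // local partition function
        for (String t : tags) z += Math.exp(score(theta, "future", "DT", t));

        Map<String, Double> prob = new HashMap<>();
        for (String t : tags) prob.put(t, Math.exp(score(theta, "future", "DT", t)) / z);

        System.out.printf("P(JJ)=%.3f P(NN)=%.3f%n", prob.get("JJ"), prob.get("NN"));
        // -> P(JJ)=0.731 P(NN)=0.269
    }
}
```

Note that the normalization is local: Z is computed per position, given the previous tag, which is exactly the property that gives rise to the label bias problem discussed below.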
Decoding
• Decoding with MaxEnt taggers:
  • Like decoding in HMMs
  • Same techniques: Viterbi, beam search, posterior decoding
• Viterbi algorithm (HMMs): δ_i(t) = max_{t'} δ_{i-1}(t') · P(t | t') · P(w_i | t)
• Viterbi algorithm (MEMMs): δ_i(t) = max_{t'} δ_{i-1}(t') · P(t | t', w_{1:n}, i)
• Beam search is effective! Why? The local distributions tend to be sharply peaked, so only a few hypotheses need to be kept at each position.
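The MEMM Viterbi recurrence can be sketched compactly in log space. Here the local log-probabilities come from a hand-filled lookup table rather than a trained model; the table keys and values are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Viterbi over local conditional scores:
//   delta[i][t] = max_{t'} ( delta[i-1][t'] + log P(t | t', w, i) )
public class MemmViterbi {
    // Toy local log-scores keyed by "prevTag word tag"; unseen combinations get -10.
    static double logP(Map<String, Double> table, String prev, String word, String tag) {
        return table.getOrDefault(prev + " " + word + " " + tag, -10.0);
    }

    static String[] decode(String[] words, String[] tags, Map<String, Double> table) {
        int n = words.length, k = tags.length;
        double[][] delta = new double[n][k];
        int[][] back = new int[n][k];
        for (int t = 0; t < k; t++)
            delta[0][t] = logP(table, "<S>", words[0], tags[t]);   // start symbol
        for (int i = 1; i < n; i++)
            for (int t = 0; t < k; t++) {
                delta[i][t] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double s = delta[i - 1][p] + logP(table, tags[p], words[i], tags[t]);
                    if (s > delta[i][t]) { delta[i][t] = s; back[i][t] = p; }
                }
            }
        // Follow back-pointers from the best final state.
        int best = 0;
        for (int t = 1; t < k; t++) if (delta[n - 1][t] > delta[n - 1][best]) best = t;
        String[] out = new String[n];
        for (int i = n - 1; i >= 0; i--) { out[i] = tags[best]; best = back[i][best]; }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> table = new HashMap<>();
        table.put("<S> the DT", -0.1);
        table.put("DT future NN", -0.3);
        table.put("DT future JJ", -0.9);
        String[] result = decode(new String[]{"the", "future"},
                                 new String[]{"DT", "NN", "JJ"}, table);
        System.out.println(String.join(" ", result));   // -> DT NN
    }
}
```

The only structural difference from HMM Viterbi is inside `logP`: the MEMM's local score may condition on any feature of the input, not just an emission probability for the current word.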
New Problem • Named Entity Recognition
Named Entity Recognition
• Decision point: the state (tag) for the word "Grace"
• Which features are dynamic?
• [Figure: feature weights and the local context around "Grace" — not recoverable from the source]
Other Applications • Word breaking • Sentence breaking • Phrasal “chunking” / shallow parsing • Information extraction • Dialog act mark-up • …
Label Bias Problem
• An issue with locally normalized conditional models worth knowing about: "label bias"
• A MaxEnt tagger's local scores can be near one without having both good "transitions" and "emissions": because each state's outgoing distribution must sum to one, a state with few successors passes its probability mass along regardless of the observation
• As a result, evidence often does not flow properly through the MEMM (as a sequence model)
Label Bias Problem
• Search space: [figure of the tagger's search lattice — not recoverable from the source]
• Think: probabilistic garden path
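The effect can be seen numerically: a state with a single outgoing transition passes along all of its probability mass no matter what the observation says, because its local distribution must sum to one. A toy sketch (all numbers invented):

```java
// Label bias in miniature: state A has one successor, so P(next | A, word) = 1.0
// for ANY word -- the observation cannot penalize a path that runs through A.
public class LabelBiasDemo {
    public static void main(String[] args) {
        // Out of state A (one outgoing arc): the local distribution normalizes to 1
        // regardless of the emission evidence.
        double pOutOfA = 1.0;

        // Out of state B (two arcs), the word actually shifts the distribution.
        double scoreB1 = Math.exp(2.0), scoreB2 = Math.exp(0.5);
        double pOutOfB = scoreB1 / (scoreB1 + scoreB2);

        System.out.printf("A -> next: %.3f, B -> best: %.3f%n", pOutOfA, pOutOfB);
        // -> A -> next: 1.000, B -> best: 0.818
    }
}
```

This is the "probabilistic garden path": once decoding commits to a low-branching region of the lattice, later evidence cannot push the score of that path down relative to alternatives.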
CRF Taggers
• Newer, higher-powered discriminatively trained sequence models:
  • Conditional Random Fields (CRFs): normalize globally!
  • Voted perceptrons
  • Maximum Margin Markov Networks (M3Ns)
• These do not decompose training into independent local regions
  • Con: can be deathly slow to train – they require repeated inference on the training set
  • Pro: avoid label bias
• The differences tend not to be too important for POS tagging
  • Why?
CRFs are Slow to Train!
• POS: 45 labels, 1,000,000 words → over a week
• NER: 11 labels, 200,000 words → 2 hours
• (Using Mallet in the NLP Lab, 2008, on new hardware)
Reminder: Conditional vs. Joint
• Joint model, conditional query: argmax_t P(t | w) = argmax_t P(t, w) / P(w) = argmax_t P(t, w), since P(w) does not depend on t
• In practice, we restrict the search over possible tags for w
• Conditional model (with discriminative training): parameterize P(t | w) directly
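The distinction in code: a joint model stores P(t, w) and answers tag queries by maximizing the joint, while a conditional model would parameterize P(t | w) directly. A toy sketch with made-up probabilities for a single word:

```java
import java.util.HashMap;
import java.util.Map;

// Conditional query against a joint model:
//   argmax_t P(t | w) = argmax_t P(t, w), since P(w) is constant in t.
public class JointVsConditional {
    public static void main(String[] args) {
        Map<String, Double> joint = new HashMap<>();   // toy P(t, w) for one fixed w
        joint.put("NN", 0.006);
        joint.put("JJ", 0.002);

        // P(w) = sum over tags of P(t, w): needed only to report the conditional.
        double pW = 0.0;
        for (double p : joint.values()) pW += p;

        String best = null;
        for (Map.Entry<String, Double> e : joint.entrySet())
            if (best == null || e.getValue() > joint.get(best)) best = e.getKey();

        System.out.printf("best=%s P(best|w)=%.2f%n", best, joint.get(best) / pW);
        // -> best=NN P(best|w)=0.75
    }
}
```

Note that `best` is found without ever dividing by P(w); the normalizer only matters if you want the conditional probability itself, which is why joint models can answer argmax queries directly.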
Joint vs. Conditional Pairs
• Conditional counterpart of a joint classifier: the Maximum Entropy classifier
• [Table of joint/conditional model pairs — not recoverable from the source]
• General "factor graphs" will be covered next fall in CS 401R
Next Time • Parsing!