
Ling 570 Day 16: Sequence Modeling, Named Entity Recognition


Presentation Transcript


  1. Ling 570 Day 16: Sequence Modeling, Named Entity Recognition

  2. Sequence Labeling • Goal: Find most probable labeling of a sequence • Many sequence labeling tasks • POS tagging • Word segmentation • Named entity tagging • Story/spoken sentence segmentation • Pitch accent detection • Dialog act tagging

  3. HMM search space [trellis diagram: candidate tags N, V, P, and DT at each word of "time flies like an arrow"]

  4. [trellis diagram: the best path N N V DT N highlighted] Find the max in the last column, then follow back-pointer chains to recover the best sequence

  5. Viterbi • Initialization: $\delta_1(j) = \pi_j \, b_j(o_1)$ • Recursion: $\delta_t(j) = \max_i \delta_{t-1}(i)\, a_{ij}\, b_j(o_t)$, with back-pointer $\psi_t(j) = \arg\max_i \delta_{t-1}(i)\, a_{ij}$ • Termination: $P^* = \max_j \delta_T(j)$; backtrace from $\arg\max_j \delta_T(j)$
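A minimal NumPy sketch of this recursion, under the usual HMM assumptions; the argument names (pi for start probabilities, A for transitions, B for emissions) and the indexing scheme are illustrative rather than taken from the course materials.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable tag sequence for an observation sequence.

    obs: list of observation indices, length T
    pi:  (N,) start probabilities over tags
    A:   (N, N) transitions, A[i, j] = P(tag_j | tag_i)
    B:   (N, V) emissions, B[j, o] = P(obs_o | tag_j)
    """
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))              # best path probability ending in tag j at time t
    backptr = np.zeros((T, N), dtype=int)

    delta[0] = pi * B[:, obs[0]]          # initialization
    for t in range(1, T):                 # recursion
        scores = delta[t - 1][:, None] * A * B[:, obs[t]]
        delta[t] = scores.max(axis=0)
        backptr[t] = scores.argmax(axis=0)

    best = [int(delta[T - 1].argmax())]   # termination: max in the last column
    for t in range(T - 1, 0, -1):         # follow back-pointer chains
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]
```

The delta table corresponds to the columns of the trellis above, and the back-pointer table is what slide 4 traverses to recover the best sequence.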

  6. Decoding • Goal: Identify highest probability tag sequence • Issues: • Features include tags from previous words • Not immediately available • Uses tag history • Just knowing highest probability preceding tag insufficient

  7. Decoding • Approach: Retain multiple candidate tag sequences • Essentially search through tagging choices • Which sequences? • We can’t look at all of them – exponentially many! • Instead, use top K highest probability sequences

  8. Breadth-First Search [trellis diagram over "<s> time flies like an arrow": only the BOS node so far]

  9. Breadth-First Search [trellis diagram: candidate tags N and V added for "time"]

  10. Breadth-First Search [trellis diagram: each path extended with candidate tags N and V for "flies"]

  11.–13. Breadth-First Search [trellis diagrams: every path extended with candidate tags N, V, and P through "like", "an", and "arrow"]

  14. Breadth-first Search • Is breadth-first search efficient?

  15. Breadth-first Search • Is it efficient? • No, it tries everything

  16. Beam Search • Intuition: • Breadth-first search explores all paths • Lots of paths are (pretty obviously) bad • Why explore bad paths? • Restrict to (apparently best) paths • Approach: • Perform breadth-first search, but • Retain only k ‘best’ paths thus far • k: beam width

  17. Beam Search, k=3 [trellis diagram over "<s> time flies like an arrow": only the BOS node so far]

  18. Beam Search, k=3 [trellis diagram: candidate tags N and V for "time"]

  19. Beam Search, k=3 [trellis diagram: paths extended with N and V for "flies"]

  20. Beam Search, k=3 [trellis diagram: candidate tags N, V, and P added for "like"; only the top 3 paths are kept]

  21. Beam Search, k=3 [trellis diagram: the beam of 3 paths extended through "an" and "arrow"]

  22. Beam Search • W = {w1, w2, …, wn}: test sentence • sij: jth highest-probability sequence up to and including word wi • Generate tags for w1, keep top k, set s1j accordingly • for i = 2 to n: • Extension: add tags for wi to each s(i-1)j • Beam selection: • Sort sequences by probability • Keep only top k sequences • Return highest-probability sequence sn1
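A short Python sketch of this procedure; the score function here is a stand-in for whatever model supplies log P(tag | history) (e.g. a MaxEnt tagger), so its exact signature is an assumption.

```python
def beam_search(words, tagset, score, k=3):
    """Keep only the k highest-scoring tag sequences after each word.

    score(words, i, prev_tags, tag) should return the log-probability of
    assigning `tag` to words[i] given the partial history `prev_tags`
    (a placeholder for the model's conditional probability).
    """
    beam = [(0.0, [])]                        # (log prob, tag sequence) pairs
    for i in range(len(words)):
        candidates = []
        for logp, tags in beam:               # extension step
            for tag in tagset:
                candidates.append((logp + score(words, i, tags, tag), tags + [tag]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:k]                 # beam selection: keep top k
    return beam[0][1]                         # highest-probability sequence s_n1
```

The very first iteration already covers "generate tags for w1, keep top k", since the beam starts with the single empty sequence.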

  23. POS Tagging • Overall accuracy: 96.3+% • Unseen word accuracy: 86.2% • Comparable to HMM tagging accuracy or TBL • Provides • Probabilistic framework • Better able to model different info sources • Topline accuracy 96-97% • Consistency issues

  24. Beam Search • Beam search decoding: • Variant of breadth first search • At each layer, keep only top k sequences • Advantages: • Efficient in practice: beam 3-5 near optimal • Empirically, beam 5-10% of search space; prunes 90-95% • Simple to implement • Just extensions + sorting, no dynamic programming • Running time: O(kT) [vs. O(NT)] • Disadvantage: Not guaranteed optimal (or complete)

  25. Viterbi Decoding • Viterbi search: • Exploits dynamic programming, memoization • Requires small history window • Efficient search: O(N^2 T) • Advantage: • Exact: optimal solution is returned • Disadvantage: • Limited window of context

  26. Beam vs Viterbi • Dynamic programming vs heuristic search • Guaranteed optimal vs no guarantee • Different context window

  27. MaxEnt POS Tagging • Part of speech tagging by classification: • Feature design • word and tag context features • orthographic features for rare words • Sequence classification problems: • Tag features depend on prior classification • Beam search decoding • Efficient, but inexact • Near optimal in practice
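To make the feature design above concrete, here is an illustrative feature extractor in the spirit of the word/tag-context and orthographic features mentioned on the slide; the specific templates and feature names are assumptions, not the course's exact feature set.

```python
def pos_features(words, i, prev_tag, prev_prev_tag):
    """Candidate features for classifying words[i] (illustrative templates)."""
    w = words[i]
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    feats = {
        "curr_word=" + w.lower(): 1,                       # word context
        "prev_word=" + prev_w.lower(): 1,
        "next_word=" + next_w.lower(): 1,
        "prev_tag=" + prev_tag: 1,                         # tag context
        "prev_two_tags=" + prev_prev_tag + "+" + prev_tag: 1,
    }
    # Orthographic features, mainly useful for rare or unseen words
    if w[:1].isupper():
        feats["init_cap"] = 1
    if any(ch.isdigit() for ch in w):
        feats["has_digit"] = 1
    if "-" in w:
        feats["has_hyphen"] = 1
    feats["suffix3=" + w[-3:].lower()] = 1
    return feats
```

The tag-context features are exactly what makes this a sequence classification problem: they depend on prior classifications, which is why beam search decoding is used.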

  28. Named Entity Recognition

  29. Roadmap • Named Entity Recognition • Definition • Motivation • Challenges • Common Approach

  30. Named Entity Recognition • Task: Identify Named Entities in (typically) unstructured text • Typical entities: • Person names • Locations • Organizations • Dates • Times

  31. Example • Lady Gaga is playing a concert for the Bushes in Texas next September

  32. Example • Lady Gaga is playing a concert for the Bushes in Texas next September • Annotations: Lady Gaga = person, the Bushes = person, Texas = location, next September = time

  33. Example from financial news • Ray Dalio’s Bridgewater Associates is an extremely large and extremely successful hedge fund. • Based in Westport and known for its strong -- some would say cultish -- culture, it has grown to well over $100 billion in assets under management with little negative impact on its returns. • Annotations: Ray Dalio = person, Bridgewater Associates = organization, Westport = location, $100 billion = value

  34. Entity types may differ by application • News: • People, countries, organizations, dates, etc. • Medical records: • Diseases, medications, organisms, organs, etc.

  35. Named Entity Types • Common categories

  36. Named Entity Examples • For common categories:

  37. Why NER? • Machine translation: • Lady Gaga is playing a concert for the Bushes in Texas next September • La señora Gaga estoca un concierto para los arbustos … • Number: • 9/11: Date vs. ratio • 911: Emergency phone number, simple number

  38. Why NER? • Information extraction: • MUC task: Joint ventures/mergers • Focus on company names, person names (CEO), valuations • Information retrieval: • Named entities are the focus of retrieval • In some data sets, 60+% of queries target NEs • Text-to-speech: • 206-616-5728 • Phone numbers (vs. other digit strings) differ by language

  39. Challenges • Ambiguity • Washington chose • D.C., State, George, etc • Most digit strings • cat: (95 results) • CAT(erpillar) stock ticker • Computerized Axial Tomography • Chloramphenicol Acetyl Transferase • small furry mammal

  40. Context & Ambiguity

  41. Evaluation • Precision • Recall • F-measure
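The slide leaves the definitions implicit: precision is the fraction of predicted entities that are correct, recall the fraction of gold entities found, and F-measure their harmonic mean. A small sketch of span-level scoring, under the common convention that entities are (start, end, type) tuples and only exact matches count as correct:

```python
def prf(gold, pred):
    """Precision, recall, and F-measure over sets of entity spans."""
    tp = len(gold & pred)                       # exact-match true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative spans, not from the slides
gold = {(0, 2, "PER"), (7, 9, "PER"), (10, 11, "LOC")}
pred = {(0, 2, "PER"), (10, 11, "LOC"), (12, 14, "NUM")}
print(prf(gold, pred))   # (0.666..., 0.666..., 0.666...)
```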

  42. Resources • Online: • Name lists • Baby name, who’s who, newswire services, census.gov • Gazetteers • SEC listings of companies • Tools • Lingpipe • OpenNLP • Stanford NLP toolkit

  43. Approaches to NER • Rule/Regex-based: • Match names/entities in lists • Regex: e.g., \d\d/\d\d/\d\d matches 11/23/11 • Currency: $\d+\.\d+ • Machine learning via sequence labeling: • Better for names, organizations • Hybrid
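A minimal sketch of the rule/regex side, reusing the two patterns from the slide; the entity labels (DATE, MONEY) and the helper name are illustrative.

```python
import re

# Patterns from the slide: dates like 11/23/11 and currency amounts like $3.50
DATE_RE = re.compile(r"\b\d\d/\d\d/\d\d\b")
CURRENCY_RE = re.compile(r"\$\d+\.\d+")

def regex_entities(text):
    """Return (start, end, label) spans for every pattern match."""
    spans = [(m.start(), m.end(), "DATE") for m in DATE_RE.finditer(text)]
    spans += [(m.start(), m.end(), "MONEY") for m in CURRENCY_RE.finditer(text)]
    return spans

print(regex_entities("Paid $12.50 on 11/23/11"))   # finds the amount and the date
```

List and gazetteer matching works the same way, just with literal name patterns; the hybrid approach combines such rules with a learned sequence labeler.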

  44. NER as Sequence Labeling

  45. NER as Classification Task • Instance:

  46. NER as Classification Task • Instance: token • Labels:

  47. NER as Classification Task • Instance: token • Labels: • Position: B(eginning), I(nside), Outside

  48. NER as Classification Task • Instance: token • Labels: • Position: B(eginning), I(nside), Outside • NER types: PER, ORG, LOC, NUM

  49. NER as Classification Task • Instance: token • Labels: • Position: B(eginning), I(nside), Outside • NER types: PER, ORG, LOC, NUM • Label: Type-Position, e.g. PER-B, PER-I, O, … • How many tags?

  50. NER as Classification Task • Instance: token • Labels: • Position: B(eginning), I(nside), O(utside) • NER types: PER, ORG, LOC, NUM • Label: Type-Position, e.g. PER-B, PER-I, O, … • How many tags? • (|NER types| × 2) + 1
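With the four types listed, that is 4 × 2 + 1 = 9 labels. A tiny sketch of the resulting tag set and of the earlier example encoded in this scheme; the tokenization and exact span boundaries are assumptions.

```python
NER_TYPES = ["PER", "ORG", "LOC", "NUM"]

# (|NER types| x 2) + 1 labels: Type-B and Type-I for each type, plus O
TAGSET = [t + "-" + pos for t in NER_TYPES for pos in ("B", "I")] + ["O"]
print(len(TAGSET))   # 9

# The earlier example, one label per token (illustrative tokenization)
tokens = ["Lady", "Gaga", "is", "playing", "a", "concert",
          "for", "the", "Bushes", "in", "Texas"]
labels = ["PER-B", "PER-I", "O", "O", "O", "O",
          "O", "O", "PER-B", "O", "LOC-B"]
```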
