
Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Khashayar Rohanimanesh, joint work with Charles Sutton and Andrew McCallum, University of Massachusetts Amherst.


Presentation Transcript


  1. Dynamic Conditional Random Fields for Labeling and Segmenting Sequences Khashayar Rohanimanesh Joint work with Charles Sutton and Andrew McCallum University of Massachusetts Amherst

  2. Noun Phrase Segmentation (CoNLL-2000, Sang and Buchholz, 2000)
      Rockwell/B International/I Corp./I 's/B Tulsa/I unit/I said/O it/O signed/O
      a/B tentative/I agreement/I extending/O its/B contract/I with/O Boeing/B Co./I
      to/O provide/O structural/B parts/I for/O Boeing/B 's/B 747/I jetliners/I
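The B/I/O tags above mark tokens that begin, continue, or fall outside a noun-phrase chunk. As a quick illustration of how such a tagging is read, here is a minimal sketch (not from the talk) that recovers the noun-phrase spans from a BIO sequence:

```python
# Minimal sketch (illustrative, not from the talk): recover noun-phrase
# spans from a BIO tag sequence aligned with a token sequence.
def bio_to_spans(tags):
    """Return (start, end) index pairs, one per B-I* chunk."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":                      # a new chunk begins
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":                    # outside any chunk
            if start is not None:
                spans.append((start, i))
            start = None
        # tag == "I" simply extends the current chunk
    if start is not None:
        spans.append((start, len(tags)))
    return spans

tokens = "Rockwell International Corp. 's Tulsa unit said it signed".split()
tags = ["B", "I", "I", "B", "I", "I", "O", "O", "O"]
print([" ".join(tokens[s:e]) for s, e in bio_to_spans(tags)])
# -> ['Rockwell International Corp.', "'s Tulsa unit"]
```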

  3. Named Entity Recognition [McCallum & Li, 2003] CRICKET - MILLNS SIGNS FOR BOLAND CAPE TOWN 1996-08-22 South African provincial side Boland said on Thursday they had signed Leicestershire fast bowler David Millns on a one year contract. Millns, who toured Australia with England A in 1992, replaces former England all-rounder Phillip DeFreitas as Boland's overseas professional.
      Labels and examples:
      PER: Yayuk Basuki, Innocent Butare
      ORG: 3M, KDP, Leicestershire
      LOC: Leicestershire, Nirmal Hriday, The Oval
      MISC: Java, Basque, 1,000 Lakes Rally

  4. Information Extraction
      Seminar Announcements [Peshkin, Pfeffer 2003]: a seminar entitled "Nanorheology of Polymers & Complex Fluids," at [4:30 p.m]STIME, Monday, Feb. 27, in [Wean Hall 7500]LOC. The seminar will be given by [Professor Steven Granick]SPEAKER.
      Biological Abstracts [Skounakis, Craven, Ray 2003]: [SNC1]PROTEIN, a gene from the yeast Saccharomyces cerevisiae, encodes a homolog of vertebrate synaptic [vesicle]LOC-associated membrane proteins (VAMPs) or synaptobrevins. -> subcellular-localization(SNC1, vesicle)

  5. Simultaneous noun-phrase & part-of-speech tagging
      NP:  B I I B I I O O O   POS: N N N O N N V O V
      Rockwell International Corp. 's Tulsa unit said it signed
      NP:  B I I O B I O B I   POS: O J N V O N O N N
      a tentative agreement extending its contract with Boeing Co.

  6. Probabilistic Sequence Labeling

  7. Linear-Chain CRFs Finite-State

  8. Linear-Chain CRFs: Graphical Model, Training
      p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_t \Psi_t(y_t, y_{t-1}, \mathbf{x})
      Um… what's \Psi?

  9. Linear-Chain CRFs: Graphical Model, Training
      p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_t \Psi_t(y_t, y_{t-1}, \mathbf{x})
      Rewrite \Psi_t as:
      \Psi_t(y_t, y_{t-1}, \mathbf{x}) = \exp\Big( \sum_k \lambda_k f_k(y_t, y_{t-1}, \mathbf{x}, t) \Big)
      for some features f_k and weights \lambda_k. Now solve for \lambda_k by convex optimization.
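To make the rewritten potential concrete, here is a minimal sketch assuming a toy B/I/O label set and two hand-made feature functions; the features, weights, and label set are illustrative stand-ins, not the authors' model:

```python
import numpy as np

# Illustrative sketch of the rewritten potential
#   Psi_t(y_t, y_{t-1}, x) = exp( sum_k lambda_k * f_k(y_t, y_{t-1}, x, t) )
# with a toy B/I/O label set and two hand-made feature functions.
def f0(y, y_prev, x, t):
    # transition feature: "I" should follow "B" or "I"
    return 1.0 if y == "I" and y_prev in ("B", "I") else 0.0

def f1(y, y_prev, x, t):
    # observation feature: a capitalized word tends to start a chunk
    return 1.0 if y == "B" and x[t][0].isupper() else 0.0

FEATURES = [f0, f1]
lam = np.array([1.2, 0.8])      # weights lambda_k (learned by training in practice)

def log_potential(y, y_prev, x, t):
    return float(np.dot(lam, [f(y, y_prev, x, t) for f in FEATURES]))

x = "Rockwell International Corp.".split()
print(np.exp(log_potential("B", "O", x, 0)))   # Psi_0 for tagging 'Rockwell' as B
```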

  10. General CRFs Train \lambda_k by convex optimization to maximize conditional log-likelihood. A CRF is an undirected, conditionally-trained graphical model. Features f_k can be arbitrary, overlapping, domain-specific.

  11. CRF Training Train \lambda_k by convex optimization to maximize conditional log-likelihood.
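Spelled out in the notation of slides 8 and 9, the objective named here and its gradient take the standard CRF form (any regularization term used in the actual experiments is omitted):

```latex
% Conditional log-likelihood of the training set and its gradient
% (standard CRF forms; regularization omitted):
\ell(\Lambda) = \sum_i \Bigg[ \sum_t \sum_k \lambda_k\, f_k\big(y^{(i)}_t, y^{(i)}_{t-1}, \mathbf{x}^{(i)}, t\big) - \log Z\big(\mathbf{x}^{(i)}\big) \Bigg]

\frac{\partial \ell}{\partial \lambda_k} = \sum_i \Bigg[ \sum_t f_k\big(y^{(i)}_t, y^{(i)}_{t-1}, \mathbf{x}^{(i)}, t\big)
  - \mathbb{E}_{p_\Lambda(\mathbf{y} \mid \mathbf{x}^{(i)})}\Big[ \sum_t f_k\big(y_t, y_{t-1}, \mathbf{x}^{(i)}, t\big) \Big] \Bigg]
```

Setting the gradient to zero matches empirical and expected feature counts, and computing that expectation is exactly the marginal inference that slide 27 says is needed during training.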

  12. Optimization Methods
      • Generalized Iterative Scaling (GIS)
      • Improved Iterative Scaling
      • First-order methods: non-linear conjugate gradient
      • Second-order methods: limited-memory quasi-Newton (BFGS)
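As a hedged illustration of the last item, limited-memory quasi-Newton, the sketch below hands an objective and its gradient to SciPy's L-BFGS-B routine; a toy quadratic stands in for the CRF negative log-likelihood so the snippet runs end to end:

```python
import numpy as np
from scipy.optimize import minimize

# Hedged sketch: train weights with L-BFGS by passing the (negative)
# objective and its gradient to SciPy. The toy quadratic below is a
# placeholder for the CRF negative log-likelihood and its gradient.
def neg_objective_and_grad(lam):
    value = 0.5 * np.sum((lam - 1.0) ** 2)   # stand-in for -log-likelihood
    grad = lam - 1.0                         # stand-in for its gradient
    return value, grad

lam0 = np.zeros(5)                           # initial weights lambda_k
result = minimize(neg_objective_and_grad, lam0, jac=True, method="L-BFGS-B")
print(result.x)                              # converges to the minimizer (all ones)
```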

  13. From Generative to Conditional Models
      HMMs: model the observation
      MEMMs: do not model the observation, but suffer from the label bias problem
      Linear-chain CRFs: do not model the observation, and eliminate the label bias problem

  14. Dynamic CRFs

  15. Simultaneous noun-phrase & part-of-speech tagging
      NP:  B I I B I I O O O   POS: N N N O N N V O V
      Rockwell International Corp. 's Tulsa unit said it signed
      NP:  B I I O B I O B I   POS: O J N V O N O N N
      a tentative agreement extending its contract with Boeing Co.

  16. Features
      • Word identity: "International"
      • Capitalization: Xxxxxxx
      • Character classes: contains digits
      • Character n-gram: …ment
      • Lexicon memberships: in list of company names
      • WordNet synset: (speak, say, tell)
      • …
      • Part of speech: Proper Noun
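A minimal sketch of how binary features of the kinds listed above might be computed for one token; the lexicon, feature names, and values are invented for illustration and are not taken from the talk:

```python
import re

# Illustrative sketch: binary features of the kinds listed above, computed
# for one token. The lexicon and feature names are invented for this example.
COMPANY_LEXICON = {"rockwell", "boeing"}

def token_features(tokens, t):
    w = tokens[t]
    return {
        "word=" + w.lower(): 1.0,                            # word identity
        "is_capitalized": float(w[0].isupper()),             # capitalization
        "contains_digit": float(bool(re.search(r"\d", w))),  # character class
        "suffix3=" + w[-3:].lower(): 1.0,                    # character n-gram
        "in_company_lexicon": float(w.lower() in COMPANY_LEXICON),
    }

print(token_features("Rockwell International Corp. 's Tulsa".split(), 0))
```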

  17. Multiple Nested Predictions on the Same Sequence Noun phrase Part-of-speech (output prediction) Word identity (input observation) Rockwell Int’l Corp. 's Tulsa

  18. Multiple Nested Predictions on the Same Sequence Noun phrase (output prediction) Part-of-speech (input observation) Word identity (input observation) Rockwell Int’l Corp. 's Tulsa But errors compound from one stage to the next, and uncertainty from one stage is not preserved for the next.

  19. Cascaded Predictions Named-entity tag Part-of-speech Segmentation (output prediction) Chinese character (input observation)

  20. Cascaded Predictions Named-entity tag Part-of-speech (output prediction) Segmentation (input observation) Chinese character (input observation)

  21. Cascaded Predictions Named-entity tag (output prediction) Part-of-speech (input observation) Segmentation (input observation) Chinese character (input observation) Even more stages here, so compounding of errors is worse.

  22. Joint Prediction: Cross-Product over Labels O(|V| x 990^2) parameters, O(T x 990^2) running time, 2 x 45 x 11 = 990 possible states, e.g. state label = (Wordbeg, Noun, Person) Segmentation+POS+NE (output prediction) Chinese character (input observation)

  23. Joint Prediction: Factorial CRF O(|V| x 990) parameters Named-entity tag (output prediction) Part-of-speech (output prediction) Segmentation (output prediction) Chinese character (input observation)
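For scale, restoring the squared terms that the extraction dropped from slide 22 gives the following worked arithmetic (the numbers are the slides' own):

```latex
2 \times 45 \times 11 = 990 \ \text{joint states}, \qquad
990^2 = 980{,}100 \ \text{label-pair assignments per cross-product transition factor}
```

The factorial CRF on slide 23 keeps the segmentation, POS, and named-entity chains separate rather than instantiating that full joint transition table, hence the much smaller O(|V| x 990) parameter count quoted there.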

  24. Linear-Chain to Factorial CRFs: Model Definition
      Linear-chain (one label chain y over input x):
      p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_t \Psi_t(y_t, y_{t-1}, \mathbf{x})
      Factorial (label chains u, v, w over input x), with within-chain factors on (u_t, u_{t+1}), (v_t, v_{t+1}), (w_t, w_{t+1}) and cotemporal factors on (u_t, v_t), (v_t, w_t):
      p(\mathbf{u}, \mathbf{v}, \mathbf{w} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_t \prod_{c} \Psi_c(\mathbf{y}_{c,t}, \mathbf{x}, t)
      where each factor is \Psi_c(\mathbf{y}_{c,t}, \mathbf{x}, t) = \exp\big( \sum_k \lambda_k f_k(\mathbf{y}_{c,t}, \mathbf{x}, t) \big).
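To make the factored score concrete, here is a hedged sketch (not the authors' implementation) of the unnormalized log-score of a joint labeling under a two-chain factorial model, with random matrices standing in for learned potentials:

```python
import numpy as np

# Hedged sketch: unnormalized log-score of a joint (NP, POS) labeling under
# a two-chain factorial model, with within-chain factors along each chain
# and a cotemporal factor tying the chains at every time step. The weight
# matrices are random stand-ins for learned potentials.
rng = np.random.default_rng(0)
N_NP, N_POS, T = 3, 45, 6
W_np = rng.normal(size=(N_NP, N_NP))      # NP_t  -- NP_{t+1}
W_pos = rng.normal(size=(N_POS, N_POS))   # POS_t -- POS_{t+1}
W_co = rng.normal(size=(N_NP, N_POS))     # NP_t  -- POS_t (cotemporal)

def log_score(np_tags, pos_tags):
    s = 0.0
    for t in range(T):
        s += W_co[np_tags[t], pos_tags[t]]             # between-chain factor
        if t + 1 < T:
            s += W_np[np_tags[t], np_tags[t + 1]]      # within the NP chain
            s += W_pos[pos_tags[t], pos_tags[t + 1]]   # within the POS chain
    return s

np_tags = rng.integers(0, N_NP, size=T)
pos_tags = rng.integers(0, N_POS, size=T)
print(log_score(np_tags, pos_tags))   # subtract log Z(x) to get log p(np, pos | x)
```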

  25. Linear-Chain to Factorial CRFs: Log-likelihood Training
      In both models the weights \lambda_k are trained by maximizing the conditional log-likelihood of the training data; for the factorial model the labeling \mathbf{y} is the joint assignment (\mathbf{u}, \mathbf{v}, \mathbf{w}).

  26. Dynamic CRFs: the undirected, conditionally-trained analogue to Dynamic Bayes Nets (DBNs). Variants: Factorial, Higher-Order, Hierarchical.

  27. Need for Inference
      Marginal distributions: used during training.
      Most-likely (Viterbi) labeling: used to label a sequence.
      9,000 training instances x 100 maximizer iterations = 900,000 calls to the inference algorithm!
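For a plain linear chain, the two inference calls named above are forward-backward for the marginals and Viterbi for the best labeling. The sketch below is an assumed, minimal implementation over per-position and pairwise log-potentials, not the code used in the talk:

```python
import numpy as np
from scipy.special import logsumexp

# Assumed minimal sketch for a single linear chain: `node[t, k]` is the
# log-potential of label k at position t, `edge[i, j]` the log-potential
# of the transition i -> j (held fixed over t for brevity).
def forward_backward(node, edge):
    """Per-position marginals p(y_t = k | x), used during training."""
    T, K = node.shape
    alpha, beta = np.zeros((T, K)), np.zeros((T, K))
    alpha[0] = node[0]
    for t in range(1, T):
        alpha[t] = node[t] + logsumexp(alpha[t - 1][:, None] + edge, axis=0)
    for t in range(T - 2, -1, -1):
        beta[t] = logsumexp(edge + node[t + 1] + beta[t + 1], axis=1)
    log_z = logsumexp(alpha[-1])
    return np.exp(alpha + beta - log_z)

def viterbi(node, edge):
    """Most-likely label sequence, used to label new data."""
    T, K = node.shape
    delta, back = node[0].copy(), np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + edge + node[t]
        back[t], delta = scores.argmax(axis=0), scores.max(axis=0)
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(1)
node, edge = rng.normal(size=(4, 3)), rng.normal(size=(3, 3))
print(forward_backward(node, edge).sum(axis=1))   # each row sums to 1
print(viterbi(node, edge))                        # best label index per position
```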

  28. Inference (Exact): Junction Tree Max-clique: 3 x 45 x 45 = 6,075 assignments (NP, POS)

  29. Inference (Exact): Junction Tree Max-clique: 3 x 45 x 45 x 11 = 66,825 assignments (NER, POS, SEG)

  30. Inference (Approximate): Loopy Belief Propagation [figure: messages m_i(v_j) passed between the six variables v1 … v6 around the loops of the graph]
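A hedged sketch of one synchronous round of loopy belief propagation on a small 2x3 grid of variables, echoing the v1 … v6 figure; all potentials are random stand-ins and this is not the talk's implementation:

```python
import numpy as np

# Hedged sketch: synchronous loopy BP on a 2x3 grid of variables v0..v5.
# Potentials are random stand-ins, purely for illustration.
K = 3                                             # states per variable
edges = [(0, 1), (1, 2), (3, 4), (4, 5),          # horizontal grid edges
         (0, 3), (1, 4), (2, 5)]                  # vertical grid edges
rng = np.random.default_rng(0)
node_pot = {i: rng.random(K) + 0.1 for i in range(6)}
edge_pot = {e: rng.random((K, K)) + 0.1 for e in edges}

msgs = {(i, j): np.ones(K) / K for (i, j) in edges}        # one message per
msgs.update({(j, i): np.ones(K) / K for (i, j) in edges})  # directed edge

def neighbors(i):
    return [j for (a, j) in msgs if a == i]

def bp_round(cur):
    new = {}
    for (i, j) in cur:
        psi = edge_pot[(i, j)] if (i, j) in edge_pot else edge_pot[(j, i)].T
        # product of messages into i from every neighbor except j
        incoming = np.prod([cur[(k, i)] for k in neighbors(i) if k != j], axis=0)
        m = psi.T @ (node_pot[i] * incoming)      # sum out x_i
        new[(i, j)] = m / m.sum()                 # normalize for stability
    return new

for _ in range(20):                               # iterate toward a fixed point
    msgs = bp_round(msgs)

belief = node_pot[0] * np.prod([msgs[(k, 0)] for k in neighbors(0)], axis=0)
print(belief / belief.sum())                      # approximate marginal of v0
```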

  31. Inference (Approximate): Tree Re-parameterization [Wainwright, Jaakkola, Willsky 2001]

  32. Inference (Approximate): Tree Re-parameterization [Wainwright, Jaakkola, Willsky 2001]

  33. Inference (Approximate): Tree Re-parameterization [Wainwright, Jaakkola, Willsky 2001]

  34. Inference (Approximate): Tree Re-parameterization [Wainwright, Jaakkola, Willsky 2001]

  35. Experiments: Simultaneous noun-phrase & part-of-speech tagging
      NP:  B I I B I I O O O   POS: N N N O N N V O V
      Rockwell International Corp. 's Tulsa unit said it signed
      NP:  B I I O B I O B I   POS: O J N V O N O N N
      a tentative agreement extending its contract with Boeing Co.
      • Data from CoNLL Shared Task 2000 (Newswire)
      • Training subsets of various sizes: from 223-894 sentences
      • Features include: word identity, neighboring words, capitalization, lexicons of parts-of-speech, company names (1,358,227 feature functions!)

  36. Experiments: Simultaneous noun-phrase & part-of-speech tagging
      NP:  B I I B I I O O O   POS: N N N O N N V O V
      Rockwell International Corp. 's Tulsa unit said it signed
      NP:  B I I O B I O B I   POS: O J N V O N O N N
      a tentative agreement extending its contract with Boeing Co.
      Two experiments:
      • Compare exact and approximate inference
      • Compare accuracy of cascaded CRFs and Factorial DCRFs

  37. Noun Phrase Accuracy

  38. Accuracy [results figure] F1 for NP when trained on the full 8,936 sentences: 93.87. POS-tagger shown for comparison: (Brill, 1994).

  39. Summary
      • Many natural language tasks are solved by chaining errorful subtasks.
      • Approach: jointly solve all subtasks in a single graphical model.
      • Learn dependence between subtasks.
      • Allow the higher level to inform the lower level.
      • Improved joint and POS accuracy over the cascaded model, but NP accuracy lower.
      • Current work: emphasize one subtask.

  40. Maximize Marginal Likelihood (Ongoing work) [figure: NP and POS chains]

  41. Thank you!

  42. State-of-the-art Performance
      • POS tagging: 97% (Brill, 1999)
      • NP chunking: 94.38% (Sha and Pereira), 94.39% (?)

  43. Alternatives to Traditional Joint
      • Optimize Marginal Likelihood
      • Optimize Utility
      • Optimize Margin (M3N) [Taskar, Guestrin, Koller 2003]

  44. Maximize Marginal Likelihood (Ongoing work) [figure: NP and POS chains]

  45. Undirected Graphical Models [figure: a directed graphical model vs. an undirected graphical model]

  46. Hidden Markov Models: Graphical Model, Training
      p(\mathbf{y}, \mathbf{x}) = p(y_1)\, p(x_1 \mid y_1)\, p(y_2 \mid y_1)\, p(x_2 \mid y_2)\, p(y_3 \mid y_2)\, p(x_3 \mid y_3)

  47. Hidden Markov Models: Graphical Model, Finite-State
      p(\mathbf{y}, \mathbf{x}) = p(y_1)\, p(x_1 \mid y_1)\, p(y_2 \mid y_1)\, p(x_2 \mid y_2)\, p(y_3 \mid y_2)\, p(x_3 \mid y_3)
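A minimal sketch of the generative factorization above with toy parameters; the transition and emission tables are random and purely illustrative:

```python
import numpy as np

# Minimal sketch: the HMM joint probability p(y, x) under toy parameters,
# matching the factorization above. All tables are random and illustrative.
rng = np.random.default_rng(0)
K, V = 3, 5                              # number of states, vocabulary size
pi = np.full(K, 1.0 / K)                 # p(y_1)
A = rng.dirichlet(np.ones(K), size=K)    # A[i, j] = p(y_t = j | y_{t-1} = i)
B = rng.dirichlet(np.ones(V), size=K)    # B[i, w] = p(x_t = w | y_t = i)

def joint(y, x):
    p = pi[y[0]] * B[y[0], x[0]]
    for t in range(1, len(y)):
        p *= A[y[t - 1], y[t]] * B[y[t], x[t]]
    return p

y = [0, 1, 1, 2]                         # a state sequence
x = [3, 0, 4, 2]                         # an observation sequence
print(joint(y, x))
```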
