
LING / C SC 439/539 Statistical Natural Language Processing


Presentation Transcript


  1. LING / C SC 439/539 Statistical Natural Language Processing • Lecture 19 • 3/27/2013

  2. Recommended reading • Jurafsky & Martin • Ch. 12, Formal grammars of English • Ch. 13, Parsing with context-free grammars • Ch. 14, Statistical parsing • Papers • Marcus et al. 1993, Penn Treebank • Klein & Manning 2003, Accurate Unlexicalized Parsing • Petrov et al. 2006, Learning Accurate, Compact, and Interpretable Tree Annotation • Charniak & Johnson 2005, Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking

  3. Last time • Context-free grammars • Specify the possible structures of sentences • Probabilistic context-free grammars • Generative probabilistic model for sentences and phrase structure trees • Probabilities indicate which structures are more likely • CKY parsing algorithm • Uses dynamic programming to compute an exponential number of parses in O(n³) time • Probabilistic CKY • Find the most likely parse for a sentence • Provides a principled way of resolving ambiguity

  4. How do we develop a parser? • Hard to write a non-trivial CFG that has broad coverage of the constructions of a language • For a PCFG, we also have to specify the probabilities • Not clear how this can be done by hand • Solution: read PCFG from an annotated corpus • Syntactically annotated corpus = treebank

  5. Outline • Treebanks • Treebank grammars • Evaluating parsers • Improving parsing performance • Limitations of generative parsers

  6. Penn Treebank • Corpus of syntactically annotated sentences • 1.3 million words of Wall Street Journal • 50,000 sentences • Annotated by hand for part of speech, syntactic structure, and predicate-argument relations • Standard corpus for developing and testing statistical parsers

  7. You can download a portion of the Penn Treebank through NLTK • http://www.nltk.org/data
>>> import nltk
>>> nltk.download('treebank')          # fetch the sample corpus first
>>> from nltk.corpus import treebank
>>> sents = treebank.sents()           # sentences
>>> trees = treebank.parsed_sents()    # labeled trees
>>> len(trees)
3914
>>> trees[5].draw()      # show tree in a pop-up window
>>> print(trees[5])      # display in brackets

  8. Example treebank sentence and annotation
(S
  (PP (IN Despite)
      (NP (DT the) (JJ gloomy) (NN forecast)))
  (, ,)
  (NP-SBJ (NNP South) (NNP Korea))
  (VP (VBZ has)
      (VP (VBN recorded)
          (NP (NP (DT a) (NN trade) (NN surplus))
              (PP (IN of)
                  (NP (QP ($ $) (CD 71) (CD million))
                      (-NONE- *U*))))
          (ADVP-TMP (IN so) (IN far))
          (NP-TMP (DT this) (NN year))))
  (. .))

  9. Treebank has long, complex sentences Without the Cray-3 research and development expenses , the company would have been able *-2 to report a profit of $ 19.3 million *U* *ICH*-3 for the first half of 1989 rather than the $ 5.9 million *U* 0 it posted *T*-1 .

  10. Trees tend to be “flat” • They don’t indicate fine details of constituent structure • (In the study of syntax in linguistics, nodes are usually binary branching)

  11. Penn Treebank POS tag set for words

  12. Treebank node labels • Phrase labels may be augmented with function tags • http://bulba.sdsu.edu/jeanette/thesis/PennTags.html • Examples: NP-SBJ: noun phrase that is a subject ADJP-PRD: predicative adjective PP-TMP: temporal prepositional phrase NP-TMP: temporal noun phrase NP-CLR: “closely related” to previous NP • Also, null elements and traces

  13. NP subject, temporal PP, adverbial S

  14. Null elements and traces • An empty node is placed in the tree where the sentence has an argument that is implicit or stated elsewhere in the sentence • If stated elsewhere, a “trace” in the form of a numerical index is added to both the dislocated argument and its original position

  15. Subject preposed twice

  16. Outline • Treebanks • Treebank grammars • Evaluating parsers • Improving parsing performance • Limitations of generative parsers

  17. Obtain PCFG from a treebank • Want to build a parser automatically • Acquire CFG from a treebank • In a phrase structure tree, nodes and their children indicate constituency • Write a CFG rule for each such pattern • Extract PCFG • Count frequencies of nodes and their children in order to assign probability to rules
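
A minimal sketch of this extraction using NLTK's bundled treebank sample (only ~3,900 sentences; stripping function tags and null elements is omitted here, and S as the start symbol is an assumption):

from nltk import Nonterminal, induce_pcfg
from nltk.corpus import treebank

# One CFG production per (node, children) pattern in every tree
productions = [prod for tree in treebank.parsed_sents()
               for prod in tree.productions()]

# induce_pcfg gives each rule its relative frequency among rules
# sharing the same left-hand side
grammar = induce_pcfg(Nonterminal('S'), productions)
print(len(grammar.productions()))
print(grammar.productions()[:5])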

  18. Ignore extra information • When developing a parser from the Penn Treebank, people typically ignore extra information like function tags, null elements, traces • Focus on constituency • Separate tasks: • Semantic role labeling • Label the arguments of verbs according to their semantic function • For example, label NP as subject or object • Recover null elements • http://www.seas.upenn.edu/~gabbard/fully_parsing_the_penn_treebank.pdf

  19. Count frequencies of rules from tree
• First get rid of functional tags
• Then count rules:
S → NP VP .     1
NP → JJ         1
JJ → Many       1
VP → VBD NP     1
VBD → lost      1
NP → PRP$ NNS   1
PRP$ → their    1
NNS → farms     1
. → .           1

  20. Probability of a rule
• Then, given all the rules for a particular LHS nonterminal, calculate the probability of each rule
• Example, rule counts:
A → b    5
A → c    10
• Probabilities:
A → b    1/3
A → c    2/3
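
As a quick check of that arithmetic, a few lines of Python (the tuple encoding of rules is just for illustration):

from collections import Counter

# Counts of expansions of the LHS nonterminal A, as on the slide
counts = Counter({('A', ('b',)): 5, ('A', ('c',)): 10})

total = sum(counts.values())                        # 15 expansions of A
probs = {rule: n / total for rule, n in counts.items()}
print(probs)   # A -> b gets 5/15 = 1/3, A -> c gets 10/15 = 2/3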

  21. Some treebank VP rules
VP → VBD PP
VP → VBD PP PP
VP → VBD PP PP PP
VP → VBD PP PP PP PP
VP → VB ADVP PP
VP → VB PP ADVP
VP → ADVP VB PP
VP → VBD PP PP PP PP PP ADVP PP
• This mostly happens because we [go] [from football] [in the fall] [to lifting] [in the winter] [to football] [again] [in the spring]

  22. Some treebank NP rules
NP → DT JJ NN
NP → DT JJ NNS
NP → DT JJ NN NN
NP → DT JJ JJ NN
NP → DT JJ CD NNS
NP → RB DT JJ NN NN
NP → RB DT JJ JJ NNS
NP → DT JJ JJ NNP NNS
NP → DT NNP NNP NNP NNP JJ NN
NP → DT JJ NNP CC JJ JJ NN NNS
NP → RB DT JJS NN NN SBAR
NP → DT VBG JJ NNP NNP CC NNP
NP → DT JJ NNS , NNS CC NN NNS NN
NP → DT JJ JJ VBG NN NNP NNP FW NNP
NP → NP JJ , JJ '' SBAR '' NNS

  23. Some complicated NP rules
• NP → DT JJ JJ VBG NN NNP NNP FW NNP
  [The]DT [state-owned]JJ [industrial]JJ [holding]VBG [company]NN [Instituto]NNP [Nacional]NNP [de]FW [Industria]NNP
• NP → NP JJ , JJ '' SBAR '' NNS
  [Shearson’s]NP [easy-to-film]JJ [,], [black-and-white]JJ ['']'' [Where We Stand]SBAR ['']'' [commercials]NNS

  24. Treebank grammar rules • Treebank rules are different from toy grammar rules • More nonterminals on right-hand side • Large number of rules • Flat rules lead to large number of rules for a particular LHS nonterminal • (Penn Treebank: 1.3 million tokens) • 17,500 distinct rule types • 4,500 distinct VP rules

  25. Outline • Treebanks • Treebank grammars • Evaluating parsers • Improving parsing performance • Limitations of generative parsers

  26. Procedure for statistical parsing • Read a grammar from a treebank • Convert CFG to appropriate form • Remove functional tags • If using CKY, convert to Chomsky Normal Form • Modify rules and labels further • Parse sentences • Convert back to original grammar • Compare your parse tree to gold standard in the treebank
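
For the CNF conversion and the conversion back, NLTK's Tree class has both directions built in; a minimal sketch (applied here to a gold tree just to show the round trip, not to actual parser output):

from nltk.corpus import treebank

tree = treebank.parsed_sents()[0].copy(deep=True)

# Binarize for CKY: introduces artificial labels such as NP|<DT-JJ>
tree.chomsky_normal_form()

# ... a CKY parser trained on CNF rules would produce trees in this form ...

# Undo the binarization before comparing against the gold standard
tree.un_chomsky_normal_form()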

  27. Standard parsing setup • Penn Treebank • 50,000 sentences • Divided into 24 sections • Training: sections 02-21 • Development: section 22 • Test: section 23 • Sections 00 and 01 are not used because they are inconsistently annotated • These were the first sections they annotated

  28. Different ways of assessing performance • Quantitative evaluation • Labeled precision • Labeled recall • Crossing brackets • Extensibility • Out-of-domain sentences • Other languages • Efficiency • Size of grammar • Parsing time

  29. PARSEVAL metrics (Black et al. 1991) • Measure by correct constituents, not sentences • Sentences too hard, esp. long ones • Constituent: • Sequence of words under a nonterminal in the parse tree • A constituent is correct when: • There exists a constituent with the same span as in the gold standard • Same nonterminal label as the gold standard • Note: constituent doesn’t have to be recursively identical

  30. PARSEVAL metrics (Black et al. 1991) • Labeled recall = (# of correct constituents in parse) / (# of constituents in gold standard) • Labeled precision = (# of correct constituents in parse) / (total # of constituents in parse) • F-measure = 2PR / (P + R)
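
A minimal sketch of these metrics, treating each constituent as a (label, start, end) triple (duplicate constituents and edge cases such as empty parses are ignored here):

def parseval(gold, parsed):
    """gold, parsed: sets of (label, start, end) constituents."""
    correct = len(gold & parsed)              # same span and same label
    precision = correct / len(parsed)
    recall = correct / len(gold)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

gold   = {('NP', 0, 2), ('VP', 2, 5), ('S', 0, 5)}
parsed = {('NP', 0, 2), ('VP', 3, 5), ('S', 0, 5)}
print(parseval(gold, parsed))   # roughly (0.67, 0.67, 0.67)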

  31. Also crossing brackets • Crossing brackets, which are really bad: • parsed as ( A ( B C )) but correct parse is (( A B ) C ) • Report % of sentences with crossing brackets • State-of-the-art performance on Treebank: ~90% labeled recall ~90% labeled precision < 1% crossing brackets

  32. Outline • Treebanks • Treebank grammars • Evaluating parsers • Improving parsing performance • (Some slides borrowed from D. Klein) • Limitations of generative parsers

  33. Count frequencies of rules from tree
• Count rules:
S → NP VP .     1
NP → JJ         1
JJ → Many       1
VP → VBD NP     1
VBD → lost      1
NP → PRP$ NNS   1
PRP$ → their    1
NNS → farms     1
. → .           1
• Get rid of functional tags

  34. Improving parsing performance • F-measure for a grammar read directly off the treebank is 72.6% • We can get better performance by modifying the PCFG rules to indicate richer linguistic relationships

  35. Probability and PCFG rules • X → Y Z • Y → … • Z → … • p(tree rooted at X) = p(rule X → Y Z) * p(subtree rooted at Y) * p(subtree rooted at Z) • Expansion of Y is independent of parent X and sister Z (and similarly for Z) • Think in terms of generation: choose any Y rule and any Z rule independently • However, this isn’t what we want linguistically
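
A toy illustration of this generative scoring (the rule probabilities are invented): the probability of a tree is the product of the probabilities of its rule applications, and a subject NP and an object NP draw from exactly the same distribution.

# Invented rule probabilities
rule_prob = {
    ('S',  ('NP', 'VP')): 1.0,
    ('NP', ('DT', 'N')):  0.7,
    ('VP', ('V', 'NP')):  0.6,
}

# p(tree) = product over all (node, children) expansions in the tree;
# each expansion is chosen independently of the node's parent and sisters
p_tree = (rule_prob[('S', ('NP', 'VP'))]
          * rule_prob[('NP', ('DT', 'N'))]       # subject NP
          * rule_prob[('VP', ('V', 'NP'))]
          * rule_prob[('NP', ('DT', 'N'))])      # object NP: same distribution
print(p_tree)   # 1.0 * 0.7 * 0.6 * 0.7 = 0.294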

  36. Independence assumption is too strong • Expansion of nonterminals isn’t independent of context • Example: expansion of NP is dependent on parent • NP under S is a subject NP • NP under VP is an object NP

  37. Encode linguistic context in the CFG • Relax independence assumptions by indicating linguistic dependencies in the grammar rules • Example: p(NP  NP PP | parent=VP) • Methods • Lexicalization • Unlexicalized • Horizontal Markovization • Vertical Markovization / parent annotation • Splitting tags

  38. Lexical relationships indicated by paths in tree • How can we use lexical relationships in parsing PCFGs?

  39. Indicate lexical head in tree • For each CFG rule, indicate the “head” child of the phrase • “Head” = the node that determines the linguistic properties of the phrase • Resulting CFG is a template for CFGs with specific words as heads • Example: CFG with head rules
S -> NP VP[head]
VP -> V[head] NP
NP -> DT N[head]

  40. Apply head rules to trees • Every nonterminal in tree is augmented with a head • Lexical relations are now encoded locally in the tree

  41. Result: Lexicalized CFG • Example:
S(questioned) -> NP(lawyer) VP(questioned)
VP(questioned) -> V(questioned) NP(witness)
NP(lawyer) -> DT(the) N(lawyer)
NP(witness) -> DT(the) N(witness)
• Advantage: specify richer linguistic dependencies • Disadvantage: sparse data in probability estimation • Encodes specific words in the grammar
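
A minimal sketch of head propagation over an NLTK tree; the head-rule table is a toy stand-in, not the standard Collins/Magerman head rules, and lexicalize is a hypothetical helper:

from nltk import Tree

# Toy head rules: for each parent label, which child label supplies the head
HEAD_RULES = {'S': 'VP', 'VP': 'V', 'NP': 'N'}

def lexicalize(tree):
    """Relabel every node as LABEL(headword) and return the head word."""
    if isinstance(tree, str):                    # a word is its own head
        return tree
    heads = [lexicalize(child) for child in tree]
    label = tree.label()
    wanted = HEAD_RULES.get(label)
    head = heads[-1]                             # default: rightmost child
    for child, h in zip(tree, heads):
        if not isinstance(child, str) and child.label().split('(')[0] == wanted:
            head = h
    tree.set_label(f'{label}({head})')
    return head

t = Tree.fromstring('(S (NP (DT the) (N lawyer)) '
                    '(VP (V questioned) (NP (DT the) (N witness))))')
lexicalize(t)
print(t)   # (S(questioned) (NP(lawyer) ...) (VP(questioned) ...))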

  42. Unlexicalized parsers • Encode non-lexical information into tree nodes • Can encode multiple relationships • Methods • Parent annotation / Vertical Markovization • Condition on labels of ancestor nodes • Horizontal Markovization • Condition on labels of left/right neighbors • Splitting tags • Manual and automatic

  43. Parent annotation (Special case of vertical Markovization)

  44. Parent annotation takes care of this • Expansion of nonterminals isn’t independent of context • Example: expansion of NP is dependent on parent • NP under S is a subject NP • NP under VP is an object NP
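
A minimal sketch of parent annotation on an NLTK tree (POS tags are left unannotated here, which is one common choice; annotate_parents is a hypothetical helper):

from nltk import Tree

def annotate_parents(tree, parent='ROOT'):
    """Append ^PARENT to each phrasal label, so NP under S becomes NP^S."""
    if isinstance(tree, str) or tree.height() <= 2:
        return                                   # skip words and POS tags
    label = tree.label()
    tree.set_label(f'{label}^{parent}')
    for child in tree:
        annotate_parents(child, parent=label)

t = Tree.fromstring('(S (NP (DT the) (N lawyer)) '
                    '(VP (V questioned) (NP (DT the) (N witness))))')
annotate_parents(t)
print(t)   # subject NP becomes NP^S, object NP becomes NP^VP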

  45. v = variable length: back off to lower-order if freq of rule < 10

  46. Flat trees lead to sparse data; rewrite rules as binary branching and encode the history of the previous constituent(s) • [F-measure results table not shown] • v = variable length: back off to lower-order if freq of rule < 10
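
NLTK's binarization supports both kinds of Markovization directly; a small sketch (the orders 2 and 1 below are illustrative choices, not the settings behind the slide's results):

from nltk.corpus import treebank

tree = treebank.parsed_sents()[0].copy(deep=True)

# Right-binarize, remembering at most 2 previous sisters in the new node labels
# (horizontal order 2) and annotating each node with 1 ancestor (vertical order 1)
tree.chomsky_normal_form(factor='right', horzMarkov=2, vertMarkov=1)
print(tree)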

  47. Manually split categories • Examples • NP: subject vs. object • DT: determiners (a/the) vs. demonstratives (this/that) • IN: sentential vs. prepositional • Advantages: • Linguistically motivated • Maintain a small category set
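
A minimal sketch of one such manual split on an NLTK tree, separating demonstrative determiners from articles (the tag name DT-DEM is invented for illustration):

from nltk import Tree

DEMONSTRATIVES = {'this', 'that', 'these', 'those'}

def split_dt(tree):
    """Relabel DT nodes as DT-DEM when the word is a demonstrative."""
    for node in tree.subtrees():
        if node.label() == 'DT' and node[0].lower() in DEMONSTRATIVES:
            node.set_label('DT-DEM')

t = Tree.fromstring('(NP (DT this) (NN year))')
split_dt(t)
print(t)   # (NP (DT-DEM this) (NN year))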
