PARSING



  1. PARSING

  2. Analyzing Linguistic Units • Why should we parse a sentence? • to detect relations among words • to normalize surface syntactic variations • invaluable for a number of NLP applications

  3. Some Concepts • Grammar: A generative device that prescribes a set of valid strings. • Parser: A device that uncovers the sequence of grammar rules that might have generated the input sentence. • Input: Grammar, Sentence • Output: parse tree, derivation tree • Recognizer: A device that returns a “yes” if the input string could be generated by the grammar. • Input: Grammar, Sentence • Output: boolean
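
As a concrete illustration of the grammar / recognizer distinction, here is a minimal sketch (not from the slides) in Python: a toy CFG represented as a dict of rewrite rules, plus a brute-force recognizer that takes the grammar and a sentence and returns a boolean. The GRAMMAR dict is hypothetical, loosely based on the flight-booking fragment used later in these slides.

    # Hypothetical toy grammar: non-terminal -> list of right-hand sides.
    GRAMMAR = {
        "S":   [["NP", "VP"], ["VP"]],
        "NP":  [["Det", "N"], ["PropN"]],
        "VP":  [["V", "NP"], ["V"]],
        "Det": [["that"], ["a"]],
        "N":   [["flight"], ["book"]],
        "V":   [["book"], ["prefer"]],
        "PropN": [["Houston"]],
    }

    def expand(symbol, words, i):
        # Yield every input position reachable after deriving `symbol` from words[i:].
        if symbol not in GRAMMAR:                     # terminal symbol
            if i < len(words) and words[i] == symbol:
                yield i + 1
            return
        for rhs in GRAMMAR[symbol]:                   # try each rule A -> X1 ... Xk
            positions = [i]
            for sym in rhs:
                positions = [k for j in positions for k in expand(sym, words, j)]
            yield from positions

    def recognize(words):
        # Recognizer: input is grammar + sentence, output is a boolean.
        return len(words) in expand("S", words, 0)

    print(recognize("book that flight".split()))      # True
    print(recognize("flight that book".split()))      # False

A parser, by contrast, would also return the sequence of rules (the parse tree) that licensed the "yes" answer.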

  4. Searching for a Parse • Grammar + rewrite procedure encodes • all strings generated by the grammar: L(G) • all parse trees for each generated string s: T(G) = ∪_s T_s(G) • Given an input sentence I, its set of parse trees is T_I(G) • Parsing is searching for T_I(G) ⊆ T(G) • Ideally, the parser finds the appropriate parse for the sentence.

  5. CFG for Fragment of English • [Figure: parse tree for “Book that flight” — S → VP, VP → V NP, V → Book, NP → Det Nom, Det → that, Nom → N, N → flight] • Bottom-up Parsing vs. Top-down Parsing

  6. Top-down/Bottom-up Parsing • Control strategy -- how to explore the search space? • Pursue all parses in parallel, or backtrack, or …? • Which rule to apply next? • Which node to expand next? • Look at how Top-down and Bottom-up parsing work on the board for “Book that flight”

  7. Top-down, Depth-First, Left-to-Right Parser • Systematic, incremental expansion of the search space • In contrast to a parallel parser • Start State: (• S, 0) • End State: (•, n), where n is the length of the input to be parsed • Next State Rules • (• w_{j+1} β, j) → (• β, j+1) • (• B β, j) → (• γ β, j) if B → γ (note: B is the left-most non-terminal) • Agenda: a data structure to keep track of the states to be expanded • Depth-first expansion if the Agenda is a stack
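
The following is a minimal sketch (not from the slides) of this agenda-driven, top-down, depth-first control loop, reusing the hypothetical GRAMMAR dict from the earlier sketch. A state is (remaining symbols after the dot, input position j), and using a stack as the Agenda gives depth-first expansion.

    def topdown_recognize(words, grammar, start="S"):
        agenda = [((start,), 0)]                 # start state (• S, 0); stack => depth-first
        while agenda:
            symbols, j = agenda.pop()
            if not symbols:                      # end state (•, n)?
                if j == len(words):
                    return True
                continue                         # dead end: dot at end but input remains
            first, rest = symbols[0], symbols[1:]
            if first in grammar:                 # expand: (• B β, j) -> (• γ β, j) for each B -> γ
                for rhs in grammar[first]:
                    agenda.append((tuple(rhs) + rest, j))
            elif j < len(words) and words[j] == first:
                agenda.append((rest, j + 1))     # scan: (• w_{j+1} β, j) -> (• β, j+1)
        return False

    print(topdown_recognize("book that flight".split(), GRAMMAR))   # True

As the next slide notes, this loop will not terminate on a left-recursive grammar, since the expansion rule can keep rewriting the same left-most non-terminal without ever consuming input.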

  8. Fig 10.7 CFG

  9. Left Corners • Can we help top-down parsers with some bottom-up information? • Unnecessary states are created if there are many B → γ rules • If after successive expansions B ⇒* w δ, and w does not match the input, then the series of expansions is wasted • The leftmost symbol derivable from B needs to match the input • Look ahead to the left corner of the tree • B is a left corner of A if A ⇒* B γ • Build a table with the left corners of all non-terminals in the grammar and consult it before applying a rule • At a given point in state expansion (• B β, j): pick the rule B → C γ only if a left corner of C matches the input w_{j+1}
  Left-corner table:
  Category   Left Corners
  S          Det, PropN, Aux, V
  NP         Det, PropN
  Nom        N
  VP         V
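
A small sketch of how such a left-corner table might be precomputed (assuming the hypothetical GRAMMAR dict from the earlier sketch): iterate to a fixpoint over the relation "B is a left corner of A if A ⇒* B γ", then consult the table before predicting a rule.

    def left_corners(grammar):
        lc = {nt: set() for nt in grammar}
        changed = True
        while changed:                                   # iterate to a fixpoint
            changed = False
            for nt, rules in grammar.items():
                for rhs in rules:
                    first = rhs[0]
                    # B is a left corner of A if A -> B ... or transitively via B's own left corners
                    new = {first} | lc.get(first, set())
                    if not new <= lc[nt]:
                        lc[nt] |= new
                        changed = True
        return lc

    # e.g. left_corners(GRAMMAR)["S"] includes {"NP", "VP", "Det", "PropN", "V"}
    # (plus terminal words, which could be filtered out if only categories are wanted).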

  10. Limitation of Top-down Parsing: Left Recursion • Depth-first search will never terminate if the grammar is left-recursive (e.g. NP → NP PP) • Solutions: • Rewrite the grammar into a weakly equivalent one which is not left-recursive, e.g. NP → NP PP, NP → Nom becomes NP → Nom NP’, NP’ → PP NP’, NP’ → ε • This may make rules unnatural • Fix the depth of search explicitly • Other book-keeping needed in top-down parsing • Memoization for reusing previously parsed substrings • Packed representation for parse ambiguity
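
A sketch of the standard rewrite for immediate left recursion (A → A β | α becomes A → α A’, A’ → β A’ | ε), using the same rule representation as the earlier hypothetical GRAMMAR dict; it does not handle indirect left recursion.

    def remove_left_recursion(grammar):
        # Handles only immediate left recursion (A -> A beta).
        out = {}
        for A, rules in grammar.items():
            rec    = [rhs[1:] for rhs in rules if rhs and rhs[0] == A]    # the beta parts of A -> A beta
            nonrec = [rhs for rhs in rules if not rhs or rhs[0] != A]     # the alpha parts of A -> alpha
            if not rec:
                out[A] = rules
                continue
            A2 = A + "'"
            out[A]  = [alpha + [A2] for alpha in nonrec]                  # A  -> alpha A'
            out[A2] = [beta + [A2] for beta in rec] + [[]]                # A' -> beta A' | epsilon ([] = empty rhs)
        return out

    # e.g. {"NP": [["NP", "PP"], ["Nom"]]} becomes
    #      {"NP": [["Nom", "NP'"]], "NP'": [["PP", "NP'"], []]}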

  11. Dynamic Programming for Parsing • Memoization: • Create a table of solutions to sub-problems (e.g. subtrees) as the parse proceeds • Look up subtrees for each constituent rather than re-parsing • Since all parses are implicitly stored, all are available for later disambiguation • Examples: Cocke-Younger-Kasami (CYK) (1960), Graham-Harrison-Ruzzo (GHR) (1980) and Earley (1970) algorithms • Earley parser: O(n^3) parser • Top-down parser with bottom-up information • State: [i, A → α • β, j] • j is the position in the string that has been parsed • i is the position in the string where A begins • Top-down prediction: S ⇒* w_1 … w_i A γ • Bottom-up completion: α w_{j+1} … w_n ⇒* w_{i+1} … w_n

  12. Earley Parser • Data Structure: an n+1 cell array called the Chart • For each word position, the chart contains a set of states representing all partial parse trees generated to date • E.g. chart[0] contains all partial parse trees generated at the beginning of the sentence • Chart entries represent three types of constituents: • predicted constituents (top-down predictions) • in-progress constituents (we’re in the midst of …) • completed constituents (we’ve found …) • Progress in the parse is represented by Dotted Rules • The position of the • indicates the type of constituent • 0 Book 1 that 2 flight 3 • (0, S → • VP, 0) (predicting a VP) • (1, NP → Det • Nom, 2) (finding an NP) • (0, VP → V NP •, 3) (found a VP)

  13. Earley Parser: Parse Success • The final answer is found by looking at the last entry in the chart • If an entry resembles (0, S → α •, n) then the input was parsed successfully • But note that the chart will also contain a record of all possible parses of the input string, given the grammar -- not just the successful one(s) • Why is this useful?

  14. Earley Parsing Steps • Start State: (0, S’ → • S, 0) • End State: (0, S → α •, n), where n is the input size • Next State Rules • Scanner: read input • (i, A → α • w_{j+1} β, j) → (i, A → α w_{j+1} • β, j+1) • Predictor: add top-down predictions • (i, A → α • B β, j) → (j, B → • γ, j) if B → γ (note: B is the left-most non-terminal) • Completer: move the dot to the right when a new constituent is found • (i, B → α • A β, k), (k, A → γ •, j) → (i, B → α A • β, j) • No backtracking and no states removed: keep the complete history of the parse • Why is this useful?
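
Putting slides 11-14 together, here is a compact sketch of an Earley recognizer (again using the hypothetical GRAMMAR dict from the earlier sketch). A state is a tuple (i, A, rhs, dot, j) standing for [i, A → α • β, j] and is stored in chart[j]; a dummy start rule S’ → S plays the role of the start state, and success is the completed dummy rule spanning the whole input. Back-pointers for retrieving parses (slide 19) are omitted to keep the sketch short.

    def earley(words, grammar, start="S"):
        n = len(words)
        chart = [[] for _ in range(n + 1)]              # n+1 cells

        def add(state, k):
            if state not in chart[k]:
                chart[k].append(state)

        add((0, "GAMMA", (start,), 0, 0), 0)            # dummy start state (0, S' -> • S, 0)

        for k in range(n + 1):
            for state in chart[k]:                      # chart[k] may grow while we scan it
                i, lhs, rhs, dot, j = state
                if dot < len(rhs) and rhs[dot] in grammar:
                    # Predictor: (i, A -> α • B β, j) => (j, B -> • γ, j)
                    for prod in grammar[rhs[dot]]:
                        add((j, rhs[dot], tuple(prod), 0, j), j)
                elif dot < len(rhs):
                    # Scanner: (i, A -> α • w β, j) => (i, A -> α w • β, j+1)
                    if j < n and words[j] == rhs[dot]:
                        add((i, lhs, rhs, dot + 1, j + 1), j + 1)
                else:
                    # Completer: advance the dot in every state that was waiting for lhs at i
                    for (i2, lhs2, rhs2, dot2, _) in chart[i]:
                        if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                            add((i2, lhs2, rhs2, dot2 + 1, j), j)

        # success iff the dummy rule is complete over the whole input
        return any(s == (0, "GAMMA", (start,), 1, n) for s in chart[n])

    print(earley("book that flight".split(), GRAMMAR))   # True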

  15. Earley Parser Steps

  16. Book that flight (Chart [0]) • Seed chart with top-down predictions for S from grammar

  17. CFG for Fragment of English
  S → NP VP          Det → that | this | a
  S → Aux NP VP      N → book | flight | meal | money
  S → VP             V → book | include | prefer
  NP → Det Nom       Aux → does
  NP → PropN         Prep → from | to | on
  Nom → N            PropN → Houston | TWA
  Nom → N Nom
  Nom → Nom PP
  VP → V
  VP → V NP
  PP → Prep NP

  18. Chart[1] • V → book • is passed to the Completer, which finds 2 states in Chart[0] whose left corner is V and adds them to Chart[1], moving their dots to the right

  19. Retrieving the Parses • Augment the Completer to add a pointer to the prior states it advances, as a field in the current state • i.e. which states combined to arrive here? • Read the pointers back from the final state • What if the final cell does not contain the final state? – Error handling. • Is it a total loss? No... • The chart contains every constituent and combination of constituents possible for the input, given the grammar • Useful for partial parsing or shallow parsing, as used in information extraction

  20. Alternative Control Strategies • Change Earley top-down strategy to bottom-up or ... • Change to best-first strategy based on the probabilities of constituents • Compute and store probabilities of constituents in the chart as you parse • Then instead of expanding states in fixed order, allow probabilities to control order of expansion

  21. Probabilistic and Lexicalized Parsing

  22. Probabilistic CFGs • Weighted CFGs • Attach weights to the rules of the CFG • Compute weights of derivations • Use the weights to pick preferred parses • Utility: pruning and ordering the search space, disambiguation, language model for ASR • Parsing with weighted grammars (like weighted FAs): T* = argmax_T W(T, S) • Probabilistic CFGs are one form of weighted CFG

  23. Probability Model • Rule Probability: • Attach probabilities to grammar rules • Expansions for a given non-terminal sum to 1: R1: VP → V .55, R2: VP → V NP .40, R3: VP → V NP NP .05 • Estimate the probabilities from annotated corpora: P(R1) = count(R1) / count(VP) • Derivation Probability: • Derivation T = {R1 … Rn} • Probability of a derivation: P(T) = ∏_i P(R_i) • Most probable parse: T* = argmax_T P(T) • Probability of a sentence: P(S) = Σ_T P(T, S), summing over all possible derivations for the sentence • Note the independence assumption: a rule’s probability does not change based on where in the derivation the rule is expanded
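
A short sketch of this probability model with hypothetical treebank counts: rule probabilities are relative frequencies, and a derivation's probability is the product of its rule probabilities (computed in log space to avoid underflow).

    from collections import Counter
    import math

    # Hypothetical rule counts from an annotated corpus.
    rule_counts = Counter({
        ("VP", ("V",)): 55,
        ("VP", ("V", "NP")): 40,
        ("VP", ("V", "NP", "NP")): 5,
    })

    lhs_counts = Counter()
    for (lhs, rhs), c in rule_counts.items():
        lhs_counts[lhs] += c

    def rule_prob(lhs, rhs):
        # P(A -> beta) = count(A -> beta) / count(A)
        return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

    def derivation_logprob(rules):
        # log P(T) = sum_i log P(R_i), under the independence assumption
        return sum(math.log(rule_prob(lhs, rhs)) for lhs, rhs in rules)

    print(rule_prob("VP", ("V", "NP")))            # 0.4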

  24. Structural Ambiguity: John called Mary from Denver • Grammar: S → NP VP, VP → V NP, VP → VP PP, NP → NP PP, PP → P NP, NP → John | Mary | Denver, V → called, P → from • [Figure: two parse trees — in one the PP “from Denver” attaches to the VP (called … from Denver), in the other it attaches to the NP (Mary from Denver)]

  25. Cocke-Younger-Kasami Parser • Bottom-up parser with top-down filtering • Start State(s): (A, i, i+1) for each A → w_{i+1} • End State: (S, 0, n), where n is the input size • Next State Rule • (B, i, k), (C, k, j) → (A, i, j) if A → B C
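
A minimal sketch of the CKY recognizer described on this slide, assuming a grammar in Chomsky Normal Form split into two hypothetical tables: unary (word → set of pre-terminals, for the base case A → w_{i+1}) and binary ((B, C) → set of parents A, for A → B C).

    from collections import defaultdict

    def cky_recognize(words, unary, binary, start="S"):
        n = len(words)
        table = defaultdict(set)                      # table[(i, j)] = constituents spanning i..j
        for i, w in enumerate(words):                 # base case: (A, i, i+1) for each A -> w_{i+1}
            table[(i, i + 1)] |= unary.get(w, set())
        for span in range(2, n + 1):
            for i in range(0, n - span + 1):
                j = i + span
                for k in range(i + 1, j):             # (B, i, k), (C, k, j) => (A, i, j) if A -> B C
                    for B in table[(i, k)]:
                        for C in table[(k, j)]:
                            table[(i, j)] |= binary.get((B, C), set())
        return start in table[(0, n)]                 # end state: (S, 0, n)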

  26. Example

  27. Base Case: A → w

  28. Recursive Cases: A → B C

  29. Probabilistic CKY • Assign probabilities to constituents as they are completed and placed in the table • Computing the probability: P(A, i, j) = P(A → B C) × P(B, i, k) × P(C, k, j) • Since we are interested in the max P(S, 0, n), use the max probability for each constituent • Maintain back-pointers to recover the parse
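
A sketch of probabilistic CKY along the lines of this slide: the same loops as the plain recognizer above, but each cell keeps the maximum probability per non-terminal together with a back-pointer so the best parse can be recovered. The tables lex_rules (word → list of (A, prob)) and bin_rules ((B, C) → list of (A, prob)) are hypothetical.

    def pcky(words, lex_rules, bin_rules, start="S"):
        n = len(words)
        best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]   # best[i][j][A] = max probability
        back = [[{} for _ in range(n + 1)] for _ in range(n + 1)]   # back-pointers to recover the parse
        for i, w in enumerate(words):                               # lexical rules A -> w
            for A, p in lex_rules.get(w, []):
                best[i][i + 1][A] = p
                back[i][i + 1][A] = w
        for span in range(2, n + 1):
            for i in range(0, n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for B, pb in best[i][k].items():
                        for C, pc in best[k][j].items():
                            for A, pr in bin_rules.get((B, C), []):
                                p = pr * pb * pc                    # P(A -> B C) * P(B,i,k) * P(C,k,j)
                                if p > best[i][j].get(A, 0.0):
                                    best[i][j][A] = p               # keep only the max per non-terminal
                                    back[i][j][A] = (k, B, C)
        return best[0][n].get(start, 0.0), back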

  30. Problems with PCFGs • The probability model we’re using is just based on the rules in the derivation. • Lexical insensitivity: • Doesn’t use the words in any real way • Structural disambiguation is lexically driven • PP attachment often depends on the verb, its object, and the preposition • I ate pickles with a fork. • I ate pickles with relish. • Context insensitivity of the derivation • Doesn’t take into account where in the derivation a rule is used • Pronouns more often subjects than objects • She hates Mary. • Mary hates her. • Solution: Lexicalization • Add lexical information to each rule

  31. An Example of Lexical Information: Heads • Make use of the notion of the head of a phrase • The head of an NP is a noun • The head of a VP is the main verb • The head of a PP is its preposition • Each LHS of a rule in the PCFG has a lexical item • Each RHS non-terminal has a lexical item • One of the lexical items is shared with the LHS • If R is the number of binary-branching rules in the CFG, the lexicalized CFG has O(2 × |Σ| × |R|) rules • Unary rules: O(|Σ| × |R|)

  32. Example (correct parse) Attribute grammar

  33. Example (less preferred)

  34. Computing Lexicalized Rule Probabilities • We started with rule probabilities • VP → V NP PP : P(rule | VP) • E.g., the count of this rule divided by the number of VPs in a treebank • Now we want lexicalized probabilities • VP(dumped) → V(dumped) NP(sacks) PP(in) • P(rule | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ in is the head of the PP) • Not likely to have significant counts in any treebank

  35. Another Example • Consider the VPs • ate spaghetti with gusto • ate spaghetti with marinara • The relevant dependency is not between mother and child • [Figure: two lexicalized trees — in “ate spaghetti with gusto” the PP(with) attaches to VP(ate), while in “ate spaghetti with marinara” the PP(with) attaches to NP(spag)]

  36. Log-linear Models for Parsing • Why restrict the conditioning to the elements of a rule? • Use an even larger context • Word sequence, word types, sub-tree context, etc. • In general, compute P(y | x) ∝ exp(Σ_i λ_i f_i(x, y)), where f_i(x, y) tests a property of the context and λ_i is the weight of that feature • Use these as scores in the CKY algorithm to find the best-scoring parse
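
A minimal sketch of the log-linear score described here: P(y | x) ∝ exp(Σ_i λ_i f_i(x, y)), normalized over the candidate set; the feature functions and weights are hypothetical placeholders.

    import math

    def loglinear_prob(x, y, candidates, features, weights):
        # features: list of functions f_i(x, y); weights: matching list of lambda_i.
        def score(cand):
            return sum(w * f(x, cand) for f, w in zip(features, weights))
        z = sum(math.exp(score(c)) for c in candidates)    # normalization constant Z(x)
        return math.exp(score(y)) / z                      # P(y | x)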

  37. Supertagging: Almost Parsing • Example: Poachers now control the underground trade • [Figure: each word (poachers, now, control, the, underground, trade) is paired with its set of candidate elementary trees (supertags); selecting the correct supertag for each word leaves only a small amount of parsing work to be done]
