
# Probabilistic and Lexicalized Parsing



### Presentation Transcript

1. Probabilistic and Lexicalized Parsing

2. Probabilistic CFGs • Weighted CFGs • Attach weights to rules of the CFG • Compute weights of derivations • Use weights to pick preferred parses • Utility: pruning and ordering the search space, disambiguation, language model for ASR • Parsing with weighted grammars (like weighted FAs): T* = argmax_T W(T, S) • Probabilistic CFGs are one form of weighted CFGs.

3. Probability Model • Rule probability: • Attach probabilities to grammar rules • Expansions for a given non-terminal sum to 1: R1: VP → V .55, R2: VP → V NP .40, R3: VP → V NP NP .05 • Estimate the probabilities from annotated corpora: P(R1) = count(R1) / count(VP) • Derivation probability: • A derivation T uses rules {R1 … Rn} • Probability of a derivation: P(T) = ∏i P(Ri) • Most probable parse: T* = argmax_T P(T) • Probability of a sentence: P(S) = Σ_T P(T, S), summing over all possible derivations for the sentence • Note the independence assumption: parse probability does not change based on where the rule is expanded.
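A minimal sketch (not from the slides) of how rule probabilities could be estimated from treebank rule counts and combined into a derivation probability; the counts below are made up to match slide 3's VP expansions.

```python
from collections import defaultdict

def estimate_rule_probs(rule_counts):
    """MLE: P(A -> beta) = count(A -> beta) / count(A)."""
    lhs_totals = defaultdict(int)
    for (lhs, rhs), c in rule_counts.items():
        lhs_totals[lhs] += c
    return {(lhs, rhs): c / lhs_totals[lhs]
            for (lhs, rhs), c in rule_counts.items()}

def derivation_prob(rules, rule_probs):
    """P(T) = product of the probabilities of the rules used in T."""
    p = 1.0
    for r in rules:
        p *= rule_probs[r]
    return p

# Hypothetical treebank counts for the VP expansions on slide 3.
counts = {("VP", ("V",)): 55,
          ("VP", ("V", "NP")): 40,
          ("VP", ("V", "NP", "NP")): 5}
probs = estimate_rule_probs(counts)                 # -> .55, .40, .05
print(derivation_prob([("VP", ("V", "NP"))], probs))  # 0.4
```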

4. S  NP VP VP  V NP NP  NP PP VP  VP PP PP  P NP NP  John | Mary | Denver V -> called P -> from S VP NP PP VP V NP P NP called from John Mary Denver Structural ambiguity John called Mary from Denver S VP NP NP V NP PP called John Mary P NP from Denver

5. Cocke-Younger-Kasami Parser • Bottom-up parser with top-down filtering • Start state(s): (A, i, i+1) for each A → w_{i+1} • End state: (S, 0, n), where n is the input size • Next-state rule: (B, i, k) and (C, k, j) yield (A, i, j) if A → B C

6. Example

7. Base Case: Aw

8. Recursive Cases: ABC

9. Probabilistic CKY • Assign probabilities to constituents as they are completed and placed in the table • Computing the probability: P(A, i, j) = P(A → B C) × P(B, i, k) × P(C, k, j) • Since we are interested in the max P(S, 0, n), use the max probability for each constituent • Maintain back-pointers to recover the parse.
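A minimal probabilistic CKY (Viterbi) sketch (not from the slides): it keeps the max probability per constituent and back-pointers, as described above. The grammar dictionaries are assumed inputs in the same shape as the recognizer sketch, but with probabilities attached.

```python
def pcky(words, lexical, binary, start="S"):
    """lexical: dict word -> {A: P(A -> word)}
       binary:  dict (B, C) -> {A: P(A -> B C)}"""
    n = len(words)
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    back = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    # Base case: lexical rules fill the diagonal.
    for i, w in enumerate(words):
        for A, p in lexical.get(w, {}).items():
            best[i][i + 1][A] = p
            back[i][i + 1][A] = w
    # Recursive case: keep only the max-probability way to build each (A, i, j).
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for B, pb in best[i][k].items():
                    for C, pc in best[k][j].items():
                        for A, pr in binary.get((B, C), {}).items():
                            p = pr * pb * pc
                            if p > best[i][j].get(A, 0.0):
                                best[i][j][A] = p
                                back[i][j][A] = (k, B, C)
    return best[0][n].get(start, 0.0), back

def build_tree(back, i, j, A):
    """Follow back-pointers to reconstruct the best parse."""
    entry = back[i][j][A]
    if isinstance(entry, str):            # lexical entry
        return (A, entry)
    k, B, C = entry
    return (A, build_tree(back, i, k, B), build_tree(back, k, j, C))
```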

10. Problems with PCFGs • The probability model we’re using is just based on the rules in the derivation. • Lexical insensitivity: • Doesn’t use the words in any real way • Structural disambiguation is lexically driven • PP attachment often depends on the verb, its object, and the preposition • I ate pickles with a fork. • I ate pickles with relish. • Context insensitivity of the derivation • Doesn’t take into account where in the derivation a rule is used • Pronouns more often subjects than objects • She hates Mary. • Mary hates her. • Solution: Lexicalization • Add lexical information to each rule

11. An example of lexical information: Heads • Make use of the notion of the head of a phrase • Head of an NP is a noun • Head of a VP is the main verb • Head of a PP is its preposition • Each LHS of a rule in the PCFG has a lexical item • Each RHS non-terminal has a lexical item • One of the lexical items is shared with the LHS • If R is the set of binary branching rules in the CFG, the lexicalized CFG has O(2·|Σ|·|R|) rules • Unary rules: O(|Σ|·|R|)
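A small sketch (not from the slides) of how a binary rule could be lexicalized by propagating the head word of one child up to the parent; the head_child table below is a made-up illustration of which child supplies the head.

```python
# Hypothetical table: which child supplies the head of the parent.
head_child = {("S", "NP", "VP"): 1,   # head of S comes from the VP
              ("VP", "V", "NP"): 0,   # head of VP comes from the V
              ("NP", "NP", "PP"): 0,
              ("PP", "P", "NP"): 0}

def lexicalize(parent, left, right, left_head, right_head):
    """Turn A -> B C into A(h) -> B(h_B) C(h_C), sharing one head with the LHS."""
    idx = head_child[(parent, left, right)]
    parent_head = left_head if idx == 0 else right_head
    return (f"{parent}({parent_head})",
            f"{left}({left_head})", f"{right}({right_head})")

print(lexicalize("VP", "V", "NP", "called", "Mary"))
# ('VP(called)', 'V(called)', 'NP(Mary)')
```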

12. Example (correct parse) Attribute grammar

13. Example (less preferred)

14. Computing Lexicalized Rule Probabilities • We started with rule probabilities • VP → V NP PP with P(rule | VP) • E.g., the count of this rule divided by the number of VPs in a treebank • Now we want lexicalized probabilities • VP(dumped) → V(dumped) NP(sacks) PP(in) • P(rule | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ in is the head of the PP) • Not likely to have significant counts in any treebank (see the sketch below)
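A hedged sketch (not from the slides) of one common way to cope with this sparsity: interpolate the fully lexicalized estimate with the plain, unlexicalized rule probability. The count tables and the interpolation weight lam are hypothetical.

```python
def lexicalized_rule_prob(rule, heads, lex_counts, lex_totals,
                          plain_probs, lam=0.3):
    """P(rule | LHS, heads) interpolated with the unlexicalized P(rule | LHS).

    lex_counts[(rule, heads)] and lex_totals[(rule[0], heads)] are treebank counts;
    plain_probs[rule] is the ordinary PCFG probability; rule[0] is the LHS."""
    c = lex_counts.get((rule, heads), 0)
    total = lex_totals.get((rule[0], heads), 0)
    mle = c / total if total else 0.0
    return lam * mle + (1 - lam) * plain_probs[rule]
```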

15. Another Example • Consider the VPs "ate spaghetti with gusto" and "ate spaghetti with marinara" • The relevant dependency is not between mother and child in the tree. [The slide shows the two parse trees: PP(with) attaching to VP(ate) for "ate spaghetti with gusto", and to NP(spaghetti) for "ate spaghetti with marinara".]

16. Log-linear Models for Parsing • Why restrict the conditioning to the elements of a rule? • Use even larger context • Word sequence, word types, sub-tree context, etc. • In general, compute P(y | x) ∝ exp(Σi λi fi(x, y)), where fi(x, y) tests a property of the context and λi is the weight of that feature • Use these as scores in the CKY algorithm to find the best-scoring parse (see the sketch below).
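A minimal sketch (not from the slides) of a log-linear score over binary features, P(y | x) ∝ exp(Σi λi fi(x, y)); the feature names and weights below are invented for illustration, using the PP-attachment decision from slide 10.

```python
import math

def loglinear_probs(x, candidates, features, weights):
    """features(x, y) -> {feature name: value}; weights: {feature name: lambda}."""
    scores = {y: sum(weights.get(name, 0.0) * value
                     for name, value in features(x, y).items())
              for y in candidates}
    z = sum(math.exp(s) for s in scores.values())          # normalizer
    return {y: math.exp(s) / z for y, s in scores.items()}

# Hypothetical features for the PP attachment in "ate pickles with a fork".
def feats(x, y):
    verb, noun, prep = x
    return {f"attach={y}&prep={prep}": 1.0,
            f"attach={y}&verb={verb}": 1.0,
            f"attach={y}&noun={noun}": 1.0}

weights = {"attach=VP&prep=with": 0.8, "attach=VP&verb=ate": 0.5,
           "attach=NP&prep=with": 0.2}
print(loglinear_probs(("ate", "pickles", "with"), ["VP", "NP"], feats, weights))
```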

17. Supertagging: Almost Parsing • Example sentence: "Poachers now control the underground trade" • Each word is associated with a set of candidate elementary trees (supertags) that encode its local syntactic context. [The slide shows the candidate elementary trees for poachers, now, control, the, underground, and trade.]

18. Summary • Parsing context-free grammars • Top-down and Bottom-up parsers • Mixed approaches (CKY, Earley parsers) • Preferences over parses using probabilities • Parsing with PCFG and PCKY algorithms • Enriching the probability model • Lexicalization • Log-linear models for parsing
