## The CYK Parsing Method

Chiyo Hotani, Tanya Petrova
CL2 Parsing Course, 28 November 2007

**Overview**

• CYK Recognition with CF grammar
• Basic Algorithm
• Problems: unit rules, ε-rules
• Recognition with a grammar in CNF
• CYK Parsing with CNF
• Parsing with CNF
• Recognition Table
• Chart Parsing
• Summary
• Advantages and Disadvantages
• Other remarks

**Basic Algorithm of CYK Recognition (1)**

Example grammar: a grammar describing numbers in scientific notation.
Input: 32.5e+1

**Basic Algorithm of CYK Recognition (2)**

Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Sign -> + | -

Derivations of substrings of length 1.

**Basic Algorithm of CYK Recognition (3)**

NumberS -> Integer | Real
Integer -> Digit | Integer Digit
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Derivations of substrings of length 1.

• Unit rule: a rule of the form A -> B, where A and B are non-terminals. Chains of unit rules can occur in a derivation.

**Basic Algorithm of CYK Recognition (4)**

NumberS -> Integer | Real
Integer -> Digit | Integer Digit
Fraction -> . Integer
Scale -> e Sign Integer | Empty

**Basic Algorithm of CYK Recognition (5)**

NumberS -> Integer | Real
Real -> Integer Fraction Scale

Number does indeed derive 32.5e+1.

**Basic Algorithm of CYK Recognition (7)**

• $R_\varepsilon$ = { Empty, Scale }
• Sentence: $z = z_1 z_2 \ldots z_n$
• $s_{i,l} = z_i z_{i+1} \ldots z_{i+l-1}$: the substring of $z$ starting at position $i$, of length $l$
• $R_{s_{i,l}}$: the set of non-terminals deriving the substring $s_{i,l}$

[Figure: a graphical presentation of substrings]

**CYK recognition with a grammar in CNF**

• Required restrictions:
  • Eliminate ε-rules and unit rules
  • Limit the maximum length of the right-hand side of a rule to 2
• CNF:
  • No ε-rules and no unit rules
  • All rules have one of the following two forms: A -> a or A -> B C

**CYK Parsing with CNF**

• Building the recognition table
• Input: our example grammar in CNF; input sentence: 32.5e+1

**CYK Parsing with CNF**

• Bottom row: read directly from the grammar (rules of the form A -> a)

**Two Ways to Compute $R_{s_{i,l}}$**

• Check each right-hand side
• Compute possible right-hand sides from the recognition table

**How this is done**

Example: 2.5e (= $s_{2,4}$)

1) N1 is not in $R_{s_{2,1}}$ or $R_{s_{2,2}}$. N1 is a member of $R_{s_{2,3}}$, but Scale' is not a member of $R_{s_{5,1}}$.

2) $R_{s_{2,4}}$ is the set of non-terminals that have a right-hand side A B where either:

• A in $R_{s_{2,1}}$ and B in $R_{s_{3,3}}$
• A in $R_{s_{2,2}}$ and B in $R_{s_{4,2}}$
• A in $R_{s_{2,3}}$ and B in $R_{s_{5,1}}$

Possible combinations: N1 T2 or Number T2. Our grammar has no rule with such a right-hand side, so nothing is added to $R_{s_{2,4}}$.

**As a result we find out that:**

• This process is much less complicated than the one we saw before

**Reasons**

• We do not have to repeat the process again and again until no new non-terminals are added to $R_{s_{i,l}}$ (the substrings we are dealing with are proper substrings and cannot be equal to the string we start with)
• We only have to find one place where the substring must be split into two

[Figure: a rule A -> B C, with the split point marked "Here!"]

**Chart Parsing**

A chart is just a recognition table.

**A short retrospective of CYK**

• First: recognition table using the original grammar.
• Then: transforming the grammar to CNF.

**A short retrospective of CYK (cont.)**

• CNF is useful for improving efficiency, but it is actually a bit too restrictive
• Disadvantage of CNF: the resulting recognition table lacks the information we need to construct a derivation using the original grammar!

**A short retrospective of CYK (cont.)**

• In the transformation process, some non-terminals were thrown away (non-productive)
• Missing information could be added.

**A short retrospective of CYK (cont.)**

• Result: almost the same recognition table.
• Extra information on non-terminals
• Obtained in a simpler and much more efficient way.

**Thank you** for your attention!
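
As an illustration of the CNF recognition step covered in the slides, here is a minimal Python sketch that fills a recognition table indexed by start position i and length l, trying every split point of each substring. The grammar is a hypothetical toy CNF grammar for digit strings with an optional fraction; it is an assumption made for this example and is not the CNF grammar used in the presentation. The names `cyk_table` and `recognizes` are likewise invented here.

```python
# Hypothetical toy grammar in CNF (an assumption for illustration only,
# NOT the grammar from the slides): digits with an optional fraction.
DIGITS = "0123456789"

# Rules of the form A -> a
TERMINAL_RULES = {
    "Digit": set(DIGITS),
    "Integer": set(DIGITS),
    "Number": set(DIGITS),
    "Dot": {"."},
}

# Rules of the form A -> B C, stored as (A, B, C)
BINARY_RULES = [
    ("Integer", "Integer", "Digit"),
    ("Number", "Integer", "Digit"),
    ("Number", "Integer", "Fraction"),
    ("Fraction", "Dot", "Integer"),
]


def cyk_table(sentence):
    """Build the recognition table: table[(i, l)] is the set of
    non-terminals deriving the substring that starts at position i
    (1-based) and has length l."""
    n = len(sentence)
    table = {}

    # Bottom row: read directly from the rules A -> a.
    for i in range(1, n + 1):
        symbol = sentence[i - 1]
        table[(i, 1)] = {
            lhs for lhs, terminals in TERMINAL_RULES.items()
            if symbol in terminals
        }

    # Longer substrings: try every split into a left part of length k
    # and a right part of length l - k.
    for l in range(2, n + 1):
        for i in range(1, n - l + 2):
            cell = set()
            for k in range(1, l):
                left = table[(i, k)]
                right = table[(i + k, l - k)]
                for lhs, b, c in BINARY_RULES:
                    if b in left and c in right:
                        cell.add(lhs)
            table[(i, l)] = cell
    return table


def recognizes(sentence, start="Number"):
    """Return True if the start symbol derives the whole sentence."""
    if not sentence:
        return False  # CNF as used here has no epsilon-rules
    return start in cyk_table(sentence)[(1, len(sentence))]


if __name__ == "__main__":
    for text in ["32.5", "7", "3.", ".5"]:
        print(text, recognizes(text))
```

Because every rule has exactly two non-terminals on its right-hand side, each cell is computed in a single pass over the shorter substrings, with no need to iterate until no new non-terminals appear.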