1 / 32

CYK ) Cocke-Younger-Kasami) Parsing Algorithm

دانشگاه صنعتی امیر کبیر دانشکده مهندسی کامپیوتر. CYK ) Cocke-Younger-Kasami) Parsing Algorithm. سید محمد حسین معطر پردازش زبان طبیعی. Parsing Algorithms. CFGs are basis for describing (syntactic) structure of NL sentences Thus - Parsing Algorithms are core of NL analysis systems

mervin
Download Presentation

CYK ) Cocke-Younger-Kasami) Parsing Algorithm

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. دانشگاه صنعتی امیر کبیر دانشکده مهندسی کامپیوتر CYK )Cocke-Younger-Kasami)Parsing Algorithm سید محمد حسین معطر پردازش زبان طبیعی

  2. Parsing Algorithms • CFGs are basis for describing (syntactic) structure of NL sentences • Thus - Parsing Algorithms are core of NL analysis systems • Recognition vs. Parsing: • Recognition - deciding the membership in the language: • Parsing – Recognition+ producing a parse tree forit • Parsing is more “difficult” than recognition? (time complexity) • Ambiguity - an input may have exponentially manyparses

  3. Parsing Algorithms • Parsing General CFLs vs. Limited Forms • Efficiency: • Deterministic (LR) languages can be parsed in linear time • A number of parsing algorithms for general CFLs requireO(n3)time • Asymptotically best parsing algorithm for general CFLs requires O(n2.37), but is not practical • Utility - why parse general grammars and not just CNF? • Grammar intended to reflect actual structure of language • Conversion to CNF completely destroys the parse structure

  4. CYK )Cocke-Younger-Kasami) • One of the earliest recognition and parsing algorithms • The standard version of CYK can only recognize languages defined by context-free grammars in Chomsky Normal Form (CNF). • It is also possible to extend the CYK algorithm to handle some grammars which are not in CNF • Harder to understand • Based on a “dynamic programming” approach: • Build solutions compositionally from sub-solutions • Store sub-solutions and re-use them whenever necessary • Uses the grammar directly (no PDA is used) • Recognition version: decide whether S == > w ?

  5. CYK Algorithm • The CYK algorithm for the membership problem is as follows: • Let the input string be a sequence of n letters a1 ... an. • Let the grammar contain r terminal and nonterminal symbols R1 ... Rr, and let R1 be the start symbol. • Let P[n,n,r] be an array of booleans. Initialize all elements of P to false. • For each i = 1 to n • For each unit production Rj -> ai, set P[i,1,j] = true. • For each i = 2 to n -- Length of span • For each j = 1 to n-i+1 -- Start of span • For each k = 1 to i-1 -- Partition of span • For each production RA -> RB RC • If P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true • If P[1,n,1] is true • Then string is member of language • Else string is not member of language

  6. CYK Pseudocode On input x = x1x2 … xn : for (i = 1 to n) //create middle diagonal for (each var. A) if(Axi) add A to table[i-1][i] for (d = 2 to n) // d’th diagonal for (i = 0 to n-d) for (k = i+1 to i+d-1) for (each var. A) for(each var. B in table[i][k]) for(each var. C in table[k][k+d]) if(ABC) add A to table[i][k+d] return Stable[0][n] ? ACCEPT : REJECT

  7. CYK Algorithm • this algorithm considers every possible consecutive subsequence of the sequence of letters and sets P[i,j,k] to be true if the sequence of letters starting from i of length j can be generated from Rk. • Once it has considered sequences of length 1, it goes on to sequences of length 2, and so on. • For subsequences of length 2 and greater, it considers every possible partition of the subsequence into two halves, and checks to see if there is some production P -> Q R such that Q matches the first half and R matches the second half. If so, it records P as matching the whole subsequence. • Once this process is completed, the sentence is recognized by the grammar if the subsequence containing the entire string is matched by the start symbol

  8. CYK Algorithm for Deciding Context Free Languages Q: Consider the grammar G given by S  e | AB | XB T  AB | XB X  AT A  a B  b • Is x = aaabb in L(G ) • Is x = aaabbb in L(G )

  9. CYK Algorithm for Deciding Context Free Languages The algorithm is “bottom-up” in that we start with bottom of derivation tree. S  e | AB | XB T  AB | XB X  AT A  a B  b a a a b b

  10. CYK Algorithm for Deciding Context Free Languages 1) Write variables for all length 1 substrings S  e | AB | XB T  AB | XB X  AT A  a B  b a a a b b A A A B B

  11. CYK Algorithm for Deciding Context Free Languages 2) Write variables for all length 2 substrings S e | AB| XB T  AB| XB X  AT A  a B  b a a a b b A A A B B S,T T

  12. CYK Algorithm for Deciding Context Free Languages 3) Write variables for all length 3 substrings S  e | AB | XB T  AB | XB X  AT A  a B  b a a a b b A A A B B S,T T X

  13. CYK Algorithm for Deciding Context Free Languages 4) Write variables for all length 4 substrings S e | AB | XB T AB | XB X  AT A  a B  b a a a b b A A A B B S,T T X S,T

  14. CYK Algorithm for Deciding Context Free Languages • Write variables for all length 5 substrings. S  e | AB | XB T  AB | XB X  AT A  a B  b REJECT! a a a b b A A A B B S,T T X S,T X

  15. CYK Algorithm for Deciding Context Free Languages Now look at aaabbb : S  e | AB | XB T  AB | XB X  AT A  a B  b a a a b b b

  16. CYK Algorithm for Deciding Context Free Languages 1) Write variables for all length 1 substrings. S  e | AB | XB T  AB | XB X  AT A  a B  b a a a b b b A A A B B B

  17. CYK Algorithm for Deciding Context Free Languages 2) Write variables for all length 2 substrings. S e | AB| XB T  AB| XB X  AT A  a B  b a a a b b b A A A B B B S,T

  18. CYK Algorithm for Deciding Context Free Languages 3) Write variables for all length 3 substrings. S  e | AB | XB T  AB | XB X  AT A  a B  b a a a b b b A A A B B B S,T T X

  19. CYK Algorithm for Deciding Context Free Languages 4) Write variables for all length 4 substrings. S e | AB | XB T AB | XB X  AT A  a B  b a a a b b b A A A B B B S,T T X S,T

  20. CYK Algorithm for Deciding Context Free Languages 5) Write variables for all length 5 substrings. S  e | AB | XB T  AB | XB X  AT A  a B  b a a a b b b A A A B B B S,T T X S,T X

  21. CYK Algorithm for Deciding Context Free Languages 6) Write variables for all length 6 substrings. Se | AB | XB T AB | XB X  AT A  a B  b S is included so aaabbb accepted! a a a b b b A A A B B B S,T T X S,T X S,T

  22. CYK Algorithm for Deciding Context Free Languages Can also use a table for same purpose.

  23. CYK Algorithm for Deciding Context Free Languages 1. Variables for length 1 substrings.

  24. CYK Algorithm for Deciding Context Free Languages 2. Variables for length 2 substrings.

  25. CYK Algorithm for Deciding Context Free Languages 3. Variables for length 3 substrings.

  26. CYK Algorithm for Deciding Context Free Languages 4. Variables for length 4 substrings.

  27. CYK Algorithm for Deciding Context Free Languages 5. Variables for length 5 substrings.

  28. CYK Algorithm for Deciding Context Free Languages 6. Variables for aaabbb. ACCEPTED!

  29. Parsing results • We keep the results for every wijin a table. • Note that we only need to fill in entries up to the diagonal – thelongest substring starting ati is of length n-i+1

  30. Constructing parse tree • we need to construct parse trees for string w: • Idea: • Keep back-pointers to the table entries that we combine • At the end - reconstruct a parse from the back-pointers • This allows us to find all parse trees

  31. Ambiguity Efficient Representation of Ambiguities • Local Ambiguity Packing : • a Local Ambiguity - multiple ways to derive the samesubstring from a non-terminal • All possible ways to derive each non-terminalare storedtogether • When creating back-pointers, create a single back-pointerto the “packed” representation • Allows to efficiently represent a very large number ofambiguities (even exponentially many) • Unpacking - producing one or more of the packed parse treesby following the back-pointers.

  32. References • Hopcroft and Ullman,“Intro. to Automata Theory, Lang. and Comp.”Section 6.3, pp. 139-141 • “CYK algorithm ” , Wikipedia, the free encyclopedia • A representationbyZeph Grunschlag

More Related