
Parsing



  1. Parsing Chapter 15

  2. The Job of a Parser Given a context-free grammar G: • Examine a string and decide whether or not it is a syntactically well-formed member of L(G), and • If it is, assign to it a parse tree that describes its structure and thus can be used as the basis for further interpretation.

  3. Problems with Solutions So Far • We want to use a natural grammar that will produce a natural parse tree. But: • decideCFLusingGrammar requires a grammar in Chomsky normal form. • decideCFLusingPDA requires a grammar in Greibach normal form. • We want an efficient parser. But both procedures require search and take time that grows exponentially in the length of the input string. • All either procedure does is determine membership in L(G); neither produces a parse tree.

  4. Easy Issues • Actually building parse trees: Augment the parser with a function that builds a chunk of tree every time a rule is applied. • Using lookahead to reduce nondeterminism: It is often possible to reduce (or even eliminate) nondeterminism by allowing the parser to look ahead at the next one or more input symbols before it makes a decision about what to do.

  5. Dividing the Process • Lexical analysis: done in linear time with a DFSM. • Parsing: done in, at worst, O(n³) time.

  6. Lexical Analysis
  level = observation - 17.5;
  Lexical analysis produces a stream of tokens:
  id = id - id
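The token stream above can be produced by a one-pass scanner. A minimal Python sketch (the token classes and the `TOKEN_SPEC`/`tokenize` names are illustrative, not the chapter's); numbers and identifiers are all reported to the parser as `id`:

```python
import re

# Sketch of a lexical analyzer; token-class names are illustrative,
# not the chapter's. Numbers and identifiers all become "id".
TOKEN_SPEC = [
    ("FLOAT", r"\d+\.\d+"),          # try floats before plain integers
    ("INT",   r"\d+"),
    ("ID",    r"[A-Za-z][A-Za-z0-9]*"),
    ("OP",    r"[=+\-*/;]"),
    ("SKIP",  r"[ \t]+"),            # blanks and tabs: discard
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    tokens = []
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        tokens.append("id" if kind in ("ID", "INT", "FLOAT") else m.group())
    return tokens

print(tokenize("level = observation - 17.5;"))
# ['id', '=', 'id', '-', 'id', ';']
```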

  7. Specifying id with a Grammar
  id → identifier | integer | float
  identifier → letter alphanum
  alphanum → letter alphanum | digit alphanum | ε
  integer → - unsignedint | unsignedint
  unsignedint → digit | digit unsignedint
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  …

  8. Using Reg Ex’s to Specify an FSM There exist simple tools for building lexical analyzers. The first important such tool: Lex

  9. Lex Rules
  Get rid of blanks and tabs:
  [ \t]+ ;
  Find identifiers:
  [A-Za-z][A-Za-z0-9]* {return(ID);}
  Return INTEGER and save a value:
  [0-9]+ {sscanf(yytext, "%d", &yylval); return(INTEGER);}

  10. Dealing with Rule Conflicts • A longer match is preferred over a shorter one. • When lengths are equal, choose the rule listed first. • Suppose that Lex has been given the following two rules: • integer {action 1} • [a-z]+ {action 2} • Example 1: integers • Example 2: integer
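The two disambiguation rules can be sketched directly (`scan_one` is a hypothetical name; the two patterns are the slide's). On `integers`, the second rule's longer match wins; on `integer`, the lengths tie and the first rule wins:

```python
import re

# Sketch of Lex's disambiguation: take the longest match, and on a
# tie take the rule listed first.
RULES = [("KEYWORD", r"integer"), ("WORD", r"[a-z]+")]

def scan_one(text):
    best = None                       # (rule name, matched lexeme)
    for name, pattern in RULES:
        m = re.match(pattern, text)
        # strictly longer wins; an equal-length later match loses
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(scan_one("integers"))  # ('WORD', 'integers')   -- longer match wins
print(scan_one("integer"))   # ('KEYWORD', 'integer') -- tie: first rule wins
```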

  11. Parsing • Top-down parsers: • A simple but inefficient recursive descent parser. • Modifying a grammar for top-down parsing. • LL parsing. • Bottom-up parsers: • The simple but not efficient enough Cocke-Kasami-Younger (CKY) algorithm. • LR parsing. • Parsers for English and other natural languages.

  12. Top-Down, Depth-First Parsing
  S → NP VP $
  NP → the N | N | ProperNoun
  N → cat | dogs | bear | girl | chocolate | rifle
  ProperNoun → Chris | Fluffy
  VP → V | V NP
  V → like | likes | thinks | shot | smells
  Input: the cat likes chocolate $

  13.–20. Top-Down, Depth-First Parsing (continued) [Slides 13–20 repeat the grammar and input above and step through the depth-first parse one rule application at a time; the parse-tree figures are not preserved. One path fails ("Fail"), the parser backs up ("Backup to:"), and some subtrees are built, unbuilt, and built again.]

  21. Building and Discarding Subtrees
  NP → the Nominal | Nominal | ProperNoun | NP PP
  Nominal → N | Adjs N
  Adjs → Adv Adjs | Adjs and Adjs | Adj Adjs | Adj
  N → student | raincoat
  Adj → tall | self-possessed | green
  Adv → strikingly
  PP → Prep NP
  Prep → with
  the strikingly tall and self-possessed student with the green raincoat

  22. Left-Recursive Rules
  E → E + T
  E → T
  T → T * F
  T → F
  F → (E)
  F → id
  On input id + id + id, a top-down parser expands E → E + T, then expands the new leftmost E the same way, and so forth, without ever consuming an input symbol.

  23. Removing Left-Recursive Rules

  24. Modifying the Expression Grammar
  E → E + T
  E → T
  T → T * F
  T → F
  F → (E)
  F → id
  becomes
  E → T E′
  E′ → + T E′
  E′ → ε
  T → F T′
  T′ → * F T′
  T′ → ε
  F → (E)
  F → id
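The transformed grammar needs no backtracking: each rule choice is forced by one symbol of lookahead. A recursive-descent recognizer for it can be sketched in Python (class and function names are mine; input is a pre-tokenized list, with `id` standing for any identifier):

```python
# Sketch of a recursive-descent recognizer for the transformed
# (right-recursive) expression grammar; one procedure per nonterminal.
class Parser:
    def __init__(self, tokens):
        self.toks = tokens + ["$"]   # end-of-input marker
        self.pos = 0

    def peek(self):
        return self.toks[self.pos]

    def eat(self, t):
        assert self.peek() == t, f"expected {t}, saw {self.peek()}"
        self.pos += 1

    def E(self):           # E  -> T E'
        self.T(); self.Ep()
    def Ep(self):          # E' -> + T E' | epsilon
        if self.peek() == "+":
            self.eat("+"); self.T(); self.Ep()
    def T(self):           # T  -> F T'
        self.F(); self.Tp()
    def Tp(self):          # T' -> * F T' | epsilon
        if self.peek() == "*":
            self.eat("*"); self.F(); self.Tp()
    def F(self):           # F  -> ( E ) | id
        if self.peek() == "(":
            self.eat("("); self.E(); self.eat(")")
        else:
            self.eat("id")

def parses(tokens):
    try:
        p = Parser(tokens); p.E(); p.eat("$")
        return True
    except AssertionError:
        return False

print(parses(["id", "+", "id", "*", "id"]))  # True
print(parses(["id", "+", "*", "id"]))        # False
```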

  25. Indirect Left Recursion
  S → Y a
  Y → S a
  Y → ε
  This form too can be eliminated.

  26. But There is a Price

  27. Using Lookahead and Left Factoring Goal: Procrastinate branching as long as possible. To do that, we will: • Change the parsing algorithm so that it exploits the ability to look one symbol ahead in the input before it makes a decision about what to do next, and • Change the grammar to help the parser procrastinate decisions.

  28. Exploiting Lookahead
  (1) F → (E)
  (2) F → id
  (3) F → id (E)
  Looking ahead one character makes it possible to choose between rule (1) and rules (2)/(3). But how is it possible to choose between (2) and (3)?

  29. Left Factoring
  (1) F → (E)
  (2) F → id
  (3) F → id (E)
  becomes
  (1) F → (E)
  (1.5) F → id X
  (2) X → ε
  (3) X → (E)
  More generally:
  A → αβ1 | αβ2 | … | αβn
  becomes
  A → α A′
  A′ → β1 | β2 | … | βn
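The general schema can be sketched as a small routine that groups a nonterminal's alternatives by their first symbol and pulls each group's longest common prefix out into a fresh primed nonterminal (the function name and list-of-lists representation are mine; "ε" marks an alternative whose tail becomes empty):

```python
from collections import defaultdict

# Sketch of left factoring: for each set of alternatives that share a
# first symbol, factor out the longest common prefix and move the
# distinct tails to a new primed nonterminal.
def left_factor(lhs, alts):
    groups = defaultdict(list)
    for alt in alts:
        groups[alt[0] if alt else ""].append(alt)
    new_alts, new_rules = [], {}
    for group in groups.values():
        if len(group) == 1:
            new_alts.append(group[0])
            continue
        p = 0                                   # longest common prefix length
        while all(len(a) > p and a[p] == group[0][p] for a in group):
            p += 1
        prime = lhs + "'" * (len(new_rules) + 1)  # fresh name per group
        new_alts.append(group[0][:p] + [prime])
        new_rules[prime] = [a[p:] or ["ε"] for a in group]
    return new_alts, new_rules

# F -> (E) | id | id (E)   becomes   F -> (E) | id F',  F' -> ε | (E)
print(left_factor("F", [["(", "E", ")"], ["id"], ["id", "(", "E", ")"]]))
```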

  30. Predictive Parsing • It will be possible to build a predictive top-down parser for a grammar G iff: • Every string that is generated by G has a unique left-most derivation, and • It is possible to determine each step in that derivation by looking ahead some fixed number k of characters. • In this case, we say that G is LL(k).

  31. LL(k) Grammars • An LL(k) grammar allows a predictive parser: • that scans its input Left to right • to build a Left-most derivation • if it is allowed k lookahead symbols. • Every LL(k) grammar is unambiguous (because every string it generates has a unique left-most derivation). • But not every unambiguous grammar is LL(k).

  32. Two Important Functions • first(α) is the set of terminal symbols that can occur as the first symbol in any string derived from α using the rules of G. If α ⇒* ε, then ε ∈ first(α). • follow(A) is the set of all terminal symbols that can immediately follow whatever A produces in some string in L(G).

  33. Computing First and Follow
  S → A X B $
  A → a A | ε
  X → c | ε
  B → b B | ε
  first(S) = {a, c, b, $}. first(A) = {a, ε}. first(AX) = {a, c, ε}. first(AXB) = {a, c, b, ε}.
  follow(S) = ∅. follow(A) = {c, b, $}. follow(X) = {b, $}. follow(B) = {$}.
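These sets can be computed by the usual fixpoint iteration. A sketch for the slide's grammar (the representation is mine; `EPS` stands for ε, and `$` is treated as an ordinary terminal, as on the slide):

```python
# Sketch of first()/follow() computation for the slide's grammar.
EPS = ""                      # stands for the empty string ε
grammar = {
    "S": [["A", "X", "B", "$"]],
    "A": [["a", "A"], [EPS]],
    "X": [["c"], [EPS]],
    "B": [["b", "B"], [EPS]],
}
nonterms = set(grammar)

def first_of(seq, first):
    """first() of a sequence of symbols, given first() of each nonterminal."""
    out = set()
    for sym in seq:
        if sym == EPS:
            out.add(EPS)
            return out
        f = first[sym] if sym in nonterms else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)              # every symbol could vanish
    return out

# Iterate until the first() sets stop growing.
first = {nt: set() for nt in nonterms}
changed = True
while changed:
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            f = first_of(prod, first)
            if not f <= first[nt]:
                first[nt] |= f; changed = True

# Iterate until the follow() sets stop growing.
follow = {nt: set() for nt in nonterms}
changed = True
while changed:
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            for i, sym in enumerate(prod):
                if sym in nonterms:
                    tail = first_of(prod[i + 1:], first)
                    add = (tail - {EPS}) | (follow[nt] if EPS in tail else set())
                    if not add <= follow[sym]:
                        follow[sym] |= add; changed = True

print(first["S"], first["A"], follow["A"], follow["X"], follow["B"])
```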

  34. When is a Grammar LL(1)? Whenever G contains two competing rules A → α and A → β, all of the following are true:
  • No terminal symbol is an element of both first(α) and first(β).
  • ε cannot be derived from both α and β.
  • If ε can be derived from one of α or β, assume it is β. Then there may be two competing derivations:
  S ⇒* γ1 A γ2 ⇒ γ1 α γ2 ⇒* …
  S ⇒* γ1 A γ2 ⇒ γ1 β γ2 ⇒* γ1 γ2
  So there must be no terminal symbol that is an element of both follow(A) and first(α).

  35. Not Every CF Language is LL(k) • No inherently ambiguous language is LL(k). • Some others aren't either: • {aⁿbⁿcᵐd : n, m ≥ 0} ∪ {aⁿbᵐcᵐe : n, m ≥ 0} • {aⁿbⁿ : n ≥ 0} ∪ {aⁿcⁿ : n ≥ 0} (deterministic CF)

  36. Recursive Descent Parsing
  A → B A | a
  B → b B | b
  A(n: parse tree node labeled A) =
    case lookahead = b: /* Use A → B A. */
      Invoke B on a new daughter node labeled B.
      Invoke A on a new daughter node labeled A.
    lookahead = a: /* Use A → a. */
      Create a new daughter node labeled a.

  37. Table-Driven LL(1) Parsing
  S → A B $ | A C $
  A → a A | a
  B → b B | b
  C → c

  38. Bottom-Up Parsing • Cocke-Kasami-Younger (CKY) • Shift-reduce parsing • LR(1) parsing

  39. CKY • Bottom-up • Chart parser • Dynamic programming • Grammar in Chomsky normal form
  [Chart figure not preserved: rows 1 (bottom) through 5 over the input id + id * id.]

  40. Exploiting Chomsky Normal Form • All rules have one of the following two forms: • X → a, where a ∈ Σ, or • X → BC, where B and C are elements of V − Σ. • So we need two techniques for filling in T: • To fill in row 1, use rules of the form X → a. • To fill in rows 2 through n, use rules of the form X → BC.

  41. The CKY Algorithm
  /* Fill in the first (bottom-most) row of T. */
  For j = 1 to n do:
    If G contains the rule X → a_j, then add X to T[1, j].
  /* Fill in the remaining rows, starting with row 2. */
  For i = 2 to n do:
    For j = 1 to n − i + 1 do:
      For k = 1 to i − 1 do:
        For each rule X → YZ do:
          If Y ∈ T[k, j] and Z ∈ T[i − k, j + k], then:
            Insert X into T[i, j].
  If S_G ∈ T[n, 1] then accept else reject.

  42. A CKY Example
  Consider parsing the string aab with the grammar:
  S → A B
  A → A A
  A → a
  B → a
  B → b
  CKY begins by filling in the bottom row of T as follows:
  Row 1: T[1,1] = {A, B}, T[1,2] = {A, B}, T[1,3] = {B}, for the input string a a b.
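The whole run, not just the bottom row, can be reproduced with a short sketch of the algorithm (the dictionary-based chart representation is mine; indices follow the slide's T[length, start] convention, with row 1 at the bottom):

```python
from collections import defaultdict

# Sketch of CKY on the example grammar:
#   S -> A B,  A -> A A,  A -> a,  B -> a,  B -> b
unit   = {"a": {"A", "B"}, "b": {"B"}}           # X -> a rules
binary = {("A", "A"): {"A"}, ("A", "B"): {"S"}}  # X -> B C rules

def cky(w, start="S"):
    n = len(w)
    T = defaultdict(set)     # T[i, j]: nonterminals deriving w[j .. j+i-1]
    for j in range(1, n + 1):                 # row 1 (bottom)
        T[1, j] = set(unit.get(w[j - 1], ()))
    for i in range(2, n + 1):                 # rows 2 .. n
        for j in range(1, n - i + 2):
            for k in range(1, i):             # split into lengths k, i-k
                for Y in T[k, j]:
                    for Z in T[i - k, j + k]:
                        T[i, j] |= binary.get((Y, Z), set())
    return start in T[n, 1]

print(cky("aab"))   # True
print(cky("ba"))    # False
```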

  43. A CKY Example (continued) [This slide repeats the grammar and fills in the upper rows of the chart; the chart figure is not preserved.]

  44. The Complexity of CKY
  Filling in row 1 takes O(n) steps. The loop over i runs n − 1 times; for each i, the loop over j runs at most n − i + 1 times and the loop over k runs i − 1 times (each on the order of n/2); for each (i, j, k), the loop over the rules is O(|G|) and each insertion is O(1). Total: O(n³) for a fixed grammar.

  45. Context-Free Parsing and Matrix Multiplication • CF parsing can be described as Boolean matrix multiplication. • Strassen's algorithm: O(n^2.807). • Coppersmith–Winograd algorithm: O(n^2.376). • Boolean matrix multiplication can be described as CF parsing. • If P is an O(g n^(3−ε)) CF parser, then P can be efficiently converted into an O(n^(3−ε/3)) matrix multiplier.

  46. Shift-Reduce Parsing A bottom-up left-to-right parser that can do two things: • Shift an input symbol onto the parser’s stack and build, in the parse tree, a terminal node labeled with that input symbol. • Reduce a string of symbols from the top of the stack to a nonterminal symbol, using one of the rules of the grammar. Each time it does this, it also builds the corresponding piece of the parse tree.

  47. A Shift-Reduce Example
  Parse: id + id * id
  Using:
  (1) E → E + T
  (2) E → T
  (3) T → T * F
  (4) T → F
  (5) F → (E)
  (6) F → id

  48.–50. A Shift-Reduce Example (continued) [Slides 48–50 repeat the grammar and step through the shift-reduce parse of id + id * id; the stack and parse-tree figures are not preserved.]
