
Parsing

Discrete Mathematics and Its Applications. Baojian Hua, bjhua@ustc.edu.cn


Presentation Transcript


  1. Parsing Discrete Mathematics and Its Applications Baojian Hua bjhua@ustc.edu.cn

  2. Syntax Tree • One systematic way to put a program into memory by hand: • a data type definition + a bunch of functions that build tree nodes • the programmer calls them explicitly • tedious and error-prone • But we write programs as ASCII text, so how can we construct the tree automatically? • A technique called (automatic) parsing • A program clever enough to do this automatically
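The sketch below makes "data type definition + a bunch of functions" concrete by building one tree by hand in C. The expression type, the constructors `exp_num` and `exp_sub`, and the helpers `exp_eval` and `build_example` are all hypothetical names chosen for illustration. The point to notice is that every node exists only because the programmer wrote an explicit call for it.

```c
#include <stdlib.h>

/* Hypothetical abstract syntax: integer literals and binary minus. */
enum exp_kind { EXP_NUM, EXP_SUB };

struct exp {
    enum exp_kind kind;
    int num;                   /* valid when kind == EXP_NUM */
    struct exp *left, *right;  /* valid when kind == EXP_SUB */
};

/* One constructor function per kind of node. */
struct exp *exp_num(int n) {
    struct exp *e = malloc(sizeof *e);
    if (e == NULL) abort();
    e->kind = EXP_NUM;
    e->num = n;
    e->left = e->right = NULL;
    return e;
}

struct exp *exp_sub(struct exp *l, struct exp *r) {
    struct exp *e = malloc(sizeof *e);
    if (e == NULL) abort();
    e->kind = EXP_SUB;
    e->num = 0;
    e->left = l;
    e->right = r;
    return e;
}

/* Evaluate a tree by walking it recursively. */
int exp_eval(const struct exp *e) {
    if (e->kind == EXP_NUM) return e->num;
    return exp_eval(e->left) - exp_eval(e->right);
}

void exp_free(struct exp *e) {
    if (e->kind == EXP_SUB) {
        exp_free(e->left);
        exp_free(e->right);
    }
    free(e);
}

/* "15 - 3 - 4", grouped as (15 - 3) - 4, built entirely by explicit calls. */
struct exp *build_example(void) {
    return exp_sub(exp_sub(exp_num(15), exp_num(3)), exp_num(4));
}
```

Because the tree groups the input as (15 - 3) - 4, `exp_eval(build_example())` gives 8. A parser exists to produce such calls automatically from the program text.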

  3. Roadmap: stream of characters → Lexer → stream of tokens → Parser → abstract syntax → other part • Lexer: eats an ASCII sequence, emits a token sequence • Parser: eats a token sequence, emits abstract syntax trees • other part: later in this course
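The lexer stage can be sketched in C as follows. This assumes a tiny token set for the assignment language used later in these slides: identifiers, integer literals, `=`, `+`, `-`, parentheses and `;`. The names `enum kind`, `struct token` and `next_token` are hypothetical and are not the course's token module.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token kinds for inputs such as "x = 3+4; y = x-(1+2);" */
enum kind { T_ID, T_NUM, T_ASSIGN, T_PLUS, T_MINUS, T_LPAREN, T_RPAREN, T_SEMI, T_EOF, T_ERROR };

struct token {
    enum kind kind;
    int value;           /* numeric value when kind == T_NUM */
    const char *start;   /* first character of the lexeme in the source */
    int len;             /* length of the lexeme */
};

/* Scan one token starting at *src and advance *src past it. */
struct token next_token(const char **src) {
    const char *p = *src;
    while (isspace((unsigned char)*p)) p++;           /* skip blanks */
    struct token tok = { T_ERROR, 0, p, 1 };          /* unknown char: 1-char error token */
    if (*p == '\0') {
        tok.kind = T_EOF;
        tok.len = 0;
    } else if (isdigit((unsigned char)*p)) {
        char *end;
        tok.kind = T_NUM;
        tok.value = (int)strtol(p, &end, 10);
        tok.len = (int)(end - p);
    } else if (isalpha((unsigned char)*p)) {
        const char *q = p;
        while (isalnum((unsigned char)*q)) q++;
        tok.kind = T_ID;
        tok.len = (int)(q - p);
    } else {
        switch (*p) {
        case '=': tok.kind = T_ASSIGN; break;
        case '+': tok.kind = T_PLUS;   break;
        case '-': tok.kind = T_MINUS;  break;
        case '(': tok.kind = T_LPAREN; break;
        case ')': tok.kind = T_RPAREN; break;
        case ';': tok.kind = T_SEMI;   break;
        default:  break;                              /* stays T_ERROR */
        }
    }
    *src = p + tok.len;
    return tok;
}
```

Calling `next_token` repeatedly on "x = 3+4;" should yield ID, ASSIGN, NUM(3), PLUS, NUM(4), SEMI and then EOF. That token stream is what the parser eats.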

  4. Parsing • Take as input a sequence of terminals, and construct syntax trees automatically • Problem: how do we know whether a sequence of input tokens is valid?

  5. Example: Am I valid? Input: y m z
       S ::= x A | y B
       A ::= u C | v C
       B ::= t C | m C
       C ::= w | z
     Nonterminals: S A B C. Terminals: ID_X ID_Y ID_U ID_V ID_T ID_M ID_W ID_Z
     First try: S -> x A. Oops, mismatch!

  6. Example: Am I valid? Input: y m z (same grammar)
     Another try: S -> y B

  7. Example: Am I valid? Input: y m z
     S -> y B. Aha, great! The y matches.

  8. Example: Derive "m z" from B. Input: y m z
     S -> y B. Recursion: now expand B.

  9. Example: Derive "m z"
     First try: S -> y B, B -> t C

  10. Example: Derive "m z"
     S -> y B, B -> t C. Mismatch!

  11. Example: Derive "m z"
     Second try: S -> y B, B -> m C

  12. Example: Derive "z" from C
     S -> y B, B -> m C. Recursion: now expand C.

  13. Example: Derive "z"
     S -> y B, B -> m C, C -> w. Mismatch!

  14. Example: Derive "z"
     Second try: S -> y B, B -> m C, C -> z

  15. Example: Derive "z"
     S -> y B, B -> m C, C -> z. Matched. Success!
     Parse tree: S has children y and B; B has children m and C; C has child z.
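The trial-and-error search in slides 5 to 15 can be sketched as a backtracking recognizer. This is a minimal C illustration under a few assumptions: symbols are single characters (upper case for nonterminals, lower case for terminals), the input is written without blanks, and `derives` is a hypothetical name. The recognizer expands the leftmost nonterminal, tries its alternatives in order, and moves on to the next alternative after a mismatch, just as the slides do by hand.

```c
#include <ctype.h>
#include <stdio.h>

/* Productions of the example grammar; upper case = nonterminal. */
struct production { char lhs; const char *rhs; };

static const struct production grammar[] = {
    {'S', "xA"}, {'S', "yB"},
    {'A', "uC"}, {'A', "vC"},
    {'B', "tC"}, {'B', "mC"},
    {'C', "w"},  {'C', "z"},
};
#define NPRODS (sizeof grammar / sizeof grammar[0])

/* Returns 1 if the sentential form `form` can derive exactly `input`. */
int derives(const char *form, const char *input) {
    if (*form == '\0')
        return *input == '\0';                 /* both used up: match */
    if (!isupper((unsigned char)*form))        /* terminal: must match the input */
        return *form == *input && derives(form + 1, input + 1);
    for (size_t i = 0; i < NPRODS; i++) {      /* nonterminal: try each alternative */
        if (grammar[i].lhs != *form)
            continue;
        char next[64];                         /* ample for this small grammar */
        snprintf(next, sizeof next, "%s%s", grammar[i].rhs, form + 1);
        if (derives(next, input))
            return 1;                          /* this alternative worked */
        /* mismatch somewhere below: backtrack and try the next alternative */
    }
    return 0;                                  /* no alternative worked */
}
```

For example, `derives("S", "ymz")` returns 1 and `derives("S", "ymy")` returns 0. Naive backtracking like this can take exponential time in general. It also loops forever on a left-recursive rule such as X -> X a, because expanding X never consumes input.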

  16. Recursive Descent Algorithm • This process can be described by a recursive descent algorithm • For each nonterminal, write a (recursive) parsing function • every RHS becomes a case in a big switch • a function may also perform semantic actions besides parsing (later in this course)

  17. Recursive Descent Algorithm
       // The function interface looks like:
       void parseS (void);
       void parseA (void);
       void parseB (void);
       void parseC (void);
     Grammar:
       S ::= x A | y B
       A ::= u C | v C
       B ::= t C | m C
       C ::= w | z

  18. Recursive Descent Algorithm
       enum token t;   // recall the token module (token kinds assumed to be an enum, so switch (t) is legal)
       // and the lexer module: before calling parseS (), prime the lookahead with t = getToken ();
       void parseS (void) {
         switch (t) {
         case ID_X:
           t = getToken ();
           parseA ();
           return;
         case ID_Y:
           t = getToken ();
           parseB ();
           return;

  19. Recursive Descent Algorithm (continued)
         default:   // not ID_X or ID_Y
           error ("want 'x' or 'y'");
           return;
         }
       }
       // Leave the algorithm for parseA (), parseB () and parseC () to you.
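One way to fill in parseA (), parseB () and parseC () is sketched below as a self-contained C fragment. It makes three assumptions. First, token kinds are an enum, so `switch (t)` is valid C. Second, the course's lexer module is replaced by a hypothetical `getToken` that reads one letter per token from a string. Third, errors set a flag instead of stopping the parse. `parse` is a hypothetical driver.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Token kinds, one per terminal (assumed enum), plus end-of-input and bad input. */
enum token { ID_X, ID_Y, ID_U, ID_V, ID_T, ID_M, ID_W, ID_Z, TOK_EOF, TOK_BAD };

static const char *src;   /* hypothetical lexer state: remaining input text */
static enum token t;      /* the lookahead token */
static bool ok;           /* cleared by the first syntax error */

/* Hypothetical stand-in for the lexer module: one letter per token, blanks skipped. */
static enum token getToken(void) {
    static const char letters[] = "xyuvtmwz";   /* same order as the enum */
    while (*src == ' ') src++;
    if (*src == '\0') return TOK_EOF;
    const char *hit = strchr(letters, *src++);
    return hit ? (enum token)(hit - letters) : TOK_BAD;
}

static void error(const char *msg) {
    if (ok) fprintf(stderr, "syntax error: %s\n", msg);
    ok = false;
}

static void parseC(void) {               /* C ::= w | z */
    switch (t) {
    case ID_W: case ID_Z: t = getToken(); return;
    default: error("want 'w' or 'z'"); return;
    }
}

static void parseA(void) {               /* A ::= u C | v C: both continue with C */
    switch (t) {
    case ID_U: case ID_V: t = getToken(); parseC(); return;
    default: error("want 'u' or 'v'"); return;
    }
}

static void parseB(void) {               /* B ::= t C | m C: both continue with C */
    switch (t) {
    case ID_T: case ID_M: t = getToken(); parseC(); return;
    default: error("want 't' or 'm'"); return;
    }
}

static void parseS(void) {               /* S ::= x A | y B */
    switch (t) {
    case ID_X: t = getToken(); parseA(); return;
    case ID_Y: t = getToken(); parseB(); return;
    default: error("want 'x' or 'y'"); return;
    }
}

/* Hypothetical driver: true iff `text` is a sentence of S. */
bool parse(const char *text) {
    src = text;
    ok = true;
    t = getToken();                      /* prime the lookahead */
    parseS();
    if (t != TOK_EOF) error("extra input after S");
    return ok;
}
```

With this sketch, `parse("y m z")` succeeds. `parse("y m y")` reports "want 'w' or 'z'" and fails. Unlike the backtracking search above, each function commits to a case by looking at one token only.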

  20. Summary so Far • Recursive descent parsing: • also called predictive parsing, or top-down parsing • simple and efficient • can be coded by hand quickly • see problem 2 in lab #4 • But not every formal grammar can be parsed by a recursive descent parser • Example below

  21. Example: Derive me: x m z
       S ::= x A | x B
       A ::= m C | m C
       B ::= m z | m z
       C ::= w | y
     S -> x A or S -> x B?

  22. Recursive Descent Algorithm?
       enum token t;   // recall the token module
       // and the lexer module: before calling parseS (), prime the lookahead with t = getToken ();
       void parseS (void) {
         switch (t) {
         case ID_X:
           t = getToken ();
           ??????   // what code here?
           return;
         default:
           error ("....");
           return;
         }
       }
     Grammar:
       S ::= x A | x B
       A ::= m C | m C
       B ::= m z | m z
       C ::= w | y

  23. Another Example: Derive me:
       x = 3+4; y = x-(1+2);
     Grammar:
       stm -> id = exp ; | id = exp ; stm
       exp -> exp + exp | exp - exp | num | id | ( exp )
     stm -> id = exp ; or stm -> id = exp ; stm?

  24. Moral • We need a notion of which terminals a production's RHS s could start with, where s \in (T \/ N)* is any string of terminals and nonterminals • We call it the first (terminal) set of s, written F[s] • Next, we first compute the first set F for each terminal and nonterminal, then generalize it to strings

  25. First Set F
       S ::= x A | y B
       A ::= u C | v C
       B ::= t C | m C
       C ::= w | z
       F[S] = {x, y}   F[A] = {u, v}   F[B] = {t, m}   F[C] = {w, z}
       // and generalize to strings:
       F[x A] = {x}   F[y B] = {y}   F[u C] = {u}   F[v C] = {v}   …

  26. First Set F (same grammar and first sets as slide 25)
       // Then why can this grammar be parsed predictively?

  27. Parsing Table

  28. Example
       S ::= x A | x B
       A ::= m C | m C
       B ::= m z | m z
       C ::= w | y
       F[S] = ?   F[A] = ?   F[B] = ?   F[C] = ?
     Predictive parsing?

  29. Another Example
       stm -> id = exp ; | id = exp ; stm
       exp -> exp + exp | exp - exp | num | id | ( exp )
       F[stm] = ?   F[exp] = ?
     Predictive parsing?

  30. Empty Production Rules
       Z ::= d | X Y Z
       Y ::= c | \eps
       X ::= Y | a
       F[Z] = ?   F[Y] = ?   F[X] = ?
     Predictive parsing?

  31. Algorithm #1: nullable
       // nullable[X]: whether X derives \eps or not
       // all initialized to false
       repeat
         for each production rule X -> Y1 Y2 … Yn
           if (n = 0) or (nullable[Y1] = … = nullable[Yn] = true)
             nullable[X] = true
       until nullable[] did not change
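Here is a minimal C sketch of Algorithm #1 for the grammar of slide 30. It uses an assumed encoding: each symbol is a single character (upper case for nonterminals, lower case for terminals), and an empty right-hand side "" stands for \eps. `compute_nullable` is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* Upper case = nonterminal, lower case = terminal; "" stands for \eps. */
struct production { char lhs; const char *rhs; };

/* Z ::= d | X Y Z    Y ::= c | \eps    X ::= Y | a */
static const struct production prods[] = {
    {'Z', "d"}, {'Z', "XYZ"}, {'Y', "c"}, {'Y', ""}, {'X', "Y"}, {'X', "a"},
};
#define NPRODS (sizeof prods / sizeof prods[0])

bool nullable[26];                          /* nullable[X - 'A'] */

static bool sym_nullable(char s) {          /* terminals are never nullable */
    return s >= 'A' && s <= 'Z' && nullable[s - 'A'];
}

void compute_nullable(void) {
    memset(nullable, 0, sizeof nullable);   /* all initialized to false */
    bool changed;
    do {
        changed = false;
        for (size_t p = 0; p < NPRODS; p++) {
            bool all = true;                /* vacuously true when n = 0 */
            for (const char *s = prods[p].rhs; *s != '\0'; s++)
                if (!sym_nullable(*s)) { all = false; break; }
            int x = prods[p].lhs - 'A';
            if (all && !nullable[x]) { nullable[x] = true; changed = true; }
        }
    } while (changed);                      /* until nullable[] did not change */
}
```

After `compute_nullable()`, Y and X are nullable and Z is not. This agrees with the worked example on slide 32.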

  32. Example
       Z ::= d | X Y Z
       Y ::= c | \eps
       X ::= Y | a
       // initialization: nullable[Z] = nullable[Y] = nullable[X] = false
       // round 1: nullable[Z] = false   nullable[Y] = true   nullable[X] = true
       // round 2: nullable[Z] = false   nullable[Y] = true   nullable[X] = true
       // finished!

  33. Algorithm #2: First Set
       // F[X]: first terminals X could derive
       // F[t] = {t} for every terminal t; F[X] for every nonterminal X initialized to empty
       repeat
         for each production rule X -> Y1 Y2 … Yn
           for (i = 1 to n)
             if (i = 1) or (nullable[Y1] = … = nullable[Y_{i-1}] = true)
               F[X] = F[X] \/ F[Yi]
       until F[] did not change
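Algorithm #2 can be sketched in C with the same assumed single-character encoding. Terminal sets are stored as bitmasks, where bit (t - 'a') is set when terminal t is in the set. To keep the block short, the nullable[] result from slide 32 is hard-coded rather than recomputed. `compute_first` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

struct production { char lhs; const char *rhs; };   /* "" stands for \eps */

/* Z ::= d | X Y Z    Y ::= c | \eps    X ::= Y | a */
static const struct production prods[] = {
    {'Z', "d"}, {'Z', "XYZ"}, {'Y', "c"}, {'Y', ""}, {'X', "Y"}, {'X', "a"},
};
#define NPRODS (sizeof prods / sizeof prods[0])

/* nullable[] from slide 32, hard-coded: only X and Y derive \eps. */
static bool is_nullable(char s) { return s == 'X' || s == 'Y'; }

typedef unsigned int termset;        /* bit (t - 'a') set iff terminal t is in the set */
termset F[26];                       /* F[X - 'A'] for each nonterminal X */

static termset first_of_symbol(char s) {
    if (s >= 'a' && s <= 'z') return 1u << (s - 'a');   /* F[t] = {t} */
    return F[s - 'A'];
}

void compute_first(void) {
    for (int i = 0; i < 26; i++) F[i] = 0;               /* all initialized to empty */
    bool changed;
    do {
        changed = false;
        for (size_t p = 0; p < NPRODS; p++) {
            int x = prods[p].lhs - 'A';
            termset acc = F[x];
            /* add F[Y1]; keep going to Yi only while Y1 ... Y_{i-1} are nullable */
            for (const char *s = prods[p].rhs; *s != '\0'; s++) {
                acc |= first_of_symbol(*s);
                if (!is_nullable(*s)) break;
            }
            if (acc != F[x]) { F[x] = acc; changed = true; }
        }
    } while (changed);                                   /* until F[] did not change */
}
```

The fixed point gives F[Z] = {d, c, a}, F[Y] = {c} and F[X] = {c, a}, which matches slide 34.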

  34. Example
       Z ::= d | X Y Z
       Y ::= c | \eps
       X ::= Y | a
       // initialization: F[Z] = F[Y] = F[X] = {}
       // round 1: F[Z] = {d}         F[Y] = {c}   F[X] = {c, a}
       // round 2: F[Z] = {d, c, a}   F[Y] = {c}   F[X] = {c, a}
       // round 3 …

  35. Pitfalls
       S ::= x A B
       A ::= y | \eps
       B ::= y | \eps
     Try to calculate nullable[] and F[]. Try to derive "x y". What's the problem?

  36. Algorithm #3: Follow Set
       // W[X]: terminals that may follow X
       // all initialized to empty, except W[start symbol] = {EOF}
       repeat
         for each production rule X -> Y1 Y2 … Yn
           for (i = 1 to n)
             if (i = n) or (nullable[Y_{i+1}] = … = nullable[Yn] = true)
               W[Yi] = W[Yi] \/ W[X]
             for (j = i+1 to n)
               if (j = i+1) or (nullable[Y_{i+1}] = … = nullable[Y_{j-1}] = true)
                 W[Yi] = W[Yi] \/ F[Yj]
       until W[] did not change
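Algorithm #3 can be sketched in C with the same assumed encoding, plus one extra bit that stands for EOF. To keep the block self-contained, the nullable[] and F[] results from slide 38 are hard-coded. The inner scan adds F[Yj] while the symbols between Yi and Yj are nullable. If it runs off the end of the right-hand side, W[X] is added as well. `compute_follow` and the macro `T` are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

struct production { char lhs; const char *rhs; };   /* "" stands for \eps */

/* Z ::= d | X Y Z    Y ::= c | \eps    X ::= Y | a */
static const struct production prods[] = {
    {'Z', "d"}, {'Z', "XYZ"}, {'Y', "c"}, {'Y', ""}, {'X', "Y"}, {'X', "a"},
};
#define NPRODS (sizeof prods / sizeof prods[0])

typedef unsigned int termset;        /* bit (t - 'a') for terminal t */
#define EOF_BIT (1u << 26)           /* extra bit standing for EOF */
#define T(c) (1u << ((c) - 'a'))

/* nullable[] and F[] from slide 38, hard-coded. */
static bool is_nullable(char s) { return s == 'X' || s == 'Y'; }
static termset first_of(char s) {
    switch (s) {
    case 'Z': return T('d') | T('c') | T('a');
    case 'Y': return T('c');
    case 'X': return T('c') | T('a');
    default:  return T(s);           /* terminal t: F[t] = {t} */
    }
}

termset W[26];                       /* W[X - 'A'] for each nonterminal X */

void compute_follow(char start) {
    memset(W, 0, sizeof W);
    W[start - 'A'] = EOF_BIT;        /* the start symbol is followed by EOF */
    bool changed;
    do {
        changed = false;
        for (size_t p = 0; p < NPRODS; p++) {
            const char *y = prods[p].rhs;
            int n = (int)strlen(y);
            for (int i = 0; i < n; i++) {
                if (y[i] < 'A' || y[i] > 'Z') continue;   /* only nonterminals */
                termset acc = W[y[i] - 'A'];
                int j = i + 1;
                /* add F[Yj] while Y_{i+1} ... Y_{j-1} are all nullable */
                for (; j < n; j++) {
                    acc |= first_of(y[j]);
                    if (!is_nullable(y[j])) break;
                }
                if (j == n)          /* Y_{i+1} ... Yn all nullable (or Yi is last) */
                    acc |= W[prods[p].lhs - 'A'];
                if (acc != W[y[i] - 'A']) { W[y[i] - 'A'] = acc; changed = true; }
            }
        }
    } while (changed);               /* until W[] did not change */
}
```

Calling `compute_follow('Z')` should give W[Z] = {EOF}, W[Y] = {d, c, a} and W[X] = {d, c, a}, as on slide 37.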

  37. Example
       Z ::= d | X Y Z
       Y ::= c | \eps
       X ::= Y | a
       // initialization: W[Z] = {EOF} (Z is the start symbol), W[Y] = W[X] = {}
       // round 1: W[Z] = {EOF}   W[Y] = {d, c, a}   W[X] = {d, c, a}
       // round 2: W[Z] = {EOF}   W[Y] = {d, c, a}   W[X] = {d, c, a}
       // finished!

  38. First and Follow Together
       Z ::= d | X Y Z
       Y ::= c | \eps
       X ::= Y | a
       // first:  F[Z] = {d, c, a}   F[Y] = {c}   F[X] = {c, a}
       // follow: W[Z] = {EOF}   W[Y] = {d, c, a}   W[X] = {d, c, a}
       // combined:
       All[X] = F[X],           if nullable[X] = false
       All[X] = F[X] \/ W[X],   otherwise

  39. Parsing Table

  40. LL(1) • Grammars whose predictive parsing tables contain no duplicate entries are called LL(1) • left-to-right scan, leftmost derivation, 1-symbol lookahead • one pass, no backtracking, very efficient • precise error reports • Some non-LL(1) grammars can be transformed into LL(1) ones by standard methods • examples below
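The next sketch builds the predictive parsing table for the slide 38 grammar and counts duplicate entries. It assumes the standard construction rule. Production X -> γ goes in row X under every terminal in F[γ]. If γ is nullable, it also goes under every terminal in W[X]. The nullable[], F[] and W[] sets are hard-coded from slide 38. Each cell is stored as a bitmask of production numbers. `build_table` and `count_conflicts` are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

struct production { char lhs; const char *rhs; };   /* "" stands for \eps */

/* Production numbers: 0 Z->d  1 Z->XYZ  2 Y->c  3 Y->\eps  4 X->Y  5 X->a */
static const struct production prods[] = {
    {'Z', "d"}, {'Z', "XYZ"}, {'Y', "c"}, {'Y', ""}, {'X', "Y"}, {'X', "a"},
};
#define NPRODS (sizeof prods / sizeof prods[0])

typedef unsigned int termset;        /* bit (t - 'a') for terminal t, bit 26 for EOF */
#define EOF_BIT (1u << 26)
#define T(c) (1u << ((c) - 'a'))

/* nullable[], F[] and W[] from slide 38, hard-coded. */
static bool is_nullable(char s) { return s == 'X' || s == 'Y'; }
static termset first_of(char s) {
    switch (s) {
    case 'Z': return T('d') | T('c') | T('a');
    case 'Y': return T('c');
    case 'X': return T('c') | T('a');
    default:  return T(s);
    }
}
static termset follow_of(char x) {
    return x == 'Z' ? EOF_BIT : T('d') | T('c') | T('a');
}

/* table[X - 'A'][t]: bit p is set iff production p is in that cell */
unsigned table[26][27];

void build_table(void) {
    memset(table, 0, sizeof table);
    for (size_t p = 0; p < NPRODS; p++) {
        termset la = 0;              /* F[rhs], plus W[lhs] if rhs is nullable */
        bool rhs_nullable = true;
        for (const char *s = prods[p].rhs; *s != '\0'; s++) {
            la |= first_of(*s);
            if (!is_nullable(*s)) { rhs_nullable = false; break; }
        }
        if (rhs_nullable)
            la |= follow_of(prods[p].lhs);
        for (int t = 0; t < 27; t++)
            if (la & (1u << t))
                table[prods[p].lhs - 'A'][t] |= 1u << p;
    }
}

/* Number of cells holding more than one production; 0 means LL(1). */
int count_conflicts(void) {
    int n = 0;
    for (int x = 0; x < 26; x++)
        for (int t = 0; t < 27; t++)
            if (table[x][t] & (table[x][t] - 1))
                n++;
    return n;
}
```

The sketch finds three duplicate cells: row Z under d, row Y under c and row X under a. So this grammar is not LL(1).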

  41. Eliminating Left Recursion
       X -> X a | c
       // General transformation rule:
       X -> X α1 | … | X αm | β1 | … | βn
       // becomes
       X  -> β1 X' | … | βn X'
       X' -> α1 X' | … | αm X' | \eps
       // so the example becomes
       X  -> c X'
       X' -> a X' | \eps
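Once the rule is rewritten, it maps directly onto recursive descent. For the original X -> X a, parseX would call parseX before consuming any token and recurse forever. The rewritten grammar always consumes a token before recursing. This is a minimal C sketch. It assumes one character per token, and the cursor `p` and the function names are hypothetical.

```c
#include <stdbool.h>

/* Recognizer for   X -> c X'    X' -> a X' | \eps
   One character per token; `p` acts as the lookahead. */
static const char *p;

static bool parseXprime(void) {
    if (*p == 'a') {            /* X' -> a X' */
        p++;
        return parseXprime();
    }
    return true;                /* X' -> \eps */
}

static bool parseX(void) {      /* X -> c X' */
    if (*p != 'c') return false;
    p++;
    return parseXprime();
}

/* True iff the whole string is a sentence of X. */
bool accepts(const char *s) {
    p = s;
    return parseX() && *p == '\0';
}
```

For example, `accepts("caaa")` is true and `accepts("ac")` is false. These are the same sentences the left-recursive X -> X a | c describes.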

  42. Left Factoring
       S -> if (E) then S else S; | if (E) then S;
       // General rule:
       X -> α X1 | … | α Xn
       // becomes
       X  -> α X'
       X' -> X1 | … | Xn
       // so the example becomes
       S  -> if (E) then S S'
       S' -> else S; | ;

  43. Eliminating Ambiguity • In programming language syntax, ambiguity often arises from missing operator precedence or associativity • does * have higher precedence than +? • are * and + left associative? • Ambiguous grammars are hard to use as language syntax

  44. Ambiguous grammars • A grammar is ambiguous if some sentence has more than one parse tree • Example: with E -> E - E, the sentence 15 - 3 - 4 has two parse trees: one groups it as (15 - 3) - 4 = 8, the other as 15 - (3 - 4) = 16

  45. Associativity
       exp -> exp - exp | num
       // Derivation #1:
       exp -> exp - exp -> 15 - exp -> 15 - exp - exp -> 15 - 3 - exp -> 15 - 3 - 4
       // Derivation #2, with the rewritten grammar
       //   exp -> num X
       //   X   -> - num X | \eps
       exp -> 15 X -> 15 - 3 X -> 15 - 3 - 4 X -> 15 - 3 - 4

  46. Precedence
       exp -> exp - exp | exp * exp | num
       // rewritten as
       //   exp -> num X
       //   X   -> - num X | * num X | \eps
       // But the derivation of 3-4*5:
       exp -> 3 X -> 3 - 4 X -> 3 - 4 * 5 X -> 3 - 4 * 5 \eps -> 3 - 4 * 5
       // What's the problem?

  47. Precedence
       exp -> exp - exp | exp * exp | num
       // rewritten as
       //   exp  -> term X
       //   X    -> - term X | \eps
       //   term -> num Y
       //   Y    -> * num Y | \eps
       // The derivation of 3-4*5:
       exp -> term X -> 3 Y X -> 3 \eps X -> 3 X -> 3 - term X -> 3 - 4 Y X
           -> 3 - 4 * 5 Y X -> 3 - 4 * 5 \eps X -> 3 - 4 * 5 X -> 3 - 4 * 5 \eps -> 3 - 4 * 5
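The layered grammar turns into one C function per nonterminal. This sketch assumes integer literals only and well-formed input. The names `eval`, `term`, `X`, `Y` and the cursor `p` are hypothetical. One further design choice is an assumption, not something taken from the slides: X and Y receive the value computed so far, which keeps '-' and '*' left associative.

```c
#include <ctype.h>
#include <stdlib.h>

/* exp -> term X    X -> - term X | \eps    term -> num Y    Y -> * num Y | \eps */
static const char *p;           /* cursor into the input text */

static void skip(void) { while (isspace((unsigned char)*p)) p++; }

static int num(void) {          /* assumes a well-formed integer literal here */
    skip();
    char *end;
    long v = strtol(p, &end, 10);
    p = end;
    return (int)v;
}

static int Y(int left) {        /* Y -> * num Y | \eps */
    skip();
    if (*p == '*') { p++; return Y(left * num()); }
    return left;
}

static int term(void) { return Y(num()); }   /* term -> num Y */

static int X(int left) {        /* X -> - term X | \eps */
    skip();
    if (*p == '-') { p++; return X(left - term()); }
    return left;
}

int eval(const char *src) {     /* exp -> term X */
    p = src;
    return X(term());
}
```

Because '*' is handled one level deeper, in term, `eval("3-4*5")` returns -17 rather than -5. Because the running value is passed left to right, `eval("15 - 3 - 4")` returns 8.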

  48. Parsing Function
       // Given production: X -> s
       // the parsing function has the form:
       void parseX () { trans (s); }
       // case analysis on the possible shape of s:
       // 1. s == a, for some terminal a
       trans (a) = if (t == a) t = getToken (); else error ("syntax error: expecting: a");

  49. Parsing Function
       // Given production: X -> s
       void parseX () { trans (s); }
       // case analysis on the possible shape of s:
       // 2. s == Y, for some nonterminal Y
       trans (Y) = parseY ();

  50. Parsing Function
       // Given production: X -> s
       void parseX () { trans (s); }
       // case analysis on the possible shape of s:
       // 3. s == s1 s2 … sn
       trans (s1 s2 … sn) = trans (s1) trans (s2) … trans (sn)
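The three trans cases can also be applied mechanically at run time. The sketch below is an interpreter for the scheme rather than a code generator. It works over a hypothetical grammar in which every nonterminal has exactly one production: S -> x A z, A -> m B, B -> w. Single characters stand for tokens, and upper case marks nonterminals. `rhs_of`, `parse`, `trans` and `recognize` are illustrative names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical grammar, one production per nonterminal:
   S -> x A z    A -> m B    B -> w                        */
static const char *rhs_of(char x) {
    switch (x) {
    case 'S': return "xAz";
    case 'A': return "mB";
    case 'B': return "w";
    default:  return NULL;      /* not a nonterminal of this grammar */
    }
}

static const char *input;       /* remaining input, one character per token */
static char t;                  /* lookahead token, '\0' at end of input */
static bool ok;

static void trans(const char *s);

/* void parseX () { trans (s); } for the unique production X -> s */
static void parse(char x) { trans(rhs_of(x)); }

/* case 3: trans (s1 s2 ... sn) = trans (s1) trans (s2) ... trans (sn) */
static void trans(const char *s) {
    for (; *s != '\0' && ok; s++) {
        if (*s >= 'A' && *s <= 'Z')
            parse(*s);                       /* case 2: nonterminal Y -> parseY () */
        else if (t == *s)
            t = *input ? *input++ : '\0';    /* case 1: terminal matches, advance */
        else
            ok = false;                      /* case 1: syntax error, expecting *s */
    }
}

bool recognize(const char *src) {
    input = src;
    ok = true;
    t = *input ? *input++ : '\0';            /* prime the lookahead */
    parse('S');
    return ok && t == '\0';
}
```

Here `recognize("xmwz")` is true, and both `recognize("xmz")` and `recognize("xmwzz")` are false. Writing the three cases out by hand for each production gives functions like parseS () from slides 18 and 19.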
