
Ambiguity, Precedence, Associativity & Top-Down Parsing




  1. Ambiguity, Precedence, Associativity & Top-Down Parsing Lecture 9-10 (From slides by G. Necula & R. Bodik) Prof. Hilfinger CS164 Lecture 9

  2. Administrivia • Team assignments this evening for all those not listed as having one. • HW#3 is now available, due next Tuesday morning (Monday is a holiday).

  3. Remaining Issues • How do we find a derivation of s ? • Ambiguity: what if there is more than one parse tree (interpretation) for some string s ? • Errors: what if there is no parse tree for some string s ? • Given a derivation, how do we construct an abstract syntax tree from it?

  4. Ambiguity • Grammar E → E + E | E * E | ( E ) | int • Strings int + int + int int * int + int

  5. Ambiguity. Example The string int + int + int has two parse trees: one grouping the sum as (int + int) + int, the other as int + (int + int). + is left-associative, so the first tree is the intended one.

  6. Ambiguity. Example The string int * int + int has two parse trees: one grouping it as (int * int) + int, the other as int * (int + int). * has higher precedence than +, so the first tree is the intended one.

  7. Ambiguity (Cont.) • A grammar is ambiguous if it has more than one parse tree for some string • Equivalently, there is more than one rightmost or leftmost derivation for some string • Ambiguity is bad • Leaves meaning of some programs ill-defined • Ambiguity is common in programming languages • Arithmetic expressions • IF-THEN-ELSE

  8. Dealing with Ambiguity • There are several ways to handle ambiguity • Most direct method is to rewrite the grammar unambiguously E → E + T | T T → T * int | int | ( E ) • Enforces precedence of * over + • Enforces left-associativity of + and *

  9. Ambiguity. Example The string int * int + int has only one parse tree now: E → E + T at the root, with the left E deriving T via T → T * int (inner T → int), and the right T deriving int — that is, (int * int) + int.

  10. Ambiguity: The Dangling Else • Consider the grammar E → if E then E | if E then E else E | OTHER • This grammar is also ambiguous

  11. The Dangling Else: Example • The expression if E1 then if E2 then E3 else E4 has two parse trees: one attaching the else to the outer if, as in if E1 then (if E2 then E3) else E4, and one attaching it to the inner if, as in if E1 then (if E2 then E3 else E4) • Typically we want the second form

  12. The Dangling Else: A Fix • else matches the closest unmatched then • We can describe this in the grammar (distinguish between matched and unmatched “then”) E → MIF /* all then are matched */ | UIF /* some then are unmatched */ MIF → if E then MIF else MIF | OTHER UIF → if E then E | if E then MIF else UIF • Describes the same set of strings

  13. The Dangling Else: Example Revisited • The expression if E1 then if E2 then E3 else E4 • The tree attaching the else to the inner if, if E1 then (if E2 then E3 else E4), is a valid parse tree (for a UIF) • The tree attaching it to the outer if is not valid, because its then expression, if E2 then E3, is not a MIF

  14. Ambiguity • Impossible in general to convert an ambiguous grammar automatically to an unambiguous one • Used with care, ambiguity can simplify the grammar • Sometimes allows more natural definitions • But we need disambiguation mechanisms • Instead of rewriting the grammar • Use the more natural (ambiguous) grammar • Along with disambiguating declarations • Most tools allow precedence and associativity declarations to disambiguate grammars • Examples …

  15. Associativity Declarations • Consider the grammar E → E + E | int • Ambiguous: two parse trees of int + int + int • Left-associativity declaration: %left ‘+’ selects the left-associative tree, (int + int) + int

  16. Precedence Declarations • Consider the grammar E → E + E | E * E | int • And the string int + int * int • Precedence declarations: %left ‘+’ %left ‘*’ (the later declaration gives * higher precedence, selecting int + (int * int))

  17. How It’s Done I: Intro to Top-Down Parsing • Terminals are seen in order of appearance in the token stream: t1 t2 t3 t4 t5 • The parse tree is constructed • From the top • From left to right • … As for leftmost derivation

  18. Top-down Depth-First Parsing • Consider the grammar E → T + E | T T → ( E ) | int | int * T • Token stream is: int * int • Start with top-level non-terminal E • Try the rules for E in order

  19. Depth-First Parsing. Example int * int • Start with start symbol E • Try E → T + E, giving T + E • Then try a rule for T: T → ( E ), giving ( E ) + E • But ( ≠ input int; backtrack to T + E • Try T → int. Token matches, giving int + E • But + ≠ input *; back to T + E • Try T → int * T, giving int * T + E • But (skipping some steps) + can’t be matched • Must backtrack to E

  20. Depth-First Parsing. Example int * int • Try E → T • Follow same steps as before for T • And succeed with T → int * T and T → int • With the parse tree E → T, T → int * T, inner T → int

  21. Depth-First Parsing • Parsing: given a string of tokens t1 t2 ... tn, find a leftmost derivation (and thus, parse tree) • Depth-first parsing: Beginning with start symbol, try each production exhaustively on leftmost non-terminal in current sentential form and recurse.

  22. Depth-First Parsing of t1t2 … tn • At a given moment, have sentential form that looks like this: t1 t2 … tk A …, 0 ≤ k ≤ n • Initially, k = 0 and A … is just the start symbol • Try a production for A: if A → BC is a production, the new form is t1 t2 … tk B C … • Backtrack when leading terminals aren’t a prefix of t1t2 … tn and try another production • Stop when no more non-terminals and all terminals matched (accept) or no more productions left (reject)
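The search just described can be sketched as a small backtracking parser. This is only an illustrative sketch, not the lecture's code: the grammar is the one from slide 18, and the names (GRAMMAR, parse, accepts) are my own.

```python
# A minimal backtracking depth-first parser for the slides' example grammar:
#   E -> T + E | T        T -> ( E ) | int | int * T
GRAMMAR = {
    "E": [["T", "+", "E"], ["T"]],
    "T": [["(", "E", ")"], ["int"], ["int", "*", "T"]],
}

def parse(symbol, tokens, pos):
    """Yield every input position reachable after deriving `symbol`
    from tokens[pos:].  A terminal must match the next token exactly;
    a non-terminal tries each production in order (with backtracking)."""
    if symbol not in GRAMMAR:                      # terminal
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for production in GRAMMAR[symbol]:             # try rules in order
        positions = [pos]
        for sym in production:
            positions = [q for p in positions for q in parse(sym, tokens, p)]
        yield from positions

def accepts(tokens):
    # Accept iff some derivation of E consumes the whole token stream.
    return any(p == len(tokens) for p in parse("E", tokens, 0))
```

For example, accepts(["int", "*", "int"]) succeeds only after the failed attempts traced on slide 19, mirroring the backtracking order of the rules.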

  23. When Depth-First Doesn’t Work Well • Consider productions S → S a | a: • In the process of parsing S we try the above rules • Applied consistently in this order, get infinite loop • Could re-order productions, but search will have lots of backtracking and general rule for ordering is complex • Problem here is left-recursive grammar: one that has a non-terminal S with S ⇒+ S α for some α

  24. Elimination of Left Recursion • Consider the left-recursive grammar S → S α | β • S generates all strings starting with a β and followed by any number of α’s • Can rewrite using right-recursion S → β S’ S’ → α S’ | ε

  25. Elimination of Left Recursion. Example • Consider the grammar S → 1 | S 0 ( β = 1 and α = 0 ) can be rewritten as S → 1 S’ S’ → 0 S’ | ε

  26. More Elimination of Left Recursion • In general S → S α1 | … | S αn | β1 | … | βm • All strings derived from S start with one of β1, …, βm and continue with several instances of α1, …, αn • Rewrite as S → β1 S’ | … | βm S’ S’ → α1 S’ | … | αn S’ | ε
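The general rewrite above is mechanical enough to sketch as code. This is a sketch of immediate left-recursion removal only (not the fully general algorithm of slide 27); the function name and the convention that [] denotes ε are mine.

```python
# Eliminate *immediate* left recursion, as on the slide:
#   S -> S a1 | ... | S an | b1 | ... | bm     becomes
#   S -> b1 S' | ... | bm S'      S' -> a1 S' | ... | an S' | eps
def eliminate_left_recursion(nt, productions):
    """productions: list of right-hand sides (symbol lists) for `nt`.
    Returns a dict of rewritten rules; the empty list [] denotes epsilon."""
    new_nt = nt + "'"
    alphas = [p[1:] for p in productions if p and p[0] == nt]   # S alpha_i
    betas  = [p for p in productions if not p or p[0] != nt]    # beta_j
    return {
        nt:     [b + [new_nt] for b in betas],
        new_nt: [a + [new_nt] for a in alphas] + [[]],          # [] = eps
    }
```

Applied to slide 25's grammar S → 1 | S 0, it produces S → 1 S’ and S’ → 0 S’ | ε, as expected.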

  27. General Left Recursion • The grammar S → A α | δ (1) A → S β (2) is also left-recursive because S ⇒+ S β α • This left recursion can also be eliminated by first substituting (2) into (1) • There is a general algorithm (e.g. Aho, Sethi, Ullman §4.3) • But personally, I’d just do this by hand.

  28. An Alternative Approach • Instead of reordering or rewriting the grammar, can use top-down breadth-first search. S → S a | a String: aaa • Keep all candidate sentential forms: S expands to S a and to a (a alone doesn’t match all of aaa); S a expands to S a a and a a; S a a expands to S a a a and to a a a, which matches.

  29. Summary of Top-Down Parsing So Far • Simple and general parsing strategy • Left recursion must be eliminated first • … but that can be done automatically • Or can use breadth-first search • But backtracking (depth-first) or maintaining list of possible sentential forms (breadth-first) can make it slow • Often, though, we can avoid both …

  30. Predictive Parsers • Modification of depth-first parsing in which parser “predicts” which production to use • By looking at the next few tokens • No backtracking • Predictive parsers accept LL(k) grammars • L means “left-to-right” scan of input • L means “leftmost derivation” • k means “need ≤ k tokens of lookahead to predict” • In practice, LL(1) is used

  31. Recursive Descent: Grammar as Program • In recursive descent, we think of a grammar as a program. Here, we’ll look at the predictive version. • Each non-terminal is turned into a procedure • Each right-hand side is transliterated into part of the procedure body for its non-terminal • First, define • next() — returns the current token of input • scan(t) — check that next() = t (else ERROR), and then read a new token.

  32. Recursive Descent: Example ($ = end marker) P → S $ S → T S’ S’ → + S | ε T → int | ( S ) def P(): S(); scan($) def S(): T(); S’() def S’(): if next() == ‘+’: scan(‘+’); S() elif next() in [‘)’, $]: pass # predict ε else: ERROR def T(): if next() == int: scan(int) elif next() == ‘(’: scan(‘(’); S(); scan(‘)’) else: ERROR But where do the prediction tests come from?
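The slide's pseudocode can be made runnable with a little scaffolding. This sketch assumes tokens arrive as a Python list; the Parser class and the parses helper are my additions, not part of the slides.

```python
# A runnable sketch of the slide's predictive recursive-descent parser for
#   P -> S $    S -> T S'    S' -> + S | eps    T -> int | ( S )
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]       # $ = end marker, as on the slide
        self.pos = 0

    def next(self):                        # current token of input
        return self.tokens[self.pos]

    def scan(self, t):                     # check next() == t, then advance
        if self.next() != t:
            raise ParseError(f"expected {t!r}, saw {self.next()!r}")
        self.pos += 1

    def P(self):  self.S(); self.scan("$")
    def S(self):  self.T(); self.S_()
    def S_(self):                          # S' -> + S | eps
        if self.next() == "+":
            self.scan("+"); self.S()
        elif self.next() in (")", "$"):
            pass                           # predict epsilon
        else:
            raise ParseError("no prediction for S'")
    def T(self):                           # T -> int | ( S )
        if self.next() == "int":
            self.scan("int")
        elif self.next() == "(":
            self.scan("("); self.S(); self.scan(")")
        else:
            raise ParseError("no prediction for T")

def parses(tokens):
    try:
        Parser(tokens).P()
        return True
    except ParseError:
        return False
```

Note that each procedure decides which right-hand side to use from next() alone, with no backtracking, which is exactly what makes this grammar LL(1).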

  33. Predicting Right-hand Sides • The if-tests are the conditions by which the parser predicts which right-hand side to use. • In our example, used only the next symbol (LL(1)); but could use more. • Can be specified as a 2D table • One dimension for current non-terminal to expand • One dimension for next token • A table entry contains one production

  34. But First: Left Factoring • With the grammar E → T + E | T T → int | int * T | ( E ) • Impossible to predict because • For T, two productions start with int • For E, it is not clear how to predict • A grammar must be left-factored before use for predictive parsing

  35. Left-Factoring Example • Starting with the grammar E → T + E | T T → int | int * T | ( E ) • Factor out common prefixes of productions • E → T X • X → + E | ε • T → ( E ) | int Y • Y → * T | ε

  36. Recursive Descent for Real • So far, have presented a purist view. • In fact, use of recursive descent makes life simpler in many ways if we “cheat” a bit. • Here’s how you really handle left recursion in recursive descent, for S → S A | R: def S(): R() while next() ∈ FIRST(A): A() • It’s a program: all kinds of shortcuts possible.
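The "cheat" can be made concrete for a left-recursive sum grammar E → E + int | int. This is an illustrative sketch only: the function name parse_sum and the nested-tuple parse trees are my own invention.

```python
# Handling left recursion with a loop instead of recursion:
#   E -> E + int | int   is parsed as   int ( + int )*
# and the result is nested to the LEFT, matching the left-recursive grammar.
def parse_sum(tokens):
    toks = tokens + ["$"]                  # $ = end marker
    pos = 0

    def scan(t):
        nonlocal pos
        if toks[pos] != t:
            raise ValueError(f"expected {t!r}, saw {toks[pos]!r}")
        pos += 1

    scan("int")
    tree = "int"
    while toks[pos] == "+":                # next() in FIRST(+ int)
        scan("+"); scan("int")
        tree = ("+", tree, "int")          # left-associative nesting
    scan("$")
    return tree
```

The loop builds ("+", ("+", "int", "int"), "int") for int + int + int, i.e. the left-associative tree that the original left-recursive grammar describes.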

  37. LL(1) Parsing Table Example • Left-factored grammar E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε • The LL(1) parsing table ($ is a special end marker; rows are non-terminals, columns are next tokens, blank entries are errors):
        int      *      +      (      )      $
  E     T X                    T X
  X                    + E            ε      ε
  T     int Y                  ( E )
  Y              * T   ε              ε      ε

  38. LL(1) Parsing Table Example (Cont.) • Consider the [E, int] entry • Means “When current non-terminal is E and next input is int, use production E → T X” • T X can generate a string with int in front • Consider the [Y, +] entry • “When current non-terminal is Y and current token is +, get rid of Y” • We’ll see later why this is right

  39. LL(1) Parsing Tables. Errors • Blank entries indicate error situations • Consider the [E, *] entry • “There is no way to derive a string starting with * from non-terminal E”

  40. Using Parsing Tables • Essentially table-driven recursive descent, except • We use a stack to keep track of pending non-terminals • We reject when we encounter an error state • We accept when we encounter end-of-input

  41. LL(1) Parsing Algorithm initialize stack = <S, $> repeat case stack of <X, rest>: if T[X, next()] == Y1…Yn: stack ← <Y1 … Yn rest> else: error() <t, rest>: scan(t); stack ← <rest> until stack == < >
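The driver above can be sketched directly in Python using the left-factored grammar and its LL(1) table from slide 37. The names (TABLE, NONTERMINALS, ll1_parse) are mine, and [] denotes an ε right-hand side.

```python
# Table-driven LL(1) parsing for the left-factored grammar
#   E -> T X    X -> + E | eps    T -> ( E ) | int Y    Y -> * T | eps
TABLE = {
    ("E", "int"): ["T", "X"],   ("E", "("): ["T", "X"],
    ("X", "+"):   ["+", "E"],   ("X", ")"): [],  ("X", "$"): [],
    ("T", "int"): ["int", "Y"], ("T", "("): ["(", "E", ")"],
    ("Y", "*"):   ["*", "T"],   ("Y", "+"): [],  ("Y", ")"): [], ("Y", "$"): [],
}
NONTERMINALS = {"E", "X", "T", "Y"}

def ll1_parse(tokens):
    """Return True iff `tokens` is accepted; False on any blank entry."""
    stack = ["$", "E"]                  # initialize stack = <E, $>
    tokens = tokens + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tokens[i]))
            if rhs is None:
                return False            # blank table entry: error
            stack.extend(reversed(rhs)) # push Y1 ... Yn, with Y1 on top
        else:
            if top != tokens[i]:
                return False            # terminal mismatch
            i += 1                      # scan
    return i == len(tokens)
```

Running ll1_parse(["int", "*", "int"]) reproduces, step for step, the stack trace shown on the next slide.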

  42. LL(1) Parsing Example
  Stack         Input          Action
  E $           int * int $    T X
  T X $         int * int $    int Y
  int Y X $     int * int $    terminal
  Y X $         * int $        * T
  * T X $       * int $        terminal
  T X $         int $          int Y
  int Y X $     int $          terminal
  Y X $         $              ε
  X $           $              ε
  $             $              ACCEPT

  43. Constructing Parsing Tables • LL(1) languages are those definable by a parsing table for the LL(1) algorithm such that no table entry is multiply defined • Once we have the table • Can create a table-driven or recursive-descent parser • The parsing algorithms are simple and fast • No backtracking is necessary • We want to generate parsing tables from CFGs

  44. Predicting Productions I • Top-down parsing expands a parse tree from the start symbol to the leaves • Always expand the leftmost non-terminal • Partial tree so far: E expanded to T + E • Input: int * int + int

  45. Predicting Productions II • Top-down parsing expands a parse tree from the start symbol to the leaves • Always expand the leftmost non-terminal • Partial tree so far: E expanded to T + E, with T expanded to int * T • The leaves at any point form a string β A γ • β contains only terminals • The input string is β b δ • The prefix β matches • The next token is b

  46. Predicting Productions III • Top-down parsing expands a parse tree from the start symbol to the leaves • Always expand the leftmost non-terminal • The leaves at any point form a string β A γ (here A = T) • β contains only terminals • γ contains any symbols • The input string is β b δ (here b = int) • So A γ must derive b δ

  47. Predicting Productions IV • Top-down parsing expands a parse tree from the start symbol to the leaves • Always expand the leftmost non-terminal • So choose the production for T that can eventually derive something that starts with int

  48. Predicting Productions V • Go back to previous grammar, with E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε • Partial tree: E expanded to T X, with T expanded to int Y; remaining input is + int • Here, Y X must match + int • Since + int doesn’t start with *, can’t use Y → * T

  49. Predicting Productions VI • Same grammar and partial tree as before, with Y X still to match + int • But + can follow Y, so we choose Y → ε

  50. FIRST and FOLLOW • To summarize, if we are trying to predict how to expand A with b the next token, then either: • b belongs to an expansion of A • Any A → α can be used if b can start a string derived from α • In this case we say that b ∈ First(α) • or b does not belong to an expansion of A, A has an expansion that derives ε, and b belongs to something that can follow A (so S ⇒* β A b ω) • We say that b ∈ Follow(A) in this case.
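First and Follow can be computed by iterating to a fixed point over the productions. This sketch uses the left-factored grammar from slide 37; the helper names are mine, and the empty string "" stands for ε in a First set.

```python
# Fixed-point computation of First and Follow sets for the grammar
#   E -> T X    X -> + E | eps    T -> ( E ) | int Y    Y -> * T | eps
LL1_GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}

def first_of(seq, first):
    """First set of a symbol sequence, given per-symbol First sets."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})           # terminal t: First(t) = {t}
        out |= f - {""}
        if "" not in f:
            return out
    return out | {""}                       # every symbol was nullable

def compute_first_follow(grammar, start):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                  # $ can follow the start symbol
    changed = True
    while changed:                          # iterate until nothing changes
        changed = False
        for nt, prods in grammar.items():
            for p in prods:
                f = first_of(p, first)
                if not f <= first[nt]:
                    first[nt] |= f; changed = True
                for i, sym in enumerate(p):
                    if sym not in grammar:
                        continue            # Follow only for non-terminals
                    tail = first_of(p[i + 1:], first)
                    new = (tail - {""}) | (follow[nt] if "" in tail else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new; changed = True
    return first, follow
```

For this grammar the computation yields, e.g., First(E) = { (, int } and Follow(Y) = { +, ), $ }, which is exactly why the table entry [Y, +] on slide 38 is Y → ε.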
