
Parsing: LR(1) and LALR(1) Methods

This lecture covers LR(1) and LALR(1) parsing techniques, including constructing transition diagrams and parser tables, handling conflicts, and eliminating ambiguities.


Presentation Transcript


  1. Fall 2016-2017 • Compiler Principles • Lecture 4: Parsing part 3 • Roman Manevich, Ben-Gurion University of the Negev

  2. Tentative syllabus mid-term exam

  3. Previously • LR(0) parsing • Running the parser • Constructing transition diagram • Constructing parser table • Detecting conflicts • SLR(0) • Eliminating conflicts via FOLLOW sets

  4. Agenda • LR(1) • LALR(1) • Automatic LR parser generation • Handling ambiguities

  5. Going beyond SLR(0) • Grammar: (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L • Some common language constructs introduce conflicts even for SLR

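  6. LR(0) transition diagram for the grammar
      q0: S’ → ● S, S → ● L = R, S → ● R, L → ● * R, L → ● id, R → ● L
      q1: S’ → S ●
      q2: S → L ● = R, R → L ●
      q3: S → R ●
      q4: L → * ● R, R → ● L, L → ● * R, L → ● id
      q5: L → id ●
      q6: S → L = ● R, R → ● L, L → ● * R, L → ● id
      q7: L → * R ●
      q8: R → L ●
      q9: S → L = R ●
      Transitions from q0: S to q1, L to q2, R to q3, * to q4, id to q5
      Transitions from q2: = to q6
      Transitions from q4: R to q7, L to q8, * to q4, id to q5
      Transitions from q6: R to q9, L to q8, * to q4, id to q5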

  7. shift/reduce conflict • Grammar: (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L • q2: S → L ● = R, R → L ● • On =, q2 moves to q6: S → L = ● R, R → ● L, L → ● * R, L → ● id • In q2: shift item S → L ● = R vs. reduce item R → L ● • FOLLOW(R) contains =, as shown by the derivation S ⇒ L = R ⇒ * R = R • SLR cannot resolve the conflict
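The claim that FOLLOW(R) contains = can be checked mechanically. The following is a minimal Java sketch, not part of the lecture, that computes FOLLOW sets for the slide grammar by fixpoint iteration. The class name `FollowSets` and its helpers are hypothetical, and the code assumes the grammar has no empty productions (true for this grammar), so FIRST only needs to inspect the first right-hand-side symbol.

```java
import java.util.*;

public class FollowSets {
    // Grammar from the slides; each row is {LHS, RHS symbols...}.
    static final String[][] PRODUCTIONS = {
        {"S'", "S"},
        {"S", "L", "=", "R"},
        {"S", "R"},
        {"L", "*", "R"},
        {"L", "id"},
        {"R", "L"},
    };
    static final Set<String> NONTERMINALS = Set.of("S'", "S", "L", "R");

    static Map<String, Set<String>> emptySets() {
        Map<String, Set<String>> sets = new TreeMap<>();
        for (String n : NONTERMINALS) sets.put(n, new TreeSet<>());
        return sets;
    }

    // FIRST sets by fixpoint iteration. Assumption: no production derives the
    // empty string, so only the first right-hand-side symbol matters.
    static Map<String, Set<String>> first() {
        Map<String, Set<String>> first = emptySets();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : PRODUCTIONS) {
                Set<String> add = NONTERMINALS.contains(p[1]) ? first.get(p[1]) : Set.of(p[1]);
                changed |= first.get(p[0]).addAll(new ArrayList<>(add));
            }
        }
        return first;
    }

    // FOLLOW sets by fixpoint iteration, with $ as the end-of-input marker.
    static Map<String, Set<String>> follow() {
        Map<String, Set<String>> first = first();
        Map<String, Set<String>> follow = emptySets();
        follow.get("S'").add("$");
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : PRODUCTIONS) {
                for (int i = 1; i < p.length; i++) {
                    if (!NONTERMINALS.contains(p[i])) continue;
                    Set<String> add;
                    if (i + 1 == p.length) add = follow.get(p[0]);            // symbol ends the rule
                    else if (NONTERMINALS.contains(p[i + 1])) add = first.get(p[i + 1]);
                    else add = Set.of(p[i + 1]);                              // next symbol is a token
                    changed |= follow.get(p[i]).addAll(new ArrayList<>(add));
                }
            }
        }
        return follow;
    }

    public static void main(String[] args) {
        // FOLLOW(R) includes "=", which is why SLR keeps the conflict in q2.
        System.out.println(follow());
    }
}
```

Running `main` prints the FOLLOW set of every non-terminal; FOLLOW(R) picks up = through the chain FOLLOW(L) ⊆ FOLLOW(R) (from L → * R) and = ∈ FOLLOW(L) (from S → L = R).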

  8. Inputs requiring shift/reduce • Grammar: (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L • q2: S → L ● = R, R → L ● • On =, q2 moves to q6: S → L = ● R, R → ● L, L → ● * R, L → ● id • For the input id, the rightmost derivation S’ ⇒ S ⇒ R ⇒ L ⇒ id requires reducing in q2 • For the input id = id, the rightmost derivation S’ ⇒ S ⇒ L = R ⇒ L = L ⇒ L = id ⇒ id = id requires shifting in q2

  9. LR(1) grammars • In SLR: a reduce item N → α ● is applicable only when the lookahead is in FOLLOW(N) • But for a given context (state), are all tokens in FOLLOW(N) indeed possible? • Not always • We can compute a context-sensitive (i.e., specific to a given state) subset of FOLLOW(N) and use it to remove even more conflicts • LR(1) keeps a lookahead with each LR item • Idea: a more refined notion of FOLLOW computed per item

  10. LR(1) item • [N → α ● β, t] • α: the part of the input already matched • β: the part still to be matched • Hypothesis about αβ being a possible handle: so far we’ve matched α, expecting to see β, and after reducing N we expect to see the token t

  11. LR(1) items • An LR(1) item is a pair • LR(0) item • Lookahead token • Meaning • We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token • Example • The production L → id yields the following LR(1) items: [L → ● id, *] [L → ● id, =] [L → ● id, id] [L → ● id, $] [L → id ●, *] [L → id ●, =] [L → id ●, id] [L → id ●, $] • The corresponding LR(0) items: [L → ● id] [L → id ●] • Grammar: (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

  12. Computing Closure for LR(1) • For every [A → α ● Bβ , c] in S • for every production B → δ and every token b in the grammar such that b ∈ FIRST(βc) • Add [B → ● δ , b] to S • Repeat until no new items can be added to S
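The closure rule above can be sketched in Java as a worklist algorithm. This is an illustration under stated assumptions, not the lecture's code: the class `Lr1Closure`, the string encoding of items as "production index,dot position,lookahead", and the helper names are all hypothetical, and FIRST(βc) is simplified by assuming the grammar has no empty productions (so it is FIRST of the first symbol of β, or {c} when β is empty).

```java
import java.util.*;

public class Lr1Closure {
    // Grammar from the slides; each row is {LHS, RHS symbols...}.
    static final String[][] PRODUCTIONS = {
        {"S'", "S"},
        {"S", "L", "=", "R"},
        {"S", "R"},
        {"L", "*", "R"},
        {"L", "id"},
        {"R", "L"},
    };
    static final Set<String> NONTERMINALS = Set.of("S'", "S", "L", "R");

    // Hypothetical encoding of an LR(1) item: "production,dot,lookahead".
    static String item(int production, int dot, String lookahead) {
        return production + "," + dot + "," + lookahead;
    }

    // FIRST of a single symbol, assuming no production derives the empty string.
    static Set<String> first(String symbol, Set<String> visiting) {
        if (!NONTERMINALS.contains(symbol)) return Set.of(symbol);
        Set<String> result = new TreeSet<>();
        if (!visiting.add(symbol)) return result;   // already being expanded
        for (String[] p : PRODUCTIONS) {
            if (p[0].equals(symbol)) result.addAll(first(p[1], visiting));
        }
        return result;
    }

    static Set<String> closure(Set<String> items) {
        Set<String> result = new LinkedHashSet<>(items);
        Deque<String> work = new ArrayDeque<>(items);
        while (!work.isEmpty()) {
            String[] parts = work.pop().split(",");
            String[] p = PRODUCTIONS[Integer.parseInt(parts[0])];
            int dot = Integer.parseInt(parts[1]);
            String c = parts[2];
            if (1 + dot >= p.length) continue;              // dot at the end: nothing to expand
            String b = p[1 + dot];                          // symbol right after the dot
            if (!NONTERMINALS.contains(b)) continue;
            // FIRST(beta c): first symbol of beta if beta is non-empty, otherwise c.
            Set<String> lookaheads = (2 + dot < p.length)
                    ? first(p[2 + dot], new HashSet<>())
                    : Set.of(c);
            for (int i = 0; i < PRODUCTIONS.length; i++) {
                if (!PRODUCTIONS[i][0].equals(b)) continue;
                for (String la : lookaheads) {
                    String added = item(i, 0, la);
                    if (result.add(added)) work.push(added);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Closure of [S' -> . S, $], i.e. the initial state of the LR(1) automaton.
        for (String it : closure(Set.of(item(0, 0, "$")))) System.out.println(it);
    }
}
```

For the start item [S’ → ● S, $] the sketch yields eight items: the L-productions appear with both = and $ as lookaheads, while R → ● L appears only with $.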

  13. LR(1) transition diagram for the grammar
      q0: (S’ → ∙ S , $) (S → ∙ L = R , $) (S → ∙ R , $) (L → ∙ * R , =) (L → ∙ id , =) (R → ∙ L , $) (L → ∙ id , $) (L → ∙ * R , $)
      q1: (S’ → S ∙ , $)
      q2: (S → L ∙ = R , $) (R → L ∙ , $)
      q3: (S → R ∙ , $)
      q4: (L → * ∙ R , =) (R → ∙ L , =) (L → ∙ * R , =) (L → ∙ id , =) (L → * ∙ R , $) (R → ∙ L , $) (L → ∙ * R , $) (L → ∙ id , $)
      q5: (L → id ∙ , $) (L → id ∙ , =)
      q6: (L → * ∙ R , $) (R → ∙ L , $) (L → ∙ * R , $) (L → ∙ id , $)
      q7: (L → * R ∙ , =) (L → * R ∙ , $)
      q8: (R → L ∙ , =) (R → L ∙ , $)
      q9: (S → L = R ∙ , $)
      q10: (S → L = ∙ R , $) (R → ∙ L , $) (L → ∙ * R , $) (L → ∙ id , $)
      q11: (L → id ∙ , $)
      q12: (R → L ∙ , $)
      q13: (L → * R ∙ , $)
      Transitions from q0: S to q1, L to q2, R to q3, * to q4, id to q5
      Transitions from q2: = to q10
      Transitions from q4: R to q7, L to q8, * to q4, id to q5
      Transitions from q10: R to q9, L to q12, * to q6, id to q11
      Transitions from q6: R to q13, L to q12, * to q6, id to q11

  14. Back to the conflict • q2: (S → L ∙ = R , $) (R → L ∙ , $) • On =, q2 moves to q6: (S → L = ∙ R , $) (R → ∙ L , $) (L → ∙ * R , $) (L → ∙ id , $) • Is there a conflict now?
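One way to answer the question is to compare, for state q2, the tokens on which the parser may shift with the tokens on which it may reduce. The Java sketch below does this for both SLR (reduce on FOLLOW(R) = {=, $}) and LR(1) (reduce only on the item's lookahead $). The class and method names are hypothetical; the token sets are taken from the slides.

```java
import java.util.Set;

public class ConflictCheck {
    // A state has a shift/reduce conflict if some token can be shifted
    // and is also a lookahead on which a reduction is allowed.
    static boolean hasShiftReduceConflict(Set<String> shiftTokens, Set<String> reduceLookaheads) {
        for (String token : shiftTokens) {
            if (reduceLookaheads.contains(token)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> shiftInQ2 = Set.of("=");          // from S -> L . = R
        Set<String> slrReduce = Set.of("=", "$");     // FOLLOW(R), used by SLR for R -> L .
        Set<String> lr1Reduce = Set.of("$");          // lookahead of (R -> L . , $)

        System.out.println("SLR conflict in q2:   " + hasShiftReduceConflict(shiftInQ2, slrReduce));
        System.out.println("LR(1) conflict in q2: " + hasShiftReduceConflict(shiftInQ2, lr1Reduce));
    }
}
```

With the LR(1) lookahead the two sets are disjoint, so the parser shifts on = and reduces R → L only on $.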

  15. LALR(1) • LR(1) tables have a huge number of entries • Often we don’t need such refined observation (and cost) • Idea: find states with the same LR(0) component and merge their lookahead components, as long as there are no conflicts • LALR(1) is not as powerful as LR(1) in theory but works quite well in practice • Merging cannot introduce new shift/reduce conflicts, only reduce/reduce conflicts, which are unlikely in practice
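The merging idea can be sketched in Java as grouping states by their LR(0) core and uniting lookaheads item by item. This is a simplified illustration: the class `LalrMerge`, the representation of a state as a map from an LR(0) item string to its lookahead set, and the ASCII item notation are assumptions made for the example, and the sketch neither updates transitions nor checks for reduce/reduce conflicts after merging.

```java
import java.util.*;

public class LalrMerge {
    // A state maps each LR(0) item (its core part) to the set of lookaheads.
    // States with the same key set (same LR(0) core) are merged by uniting
    // the lookahead sets item by item.
    static List<Map<String, Set<String>>> merge(List<Map<String, Set<String>>> states) {
        Map<Set<String>, Map<String, Set<String>>> byCore = new LinkedHashMap<>();
        for (Map<String, Set<String>> state : states) {
            Set<String> core = new TreeSet<>(state.keySet());
            Map<String, Set<String>> merged = byCore.computeIfAbsent(core, k -> new TreeMap<>());
            for (Map.Entry<String, Set<String>> item : state.entrySet()) {
                merged.computeIfAbsent(item.getKey(), k -> new TreeSet<>()).addAll(item.getValue());
            }
        }
        return new ArrayList<>(byCore.values());
    }

    // Six reduce states of the LR(1) automaton for the slide grammar.
    static List<Map<String, Set<String>>> slideStates() {
        return List.of(
            Map.of("L -> id .", Set.of("=", "$")),    // lookaheads = and $
            Map.of("L -> id .", Set.of("$")),         // same core, lookahead $ only
            Map.of("R -> L .", Set.of("=", "$")),
            Map.of("R -> L .", Set.of("$")),
            Map.of("L -> * R .", Set.of("=", "$")),
            Map.of("L -> * R .", Set.of("$")));
    }

    public static void main(String[] args) {
        // Six LR(1) states collapse into three LALR(1) states.
        for (Map<String, Set<String>> state : merge(slideStates())) {
            System.out.println(state);
        }
    }
}
```

A fuller implementation would also redirect incoming transitions to the merged state and report a reduce/reduce conflict when two complete items in one merged state share a lookahead.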

  16. [Figure: LR(1) transition diagram, with the same states and items as in slide 13]

  17. [Figure: LR(1) transition diagram, with the same states and items as in slide 13]

  18. LR(1) transition diagram after merging q11 into q5, q12 into q8, and q13 into q7
      q0: (S’ → ∙ S , $) (S → ∙ L = R , $) (S → ∙ R , $) (L → ∙ * R , =) (L → ∙ id , =) (R → ∙ L , $) (L → ∙ id , $) (L → ∙ * R , $)
      q1: (S’ → S ∙ , $)
      q2: (S → L ∙ = R , $) (R → L ∙ , $)
      q3: (S → R ∙ , $)
      q4: (L → * ∙ R , =) (R → ∙ L , =) (L → ∙ * R , =) (L → ∙ id , =) (L → * ∙ R , $) (R → ∙ L , $) (L → ∙ * R , $) (L → ∙ id , $)
      q5: (L → id ∙ , $) (L → id ∙ , =)
      q6: (L → * ∙ R , $) (R → ∙ L , $) (L → ∙ * R , $) (L → ∙ id , $)
      q7: (L → * R ∙ , =) (L → * R ∙ , $)
      q8: (R → L ∙ , =) (R → L ∙ , $)
      q9: (S → L = R ∙ , $)
      q10: (S → L = ∙ R , $) (R → ∙ L , $) (L → ∙ * R , $) (L → ∙ id , $)

  19. Left/right recursion • At home: create a simple grammar with left-recursion and one with right-recursion • Construct the corresponding LR(0) parser • Any conflicts? • Run on simple input and observe behavior • Attempt to generalize the observation for long inputs

  20. Example: non-LR(1) grammar • Grammar: (1) S → Y b c $ (2) S → Z b d $ (3) Y → a (4) Z → a • Initial state: S → ∙ Y b c, $ • S → ∙ Z b d, $ • Y → ∙ a, b • Z → ∙ a, b • State reached on a: Y → a ∙, b • Z → a ∙, b • reduce-reduce conflict on lookahead ‘b’

  21. Automated parser generation (via CUP)

  22. High-level structure • Pipeline: text → Lexical analyzer → tokens → Parser → AST • Lexer spec (LANG.lex) → JFlex → Lexer.java (Token.java) → javac → Lexical analyzer • Parser spec (LANG.cup) → CUP → Parser.java, sym.java → javac → Parser

  23. Expression calculator • expr → expr + expr | expr - expr | expr * expr | expr / expr | - expr | ( expr ) | number • Goals of expression calculator parser: • Is 2+3+4+5 a valid expression? • What is the meaning (value) of this expression?

  24. Syntax analysis with CUP • CUP – parser generator • Generates an LALR(1) Parser • Input: spec file • Output: a syntax analyzer • Can dump automaton and table • [Figure: the parser spec is processed by CUP into .java sources, which javac compiles into the Parser; the Parser turns tokens into an AST]

  25. CUP spec file • Package and import specifications • User code components • Symbol (terminal and non-terminal) lists • Terminals go to sym.java • Types of AST nodes • Precedence declarations • The grammar • Semantic actions to construct AST

  26. Parsing ambiguous grammars

  27. Expression Calculator – 1st Attempt
      terminal Integer NUMBER;
      terminal PLUS, MINUS, MULT, DIV;
      terminal LPAREN, RPAREN;
      non terminal Integer expr;
      expr ::= expr PLUS expr
             | expr MINUS expr
             | expr MULT expr
             | expr DIV expr
             | MINUS expr
             | LPAREN expr RPAREN
             | NUMBER
             ;
      • Integer is the symbol type (explained later)

  28. Ambiguities • a + b * c • a + b + c • [Figure: two different parse trees for each of the two expressions]

  29. Ambiguities as conflicts for LR(1) • a + b * c • a + b + c • [Figure: two different parse trees for each of the two expressions]

  30. Expression Calculator – 2nd Attempt
      terminal Integer NUMBER;
      terminal PLUS, MINUS, MULT, DIV;
      terminal LPAREN, RPAREN;
      terminal UMINUS;
      non terminal Integer expr;
      precedence left PLUS, MINUS;
      precedence left DIV, MULT;
      precedence left UMINUS;
      expr ::= expr PLUS expr
             | expr MINUS expr
             | expr MULT expr
             | expr DIV expr
             | MINUS expr %prec UMINUS
             | LPAREN expr RPAREN
             | NUMBER
             ;
      • The precedence declarations are listed in order of increasing precedence • %prec UMINUS assigns a contextual precedence

  31. Parsing ambiguous grammars using precedence declarations • Each terminal is assigned a precedence • By default all terminals have the lowest precedence • The user can assign their own precedence • CUP assigns each production a precedence • Precedence of the rightmost terminal in the production • or a user-specified contextual precedence • On a shift/reduce conflict, CUP resolves the ambiguity by comparing the precedence of the terminal and of the production, and decides whether to shift or reduce • In case of equal precedences, left/right help resolve conflicts • left means reduce • right means shift • More information on precedence declarations in CUP’s manual
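The resolution rule described above can be sketched as a small Java function. This is an illustration of the rule as stated on the slide, not CUP's actual source code: the class `PrecedenceResolver`, its enums, and the integer precedence levels are hypothetical. The convention that a higher-precedence production wins with a reduce, and a higher-precedence terminal wins with a shift, follows the usual yacc-style scheme and is assumed here.

```java
public class PrecedenceResolver {
    enum Assoc { LEFT, RIGHT, NONASSOC }
    enum Action { SHIFT, REDUCE, ERROR }

    // productionPrec: precedence of the production that could be reduced
    //                 (its rightmost terminal, or its %prec terminal).
    // tokenPrec:      precedence of the lookahead terminal that could be shifted.
    // tokenAssoc:     associativity declared for that precedence level.
    static Action resolve(int productionPrec, int tokenPrec, Assoc tokenAssoc) {
        if (productionPrec > tokenPrec) return Action.REDUCE;
        if (productionPrec < tokenPrec) return Action.SHIFT;
        switch (tokenAssoc) {
            case LEFT:  return Action.REDUCE;   // a + b + c groups as (a + b) + c
            case RIGHT: return Action.SHIFT;    // would group as a + (b + c)
            default:    return Action.ERROR;    // nonassoc: chaining is rejected
        }
    }

    public static void main(String[] args) {
        int plus = 1, mult = 2;   // hypothetical levels: MULT declared after PLUS
        // Parser has seen "a + b" and the lookahead is "*": shift.
        System.out.println(resolve(plus, mult, Assoc.LEFT));
        // Parser has seen "a + b" and the lookahead is "+": reduce.
        System.out.println(resolve(plus, plus, Assoc.LEFT));
        // Parser has seen "a * b" and the lookahead is "+": reduce.
        System.out.println(resolve(mult, plus, Assoc.LEFT));
    }
}
```

The three calls in `main` correspond to the operator-precedence and associativity cases for a + b * c, a + b + c, and a * b + c.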

  32. Resolving ambiguity (associativity) • precedence left PLUS • a + b + c is parsed as (a + b) + c • [Figure: the two candidate parse trees for a + b + c]

  33. Resolving ambiguity (op. precedence) • precedence left PLUS • precedence left MULT • a + b * c is parsed as a + (b * c) • [Figure: the two candidate parse trees for a + b * c]

  34. Resolving ambiguity (contextual) • precedence left MULT • MINUS expr %prec UMINUS • - a * b is parsed as (- a) * b • [Figure: the two candidate parse trees for - a * b]

  35. Resolving ambiguity
      terminal Integer NUMBER;
      terminal PLUS, MINUS, MULT, DIV;
      terminal LPAREN, RPAREN;
      terminal UMINUS;
      precedence left PLUS, MINUS;
      precedence left DIV, MULT;
      precedence left UMINUS;
      expr ::= expr PLUS expr
             | expr MINUS expr
             | expr MULT expr
             | expr DIV expr
             | MINUS expr %prec UMINUS
             | LPAREN expr RPAREN
             | NUMBER
             ;
      • UMINUS is never returned by the scanner (used only to define precedence) • The rule MINUS expr has the precedence of UMINUS

  36. More CUP directives • precedence nonassoc NEQ • Non-associative operators: < > == != etc. • 1<2<3 identified as an error (semantic error?) • start with non-terminal • Specifies a start non-terminal other than the first non-terminal • Can change to test parts of the grammar • Getting internal representation • Command line options: • -dump_grammar • -dump_states • -dump_tables • -dump

  37. Scanner integration
      import java_cup.runtime.*;
      %%
      %cup
      %eofval{
        return new Symbol(sym.EOF);
      %eofval}
      NUMBER=[0-9]+
      %%
      <YYINITIAL>"+" { return new Symbol(sym.PLUS); }
      <YYINITIAL>"-" { return new Symbol(sym.MINUS); }
      <YYINITIAL>"*" { return new Symbol(sym.MULT); }
      <YYINITIAL>"/" { return new Symbol(sym.DIV); }
      <YYINITIAL>"(" { return new Symbol(sym.LPAREN); }
      <YYINITIAL>")" { return new Symbol(sym.RPAREN); }
      <YYINITIAL>{NUMBER} { return new Symbol(sym.NUMBER, new Integer(yytext())); }
      <YYINITIAL>\n { }
      <YYINITIAL>. { }
      • sym is generated from the token declarations in the .cup file • Parser gets terminals from the scanner

  38. Recap • Package and import specifications and user code components • Symbol (terminal and non-terminal) lists • Define building-blocks of the grammar • Precedence declarations • May help resolve conflicts • The grammar • May introduce conflicts that have to be resolved

  39. Abstract syntax tree construction

  40. Assigning meaning • So far, only validation • Add Java code implementing semantic actions
      expr ::= expr PLUS expr
             | expr MINUS expr
             | expr MULT expr
             | expr DIV expr
             | MINUS expr %prec UMINUS
             | LPAREN expr RPAREN
             | NUMBER
             ;

  41. Assigning meaning
      non terminal Integer expr;
      expr ::= expr:e1 PLUS expr:e2
               {: RESULT = new Integer(e1.intValue() + e2.intValue()); :}
             | expr:e1 MINUS expr:e2
               {: RESULT = new Integer(e1.intValue() - e2.intValue()); :}
             | expr:e1 MULT expr:e2
               {: RESULT = new Integer(e1.intValue() * e2.intValue()); :}
             | expr:e1 DIV expr:e2
               {: RESULT = new Integer(e1.intValue() / e2.intValue()); :}
             | MINUS expr:e1
               {: RESULT = new Integer(0 - e1.intValue()); :} %prec UMINUS
             | LPAREN expr:e1 RPAREN
               {: RESULT = e1; :}
             | NUMBER:n
               {: RESULT = n; :}
             ;
      • Symbol labels used to name variables • RESULT names the left-hand side symbol

  42. Abstract Syntax Trees • More useful representation of syntax tree • Less clutter • Actual level of detail depends on your design • Basis for semantic analysis • Later annotated with various information • Type information • Computed values • Technically – a class hierarchy of abstract syntax tree nodes

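  43. Parse tree vs. AST • Input: 1 + ( 2 ) + ( 3 ) • [Figure: the parse tree, built from expr nodes over the tokens 1 + ( 2 ) + ( 3 ), next to the AST, which keeps only the + nodes and the constants 1, 2, 3]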

  44. AST hierarchy example • Base class: expr • Subclasses: int_const, plus, minus, times, divide

  45. AST construction • AST Nodes constructed during parsing • Stored in push-down stack • Bottom-up parser • Grammar rules annotated with actions for AST construction • When node is constructed all children available (already constructed) • Node (RESULT) pushed on stack

  46. AST construction
      expr ::= expr:e1 PLUS expr:e2
               {: RESULT = new plus(e1,e2); :}
             | LPAREN expr:e RPAREN
               {: RESULT = e; :}
             | INT_CONST:i
               {: RESULT = new int_const(…, i); :}
      • Input: 1 + ( 2 ) + ( 3 ) • Reduction steps: 1 + (2) + (3), then expr + (2) + (3), then expr + (expr) + (3), then expr + (3), then expr + (expr), then expr • Resulting AST: a plus node whose e1 is a plus node over int_const (val = 1) and int_const (val = 2), and whose e2 is int_const (val = 3)
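The stack discipline behind this construction can be simulated without a parser generator. The Java sketch below is a hand-written illustration, not CUP-generated code: the classes `Expr`, `IntConst`, and `Plus` are hypothetical counterparts of the expr, int_const, and plus nodes, the `eval` method is added only to make the result observable, and the sequence of pushes and reductions in `build` is written out by hand to mirror the reductions for 1 + (2) + (3).

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AstStackDemo {
    // Hypothetical AST hierarchy mirroring expr / int_const / plus.
    static abstract class Expr {
        abstract int eval();
    }

    static final class IntConst extends Expr {
        final int val;
        IntConst(int val) { this.val = val; }
        @Override int eval() { return val; }
        @Override public String toString() { return Integer.toString(val); }
    }

    static final class Plus extends Expr {
        final Expr e1, e2;
        Plus(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
        @Override int eval() { return e1.eval() + e2.eval(); }
        @Override public String toString() { return "plus(" + e1 + ", " + e2 + ")"; }
    }

    // Action for expr ::= expr:e1 PLUS expr:e2. Both children are already on
    // the stack when the rule is reduced; the new node replaces them.
    static void reducePlus(Deque<Expr> stack) {
        Expr e2 = stack.pop();
        Expr e1 = stack.pop();
        stack.push(new Plus(e1, e2));
    }

    // Hand-written replay of the reductions for 1 + (2) + (3).
    static Expr build() {
        Deque<Expr> stack = new ArrayDeque<>();
        stack.push(new IntConst(1));   // expr ::= INT_CONST for 1
        stack.push(new IntConst(2));   // expr ::= INT_CONST for 2; the parentheses add no node
        reducePlus(stack);             // expr ::= expr PLUS expr for 1 + (2)
        stack.push(new IntConst(3));   // expr ::= INT_CONST for 3; the parentheses add no node
        reducePlus(stack);             // expr ::= expr PLUS expr for (1 + (2)) + (3)
        return stack.pop();
    }

    public static void main(String[] args) {
        Expr ast = build();
        System.out.println(ast + " = " + ast.eval());
    }
}
```

Because the parser works bottom-up, every child node exists before its parent is created, which is why each action can simply pop its operands and push the new node.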

  47. Example of lists
      terminal Integer NUMBER;
      terminal PLUS, MINUS, MULT, DIV, LPAREN, RPAREN, SEMI;
      terminal UMINUS;
      non terminal Integer expr;
      non terminal expr_list, expr_part;
      precedence left PLUS, MINUS;
      precedence left DIV, MULT;
      precedence left UMINUS;
      expr_list ::= expr_list expr_part
                  | expr_part
                  ;
      expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI
                  ;
      expr ::= expr PLUS expr
             | expr MINUS expr
             | expr MULT expr
             | expr DIV expr
             | MINUS expr %prec UMINUS
             | LPAREN expr RPAREN
             | NUMBER
             ;
      • The println action is executed when e is shifted

  48. Next lecture: IR and Operational Semantics
