1 / 48

Compiler construction in4020 – lecture 3

Compiler construction in4020 – lecture 3. Koen Langendoen Delft University of Technology The Netherlands. program text. token description. scanner generator. lexical analysis. tokens. syntax analysis. AST. context handling. annotated AST. Summary of lecture 2.

carr
Download Presentation

Compiler construction in4020 – lecture 3

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Compiler constructionin4020 – lecture 3 Koen Langendoen Delft University of Technology The Netherlands

  2. program text token description scanner generator lexical analysis tokens syntax analysis AST context handling annotated AST Summary oflecture 2 • lexical analyzer generator • description  FSA • FSA construction • dotted items • character moves •  moves

  3. Quiz 2.24 Will the function identify() (to access a symbol table)still work if the hash function maps all identifiers onto the same number? 2.26 Tutor X insists that macro processing must be implemented as a separate phase between reading the program and the lexical analysis. Show mr. X wrong.

  4. program text lexical analysis tokens parser language syntax analysis generator grammar AST context handling annotated AST Overview • syntax analysis: tokens  AST • AST construction • by hand: recursive descent • automatic: top-down (LLgen), bottom-up (yacc)

  5. syn-tax: the way in which words are put together to form phrases, clauses, or sentences. Webster’s Dictionary Syntax analysis • parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together. • result: parse tree expression ‘-’ expression term ‘*’ term term factor ‘*’ factor factor term term identifier ‘*’ ‘c’ identifier identifier factor factor identifier constant ‘b’ ‘a’ ‘b’ ‘4’

  6. Syntax analysis • parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together. • result: parse tree expression ‘-’ expression term ‘*’ term term factor ‘*’ factor factor term term identifier ‘*’ ‘c’ identifier identifier factor factor identifier constant ‘b’ ‘a’ ‘b’ ‘4’

  7. Context free grammar • G = (VN, VT, S, P) • VN : set of Non-terminal symbols • VT : set of Terminal symbols • S : Start symbol (S  VN) • P : set of Production rules {N  } • VN VT =  • P = {N   | N  VN    (VN VT)*}

  8. expression EOF term rest_expression IDENTIFIER Top-down parsing • process tokens left to right • expression grammar input expression EOF expression  term rest_expression term  IDENTIFIER | ‘(’ expression ‘)’ rest_expression  ‘+’ expression |  • example expression input aap + ( noot + mies )

  9. rest_expression expression term rest_expr IDENT IDENT IDENT  • • Bottom-up parsing • process tokens left to right • expression grammar input expression EOF expression  term rest_expression term  IDENTIFIER | ‘(’ expression ‘)’ rest_expression  ‘+’ expression |  aap + ( noot + mies ) • • • • •

  10. 1 5 2 3 1 4 4 5 2 3 Comparison

  11. Recursive descent parsing • each rule N translates to a boolean function • return true if a terminal production of N was matched • return false otherwise (without consuming any token) • try alternatives of N in turn • a terminal symbol must match the current token • a non-terminal is matched by calling its routine input expression EOF int input(void) { return expression() && require(token(EOF)); }

  12. Recursive descent parsing expression  term rest_expression int expression(void) { return term() && require(rest_expression()); } term  IDENTIFIER | ‘(’ expression ‘)’ int term(void) { return token(IDENTIFIER) || token('(') && require(expression()) && require(token(')')); }

  13. Recursive descent parsing rest_expression  ‘+’ expression |  auxiliary functions • consume matched tokens • report syntax errors int rest_expression(void) { return token('+') && require(expression()) || 1; } int token(int tk) { if (Token.class != tk) return 0; get_next_token(); return 1; } int require(int found) { if (!found) error(); return 1; }

  14. Automatic top-down parsing • follow recursive descent scheme, but avoid interpretation overhead • for each rule and alternative determine the tokens it can start with: FIRST set • parsing scheme for rule N  A1 | A2 | … • if token  FIRST(N) thenERROR • if token  FIRST(A1) then parse A1 • if token  FIRST(A2) then parse A2 • …

  15. Exercise (7 min.) • design an algorithm to compute the FIRST sets of all non-terminals in a context free grammar. • hint: consider the types of rules • alternatives • composition • empty productions input expression EOF expression  term rest_expression term  IDENTIFIER | ‘(’ expression ‘)’ rest_expression  ‘+’ expression | 

  16. Answers

  17. Answers (Fig 2.58, page 122) • N w (w  VT ) FIRST(N) ={w} • N  A1 | A2 | … FIRST(N) = FIRST(Ai ) • N  FIRST(N) = {} • N  A  FIRST(N) =FIRST(A) ,   FIRST(A) FIRST(N) =FIRST(A) \ {}FIRST() , otherwise closure algorithm

  18. Break

  19. Predictive parsing • similar to recursive descent, but no back-tracking • functions “know” what they are doing input  expression EOF FIRST(expression) = {IDENT, ‘(‘} void input(void) {switch (Token.class) {case IDENT: case '(':expression(); token(EOF);break;default: error(); } } void token(int tk) { if (Token.class != tk) error(); get_next_token(); }

  20. Predictive parsing expression  term rest_expression FIRST(term) = {IDENT, ‘(‘} void expression(void) {switch (Token.class) {case IDENT: case '(':term(); rest_expression(); break;default: error(); } } term  IDENTIFIER | ‘(’ expression ‘)’ void term(void) {switch (Token.class) {case IDENT: token(IDENT); break;case '(': token('('); expression(); token(')');break;default: error(); } }

  21. Predictive parsing rest_expression  ‘+’ expression | FIRST(rest_expr) = {‘+’, } • FIRST() = {} • check nothing? • NO: token  FOLLOW(rest_expr) void rest_expression(void) {switch (Token.class) {case '+': token('+'); expression(); break;case EOF: case ')': break;default: error(); } } FOLLOW(rest_expr) = {EOF, ‘)’}

  22. Limitations of LL(1) parsers • FIRST/FIRST conflict term  IDENTIFIER | IDENTIFIER ‘[‘ expression ‘]’ | ‘(’ expression ‘)’ • FIRST/FOLLOW conflict S  A ‘a’ ‘b’ A  ‘a’ |  • left recursion expression  expression ‘-’ term | term FIRST(A) = {‘a’} = FOLLOW(A)

  23. Making grammars LL(1) • manual labour • rewrite grammar • adjust semantic actions • three rewrite methods • left factoring • substitution • left-recursion removal

  24. Left factoring term  IDENTIFIER | IDENTIFIER ‘[‘ expression ‘]’ • factor out common prefix term  IDENTIFIERafter_identifier after_identifier | ‘[‘ expression ‘]’ ‘[’ FOLLOW(after_identifier)

  25. Substitution S  A ‘a’ ‘b’ A  ‘a’ |  • replace non-terminal by its alternative S  ‘a’ ‘a’ ‘b’ | ‘a’ ‘b’

  26. N   Left-recursion removal N  N  |  • replace by N   M M   M |  • example expression  expression ‘-’ term | term           ... expression  term tail tail ‘-’ term tail | 

  27. Exercise (7 min.) • make the following grammar LL(1) expression  expression ‘+’ term | expression ‘-’ term | term term  term ‘*’ factor | term ‘/’ factor | factor factor ‘(‘ expression ‘)’ | func-call | identifier | constant func-call  identifier ‘(‘ expr-list? ‘)’ expr-list  expression (‘,’ expression)* • and what about S  if E then S (else S)?

  28. Answers

  29. Answers • substitution F ‘(‘ E ‘)’ | ID ‘(‘ expr-list? ‘)’ | ID | constant • left factoring E E( ‘+’ |‘-’ )T | T T  T ( ‘*’| ‘/’ ) F | F F ‘(‘ E ‘)’ | ID ( ‘(‘ expr-list? ‘)’ )? | constant • left recursion removal E T(( ‘+’ |‘-’ )T )* T  F(( ‘*’| ‘/’ ) F )* • if-then-else grammar is ambiguous

  30. program text lexical analysis tokens parser language syntax analysis generator grammar AST context handling LL(1) push-down automaton annotated AST automatic generation

  31. LL(1) push-down automaton • stack right-hand side of production

  32. LL(1) push-down automaton input prediction stack aap + ( noot + mies ) EOF input

  33. LL(1) push-down automaton input prediction stack aap + ( noot + mies ) EOF input replace non-terminal by transition entry

  34. LL(1) push-down automaton expression EOF prediction stack aap + ( noot + mies ) EOF input

  35. LL(1) push-down automaton expression EOF prediction stack aap + ( noot + mies ) EOF input replace non-terminal by transition entry

  36. LL(1) push-down automaton term rest-expr EOF prediction stack aap + ( noot + mies ) EOF input

  37. LL(1) push-down automaton term rest-expr EOF prediction stack aap + ( noot + mies ) EOF input replace non-terminal by transition entry

  38. LL(1) push-down automaton IDENT rest-expr EOF prediction stack aap + ( noot + mies ) EOF input

  39. LL(1) push-down automaton IDENT rest-expr EOF prediction stack pop matching token aap + ( noot + mies ) EOF input

  40. LL(1) push-down automaton rest-expr EOF prediction stack + ( noot + mies ) EOF input

  41. LL(1) push-down automaton rest-expr EOF prediction stack + ( noot + mies ) EOF input

  42. LL(1) push-down automaton + expression EOF prediction stack + ( noot + mies ) EOF input

  43. LL(1) push-down automaton + expression EOF prediction stack + ( noot + mies ) EOF input

  44. LL(1) push-down automaton expression EOF prediction stack ( noot + mies ) EOF input

  45. LL(1) push-down automaton expression EOF prediction stack ( noot + mies ) EOF input

  46. LLgen • top-down parser generator • to be used in assignment #1 • discussed in lecture 5

  47. program text lexical analysis tokens parser language syntax analysis generator grammar AST context handling annotated AST Summary • syntax analysis: tokens  AST • top-down parsing • recursive descent • push-down automaton • making grammars LL(1)

  48. Homework • study sections: • 1.10 closure algorithm • 2.2.4.6 error handling in LL(1) parsers • print handout for next week [blackboard] • find a partner for the “practicum” • register your group • send e-mail to koen@pds.twi.tudelft.nl

More Related