
Lesson 10

Lesson 10. CDT301 – Compiler Theory, Spring 2011. Teacher: Linus Källberg. Outline: Flex, Bison, abstract syntax trees.



  1. Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg

  2. Outline • Flex • Bison • Abstract syntax trees

  3. Flex

  4. Flex • Tool for automatic generation of scanners • Open-source version of Lex • Takes regular expressions as input • Outputs a C (or C++) file for the scanner

  5. Flex: mylexer.l (regexps) → Flex → mylexer.c (int yylex() …) → C compiler → mylexer.obj (01101000110101010…)

  6. The input file to Flex
       Definitions
       %%
       Rules
       %%
       User code

  7. The definitions section
     • Macro definitions (a macro is referenced by writing its name in braces):
         Specify a letter:       letter     [A-Za-z]
         Specify a delimiter:    delimiter  [ ,:;.]
         Specify a digit:        digit      [0-9]
         Specify an identifier:  id         {letter}({letter}|{digit})*

  8. The definitions section
     • User code:
         %{
         #include <stdio.h>
         int a_nice_global_variable = 0;
         int my_favourite_function(void) { return 42; }
         %}

  9. The rules section
     • Rule = regexp + C code
     • Longest matching pattern is used
     • If two equally long patterns match, the first one in the file is used
     • Examples:
         =|>=?|<(=|>)?   { return RELOP; }
         {id}            { return ID; }
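
The two matching rules on this slide (longest match wins; on a tie, the earlier rule wins) can be sketched by hand in C. This is only an illustration of the selection policy, not the code Flex generates; the token codes, the `if` keyword rule, and the function names are assumptions made for the example.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch. */
enum { TOK_NONE = 0, TOK_IF, TOK_RELOP, TOK_ID };

/* Each matcher returns the length of the prefix of s it accepts, or 0. */
static size_t match_if(const char *s) {
    return (s[0] == 'i' && s[1] == 'f') ? 2 : 0;
}

/* Mirrors the pattern  =|>=?|<(=|>)?  */
static size_t match_relop(const char *s) {
    if (s[0] == '=') return 1;
    if (s[0] == '>') return (s[1] == '=') ? 2 : 1;
    if (s[0] == '<') return (s[1] == '=' || s[1] == '>') ? 2 : 1;
    return 0;
}

/* Mirrors an identifier: a letter followed by letters or digits. */
static size_t match_id(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

/* Tries every rule in file order and keeps the longest match.
   A later rule replaces the current best only if it is strictly longer,
   so on a tie the earlier rule wins. */
int next_token(const char *s, size_t *len) {
    static const struct { size_t (*match)(const char *); int tok; } rules[] = {
        { match_if, TOK_IF }, { match_relop, TOK_RELOP }, { match_id, TOK_ID },
    };
    int best = TOK_NONE;
    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(s);
        if (n > *len) { *len = n; best = rules[i].tok; }
    }
    return best;
}
```

With these rules, the input `if` is reported as the keyword (both rules match two characters, the keyword rule comes first), while `iffy` is reported as an identifier because that match is longer.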

  10. The regexp language of Flex
        ?    Previous regexp is optional
        {}   Macro expansion (defined in the definitions section)
        .    Matches any character that is not end of line
        $    Matches the end of a line
        ^    Matches the beginning of a line
        []   Matches any enclosed character

  11. The [] syntax
      • Similar to | but more powerful
      • Example:
          digit [0123456789]
        is the same as
          digit 0|1|2|3|4|5|6|7|8|9
      • Special characters inside the brackets: - (range) and ^ (negation)
          digit     [0-9]
          letter    [A-Za-z]
          non_digit [^0-9]

  12. The user code section • Only C code valid here • Will be copied unchanged to the generated C file

  13. The generated scanner
      • By default, a function called yylex() is defined
      • Works similarly to your GetNextToken() from lab 1
      • The name can be changed with options
      • Some globals are defined as well (can be changed into local variables with options):
          yyin       The file to read from
          yytext     The matched lexeme (char*)
          yyleng     The length of yytext
          yylineno   Line number of the match

  14. The yywrap() function • Called upon end-of-file • Should be supplied by the user • Suppressed with %option noyywrap or --noyywrap
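
A user-supplied yywrap() returns nonzero when there is no more input and zero when it has pointed yyin at a new file. The sketch below assumes a hypothetical queue of file names (`pending_files`, `set_pending_files`); `yyin` is defined here only so the fragment compiles on its own, since in a real scanner the Flex-generated file provides it.

```c
#include <stdio.h>

/* Stand-in for the yyin that a Flex-generated scanner defines itself. */
FILE *yyin;

/* Hypothetical NULL-terminated list of input files still to be scanned. */
static const char **pending_files;

void set_pending_files(const char **files) { pending_files = files; }

/* Called by the scanner at end-of-file. */
int yywrap(void) {
    while (pending_files && *pending_files) {
        FILE *next = fopen(*pending_files++, "r");
        if (next) {
            if (yyin) fclose(yyin);
            yyin = next;
            return 0;   /* more input available: scanning continues */
        }
    }
    return 1;           /* no more input: yylex() reports end of input */
}
```

If only a single input is ever scanned, a one-line `int yywrap(void) { return 1; }` or the noyywrap option is enough.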

  15. Scanner states in Flex
      • Affect which tokens are recognized
      • Example from the language ALF:
          { fref 32 DEADC0DE }     <- Identifier
          { hex_val DEADC0DE }     <- Hex constant

  16. Scanner states in Flex
      • Declare state:
          %x READ_HEX
      • Use the state to make rules conditional:
          hex_val                  { BEGIN(READ_HEX); return HEX_VAL_KW; }
          [a-zA-Z_][a-zA-Z0-9_]*   { return ID; }
          <READ_HEX>[0-9a-fA-F]+   { BEGIN(INITIAL); return NUM; }
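
To see what the READ_HEX state changes, here is a hand-written C sketch of the same three rules. It is not Flex output: the token codes, `scan_next`, and the choice to skip whitespace in every state and to return `T_OTHER` for any other character are assumptions made to keep the example small.

```c
#include <ctype.h>
#include <string.h>

enum { INITIAL, READ_HEX };                           /* scanner states */
enum { T_EOF, T_HEX_VAL_KW, T_ID, T_NUM, T_OTHER };   /* hypothetical token codes */

static int state = INITIAL;

/* Returns the next token starting at *p and advances *p past it. */
int scan_next(const char **p) {
    const char *s = *p;
    size_t n = 0;

    while (isspace((unsigned char)*s)) s++;
    if (*s == '\0') { *p = s; return T_EOF; }

    if (state == READ_HEX) {
        /* Only the <READ_HEX> rule is active: [0-9a-fA-F]+ */
        while (isxdigit((unsigned char)s[n])) n++;
        if (n > 0) { state = INITIAL; *p = s + n; return T_NUM; }
    } else if (isalpha((unsigned char)s[0]) || s[0] == '_') {
        /* [a-zA-Z_][a-zA-Z0-9_]* ; the keyword rule wins on an exact match */
        while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
        *p = s + n;
        if (n == 7 && strncmp(s, "hex_val", 7) == 0) {
            state = READ_HEX;
            return T_HEX_VAL_KW;
        }
        return T_ID;
    }

    *p = s + 1;   /* anything else: consume one character */
    return T_OTHER;
}
```

Scanning `{ hex_val DEADC0DE }` yields the keyword followed by a number, whereas the same lexeme `DEADC0DE` after `fref` is scanned in the INITIAL state and comes out as an identifier.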

  17. Online resources http://flex.sourceforge.net/manual/index.html

  18. Bison

  19. Bison • Tool for automatic generation of parsers • Open-source alternative to Yacc • Takes an SDT scheme as input • Outputs C (or C++) source code for an LALR parser • Commonly used together with Flex

  20. Bison: myparser.y (SDT scheme) → Bison → myparser.c (int yyparse() …) and myparser.h (token definitions) → C compiler → myparser.obj (01101000110101010…)

  21. The input file to Bison
        Definitions
        %%
        SDT scheme
        %%
        User code

  22. Definitions section • Define tokens • Define operator precedence • Define operator associativity • Define the types of grammar symbol attributes • Write C code between %{ and %} • Issue certain commands to Bison

  23. Token definition
      • Normal case:
          %token IDENTIFIER
          %token WHILE
      • Token, precedence, associativity, and type:
          %left <Operator> RELOP
          %left <Operator> MINUSOP PLUSOP
          %right <Operator> NOTOP
      • Enables use of ambiguous grammars!

  24. Defining types
      • Just enter the type inside <> before the list of tokens:
          %left <Operator> RELOP
          %left <Operator> MULOP
          %right <Operator> NOTOP UNOP
          %token <String> ID STRING
      • Or the same for non-terminals:
          %type <Node> stmnt expr actuals exprs

  25. The variable yylval
      • Used by the lexical analyzer to store token attributes
      • Default type is int
      • May be given other types using %union:
          %union {
              int Operator;
              char *String;
              NODE_TYPE Node;
          }
      • The type (member name) is then used like this:
          %token <String> ID STRING
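
On the C side, the %union becomes the type of yylval, and the scanner fills in the member that matches the token it returns. The sketch below writes that union out by hand so it compiles without Bison; the `NODE_TYPE` typedef, the token numbers, and the two helper functions are assumptions for illustration only.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed definition; the slide only names the type NODE_TYPE. */
typedef struct Node *NODE_TYPE;

/* Roughly the type Bison derives from the %union on the slide. */
typedef union {
    int Operator;
    char *String;
    NODE_TYPE Node;
} YYSTYPE;

YYSTYPE yylval;

/* Hypothetical token numbers; Bison assigns the real ones. */
enum { ID = 258, RELOP = 259 };

/* What a scanner action for an identifier might do: copy the lexeme
   into the String member and return the token code. */
int scan_identifier(const char *lexeme) {
    size_t n = strlen(lexeme) + 1;
    yylval.String = malloc(n);
    if (yylval.String != NULL) memcpy(yylval.String, lexeme, n);
    return ID;
}

/* What a scanner action for a relational operator might do. */
int scan_relop(int which) {
    yylval.Operator = which;
    return RELOP;
}
```

Because yylval is a union, only the member declared for the returned token (for example `<String>` for ID) holds a meaningful value at any given time.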

  26. Code provided by the user • yyerror(char* msg) • Function called on syntax errors • yylex() • Function called to get the next token
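
A minimal yyerror() could look like the sketch below. The slide declares the parameter as `char*`; `const char *` is used here as an assumption, and the error counter is a hypothetical addition so a driver can tell whether parsing succeeded.

```c
#include <stdio.h>

/* Hypothetical counter, not something Bison requires. */
int syntax_error_count = 0;

/* Called by the generated parser when it detects a syntax error. */
void yyerror(const char *msg) {
    fprintf(stderr, "syntax error: %s\n", msg);
    syntax_error_count++;
}
```

When Flex is used together with Bison, yylex() normally comes from the Flex-generated scanner, so only yyerror() has to be written by hand.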

  27. Options to Bison • Given on the command line or in the grammar file • --defines or %defines: Output a C header file with definitions useful to a scanner: tokens (#defines) and the type of yylval • %error-verbose: More detailed error messages • --name-prefix or %name-prefix: Change the default “yy” prefix on all names • %define api.pure: Do not use globals • --verbose or %verbose: Write detailed information to an extra output file

  28. Translation scheme section
        decl   : BASIC_TYPE idents ';' ;
        idents : idents ',' ident
               | ident ;
        ident  : ID ;

  29. Semantic actions
      • Written in C
      • Executed when the production is used in a reduction
      • $$, $1, $2, etc. refer to the attributes of the grammar symbols
      • Can be used as regular C variables
      • $$ refers to the attribute of the head, $1 to the attribute of the first symbol in the body, etc.
          E : E '+' T   { $$ = $1 + $3; }
            ;
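
One way to picture `$$`, `$1`, and `$3` is as slots on a value stack that runs in parallel with the parser's state stack. The following hand-written sketch mimics a reduction by `E : E '+' T`; the stack layout and function names are assumptions for illustration and do not reproduce Bison's generated code.

```c
#define MAX_DEPTH 64

static int values[MAX_DEPTH];   /* attribute of each symbol on the parse stack */
static int depth = 0;

/* Pushes the attribute of a shifted symbol; returns -1 if the stack is full. */
int push_value(int v) {
    if (depth >= MAX_DEPTH) return -1;
    values[depth++] = v;
    return 0;
}

/* Reduction by  E : E '+' T  { $$ = $1 + $3; }  */
int reduce_add(void) {
    if (depth < 3) return -1;
    int *rhs = &values[depth - 3];   /* rhs[0] is $1, rhs[1] is $2, rhs[2] is $3 */
    int result = rhs[0] + rhs[2];    /* $$ = $1 + $3 */
    depth -= 3;                      /* pop the three body symbols */
    values[depth++] = result;        /* push the attribute of the head */
    return 0;
}

int top_value(void) { return depth > 0 ? values[depth - 1] : 0; }
int stack_depth(void) { return depth; }
```

After pushing the attributes for `2`, `'+'`, and `3` and calling `reduce_add()`, the stack holds the single value 5, which is the attribute of the new `E`.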

  30. Using ambiguous grammars in Bison • Default actions: • Reduce/reduce: choose first rule in file • Shift/reduce: always shift • With explicit precedence and associativity: • Shift/reduce: compare the precedence and associativity of the rule with those of the lookahead token

  31. The %expect declaration • To suppress shift/reduce warnings: %expect n where n is the exact number of conflicts

  32. Contextual precedence
      • Same token might have different precedence depending on context:
          expr → expr - expr
               | expr * expr
               | - expr
               | id
      • Example configuration:
          Stack        Input
          … - expr     * expr …

  33. Contextual precedence
      • Define dummy token:
          %left '-'
          %left '*'
          %left UMINUS
      • Use the %prec modifier:
          expr → - expr %prec UMINUS

  34. Examples of parser configurations
        Stack                  Input     Action
        … if (cond) stmt       else …    shift
        … expr + expr          * …       shift
        … expr * expr          + …       reduce by expr → expr * expr
        … expr * expr          * …       reduce by expr → expr * expr
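
The decisions in these configurations follow from comparing the precedence of the rule on the stack with that of the lookahead token. The function below is a small sketch of that comparison; the numeric precedence levels (0 meaning "none declared", 1 for '+', 2 for '*') and all names are assumptions, and non-associative operators are left out.

```c
typedef enum { ASSOC_LEFT, ASSOC_RIGHT } Assoc;
typedef enum { ACT_SHIFT, ACT_REDUCE } Action;

/* Resolves a shift/reduce conflict.
   rule_prec: precedence of the rule that could be reduced (0 = none declared)
   tok_prec:  precedence of the lookahead token           (0 = none declared)
   tok_assoc: associativity of the lookahead token */
Action resolve(int rule_prec, int tok_prec, Assoc tok_assoc) {
    if (rule_prec == 0 || tok_prec == 0) return ACT_SHIFT;   /* default: shift */
    if (tok_prec > rule_prec) return ACT_SHIFT;              /* token binds tighter */
    if (tok_prec < rule_prec) return ACT_REDUCE;             /* rule binds tighter */
    /* Equal precedence: left-associative reduces, right-associative shifts. */
    return tok_assoc == ASSOC_LEFT ? ACT_REDUCE : ACT_SHIFT;
}
```

With '+' at level 1 and '*' at level 2, both left-associative, `resolve(1, 2, ASSOC_LEFT)` shifts (the `expr + expr` / `*` row) and `resolve(2, 2, ASSOC_LEFT)` reduces (the `expr * expr` / `*` row). The dangling-else row has no declared precedence, so the default shift applies.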

  35. Online resources http://www.gnu.org/software/bison/manual/html_node/index.html

  36. Abstract syntax trees

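  37. Abstract syntax trees • “AST” or just “syntax tree” [Figure: a parse tree with interior nodes E next to the corresponding syntax tree, for an expression built from a, 5, and b with the operators + and *]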

  38. Syntax trees vs. parse trees
      Parse trees:
      • Interior nodes are nonterminals, leaves are terminals
      • Rarely constructed as an explicit data structure
      • Represents the concrete syntax
      Syntax trees:
      • Interior nodes are “operators”, leaves are operands
      • Commonly constructed as an explicit data structure
      • Represents the abstract syntax

  39. Why syntax trees? • Simplifies subsequent analyses • Independent of the parsing strategy • Makes it easier to add new analysis passes without having to modify the parser • More compact representation than parse trees

  40. Syntax tree example
        if (a < 1)
            b = 2 + 3;
        else {
            c = d * 4;
            e(f, 5);
        }
      [Figure: the syntax tree for this statement, rooted at an if node; statement lists are terminated by null]

  41. Exercise (1) • Draw an abstract syntax tree for the statement while (i < 100) { x = 2 * x; i = i + 1; }

  42. Constructing a syntax tree in Bison
        expr : expr '+' expr   { $$ = createOpNode($1, '+', $3); }
             | expr '*' expr   { $$ = createOpNode($1, '*', $3); }
             | ID              { $$ = createIdNode($1.name); }
             ;
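
The slide does not show createOpNode() and createIdNode(). A minimal sketch of what they might look like follows; the `Node` layout, the `NodeKind` enum, and `freeTree()` are assumptions, and only the two function names and their argument order are taken from the grammar above.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { NODE_OP, NODE_ID } NodeKind;

typedef struct Node {
    NodeKind kind;
    char op;                     /* operator character, for NODE_OP */
    char *name;                  /* identifier name, for NODE_ID */
    struct Node *left, *right;   /* operands, for NODE_OP */
} Node;

/* Interior node: an operator with two operand subtrees. */
Node *createOpNode(Node *left, char op, Node *right) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    n->kind = NODE_OP;
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

/* Leaf node: an identifier; the name is copied so the node owns it. */
Node *createIdNode(const char *name) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    n->kind = NODE_ID;
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL) { free(n); return NULL; }
    strcpy(n->name, name);
    return n;
}

/* Releases a whole tree, children first. */
void freeTree(Node *n) {
    if (n == NULL) return;
    freeTree(n->left);
    freeTree(n->right);
    free(n->name);
    free(n);
}
```

Because each action only links already-built subtrees, parsing `a + b * c` bottom-up produces a `+` node whose right child is the `*` node, provided the precedence declarations make `*` bind tighter.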

  43. Constructing a syntax tree in Bison
        stmt  : RETURN expr ';'   { $$ = mReturn($2, $1); }
              ;
        stmts : stmts stmt        { $$ = connectStmts($1, $2); }
              |                   { $$ = NULL; }
              ;
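
Since `stmts` is left-recursive and starts out as NULL, connectStmts() has to append one statement to a possibly empty list. The sketch below assumes statements are chained through a `next` pointer; the `Stmt` type and its `id` field are placeholders, as the slide does not show the real node type.

```c
#include <stddef.h>

/* Placeholder statement node; a real compiler would store more here. */
typedef struct Stmt {
    int id;
    struct Stmt *next;
} Stmt;

/* Appends stmt to the end of list and returns the head of the list.
   An empty list is represented by NULL, matching the empty production. */
Stmt *connectStmts(Stmt *list, Stmt *stmt) {
    if (list == NULL) return stmt;
    Stmt *tail = list;
    while (tail->next != NULL) tail = tail->next;
    tail->next = stmt;
    return list;
}
```

Walking to the tail on every append is linear in the list length; keeping a tail pointer, or building the list in reverse and flipping it once at the end, are common alternatives for long statement lists.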

  44. Conclusion • Flex generates C source code for a scanner given a set of regular expressions • Bison generates C source code for a bottom-up parser given a syntax-directed translation scheme • Building syntax trees simplifies subsequent analyses of the program • Syntax trees can be built in semantic actions

  45. Next time • Syntax-directed definitions and translation schemes • Semantic analysis and type analysis
