More yacc
What is yacc
• A tool that produces a parser from a grammar
• YACC (Yet Another Compiler Compiler) is a program that takes an LALR(1) grammar and produces the source code of a syntactic analyzer (parser) for the language that grammar describes
• Input is a grammar (rules) and the actions to take when each rule is recognized
• Output is a C program (y.tab.c) and, optionally, a header file of token definitions (y.tab.h, generated with the -d option)
Works with lex
• Lex is a scanner generator
• Input is a description of patterns (regular expressions) and the actions to perform when each pattern matches
• Output is a C program (lex.yy.c) containing a function yylex() which, each time it is called, matches patterns in the input and performs the corresponding actions
• Typically, the generated scanner performs lexical analysis and hands tokens to the (YACC-generated) parser
Structure of a YACC File
• Has the same three-part structure as a Lex file
• The parts are separated by lines containing %%
• The three parts correspond to those of Lex:
  • definitions section
  • rules section
  • code section (user subroutines, copied directly into the generated program)
Definition Section
• Declare here the tokens used in the grammar and the types of the values kept on the stack
• Single-character tokens written as quoted character literals, such as '=' or '+', need not be declared
• Literal C code (for example #include lines and declarations) can be included in this section by enclosing it between %{ and %}
Declaring Tokens
• The tokens that are used in the grammar must be declared
• Include lines like the following in the definitions section:
    %token CHARSTRING INT IDENTIFIER
    %token LPAREN RPAREN
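As a rough illustration of why quoted single-character tokens need no declaration, the C sketch below mimics the token codes YACC might write into y.tab.h for the declarations above. The enum values, the declared_names table, and describe_token are hypothetical; actual numbering depends on the YACC implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch of token codes for
       %token CHARSTRING INT IDENTIFIER
       %token LPAREN RPAREN
   Exact numbers vary by implementation (Bison starts at 258); what matters
   is that they all lie above the range of single-character codes. */
enum yytokentype {
    CHARSTRING = 258,
    INT,
    IDENTIFIER,
    LPAREN,
    RPAREN
};

/* Hypothetical table of printable names, indexed by (code - CHARSTRING). */
static const char *const declared_names[] = {
    "CHARSTRING", "INT", "IDENTIFIER", "LPAREN", "RPAREN"
};

/* Describes any token code a scanner might return. A quoted token such as
   '=' or '+' needs no %token line: its code is simply the character. */
const char *describe_token(int code) {
    static char buf[4];              /* room for "'c'" plus the terminator */
    assert(code >= 0);
    if (code == 0)
        return "end of input";       /* yylex() returns 0 at end of input */
    if (code >= CHARSTRING && code <= RPAREN)
        return declared_names[code - CHARSTRING];
    if (code < 256) {
        snprintf(buf, sizeof buf, "'%c'", code);
        return buf;
    }
    return "unknown token";
}
```

For example, describe_token(LPAREN) yields "LPAREN", while describe_token('+') yields "'+'" without '+' ever being declared.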
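The Rules Section
• The rules of the grammar are placed here
• Here is an example of the basic syntax. The grammar rule
    Expr → INTEGER + INTEGER | INTEGER - INTEGER
  is written as the YACC grammar definition
    expr : INTEGER '+' INTEGER {action}
         | INTEGER '-' INTEGER {action}
         ;
  where {action} stands for the C code to run when that alternative is recognized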
YACC Actions
• Similar to Lex, actions can be defined that are performed whenever a production is recognized in the stream of tokens (that is, when the parser reduces by it)
• An action is written in braces after the production it belongs to
• Every symbol in the grammar can have a value, and actions usually need to access those values
• They are accessed through the YACC stack
Accessing the Stack
• Since YACC generates an LR parser, it pushes the symbols it reads, along with their values, onto a stack until it is ready to reduce
• To access these values in an action, write a dollar sign followed by a number: $1 is the value of the first symbol on the right-hand side of the production, $2 the second, and so on
Accessing the Stack
    expr : INTEGER '+' INTEGER   { $$ = $1 + $3; }
         | INTEGER '-' INTEGER   { $$ = $1 - $3; }
         ;
• $$ refers to the value of the left-hand nonterminal (here, expr)
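To make the $ notation concrete, here is a minimal C sketch of a value stack and of the reduction for the rule above. The names value_stack, push, dollar, and reduce_binary are hypothetical; a real YACC parser keeps this stack internally and translates each $k into a similar offset from the top of the stack.

```c
#include <assert.h>

#define STACK_MAX 100

/* Hypothetical value stack, standing in for the one the parser keeps
   alongside its state stack. */
static int value_stack[STACK_MAX];
static int top = 0;                  /* number of values on the stack */

void push(int v) {
    assert(top < STACK_MAX);
    value_stack[top++] = v;
}

/* $k for a production whose right-hand side has len symbols:
   $1 is the deepest of those len values, $len is the top of the stack. */
static int dollar(int k, int len) {
    assert(len <= top && k >= 1 && k <= len);
    return value_stack[top - len + k - 1];
}

/* Reduce by  expr : INTEGER '+' INTEGER  or  expr : INTEGER '-' INTEGER.
   The three right-hand-side values are replaced by one value ($$),
   which becomes the value of expr. */
void reduce_binary(int op) {
    int lhs = (op == '+') ? dollar(1, 3) + dollar(3, 3)    /* $$ = $1 + $3 */
                          : dollar(1, 3) - dollar(3, 3);   /* $$ = $1 - $3 */
    top -= 3;                        /* pop INTEGER, operator, INTEGER */
    push(lhs);                       /* push $$ */
}
```

For the input 7 - 2, the parser would have pushed 7, the operator token's (unused) value, and 2; reduce_binary('-') then leaves the single value 5 on the stack as the value of expr.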
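Tokens and values come from lex
• The YACC-generated parser function yyparse() calls the Lex-generated function yylex() whenever it needs the next token
• yylex() returns the token code; the token's value is passed in the global variable yylval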
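The calling pattern can be sketched in C as follows. This is a deliberately simplified, hypothetical stand-in: yylex replays a fixed list of tokens instead of scanning text, and parse_expr hard-codes the single expr rule rather than using LALR(1) tables. Only the interface (token code as the return value, value in yylval, 0 at end of input) reflects what YACC and Lex actually do.

```c
#include <assert.h>
#include <stddef.h>

enum { INTEGER = 258 };              /* hypothetical token code */

int yylval;                          /* scanner-to-parser value channel */

/* Hypothetical canned token stream, so the parser side can be shown alone. */
struct canned { int token; int value; };
static const struct canned *feed;
static size_t feed_len, feed_pos;

/* Stand-in for the Lex-generated scanner. */
int yylex(void) {
    if (feed_pos == feed_len)
        return 0;                            /* 0 = end of input */
    yylval = feed[feed_pos].value;           /* value goes through yylval */
    return feed[feed_pos++].token;           /* code is the return value */
}

/* Simplified stand-in for yyparse() handling only
       expr : INTEGER '+' INTEGER { $$ = $1 + $3; }
            | INTEGER '-' INTEGER { $$ = $1 - $3; }
   It asks yylex() for each token, shifts the token and its yylval,
   and reduces once the right-hand side is complete.
   Returns 0 on success and 1 on a syntax error, like yyparse(). */
int parse_expr(const struct canned *toks, size_t n_toks, int *result) {
    int sym[3], val[3];                      /* symbol and value stacks */
    int sp = 0;
    int tok;
    assert(toks != NULL || n_toks == 0);
    feed = toks;
    feed_len = n_toks;
    feed_pos = 0;
    while ((tok = yylex()) != 0) {
        int ok = (sp == 1) ? (tok == '+' || tok == '-') : (tok == INTEGER);
        if (sp == 3 || !ok)
            return 1;                        /* unexpected token */
        sym[sp] = tok;
        val[sp] = yylval;                    /* shift token and its value */
        sp++;
    }
    if (sp != 3)
        return 1;                            /* input ended too early */
    /* Reduce: $1 is val[0], $3 is val[2]; the result plays the role of $$. */
    *result = (sym[1] == '+') ? val[0] + val[2] : val[0] - val[2];
    return 0;
}
```

Feeding it the tokens INTEGER(40), '+', INTEGER(2) produces 42, while two INTEGER tokens in a row are rejected as a syntax error.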
Revisiting Lex
• The Lex file has to be modified in two main places to work with the YACC parser
• In the definitions section, include this statement (inside a %{ … %} block):
    #include "y.tab.h"
• y.tab.h is a header file generated by YACC (when run with the -d option); it defines the token codes and declares yylval
• The actions of the rules need to be changed too
Revisiting Lex Actions
• For tokens that carry a value, assign that value to yylval; YACC reads the value from that variable
• Include a return statement for the token name (the same name that is declared at the top of the YACC file):
    if            { return IF; }
    [1-9][0-9]*   { yylval = atoi(yytext); return INTEGER; }
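For readers without Lex at hand, the two rules above can be imitated by a hand-written C function. This is only a sketch under assumptions: the token codes, scan_string, and reading from a string are hypothetical, and returning unmatched characters as themselves is an extra convention added here (plain Lex would echo them to the output instead).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum { IF = 258, INTEGER };          /* hypothetical codes, as if from y.tab.h */

int yylval;                          /* the value YACC reads after each token */
static const char *scan_pos;         /* hypothetical: scan from a string */

void scan_string(const char *s) { scan_pos = s; }

/* Hand-written imitation of the Lex rules
       if             { return IF; }
       [1-9][0-9]*    { yylval = atoi(yytext); return INTEGER; }
   plus whitespace skipping. Any other character is returned as itself
   (an added convention), and 0 signals end of input. */
int yylex(void) {
    assert(scan_pos != NULL);        /* scan_string() must be called first */
    while (isspace((unsigned char)*scan_pos))
        scan_pos++;
    if (*scan_pos == '\0')
        return 0;
    if (strncmp(scan_pos, "if", 2) == 0) {
        scan_pos += 2;
        return IF;
    }
    if (*scan_pos >= '1' && *scan_pos <= '9') {
        char yytext[16];             /* matched text; this sketch caps its length */
        size_t len = 0;
        while (len < sizeof yytext - 1 && isdigit((unsigned char)scan_pos[len])) {
            yytext[len] = scan_pos[len];
            len++;
        }
        yytext[len] = '\0';
        scan_pos += len;
        yylval = atoi(yytext);       /* same action as the Lex rule */
        return INTEGER;
    }
    return (unsigned char)*scan_pos++;
}
```

After scan_string("if 42"), successive calls return IF, then INTEGER with yylval set to 42, then 0.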
The %union Declaration
• Different tokens have values of different data types
• INTEGER tokens carry ints, FLOAT tokens carry floats, CHARSTRING tokens carry char *, and IDENTIFIER tokens carry pointers to the identifier's entry in the symbol table
• %union lists these possible types (it becomes the type of yylval and of the value stack), so the parser can use the right data type for each token
The %union Declaration
YACC definitions section:
    %union {
        int   intValue;
        float floatValue;
    }
    %token <intValue>   INTEGER
    %token <floatValue> FLOAT
Lex rules section:
    …   { yylval.intValue = atoi(yytext); return INTEGER; }
    …   { yylval.floatValue = atof(yytext); return FLOAT; }
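In C terms, the %union above becomes the type of yylval and of each value-stack entry. The sketch below writes that type out by hand under the name YYSTYPE (the name YACC and Bison use for it) and turns the two Lex actions into plain functions; lex_integer, lex_float, and token_value are hypothetical names, and token_value only imitates how the <tag> on each %token selects a union member.

```c
#include <assert.h>
#include <stdlib.h>

/* Hand-written approximation of the type derived from the %union;
   not actual generated output. */
typedef union {
    int   intValue;
    float floatValue;
} YYSTYPE;

enum { INTEGER = 258, FLOAT };       /* hypothetical token codes */

YYSTYPE yylval;

/* The two Lex actions as plain functions (hypothetical names). Each stores
   into the member named by the token's <tag> in its %token declaration. */
int lex_integer(const char *yytext) {
    yylval.intValue = atoi(yytext);
    return INTEGER;
}

int lex_float(const char *yytext) {
    yylval.floatValue = (float)atof(yytext);
    return FLOAT;
}

/* Because INTEGER is declared %token <intValue>, a $n that refers to an
   INTEGER is read from .intValue; for FLOAT it is .floatValue.
   This helper mimics that choice. */
double token_value(int token, YYSTYPE v) {
    assert(token == INTEGER || token == FLOAT);
    return token == INTEGER ? (double)v.intValue : (double)v.floatValue;
}
```

Reading the wrong member (for example .intValue after lex_float) would reinterpret the stored bits, which is exactly the mistake the <tag> declarations let YACC avoid.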