lex & yacc Tutorial Feb. 18, 2005

1 / 15

lex & yacc Tutorial Feb. 18, 2005 - PowerPoint PPT Presentation

lex & yacc Tutorial Feb. 18, 2005. Outline. Overview of lex and yacc. Structure of lex specification Regular expression Structure of yacc specification Some hints of lab1. Scanner, parser, lex and yacc. symbol table. source program. Scanner. Parser. token. lex.yy.c. y.tab.c.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

PowerPoint Slideshow about 'lex & yacc Tutorial Feb. 18, 2005' - demetrius-contos

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

lex & yacc Tutorial

Feb. 18, 2005

Outline
• Overview of lex and yacc.
• Structure of lex specification
• Regular expression
• Structure of yacc specification
• Some hints of lab1
Scanner, parser, lex and yacc

symbol

table

source

program

Scanner

Parser

token

lex.yy.c

y.tab.c

Lex/flex

Yacc/bison

Lex spec (.l)

yacc spec (.y)

Scanner, parser, lex and yacc (cont)

“23+15”

Scanner

Parser

token

lex.yy.c

intyylex(void) {

yytext=“23”;

yylval = 23;

return NUM;

}

y.tab.c

intyyparse(void){

while (ret = yylex()) {

….

}

}

Lex/flex

Yacc/bison

Lex spec (.l)

yacc spec (.y)

lex.yy.c is generated after running

> lex x.l

x.l

%{

< C global variables, prototypes, comments >

%}

[DEFINITION SECTION]

%%

[RULES SECTION]

%%

C auxiliary subroutines

This part will be embedded into lex.yy.c

substitutions, code and start states; will be copied into lex.yy.c

define how to scan and what action to take for each token

any user code. For example, a main function to call the scanning function yylex().

Skeleton of a lex specification (.l file)
The rule of lex specification file

Rule section is list of rules

<pattern> { corresponding actions }

<pattern> { corresponding actions }

… … …

[1-9][0-9]* { yylval = atoi (yytext);

return NUMBER;

}

Actions are C statements

Pattern in regular expr form

Two notes on using lex
• Longest match:
• e.g: input=“abc” pattern=“[a-z]+ {…}”, token= “abc” not “a” or “ab”.)
• 2. more applicable rules, favor 1st e.g input = “post”, rule1 = “post” {print ( “hi”); },rule2 = “[a-zA-Z]+ {printf(“Hi!!!!!”), rule1 is applied.

Scanner, parser, lex and yacc

symbol

table

source

program

Scanner

Parser

token

lex.yy.c

lex.yy.c

Yacc/bison

Lex spec (.l)

yacc spec (.y)

Skeleton of a yacc specification (.y file)

y.tab.c is generated after running

> yacc x.y

x.y

%{

< C global variables, prototypes, comments >

%}

[DEFINITION SECTION]

%%

[PRODUCTION RULES SECTION]

%%

< C auxiliary subroutines>

This part will be embedded into y.tab.c

contains token declarations. Tokens are recognized in lexer.

define how to “understand” the input language, and what actions to take for each “sentence”.

any user code. For example, a main function to call the parser function yyparse()

Production Rules Section of yacc Spec File

A production

Rule section is list of rules

nontermsym : symbol1 symbol2 … { actions }

| symbol3 symbol4 … { actions }

| …

;

Alternatives

expr : expr ‘+’ expr { \$\$ = \$1 + \$3 }

Value of non-terminal on lhs

Value of n-th sym on rhs

An example of rule section

statement : expression { printf (“ = %g\n”, \$1); }

expression : expression ‘+’ expression { \$\$ = \$1 + \$3; }

| expression ‘-’ expression { \$\$ = \$1 - \$3; }

| expression ‘*’ expression { \$\$ = \$1 * \$3; }

| expression ‘/’ expression { \$\$ = \$1 / \$3 ; }

| NUMBER { \$\$ = \$1; }

;

• It is ambiguous grammar (shift/reduce conflict)! think about how to parse the input “2+3*5”.
• Resolve conflict without grammar transformation (next slide)

Definition section

%left ‘+’ ‘-’

%left ‘*’ ‘/’

Higher prec operators are defined later

Specify the associativity

An example of rule section (cont)
• Define operator’s precedence and associativityresolve shift/reduce conflict in previous slide
A case study – The Calculator

zcalc.l

zcalc.y

%{

#include “zcalc.h”

%}

%union { double dval; struct symtab *symp; }

%token <symp> NAME

%token <dval> NUMBER

%left ‘+’ ‘-’

%type <dval> expression

%%

statement_list : statement ‘\n’ | statement_list statement ‘\n’

statement : NAME ‘=‘ expression {\$1->value = \$3;}

| expression { printf (“ = %g\n”, \$1); }

expression : expression ‘+’ expression { \$\$ = \$1 + \$3; }

| expression ‘-’ expression { \$\$ = \$1 - \$3; }

| NUMBER { \$\$ = \$1; }

| NAME { \$\$ = \$1->value; }

%%

struct symtab * symlook( char *s )

{ /* this function looks up the symbol table and check whether the symbol s is already there. If not, add s into symbol table. */

}

int main() {

yyparse();

return 0;

}

%{

#include “zcalc.tab.h”

#include “y.tab.h”

%}

%%

([0-9]+|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)

{ yylval.dval = atof(yytext);

return NUMBER; }

[ \t] ;

[a-zA-Z][a-zA-Z0-(]*

{ struct symtab *sp = symlook(yytext);

yylval.symp = sp;

return NAME;

}

%%

Yacc –d zcalc.y

Hints to Lab #1(Exercise 2)

1: How to recognize “prefix”, “postfix” and “infix” in lexer?

%%

“prefix” { return PREFIX;}

“postfix” { return POSTFIX; }

“infix” { return INFIX;}

Remember to declare PREFIX, POSTFIX and INFIX as “token” in your .y file.

2. How to combine three modes together?

Remember you have to use three sets of grammars for three modes, and a global variable to remember the state. For example, you may have the following grammar:

prog = prog INFIX ‘\n’ in_stmts |

prog POSTFIX ‘\n’ post_stmts |

prog PREFIX ‘\n’ pre_stmts

;

expression : infix_expr | post_expr | pre_expr;

infix_expr: infix_expr ‘+’ infix_expr { if(flag == 0) \$\$ = \$1 + \$3;} | ……

pre_expr : ‘+’ pre_expr pre_expr ( if( flag == 1) \$\$ = \$2 + \$3;} | ……

……

Hints to Lab1 (Exercise 4-5)

3. How to build up and print AST

Now the statements and expressions in your grammar are no longer “double” type. As shown in the lab handout, the statement_list is a linked list, and expression is a tree structure like this:

struct EXP{ struct EXP* exp1;

struct EXP* exp2;

struct OP operator;

}

Some functions are associated with the linked list and the tree structure, such as linkNewStatement(), makeExpression(struct EXP* exp1, struct EXP* exp2, struct OP operator), etc. Remember that the action field for each production in your yacc file can call any function you have declared. Just as a sentence is recursively parsed, your AST is recursively built-up and traversed.

Another important point: building up the AST and printing out the AST can NOT be done in one pass!