
Tutorial on Lex & Yacc

Presented by Dewan Tanvir Ahmed
Lecturer, CSE
Bangladesh University of Engineering and Technology

Purpose of Tutorial
  • Provide a brief, non-technical, black-box introduction to lex and yacc.
  • How to run lex and yacc.
  • How to run them in a Windows environment.

Further study is needed beyond this tutorial.

Upcoming assignments are on lex and yacc.

This material may be included in CSE309.


Lex: what is it?

  • Lex: a tool for automatically generating a lexer or scanner given a lex specification (.l file)
  • A lexer or scanner is used to perform lexical analysis, or the breaking up of an input stream into meaningful units, or tokens.
  • For example, consider breaking a text file up into individual words.
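To make "breaking an input stream into tokens" concrete, here is a hand-written C sketch of the job a generated scanner does for the word example. The function next_word is hypothetical (it is not part of lex), and a word is assumed to be a run of letters, roughly the lex pattern [A-Za-z]+.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of lex: find the next word in s, starting
   the search at *pos. A word is a maximal run of letters ([A-Za-z]+).
   Stores the word's start index in *start, advances *pos past the word,
   and returns the word's length (0 when the input is exhausted). */
size_t next_word(const char *s, size_t *pos, size_t *start)
{
    /* skip characters that cannot begin a word */
    while (s[*pos] != '\0' && !isalpha((unsigned char)s[*pos]))
        (*pos)++;
    *start = *pos;
    /* consume the longest run of letters */
    while (isalpha((unsigned char)s[*pos]))
        (*pos)++;
    return *pos - *start;
}
```

Calling next_word repeatedly on "the cat, sat!" yields "the", "cat" and "sat", then 0. A lex-generated yylex() plays the same role, but its patterns and actions come from the .l file.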

Skeleton of a lex specification (.l file)

Running lex on x.l generates a C source file (lex.yy.c by default).

%{
< C global variables, prototypes, comments >
%}
[DEFINITION SECTION]
%%
[RULES SECTION]
%%
< C auxiliary subroutines >

  • %{ ... %}: this part is copied verbatim into the generated *.c file.
  • Definition section: substitutions, code and start states; the code is copied into the *.c file.
  • Rules section: defines how to scan and what action to take for each token.
  • C auxiliary subroutines: any user code, for example a main function that calls the scanning function yylex().
The rules section

%%

[RULES SECTION]

<pattern> { <action to take when matched> }

<pattern> { <action to take when matched> }

%%

Patterns are specified by regular expressions.

For example:

%%

[A-Za-z]+ { printf("this is a word\n"); }

%%

Regular Expression Basics

  • . : matches any single character except \n
  • * : matches 0 or more instances of the preceding regular expression
  • + : matches 1 or more instances of the preceding regular expression
  • ? : matches 0 or 1 instance of the preceding regular expression
  • | : matches either the preceding or the following regular expression
  • [ ] : defines a character class
  • ( ) : groups the enclosed regular expression into a new regular expression
  • "..." : matches everything within the quotes literally
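The operators above can be tried out from C with the POSIX <regex.h> functions regcomp and regexec. This is only a sketch for experimenting: POSIX extended regular expressions share these operators but are not lex's dialect (they have no "..." quoting, and without REG_NEWLINE their . also matches a newline). The helper full_match is hypothetical; it anchors the pattern so that the whole string must match, as a lex token would.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if the entire string s matches the POSIX
   extended regular expression pattern. The pattern is wrapped in ^( )$
   so that a partial match is not enough. */
bool full_match(const char *pattern, const char *s)
{
    char anchored[256];
    regex_t re;
    bool ok;

    snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                       /* invalid pattern */
    ok = regexec(&re, s, 0, NULL, 0) == 0;  /* 0 means "matched" */
    regfree(&re);
    return ok;
}
```

For example, full_match("ab*", "a") and full_match("colou?r", "color") are true, while full_match("ab+", "a") is false because + needs at least one b.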

Lex Reg Exp (cont)

  • x|y : x or y
  • {i} : the definition of i
  • x/y : x, but only if followed by y (y is not removed from the input)
  • x{m,n} : m to n occurrences of x
  • ^x : x, but only at the beginning of a line
  • x$ : x, but only at the end of a line
  • "s" : exactly what is in the quotes (except for "\" and the following character)

In the rules section, a regular expression ends at the first space, tab or newline that is not escaped, quoted, or inside brackets.

Meta-characters
  • meta-characters (do not match themselves, because they are used in the preceding reg exps):
    • ( ) [ ] { } < > + / , ^ * | . \ " $ ? - %
  • to match a meta-character, prefix with "\"
  • to match a backslash, tab or newline, use \\, \t, or \n
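The escaping rule can be captured in a small C helper that turns arbitrary text into a lex pattern matching that text literally. lex_escape is hypothetical (lex provides no such function); it uses the meta-character list from this slide and the \t and \n escapes.

```c
#include <string.h>

/* Hypothetical helper (not part of lex): copy src to dst so that the result
   can be used as a lex pattern matching src literally. Every meta-character
   from the list above gets a "\" prefix; tab and newline become \t and \n.
   dst must have room for 2 * strlen(src) + 1 bytes. */
void lex_escape(const char *src, char *dst)
{
    static const char meta[] = "()[]{}<>+/,^*|.\\\"$?-%";

    for (; *src != '\0'; src++) {
        if (*src == '\t') {
            *dst++ = '\\';
            *dst++ = 't';
        } else if (*src == '\n') {
            *dst++ = '\\';
            *dst++ = 'n';
        } else {
            if (strchr(meta, *src) != NULL)
                *dst++ = '\\';   /* escape so the character matches itself */
            *dst++ = *src;
        }
    }
    *dst = '\0';
}
```

For instance, the text 1.5 becomes the pattern 1\.5, which matches only "1.5" and not "125".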
Regular Expression Examples
  • an integer: 12345
    • [1-9][0-9]*
  • a word: cat
    • [a-zA-Z]+
  • a (possibly) signed integer: 12345 or -12345
    • [-+]?[1-9][0-9]*
  • a floating point number: 1.2345
    • [0-9]*"."[0-9]+
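To show what a pattern such as [-+]?[1-9][0-9]* means operationally, here is a hand-coded C recognizer for it. match_signed_int is a hypothetical name; a lex-generated scanner does the same job with a table-driven automaton instead of hand-written tests. Note that, as written, this pattern does not match "0", because the first digit must be 1 to 9.

```c
#include <stddef.h>

/* Hand-coded recognizer for the lex pattern [-+]?[1-9][0-9]* .
   Returns the length of the longest prefix of s that matches, or 0 if
   no prefix matches. */
size_t match_signed_int(const char *s)
{
    size_t i = 0;

    if (s[i] == '-' || s[i] == '+')      /* [-+]? : optional sign */
        i++;
    if (s[i] < '1' || s[i] > '9')        /* [1-9] : first digit is nonzero */
        return 0;
    i++;
    while (s[i] >= '0' && s[i] <= '9')   /* [0-9]* : any further digits */
        i++;
    return i;
}
```

On "+7x" it returns 2: the recognizer stops at the first character that cannot extend the match, which is also how a scanner finds the longest token.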
Lex Regular Expressions

Lex uses an extended form of regular expressions:

(c: character, x, y: regular expressions, s: string, m, n: integers, i: identifier)

  • c : any character except meta-characters (see above)
  • [...] : the list of enclosed chars (may be a range)
  • [^...] : the list of chars not enclosed
  • . : any ASCII char except newline
  • xy : concatenation of x and y
  • x* : zero or more occurrences of x
  • x+ : one or more occurrences of x (i.e. x* but not the empty string)
  • x? : an optional x (zero or one occurrence)
Two Rules
  • 1. lex always matches the longest possible token (in number of characters).
  • 2. If two or more patterns match tokens of the same length, the pattern defined first in the lex specification is favored.
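The two rules can be sketched in C for a keyword rule int listed before an identifier rule, the usual order in a .l file. The matcher functions and pick_rule are hypothetical; a real lex scanner resolves this inside its generated automaton rather than by trying the rules one at a time.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Two hypothetical rules, in the order they would appear in a .l file:
     rule 0:  int                     (keyword)
     rule 1:  [a-zA-Z_][a-zA-Z0-9_]*  (identifier)
   Each matcher returns how many leading characters of s it matches. */
static size_t match_int_kw(const char *s)
{
    return strncmp(s, "int", 3) == 0 ? 3 : 0;
}

static size_t match_ident(const char *s)
{
    size_t n = 0;

    if (!isalpha((unsigned char)s[0]) && s[0] != '_')
        return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Choose a rule the way lex does: the longest match wins, and on a tie the
   rule listed first wins. Returns the rule index (-1 if none matches) and
   stores the match length in *len. */
int pick_rule(const char *s, size_t *len)
{
    size_t (*rules[])(const char *) = { match_int_kw, match_ident };
    int best = -1;
    size_t best_len = 0;

    for (int r = 0; r < 2; r++) {
        size_t n = rules[r](s);
        if (n > best_len) {   /* strictly longer, so earlier rules keep ties */
            best = r;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

On "int x;" both rules match 3 characters and the keyword wins because it is listed first; on "integer" the identifier rule wins because it matches 7 characters.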
Regular Expression Examples
  • a delimiter for an English sentence
    • "."|"?"|! OR
    • [.?!]
  • C++ comment: // call foo() here!!
    • “//”.*
  • white space
    • [ \t]+
  • English sentence: Look at this!
    • ([ \t]+|[a-zA-Z]+)+("."|"?"|!)
Special Functions and Variables
  • yytext
    • where the text matched most recently is stored
  • yyleng
    • number of characters in the text most recently matched
  • yylval
    • value associated with the current token (shared with the yacc parser)
  • yymore()
    • append the next string matched to the current contents of yytext
  • yyless(n)
    • keep only the first n characters in yytext; the rest are returned to the input stream
  • unput(c)
    • return character c to the input stream
  • yywrap()
    • may be replaced by the user
    • called by the lexical analyser whenever it reads EOF as the first character when trying to match a regular expression; returning 1 ends scanning, while returning 0 means more input has been arranged (e.g. yyin now points to another file) and scanning continues
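The effect of yyless(n) is easiest to see on a toy model of the input buffer. The struct toy_scanner and the function toy_yyless are hypothetical and do not reflect lex's real internals; they only show that the token shrinks to n characters and scanning resumes right after them.

```c
#include <stddef.h>

/* Toy model of a scanner's input (hypothetical, not lex's internals):
   the current token is text[tok_start .. tok_start + tok_len), and the
   next match will start at pos. */
struct toy_scanner {
    const char *text;
    size_t tok_start;
    size_t tok_len;   /* plays the role of yyleng */
    size_t pos;       /* where scanning resumes */
};

/* Mimics yyless(n): keep only the first n characters of the current token
   and give the rest back to the input so they are scanned again.
   Clamping n to the token length is a choice made for this sketch. */
void toy_yyless(struct toy_scanner *sc, size_t n)
{
    if (n > sc->tok_len)
        n = sc->tok_len;
    sc->tok_len = n;
    sc->pos = sc->tok_start + n;
}
```

If the token is "foobar" and the action calls yyless(3), the token becomes "foo" and the next match starts at "bar".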
Yacc: what is it?

Yacc: a tool for automatically generating a parser, given a grammar written in a yacc specification (.y file).

A grammar is a set of production rules that defines a language. The production rules specify which sequences of symbols (sentences) are legal in the language.

Skeleton of a yacc specification (.y file)

Running yacc on x.y generates a C source file (y.tab.c by default).

%{
< C global variables, prototypes, comments >
%}
[DEFINITION SECTION]
%%
[PRODUCTION RULES SECTION]
%%
< C auxiliary subroutines >

  • %{ ... %}: this part is copied verbatim into the generated *.c file.
  • Definition section: contains token declarations. Tokens are recognized by the lexer.
  • Production rules section: defines how to "understand" the input language, and what actions to take for each "sentence".
  • C auxiliary subroutines: any user code, for example a main function that calls the parser function yyparse().


Structure of a yacc File

  • Definition section
    • declarations of tokens
    • types of values used on the parser stack
  • Rules section
    • list of grammar rules with semantic routines
  • User code
The Production Rules Section

%%
production : symbol1 symbol2 … { action }
           | symbol3 symbol4 … { action }
           | …
           ;
production : symbol1 symbol2 { action }
           ;
%%

An example

%%
statement  : expression                { printf(" = %g\n", $1); }
           ;
expression : expression '+' expression { $$ = $1 + $3; }
           | expression '-' expression { $$ = $1 - $3; }
           | NUMBER                    { $$ = $1; }
           ;
%%

According to these productions, 5 + 4 - 3 + 2 is parsed into:

[Figure: parse tree for 5 + 4 - 3 + 2, built from statement, expression and number nodes]
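The rule expression : expression '+' expression is ambiguous, and the grouping matters for the result. Without precedence declarations, yacc reports shift/reduce conflicts for this grammar and resolves each one by shifting, which groups to the right. The C sketch below (eval_left and eval_right are hypothetical names) computes both groupings of 5 + 4 - 3 + 2.

```c
/* Evaluate the chain  v[0] op[0] v[1] op[1] ... v[n-1],  where each op[i]
   is '+' or '-'.
   eval_left groups from the left:   ((5 + 4) - 3) + 2
   eval_right groups from the right:  5 + (4 - (3 + 2)) */
double eval_left(const double *v, const char *op, int n)
{
    double acc = v[0];

    for (int i = 1; i < n; i++)
        acc = (op[i - 1] == '+') ? acc + v[i] : acc - v[i];
    return acc;
}

double eval_right(const double *v, const char *op, int n)
{
    double acc = v[n - 1];

    for (int i = n - 2; i >= 0; i--)
        acc = (op[i] == '+') ? v[i] + acc : v[i] - acc;
    return acc;
}
```

With v = {5, 4, 3, 2} and op = "+-+", eval_left gives 8 and eval_right gives 4. Only the left grouping agrees with ordinary arithmetic, which is why '+' and '-' are normally declared %left.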

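Choosing a Grammar

Unambiguous grammar (precedence and associativity are built into the rules):

S -> E
E -> E + T
E -> E - T
E -> T
T -> T * F
T -> T / F
T -> F
F -> ( E )
F -> ID

Ambiguous grammar (needs precedence and associativity declarations):

S -> E
E -> E + E
E -> E - E
E -> E * E
E -> E / E
E -> ( E )
E -> ID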
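The unambiguous grammar can be turned directly into a hand-written recursive-descent evaluator, which shows how the E/T/F layers encode precedence. This is a sketch under stated assumptions, not what yacc generates (yacc builds LALR tables): ID is replaced by a single decimal digit, there are no spaces, and there is no error checking. The left-recursive rules are written as loops, which keeps their left-to-right grouping. The names rd_eval and parse_E/T/F are hypothetical.

```c
#include <ctype.h>

static const char *p;   /* current position in the input */

static long parse_E(void);

static long parse_F(void)            /* F -> ( E ) | digit  (digit stands in for ID) */
{
    if (*p == '(') {
        p++;
        long v = parse_E();
        p++;                         /* skip ')' without checking it */
        return v;
    }
    return isdigit((unsigned char)*p) ? *p++ - '0' : 0;
}

static long parse_T(void)            /* T -> T * F | T / F | F */
{
    long v = parse_F();
    while (*p == '*' || *p == '/') {
        char op = *p++;
        long rhs = parse_F();
        v = (op == '*') ? v * rhs : v / rhs;
    }
    return v;
}

static long parse_E(void)            /* E -> E + T | E - T | T */
{
    long v = parse_T();
    while (*p == '+' || *p == '-') {
        char op = *p++;
        long rhs = parse_T();
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

long rd_eval(const char *s)          /* S -> E */
{
    p = s;
    return parse_E();
}
```

Because T sits below E, "2+3*4" evaluates to 14, and because the loops fold from the left, "8-3-2" evaluates to 3.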
Precedence and Associativity

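%right '='
%left '-' '+'
%left '*' '/'
%right '^'

Operators on the same line have equal precedence, and each later line has higher precedence than the lines before it. %left makes an operator left-associative (a - b - c means (a - b) - c), and %right makes it right-associative (a = b = c means a = (b = c)).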
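The same precedence table can drive a hand-written parser using precedence climbing. This is a different technique from yacc's (yacc uses the declarations to resolve conflicts in its LALR tables), shown only to make the table's meaning concrete. Assumptions for the sketch: single-digit operands, no spaces, no '=' (assignment would need variables), and the hypothetical names climb and climb_eval.

```c
static const char *q;   /* current position in the input */

/* Precedence levels from the declarations: later lines bind tighter. */
static int prec(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    case '^':           return 3;
    default:            return 0;   /* not a binary operator: stops the loop */
    }
}

static long ipow(long b, long e)
{
    long r = 1;
    while (e-- > 0)
        r *= b;
    return r;
}

static long climb(int min_prec)
{
    long lhs = *q++ - '0';          /* operand: one digit */

    while (prec(*q) >= min_prec) {  /* min_prec >= 1, so non-operators stop */
        char op = *q++;
        /* %left: the right operand must bind strictly tighter;
           %right ('^'): the same level may nest to the right */
        int next = (op == '^') ? prec(op) : prec(op) + 1;
        long rhs = climb(next);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        case '^': lhs = ipow(lhs, rhs); break;
        }
    }
    return lhs;
}

long climb_eval(const char *s)
{
    q = s;
    return climb(1);
}
```

Here "2^3^2" evaluates to 2^(3^2) = 512 because '^' is right-associative, while "8-3-2" evaluates to (8-3)-2 = 3.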

Defining Values

expr : expr '+' term { $$ = $1 + $3; }

| term { $$ = $1; }

;

term : term '*' factor { $$ = $1 * $3; }

| factor { $$ = $1; }

;

factor : '(' expr ')' { $$ = $2; }

| ID

| NUM

;


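In an action, $$ is the value of the left-hand side, and $1, $2, $3 are the values of the first, second and third symbols on the right-hand side (so in factor : '(' expr ')', $2 is the value of expr). Rules without an action, such as factor : ID, use the default: $$ = $1;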
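One way to picture $$ and $n is a value stack that grows as symbols are shifted. The sketch below is a toy model under that assumption (struct value_stack, push and the reduce_* functions are hypothetical, not yacc's generated code): reducing a rule with n right-hand-side symbols pops their n values, which the action sees as $1 to $n, and pushes the result as $$.

```c
#define STACK_MAX 64

/* Toy value stack; there is no overflow check in this sketch. */
struct value_stack {
    double v[STACK_MAX];
    int top;                 /* number of values on the stack */
};

void push(struct value_stack *s, double x)
{
    s->v[s->top++] = x;
}

/* Reduce  expr : expr '+' term  { $$ = $1 + $3; }
   The '+' token carries no useful value but still occupies slot $2. */
void reduce_expr_plus_term(struct value_stack *s)
{
    int base = s->top - 3;                      /* $i is v[base + i - 1] */
    double result = s->v[base] + s->v[base + 2]; /* $1 + $3 */
    s->top = base;                              /* pop the three values */
    push(s, result);                            /* push $$ */
}

/* Reduce  factor : '(' expr ')'  { $$ = $2; } */
void reduce_paren(struct value_stack *s)
{
    int base = s->top - 3;
    double result = s->v[base + 1];             /* $2 */
    s->top = base;
    push(s, result);
}
```

After pushing 5, a placeholder for '+', and 4, reduce_expr_plus_term leaves a single value 9 on the stack.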


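Example: Lex

scanner.l

%{
#include <stdio.h>
#include "y.tab.h"   /* token codes, generated by yacc -d */
%}
id      [_a-zA-Z][_a-zA-Z0-9]*
wspc    [ \t\n]+
semi    [;]
comma   [,]
%%
int      { return INT; }    /* keywords come before {id}, so they win ties */
char     { return CHAR; }
float    { return FLOAT; }
{comma}  { return COMMA; }  /* Necessary? */
{semi}   { return SEMI; }
{id}     { return ID; }
{wspc}   { ; }              /* skip white space */
%%
int yywrap(void) { return 1; }   /* single input file: stop at EOF */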
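For comparison, here is a hand-written C stand-in for the yylex() that lex generates from scanner.l. The token codes and the names toy_yylex and T_ERR are hypothetical (the real codes come from y.tab.h); returning 0 at end of input follows the yacc convention. Note how the keyword checks happen only after the whole identifier is read, which mirrors the two matching rules.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; the real example takes them from y.tab.h. */
enum { T_EOF = 0, T_INT, T_CHAR, T_FLOAT, T_COMMA, T_SEMI, T_ID, T_ERR };

/* Return the next token from *pp and advance *pp past it. */
int toy_yylex(const char **pp)
{
    const char *p = *pp;

    while (*p == ' ' || *p == '\t' || *p == '\n')   /* {wspc}: skip */
        p++;
    if (*p == '\0') { *pp = p; return T_EOF; }
    if (*p == ',')  { *pp = p + 1; return T_COMMA; }
    if (*p == ';')  { *pp = p + 1; return T_SEMI; }
    if (isalpha((unsigned char)*p) || *p == '_') {  /* {id} */
        const char *start = p;
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        size_t len = (size_t)(p - start);
        *pp = p;
        /* equal-length tie: the keyword rules are listed first, so they win */
        if (len == 3 && strncmp(start, "int", 3) == 0)   return T_INT;
        if (len == 4 && strncmp(start, "char", 4) == 0)  return T_CHAR;
        if (len == 5 && strncmp(start, "float", 5) == 0) return T_FLOAT;
        return T_ID;
    }
    *pp = p + 1;   /* scanner.l has no rule for this; lex would echo it */
    return T_ERR;
}
```

On the input "int a, b;" it returns INT, ID, COMMA, ID, SEMI and then 0, the sequence the parser in decl.y expects; "integer" is a single ID.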


Example: Definitions

decl.y

%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);   /* scanner generated from scanner.l */
%}
%start line
%token CHAR COMMA FLOAT ID INT SEMI
%%


Example: Rules

decl.y

/* This production is not part of the "official"
   grammar. Its primary purpose is to recover from
   parser errors, so it's probably best if you leave it here. */
line : /* lambda */
     | line decl
     | line error
       {
         printf("Failure :-(\n");
         yyerrok;
         yyclearin;
       }
     ;


Example: Rules

decl.y

decl : type ID list
       { printf("Success!\n"); }
     ;
list : COMMA ID list
     | SEMI
     ;
type : INT | CHAR | FLOAT
     ;
%%
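To see what these rules accept, here is a hand-written C recognizer for one declaration over an array of token codes. It is a sketch, not what yacc produces (yacc generates an LALR table-driven parser); the K_* token codes and is_decl are hypothetical. The right-recursive rule list : COMMA ID list becomes a loop.

```c
/* Hypothetical token codes (the real ones come from yacc, via y.tab.h). */
enum { K_END = 0, K_INT, K_CHAR, K_FLOAT, K_ID, K_COMMA, K_SEMI };

/* Recognize exactly one declaration:
     decl : type ID list
     list : COMMA ID list | SEMI
     type : INT | CHAR | FLOAT
   t is an array of token codes terminated by K_END.
   Returns 1 if t holds one valid declaration, 0 otherwise. */
int is_decl(const int *t)
{
    int i = 0;

    if (t[i] != K_INT && t[i] != K_CHAR && t[i] != K_FLOAT)   /* type */
        return 0;
    i++;
    if (t[i++] != K_ID)
        return 0;
    while (t[i] == K_COMMA) {       /* list : COMMA ID list */
        i++;
        if (t[i++] != K_ID)
            return 0;
    }
    if (t[i++] != K_SEMI)           /* list : SEMI */
        return 0;
    return t[i] == K_END;
}
```

The token sequence for "float x, y, z;" is accepted, while "int x, ;" is rejected because a COMMA must be followed by an ID.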


Example: Supplementary Code

decl.y

extern FILE *yyin;

int main(void)
{
    do {
        yyparse();
    } while (!feof(yyin));
    return 0;
}

void yyerror(const char *s)
{
    /* Don't have to do anything! */
    (void)s;
}
