Implementing lexical analyzer using finite automata


We are given the following regular definition:
if -> if
then -> then
else -> else
relop -> < | <= | = | <> | > | >=
id -> letter (letter | digit)*
num -> digit+ (. digit+)? (E (+ | -)? digit+)?
letter -> [a-z] | [A-Z]
digit -> [0-9]
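As an illustration of how the num definition reads as code, here is a minimal sketch in C. It assumes a NUL-terminated input string; the names digits and match_num are hypothetical and do not come from the slides. Each optional part is consumed only if it is complete, which mirrors retracting the forward pointer when the part cannot be finished.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: index of the first non-digit at or after i. */
static size_t digits(const char *s, size_t i) {
    while (isdigit((unsigned char)s[i])) i++;
    return i;
}

/* num -> digit+ (. digit+)? (E (+|-)? digit+)?
   Returns the length of the longest prefix of s that is a num, or 0. */
static size_t match_num(const char *s) {
    size_t end = digits(s, 0);
    if (end == 0) return 0;                 /* digit+ is mandatory */

    if (s[end] == '.') {                    /* optional fraction: . digit+ */
        size_t after = digits(s, end + 1);
        if (after > end + 1) end = after;   /* else leave the '.' unconsumed */
    }
    if (s[end] == 'E') {                    /* optional exponent: E (+|-)? digit+ */
        size_t i = end + 1;
        if (s[i] == '+' || s[i] == '-') i++;
        size_t after = digits(s, i);
        if (after > i) end = after;         /* else leave the 'E' unconsumed */
    }
    return end;
}
```

With this sketch, match_num("12.") accepts only "12", because the fraction needs at least one digit after the point.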

Recognize the keywords if, then, else and the lexemes relop, id, num.
• delim -> blank | tab | newline
• ws -> delim+
• If a match for ws is found, the lexical analyzer does not return a token to the parser. It proceeds to find the token following the white space and returns that to the parser.
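The white space rule can be sketched as follows. This is only an illustration under the assumption of a NUL-terminated input buffer; is_delim and skip_ws are hypothetical names. The point is that a run of delimiters moves the scanning position forward without producing a token.

```c
#include <stddef.h>

/* delim -> blank | tab | newline */
static int is_delim(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

/* ws -> delim+ : returns the index of the first character at or after
   pos that is not a delimiter. No token is produced for the skipped run. */
static size_t skip_ws(const char *input, size_t pos) {
    while (is_delim(input[pos])) {   /* the terminating NUL is not a delimiter */
        pos++;
    }
    return pos;
}
```

A caller would typically skip white space first and then start matching the next token at the returned position.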

Transition diagrams
• A transition diagram (TD) depicts the actions that take place when the lexical analyzer is called by the parser to get the next token.
• The TD keeps track of information about the characters seen as the forward pointer scans the input.
• Positions in a TD are drawn as circles called states.
• States are connected by arrows called edges.
• Edges leaving state s have labels indicating the input characters that can appear next after the transition diagram has reached state s.

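[Figure: transition diagram for identifiers. start -> state 0; state 0 -> state 1 on letter; state 1 loops on letter/digit; state 1 -> state 2 on delimiter; state 2 is marked *.]
• Start state: the state where control resides when we begin to recognize a token.
• No valid transition indicates failure.
• Accepting state: a state in which a token has been found.
• * marks a state in which retraction must take place.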
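The three-state diagram for identifiers can be sketched in C as below. This is an illustrative sketch, not the code from the slides: match_id is a hypothetical name, the input is assumed to be NUL-terminated, and isalpha/isdigit are assumed to run in the default "C" locale so that they match [a-zA-Z] and [0-9].

```c
#include <ctype.h>
#include <stddef.h>

/* Follows the diagram for id -> letter (letter | digit)*.
   Returns the lexeme length, or 0 if the diagram fails in state 0. */
static size_t match_id(const char *input) {
    size_t forward = 0;
    int state = 0;
    for (;;) {
        unsigned char c = (unsigned char)input[forward++];
        switch (state) {
        case 0:                           /* start state */
            if (isalpha(c)) state = 1;
            else return 0;                /* no valid transition: failure */
            break;
        case 1:                           /* inside an identifier */
            if (isalpha(c) || isdigit(c)) state = 1;
            else state = 2;               /* delimiter seen */
            break;
        }
        if (state == 2) {                 /* accepting state marked with * */
            forward--;                    /* retract: the delimiter is not part of the lexeme */
            return forward;
        }
    }
}
```

The retraction in state 2 is what the * stands for: the character that ended the identifier is given back so that the next call can scan it.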

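There may be several transition diagrams
• If failure occurs while following one transition diagram, retract the forward pointer to where it was in the start state of this diagram and activate the next transition diagram.
• If failure occurs in all transition diagrams, a lexical error has been detected and the error recovery routines are invoked.
• e.g. DO 5 I=1.25 versus DO 5 I=1,25: in Fortran, where blanks are not significant, the first is an assignment to the variable DO5I and the second begins a DO loop, so DO cannot be recognized as a keyword until the comma is seen.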
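Trying the diagrams one after another can be sketched like this. The two matchers are deliberately simplified stand-ins for real diagrams, and all names (Matcher, match_letters, match_digits, first_matching_diagram) are hypothetical. Every attempt starts from the same position, which models retracting the forward pointer to the beginning of the lexeme.

```c
#include <ctype.h>
#include <stddef.h>

/* Each matcher returns the length of the lexeme it accepts at s,
   or 0 to signal failure of that diagram. */
typedef size_t (*Matcher)(const char *s);

static size_t match_letters(const char *s) {
    size_t n = 0;
    while (isalpha((unsigned char)s[n])) n++;
    return n;
}

static size_t match_digits(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Returns the index of the first diagram that succeeds and stores the
   lexeme length in *length, or returns -1 when every diagram fails. */
static int first_matching_diagram(const char *lexeme_begin, size_t *length) {
    static const Matcher diagrams[] = { match_letters, match_digits };
    const int count = (int)(sizeof diagrams / sizeof diagrams[0]);
    for (int i = 0; i < count; i++) {
        size_t n = diagrams[i](lexeme_begin);   /* restart at the lexeme beginning */
        if (n > 0) {
            *length = n;
            return i;
        }
    }
    *length = 0;
    return -1;   /* lexical error: the caller invokes error recovery */
}
```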

Recognition of reserved words
• Initialize the symbol table in which information about identifiers is stored.
• Enter the reserved words into the symbol table before any characters in the input are seen.
• Make a note in the symbol table of the token to be returned when each keyword is identified.
• The return statement next to the accepting state uses gettoken() and install_id() to obtain the token and attribute value.
• When a lexeme is identified, the symbol table is checked.
• If the lexeme is found as a keyword, install_id() returns 0.
• If it is an identifier, a pointer to its symbol table entry is returned.
• gettoken() returns the corresponding token.
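One possible shape for such a symbol table is sketched below. It is an assumption-laden illustration, not the slides' implementation: the table is a fixed-size array, an index stands in for the pointer to the entry, install_id and gettoken take the lexeme as a parameter (the slides call them without arguments), and the token codes, init_symtab, add_symbol and lookup are hypothetical.

```c
#include <string.h>

enum { IF = 256, THEN, ELSE, ID };   /* hypothetical token codes */

#define MAX_SYMBOLS 64
#define MAX_LEXEME  32

struct Symbol {
    char lexeme[MAX_LEXEME];
    int  token;
};

/* Slot 0 is left unused so that 0 can mean "keyword, no attribute". */
static struct Symbol symtab[MAX_SYMBOLS];
static int symbol_count = 1;

static int add_symbol(const char *lexeme, int token) {
    if (symbol_count >= MAX_SYMBOLS || strlen(lexeme) >= MAX_LEXEME) return -1;
    strcpy(symtab[symbol_count].lexeme, lexeme);
    symtab[symbol_count].token = token;
    return symbol_count++;
}

static int lookup(const char *lexeme) {
    for (int i = 1; i < symbol_count; i++)
        if (strcmp(symtab[i].lexeme, lexeme) == 0) return i;
    return 0;
}

/* Enter the reserved words before any input is read. */
static void init_symtab(void) {
    add_symbol("if", IF);
    add_symbol("then", THEN);
    add_symbol("else", ELSE);
}

/* Returns 0 for a keyword, otherwise the index of the identifier's entry
   (creating it on first sight), or -1 if the entry cannot be stored. */
static int install_id(const char *lexeme) {
    int i = lookup(lexeme);
    if (i == 0) return add_symbol(lexeme, ID);
    return symtab[i].token == ID ? i : 0;
}

/* Returns the token recorded in the table for the lexeme. */
static int gettoken(const char *lexeme) {
    int i = lookup(lexeme);
    return i ? symtab[i].token : ID;
}
```

Because the keywords are already in the table, the identifier diagram needs no separate states for if, then and else.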

Recognition of numbers
• When the accepting state is reached, call a procedure install_num() that enters the lexeme into the table of numbers and returns a pointer to the created entry.
• Return the token NUM.
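A minimal sketch of install_num() follows. The slides do not show its body, so everything here is an assumption: the table is a fixed-size array, the function receives the lexeme start and length as parameters, and the entry also stores the converted value using the standard strtod function.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_NUMS       64
#define MAX_NUM_LEXEME 32

struct NumEntry {
    char   lexeme[MAX_NUM_LEXEME];
    double value;
};

static struct NumEntry num_table[MAX_NUMS];
static int num_count = 0;

/* Copies the lexeme of the given length into the table of numbers and
   returns a pointer to the new entry, or NULL if it does not fit. */
static struct NumEntry *install_num(const char *lexeme_begin, size_t length) {
    if (num_count >= MAX_NUMS || length >= MAX_NUM_LEXEME) return NULL;
    struct NumEntry *entry = &num_table[num_count++];
    memcpy(entry->lexeme, lexeme_begin, length);
    entry->lexeme[length] = '\0';
    entry->value = strtod(entry->lexeme, NULL);   /* accepts the E exponent form */
    return entry;
}
```

The returned pointer would serve as the attribute value that accompanies the token NUM.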

Implementing lexical analyzer
Token nexttoken( )
{
    while (1)
    {
        switch (state) {
        case 0: c = nextchar();
            /* white space: stay in state 0 and advance the lexeme start */
            if (c == blank || c == tab || c == newline) {
                state = 0;
                lexeme_beginning++;
            }
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else state = fail();
            break;
        case 1: c = nextchar();
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;
            break;
        case 2: token.attribute = LE;
            token.name = relop;
            return token;

        case 8: retract(1);
            token.attribute = GT;
            token.name = relop;
            return token;
        case 9: c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        case 10: c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        case 11: retract(1);
            entry = install_id( );
            name = gettoken();
            token.name = name;
            token.attribute = entry;
            return token;
        /* cases 12-24 here for numbers */

        case 25: c = nextchar();
            if (isdigit(c)) state = 26;
            else state = fail();
            break;
        case 26: c = nextchar();
            if (isdigit(c)) state = 26;
            else state = 27;
            break;
        case 27: retract(1);
            /* nexttoken() returns a Token, so fill it in as the other cases do */
            token.attribute = install_num( );
            token.name = NUM;
            return token;
        }
    }
}
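The slides do not show every relop state, so the following is a separate, self-contained sketch of the relop diagram rather than a reconstruction of the missing cases. The function name match_relop and the attribute codes other than LE and GT are assumptions. Where the diagram would retract one character, the sketch simply reports a lexeme length of 1.

```c
#include <stddef.h>

enum Relop { NONE, LT, LE, EQ, NE, GT, GE };   /* hypothetical attribute codes */

/* Recognizes relop -> < | <= | = | <> | > | >= at the start of s.
   Stores the lexeme length in *length and returns the attribute,
   or NONE (length 0) if no relational operator starts here. */
static enum Relop match_relop(const char *s, size_t *length) {
    switch (s[0]) {
    case '<':
        if (s[1] == '=') { *length = 2; return LE; }
        if (s[1] == '>') { *length = 2; return NE; }
        *length = 1;          /* other character: it is not part of the lexeme */
        return LT;
    case '=':
        *length = 1;
        return EQ;
    case '>':
        if (s[1] == '=') { *length = 2; return GE; }
        *length = 1;          /* other character: it is not part of the lexeme */
        return GT;
    default:
        *length = 0;
        return NONE;
    }
}
```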

Code for next state
int state = 0, start = 0;
int lexical_value;
int fail()
{
    /* retract to the start of the lexeme, then move to the next diagram */
    forward = token_beginning;
    switch (start) {
    case 0: start = 9; break;
    case 9: start = 12; break;
    case 12: start = 20; break;
    case 20: start = 25; break;
    case 25: recover( ); break;
    default: /* compiler error */
        break;
    }
    return start;
}
