
Implementing a lexical analyzer using finite automata


Presentation Transcript


  1. Implementing a lexical analyzer using finite automata

  2. We are given the following regular definitions:

         if     -> if
         then   -> then
         else   -> else
         relop  -> < | <= | = | <> | > | >=
         id     -> letter (letter | digit)*
         num    -> digit+ (. digit+)? (E (+ | -)? digit+)?
         letter -> [a-z] | [A-Z]
         digit  -> [0-9]

  3. Recognize the keywords if, then, else and the lexemes for relop, id, num
     • White space is defined by:

         delim -> blank | tab | newline
         ws    -> delim+

     • If a match for ws is found, the lexical analyzer does not return a token to the parser. It proceeds to find the token following the white space and returns that to the parser.
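
The white-space rule above can be sketched in C roughly as follows. The helper names `is_delim` and `skip_ws`, and the use of a NUL-terminated string as input, are assumptions made for illustration; the slides do not define them.

```c
#include <assert.h>
#include <stddef.h>

/* delim -> blank | tab | newline */
static int is_delim(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: returns the index of the first character at or
   after pos that is not a delimiter. It consumes ws -> delim+ without
   producing a token. pos must lie within the NUL-terminated input. */
size_t skip_ws(const char *input, size_t pos) {
    assert(input != NULL);
    while (is_delim(input[pos]))
        pos++;
    return pos;
}
```

A caller would resume scanning for the next token at the returned index, which mirrors the way the lexical analyzer advances past white space before looking for a token.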

  4. Transition diagrams
     • A transition diagram (TD) depicts the actions that take place when the lexical analyzer is called by the parser to get the next token.
     • A TD keeps track of information about the characters seen as the forward pointer scans the input.
     • Positions in a TD are drawn as circles called states.
     • States are connected by arrows called edges.
     • Edges leaving state s have labels indicating the input characters that can appear next after the transition diagram has reached state s.

  5. [Transition diagram for identifiers: start -> state 0 --letter--> state 1 (loops on letter/digit) --delimiter--> state 2 *]
     • Start state: the state where control resides when we begin to recognize a token.
     • No valid transition from a state indicates failure.
     • Accepting state: a state in which a token has been found.
     • * marks a state in which retraction of the forward pointer must take place.
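
As a minimal sketch, the three-state identifier diagram could be coded in C as below. The function name `match_id` and its string-based interface are assumptions for illustration. Following the later code in the slides, any character that is not a letter or digit is treated as the delimiter that leads to the starred state.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: runs the identifier diagram on s, starting at
   s[0]. Returns the length of the identifier lexeme, or 0 on failure. */
size_t match_id(const char *s) {
    assert(s != NULL);
    size_t forward = 0;   /* forward pointer, as an index into s */
    int state = 0;        /* start state */
    for (;;) {
        char c = s[forward++];
        switch (state) {
        case 0:
            if (isalpha((unsigned char)c)) state = 1;
            else return 0;          /* no valid transition: failure */
            break;
        case 1:
            if (isalnum((unsigned char)c)) state = 1;
            else state = 2;         /* any other character ends the id */
            break;
        }
        if (state == 2) {
            forward--;              /* starred state: retract one character */
            return forward;
        }
    }
}
```

The retraction step is what keeps the character after the identifier available for the next token.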

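  6. There may be several transition diagrams
     • If failure occurs while following one transition diagram, retract the forward pointer to where it was in the start state of that diagram and activate the next transition diagram.
     • If failure occurs in all transition diagrams, a lexical error has been detected and an error recovery routine is invoked.
     • e.g. in Fortran, DO 5 I=1.25 and DO 5 I=1,25 share the same prefix. The first assigns 1.25 to the variable DO5I and the second begins a DO loop, so the lexical analyzer cannot tell them apart until it reaches the period or the comma.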

  7. Recognition of reserved words
     • Initialize appropriately the symbol table in which information about identifiers is stored.
     • Enter the reserved words into the symbol table before any characters in the input are seen.
     • Make a note in the symbol table of the token to be returned when each keyword is identified.
     • The return statement next to the accepting state uses gettoken() and install_id() to obtain the token and the attribute value.
     • When a lexeme is identified, the symbol table is checked:
       • if the lexeme is found as a keyword, install_id() returns 0;
       • if it is an identifier, install_id() returns a pointer to its symbol table entry;
       • gettoken() returns the corresponding token.
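
One way to picture this scheme is the small C sketch below. It is an illustration under assumptions, not the slides' implementation: the token codes, the fixed-size table, the `lookup` helper, and passing the lexeme as an argument are all hypothetical (in the slides, install_id() and gettoken() take no arguments and work on the current lexeme), and an array index stands in for the pointer to the entry.

```c
#include <assert.h>
#include <string.h>

enum { ID = 256, IF, THEN, ELSE };   /* hypothetical token codes */

#define MAX_SYMS 64
#define MAX_LEN  32

struct entry { char lexeme[MAX_LEN]; int token; };

/* Reserved words are entered before any input is seen.
   Index 0 is left unused so that 0 can mean "keyword". */
static struct entry symtab[MAX_SYMS] = {
    { "", 0 }, { "if", IF }, { "then", THEN }, { "else", ELSE }
};
static int nsyms = 4;

/* Returns the index of lexeme in the table, or 0 if it is absent. */
static int lookup(const char *lexeme) {
    for (int i = 1; i < nsyms; i++)
        if (strcmp(symtab[i].lexeme, lexeme) == 0) return i;
    return 0;
}

/* Returns 0 for a keyword; otherwise the index of the identifier's
   entry, creating the entry on first sight. */
int install_id(const char *lexeme) {
    assert(lexeme != NULL && strlen(lexeme) < MAX_LEN);
    int i = lookup(lexeme);
    if (i == 0) {
        assert(nsyms < MAX_SYMS);
        i = nsyms++;
        strcpy(symtab[i].lexeme, lexeme);
        symtab[i].token = ID;
    }
    return symtab[i].token == ID ? i : 0;
}

/* Returns the token noted in the table, or ID for an unknown lexeme. */
int gettoken(const char *lexeme) {
    assert(lexeme != NULL);
    int i = lookup(lexeme);
    return i ? symtab[i].token : ID;
}
```

Because keywords live in the same table as identifiers, the identifier diagram can recognize both, and no separate diagram per keyword is needed.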

  8. Recognition of numbers
     • When the accepting state is reached, call a procedure install_num() that enters the lexeme into the table of numbers and returns a pointer to the created entry.
     • Return the token NUM.
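
To make the num pattern concrete, here is a hedged C sketch that finds the longest prefix matching digit+ (. digit+)? (E (+|-)? digit+)?. It does not reproduce the slides' transition diagrams for numbers, which the transcript omits, and it does not model install_num(). The names `match_num` and `digits` are assumptions for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the index just past the run of digits starting at s[i]. */
static size_t digits(const char *s, size_t i) {
    while (isdigit((unsigned char)s[i])) i++;
    return i;
}

/* Hypothetical helper: returns the length of the longest num lexeme
   at the start of s, or 0 if s does not begin with a digit. */
size_t match_num(const char *s) {
    assert(s != NULL);
    size_t end = digits(s, 0);
    if (end == 0) return 0;            /* digit+ is mandatory */

    /* Optional fraction (. digit+)?: kept only if a digit follows. */
    if (s[end] == '.') {
        size_t f = digits(s, end + 1);
        if (f > end + 1) end = f;
    }

    /* Optional exponent (E (+|-)? digit+)?: kept only if complete. */
    if (s[end] == 'E') {
        size_t e = end + 1;
        if (s[e] == '+' || s[e] == '-') e++;
        size_t x = digits(s, e);
        if (x > e) end = x;
    }
    return end;
}
```

When an optional part is incomplete, as in 12.x or 5E-, the sketch falls back to the shorter lexeme, which plays the same role as retracting the forward pointer.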

  9. Implementing lexical analyzer

         Token nexttoken()
         {
             while (1)
             {
                 switch (state) {
                 case 0: c = nextchar();
                     /* white space: stay in state 0 and move the lexeme start */
                     if (c == blank || c == tab || c == newline) {
                         state = 0;
                         lexeme_beginning++;
                     }
                     else if (c == '<') state = 1;
                     else if (c == '=') state = 5;
                     else if (c == '>') state = 6;
                     else state = fail();
                     break;
                 case 1: c = nextchar();
                     if (c == '=') state = 2;
                     else if (c == '>') state = 3;
                     else state = 4;
                     break;
                 case 2: token.attribute = LE;
                     token.name = relop;
                     return token;

  10. Implementing lexical analyzer (continued)

                 case 8: retract(1);
                     token.attribute = GT;
                     token.name = relop;
                     return token;
                 case 9: c = nextchar();
                     if (isletter(c)) state = 10;
                     else state = fail();
                     break;
                 case 10: c = nextchar();
                     if (isletter(c)) state = 10;
                     else if (isdigit(c)) state = 10;
                     else state = 11;
                     break;
                 case 11: retract(1);
                     entry = install_id();
                     name = gettoken();
                     token.name = name;
                     token.attribute = entry;
                     return token;
                 /* cases 12-24 here for numbers */

  11. Implementing lexical analyzer (continued)

                 case 25: c = nextchar();
                     if (isdigit(c)) state = 26;
                     else state = fail();
                     break;
                 case 26: c = nextchar();
                     if (isdigit(c)) state = 26;
                     else state = 27;
                     break;
                 case 27: retract(1); install_num();
                     return (NUM);
                 }
             }
         }
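
The transcript shows only states 0 to 2 and 8 of the relop diagram. A self-contained sketch of the whole diagram might look like the following. The transitions for states 3 to 7 are assumptions filled in from the relop definition and the state numbers used in case 0 and case 1; the function name `match_relop`, the `NONE` value, and the string-based interface are also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum relop { NONE, LT, LE, EQ, NE, GT, GE };

/* Hypothetical helper: recognizes a relational operator at s[0].
   Stores the lexeme length in *len and returns its attribute,
   or NONE with length 0 if no relop starts here. */
enum relop match_relop(const char *s, size_t *len) {
    assert(s != NULL && len != NULL);
    size_t forward = 0;
    int state = 0;
    char c;
    for (;;) {
        switch (state) {
        case 0: c = s[forward++];
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else { *len = 0; return NONE; }      /* failure */
            break;
        case 1: c = s[forward++];
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;
            break;
        case 2: *len = forward; return LE;
        case 3: *len = forward; return NE;       /* assumed state */
        case 4: *len = forward - 1; return LT;   /* assumed: retract(1) */
        case 5: *len = forward; return EQ;       /* assumed state */
        case 6: c = s[forward++];                /* assumed state */
            state = (c == '=') ? 7 : 8;
            break;
        case 7: *len = forward; return GE;       /* assumed state */
        case 8: *len = forward - 1; return GT;   /* retract(1), as in the slides */
        }
    }
}
```

States 4 and 8 are the ones that read one character too many, so they give it back by reporting a length one less than the forward index.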

  12. Code to find the next start state

         int state = 0, start = 0;
         int lexical_value;

         int fail()
         {
             forward = token_beginning;   /* retract to the start of the lexeme */
             switch (start) {             /* activate the next transition diagram */
             case 0:  start = 9;  break;
             case 9:  start = 12; break;
             case 12: start = 20; break;
             case 20: start = 25; break;
             case 25: recover(); break;   /* all diagrams failed: lexical error */
             default: /* compiler error */ break;
             }
             return start;
         }
