Lexical Analyzer

Presentation Transcript

  1. Lexical Analyzer. Lecturer: Esti Stein (brd4.braude.ac.il/~esti2). 61102 Compilers, Software Eng. Dept. – Ort Braude

  2. What is a lexical analyzer? It reads in characters and groups them into tokens. [Most of the compilation time is spent on lexical analysis.] Diagram: Source Program → Lexical Analyzer → stream of (token, value) pairs, with the lexical analyzer connected to the symbol table.

  3. Why use a lexical analyzer?
     • Modular design: partitioning the compiler into independent parts.
     • The parser deals with words (not characters).
     • Isolate character-set dependencies: ASCII versus EBCDIC.
     • Isolate the representation of symbols: <> versus !=, { } versus begin..end.

  4. A token is a placeholder for a logical entity:
     • keywords
     • constants
     • operators
     • punctuation
     • identifiers
     White space and comments are not tokens.

  5. Example of tokenizing: if( val1 + val2 >= 6.5) todo = false;
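To make the tokenizing example concrete, here is a minimal hand-written scanner in C that splits the statement above into (token class, lexeme) pairs. It is a sketch, not the lecturer's code: the `TokKind` names, the `next_token` function, and the choice to treat `if` as the only keyword and `true`/`false` as constants are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token classes, following the categories listed on slide 4. */
typedef enum { TOK_KEYWORD, TOK_IDENT, TOK_CONST, TOK_OP, TOK_PUNCT,
               TOK_END, TOK_ERROR } TokKind;

/* Scan one token starting at *src, copy its text into lexeme (capacity cap,
   at least 1), advance *src past it, and return the token class. */
TokKind next_token(const char **src, char *lexeme, size_t cap)
{
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p)) p++;      /* white space is not a token */
    if (*p == '\0') { lexeme[0] = '\0'; *src = p; return TOK_END; }

    const char *start = p;
    TokKind kind;

    if (isalpha((unsigned char)*p)) {            /* identifier, keyword, or named constant */
        while (isalnum((unsigned char)*p)) p++;
        kind = TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {     /* numeric constant such as 6 or 6.5 */
        while (isdigit((unsigned char)*p)) p++;
        if (*p == '.' && isdigit((unsigned char)p[1])) {
            p++;
            while (isdigit((unsigned char)*p)) p++;
        }
        kind = TOK_CONST;
    } else if (strchr("+-*/<>=!", *p)) {         /* one- or two-character operator */
        p++;
        if (*p == '=') p++;                      /* >=, <=, ==, != (and also +=, -=, ...) */
        kind = TOK_OP;
    } else if (strchr("(){};,", *p)) {
        p++;
        kind = TOK_PUNCT;
    } else {                                     /* character that starts no token */
        p++;
        kind = TOK_ERROR;
    }

    n = (size_t)(p - start);
    if (n >= cap) n = cap - 1;                   /* truncate over-long lexemes */
    memcpy(lexeme, start, n);
    lexeme[n] = '\0';

    if (kind == TOK_IDENT) {                     /* reclassify reserved words */
        if (strcmp(lexeme, "if") == 0)
            kind = TOK_KEYWORD;
        else if (strcmp(lexeme, "true") == 0 || strcmp(lexeme, "false") == 0)
            kind = TOK_CONST;
    }
    *src = p;
    return kind;
}
```

Calling `next_token` repeatedly on the statement from this slide until it returns `TOK_END` yields twelve tokens: `if`, `(`, `val1`, `+`, `val2`, `>=`, `6.5`, `)`, `todo`, `=`, `false`, `;`.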

  6. Example [program]:
       token Getoken( )
       {
           SkipWhiteSpace( );
           c = getchar( );
           if( isletter(c)) return( ScanForIdentifier( ));
           if( isdigit(c)) return( ScanForConstant( ));
           switch( c) {
               case '(': return( LEFT_PAREN);
               case ')': return( RIGHT_PAREN);
               case '+': return( ScanForAddOrIncrement( ));    /* + or ++ */
               case '=': return( ScanForAssignOrEqual( ));     /* = or == */
               case '/': return( ScanForCommentOrDivide( ));   /* comment or / */
               …
               default: return( ERROR);
           }
       }

  7. Automating: Most tokens can be easily defined by a regular grammar:
     • The user defines tokens in a form equivalent to a regular grammar.
     • The system converts the grammar into code.
     A variety of tools exist: lex, flex, ...

  8. Regular Expressions & Automata: See the “Technion” tutorial on automata.

  9. Exercise 1: A real number consists of two parts:
     • The integer part, consisting of one or more digits. A number may not begin with a zero, unless the integer part is just zero.
     • The decimal part, consisting of a decimal point followed by one or more digits.
     Construct a regular expression for real numbers.
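One possible answer to the exercise, assuming both parts are mandatory, is the extended regular expression `(0|[1-9][0-9]*)\.[0-9]+`. The C sketch below checks candidate strings against it using the POSIX `<regex.h>` functions `regcomp`, `regexec`, and `regfree`. The names `REAL_PATTERN` and `is_real_number` are hypothetical, and the `^`/`$` anchors are added only so that the whole string must match.

```c
#include <regex.h>
#include <stddef.h>

/* Candidate answer: the integer part is "0" or a nonzero digit followed by
   more digits; the decimal part is a point followed by one or more digits. */
static const char REAL_PATTERN[] = "^(0|[1-9][0-9]*)\\.[0-9]+$";

/* Return 1 if text is a real number under this reading of the exercise,
   0 if it is not, and -1 if the pattern fails to compile. */
int is_real_number(const char *text)
{
    regex_t re;
    if (regcomp(&re, REAL_PATTERN, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int matched = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Under these assumptions `6.5`, `0.5`, and `10.25` are accepted, while `007.1` (leading zero), `12` (no decimal part), `3.` and `.5` are rejected. Compiling the pattern on every call keeps the sketch short; a real scanner would compile it once.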

  10. Converting an NDFA to a DFA: Convert to DFA…

  11. Converting an NDFA to a DFA[2]

  12. The Code
        S:   c = getchar( );
             if( c == 'a') goto SA;
             if( c == 'b') goto S;
             error( );
        SA:  c = getchar( );
             if( c == 'a') goto SAC;
             if( c == 'b') goto S;
             error( );
        SAC: c = getchar( );
             if( c == 'a') goto SAC;
             if( c == 'b') goto SBC;
             error( );
        …

  13. The Code[2]
        token LexicalDriver( LexTable)
        {
            state = laststate;
            for(;;) {
                c = NextChar( );
                state = LexTable[ state][ c];    /* table lookup: next state for (state, c) */
                if( state != error && state != finalstate) {
                    AddToToken( c);
                    AdvanceInput( );
                }
                else break;
            }
            if( state != finalstate) return( ERROR);
            else return( Token[ finalstate]);
        }
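The table-driven idea can be tried out with a small, complete automaton. The following C sketch is a variation on the driver above, not a transcription of it: instead of a dedicated final state, it stops when no transition exists and then checks whether the last state reached is accepting. The states, character classes, and the two-token language (identifiers and unsigned integers) are assumptions chosen to keep the table small.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical states and character classes for a two-token language:
   identifiers (a letter followed by letters or digits) and unsigned integers. */
enum { S_START, S_IDENT, S_NUMBER, S_ERROR, N_STATES };
enum { C_LETTER, C_DIGIT, C_OTHER, N_CLASSES };
typedef enum { T_IDENT, T_NUMBER, T_ERROR } Token;

/* Transition table: next state = lex_table[state][character class]. */
static const int lex_table[N_STATES][N_CLASSES] = {
    /* S_START  */ { S_IDENT, S_NUMBER, S_ERROR },
    /* S_IDENT  */ { S_IDENT, S_IDENT,  S_ERROR },
    /* S_NUMBER */ { S_ERROR, S_NUMBER, S_ERROR },
    /* S_ERROR  */ { S_ERROR, S_ERROR,  S_ERROR },
};

/* Map a character (as an unsigned char value) to its column in the table. */
static int char_class(int c)
{
    if (isalpha(c)) return C_LETTER;
    if (isdigit(c)) return C_DIGIT;
    return C_OTHER;
}

/* Run the automaton over input until no transition is possible.
   Stores the number of characters consumed in *length and returns the
   token that corresponds to the last state reached. */
Token lexical_driver(const char *input, size_t *length)
{
    int state = S_START;
    size_t i = 0;

    while (input[i] != '\0') {
        int next = lex_table[state][char_class((unsigned char)input[i])];
        if (next == S_ERROR) break;      /* stop before the offending character */
        state = next;
        i++;
    }
    *length = i;
    if (state == S_IDENT)  return T_IDENT;
    if (state == S_NUMBER) return T_NUMBER;
    return T_ERROR;                      /* never left the start state */
}
```

For the input `val1 + val2` the driver returns `T_IDENT` with a length of 4; for `65)` it returns `T_NUMBER` with a length of 2. Changing the language means editing only `lex_table` and `char_class`, which is the property that lets tools such as lex generate the table automatically.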

  14. Output Lexical Errors
     • A compiler produces a listing of the compiled program plus error messages, placed near the locations of the errors.
     • The errors are queued and printed once a new-line is reached.
     • Two ways to recover:
       • Ignore the erroneous token and start a new token.
       • Delete the first character read and start re-reading the input (complicated!).
     • Be careful not to propagate error messages!
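The first recovery strategy, together with the warning about propagating messages, can be sketched in C as follows. This is an illustration under assumptions: `can_start_token` is a hypothetical test for characters that may begin a token (or are white space), and `skip_bad_token` counts one error for a whole run of bad characters rather than one per character.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical test: may this character begin a token or separate tokens? */
static int can_start_token(int c)
{
    return isalnum(c) || isspace(c) ||
           (c != '\0' && strchr("+-*/=<>!(){};", c) != NULL);
}

/* Skip the erroneous run that begins at text[pos]. Returns the position at
   which scanning can resume and adds 1 to *error_count if anything was
   skipped, so a run of bad characters produces a single error. */
size_t skip_bad_token(const char *text, size_t pos, int *error_count)
{
    size_t start = pos;
    while (text[pos] != '\0' && !can_start_token((unsigned char)text[pos]))
        pos++;
    if (pos > start)
        (*error_count)++;    /* one message per run, not one per character */
    return pos;
}
```

For the text `a @#$ b`, calling `skip_bad_token` at position 2 skips the three bad characters, returns 5, and records one error. Queuing the message until the end of the line, as the slide describes, would be layered on top of this.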

  15. LEX, the Lexical Analyzer: See the “Technion” tutorial on Lex.