CPSC 325 - Compiler

webb

  1. CPSC 325 - Compiler. Tutorial 2: Scanner & Lex

  2. Tokens
     Input -> Token Stream
     • Token stream: each significant lexical chunk of the program is represented by a token
     • Operators & punctuation: { } ! + - = * ; : …
     • Keywords: if while return goto
     • Identifiers: id & actual name
     • Constants: kind & value; int, floating-point, character, string, …

  3. Token - example 1
     Input text:   if( x >= y ) y = 10;
     Token stream: IF LP ID(x) GEQ ID(y) RP ID(y) Assign INT(10) SEMI
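One way to hold such a token stream in memory is sketched below in C. The type and function names (`TokenKind`, `Token`, `token_format`, `example_stream`) are assumptions made for illustration and do not come from the slides; only the token names themselves are taken from the example above.

```c
#include <stdio.h>
#include <string.h>

/* Token kinds from the slide's example; the enum names are illustrative. */
typedef enum { T_IF, T_LP, T_RP, T_ID, T_GEQ, T_ASSIGN, T_INT, T_SEMI } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* lexeme; only meaningful for ID and INT */
} Token;

/* Write a printable form such as "ID(x)" or "SEMI" into buf. */
void token_format(const Token *t, char *buf, size_t n) {
    static const char *const names[] = {
        "IF", "LP", "RP", "ID", "GEQ", "Assign", "INT", "SEMI"
    };
    if (t->kind == T_ID || t->kind == T_INT)
        snprintf(buf, n, "%s(%s)", names[t->kind], t->text);
    else
        snprintf(buf, n, "%s", names[t->kind]);
}

/* The token stream for: if( x >= y ) y = 10; */
const Token example_stream[] = {
    { T_IF, "" }, { T_LP, "" }, { T_ID, "x" }, { T_GEQ, "" }, { T_ID, "y" },
    { T_RP, "" }, { T_ID, "y" }, { T_ASSIGN, "" }, { T_INT, "10" }, { T_SEMI, "" }
};
```

Keywords, operators and punctuation carry only a kind, while identifiers and constants also carry their text, which matches the "id & actual name" and "kind & value" split on the Tokens slide.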

  4. Parser
     Tokens: IF LP ID(x) GEQ ID(y) RP ID(y) Assign INT(10) SEMI
     Parse tree:
       IfStmt
         >=
           ID(x)
           ID(y)
         assign
           ID(y)
           INT(10)

  5. Sample Grammar
     • program ::= statement | program statement
     • statement ::= assignStmt | ifStmt
     • assignStmt ::= id = expr ;
     • ifStmt ::= if ( expr ) statement
     • expr ::= id | int | expr + expr
     • id ::= a | b | … | y | z
     • int ::= 1 | 2 | … | 9 | 0

  6. Why Separate the Scanner and Parser?
     • Simplicity & separation of concerns
       - Scanner hides details from the parser (comments, whitespace, input files, etc.)
       - Parser is easier to build; it has a simpler input stream
     • Efficiency
       - Scanner can use a simpler, faster design
       - (But it still often consumes a surprising amount of the compiler’s total execution time)

  7. Principle of Longest Match
     • In most languages, when there is a choice, the scanner should pick the longest possible string to make up the next token.
     • Example: return apple != banana;
       This should be recognized as 5 tokens, not more (not parts of words or identifiers, and not ! and = as separate tokens):
       return ID(apple) NEQ ID(banana) SEMI
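The longest-match rule can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the course's scanner: the operator table `OPS` and the helpers `longest_operator` and `count_tokens` are hypothetical names, and words are treated simply as runs of letters.

```c
#include <ctype.h>
#include <string.h>

/* Operator spellings known to this sketch (a hypothetical table). */
static const char *const OPS[] = { "!", "!=", "<", "<=", "=", "==", ";" };

/* Length of the longest operator in OPS that is a prefix of s, or 0 if none. */
size_t longest_operator(const char *s) {
    size_t best = 0;
    for (size_t i = 0; i < sizeof OPS / sizeof OPS[0]; i++) {
        size_t n = strlen(OPS[i]);
        if (n > best && strncmp(s, OPS[i], n) == 0)
            best = n;                      /* keep the longer candidate */
    }
    return best;
}

/* Count tokens in s using longest match; returns -1 on an unknown character. */
int count_tokens(const char *s) {
    int count = 0;
    while (*s) {
        if (isspace((unsigned char)*s)) { s++; continue; }
        size_t n = 0;
        if (isalpha((unsigned char)*s)) {
            while (isalpha((unsigned char)s[n])) n++;   /* take the whole word */
        } else {
            n = longest_operator(s);
            if (n == 0) return -1;
        }
        s += n;
        count++;
    }
    return count;
}
```

Because `!=` is preferred over `!` whenever both match, `count_tokens("return apple != banana;")` yields 5, in line with the slide's example.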

  8. Scanner DFA Example (1)
     • State 0 (start): loops to itself on white space or comments
     • 0 -> 1 on end of input: Accept EOF
     • 0 -> 2 on ( : Accept LP
     • 0 -> 3 on ) : Accept RP
     • 0 -> 4 on ; : Accept SEMI

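  9. Scanner DFA Example (2)
     • State 0 (start): loops to itself on white space or comments
     • 0 -> 5 on !
     • 5 -> 6 on = : Accept NEQ
     • 5 -> 7 on other: Accept NOT
     • 0 -> 8 on <
     • 8 -> 9 on = : Accept LEQ
     • 8 -> 10 on other: Accept LESS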

  10. Scanner DFA Example (3)
     • State 0 (start): loops to itself on white space or comments
     • 0 -> 11 on [0-9]
     • 11 -> 11 on [0-9]
     • 11 -> 12 on other: Accept INT

  11. Scanner DFA Example (4)
     • State 0 (start): loops to itself on white space or comments
     • 0 -> 13 on [a-zA-Z]
     • 13 -> 13 on [a-zA-Z]
     • 13 -> 14 on other: Accept ID or keyword
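The four DFA fragments above can be combined into one hand-written scanner. The sketch below is one possible C rendering, assuming the names `Kind`, `Lexeme`, `is_keyword` and `next_token` (none of which appear in the slides), taking the keyword list from the Tokens slide, skipping only white space (comment handling is left out), and reporting any character the DFA does not cover as `TK_ERROR`. The "other" transitions become one-character lookahead that does not consume the next character.

```c
#include <ctype.h>
#include <string.h>

typedef enum { TK_EOF, TK_LP, TK_RP, TK_SEMI, TK_NEQ, TK_NOT, TK_LEQ, TK_LESS,
               TK_INT, TK_ID, TK_KEYWORD, TK_ERROR } Kind;

typedef struct {
    Kind kind;
    const char *start;   /* first character of the lexeme */
    size_t len;          /* lexeme length in characters */
} Lexeme;

/* Keywords listed on the Tokens slide. */
static int is_keyword(const char *s, size_t n) {
    static const char *const kw[] = { "if", "while", "return", "goto" };
    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strlen(kw[i]) == n && strncmp(s, kw[i], n) == 0) return 1;
    return 0;
}

/* Scan one token starting at *pp and advance *pp past it. */
Lexeme next_token(const char **pp) {
    const char *p = *pp;
    while (isspace((unsigned char)*p)) p++;          /* state 0 self-loop */
    Lexeme t = { TK_ERROR, p, 1 };
    switch (*p) {
    case '\0': t.kind = TK_EOF; t.len = 0; break;    /* state 1 */
    case '(':  t.kind = TK_LP;   break;              /* state 2 */
    case ')':  t.kind = TK_RP;   break;              /* state 3 */
    case ';':  t.kind = TK_SEMI; break;              /* state 4 */
    case '!':                                        /* state 5 */
        if (p[1] == '=') { t.kind = TK_NEQ; t.len = 2; } else t.kind = TK_NOT;
        break;
    case '<':                                        /* state 8 */
        if (p[1] == '=') { t.kind = TK_LEQ; t.len = 2; } else t.kind = TK_LESS;
        break;
    default:
        if (isdigit((unsigned char)*p)) {            /* state 11 loop */
            while (isdigit((unsigned char)p[t.len])) t.len++;
            t.kind = TK_INT;
        } else if (isalpha((unsigned char)*p)) {     /* state 13 loop */
            while (isalpha((unsigned char)p[t.len])) t.len++;
            t.kind = is_keyword(p, t.len) ? TK_KEYWORD : TK_ID;
        }
    }
    *pp = p + t.len;
    return t;
}
```

Calling `next_token` repeatedly on a `const char *` cursor yields the tokens one by one until `TK_EOF` is returned. Deciding between ID and keyword by table lookup after the identifier has been scanned keeps the DFA small, which is a common design choice rather than something the slides prescribe.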

  12. Lex/Flex
     • Use Flex instead of Lex
     • Use Bison instead of yacc
     • When compiling, link to the library (-ll for Lex; with Flex the library is usually -lfl):
         flex file.lex
         gcc -o object lex.yy.c -ll
         ./object

  13. Lex - Structure
     • Declarations/Definitions
       %%
     • Rules/Productions: Lex expression, white space, C statement (optional)
       %%
     • Additional Code/Subroutines

  14. Lex - Basic operators
     • * - zero or more occurrences
     • . - “ANY” character (except newline)
     • .* - matches any sequence
     • | - alternation; separates the alternatives
     • + - one or more occurrences (a+ :== aa*)
     • ? - zero or one of something (b? :== b or the empty string)
     • [ ] - choice, so [12345] is equivalent to (1|2|3|4|5). Note: [*+] represents a choice between star and plus; inside brackets they lose their special meaning.
     • - - range inside brackets: [a-zA-Z] matches a to z and A to Z, i.e. all the letters
     • \ - escape: \* matches *, and \. matches a period or decimal point
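These operators can be tried out from plain C without running Lex. The sketch below swaps in POSIX extended regular expressions (`regcomp`/`regexec` from `<regex.h>`), which share the operators listed above; it is not Lex itself. The helper `full_match` is a hypothetical name, and one known difference is that `.` in Lex never matches a newline, whereas POSIX matching as used here allows it.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole of text matches pattern, 0 if it does not,
   and -1 if the pattern is too long or fails to compile. */
int full_match(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;
    /* Wrap the pattern in ^( )$ so the entire string must match. */
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

For instance, `full_match("a+", "aaa")` and `full_match("aa*", "aaa")` both succeed, reflecting a+ :== aa*, and `full_match("[*+]", "+")` succeeds because star and plus are ordinary characters inside brackets. In C source the backslash itself must be doubled, so the Lex pattern `\.` is written `"\\."`.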
