1 / 18

Lexical Analysis

Lexical Analysis. Lecture 2 Mon, Jan 17, 2005. Tokens. A token has a type and a value. Types include ID , NUM , ASSIGN , LPAREN , etc. Values are used primarily with identifiers and numbers. If we read “count”, the type is ID and the value is “count”.

rneely
Download Presentation

Lexical Analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lexical Analysis Lecture 2 Mon, Jan 17, 2005

  2. Tokens • A token has a type and a value. • Types include ID, NUM, ASSIGN, LPAREN, etc. • Values are used primarily with identifiers and numbers. • If we read “count”, the type is ID and the value is “count”. • If we read “123”, the type is NUM and the value is “123”.

  3. Analyzing Tokens • Each type of token can be described by a regular expression. (Why?) • Therefore, the set of all tokens can be described by a regular expression. (Why?) • Regular expressions are accepted by DFAs. • Therefore, the tokens can be processed and accepted by a DFA.

  4. Regular Expressions • The set of all regular expressions may be defined in two parts. • The basic part. •  represents the language {}. • a represents the language {a} for every a  . • Call these languages L() and L(a), respectively.

  5. Regular Expressions • The recursive part. • Let r and s denote regular expressions. • r | s represents the language L(r)  L(s). • rs represents the language L(r)L(s). • r* represents the language L(r)*. • In other words • L(r | s) = L(r)  L(s). • L(rs) = L(r)L(s). • L(r*) = L(r)*.

  6. Example: Identifiers • Identifiers in C++ can be represented by a regular expression. • r = A | B | … | Z | a | b | … | z • s = 0 | 1 | … | 9 • t = r(r | s)*

  7. Regular Expressions • A regulardefinition of a regular expression is a “grammar” of the form • d1 r1, • d2 r2, : • dn rn, where each ri is a regular expression over   {d1, …, di – 1}.

  8. Regular Expressions • Note that this definition does not allow recursively defined tokens. • In other words, di cannot be defined in terms of di, even indirectly.

  9. Example: Identifiers • We may now describe C++ identifiers as follows. • letter  A | B | … | Z | a | b | … | z • digit  0 | 1 | … | 9 • id  letter(letter | digit)*

  10. Lexical Analysis • After writing a regular expression for each kind of token, we may combine them into one big regular expression describing all tokens. • id  letter(letter | digit)* • num  digit(digit)* • relop  < | > | == | != | >= | <= • token  id | num | relop | …

  11. Transition Diagrams • A regular expression may be represented by a transition diagram. • The transition diagram provides a good guide to writing a lexical analyzer program.

  12. letter | digit letter id: digit digit num: letter letter | digit token: digit digit Example: Transition Diagram

  13. Transition Diagrams • Unfortunately, it is not that simple. • At what point may we stop in an accepting state? • Do not read “count” as 5 identifiers: “c”, “o”, “u”, “n”, “t”. • When we stop in an accepting state, we must be able to determine the type of token processed. • Did we read the ID token “count” or did we read the IF token “if”?

  14. = ! < = = = ==: !=: <=: Transition Diagrams • Consider transitions diagrams to accept relational operators ==, !=, <, >, <=, and >=. and so on.

  15. 2 = | ! = 1 4 relop: < | > = 3 Transition Diagrams • Combine them into a single transition diagram.

  16. Transition Diagrams • When we reach an accepting state, how can we tell which operator was processed?. • In general, we design the diagram so that each kind of token has its own accepting state.

  17. Transition Diagrams • If we reach state 3, how do we decide whether to continue to state 4? • We read characters until the current character does not match any pattern, i.e., it would lead to the dead state. • At that point, we accept the string, minus the last character. • Later, processing resumes with the last character.

  18. = = other relop: ! = other < other = other other > = other Transition Diagrams

More Related