October 25, 2014 PowerPoint Presentation


maryam-sampson
Presentation Transcript

  1. Computer Science at Azusa Pacific University
     CS400 Compiler Construction
     Sheldon X. Liang, Ph.D.
     October 25, 2014, Azusa, CA
     Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278
     Department of Computer Science, http://www.apu.edu/clas/computerscience/

  2. The Reason Why Lexical Analysis is a Separate Phase
     • Simplifies the design of the compiler
       • Without a separate lexer, LL(1) or LR(1) parsing with one token of lookahead would not be possible, because a single token can take multiple characters to match
     • Provides efficient implementation
       • Systematic techniques exist to implement lexical analyzers by hand or automatically from specifications
       • Stream buffering methods can be used to scan the input
     • Improves portability
       • Non-standard symbols and alternate character encodings can be normalized (e.g. trigraphs)

  3. Interaction of the Lexical Analyzer with the Parser
     [Diagram: Source Program → Lexical Analyzer → (token, tokenval) → Parser. The Parser sends "get next token" requests to the Lexical Analyzer. Both components have an "error" output, and a Symbol Table appears alongside them.]

  4. Attributes of Tokens
     The lexical analyzer reads the input
         y := 31 + 28*x
     and passes the following stream to the parser:
         <id, "y"> <assign, > <num, 31> <+, > <num, 28> <*, > <id, "x">
     Each pair has the form <token, tokenval>, where tokenval is the token attribute.
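The token stream on this slide can be reproduced with a small scanner. The following is a minimal sketch, assuming Python's `re` module; the token names `id`, `assign`, and `num` follow the slide, while `TOKEN_SPEC`, `tokenize`, and the internal `op` and `skip` categories are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical token specification: (token name, pattern) pairs.
TOKEN_SPEC = [
    ("num",    r"\d+"),                   # non-empty sequence of digits
    ("id",     r"[A-Za-z][A-Za-z0-9]*"),  # letter followed by letters and digits
    ("assign", r":="),
    ("op",     r"[+\-*/]"),
    ("skip",   r"\s+"),                   # whitespace produces no token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Yield (token, tokenval) pairs; tokens without an attribute carry None."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue
        if kind == "num":
            yield ("num", int(lexeme))   # attribute is the numeric value
        elif kind == "id":
            yield ("id", lexeme)         # attribute is the identifier's lexeme
        elif kind == "op":
            yield (lexeme, None)         # the operator itself is the token
        else:
            yield (kind, None)

tokens = list(tokenize("y := 31 + 28*x"))
```

A parser would normally pull one pair at a time from `tokenize` rather than build the whole list; the list here only makes the result easy to inspect.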

  5. Lexical Analysis & Lexical Analyzer Generators
     [Diagram labels: Regular Expressions; Finite Automata; Formalization, RE, Conversion, FA, Lexer Design]

  6. Keep the following questions in mind
     • Token
       • Lexical unit
       • Atomic parse element
       • Abstracted in syntax, e.g. Id
     • Lexeme
       • Specific string making up a token
       • Value / attribute related to a token
       • Concrete in the language, e.g. Amt
     • Spec of patterns for tokens
       • Alphabet Σ: a finite set
       • String s: a finite sequence from Σ
       • Language: a specific set of strings

  7. Tokens, Patterns, and Lexemes
     • A token is a classification of lexical units
       • For example: id and num
     • Lexemes are the specific character strings that make up a token
       • For example: abc and 123
     • Patterns are rules describing the set of lexemes belonging to a token
       • For example: "letter followed by letters and digits" and "non-empty sequence of digits"
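The relationship between the three terms can be made concrete. This is a minimal sketch, assuming the two patterns quoted on the slide are encoded as Python regular expressions; `PATTERNS` and `classify` are hypothetical names, and "letter" is assumed to mean an ASCII letter.

```python
import re

# The slide's two patterns, written as regular expressions (an assumed encoding).
PATTERNS = {
    "id":  re.compile(r"[A-Za-z][A-Za-z0-9]*"),  # letter followed by letters and digits
    "num": re.compile(r"[0-9]+"),                # non-empty sequence of digits
}

def classify(lexeme):
    """Return the token whose pattern matches the whole lexeme, or None."""
    for token, pattern in PATTERNS.items():
        if pattern.fullmatch(lexeme):
            return token
    return None
```

With these definitions, the lexeme `abc` is classified as the token `id` and the lexeme `123` as the token `num`, while a string such as `1abc` matches neither pattern.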

  8. Specification of Patterns for Tokens: Definitions
     • An alphabet Σ is a finite set of symbols (characters)
     • A string s is a finite sequence of symbols from Σ
       • |s| denotes the length of string s
       • ε denotes the empty string, thus |ε| = 0
     • A language is a specific set of strings over some fixed alphabet Σ

  9. Specification of Patterns for Tokens: String Operations
     • The concatenation of two strings x and y is denoted by xy
     • The exponentiation of a string s is defined by
       • s^0 = ε
       • s^i = s^(i-1) s for i > 0
     • Note that sε = εs = s
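The recursive definition of string exponentiation translates directly into code. This is a minimal sketch, assuming Python strings stand in for strings over Σ and the empty string `""` stands in for ε; `power` is a hypothetical helper name.

```python
def power(s, i):
    """Return s^i using the recursive definition: s^0 = ε, s^i = s^(i-1) s."""
    if i < 0:
        raise ValueError("exponent must be non-negative")
    if i == 0:
        return ""                  # ε, the empty string
    return power(s, i - 1) + s     # concatenate one more copy of s

# ε is the identity for concatenation: sε = εs = s
s = "ab"
identity_holds = (s + "" == s) and ("" + s == s)
```

In everyday Python the same result is written `s * i`; the recursion is kept here only to mirror the definition.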

  10. Specification of Patterns for Tokens: Language Operations
     • Union: L ∪ M = {s | s ∈ L or s ∈ M}
     • Concatenation: LM = {xy | x ∈ L and y ∈ M}
     • Exponentiation: L^0 = {ε}; L^i = L^(i-1) L
     • Kleene closure: L* = ∪_{i=0,…,∞} L^i
     • Positive closure: L+ = ∪_{i=1,…,∞} L^i
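These operations can be tried out on small finite languages. The sketch below assumes languages are represented as Python sets of strings; since L* and L+ are infinite in general, `kleene` and `positive` take a hypothetical `bound` parameter and only compute the union up to L^bound. All function names are illustrative.

```python
EPSILON = ""  # the empty string ε

def union(L, M):
    """L ∪ M"""
    return L | M

def concat(L, M):
    """LM = {xy | x ∈ L and y ∈ M}"""
    return {x + y for x in L for y in M}

def lang_power(L, i):
    """L^0 = {ε}; L^i = L^(i-1) L"""
    result = {EPSILON}
    for _ in range(i):
        result = concat(result, L)
    return result

def kleene(L, bound):
    """Finite approximation of L*: the union of L^0 .. L^bound."""
    result = set()
    for i in range(bound + 1):
        result |= lang_power(L, i)
    return result

def positive(L, bound):
    """Finite approximation of L+: the union of L^1 .. L^bound."""
    result = set()
    for i in range(1, bound + 1):
        result |= lang_power(L, i)
    return result
```

For example, with L = {"a", "b"} the call `lang_power(L, 2)` yields the four strings aa, ab, ba, and bb, and the only difference between `kleene` and `positive` is whether L^0 = {ε} is included.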

  11. Specification of Patterns for Tokens: Regular Expressions
     • Basis symbols:
       • ε is a regular expression denoting the language {ε}
       • a ∈ Σ is a regular expression denoting {a}
     • If r and s are regular expressions denoting languages L(r) and M(s) respectively, then
       • r|s is a regular expression denoting L(r) ∪ M(s)
       • rs is a regular expression denoting L(r)M(s)
       • r* is a regular expression denoting L(r)*
       • (r) is a regular expression denoting L(r)
     • A language defined by a regular expression is called a regular set
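The inductive definition above can be read as a recursive function from expressions to languages. The following is a minimal sketch, assuming a hypothetical nested-tuple encoding of regular expressions; because r* generally denotes an infinite set, `denote` only returns the strings of length at most `n`. Parentheses need no node of their own, since the tuple nesting already expresses grouping.

```python
def denote(r, n):
    """Return the strings of length <= n in the language denoted by r.

    r uses a hypothetical nested-tuple encoding:
      ("eps",)        ε
      ("sym", a)      a single symbol a
      ("alt", r, s)   r|s
      ("cat", r, s)   rs
      ("star", r)     r*
    """
    kind = r[0]
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "alt":
        return denote(r[1], n) | denote(r[2], n)
    if kind == "cat":
        return {x + y
                for x in denote(r[1], n)
                for y in denote(r[2], n)
                if len(x + y) <= n}
    if kind == "star":
        base = denote(r[1], n)
        result = {""}
        frontier = {""}
        # Keep appending base strings until no new string of length <= n appears.
        while frontier:
            frontier = {x + y
                        for x in frontier
                        for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

a, b = ("sym", "a"), ("sym", "b")
ab_star = denote(("cat", a, ("star", b)), 3)   # strings of ab* up to length 3
```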

  12. Nondeterministic Finite Automata
     • An NFA is a 5-tuple (S, Σ, δ, s0, F) where
       • S is a finite set of states
       • Σ is a finite set of symbols, the alphabet
       • δ is a mapping from S × Σ to a set of states
       • s0 ∈ S is the start state
       • F ⊆ S is the set of accepting (or final) states
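The 5-tuple maps naturally onto a small data structure. This is a minimal sketch, assuming δ is stored as a dictionary from (state, symbol) pairs to sets of states and assuming no ε-transitions; the `NFA` class, the `accepts` function, and the example automaton for (a|b)*abb are illustrative and not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class NFA:
    states: frozenset      # S
    alphabet: frozenset    # Σ
    delta: dict            # δ: (state, symbol) -> set of next states
    start: object          # s0
    accepting: frozenset   # F

def accepts(nfa, text):
    """Simulate the NFA by tracking the set of states it could be in."""
    current = {nfa.start}
    for ch in text:
        current = {t for s in current for t in nfa.delta.get((s, ch), set())}
    return bool(current & nfa.accepting)

# Hypothetical example: an NFA for (a|b)*abb with states 0..3.
example = NFA(
    states=frozenset({0, 1, 2, 3}),
    alphabet=frozenset("ab"),
    delta={(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}},
    start=0,
    accepting=frozenset({3}),
)
```

Here δ maps state 0 on input a to two possible successors, which is exactly what makes the automaton nondeterministic.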

  13. Conversion of an NFA into a DFA
     • The subset construction algorithm converts an NFA into a DFA using:
       • ε-closure(s) = {s} ∪ {t | s →ε … →ε t}
       • ε-closure(T) = ∪_{s∈T} ε-closure(s)
       • move(T, a) = {t | s →a t and s ∈ T}
     • The algorithm produces:
       • Dstates, the set of states of the new DFA, each consisting of a set of NFA states
       • Dtran, the transition table of the new DFA
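The three helper operations and the two outputs named on this slide fit into a short worklist algorithm. The following is a minimal sketch, assuming the NFA's transitions are given as a dictionary from (state, symbol) pairs to sets of states, with the hypothetical marker `EPS = None` standing for ε; the function names mirror the slide, but the representation is an assumption. Empty target sets are simply left out of `dtran` rather than modeled as a dead state.

```python
EPS = None  # hypothetical marker for ε-transitions in the delta dictionary

def eps_closure(delta, T):
    """All states reachable from the set T via zero or more ε-transitions."""
    closure = set(T)
    stack = list(T)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(delta, T, a):
    """States reachable from some state in T by one transition on symbol a."""
    return {t for s in T for t in delta.get((s, a), ())}

def subset_construction(delta, start, alphabet):
    """Return (DFA start state, Dstates, Dtran); DFA states are frozensets."""
    start_set = eps_closure(delta, {start})
    dstates = {start_set}
    dtran = {}
    worklist = [start_set]          # DFA states whose transitions are not yet computed
    while worklist:
        T = worklist.pop()
        for a in alphabet:
            U = eps_closure(delta, move(delta, T, a))
            if not U:
                continue            # no transition on a from T
            if U not in dstates:
                dstates.add(U)
                worklist.append(U)
            dtran[(T, a)] = U
    return start_set, dstates, dtran

# Hypothetical example: the NFA for (a|b)*abb with states 0..3.
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
dfa_start, dstates, dtran = subset_construction(delta, 0, "ab")
```

A DFA state is accepting whenever it contains at least one accepting NFA state; in the example that is the set {0, 3}.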

  14. Got it? Check against the following questions
     • Token
       • Lexical unit
       • Atomic parse element
       • Abstracted in syntax, e.g. Id
     • Lexeme
       • Specific string making up a token
       • Value / attribute related to a token
       • Concrete in the language, e.g. Amt
     • Spec of patterns for tokens
       • Alphabet Σ: a finite set
       • String s: a finite sequence from Σ
       • Language: a specific set of strings

  15. Thank you very much! Questions?