CSC 3315 Lexical and Syntax Analysis
Hamid Harroud
School of Science and Engineering, Akhawayn University
http://www.aui.ma/~H.Harroud/csc3315/
Lexical Analysis
• Convert source file characters into a token stream.
• Remove content-free characters (comments, whitespace, ...).
• Detect lexical errors (badly formed literals, illegal characters, ...).
• The output of lexical analysis is the input to syntax analysis.
• Idea: look for patterns in the input character sequence, convert them to tokens with attributes, and pass the tokens to the parser as a stream.
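The steps above can be illustrated with a minimal hand-written scanner. This is a sketch under stated assumptions, not the course's implementation. The token kinds `NUM`, `ID` and `OP`, the `//` comment syntax, the operator set, and the class name `SimpleLexer` are all hypothetical. The code uses Java records, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal scanner: token kinds, operator set and the "//" comment
// syntax are illustrative assumptions, not taken from the course material.
class SimpleLexer {
    enum Kind { NUM, ID, OP }

    record Token(Kind kind, String lexeme) {
        @Override public String toString() { return kind + "(" + lexeme + ")"; }
    }

    static List<Token> tokenize(String src) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {             // content-free: skip it
                i++;
            } else if (src.startsWith("//", i)) {        // comment: skip to end of line
                while (i < src.length() && src.charAt(i) != '\n') i++;
            } else if (Character.isDigit(c)) {           // NUM: digit+
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add(new Token(Kind.NUM, src.substring(start, i)));
            } else if (Character.isLetter(c)) {          // ID: letter (letter | digit)*
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                out.add(new Token(Kind.ID, src.substring(start, i)));
            } else if ("+-*/=();".indexOf(c) >= 0) {     // single-character operators
                out.add(new Token(Kind.OP, String.valueOf(c)));
                i++;
            } else {                                     // lexical error: illegal character
                throw new IllegalArgumentException("illegal character '" + c + "' at " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [ID(x1), OP(=), NUM(42), OP(+), ID(y), OP(;)]
        System.out.println(tokenize("x1 = 42 + y; // set x1"));
    }
}
```

The comment check comes before the operator check, so `//` starts a comment while a single `/` is still returned as an operator.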
Specifying Lexical Analyzers
• A lexical analyzer can be defined by a list of pairs (regular expression, action), where the regular expression describes a token pattern and the action is a piece of code, parameterized by the matching lexeme, that returns a (token, attribute) pair.
• Example:
  • (digit+, {return new Token(NUM, Integer.parseInt(lexeme));})
  • (alpha(alpha|digit)∗, {return new Token(ID, lexeme);})
  • (space|tab|newline, {})
  • (.,.)
• So R.E.s can help us specify scanners.
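A scanner can be driven directly by such a list of (regular expression, action) pairs. The sketch below uses `java.util.regex`, and the names `RuleLexer`, `Rule` and `Token` are hypothetical. The disambiguation policy is an assumption, since the slide does not state one: the longest match wins, and ties go to the earlier rule, as in lex-style tools. The catch-all `.` rule is given an error action, which is one plausible reading of the last pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical rule-driven scanner. Assumed policy: longest match wins,
// ties go to the rule listed first.
class RuleLexer {
    record Token(String kind, Object attr) {
        @Override public String toString() { return kind + "(" + attr + ")"; }
    }

    // An action turns the matched lexeme into a token, or returns null to discard it.
    record Rule(Pattern regex, Function<String, Token> action) {}

    static final List<Rule> RULES = List.of(
        new Rule(Pattern.compile("[0-9]+"), lexeme -> new Token("NUM", Integer.parseInt(lexeme))),
        new Rule(Pattern.compile("[A-Za-z][A-Za-z0-9]*"), lexeme -> new Token("ID", lexeme)),
        new Rule(Pattern.compile("[ \\t\\n]"), lexeme -> null),          // whitespace: no token
        new Rule(Pattern.compile("."), lexeme -> {                        // anything else: error
            throw new IllegalArgumentException("illegal character: " + lexeme);
        })
    );

    static List<Token> tokenize(String src) {
        List<Token> out = new ArrayList<>();
        int pos = 0;
        while (pos < src.length()) {
            Rule best = null;
            int bestEnd = pos;
            for (Rule r : RULES) {
                Matcher m = r.regex().matcher(src).region(pos, src.length());
                // lookingAt() anchors the match at pos; only a strictly longer match
                // replaces the current best, so earlier rules win ties.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = r;
                    bestEnd = m.end();
                }
            }
            if (best == null) throw new IllegalArgumentException("no rule matches at " + pos);
            Token t = best.action().apply(src.substring(pos, bestEnd));
            if (t != null) out.add(t);
            pos = bestEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("count 42\tx7"));   // [ID(count), NUM(42), ID(x7)]
    }
}
```

The tie-breaking matters here. A space is matched both by the whitespace rule and by `.`, and listing whitespace first is what keeps it from being reported as an error.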
Regular Expressions
• A regular expression (R.E.) is a concise formal characterization of a regular language.
• Example: the regular language containing all identifiers (IDENTs) is described by the regular expression letter (letter | digit)∗, where “|” means “or” and “e∗” means “zero or more copies of e.”
• Regular languages are one particular class of formal languages.
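The identifier expression letter (letter | digit)∗ maps almost directly onto a `java.util.regex` pattern, keeping the alternation and the star visible. This sketch assumes ASCII letters and digits, and the class `IdentRegex` is hypothetical.

```java
import java.util.regex.Pattern;

// letter (letter | digit)* built from named pieces; ASCII-only by assumption.
class IdentRegex {
    static final String LETTER = "[A-Za-z]";
    static final String DIGIT = "[0-9]";
    static final Pattern IDENT = Pattern.compile(LETTER + "(" + LETTER + "|" + DIGIT + ")*");

    // matches() succeeds only if the whole string is in the language.
    static boolean isIdent(String s) {
        return IDENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"x", "count2", "2count", "a_b", ""}) {
            System.out.println("\"" + s + "\" -> " + isIdent(s));
        }
    }
}
```

Here `x` and `count2` are identifiers. `2count` (starts with a digit), `a_b` (underscore is not in the alphabet) and the empty string (at least one letter is required) are not.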
Finite Automaton
[Figure: an input string is fed to a finite automaton, whose output is “Accept” or “Reject”.]
Transition Graph
[Figure: a transition graph with its parts labeled: initial state, accepting state, transition, state.]
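Initial Configuration
[Figure: the automaton in its initial state, with the input string to be read.]
Reading the Input
[Figure: the automaton reads the input string; when the input is finished, it is in an accepting state: accept.]
Rejection
[Figure: when the input is finished, the automaton is not in an accepting state: reject.]
Another Rejection
[Figure: another input string that is rejected.]
Another Example
[Figure: another input string that is accepted.]
Rejection Example
[Figure: when the input is finished, the automaton is not in an accepting state: reject.]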
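The runs above can be mimicked with a short simulation that prints each step and decides accept or reject only when the input is finished. The original figures are not recoverable, so the automaton here is hypothetical: it accepts exactly the string `abba` over {a, b}. Any transition not listed sends it to a non-accepting trap state, and the class name `RunFA` is also illustrative.

```java
import java.util.Map;

// Hypothetical FA over {a, b} that accepts exactly "abba".
// States q0..q4 with q4 accepting; any unlisted transition leads to a
// non-accepting "trap" state, which also has no listed transitions.
class RunFA {
    static final Map<String, String> DELTA = Map.of(
            "q0,a", "q1",
            "q1,b", "q2",
            "q2,b", "q3",
            "q3,a", "q4");
    static final String START = "q0";
    static final String ACCEPTING = "q4";
    static final String TRAP = "trap";

    // Reads the input one symbol at a time, printing each transition taken.
    static boolean run(String input) {
        String state = START;
        for (char c : input.toCharArray()) {
            state = DELTA.getOrDefault(state + "," + c, TRAP);
            System.out.println("read '" + c + "' -> " + state);
        }
        // The decision is made only once the input is finished.
        boolean accept = state.equals(ACCEPTING);
        System.out.println("input finished: " + (accept ? "accept" : "reject"));
        return accept;
    }

    public static void main(String[] args) {
        run("abba");    // ends in q4: accept
        run("abb");     // ends in q3: reject
        run("abbaa");   // falls into the trap: reject
    }
}
```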
Languages Accepted by FAs
• For an FA $M$, the language $L(M)$ contains all input strings accepted by $M$:
  $L(M) = \{\, \text{strings that bring } M \text{ to an accepting state} \,\}$
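$L(M)$ is usually infinite, so one way to get a feel for it is to list its members up to a length bound. The automaton below is a hypothetical two-state example over {a, b} whose language is all strings ending in b. The class `LanguageOfFA` is illustrative, and input symbols outside {a, b} are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical FA over {a, b}: state 1 means "the last symbol read was b".
// With F = {1}, L(M) is the set of strings that end in b.
class LanguageOfFA {
    static final String ALPHABET = "ab";
    static final int[][] DELTA = {
        {0, 1},   // from state 0: 'a' -> 0, 'b' -> 1
        {0, 1},   // from state 1: 'a' -> 0, 'b' -> 1
    };
    static final int START = 0;
    static final boolean[] ACCEPTING = {false, true};

    // w is in L(M) iff reading w from START brings M to an accepting state.
    static boolean accepts(String w) {
        int q = START;
        for (char c : w.toCharArray()) q = DELTA[q][ALPHABET.indexOf(c)];
        return ACCEPTING[q];
    }

    // All members of L(M) with length <= maxLen, shortest first.
    static List<String> languageUpTo(int maxLen) {
        List<String> result = new ArrayList<>();
        List<String> layer = List.of("");            // all strings of the current length
        for (int len = 0; len <= maxLen; len++) {
            for (String w : layer) if (accepts(w)) result.add(w);
            List<String> next = new ArrayList<>();
            for (String w : layer)
                for (char c : ALPHABET.toCharArray()) next.add(w + c);
            layer = next;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(languageUpTo(3));   // [b, ab, bb, aab, abb, bab, bbb]
    }
}
```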
Example
[Figure: an automaton and an input string that it accepts.]
Example
[Figure: three input strings, each accepted by the automaton.]
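Formal Definition
Finite Automaton (FA): $M = (Q, \Sigma, \delta, q_0, F)$
• $Q$: set of states
• $\Sigma$: input alphabet
• $\delta : Q \times \Sigma \to Q$: transition function
• $q_0 \in Q$: initial state
• $F \subseteq Q$: set of accepting states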
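The five components of the definition map directly onto a Java record. In this sketch the field names and the record name `FiniteAutomaton` are hypothetical, and states are represented as strings. The constructor checks three conditions: $q_0 \in Q$, $F \subseteq Q$, and that $\delta$ is a total function $Q \times \Sigma \to Q$.

```java
import java.util.Map;
import java.util.Set;

// M = (Q, Sigma, delta, q0, F) as a record; field names are illustrative.
record FiniteAutomaton(
        Set<String> states,                          // Q
        Set<Character> alphabet,                     // Sigma
        Map<String, Map<Character, String>> delta,  // delta : Q x Sigma -> Q
        String initial,                              // q0
        Set<String> accepting) {                     // F

    // Compact constructor: validate the conditions of the formal definition.
    FiniteAutomaton {
        if (!states.contains(initial))
            throw new IllegalArgumentException("q0 must be in Q");
        if (!states.containsAll(accepting))
            throw new IllegalArgumentException("F must be a subset of Q");
        for (String q : states) {
            for (char a : alphabet) {
                String target = delta.getOrDefault(q, Map.of()).get(a);
                if (target == null || !states.contains(target))
                    throw new IllegalArgumentException("delta(" + q + ", " + a + ") must be in Q");
            }
        }
    }

    // Apply delta once per input symbol, then check whether the final state is in F.
    boolean accepts(String w) {
        String q = initial;
        for (char a : w.toCharArray()) {
            if (!alphabet.contains(a))
                throw new IllegalArgumentException("symbol not in alphabet: " + a);
            q = delta.get(q).get(a);
        }
        return accepting.contains(q);
    }

    public static void main(String[] args) {
        // Example: strings over {a, b} with an even number of a's.
        FiniteAutomaton m = new FiniteAutomaton(
                Set.of("even", "odd"),
                Set.of('a', 'b'),
                Map.of("even", Map.of('a', "odd", 'b', "even"),
                       "odd", Map.of('a', "even", 'b', "odd")),
                "even",
                Set.of("even"));
        System.out.println(m.accepts("abab"));   // true: two a's
        System.out.println(m.accepts("ab"));     // false: one a
    }
}
```

Because `delta` maps each (state, symbol) pair to a single target, the record can only describe deterministic automata, which matches $\delta : Q \times \Sigma \to Q$.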
Example
$L(M) = \{\text{all strings with prefix } \ldots\}$
[Figure: the automaton for this language; the input shown is accepted.]
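The prefix from the original figure is not recoverable, so the sketch below uses a hypothetical prefix `ab` over {a, b}. The class `PrefixFA` is illustrative. It shows the usual construction:
• a chain of states that checks the prefix,
• an accepting state that loops on every symbol once the prefix is matched,
• a trap state for any mismatch.

```java
// Hypothetical example: L(M) = { all strings over {a, b} with prefix "ab" }.
// States: 0 = start, 1 = read "a", 2 = read "ab" (accepting), 3 = trap.
class PrefixFA {
    static final int[][] DELTA = {
        //  'a' 'b'
        {1, 3},   // 0: start
        {3, 2},   // 1: seen "a", now need 'b'
        {2, 2},   // 2: prefix "ab" matched; stay accepting on any symbol
        {3, 3},   // 3: trap; the prefix was wrong, reject forever
    };

    static boolean accepts(String w) {
        int q = 0;
        for (char c : w.toCharArray()) {
            if (c != 'a' && c != 'b') throw new IllegalArgumentException("not in {a, b}: " + c);
            q = DELTA[q][c - 'a'];
        }
        return q == 2;
    }

    public static void main(String[] args) {
        System.out.println(accepts("abba"));   // true
        System.out.println(accepts("ba"));     // false
    }
}
```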
Example
$L(M) = \{\text{all strings without substring } \ldots\}$
[Figure: the automaton for this language.]
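The forbidden substring in the original figure is not recoverable either, so this sketch assumes the substring `001` over {0, 1}. The class `NoSubstringFA` is hypothetical. Each state records how much of `001` is currently matched at the end of the input read so far. Once the whole substring has been seen, the automaton moves to a trap state, which is the only non-accepting one.

```java
// Hypothetical example: L(M) = { all strings over {0, 1} without substring "001" }.
// State i (0..2): the longest suffix read so far that is a prefix of "001" has length i.
// State 3: "001" has occurred; a trap state and the only non-accepting state.
class NoSubstringFA {
    static final int[][] DELTA = {
        //  '0' '1'
        {1, 0},   // 0: no useful suffix
        {2, 0},   // 1: suffix "0"; a '1' gives "01", which is no prefix of "001"
        {2, 3},   // 2: suffix "00"; another '0' keeps "00", a '1' completes "001"
        {3, 3},   // 3: trap
    };

    static boolean accepts(String w) {
        int q = 0;
        for (char c : w.toCharArray()) {
            if (c != '0' && c != '1') throw new IllegalArgumentException("not in {0, 1}: " + c);
            q = DELTA[q][c - '0'];
        }
        return q != 3;
    }

    public static void main(String[] args) {
        System.out.println(accepts("0101"));   // true
        System.out.println(accepts("1001"));   // false
    }
}
```

One way to gain confidence in such a construction is to compare it with `String.contains` on every short input.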
Regular Languages
• Definition: a language $L$ is regular if there is an FA $M$ such that $L = L(M)$.
• Observation: all languages accepted by FAs form the family of regular languages.
Examples of Regular Languages
There exist automata that accept these languages (see previous slides):
• { all strings with prefix … }
• { all strings without substring … }
There exist languages that are not regular: no FA accepts such a language.
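The slide's example of a non-regular language is not recoverable. A standard textbook example is $L = \{a^n b^n : n \ge 0\}$. The usual pigeonhole argument for it can be turned into code. Given any FA over {a, b} with $k$ states, two of the $k + 1$ prefixes $a^0, \ldots, a^k$ must reach the same state. Say they are $a^i$ and $a^j$ with $i < j$. Then the FA gives the same answer on $a^i b^i$ and $a^j b^i$, even though only the first is in $L$. The sketch below returns a string on which a given FA disagrees with $L$. The class and method names are hypothetical, and the FA is given as a table with `delta[q][0]` for 'a' and `delta[q][1]` for 'b'.

```java
import java.util.HashMap;
import java.util.Map;

// Pigeonhole sketch: no FA accepts L = { a^n b^n : n >= 0 }.
// Inputs are assumed to be over {a, b}; states are 0..k-1.
class NotRegular {
    static boolean inL(String w) {
        int n = w.length() / 2;
        return w.length() % 2 == 0 && w.equals("a".repeat(n) + "b".repeat(n));
    }

    static boolean run(int[][] delta, int start, boolean[] accepting, String w) {
        int q = start;
        for (char c : w.toCharArray()) q = delta[q][c == 'a' ? 0 : 1];
        return accepting[q];
    }

    // Returns a string w with run(w) != inL(w), which shows that this FA does not accept L.
    static String counterexample(int[][] delta, int start, boolean[] accepting) {
        int k = delta.length;
        Map<Integer, Integer> firstSeen = new HashMap<>();   // state -> smallest i reaching it on a^i
        int q = start;                                       // q is the state after reading a^j
        for (int j = 0; j <= k; j++) {                       // k + 1 prefixes but only k states
            Integer i = firstSeen.putIfAbsent(q, j);
            if (i != null) {
                // a^i and a^j reach the same state, so a^i b^i and a^j b^i get the same
                // answer, but only a^i b^i is in L. Whichever answer the FA gives is wrong
                // on one of them.
                String inLang = "a".repeat(i) + "b".repeat(i);
                String outLang = "a".repeat(j) + "b".repeat(i);
                return run(delta, start, accepting, inLang) ? outLang : inLang;
            }
            q = delta[q][0];
        }
        throw new AssertionError("unreachable: a state must repeat");
    }
}
```

For example, an FA that correctly handles $n \le 1$ but has only four states is caught on `aabb`.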