 
                
COMP313A Programming Languages Lexical Analysis (2)
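Lookahead • Example: the tokens <=, <> and < all begin with the same character • After reading <, the scanner cannot tell which token it has until it sees the next character • When we read a token delimiter to establish a token, we need to make sure that the delimiter is still available afterwards: it is the start of the next token! • This is lookahead: deciding what to do based on a character we ‘haven’t read’ • Sometimes implemented by reading from a buffer and then pushing the character back into the buffer • Recognition of the next token then starts from the pushed-back character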
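The push-back idea can be sketched in a few lines of Python. This is a minimal illustration, not the course's scanner: the `Reader` class, the token names `LE`, `NE`, `LT` and `WORD`, and the `tokenize` helper are all hypothetical names chosen for this example.

```python
class Reader:
    """Character source with a push-back buffer (hypothetical helper)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.pushed = []  # characters that were read and then pushed back

    def read(self):
        # Pushed-back characters are served before fresh input.
        if self.pushed:
            return self.pushed.pop()
        if self.pos < len(self.text):
            ch = self.text[self.pos]
            self.pos += 1
            return ch
        return ""  # empty string marks end of input

    def unread(self, ch):
        # Make ch available again; pushing back end-of-input is a no-op.
        if ch:
            self.pushed.append(ch)


def next_token(reader):
    ch = reader.read()
    while ch == " ":  # skip blanks between tokens
        ch = reader.read()
    if ch == "":
        return None
    if ch == "<":
        look = reader.read()  # the lookahead character
        if look == "=":
            return ("LE", "<=")
        if look == ">":
            return ("NE", "<>")
        reader.unread(look)  # it starts the next token, so give it back
        return ("LT", "<")
    if ch.isalnum():
        lexeme = ch
        look = reader.read()
        while look.isalnum():  # "".isalnum() is False, so end of input stops the loop
            lexeme += look
            look = reader.read()
        reader.unread(look)  # the delimiter belongs to the next token
        return ("WORD", lexeme)
    raise ValueError(f"unexpected character {ch!r}")


def tokenize(text):
    reader = Reader(text)
    tokens = []
    tok = next_token(reader)
    while tok is not None:
        tokens.append(tok)
        tok = next_token(reader)
    return tokens
```

For example, `tokenize("a<b")` yields a `WORD`, an `LT` and another `WORD`: the `b` read while deciding between `<`, `<=` and `<>` is pushed back and becomes the start of the next token.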
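Classic Fortran example • Blanks are not significant in classic Fortran, so DO 99 I=1,10 becomes DO99I=1,10 • Compare DO99I=1.10, which is an assignment to the variable DO99I • When can the lexical analyzer assign a token? Not until it reaches the comma or the period, which decides whether DO is a keyword • Until then, characters must be pushed back into the input buffer • This is also called ‘backtracking’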
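A rough sketch of the decision the scanner faces, assuming a much simplified view of Fortran: the hypothetical function `classify_do` strips blanks and then looks ahead past the `=` for a comma outside parentheses. It ignores character constants, statement labels and the rest of the language, so it only illustrates how far ahead the scanner may have to look.

```python
def classify_do(statement):
    """Classify a statement that begins with DO (simplified, illustrative only)."""
    text = statement.replace(" ", "").upper()  # blanks are not significant
    if not text.startswith("DO") or "=" not in text:
        return "other"
    rhs = text.split("=", 1)[1]  # everything after the first '='
    depth = 0
    for ch in rhs:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # A top-level comma after '=' means loop bounds: DO is a keyword.
            return "do-loop"
    # No top-level comma: the whole prefix (e.g. DO99I) is a variable name.
    return "assignment"
```

Here `classify_do("DO 99 I=1,10")` returns `"do-loop"` and `classify_do("DO 99 I=1.10")` returns `"assignment"`, even though the two statements are identical up to the character after the `1`.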
Finite Automata • A recogniser determines whether an input string is a sentence in a language • The language is specified by a regular expression • The regular expression is turned into a finite automaton • The automaton can be deterministic (DFA) or non-deterministic (NFA)
Transition diagram for identifiers • RE: identifier -> letter (letter | digit)* • Diagram: start -> state 0 • 0 -letter-> 1 • 1 -letter-> 1 and 1 -digit-> 1 (loops) • 1 -other-> 2 (accept)
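The identifier diagram can be run directly as a table-driven scanner. The sketch below assumes the three-state reading of the diagram (0 is the start state, 1 loops on letters and digits, 2 accepts on any other character); `char_class`, `TRANSITIONS` and `scan_identifier` are names invented for this example.

```python
def char_class(ch):
    """Map a character to the input classes used on the diagram's edges."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"  # includes end of input, represented as ""


# (state, input class) -> next state; missing entries mean "no transition".
TRANSITIONS = {
    (0, "letter"): 1,
    (1, "letter"): 1,
    (1, "digit"): 1,
    (1, "other"): 2,
}
ACCEPT = 2


def scan_identifier(text, pos=0):
    """Return (lexeme, next_pos) if an identifier starts at pos, else None."""
    state = 0
    i = pos
    while state != ACCEPT:
        ch = text[i] if i < len(text) else ""
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None  # stuck: no identifier starts here
        i += 1
    # The 'other' character was only lookahead, so retract one position.
    i -= 1
    return text[pos:i], i
```

Calling `scan_identifier("x1+y")` returns `("x1", 2)`: the `+` is consumed to reach the accepting state and then given back, which is the lookahead step from the earlier slide.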
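Non-deterministic finite state automaton • States 0 (start), 1, 2, 3 (accept) • 0 -a-> 0 and 0 -b-> 0 (loops) • 0 -a-> 1 • 1 -b-> 2 • 2 -b-> 3 • Equivalent deterministic finite state automaton • States 0 (start), 1, 2, 3 (accept) • 0 -a-> 1, 0 -b-> 0 • 1 -a-> 1, 1 -b-> 2 • 2 -a-> 1, 2 -b-> 3 • 3 -a-> 1, 3 -b-> 0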
Transition Table (NFA) • [Table: one row per state, one column per input symbol; each entry is a set of next states]
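As an illustration of an NFA transition table, the sketch below assumes the four-state NFA for (a | b)* abb (state 0 loops on a and b, then a, b, b leads to the accepting state 3). The table contents and the names `NFA_TABLE` and `nfa_accepts` are assumptions made for this example, not the table from the slide.

```python
# Each entry is a SET of next states; a missing symbol means the empty set.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
NFA_START = 0
NFA_ACCEPT = {3}


def nfa_accepts(text):
    """Simulate the NFA by tracking every state it could be in."""
    current = {NFA_START}
    for ch in text:
        following = set()
        for state in current:
            following |= NFA_TABLE[state].get(ch, set())
        current = following
    return bool(current & NFA_ACCEPT)
```

Because state 0 has two choices on `a`, the simulation has to carry a set of states forward rather than a single state.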
Transition Table (DFA) • [Table: one row per state, one column per input symbol; each entry is a single next state]
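For comparison, here is a DFA table for the same language, (a | b)* abb, again as an assumed reconstruction rather than the slide's own table. Every entry is a single state, so the simulation is one table lookup per input character. `DFA_TABLE` and `dfa_accepts` are hypothetical names.

```python
# Each entry is exactly ONE next state.
DFA_TABLE = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
DFA_START = 0
DFA_ACCEPT = {3}


def dfa_accepts(text):
    """Run the DFA: one table lookup per input character."""
    state = DFA_START
    for ch in text:
        if ch not in DFA_TABLE[state]:
            return False  # symbol outside the alphabet {a, b}
        state = DFA_TABLE[state][ch]
    return state in DFA_ACCEPT
```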
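From a Regular Expression to an NFA: Thompson’s Construction • Example: (a | b)* abb • States 0 (start) to 10 (accept) • 0 -ε-> 1, 0 -ε-> 7 • 1 -ε-> 2, 1 -ε-> 4 • 2 -a-> 3, 4 -b-> 5 • 3 -ε-> 6, 5 -ε-> 6 • 6 -ε-> 1, 6 -ε-> 7 • 7 -a-> 8, 8 -b-> 9, 9 -b-> 10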
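A compact sketch of Thompson's construction follows. It is a variant under stated assumptions: concatenation joins two fragments with an ε-edge instead of merging states, so the state numbers differ from the diagram, but the accepted language is the same. `Builder`, `eps_closure` and `accepts` are names invented for this example.

```python
from itertools import count

EPS = None  # label used for epsilon transitions


class Builder:
    """Builds NFA fragments; each fragment is a (start, accept) pair of state ids."""

    def __init__(self):
        self.ids = count()
        self.edges = []  # list of (source, label, destination)

    def new_state(self):
        return next(self.ids)

    def symbol(self, ch):
        s, f = self.new_state(), self.new_state()
        self.edges.append((s, ch, f))
        return s, f

    def concat(self, first, second):
        # Variant: link with an epsilon edge rather than merging the two states.
        self.edges.append((first[1], EPS, second[0]))
        return first[0], second[1]

    def union(self, left, right):
        s, f = self.new_state(), self.new_state()
        for frag in (left, right):
            self.edges.append((s, EPS, frag[0]))
            self.edges.append((frag[1], EPS, f))
        return s, f

    def star(self, inner):
        s, f = self.new_state(), self.new_state()
        self.edges += [
            (s, EPS, inner[0]),         # enter the loop body
            (inner[1], EPS, f),         # leave after an iteration
            (inner[1], EPS, inner[0]),  # repeat the body
            (s, EPS, f),                # zero iterations
        ]
        return s, f


def eps_closure(states, edges):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def accepts(nfa, edges, text):
    start, accept = nfa
    current = eps_closure({start}, edges)
    for ch in text:
        moved = {dst for src, label, dst in edges if src in current and label == ch}
        current = eps_closure(moved, edges)
    return accept in current


# Build (a | b)* abb from its parts.
b = Builder()
nfa = b.star(b.union(b.symbol("a"), b.symbol("b")))
for ch in "abb":
    nfa = b.concat(nfa, b.symbol(ch))
```

Each operator adds at most two new states and a handful of ε-edges, which is why the construction is easy to mechanise even though the resulting NFA is larger than a hand-drawn one.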
Converting an NFA to a DFA • Subset Construction • NFA – each entry in the transition table is a set of states • In the resulting DFA each state will correspond to a set of NFA states • A DFA state keeps track of all the states the NFA can be in after reading an input symbol
Subset Construction • Work out all the states reachable from the start state on ε-transitions alone (the ε-closure) • Combine these into the start state for the DFA • We’ll do the rest on the board in the lecture
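The first step, and one common way to carry out the remaining steps, can be sketched as follows. The NFA is assumed to be the eleven-state Thompson NFA for (a | b)* abb, with states numbered 0 to 10 as that automaton is usually drawn; the worklist loop is a standard formulation and not necessarily the one presented on the board. All names are hypothetical.

```python
EPS = "eps"  # key used for epsilon transitions

# Assumed NFA for (a | b)* abb: state -> {label: set of next states}
NFA = {
    0: {EPS: {1, 7}},
    1: {EPS: {2, 4}},
    2: {"a": {3}},
    3: {EPS: {6}},
    4: {"b": {5}},
    5: {EPS: {6}},
    6: {EPS: {1, 7}},
    7: {"a": {8}},
    8: {"b": {9}},
    9: {"b": {10}},
    10: {},
}
START, ACCEPT, ALPHABET = 0, 10, "ab"


def eps_closure(states):
    """All states reachable from `states` using epsilon transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in NFA[s].get(EPS, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """States reachable from `states` on one `symbol` transition."""
    return {t for s in states for t in NFA[s].get(symbol, ())}


def subset_construction():
    start = eps_closure({START})  # the DFA start state is a set of NFA states
    dfa = {}
    worklist = [start]
    while worklist:
        state = worklist.pop()
        if state in dfa:
            continue  # already processed
        dfa[state] = {}
        for symbol in ALPHABET:
            target = eps_closure(move(state, symbol))
            if target:
                dfa[state][symbol] = target
                worklist.append(target)
    return start, dfa
```

Under these assumptions the DFA start state is the set {0, 1, 2, 4, 7}, and a DFA state is accepting whenever its set contains NFA state 10.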
LEX (FLEX) • Tool for generating programs which recognise lexical patterns in text • Takes regular expressions and turns them into a program • You will learn the basics in a lab on Thursday