Lexical Analysis (2 Lectures)
Overview
Basic Concepts: Regular Expressions, Language
Lexical analysis by hand
Regular Languages Tools: NFA, DFA
Scanning tools: Lex / Flex / JFlex / ANTLR
Scanning Perspective
Purpose: Transform a stream of symbols into a stream of tokens.
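fi(a==f(x)) … generates no lexical error in C: the scanner sees fi as an ordinary identifier, so the misspelled "if" can only be caught in a later phase.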
Alphabet                          Language
{0,1}                             {0,10,100,1000,10000,…}
                                  {0,1,100,000,111,…}
{a,b,c}                           {abc,aabbcc,aaabbbccc,…}
{A…Z}                             {TEE,FORE,BALL…}
                                  {FOR,WHILE,GOTO…}
{A…Z,a…z,0…9,+,…,<,>,…}           {All legal PASCAL progs}
                                  {All grammatically correct English Sentences}
Special Languages: Φ – the EMPTY LANGUAGE (contains no strings)
{ε} – the language that contains the empty string ε only
Precedence
tab{A,…,Z,a,...,z}*{A,…,Z,a,....,z}*bat
{A,…,Z}*1 {A,…,Z}*2 {A,…,Z}*3 {A,…,Z}*
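The second expression above (uppercase strings in which 1, 2 and 3 appear in that order) can be tried out with Java's regex library. This is only a sketch: the class name OrderedDigits and the translation of {A,…,Z}* into [A-Z]* are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class OrderedDigits {
    // [A-Z]* plays the role of {A,…,Z}*; the digits 1, 2, 3 must appear in this order.
    static final Pattern P = Pattern.compile("[A-Z]*1[A-Z]*2[A-Z]*3[A-Z]*");

    static boolean matches(String s) {
        return P.matcher(s).matches();   // matches() requires the whole string to match
    }

    public static void main(String[] args) {
        System.out.println(matches("AB1C2D3E"));  // true
        System.out.println(matches("123"));       // true
        System.out.println(matches("A2B1C3"));    // false
    }
}
```

Using matches() rather than find() keeps the check in line with the language view: a string is either in the language or not.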
“+”
“?”
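import java.io.IOException;
import java.io.PushbackInputStream;

// Token is not shown on the slides; this minimal version is an assumption.
class Token {
    final int kind;
    Token(int kind) { this.kind = kind; }
}

class Scanner {
    static final int GREATER = 0, GEQUAL = 1;  // token kinds (placeholder values)
    PushbackInputStream _in;         // input that supports pushBack()
    int _la;                         // The lookahead character (-1 at end of input)
    char[] _window = new char[64];   // lexeme window
    int _len;                        // characters currently in the window
    int _state;                      // current state of the transition diagram

    Scanner(PushbackInputStream in) { _in = in; }
    void startLexeme() { _len = 0; _state = 0; }
    int getChar() throws IOException {
        int c = _in.read();
        if (c != -1) _window[_len++] = (char) c;
        return c;
    }
    void pushBack(int c) throws IOException { if (c != -1) { _in.unread(c); _len--; } }
    void failure(int state) { throw new IllegalStateException("no transition from state " + state); }

    Token nextToken() throws IOException {
        startLexeme();               // reset window at start
        while (true) {
            switch (_state) {
                case 0:
                    _la = getChar();
                    if (_la == '<') _state = 1;
                    else if (_la == '=') _state = 5;
                    else if (_la == '>') _state = 6;
                    else failure(_state);
                    break;
                case 6:              // seen '>'
                    _la = getChar();
                    if (_la == '=') _state = 7;
                    else _state = 8;
                    break;
                case 7:              // seen ">="
                    return new Token(GEQUAL);
                case 8:              // seen '>' plus one character too many
                    pushBack(_la);
                    return new Token(GREATER);
                default:             // states 1 to 5 are not shown on the slides
                    failure(_state);
            }
        }
    }
}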
An NFA is a mathematical model that consists of:
• S, a set of states
• Σ, the symbols of the input alphabet
• move, a transition function
    • move(state, symbol) → set of states
    • move : S × (Σ ∪ {ε}) → Pow(S)
• A state, s0 ∈ S, the start state
• F ⊆ S, a set of final or accepting states.
Transition Diagrams: Number states (circles), arcs, final states, …
Transition Tables: More suitable to representation within a computer
We’ll see examples of both!
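[Figure: an ε-move from state i to state j]
[Figure: transition diagram of an NFA with states 0 to 3; start state 0; state 0 loops on a and b; arcs 0 → 1 on a, 1 → 2 on b, 2 → 3 on b]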
S = { 0, 1, 2, 3 }
s0 = 0
F = { 3 }
Σ = { a, b }
What Language is defined ?
What is the Transition Table ?
ε (null) moves possible: switch state but do not use any input symbol

            i n p u t
state       a            b
0           { 0, 1 }     { 0 }
1           ∅            { 2 }
2           ∅            { 3 }
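A transition table like the one above maps directly onto an array, which is why tables suit a computer better than diagrams. The sketch below is one possible encoding, not the slides' own: the class name NfaTable, the column order, and the empty row for state 3 (which the table does not list) are assumptions.

```java
class NfaTable {
    // MOVE[state][column] lists the possible next states; column 0 is 'a', column 1 is 'b'.
    // Rows 0 to 2 are copied from the table above; row 3 (no moves) is an assumption.
    static final int[][][] MOVE = {
        { {0, 1}, {0} },   // state 0
        { {},     {2} },   // state 1
        { {},     {3} },   // state 2
        { {},     {}  },   // state 3
    };

    static int[] move(int state, char symbol) {
        if (symbol != 'a' && symbol != 'b') {
            throw new IllegalArgumentException("symbol not in the alphabet: " + symbol);
        }
        return MOVE[state][symbol - 'a'];
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(move(0, 'a')));  // [0, 1]
        System.out.println(java.util.Arrays.toString(move(1, 'a')));  // []
    }
}
```

An empty array stands for "no move", so a caller can tell a stuck path from a valid one without special cases.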
Build a Disjunction
[Figure: the NFA with S = { 0, 1, 2, 3 }, s0 = 0, F = { 3 }, repeated]
• Given an input string, we trace moves
• If no more input & in final state, ACCEPT
EXAMPLE: Input: ababb
move(0, a) = 0
move(0, b) = 0
move(0, a) = 1
move(1, b) = 2
move(2, b) = 3
ACCEPT !
OR
move(0, a) = 1
move(1, b) = 2
move(2, a) = ? (undefined)
REJECT !
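Rather than guessing one path at a time, a program can follow every path at once by keeping the set of states the NFA could be in. The sketch below does this for the example NFA; the class name NfaSimulation and the hard-coded move function are assumptions made for illustration, and the set-of-states technique is a standard one rather than something the slides spell out.

```java
import java.util.Set;
import java.util.TreeSet;

class NfaSimulation {
    // move(state, symbol) for the example NFA; an empty set means "undefined".
    static Set<Integer> move(int state, char symbol) {
        if (state == 0 && symbol == 'a') return Set.of(0, 1);
        if (state == 0 && symbol == 'b') return Set.of(0);
        if (state == 1 && symbol == 'b') return Set.of(2);
        if (state == 2 && symbol == 'b') return Set.of(3);
        return Set.of();
    }

    // Accepts if at least one path ends in the final state 3 when the input runs out.
    static boolean accepts(String input) {
        Set<Integer> current = new TreeSet<>(Set.of(0));   // start state s0 = 0
        for (char c : input.toCharArray()) {
            Set<Integer> next = new TreeSet<>();
            for (int s : current) {
                next.addAll(move(s, c));
            }
            current = next;
        }
        return current.contains(3);
    }

    public static void main(String[] args) {
        System.out.println(accepts("ababb"));  // true
        System.out.println(accepts("abab"));   // false
    }
}
```

For ababb the tracked sets are {0}, {0,1}, {0,2}, {0,1}, {0,2}, {0,3}; the rejecting path in the trace is simply one that drops out of the set along the way.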
[Figure: transition diagram with states 0 to 4, start state 0, arcs labeled a and b, Σ = { a, b }]
[Figure: the NFA with S = { 0, 1, 2, 3 }, s0 = 0, F = { 3 }, repeated]
aabb is accepted along path :
0 → 0 → 1 → 2 → 3
BUT… it is not accepted along the valid path:
0 → 0 → 0 → 0 → 0
Computing the ε-closure
forall (t in T) push(t);
initialize ε-closure(T) to T;
while stack is not empty do begin
    t = pop();
    for each u ∈ S with edge t → u labeled ε
        if u is not in ε-closure(T) then begin
            add u to ε-closure(T);
            push u onto stack
        end
end
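The stack-based loop above translates almost line for line into Java. In this sketch the class name EpsilonClosure, the epsEdges map, and the sample edges in main are hypothetical; only the algorithm itself follows the pseudocode.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class EpsilonClosure {
    // epsEdges.get(t) lists the states u with an edge t -> u labeled ε.
    static Set<Integer> closure(Set<Integer> T, Map<Integer, List<Integer>> epsEdges) {
        Deque<Integer> stack = new ArrayDeque<>(T);        // push every t in T
        Set<Integer> result = new TreeSet<>(T);            // initialize ε-closure(T) to T
        while (!stack.isEmpty()) {
            int t = stack.pop();
            for (int u : epsEdges.getOrDefault(t, List.of())) {
                if (result.add(u)) {                       // add() is true only if u was new
                    stack.push(u);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical ε-edges: 0 -> 1, 0 -> 4, 1 -> 2, 2 -> 1 (a cycle).
        Map<Integer, List<Integer>> eps =
            Map.of(0, List.of(1, 4), 1, List.of(2), 2, List.of(1));
        System.out.println(closure(Set.of(0), eps));  // [0, 1, 2, 4]
        System.out.println(closure(Set.of(2), eps));  // [1, 2]
    }
}
```

The membership test before pushing is what makes the loop terminate even when the ε-edges form a cycle, as they do between states 1 and 2 here.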
Computing the set of states (D) and the transitions (T):
let Q = ε-closure(s0);
D = { Q };
enQueue(Q);
while queue not empty do
    X = deQueue();
    for each a ∈ Σ do
        Y = ε-closure(move(X, a));
        T[X, a] = Y;
        if Y is not in D then
            D = D ∪ { Y };
            enQueue(Y);
        end
    end
end
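The queue-driven loop above can be sketched in Java as follows. The class name SubsetConstruction, the map-based NFA encoding, and the demo method are assumptions for illustration; the keys of the returned table T double as the set D. The demo feeds in the example NFA with states 0 to 3, which has no ε-moves, so ε-closure leaves each set unchanged there.

```java
import java.util.*;

class SubsetConstruction {
    // NFA moves on symbols (delta) and on ε (eps); this encoding is an assumption.
    final Map<Integer, Map<Character, Set<Integer>>> delta = new HashMap<>();
    final Map<Integer, Set<Integer>> eps = new HashMap<>();

    void edge(int from, char symbol, int to) {
        delta.computeIfAbsent(from, k -> new HashMap<>())
             .computeIfAbsent(symbol, k -> new TreeSet<>()).add(to);
    }

    void epsEdge(int from, int to) {
        eps.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
    }

    Set<Integer> closure(Set<Integer> T) {
        Deque<Integer> stack = new ArrayDeque<>(T);
        Set<Integer> result = new TreeSet<>(T);
        while (!stack.isEmpty()) {
            for (int u : eps.getOrDefault(stack.pop(), Set.of())) {
                if (result.add(u)) stack.push(u);
            }
        }
        return result;
    }

    Set<Integer> move(Set<Integer> X, char a) {
        Set<Integer> out = new TreeSet<>();
        for (int s : X) {
            out.addAll(delta.getOrDefault(s, Map.of()).getOrDefault(a, Set.of()));
        }
        return out;
    }

    // Returns T, where T.get(X).get(a) = Y; each DFA state is a set of NFA states.
    Map<Set<Integer>, Map<Character, Set<Integer>>> build(int s0, String sigma) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> T = new LinkedHashMap<>();
        Deque<Set<Integer>> queue = new ArrayDeque<>();
        Set<Integer> Q = closure(Set.of(s0));
        T.put(Q, new LinkedHashMap<>());          // D = { Q }
        queue.add(Q);
        while (!queue.isEmpty()) {
            Set<Integer> X = queue.remove();
            for (char a : sigma.toCharArray()) {
                Set<Integer> Y = closure(move(X, a));
                T.get(X).put(a, Y);
                if (!T.containsKey(Y)) {          // Y is not in D yet
                    T.put(Y, new LinkedHashMap<>());
                    queue.add(Y);
                }
            }
        }
        return T;
    }

    // The example NFA: 0 loops on a and b, then 0 -a-> 1 -b-> 2 -b-> 3.
    static Map<Set<Integer>, Map<Character, Set<Integer>>> demo() {
        SubsetConstruction n = new SubsetConstruction();
        n.edge(0, 'a', 0); n.edge(0, 'b', 0); n.edge(0, 'a', 1);
        n.edge(1, 'b', 2); n.edge(2, 'b', 3);
        return n.build(0, "ab");
    }

    public static void main(String[] args) {
        demo().forEach((state, row) -> System.out.println(state + " " + row));
    }
}
```

For this NFA the construction yields four DFA states, {0}, {0,1}, {0,2} and {0,3}; a DFA state would be marked accepting when it contains the NFA's final state 3, a step this sketch leaves out.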
r ::= s in Σ
  ::= rs
  ::= r | s
  ::= r*
[Figure: parse tree of a regular expression, with nodes r0 to r13 and leaves a, b, c, *, ( and )]
[Figures: NFA fragments built bottom-up for the subexpressions and joined with ε-moves]
r0, r2, r3: one NFA per input symbol (arcs labeled a, b, c)
r1, r6, r7, r8, r11
r4 : r1 r2
r5 : r3 r4
r9 : r7 | r8
r10 : r9
r12 : r11 r10
r13 : r5 | r12   [final NFA, states numbered 1 to 17]
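Each rule for building a regular expression (a symbol, concatenation, alternation, star) can be matched by a rule for building a small NFA fragment, with ε-moves joining the pieces. The sketch below follows the standard approach usually called Thompson's construction, which the slides do not name. The class Thompson, the Frag type, and the choice to join concatenated fragments with an extra ε-move (instead of merging states) are assumptions made for illustration.

```java
import java.util.*;

class Thompson {
    // One NFA fragment: a start state and a single accepting state.
    static final class Frag {
        final int start, accept;
        Frag(int start, int accept) { this.start = start; this.accept = accept; }
    }

    static final int EPS = -1;                       // label used for ε-moves
    // edges.get(s) holds pairs {label, target} for the arcs leaving state s.
    final List<List<int[]>> edges = new ArrayList<>();

    int newState() { edges.add(new ArrayList<>()); return edges.size() - 1; }
    void edge(int from, int label, int to) { edges.get(from).add(new int[] {label, to}); }

    Frag symbol(char c) {                            // s in Σ
        Frag f = new Frag(newState(), newState());
        edge(f.start, c, f.accept);
        return f;
    }

    Frag concat(Frag r, Frag s) {                    // rs
        edge(r.accept, EPS, s.start);
        return new Frag(r.start, s.accept);
    }

    Frag alt(Frag r, Frag s) {                       // r | s
        Frag f = new Frag(newState(), newState());
        edge(f.start, EPS, r.start);
        edge(f.start, EPS, s.start);
        edge(r.accept, EPS, f.accept);
        edge(s.accept, EPS, f.accept);
        return f;
    }

    Frag star(Frag r) {                              // r*
        Frag f = new Frag(newState(), newState());
        edge(f.start, EPS, r.start);
        edge(f.start, EPS, f.accept);                // zero repetitions
        edge(r.accept, EPS, r.start);                // repeat
        edge(r.accept, EPS, f.accept);
        return f;
    }

    // Set-of-states simulation, used here only to check the construction.
    Set<Integer> closure(Set<Integer> T) {
        Deque<Integer> stack = new ArrayDeque<>(T);
        Set<Integer> result = new TreeSet<>(T);
        while (!stack.isEmpty()) {
            for (int[] e : edges.get(stack.pop())) {
                if (e[0] == EPS && result.add(e[1])) stack.push(e[1]);
            }
        }
        return result;
    }

    boolean accepts(Frag f, String input) {
        Set<Integer> cur = closure(Set.of(f.start));
        for (char c : input.toCharArray()) {
            Set<Integer> next = new TreeSet<>();
            for (int s : cur) {
                for (int[] e : edges.get(s)) {
                    if (e[0] == c) next.add(e[1]);
                }
            }
            cur = closure(next);
        }
        return cur.contains(f.accept);
    }

    // Builds the NFA for (a | b)*abb and runs one input string through it.
    static boolean demo(String input) {
        Thompson t = new Thompson();
        Frag r = t.star(t.alt(t.symbol('a'), t.symbol('b')));
        r = t.concat(r, t.symbol('a'));
        r = t.concat(r, t.symbol('b'));
        r = t.concat(r, t.symbol('b'));
        return t.accepts(r, input);
    }
}
```

Because every fragment has exactly one start and one accepting state, the four builders compose freely, mirroring the way the subexpressions combine in the parse tree.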
(a | b)*abb
(abc)*ab
etc.
Recognizer!