Lexical Analysis — Scanner (textbook ch#3.1–3.4, 3.6, 3.7, 3.5, 3.8). 薛智文 [email protected] http://www.csie.ntu.edu.tw/~cwhsueh/ 98 Spring. Lexical Analyzer vs. Parser. source program. semantic analysis. Lexical Analyzer. token. Parser. getNextToken. Symbol Table.
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Lexical Analysis — Scanner
(textbook ch#3.1–3.4, 3.6, 3.7, 3.5, 3.8)
薛智文
http://www.csie.ntu.edu.tw/~cwhsueh/
98 Spring
source
program
semantic
analysis
Lexical
Analyzer
token
Parser
getNextToken
Symbol
Table
/37
/37
/37
/37
∞
∞
/37
/37
/37
/37
e.g. {anbn| n ≧ 0}.
/37
A is a nonterminal and α, β and γ are strings of terminals and nonterminals.
The strings α and β may be empty, but γ must be nonempty.
/37
a
b
c
a
start
c
0
1
2
3
/37
a
c
a
start
b
c
0
1
2
3
and a set of accepting states are also given.
/37
a
a
ε
1
start
0
b
b
ε
3
2
4
/37
a
0
0
c
c
b
a
b
c
2
3
1
2
3
1
a
c
a
start
b
c
0
1
2
3
s starting state;
while there are inputs and s is a legal state do
s Table[s, input]
end while
ifs∈ accepting states then ACCEPT else REJECT
/37
a
start
a
b
b
0
1
2
b
b
b
a
a
a
b
a
2
0
0
0
0
1
0
0
b
0 Reject!
3 Accept!
3
/37
ε
starting state for r
starting state for r
NFA for r
NFA for r
ε
start
ε
start
r|s
r＊
NFA for s
ε
ε
starting state for s
ε
accepting state for r
accepting state for r
starting state for s
starting state for r
NFA for r
ε
start
NFA for s
rs
ε
convert all accepting states in r into non accepting states and add ε−transitions
/37
a
ε
2
3
ε
start
a
b
b
ε
ε
ε
ε
0
1
6
7
8
9
10
11
b
ε
4
5
ε
ε
12
/37
start
a
ε
2
3
ε
a
b
b
ε
ε
ε
ε
0
1
6
7
8
9
10
11
b
ε
4
5
ε
ε
12
/37
/37
a
ε
2
3
ε
start
a
b
b
ε
ε
ε
ε
0
1
6
7
8
9
10
11
b
ε
4
5
ε
ε
12
0,1,2,3,4,
6,7,8,9
start
a
0,1,2,4,6,7
b
0,1,2,4,5,6,7
/37
a
ε
2
3
ε
start
a
b
b
ε
ε
ε
ε
0
1
6
7
8
9
10
11
b
ε
4
5
ε
ε
b
D
b
a
start
a
b
b
A
B
C
12
E
a
a
b
/37
/37
/37
/37
/37
/37
Can be skipped
/37
%{
/* some initial C programs */
#define BEGINSYM 1
#define INTEGER 2
#define IDNAME 3
#define REAL 4
#define STRING 5
#define SEMICOLONSYM 6
#define ASSIGNSYM 7
%}
/* regular definitions */
Digit [0-9]
Letter [a-zA-Z]
IntLit {Digit}+
Id {Letter}({Letter}|{Digit}|_)*
/37
%%
[ \t\n] {/* skip white spaces */}
[Bb][Ee][Gg][Ii][Nn] {return(BEGINSYM);}
{IntLit} {return(INTEGER);}
{Id} {
printf("var has %d characters, ",yyleng);
return(IDNAME);
}
({IntLit}[.]{IntLit})([Ee][+-]?{IntLit})? {return(REAL);}
\"[^\"\n]*\" {stripquotes(); return(STRING);}
";" {return(SEMICOLONSYM);}
":=" {return(ASSIGNSYM);}
. {printf("error --- %s\n",yytext);}
/37
%%
/* some final C programs */
stripquotes()
{
/* handling string within a quoted string */
int frompos, topos=0, numquotes = 2;
for(frompos=1; frompos<yyleng; frompos++){
yytext[topos++] = yytext[frompos];
}
yyleng -= numquotes;
yytext[yyleng] = ’\0’;
}
void main(){
int i;
i = yylex();
while (i>0 && i < 8) {
printf("<%s> is %d\n",yytext,i);
i = yylex();
}
}
/37
austin% lex test.l
austin% cc lex.yy.c -ll
austin% cat data
Begin
123.3 321.4E21
x := 365;
"this is a string"
austin% a.out < data
<Begin> is 1
<123.3> is 4
<321.4E21> is 4
var has 1 characters, <x> is 3
<:=> is 7
<365> is 2
<;> is 6
<this is a string> is 5
%austin
/37
{ action1
· · ·
}
/37
/37
/37
/37
ififthenelse=then ;
id id id
/37
/37