## PowerPoint Slideshow about 'LEXICAL ANALYSIS' - uttara


Outline

- Introduction to Lexical Analysis
- Token specification
  - Language
  - Regular Expressions (REs)
- Token recognition
  - REs → NFA (Thompson’s construction, Algorithm 3.3)
  - NFA → DFA (subset construction, Algorithm 3.2)
  - DFA → minimal DFA (Algorithm 3.6)
- Programming

Lexical Analysis

Introduction

- Read the input characters
- Produce as output a sequence of tokens
- Eliminate white space and comments

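[Figure: the lexical analyzer reads the source program and returns a token each time the parser asks it to "get next token"; both the lexical analyzer and the parser use the symbol table.]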


Tokens, Patterns, Lexemes


Alphabet, Strings and Languages

- Alphabet ∑: any finite set of symbols
  - The Vietnamese alphabet {a, á, à, ả, ã, ạ, b, c, d, đ,…}
  - The binary alphabet {0,1}
  - The ASCII alphabet
- String: a finite sequence of symbols drawn from ∑
  - Length |s| of a string s: the number of symbols in s
  - The empty string, denoted ε, has |ε| = 0
- Language: any set of strings over ∑
  - Its two special cases:
    - ∅: the empty set
    - {ε}: the set containing only the empty string


Examples of Languages

- ∑ = {a, á, à, ả, ã, ạ, b, c, d, đ,…}
  - The Vietnamese language
- ∑ = {0,1}
  - A string is an instruction
  - The set of Pentium instructions
- ∑ = the ASCII set
  - A string is a program
  - The set of C programs


Terms (Fig.3.7)


String operations

- String concatenation
  - If x and y are strings, xy is the string formed by appending y to x.
    E.g.: x = hom, y = nay ⇒ xy = homnay
  - ε is the identity: $\varepsilon y = y$ and $x\varepsilon = x$
- String exponentiation
  - $s^0 = \varepsilon$
  - $s^i = s^{i-1}s$ for $i > 0$
    E.g.: s = 01 gives $s^0 = \varepsilon$, $s^2 = 0101$, $s^3 = 010101$
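Both operations can be sketched in Java directly from the definitions above. This is a minimal illustration, not code from the slides; the class `StringOps` and its methods are hypothetical names.

```java
/** Illustrative helpers for string concatenation and exponentiation (names are hypothetical). */
final class StringOps {
    private StringOps() {}

    /** Concatenation xy: the string formed by appending y to x. */
    static String concat(String x, String y) {
        return x + y;
    }

    /** Exponentiation, following s^0 = ε and s^i = s^(i-1) s for i > 0. */
    static String power(String s, int i) {
        if (i < 0) {
            throw new IllegalArgumentException("exponent must be non-negative");
        }
        if (i == 0) {
            return "";                      // s^0 is the empty string ε
        }
        return concat(power(s, i - 1), s);  // s^i = s^(i-1) s
    }
}
```

In everyday Java, `String.repeat` (Java 11 and later) computes the same result; the recursion is kept here only to mirror the inductive definition.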


Language Operations (Fig 3.8)


Examples

- L = {A,B,…,Z,a,b,…,z}
- D = {0,1,…,9}

- $L \cup D$: letters and digits
- $LD$: strings consisting of a letter followed by a digit
- $L^4$: all four-letter strings
- $L^*$: all strings of letters, including ε
- $L(L \cup D)^*$: all strings of letters and digits beginning with a letter
- $D^+$: all strings of one or more digits
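The finite operations in these examples can be checked with a small Java sketch over sets of strings. `LanguageOps`, `union`, `concat` and `range` are hypothetical helpers; closures such as $L^*$ and $D^+$ are infinite, so they are not enumerated here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Finite-language operations on sets of strings (a sketch; names are hypothetical). */
final class LanguageOps {
    private LanguageOps() {}

    /** Union: every string that is in l or in m. */
    static Set<String> union(Set<String> l, Set<String> m) {
        Set<String> result = new LinkedHashSet<>(l);
        result.addAll(m);
        return result;
    }

    /** Concatenation: { xy | x in l, y in m }. */
    static Set<String> concat(Set<String> l, Set<String> m) {
        Set<String> result = new LinkedHashSet<>();
        for (String x : l) {
            for (String y : m) {
                result.add(x + y);
            }
        }
        return result;
    }

    /** The set of one-character strings for the characters from..to. */
    static Set<String> range(char from, char to) {
        Set<String> result = new LinkedHashSet<>();
        for (char c = from; c <= to; c++) {
            result.add(String.valueOf(c));
        }
        return result;
    }

    static final Set<String> LETTERS = union(range('A', 'Z'), range('a', 'z'));
    static final Set<String> DIGITS = range('0', '9');
}
```

With these sets the union has 52 + 10 = 62 members and LD has 52 × 10 = 520. Applying `concat` three times to L would give $L^4$, but that set already holds $52^4 = 7{,}311{,}616$ strings.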


Regular Expressions (REs) over Alphabet ∑

- Inductive base:
  - ε is an RE, denoting the regular language (RL) {ε}
  - each a ∈ ∑ is an RE, denoting the RL {a}
- Inductive step: suppose r and s are REs, denoting the languages L(r) and L(s). Then
  - (r)|(s) is an RE, denoting the RL L(r) ∪ L(s)
  - (r)(s) is an RE, denoting the RL L(r)L(s)
  - (r)* is an RE, denoting the RL (L(r))*
  - (r) is an RE, denoting the RL L(r)


Precedence and Associativity

- Precedence:
  - “*” has the highest precedence
  - concatenation has the second highest precedence
  - “|” has the lowest precedence
- Associativity:
  - all are left-associative

E.g.: (a)|((b)*(c)) can be written a|b*c, since unnecessary parentheses can be removed.


Example

- ∑ = {a, b}
  - a|b denotes {a,b}
  - (a|b)(a|b) denotes {aa,ab,ba,bb}
  - a* denotes {ε,a,aa,aaa,aaaa,…}
  - (a|b)* denotes ?
  - a|a*b denotes ?
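One way to explore these exercises is to test candidate strings with `java.util.regex`, whose `|`, concatenation and `*` follow the precedence rules above for these simple patterns. `Pattern.matches` succeeds only if the whole string matches, which corresponds to membership in L(r). The wrapper `ReExamples.inLanguage` is a hypothetical helper.

```java
import java.util.regex.Pattern;

/** Tests membership w ∈ L(r) with java.util.regex (a sketch for the exercises above). */
final class ReExamples {
    private ReExamples() {}

    /** True iff the entire string w is in the language denoted by the regex r. */
    static boolean inLanguage(String r, String w) {
        return Pattern.matches(r, w);   // matches() requires the whole input to match
    }
}
```

Trying a few strings this way suggests that (a|b)* denotes every string of a's and b's, including ε, and that a|a*b denotes {a, b, ab, aab, aaab, …}.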


Notational Shorthands

- One or more instances (+): r+ = rr*
  - denotes the language (L(r))+
  - has the same precedence and associativity as *
- Zero or one instance (?): r? = r|ε
  - denotes the language L(r) ∪ {ε}
- Character classes
  - [abc] denotes a|b|c
  - [A-Z] denotes A|B|…|Z
  - [a-zA-Z_][a-zA-Z0-9_]* denotes ?
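The shorthands carry over to `java.util.regex` almost unchanged. The sketch below (class and field names are hypothetical) compares r+ with rr* and r? with r|ε, writing ε as an empty alternative.

```java
import java.util.regex.Pattern;

/** The notational shorthands expressed with java.util.regex (a sketch; names are hypothetical). */
final class Shorthands {
    private Shorthands() {}

    // r+ and its expansion r r*, with r = ab
    static final Pattern ONE_OR_MORE = Pattern.compile("(ab)+");
    static final Pattern EXPANDED_PLUS = Pattern.compile("(ab)(ab)*");

    // r? and its expansion r|ε (the empty right-hand alternative plays the role of ε)
    static final Pattern OPTIONAL = Pattern.compile("(ab)?");
    static final Pattern EXPANDED_OPT = Pattern.compile("(ab)|");

    // Character classes: one letter or underscore, then letters, digits or underscores
    static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    /** True iff the whole string w matches p. */
    static boolean matches(Pattern p, String w) {
        return p.matcher(w).matches();
    }
}
```

The identifier pattern accepts names such as `count` or `_tmp1` but rejects `1tmp`, i.e. it describes C-style identifiers.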


Nondeterministic finite automata

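- A nondeterministic finite automaton (NFA) is a mathematical model that consists of
  - a finite set of states S
  - a set of input symbols ∑
  - a transition function $\mathit{move}: S \times (\Sigma \cup \{\varepsilon\}) \to 2^S$, which maps each state and input symbol (or ε) to a set of states
  - a start state $s_0 \in S$
  - a set of final or accepting states $F \subseteq S$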


Acceptance

- An NFA accepts an input string x iff there is some path in the transition graph from the start state to some accepting state such that the edge labels along this path spell out x.

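[Figure: example runs on a transition graph with states A and B over {0,1}. Input 01010 follows the path A B A B A B; for input 01011 the path A B A B A has no edge for the final 1, which gives an error.]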
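Acceptance can be checked by following all paths at once: keep the set of states reachable after each symbol, and accept if the final set contains an accepting state. The Java sketch below assumes a hypothetical NFA for (a|b)*abb without ε-edges (ε-closures are left out for brevity); the class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Simulates an NFA by tracking the set of all states reachable so far.
 * Hypothetical example NFA for (a|b)*abb, no ε-edges:
 * 0 -a-> {0,1}, 0 -b-> {0}, 1 -b-> {2}, 2 -b-> {3}; start 0, accepting {3}.
 */
final class NfaAcceptance {
    private NfaAcceptance() {}

    // move: (state, symbol) -> set of next states; a missing entry means "no edge"
    private static final Map<Integer, Map<Character, Set<Integer>>> MOVE = new HashMap<>();
    private static final int START = 0;
    private static final Set<Integer> ACCEPTING = Set.of(3);

    private static void edge(int from, char a, int to) {
        MOVE.computeIfAbsent(from, k -> new HashMap<>())
            .computeIfAbsent(a, k -> new HashSet<>())
            .add(to);
    }

    static {
        edge(0, 'a', 0);
        edge(0, 'a', 1);
        edge(0, 'b', 0);
        edge(1, 'b', 2);
        edge(2, 'b', 3);
    }

    /** True iff some path from START spelling x ends in an accepting state. */
    static boolean accepts(String x) {
        Set<Integer> current = Set.of(START);
        for (char a : x.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int s : current) {
                next.addAll(MOVE.getOrDefault(s, Map.of()).getOrDefault(a, Set.of()));
            }
            current = next;
            if (current.isEmpty()) {
                return false;  // every path is stuck: no edge for this symbol
            }
        }
        for (int s : current) {
            if (ACCEPTING.contains(s)) {
                return true;
            }
        }
        return false;
    }
}
```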


Deterministic finite automata

- A deterministic finite automaton (DFA) is a special case of an NFA in which
  - no state has an ε-transition, and
  - for each state s and input symbol a, there is at most one edge labeled a leaving s.
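Because a DFA has at most one edge per state and symbol, simulating it needs only a single current state and one table lookup per character. A minimal sketch, assuming a hypothetical four-state DFA over {a, b} that accepts the strings ending in abb; `DfaRun` and `DTRAN` are illustrative names.

```java
/**
 * A table-driven DFA: each (state, symbol) pair has at most one successor.
 * Hypothetical example over {a, b} accepting strings that end in "abb".
 * A partial table could use -1 for "no edge"; this one happens to be complete.
 */
final class DfaRun {
    private DfaRun() {}

    private static final int START = 0;
    private static final int ACCEPT = 3;
    // DTRAN[state][0] is the move on 'a', DTRAN[state][1] the move on 'b'
    private static final int[][] DTRAN = {
        {1, 0},  // state 0: no progress toward "abb"
        {1, 2},  // state 1: input so far ends in "a"
        {1, 3},  // state 2: input so far ends in "ab"
        {1, 0},  // state 3: input so far ends in "abb" (accepting)
    };

    static boolean accepts(String x) {
        int state = START;
        for (char c : x.toCharArray()) {
            int col;
            if (c == 'a') {
                col = 0;
            } else if (c == 'b') {
                col = 1;
            } else {
                return false;            // symbol outside the alphabet
            }
            state = DTRAN[state][col];
            if (state < 0) {
                return false;            // no edge: reject
            }
        }
        return state == ACCEPT;
    }
}
```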


Thompson’s construction of NFA from REs

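- The construction is guided by the syntactic structure of the RE r; each step builds an NFA with one start state i and one accepting state f.
- For ε: a new start state i and a new accepting state f, joined by an edge labeled ε.
- For a in ∑: a new start state i and a new accepting state f, joined by an edge labeled a.

Thompson’s construction (cont’d)

- Suppose N(s) and N(t) are NFAs for the REs s and t.
  - For s|t: a new start state i has ε-edges to the start states of N(s) and N(t), and the accepting states of N(s) and N(t) have ε-edges to a new accepting state f.
  - For st: the start state of N(s) is the start state, the accepting state of N(s) is merged with the start state of N(t), and the accepting state of N(t) is the accepting state.
  - For s*: a new start state i and a new accepting state f, with ε-edges from i to the start state of N(s), from i to f, from the accepting state of N(s) back to the start state of N(s), and from the accepting state of N(s) to f.
  - For (s), use N(s) itself.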
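The construction rules translate into one small method per case. A sketch in Java under these assumptions: char 0 stands in for ε, every fragment keeps a single start state i and accepting state f, and st is joined with an ε-edge instead of merging the two states (a common variant that accepts the same language). The names `Thompson`, `Frag` and `Edge` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of Thompson's construction; every fragment has one start i and one final f. */
final class Thompson {
    static final char EPS = 0;  // assumption: char 0 labels an ε-edge

    /** An edge from --label--> to. */
    static final class Edge {
        final int from;
        final char label;
        final int to;
        Edge(int from, char label, int to) { this.from = from; this.label = label; this.to = to; }
    }

    /** A fragment N(r), identified by its start state i and accepting state f. */
    static final class Frag {
        final int i;
        final int f;
        Frag(int i, int f) { this.i = i; this.f = f; }
    }

    final List<Edge> edges = new ArrayList<>();
    private int nextState = 0;

    private int newState() { return nextState++; }

    /** N(a): i --a--> f (pass EPS to build N(ε)). */
    Frag symbol(char a) {
        int i = newState(), f = newState();
        edges.add(new Edge(i, a, f));
        return new Frag(i, f);
    }

    /** N(s|t): new i with ε-edges into N(s) and N(t); ε-edges from both finals to a new f. */
    Frag union(Frag s, Frag t) {
        int i = newState(), f = newState();
        edges.add(new Edge(i, EPS, s.i));
        edges.add(new Edge(i, EPS, t.i));
        edges.add(new Edge(s.f, EPS, f));
        edges.add(new Edge(t.f, EPS, f));
        return new Frag(i, f);
    }

    /** N(st): an ε-edge from the final state of N(s) to the start of N(t). */
    Frag concat(Frag s, Frag t) {
        edges.add(new Edge(s.f, EPS, t.i));
        return new Frag(s.i, t.f);
    }

    /** N(s*): new i and f; the ε-edges let the input skip N(s) or repeat it. */
    Frag star(Frag s) {
        int i = newState(), f = newState();
        edges.add(new Edge(i, EPS, s.i));
        edges.add(new Edge(i, EPS, f));
        edges.add(new Edge(s.f, EPS, s.i));
        edges.add(new Edge(s.f, EPS, f));
        return new Frag(i, f);
    }

    /** States reachable from 'states' through ε-edges alone. */
    Set<Integer> epsClosure(Set<Integer> states) {
        Set<Integer> closure = new HashSet<>(states);
        Deque<Integer> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            int s = work.pop();
            for (Edge e : edges) {
                if (e.from == s && e.label == EPS && closure.add(e.to)) {
                    work.push(e.to);
                }
            }
        }
        return closure;
    }

    /** Simulates the NFA of fragment n on input x. */
    boolean accepts(Frag n, String x) {
        Set<Integer> current = epsClosure(Set.of(n.i));
        for (char a : x.toCharArray()) {
            Set<Integer> moved = new HashSet<>();
            for (Edge e : edges) {
                if (e.label == a && current.contains(e.from)) {
                    moved.add(e.to);
                }
            }
            current = epsClosure(moved);
        }
        return current.contains(n.f);
    }
}
```

Building (a|b)*abb is then `t.concat(t.concat(t.concat(t.star(t.union(t.symbol('a'), t.symbol('b'))), t.symbol('a')), t.symbol('b')), t.symbol('b'))`, and `accepts` simulates the resulting NFA with ε-closures.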


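Subset construction (cont’d)

    Let s0 be the start state of the NFA;
    Dstates initially contains only ε-closure(s0), and it is unmarked;
    while there is an unmarked state T in Dstates do begin
        mark T;
        for each input symbol a do begin
            U := ε-closure(move(T, a));
            if U is not in Dstates then
                add U as an unmarked state to Dstates;
            DTran[T, a] := U
        end
    end

Here move(T, a) is the set of NFA states reachable on input a from some state in T, and ε-closure(X) is X together with every NFA state reachable from X along ε-edges alone.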


DFA

- Let (∑, S, T, F, s0) be the original NFA, where T is its transition function. The DFA is:
  - The alphabet: ∑
  - The states: all states in Dstates
  - The transitions: DTran
  - The accepting states: all states in Dstates containing at least one accepting state in F of the NFA
  - The start state: ε-closure(s0)
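The algorithm can be sketched in Java with sets of NFA states serving as DFA states. Assumptions: char 0 labels ε-edges, the hard-coded NFA is a Thompson-style NFA for (a|b)*abb with states 0 to 10 and accepting state 10, and `dstates` is scanned in order, so every state with a smaller index than the one being processed counts as marked. `SubsetConstruction` and its members are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Subset construction on a small NFA with ε-edges (assumed example NFA for (a|b)*abb). */
final class SubsetConstruction {
    static final char EPS = 0;                 // assumption: char 0 marks an ε-edge
    static final int[][] EDGES = {             // {from, label, to}
        {0, EPS, 1}, {0, EPS, 7}, {1, EPS, 2}, {1, EPS, 4}, {2, 'a', 3}, {4, 'b', 5},
        {3, EPS, 6}, {5, EPS, 6}, {6, EPS, 1}, {6, EPS, 7}, {7, 'a', 8}, {8, 'b', 9},
        {9, 'b', 10},
    };
    static final int NFA_START = 0;
    static final int NFA_ACCEPT = 10;
    static final char[] SIGMA = {'a', 'b'};

    /** ε-closure(T): T plus the states reachable from T through ε-edges alone. */
    static Set<Integer> epsClosure(Set<Integer> t) {
        Set<Integer> closure = new TreeSet<>(t);
        Deque<Integer> stack = new ArrayDeque<>(t);
        while (!stack.isEmpty()) {
            int s = stack.pop();
            for (int[] e : EDGES) {
                if (e[0] == s && e[1] == EPS && closure.add(e[2])) {
                    stack.push(e[2]);
                }
            }
        }
        return closure;
    }

    /** move(T, a): the states reachable from T by one edge labeled a. */
    static Set<Integer> move(Set<Integer> t, char a) {
        Set<Integer> result = new TreeSet<>();
        for (int[] e : EDGES) {
            if (t.contains(e[0]) && e[1] == a) {
                result.add(e[2]);
            }
        }
        return result;
    }

    /** Dstates (list index = DFA state number) and DTran, filled in by build(). */
    final List<Set<Integer>> dstates = new ArrayList<>();
    final Map<Integer, Map<Character, Integer>> dtran = new HashMap<>();

    void build() {
        dstates.add(epsClosure(Set.of(NFA_START)));   // the only state, unmarked
        for (int t = 0; t < dstates.size(); t++) {     // processing t marks it
            for (char a : SIGMA) {
                Set<Integer> u = epsClosure(move(dstates.get(t), a));
                int index = dstates.indexOf(u);
                if (index < 0) {                       // U is not yet in Dstates
                    dstates.add(u);
                    index = dstates.size() - 1;
                }
                dtran.computeIfAbsent(t, k -> new HashMap<>()).put(a, index);
            }
        }
    }

    /** A DFA state accepts iff it contains an accepting NFA state. */
    boolean isAccepting(int dfaState) {
        return dstates.get(dfaState).contains(NFA_ACCEPT);
    }
}
```

On this NFA the sketch should produce five DFA states, from A = {0,1,2,4,7} to E = {1,2,4,5,6,7,10}, with E the only accepting one. If some move(T, a) were empty, this version would simply add the empty set as a dead state.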


Minimise a DFA

Each "state" below is a group of states of the original DFA; the groups become the states of the minimal DFA.

    Initially, create two states:
        one is the set of all final states: F
        the other is the set of all non-final states: S - F
    while (more splits are possible) {
        Let G = {s1, …, sn} be a state and c be any char in ∑
        Let t1, …, tn be the successor states of s1, …, sn under c
        if (t1, …, tn don't all belong to the same state) {
            Split G into new states so that si and sj remain in the
            same state iff ti and tj are in the same state
        }
    }


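Example

[Figure: a DFA over {a, b} with states A, B, C, D and the single accepting state E, shown beside its minimized version.]

- Step 1: {A,B,C,D} {E}
  - For a, the successors of {A,B,C,D} are {B,B,B,B}
  - For b, the successors of {A,B,C,D} are {C,D,C,E}
  - Split: {A,B,C} {D} {E}
- Step 2:
  - For b, the successors of {A,B,C} are {C,D,C}
  - Split: {A,C} {B} {D} {E}
- Step 3:
  - For a, the successors of {A,C} are {B,B}
  - For b, the successors of {A,C} are {C,C}
  - No further split is possible: terminate

Result: A and C are merged, so the minimal DFA has four states: {A,C}, {B}, {D} and {E}.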
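The refinement can be sketched in Java. This version refines every group against every symbol in each round (a signature-based variant of the same idea) instead of splitting one group at a time, and stops when a round creates no new group. Assumption: the trace does not give E's transitions, so the table uses E → B on a and E → C on b, as in the standard (a|b)*abb textbook example; `Minimize` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Partition refinement on the A..E example (E's row is an assumption). */
final class Minimize {
    private Minimize() {}

    // Rows are A..E; MOVE[s][0] is the successor on 'a', MOVE[s][1] on 'b'
    static final int[][] MOVE = {
        {1, 2},  // A: a -> B, b -> C
        {1, 3},  // B: a -> B, b -> D
        {1, 2},  // C: a -> B, b -> C
        {1, 4},  // D: a -> B, b -> E
        {1, 2},  // E: a -> B, b -> C (assumed; not given in the trace)
    };
    static final boolean[] FINAL = {false, false, false, false, true};

    /** Returns group[s]; states with equal numbers are merged in the minimal DFA. */
    static int[] minimize() {
        int n = MOVE.length;
        int[] group = new int[n];
        for (int s = 0; s < n; s++) {
            group[s] = FINAL[s] ? 0 : 1;      // initial split: F and S - F
        }
        int count = (int) Arrays.stream(group).distinct().count();
        while (true) {
            // Signature of a state: its current group plus the groups of its successors
            Map<List<Integer>, Integer> ids = new HashMap<>();
            int[] refined = new int[n];
            for (int s = 0; s < n; s++) {
                List<Integer> signature = List.of(group[s], group[MOVE[s][0]], group[MOVE[s][1]]);
                Integer id = ids.get(signature);
                if (id == null) {
                    id = ids.size();
                    ids.put(signature, id);
                }
                refined[s] = id;
            }
            if (ids.size() == count) {
                return group;                  // no group was split: terminate
            }
            group = refined;
            count = ids.size();
        }
    }
}
```

On this table the rounds reproduce the steps above: {A,B,C} {D} {E}, then {A,C} {B} {D} {E}, after which nothing splits.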


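Input Buffering

[Figure: a buffer divided into two halves that holds the input "begin…"; the scanner's forward pointer moves through it, and eof marks the end of the input.]

    if (forward at end of first half) {
        reload second half
        forward++
    } else if (forward at end of second half) {
        reload first half
        forward = 0            // wrap around to the start of the buffer
    } else
        forward++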


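Input Buffering

[Figure: the same two-half buffer with an eof sentinel at the end of each half and another eof at the end of the input.]

    forward = forward + 1
    if (the character at forward is eof) {
        if (forward at end of first half) {
            reload second half
            forward++
        } else if (forward at end of second half) {
            reload first half
            forward = 0
        } else
            terminate the analysis     // eof inside a half: the real end of the input
    }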
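A Java sketch of the sentinel scheme, assuming the character `'\0'` serves as the eof sentinel (so it must never occur in the source text) and that each half holds `halfSize` characters. The layout is [half 1][eof][half 2][eof]; `TwoHalfBuffer` and `next` are hypothetical names, and the lexeme-beginning pointer is left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Two-half input buffer with sentinels; '\0' is assumed never to occur in the source. */
final class TwoHalfBuffer {
    static final char EOF = '\0';       // assumption: the sentinel character
    private final Reader in;
    private final int n;                // size of each half
    private final char[] buf;           // layout: [half 1][EOF][half 2][EOF]
    private int forward = -1;           // index of the character returned last

    TwoHalfBuffer(Reader in, int halfSize) {
        if (halfSize < 1) {
            throw new IllegalArgumentException("halfSize must be positive");
        }
        this.in = in;
        this.n = halfSize;
        this.buf = new char[2 * halfSize + 2];
        load(0);                        // fill the first half
    }

    /** Reads up to n characters into the half starting at 'start', then writes a sentinel. */
    private void load(int start) {
        int count = 0;
        try {
            while (count < n) {
                int r = in.read(buf, start + count, n - count);
                if (r < 0) {
                    break;              // no more input
                }
                count += r;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buf[start + count] = EOF;       // lands at start + n when the half is full
    }

    /** Advances forward and returns its character, or EOF at the end of the input. */
    char next() {
        forward++;
        if (buf[forward] == EOF) {                 // usually the only test per character
            if (forward == n) {                    // sentinel ending the first half
                load(n + 1);                       // reload second half
                forward++;
            } else if (forward == 2 * n + 1) {     // sentinel ending the second half
                load(0);                           // reload first half
                forward = 0;
            }
            if (buf[forward] == EOF) {             // eof inside a half: end of input
                forward--;                         // stay here so later calls return EOF too
                return EOF;
            }
        }
        return buf[forward];
    }
}
```

Reading "begin:=x" with `halfSize` 3 exercises both reload branches before `next` returns `EOF`.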


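Transition Diagrams

- relop → <= | < | <>
  - state 0 on < goes to state 1
  - state 1 on = goes to state 2: return(relop, LE)
  - state 1 on > goes to state 3: return(relop, NE)
  - state 1 on any other character goes to state 4: return(relop, LT), giving back the extra character
- id → letter(letter|digit)*
  - state 5 on a letter goes to state 6
  - state 6 on a letter or digit stays in state 6
  - state 6 on any other character goes to state 7: return(id, lexeme), giving back the extra character

A transition diagram is a DFA in which no edge leaves a final state.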


Implementation

    token nexttoken() {
        while (true) {
            switch (state) {
            case 0: c = nextchar();
                    if (c == '<') state = 1;
                    else state = fail(0);          // not a relop: try the id diagram
                    break;
            case 1: c = nextchar();
                    if (c == '=') state = 2;
                    else if (c == '>') state = 3;
                    else state = 4;
                    break;
            case 2: retract(0);
                    return new Token(relop, "<=");
            case 3: retract(0);
                    return new Token(relop, "<>");
            case 4: retract(1);                    // the lookahead is not part of "<"
                    return new Token(relop, "<");
            case 5: c = nextchar();
                    if (Character.isLetter(c))
                        state = 6;
                    else state = fail(5);
                    break;
            case 6: c = nextchar();
                    if (Character.isLetter(c) || Character.isDigit(c))
                        continue;                  // stay in state 6
                    else state = 7;
                    break;
            case 7: retract(1);                    // the lookahead is not part of the id
                    return new Token(id, getLexeme());
            }
        }
    }


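Implementation (cont’d)

    int fail(int current_state) {
        forward = beginning;               // back up to the start of the lexeme
        switch (current_state) {
        case 0: return 5;                  // relop diagram failed: try the id diagram
        case 5:
        default: error();                  // no diagram left to try
                 return -1;
        }
    }

    void retract(int flag) {
        if (flag == 1)
            move forward back one character    // give back the lookahead
        get lexeme from beginning to forward
        move forward onward
        beginning = forward
        state = 0
    }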

[Figure: input buffer cells holding b, e, g, i, n, :, =, two blanks, …]
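For experimentation, the two diagrams can be turned into a self-contained Java class that keeps the slides' state numbers. Assumptions not in the slides: tokens are plain (kind, lexeme) pairs, blanks between tokens are skipped, `'\0'` is returned past the end of the input, and a failure in state 5 throws an exception instead of calling `error()`. `DiagramLexer` and `tokenize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Runnable sketch of the relop and id diagrams, using the slides' state numbers. */
final class DiagramLexer {
    /** A token as a (kind, lexeme) pair; a simplification of the slides' Token. */
    static final class Token {
        final String kind;
        final String lexeme;
        Token(String kind, String lexeme) { this.kind = kind; this.lexeme = lexeme; }
        @Override public String toString() { return "(" + kind + "," + lexeme + ")"; }
    }

    private final String input;
    private int beginning = 0;   // start of the current lexeme
    private int forward = 0;     // index of the next character to read

    DiagramLexer(String input) { this.input = input; }

    /** Returns the next character ('\0' past the end) and advances forward. */
    private char nextchar() {
        char c = forward < input.length() ? input.charAt(forward) : '\0';
        forward++;
        return c;
    }

    /** Ends the lexeme; retractOne gives back the lookahead character first. */
    private Token emit(String kind, boolean retractOne) {
        if (retractOne) {
            forward--;
        }
        Token t = new Token(kind, input.substring(beginning, forward));
        beginning = forward;
        return t;
    }

    /** On failure, back up to the lexeme start and move on to the next diagram. */
    private int fail(int state) {
        forward = beginning;
        if (state == 0) {
            return 5;                           // relop failed: try id
        }
        throw new IllegalArgumentException("no token starts at index " + beginning);
    }

    /** Returns the next token, or null at the end of the input. */
    Token nexttoken() {
        while (beginning < input.length() && Character.isWhitespace(input.charAt(beginning))) {
            beginning++;                        // skip blanks (not shown in the slides)
        }
        if (beginning >= input.length()) {
            return null;
        }
        forward = beginning;
        int state = 0;
        while (true) {
            switch (state) {
                case 0: state = nextchar() == '<' ? 1 : fail(0); break;
                case 1: {
                    char c = nextchar();
                    state = c == '=' ? 2 : c == '>' ? 3 : 4;
                    break;
                }
                case 2: return emit("relop", false);   // <=
                case 3: return emit("relop", false);   // <>
                case 4: return emit("relop", true);    // <, lookahead given back
                case 5: state = Character.isLetter(nextchar()) ? 6 : fail(5); break;
                case 6: if (!Character.isLetterOrDigit(nextchar())) state = 7; break;
                case 7: return emit("id", true);       // lookahead given back
                default: throw new IllegalStateException("unknown state " + state);
            }
        }
    }

    /** Convenience: all tokens of s as "(kind,lexeme)" strings. */
    static List<String> tokenize(String s) {
        DiagramLexer lexer = new DiagramLexer(s);
        List<String> out = new ArrayList<>();
        for (Token t = lexer.nexttoken(); t != null; t = lexer.nexttoken()) {
            out.add(t.toString());
        }
        return out;
    }
}
```

Tokenizing `a1<=b <>c<d` should give (id,a1) (relop,<=) (id,b) (relop,<>) (id,c) (relop,<) (id,d); states 4 and 7 are where the lookahead character is given back.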
