Introduction to Compilers
Chapter 7

LL Parsing


I. Deterministic Parsing

▶ Deterministic top-down parsing

::= deterministic selection of the production rule to be applied in top-down syntax analysis.

▶ One pass, no backup

1. The input string is scanned once from left to right.

2. The parsing process is deterministic.

▶ Top-down parsing with no backup

::= deterministic top-down parsing, called LL parsing:

"Left-to-right scanning, producing a Left parse"


▶ How to decide which production is to be applied:

sentential form : ω1 ω2 … ωi-1 X α

input string : ω1 ω2 … ωi-1 ωi ωi+1 … ωn

Given X → α1 | α2 | ... | αk ∈ P, the X-production to apply is determined uniquely by looking at ωi.

⇒ The condition for no backtracking (= the LL condition) requires FIRST and FOLLOW.


FIRST

▶ Computation of FIRST(X), where X ∈ V.

1) if X ∈ VT, then FIRST(X) = {X}

2) if X ∈ VN and X → aα ∈ P, then FIRST(X) = FIRST(X) ∪ {a}

   if X → ε ∈ P, then FIRST(X) = FIRST(X) ∪ {ε}

3) if X → Y1Y2…Yk ∈ P and Y1Y2…Yi-1 ⇒* ε,

   then FIRST(X) = FIRST(X) ∪ ( FIRST(Y1) ∪ … ∪ FIRST(Yi) - {ε} ).

   if Y1Y2…Yk ⇒* ε, then FIRST(X) = FIRST(X) ∪ {ε}.

▶ FIRST(α) ::= the set of terminals that begin the strings derived from α.

   if α ⇒* ε, then ε is also in FIRST(α).

⇒ FIRST(A) ::= { a ∈ VT ∪ {ε} | A ⇒* aβ, β ∈ V* }.


Text p.230

ex1) E → TE'
     E' → +TE' | ε
     T → FT'
     T' → *FT' | ε
     F → (E) | id

FIRST(E) = FIRST(T) = FIRST(F) = {(, id}

FIRST(E') = {+, ε}

FIRST(T') = {*, ε}

ex2) PROGRAM → begin d semi X end
     X → d semi X
     X → s Y
     Y → semi s Y | ε

FIRST(PROGRAM) = {begin}

FIRST(X) = {d, s}

FIRST(Y) = {semi, ε}
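The FIRST rules can be sketched as a fixed-point iteration in Python. This is only an illustrative sketch: the dictionary encoding of the grammar, the name `first_sets`, and the `EPS` marker are assumptions made for the example, not part of the slides.

```python
EPS = "ε"  # marker for the empty string (a naming assumption)


def first_sets(grammar):
    """Compute FIRST(X) for every nonterminal by fixed-point iteration.

    grammar maps a nonterminal to a list of right-hand sides; each
    right-hand side is a list of symbols, and [] stands for X → ε.
    Any symbol that is not a key of grammar is treated as a terminal.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[nt])
                all_nullable = True
                for sym in rhs:
                    # Rule 1: FIRST of a terminal is the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    # Rules 2 and 3: add FIRST(Yj) - {ε} while the prefix is nullable.
                    first[nt] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        all_nullable = False
                        break
                if all_nullable:  # the whole right-hand side derives ε
                    first[nt].add(EPS)
                changed |= len(first[nt]) != before
    return first


EXPR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = first_sets(EXPR)
```

Applied to the grammar of ex1, `FIRST["E'"]` contains `+` and `ε`, matching the sets listed above.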


▶ left-dependency graph

- the vertices are the terminal and nonterminal symbols, and the arcs go from X to Y if and only if X → X1...XnYα ∈ P, where n ≥ 0 and each of X1,...,Xn can produce the empty string.

ex) S → AB
    A → aA | ε
    B → bB | ε

(Figure: left-dependency graph with arcs S → A, S → B, A → a, B → b.)

FIRST(S) = {a, b, ε}   FIRST(A) = {a, ε}   FIRST(B) = {b, ε}


★ In general, for A → A1A2...An:

if A1 is non-nullable : arc from A to A1 only.

if A1 is nullable : arcs from A to A1 and A2.

if A1A2 is nullable : arcs from A to A1, A2, and A3.


FOLLOW

▶ FOLLOW(A)

::= the set of terminals that can appear immediately to the right of A in some sentential form. If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A). $ is the input right marker.

::= { a ∈ VT ∪ {$} | S ⇒* αAaβ, α, β ∈ V* }.

▶ Computation of FOLLOW(A)

1) FOLLOW(S) = {$}

2) if A → αBβ ∈ P and β ≠ ε,

   then FOLLOW(B) = FOLLOW(B) ∪ (FIRST(β) - {ε})

3) if A → αB ∈ P, or A → αBβ ∈ P and β ⇒* ε,

   then FOLLOW(B) = FOLLOW(B) ∪ FOLLOW(A).


Text p.233

ex) E → TE'
    E' → +TE' | ε
    T → FT'
    T' → *FT' | ε
    F → (E) | id

Nullable = { E', T' }

FIRST(E) = FIRST(T) = FIRST(F) = {(, id}

FIRST(E') = {+, ε}   FIRST(T') = {*, ε}

FOLLOW(E) = {), $}   FOLLOW(E') = {), $}

FOLLOW(T) = {+, ), $}   FOLLOW(T') = {+, ), $}

FOLLOW(F) = {*, +, ), $}
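A minimal Python sketch of the three FOLLOW rules, iterated until nothing changes. The grammar encoding and the names `first_of` and `follow_sets` are assumptions for illustration; the FIRST sets are copied from the example above rather than recomputed.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST sets taken from the example above.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of(symbols, first):
    """FIRST of a symbol string; symbols without a FIRST entry are terminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol was nullable (or the string is empty)
    return result


def follow_sets(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # rule 1
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # FOLLOW is only defined for nonterminals
                    before = len(follow[sym])
                    rest = first_of(rhs[i + 1:], first)
                    follow[sym] |= rest - {EPS}      # rule 2
                    if EPS in rest:                  # rule 3
                        follow[sym] |= follow[lhs]
                    changed |= len(follow[sym]) != before
    return follow


FOLLOW = follow_sets(GRAMMAR, FIRST, "E")
```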


▶ LL condition

::= no backup condition

::= the condition for deterministic parsing in the top-down method.

input : ω1ω2 ... ωi-1ωi ... ωn

derived string : ω1ω2 ... ωi-1Xα

X → α1 | α2 | ... | αm

Looking at ωi, the rule used to expand X is selected deterministically from among the X-productions.

★ <LL condition> for A → α | β ∈ P,

1. FIRST(α) ∩ FIRST(β) = ∅

2. if α ⇒* ε, FOLLOW(A) ∩ FIRST(β) = ∅


ex) A → aBc | Bc | dAa
    B → bB | ε

FIRST(A) = {a, b, c, d}   FOLLOW(A) = {$, a}

FIRST(B) = {b, ε}   FOLLOW(B) = {c}

1) For A → aBc | Bc | dAa:

   FIRST(aBc) = {a}, FIRST(Bc) = {b, c}, FIRST(dAa) = {d} are pairwise disjoint.

2) For B → bB | ε:

   FIRST(bB) ∩ FOLLOW(B) = {b} ∩ {c} = ∅

By 1) and 2), the grammar satisfies the LL condition.
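The check carried out by hand above can be sketched in Python. The function name `ll_violations` and the data layout are assumptions for this example; FIRST and FOLLOW are entered directly from the values listed in the example.

```python
from itertools import combinations

EPS = "ε"

GRAMMAR = {
    "A": [["a", "B", "c"], ["B", "c"], ["d", "A", "a"]],
    "B": [["b", "B"], []],
}
FIRST = {"A": {"a", "b", "c", "d"}, "B": {"b", EPS}}
FOLLOW = {"A": {"$", "a"}, "B": {"c"}}


def first_of(symbols, first):
    """FIRST of a symbol string; symbols without a FIRST entry are terminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def ll_violations(grammar, first, follow):
    """Return (A, alpha, beta) triples that break the LL condition."""
    bad = []
    for lhs, alternatives in grammar.items():
        for alpha, beta in combinations(alternatives, 2):
            fa, fb = first_of(alpha, first), first_of(beta, first)
            if (fa & fb                                   # condition 1
                    or (EPS in fa and follow[lhs] & fb)   # condition 2
                    or (EPS in fb and follow[lhs] & fa)):
                bad.append((lhs, alpha, beta))
    return bad


VIOLATIONS = ll_violations(GRAMMAR, FIRST, FOLLOW)
```

For this grammar the returned list is empty, which agrees with the conclusion of the example.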


II. Recursive-Descent Parser

▶ Recursive-descent parsing

::= a top-down method that uses a set of recursive procedures to recognize its input with no backtracking.

▶ Create a procedure for each nonterminal.

ex) G : S → aA | bB
        A → aA | c
        B → bB | d

procedure pS;
begin
  if nextsymbol = qa then
    begin get_nextsymbol; pA end
  else if nextsymbol = qb then
    begin get_nextsymbol; pB end
  else error
end;


ω = aac$

procedure pA;
begin
  if nextsymbol = qa then begin get_nextsymbol; pA end
  else if nextsymbol = qc then get_nextsymbol
  else error
end;

procedure pB; ...

(* main *)
begin
  get_nextsymbol;
  pS;
  if nextsymbol = '$' then accept else error
end.

⇒ Procedure call sequence ::= leftmost derivation
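The same parser can be sketched in Python, one method per nonterminal. The class name `Parser`, the `calls` log, and the helper `accepts` are assumptions added for illustration; `pB` is not shown on the slide and is written here by analogy with `pA` from the production B → bB | d.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive-descent parser for S → aA | bB, A → aA | c, B → bB | d."""

    def __init__(self, text):
        self.text = text + "$"   # append the end marker
        self.pos = 0
        self.calls = []          # log of nonterminal procedures entered

    @property
    def nextsymbol(self):
        return self.text[self.pos]

    def get_nextsymbol(self):
        self.pos += 1

    def pS(self):
        self.calls.append("S")
        if self.nextsymbol == "a":
            self.get_nextsymbol()
            self.pA()
        elif self.nextsymbol == "b":
            self.get_nextsymbol()
            self.pB()
        else:
            raise ParseError("S: expected a or b")

    def pA(self):
        self.calls.append("A")
        if self.nextsymbol == "a":
            self.get_nextsymbol()
            self.pA()
        elif self.nextsymbol == "c":
            self.get_nextsymbol()
        else:
            raise ParseError("A: expected a or c")

    def pB(self):  # assumed by analogy with pA
        self.calls.append("B")
        if self.nextsymbol == "b":
            self.get_nextsymbol()
            self.pB()
        elif self.nextsymbol == "d":
            self.get_nextsymbol()
        else:
            raise ParseError("B: expected b or d")


def accepts(text):
    """Main program: parse S, then require the end marker."""
    parser = Parser(text)
    try:
        parser.pS()
    except ParseError:
        return False
    return parser.nextsymbol == "$"
```

For the input aac, the `calls` log records S, A, A, which mirrors the leftmost derivation S ⇒ aA ⇒ aaA ⇒ aac.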

▶ The main problem in constructing a recursive-descent syntax analyzer is the choice of production when a procedure is first entered. To resolve this problem, we can compute the lookahead of each production.

▶ LOOKAHEAD of a production

Definition: LOOKAHEAD(A → α) = FIRST({ ω | S ⇒* μAβ ⇒ μαβ ⇒* μω ∈ VT* }).

Meaning : the set of terminals that can begin the strings generated by α; if α ⇒* ε, then FOLLOW(A) is added to the set.

Computing formula: LOOKAHEAD(A → X1X2...Xn) = FIRST(X1X2...Xn) ⊕ FOLLOW(A),

where P ⊕ Q = P if ε ∉ P, and (P - {ε}) ∪ Q otherwise.


ex) S → aSA | ε
    A → c

Nullable Set = {S}

FIRST(S) = {a, ε}   FOLLOW(S) = {$, c}

FIRST(A) = {c}   FOLLOW(A) = {$, c}

LOOKAHEAD(S → aSA) = FIRST(aSA) ⊕ FOLLOW(S) = {a}

LOOKAHEAD(S → ε) = FIRST(ε) ⊕ FOLLOW(S) = {$, c}

LOOKAHEAD(A → c) = FIRST(c) ⊕ FOLLOW(A) = {c}

⇒ Order of computation: Nullable => FIRST => FOLLOW => LOOKAHEAD


▶ Strong LL condition

Definition : for A → α | β ∈ P,

   LOOKAHEAD(A → α) ∩ LOOKAHEAD(A → β) = ∅.

Meaning : for each distinct pair of productions with the same left-hand side, the parser can select the unique alternative that derives a string beginning with the input symbol.

Definition : the grammar G is said to be strong LL(1) if it satisfies the strong LL condition.

ex) G : S → aSA | ε
        A → c

LOOKAHEAD(S → aSA) = {a}

LOOKAHEAD(S → ε) = FOLLOW(S) = {$, c}

LOOKAHEAD(S → aSA) ∩ LOOKAHEAD(S → ε) = ∅

⇒ G is strong LL(1).
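The LOOKAHEAD formula and the strong LL condition can be sketched together in Python. The names `lookahead` and `is_strong_ll1` are assumptions for this example, and FIRST and FOLLOW are entered from the values of the example grammar instead of being recomputed.

```python
from itertools import combinations

EPS = "ε"

GRAMMAR = {"S": [["a", "S", "A"], []], "A": [["c"]]}
FIRST = {"S": {"a", EPS}, "A": {"c"}}
FOLLOW = {"S": {"$", "c"}, "A": {"$", "c"}}


def first_of(symbols, first):
    """FIRST of a symbol string; symbols without a FIRST entry are terminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def lookahead(lhs, rhs, first, follow):
    """LOOKAHEAD(lhs → rhs): FIRST(rhs), with ε replaced by FOLLOW(lhs)."""
    f = first_of(rhs, first)
    return (f - {EPS}) | follow[lhs] if EPS in f else f


def is_strong_ll1(grammar, first, follow):
    """True if the LOOKAHEAD sets of each nonterminal are pairwise disjoint."""
    for lhs, alternatives in grammar.items():
        sets = [lookahead(lhs, rhs, first, follow) for rhs in alternatives]
        if any(x & y for x, y in combinations(sets, 2)):
            return False
    return True


STRONG = is_strong_ll1(GRAMMAR, FIRST, FOLLOW)
```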

▶ Implementation of a recursive-descent parser

If a grammar is strong LL(1), we can construct a parser for sentences of the grammar using the following scheme.

For a ∈ VT:

procedure pa; (* get_nextsymbol = scanner *)
begin
  if nextsymbol = qa then get_nextsymbol
  else error
end;

get_nextsymbol : the routine that plays the role of the scanner; it reads one token from the input stream and assigns it to the variable nextsymbol.


Text p.240

For A ∈ VN:

procedure pA;
var i: integer;
begin
  case nextsymbol of
    LOOKAHEAD(A → X1X2...Xm): for i := 1 to m do pXi;
    LOOKAHEAD(A → Y1Y2...Yn): for i := 1 to n do pYi;
      :
    LOOKAHEAD(A → Z1Z2...Zr): for i := 1 to r do pZi;
    LOOKAHEAD(A → ε): ;
    otherwise: error
  end (* case *)
end;


▶ Improving the efficiency and structure of a recursive-descent parser

1) Eliminating terminal procedures

::= In practice it is better not to write a procedure for each terminal. Instead, the action of advancing the input marker can always be initiated by the nonterminal procedures. In this way many redundant tests can be eliminated.

ex) text p.241 [Example 9]

2) BNF → EBNF : reduces the number of productions and nonterminals.

① repetitive part : { }

② optional part : [ ]

③ alternation : ( | )


ex) <IF_st> ::= 'if' <C> 'then' <S> [ 'else' <S> ]

procedure pIF;
begin
  if nextsymbol = qif then
    begin
      get_nextsymbol; pC;
      if nextsymbol = qthen then
        begin get_nextsymbol; pS end
      else error(10)
    end
  else error(20);
  if nextsymbol = qelse then
    begin get_nextsymbol; pS end
end;


ex) <id_list> ::= 'id' { ',' 'id' }

procedure pID_LIST;
begin
  if nextsymbol = qid then
    begin
      get_nextsymbol;
      while (nextsymbol = qcomma) do
        begin
          get_nextsymbol;
          if nextsymbol = qid then get_nextsymbol
          else error
        end
    end
  else error  (* the list must start with an id *)
end;


<Problem> Convert the following grammar to extended BNF and write the corresponding procedure for a recursive-descent parser.

<D> ::= 'label' <L> | 'integer' <L>

<L> ::= <id> <R>

<R> ::= ';' | ',' <L>

<L> ⇒* <id> ( ',' <id> )* ';'

⇒ <D> ::= ( 'label' | 'integer' ) <id> { ',' <id> } ';'


procedure pD;
begin
  if nextsymbol in [qlabel, qinteger] then
    begin
      get_nextsymbol;
      if nextsymbol = qid then
        begin
          get_nextsymbol;
          while (nextsymbol = qcomma) do
            begin
              get_nextsymbol;
              if nextsymbol = qid then get_nextsymbol
              else error(3)
            end
        end
      else error(2);
      if nextsymbol = qsemi then get_nextsymbol
      else error(4)
    end
  else error(1)
end;
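For comparison, here is a rough Python counterpart of pD that works on a list of token names. The token spellings (`"label"`, `"id"`, `","`, `";"`), the `ParseError` class, and the helper `error_code` are assumptions for the sketch. Unlike the Pascal version, it stops at the first error instead of continuing.

```python
class ParseError(Exception):
    """Carries the error number used in pD as args[0]."""


def parse_D(tokens):
    """Recognize ( 'label' | 'integer' ) id { ',' id } ';'.

    Returns the number of tokens consumed.
    """
    tokens = list(tokens) + ["$"]   # sentinel so lookups never run off the end
    pos = 0
    if tokens[pos] not in ("label", "integer"):
        raise ParseError(1)
    pos += 1
    if tokens[pos] != "id":
        raise ParseError(2)
    pos += 1
    while tokens[pos] == ",":       # repetitive part { ',' id }
        pos += 1
        if tokens[pos] != "id":
            raise ParseError(3)
        pos += 1
    if tokens[pos] != ";":
        raise ParseError(4)
    pos += 1
    return pos


def error_code(tokens):
    """Return 0 on success, otherwise the error number raised by parse_D."""
    try:
        parse_D(tokens)
    except ParseError as exc:
        return exc.args[0]
    return 0
```

The `while` loop corresponds directly to the `{ }` part of the EBNF rule, which is the point of rewriting the grammar before coding the procedure.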


Programming Assignment #1

Implement a recursive-descent syntax analyzer for the grammar given in exercise 5.24 (text p. 189).

Problem Specifications

- input : SPL program to find a Minimum and a Maximum.

- output : left parse

- methods : (1) write the get_nextsymbol routine.
            (2) compute LOOKAHEADs for each production.
            (3) create a procedure for each nonterminal.
            (4) assemble the procedures with the main program.

(Figure: a set of productions → Computation of LOOKAHEADs → LOOKAHEADs for each nonterminal.)


III. Predictive Parsing

▶ Predictive parsing

::= a deterministic parsing method using a stack. The stack contains a sequence of grammar symbols.

▶ Model of a predictive parser

(Figure: a driver routine reads the input ω$, consults the parsing table, works on a stack with $ at the bottom, and produces the output.)


Parsing proceeds according to the relation between the current input symbol and the stack top symbol.

The input buffer contains the string to be parsed, followed by $.

Initial configuration :   STACK   INPUT
                          $S      ω$

Parsing table (LL) : determines the parsing action.

※ M[X,a] = r : when the stack top symbol is X and the current input symbol is a, expand by production r.

(Figure: parsing table with nonterminals as rows and terminals as columns; the entry in row X, column a is r.)


▶ Parsing Actions

X : stack top symbol, a : current input symbol

1. if X = a = $, then accept.

2. if X = a, then pop X and advance input.

3. if X ∈ VN, then if M[X,a] = r (X → α), then replace X by α, else error.


Text p.246

▶ Predictive parsing algorithm

set ip to point to the first symbol of ω$;
repeat
  let X be the top stack symbol and a the symbol pointed to by ip;
  if X is a terminal or $ then
    if X = a then
      pop X from the stack and advance ip
    else error(1)
  else /* X is nonterminal */
    if M[X,a] = X → Y1Y2...Yk then
      begin
        pop X from the stack;
        push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
        output the production X → Y1Y2...Yk
      end
    else error(2)
until X = $ /* stack is empty */


ex) G : 1. S → aSb
        2. S → bA
        3. A → aA
        4. A → b

string : aabbbb

• Parsing Table:

                  terminals
   nonterminals   a    b
   S              1    2
   A              3    4


STACK    INPUT      ACTION               OUTPUT
$S       aabbbb$    expand 1             1
$bSa     aabbbb$    pop a and advance
$bS      abbbb$     expand 1             1
$bbSa    abbbb$     pop a and advance
$bbS     bbbb$      expand 2             2
$bbAb    bbbb$      pop b and advance
$bbA     bbb$       expand 4             4
$bbb     bbb$       pop b and advance
$bb      bb$        pop b and advance
$b       b$         pop b and advance
$        $          accept
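The driver routine and the trace above can be reproduced with a short Python sketch. The table and productions are those of the example grammar; the function names `predictive_parse` and `accepts` and the `ParseError` class are assumptions made for illustration.

```python
END = "$"

# Productions of the example grammar, keyed by rule number.
PRODUCTIONS = {
    1: ("S", ["a", "S", "b"]),
    2: ("S", ["b", "A"]),
    3: ("A", ["a", "A"]),
    4: ("A", ["b"]),
}
# Parsing table M[X, a] = r from the example.
TABLE = {("S", "a"): 1, ("S", "b"): 2, ("A", "a"): 3, ("A", "b"): 4}
NONTERMINALS = {"S", "A"}


class ParseError(Exception):
    pass


def predictive_parse(tokens, table, productions, nonterminals, start):
    """Return the left parse (list of rule numbers) or raise ParseError."""
    stack = [END, start]            # top of the stack is the end of the list
    tokens = list(tokens) + [END]
    ip = 0
    output = []
    while True:
        top, a = stack[-1], tokens[ip]
        if top not in nonterminals:          # terminal or $
            if top != a:
                raise ParseError(f"error(1): expected {top!r}, got {a!r}")
            if top == END:                   # X = a = $: accept
                return output
            stack.pop()
            ip += 1
        else:
            number = table.get((top, a))
            if number is None:
                raise ParseError(f"error(2): no entry M[{top}, {a}]")
            _, rhs = productions[number]
            stack.pop()
            stack.extend(reversed(rhs))      # leftmost symbol ends up on top
            output.append(number)


def accepts(text):
    try:
        predictive_parse(text, TABLE, PRODUCTIONS, NONTERMINALS, "S")
    except ParseError:
        return False
    return True


LEFT_PARSE = predictive_parse("aabbbb", TABLE, PRODUCTIONS, NONTERMINALS, "S")
```

`LEFT_PARSE` is the list of rule numbers in the OUTPUT column of the trace.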

※ How to construct a predictive parsing table for the grammar.


IV. Constructing the Predictive Parsing Table

▶ main idea : If A → α is a production with a in FIRST(α), then the parser will expand A by α when the current input symbol is a. And if α ⇒* ε, then we should again expand A by α when the current input symbol is in FOLLOW(A).

▶ parsing table (LL):

M[X,a] = r : expand X with r-production

blank : error

(Figure: table with rows indexed by VN and columns indexed by VT; the cell for row X and column a holds the entry.)


▶ Algorithm : for each production A → α,

1. for each a ∈ FIRST(α), a ≠ ε : M[A,a] := <A → α>

2. if α ⇒* ε, then for each b ∈ FOLLOW(A) : M[A,b] := <A → α>.

ex) G: 1. E → TE'    2. E' → +TE'   3. E' → ε    4. T → FT'
       5. T' → *FT'  6. T' → ε      7. F → (E)   8. F → id

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }

FIRST(E') = { +, ε }   FIRST(T') = { *, ε }

FOLLOW(E) = FOLLOW(E') = { ), $ }

FOLLOW(T) = FOLLOW(T') = { +, ), $ }

FOLLOW(F) = { +, *, ), $ }


• Parsing Table:

                  Terminals
   Nonterminals   id   +    *    (    )    $
   E              1              1
   E'                  2              3    3
   T              4              4
   T'                  6    5         6    6
   F              8              7
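The table-filling algorithm can be sketched in Python as follows. The function name `build_table` and the choice to store a set of rule numbers per cell are assumptions for this example; FIRST and FOLLOW are entered from the values listed for the example grammar.

```python
EPS = "ε"

# Productions 1..8 of the example grammar, in order.
PRODUCTIONS = [
    ("E", ["T", "E'"]), ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]), ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of(symbols, first):
    """FIRST of a symbol string; symbols without a FIRST entry are terminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(productions, first, follow):
    """Map (nonterminal, terminal) to the set of applicable rule numbers."""
    table = {}
    for number, (lhs, rhs) in enumerate(productions, start=1):
        f = first_of(rhs, first)
        targets = f - {EPS}              # step 1: terminals in FIRST(α)
        if EPS in f:
            targets |= follow[lhs]       # step 2: α ⇒* ε, use FOLLOW(A)
        for terminal in targets:
            table.setdefault((lhs, terminal), set()).add(number)
    return table


TABLE = build_table(PRODUCTIONS, FIRST, FOLLOW)
# Cells holding more than one rule number are multiply defined.
CONFLICTS = {cell: nums for cell, nums in TABLE.items() if len(nums) > 1}
```

Keeping a set per cell is a design choice of this sketch: a cell with more than one rule number marks a multiply-defined entry, and `CONFLICTS` is empty for this grammar.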


▶ LL(1) Grammar

::= a grammar whose parsing table has no multiply-defined entries.

If an entry is multiply defined, the parser cannot decide which rule to expand by, so it cannot parse deterministically.

▶ LL(1) condition: for A → α | β,

1. FIRST(α) ∩ FIRST(β) = ∅.

2. if α ⇒* ε, then FOLLOW(A) ∩ FIRST(β) = ∅.

ex) G : 1. S → iCtSS'   2. S → a
        3. S' → eS      4. S' → ε
        5. C → b

FIRST(S) = {i, a}   FOLLOW(S) = {$, e}

FIRST(S') = {e, ε}   FOLLOW(S') = {$, e}

FIRST(C) = {b}   FOLLOW(C) = {t}


Parsing Table:

        a    b    e     i    t    $
   S    2               1
   S'             3,4             4
   C         5

M[S',e] := <3,4> is multiply defined.

Here, when the stack top is S' and the input symbol is e, the parser cannot tell whether to expand by rule 3 or by rule 4.

Therefore G is not an LL(1) grammar.

ex) text p.252, Example 14

G : S → aA | abA     ω : abab
    A → Ab | a


V. Strong LL(k) and LL(k) Grammars

▶ FIRSTk(α) = { ω | α ⇒* ωβ and |ω| = k, or α ⇒* ω and |ω| < k }

▶ G is said to be strong LL(k), for some fixed integer k > 0, if whenever there are two leftmost derivations

1. S ⇒* ω1Aα1 ⇒ ω1βα1 ⇒* ω1x ∈ VT*, and

2. S ⇒* ω2Aα2 ⇒ ω2γα2 ⇒* ω2y ∈ VT* such that

3. FIRSTk(x) = FIRSTk(y), it follows that

4. β = γ.

▶ Meaning: Suppose we consider any state of the parse in which A is the nonterminal currently being parsed and FIRSTk(x) is the k-lookahead at the current point. Then, if the k-lookahead is the same, the two productions A → β and A → γ are identical. Any other information provided by the closed portion and the open portion of the current state of the parse is disregarded.


▶ S ⇒* ωAα, ω : closed portion, α : open portion

▶ Two states of the parse

(Figure: two parse trees rooted at S, each containing the node A; the first derives x and the second derives y.)

FIRSTk(x) = FIRSTk(y) ===> β = γ.


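▶ Def) LL(k) grammar:

1. S ⇒* ωAα ⇒ ωβα ⇒* ωx ∈ VT*, and

2. S ⇒* ωAα ⇒ ωγα ⇒* ωy ∈ VT* such that

3. FIRSTk(x) = FIRSTk(y). It follows that

4. β = γ.

ex) S → aAaa | bAba
    A → b | ε

(Figure: two parse trees. In the first, S has children a A a a; in the second, S has children b A b a. In each, A derives b or ε.)

⇒ When the lookahead is ba, which of A → b and A → ε should be chosen? If the symbol just seen is a, select A → b; if it is b, select A → ε. Therefore the grammar is not SLL(2), but it is LL(2).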


▶ SLL(k) and LL(k)

(Figure: the class of SLL(k) grammars is contained in the class of LL(k) grammars.)

▶ <theorem> strong LL(1) ⇔ LL(1)

Proof) (⇒) clear!

(⇐) Suppose that G is not strong LL(1).

Then, by definition, there are two distinct productions A → α and A → β such that

S ⇒* ω1Aα1 ⇒ ω1αα1 ⇒* ω1x1α1 ⇒* ω1x1y1

S ⇒* ω2Aα2 ⇒ ω2βα2 ⇒* ω2x2α2 ⇒* ω2x2y2

and FIRST(x1y1) = FIRST(x2y2).


Now we must prove that G is not LL(1).

1) x1 = x2 = ε : G is not LL(1). Indeed, it is ambiguous.

2) One (or both) of x1 and x2 is not ε; say x1 ≠ ε. Then

FIRST1(x1y1) = FIRST1(x1) = FIRST1(x2y2).

But then

S ⇒* ω2Aα2 ⇒ ω2αα2 ⇒* ω2x1α2 ⇒* ω2x1y2

S ⇒* ω2Aα2 ⇒ ω2βα2 ⇒* ω2x2α2 ⇒* ω2x2y2

satisfy the property

FIRST1(x1y2) = FIRST1(x1) = FIRST1(x2y2).

Thus, by definition, G is not LL(1).
