Chapter 5: Context-Free Grammars

Introduction to Compilers, Chapter 5: Context-Free Grammars

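Introduction
▶ Regular expressions describe the lexical structure of tokens.
  - recognizer: FA (scanner)
  - example: id = l(l + d)*
▶ CFGs describe the syntactic structure of programming languages.
  - recognizer: PDA (parser)
▶ Advantages of describing the syntax of a programming language with a CFG:
  1. The description is simple and easy to understand.
  2. A recognizer can be constructed automatically from the CFG.
  3. The production rules break a program into its structural parts, which is useful during translation.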

▶ Form of a CFG: N. Chomsky's type 2 grammar
  • A → α, where A ∈ VN and α ∈ V*.
▶ Recursive construction
  • ex) E → E OP E | (E) | -E | id
        OP → + | - | * | / | ↑
        VN = {E, OP}
        VT = {(, ), +, -, *, /, ↑, id}
  • ex) <if_statement> → 'if' <condition> 'then' <statement>
        VN: the symbols written between < and >.
        VT: the symbols written between ' and '.

Derivations and Derivation Trees
▶ Derivation (α1 ⇒ α2): the process, while generating a sentence from the start symbol, of replacing a nonterminal with the right-hand side of one of that nonterminal's productions.
  (1) ⇒ : derives in one step.
      If A → γ ∈ P and α, β ∈ V*, then αAβ ⇒ αγβ.
  (2) ⇒* : derives in zero or more steps.
      1. For every α ∈ V*, α ⇒* α.
      2. If α ⇒* β and β ⇒ γ, then α ⇒* γ.
  (3) ⇒+ : derives in one or more steps.

▶ L(G): the language generated by G
  • L(G) = {ω | S ⇒* ω, ω ∈ VT*}
▶ Definitions:
  • sentence: a string ω with S ⇒* ω and ω ∈ VT*, i.e. one consisting of terminals only.
  • sentential form: a string ω with S ⇒* ω and ω ∈ V*.
▶ Choosing the nonterminal to be replaced
  - Which nonterminal of a sentential form should be rewritten by a production A → α, where α ∈ V*?
  • leftmost derivation: always replace the leftmost nonterminal.
  • rightmost derivation: always replace the rightmost nonterminal.

▶ A derivation sequence α0 ⇒ α1 ⇒ ... ⇒ αn is called a leftmost derivation if and only if αi+1 is obtained from αi by applying a production to the leftmost nonterminal in αi, for all i with 0 ≤ i ≤ n-1.
  • αi ⇒ αi+1: the leftmost nonterminal is replaced at every step.
▶ parse: one of the output forms of a parser.
  • left parse: the sequence of production numbers applied in a leftmost derivation.
      - top-down parsing
      - generates the sentence from the start symbol
  • right parse: the reverse of the sequence of production numbers applied in a rightmost derivation.
      - bottom-up parsing
      - the sentence is reduced to nonterminals and finally to the start symbol
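The relationship between a left parse and a leftmost derivation can be sketched in Python. This is a minimal illustration, not part of the original slides: the expression grammar and its production numbering are assumptions chosen for the example, and `leftmost_derivation` is a hypothetical helper name.

```python
# Productions of a small expression grammar; the numbering is an assumption.
PRODUCTIONS = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["a"]),
}
NONTERMINALS = {"E", "T", "F"}


def leftmost_derivation(start, left_parse):
    """Replay a left parse and return every sentential form it visits."""
    form = [start]
    steps = [" ".join(form)]
    for number in left_parse:
        lhs, rhs = PRODUCTIONS[number]
        # Locate the leftmost nonterminal of the current sentential form.
        index = next(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        if form[index] != lhs:
            raise ValueError(f"production {number} does not apply to {form[index]}")
        form[index:index + 1] = rhs
        steps.append(" ".join(form))
    return steps


# Left parse of the sentence a + a * a under the numbering above.
steps = leftmost_derivation("E", [1, 2, 4, 5, 3, 4, 5, 5])
print(" => ".join(steps))
```

Each entry of `steps` is one sentential form; the last one is the sentence itself.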

Derivation tree
  ::= a graphical representation of derivations.
  ::= the hierarchical syntactic structure of a sentence that is implied by the grammar.
▶ Definition: derivation tree
  Given a CFG G = (VN, VT, P, S) and ω ∈ VT*, the derivation tree is drawn as follows.
  1. nodes: symbols of V (= VN ∪ VT)
  2. root node: S (the start symbol)
  3. If A ∈ VN, then node A has at least one descendant.
  4. If A → A1A2...An ∈ P, then A is the root of a subtree whose children, from left to right, are A1, A2, ..., An.
  [Figure: node A with children A1, A2, ..., An]

▶ Nodes of a derivation tree
  • internal (nonterminal) node ∈ VN
  • external (terminal) node ∈ VT ∪ {ε}
▶ Ordered tree: a tree in which the positions of the child nodes are ordered. A derivation tree is therefore an ordered tree.
  [Figure: A with children (A1, A2) ≠ A with children (A2, A1)]

▶ Ambiguous grammar
  - A context-free grammar G is ambiguous if and only if it produces more than one derivation tree for some sentence.
  - Explanation: when two or more trees generate the same sentence, the grammar is called ambiguous. For deterministic parsing, such a nondeterministic grammar must be converted into a deterministic one.
  - Ambiguous ⇒ nondeterministic holds (O); the converse, nondeterministic ⇒ ambiguous, does not (X).

▶ To prove that G is ambiguous, exhibit two or more derivation trees for a single sentence.
  • ex) dangling else problem:
      G: S → if C then S else S | if C then S | a
         C → b
      ω: if b then if b then a else a
    1) the else belongs to the outer if:
       [S if [C b] then [S if [C b] then [S a]] else [S a]]
    2) the else belongs to the inner if:
       [S if [C b] then [S if [C b] then [S a] else [S a]]]
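One way to check such an ambiguity claim mechanically is to count the derivation trees of a sentence. The sketch below does this for the dangling-else grammar. It is illustrative only: `count_trees` is a hypothetical helper, and the recursion assumes a grammar without ε-productions and without left recursion, which holds for this grammar but not in general.

```python
from functools import lru_cache

# Dangling-else grammar; each right-hand side is a tuple of symbols.
GRAMMAR = {
    "S": [("if", "C", "then", "S", "else", "S"), ("if", "C", "then", "S"), ("a",)],
    "C": [("b",)],
}


def count_trees(tokens, start="S"):
    """Count the derivation trees by which `start` yields `tokens`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of trees by which `symbol` yields tokens[i:j].
        if symbol not in GRAMMAR:  # terminal symbol
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(match(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def match(rhs, i, j):
        # Number of ways the symbol sequence `rhs` yields tokens[i:j].
        if not rhs:
            return 1 if i == j else 0
        # rhs[0] covers tokens[i:k]; the remaining symbols cover tokens[k:j].
        return sum(derive(rhs[0], i, k) * match(rhs[1:], k, j)
                   for k in range(i + 1, j + 1))

    return derive(start, 0, len(tokens))


print(count_trees("if b then if b then a else a".split()))
```

A count greater than one for any sentence shows the grammar is ambiguous; a count of one for a single sentence proves nothing about the grammar as a whole.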

※ else: generally right-associative. An else is paired with the nearest preceding if, so of the two trees, 2) is the one normally intended.
▶ In a more general form, ambiguity appears when there is a production of the following form.
  - production form: A → AαA
  - sentential form: AαAαA
  - tree form: [A [A A α A] α A] or [A A α [A A α A]]

▶ ambiguous ⇒ unambiguous
  1) Introduce new nonterminals to convert the grammar into an unambiguous one.
  2) In doing so, apply precedence and associativity rules.
  ☞ nondeterministic ⇒ deterministic
  ex) G: E → E * E | E + E | a
      ω: a * a + a
  - Applying a precedence rule:
      1) + > * : [E [E a] * [E [E a] + [E a]]]
      2) * > + : [E [E [E a] * [E a]] + [E a]]

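Introducing new nonterminals
  • G: E → E + T | T
       T → T * F | F
       F → a
  • Derivation tree for a + a * a: [E [E [T [F a]]] + [T [T [F a]] * [F a]]]
※ However, no algorithm can decide whether an arbitrary grammar is ambiguous, and no general formal method exists for converting a grammar into an unambiguous one.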

▶ Derivation tree of id * id + id:
  expression
  ├ expression
  │  └ term
  │     ├ term ─ factor ─ primary ─ element ─ id
  │     ├ *
  │     └ factor ─ primary ─ element ─ id
  ├ +
  └ term ─ factor ─ primary ─ element ─ id
  - There is only one derivation tree for this sentence, as is the case for every sentence of an unambiguous grammar.

Text p.150
Grammar Transformations
  III.1 Introduction
  III.2 Useless Productions
  III.3 ε-Productions
  III.4 Single Productions
  III.5 Canonical Forms of Grammars

III.1 Introduction
▶ Given a grammar, it is often desirable to modify it so that a certain structure is imposed on the language generated.
  ⇒ grammar transformations that do not disturb the language generated.
▶ Definition: two grammars G1 and G2 are equivalent if L(G1) = L(G2).
▶ Two techniques
  (i) Substitution:
      if A → αBγ and B → β1 | β2 | ... | βn are in P, then
      P' = (P - {A → αBγ}) ∪ {A → αβ1γ | αβ2γ | ... | αβnγ}.
  (ii) Expansion:
      A → αβ  <=>  A → αX, X → β   or   A → Xβ, X → α
  ex) P: S → aA | bB
         A → bB | b
         B → aA | a
▶ Any grammar can be transformed into an equivalent grammar through the substitution and expansion techniques.

III.2 Useless Productions
▶ A useless production in a context-free grammar is one that cannot be used in the generation of any sentence of the language defined by the grammar.
  ⇒ It can be eliminated.
▶ Definition: a symbol X is useless if there is no derivation
  S ⇒* ωXy ⇒* ωxy, with ω, x, y ∈ VT*.
▶ The problem splits into two distinct problems:
  (i) Terminating nonterminal:
      A ⇒ γ ⇒* ω, where A ∈ VN and ω ∈ VT*.
  (ii) Accessible symbol:
      S ⇒* αXβ, where X ∈ V and α, β ∈ V*.
▶ An algorithm to remove useless productions computes the terminating nonterminals first and the accessible symbols second.

p.153
▶ Computing the terminating nonterminals:
  Algorithm terminating;
  begin
    VN' := { A | A → ω ∈ P, ω ∈ VT* };
    repeat
      VN' := VN' ∪ { A | A → α ∈ P, α ∈ (VN' ∪ VT)* }
    until no change
  end.
p.154
▶ Computing the accessible symbols:
  Algorithm accessible;
  begin
    V' := { S };  (* initialization *)
    repeat
      V' := V' ∪ { X | some A → αXβ ∈ P, A ∈ V' }
    until no change
  end.
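The two fixed-point algorithms above can be sketched in Python as follows. The representation is an assumption made for brevity: a production is a pair `(lhs, rhs)` in which `rhs` is a string of single-character symbols. The function names are hypothetical. The sample grammar is S → A | B, A → aB | bS | b, B → AB | BB, C → AS | b.

```python
def terminating(productions, terminals):
    """Nonterminals that can derive a string of terminals."""
    found = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in found and all(s in terminals or s in found for s in rhs):
                found.add(lhs)
                changed = True
    return found


def accessible(productions, start):
    """Symbols that occur in some sentential form derived from `start`."""
    reached = {start}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs in reached and not set(rhs) <= reached:
                reached |= set(rhs)
                changed = True
    return reached


def remove_useless(productions, terminals, start):
    # Step 1: drop productions that mention a non-terminating nonterminal.
    term = terminating(productions, terminals)
    kept = [(l, r) for l, r in productions
            if l in term and all(s in terminals or s in term for s in r)]
    # Step 2: drop productions whose left-hand side is not accessible.
    acc = accessible(kept, start)
    return [(l, r) for l, r in kept if l in acc]


P = [("S", "A"), ("S", "B"), ("A", "aB"), ("A", "bS"), ("A", "b"),
     ("B", "AB"), ("B", "BB"), ("C", "AS"), ("C", "b")]
result = remove_useless(P, set("ab"), "S")
print(result)
```

The order of the two steps matters: removing non-terminating nonterminals first can make further symbols inaccessible, which the second step then cleans up.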

▶ Useless production removal:
  (1) Apply the algorithm for terminating nonterminals.
  (2) Apply the algorithm for accessible symbols.
  ex) S → A | B
      A → aB | bS | b
      B → AB | BB
      C → AS | b
III.3 ε-Productions
▶ Definition: a production is called an ε-production if it has the form A → ε, A ∈ VN.
▶ Definition: a CFG G = (VN, VT, P, S) is ε-free if
  (1) P has no ε-production, or
  (2) there is exactly one ε-production S → ε and S does not appear on the right-hand side of any production in P.
Exercise 5.9 (1), p.185

▶ Conversion to an ε-free grammar:
  Algorithm ε-free;
  begin
    VNε := { A | A ⇒+ ε, A ∈ VN };  (* nullable nonterminals *)
    P' := P - { A → ε | A ∈ VN };
    for A → α0B1α1B2...Bkαk ∈ P', where αi ≠ ε and Bi ∈ VNε do
      if Bi → α ∈ P' then
        P' := P' ∪ { A → α0X1α1X2...Xkαk | Xi = Bi or Xi = ε }
      else
        P' := P' ∪ { A → α0X1α1X2...Xkαk | Xi = ε }
      end if
    end for
    if S ∈ VNε then P' := P' ∪ { S' → ε | S }
  end.
  ex1) A → AaA | ε
  ex2) S → aAbB
       A → aA | ε
       B → ε
p.157
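A hedged Python sketch of this conversion follows. It assumes the same string-based representation as before, with an empty string standing for ε, and the new start symbol is written `S'` regardless of the original start symbol's name. A nullable nonterminal is kept as an option only when it still has a non-ε production, otherwise it is always erased. The function names are hypothetical.

```python
from itertools import product


def nullable_nonterminals(productions):
    """Nonterminals A with A =>+ epsilon; an empty rhs stands for epsilon."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def epsilon_free(productions, start):
    nullable = nullable_nonterminals(productions)
    non_empty = [(l, r) for l, r in productions if r]
    has_rule = {l for l, _ in non_empty}  # nonterminals keeping a non-epsilon rule
    result = set()
    for lhs, rhs in non_empty:
        choices = []
        for sym in rhs:
            if sym not in nullable:
                choices.append([sym])
            elif sym in has_rule:
                choices.append([sym, ""])  # keep the symbol or erase it
            else:
                choices.append([""])       # derives only epsilon: always erase
        for combo in product(*choices):
            new_rhs = "".join(combo)
            if new_rhs:                    # never add a new epsilon-production
                result.add((lhs, new_rhs))
    if start in nullable:
        # New start symbol S' -> epsilon | S.
        result |= {("S'", ""), ("S'", start)}
    return result


ex1 = epsilon_free([("A", "AaA"), ("A", "")], "A")
ex2 = epsilon_free([("S", "aAbB"), ("A", "aA"), ("A", ""), ("B", "")], "S")
print(sorted(ex1))
print(sorted(ex2))
```

For ex2 the nonterminal B disappears entirely, because its only production was B → ε.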

III.4 Single Productions
▶ Definition: a single production is a production A → B, where A, B ∈ VN.
  Algorithm Remove_Single_Production;
  begin
    P' := P - { A → B | A, B ∈ VN };
    for each A ∈ VN do
      VNA := { B | A ⇒* B };
      for each B ∈ VNA do
        for each B → α ∈ P' do  (* not a single production *)
          P' := P' ∪ { A → α }
        end for
      end for
    end for
  end.
  ⇒ main idea: grammar substitution.
p.158
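The algorithm can be sketched in Python as below, again assuming single-character symbols and string right-hand sides; `remove_single_productions` is a hypothetical name. The sample grammar is S → aA | A, A → bA | C, C → c.

```python
def remove_single_productions(productions, nonterminals):
    def is_single(rhs):
        return len(rhs) == 1 and rhs in nonterminals

    non_single = [(l, r) for l, r in productions if not is_single(r)]
    result = set(non_single)
    for a in nonterminals:
        # reach = { B | A =>* B using single productions only }
        reach, stack = {a}, [a]
        while stack:
            x = stack.pop()
            for l, r in productions:
                if l == x and is_single(r) and r not in reach:
                    reach.add(r)
                    stack.append(r)
        # Give A every non-single production of each reachable B.
        for l, r in non_single:
            if l in reach:
                result.add((a, r))
    return result


P = [("S", "aA"), ("S", "A"), ("A", "bA"), ("A", "C"), ("C", "c")]
result = remove_single_productions(P, {"S", "A", "C"})
print(sorted(result))
```

The result still contains C → c even though C is no longer accessible; removing it is the job of the useless-production step, not of this one.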

ex) S → aA | A
    A → bA | C
    C → c
  ⇒ S → aA | bA | c
    A → bA | c
    C → c
▶ Definition:
  A CFG G = (VN, VT, P, S) is said to be cycle-free if there is no derivation of the form A ⇒+ A for any A in VN.
  G is said to be proper if it is cycle-free, is ε-free, and has no useless symbols.

III.5 Canonical Forms of Grammars
▶ Definition: a CFG G = (VN, VT, P, S) is said to be in Chomsky Normal Form (CNF) if each production in P has one of the forms
  (1) A → BC with A, B, and C in VN, or
  (2) A → a with a ∈ VT, or
  (3) if ε ∈ L(G), then S → ε is a production, and S does not appear on the right side of any production.
▶ Conversion to CNF
  A → X1X2...Xk, k > 2
  ⇒ A → X1' <X2...Xk>
     <X2...Xk> → X2' <X3...Xk>
     ...
     <Xk-1Xk> → Xk-1' Xk'
  where Xi' = Xi if Xi ∈ VN; if Xi ∈ VT, add the production Xi' → Xi.
  ex) S → bA
      A → bAA | aS | a
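The conversion step can be sketched in Python for the example grammar S → bA, A → bAA | aS | a. This is a minimal sketch that assumes the input is already ε-free and has no single productions. The naming of the new nonterminals (`b'` for a terminal proxy, `<AA>` for a remainder) follows the slide's notation, and `to_cnf` itself is a hypothetical name.

```python
def to_cnf(productions, terminals):
    """Split long right-hand sides and replace terminals inside them."""
    result, seen = [], set()

    def add(lhs, rhs):
        if (lhs, rhs) not in seen:
            seen.add((lhs, rhs))
            result.append((lhs, rhs))

    def proxy(sym):
        # X' = X for a nonterminal; for a terminal a, introduce a' -> a.
        if sym in terminals:
            add(sym + "'", (sym,))
            return sym + "'"
        return sym

    for lhs, rhs in productions:
        rhs = tuple(rhs)
        if len(rhs) == 1:  # A -> a is already in CNF
            add(lhs, rhs)
            continue
        head = lhs
        while len(rhs) > 2:
            rest = "<" + "".join(rhs[1:]) + ">"
            add(head, (proxy(rhs[0]), rest))
            head, rhs = rest, rhs[1:]
        add(head, (proxy(rhs[0]), proxy(rhs[1])))
    return result


P = [("S", "bA"), ("A", "bAA"), ("A", "aS"), ("A", "a")]
cnf = to_cnf(P, set("ab"))
for lhs, rhs in cnf:
    print(lhs, "->", " ".join(rhs))
```

Every production in `cnf` has either two nonterminals or a single terminal on its right-hand side.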

▶ Definition: a CFG G = (VN, VT, P, S) is said to be in Greibach Normal Form (GNF) if G is ε-free and each non-ε-production in P is of the form A → aα with a ∈ VT and α ∈ VN*.

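Notations for CFGs
☞ BNF (Backus-Naur Form), EBNF (Extended BNF), syntax diagrams
▶ BNF
  - A notation that specifies the syntax of a programming language using special meta symbols.
  - meta symbol: a symbol introduced in order to describe the syntax of a new language.
      • < >  : encloses a nonterminal symbol
      • ::=  : replacement (rewriting of a nonterminal symbol)
      • |    : alternative (or)
  - terminal symbol: enclosed in ' '
  - grammar symbol: VN ∪ VT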

Example 1) VN = {S, A, B}, VT = {a, b}
  P = {S → AB, A → aA, A → a, B → Bb, B → b}
  ⇒ BNF form:
      <S> ::= <A> <B>
      <A> ::= a <A> | a
      <B> ::= <B> b | b
  ⇒ BNF form with quoted terminals:
      <S> ::= <A> <B>
      <A> ::= 'a' <A> | 'a'
      <B> ::= <B> 'b' | 'b'
Example 2) Compound statement
  ⇒ BNF form:
      <compound-statement> ::= begin <statement-list> end
      <statement-list> ::= <statement>
                         | <statement-list> ; <statement>

▶ Extended BNF (EBNF)
  - Uses meta symbols with special meanings to express repeated or optional parts concisely.
  - meta symbols:
      • repetitive part: { }
      • optional part: [ ]
      • parentheses with the alternative operator: ( | )
  Example 1) <compound-statement> ::= begin <statement> {; <statement>} end
  Example 2) <if-statement> ::= if <condition> then <statement> [else <statement>]
  Example 3) <exp> ::= <exp> + <exp> | <exp> - <exp>
                     | <exp> * <exp> | <exp> / <exp>
           ⇒ <exp> ::= <exp> ( + | - | * | / ) <exp>

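▶ Syntax diagram
  - A way of drawing the syntactic structure as a diagram so that beginners can understand it easily.
  - Graphic items used in a syntax diagram:
      circle: terminal symbol
      rectangle: nonterminal symbol
      arrow: flow path
▶ How to draw a syntax diagram:
  1. terminal a: a circle labeled a
  2. nonterminal A: a rectangle labeled A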

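  3. A ::= X1X2 ... Xn
     (1) when Xi is a nonterminal: [Figure: rectangles X1, X2, ..., Xn connected in sequence]
     (2) when Xi is a terminal: [Figure: circles X1, X2, ..., Xn connected in sequence]
  4. A ::= α1 | α2 | ... | αn
     [Figure: parallel branches α1, α2, ..., αn between a common entry and a common exit]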

  5. EBNF A ::= {α}          [Figure: a loop through α]
  6. EBNF A ::= [α]          [Figure: α with a bypass path]
  7. EBNF A ::= (α1 | α2)β   [Figure: branches α1 and α2 joining before β]

(Example) A ::= a | (B)
          B ::= AC
          C ::= {+A}
  [Figure: separate syntax diagrams for A, B, and C]

  [Figure: the three diagrams merged into a single syntax diagram for A, equivalent to A ::= a | ( A {+A} )]
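A syntax diagram maps almost directly onto a recursive-descent recognizer: each rectangle becomes a procedure call, each circle a token match, and each loop a `while`. The sketch below does this for A ::= a | (B), B ::= AC, C ::= {+A}, treating every character of the input as one token. The function names are hypothetical.

```python
def recognize(text):
    """Return True if `text` is derivable from A."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_A():  # A ::= a | ( B )
        if peek() == "a":
            expect("a")
        else:
            expect("(")
            parse_B()
            expect(")")

    def parse_B():  # B ::= A C
        parse_A()
        parse_C()

    def parse_C():  # C ::= { + A }
        while peek() == "+":
            expect("+")
            parse_A()

    try:
        parse_A()
    except SyntaxError:
        return False
    return pos == len(text)  # all input must be consumed


for sample in ["a", "(a+a)", "((a)+a+(a+a))", "(a+)", "a+a"]:
    print(sample, recognize(sample))
```

Note that `a+a` is rejected: in this grammar a sum is only allowed inside parentheses.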

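Pushdown Automata
☞ PDA; context-free languages and PDA languages
V.1 PDA
▶ The recognizer for a CFG consists of a push-down list (stack), an input tape, and a finite state control.
  [Figure: input tape a1 a2 ... an read by a finite state control, which is attached to a stack Z1 Z2 ... Zn]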

Text p.172
▶ Definition: PDA P = (Q, Σ, Γ, δ, q0, Z0, F), where
    Q : a finite set of state symbols,
    Σ : a finite input alphabet,
    Γ : a finite set of stack symbols,
    δ : the mapping function Q × (Σ ∪ {ε}) × Γ → finite subsets of Q × Γ*,
    q0 ∈ Q : the start state,
    Z0 ∈ Γ : the start symbol of the stack,
    F ⊆ Q : the set of final states.
▶ δ : Q × (Σ ∪ {ε}) × Γ → Q × Γ*
    δ(q, a, Z) = {(p1, α1), (p2, α2), ..., (pn, αn)}
  Meaning: when the current state is q, the input symbol is a, and the symbol on top of the stack is Z, the next move is one of n choices. If (pi, αi) is chosen:
    (1) after reading input a in state q, the next state is pi;
    (2) the stack top symbol Z is replaced by αi.

▶ Configuration of P: (q, ω, α)
    where q : current state
          ω : the input that has not yet been read
          α : contents of the stack
▶ Move of P: ┣
    1) a ≠ ε : (q, aω, Zα) ┣ (q', ω, γα)
    2) a = ε : (q, ω, Zα) ┣ (q', ω, γα)   (ε-move)
    ※ ┣* : zero or more moves, ┣+ : one or more moves
▶ L(P): the language accepted by PDA P.
    - start configuration: (q0, ω, Z0)
    - final configuration: (q, ε, α), where q ∈ F, α ∈ Γ*
    L(P) = {ω | (q0, ω, Z0) ┣* (q, ε, α), q ∈ F, α ∈ Γ*}.

ex) P = ({q0, q1, q2}, {0, 1}, {Z, 0}, δ, q0, Z, {q0}), where
      δ(q0, 0, Z) = {(q1, 0Z)}    δ(q1, 0, 0) = {(q1, 00)}
      δ(q1, 1, 0) = {(q2, ε)}     δ(q2, 1, 0) = {(q2, ε)}
      δ(q2, ε, Z) = {(q0, ε)}
  - Recognizing the string 0011:
      (q0, 0011, Z) ┣ (q1, 011, 0Z) ┣ (q1, 11, 00Z)
                    ┣ (q2, 1, 0Z) ┣ (q2, ε, Z) ┣ (q0, ε, ε)
  - Recognizing the string 0^n 1^n (n ≥ 1):
      (q0, 0^n 1^n, Z) ┣ (q1, 0^(n-1) 1^n, 0Z) ┣^(n-1) (q1, 1^n, 0^n Z)
                       ┣ (q2, 1^(n-1), 0^(n-1) Z) ┣^(n-1) (q2, ε, Z)
                       ┣ (q0, ε, ε)
  ∴ L(P) = {0^n 1^n | n ≥ 0}. The traces above cover n ≥ 1; the empty string is accepted as well, because the start state q0 is also a final state.
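The moves of this PDA can be replayed with a small breadth-first simulator over configurations (state, remaining input, stack). This is a sketch under stated assumptions: the stack is a string whose leftmost character is the top, ε is written as an empty string, and the search is only guaranteed to terminate for PDAs, like this one, whose ε-moves cannot grow the stack forever. `accepts` is a hypothetical helper name.

```python
from collections import deque

# (state, input symbol or "", stack top) -> set of (next state, pushed string)
DELTA = {
    ("q0", "0", "Z"): {("q1", "0Z")},
    ("q1", "0", "0"): {("q1", "00")},
    ("q1", "1", "0"): {("q2", "")},
    ("q2", "1", "0"): {("q2", "")},
    ("q2", "", "Z"): {("q0", "")},
}


def accepts(word, delta=DELTA, start="q0", bottom="Z", finals=("q0",)):
    """Acceptance by final state, exploring all configurations breadth-first."""
    queue = deque([(start, word, bottom)])
    seen = set(queue)
    while queue:
        state, rest, stack = queue.popleft()
        if not rest and state in finals:
            return True
        if not stack:
            continue  # no move is defined on an empty stack
        top, below = stack[0], stack[1:]
        moves = []
        if rest:  # moves that consume one input symbol
            moves += [(p, rest[1:], g + below)
                      for p, g in delta.get((state, rest[0], top), ())]
        moves += [(p, rest, g + below)  # epsilon-moves
                  for p, g in delta.get((state, "", top), ())]
        for config in moves:
            if config not in seen:
                seen.add(config)
                queue.append(config)
    return False


for w in ["", "01", "0011", "001", "10", "0101"]:
    print(repr(w), accepts(w))
```

The empty string is accepted immediately, since the initial configuration already has no input left and q0 is a final state.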

Text p.176
▶ Extended PDA
    δ : Q × (Σ ∪ {ε}) × Γ* → Q × Γ*
  - In a single move, a string of finite length at the top of the stack is replaced by another string.
      (q, aω, αγ) ┣ (q', ω, βγ)
  - A move can occur even when the stack is empty.
  ex) PDA = ({q, p}, {a, b}, {a, b, S, Z}, δ, q, Z, {p}), where
      δ(q, a, ε) = {(q, a)}
      δ(q, b, ε) = {(q, b)}
      δ(q, ε, ε) = {(q, S)}     ※ S : center mark
      δ(q, ε, aSa) = {(q, S)}
      δ(q, ε, bSb) = {(q, S)}
      δ(q, ε, SZ) = {(p, ε)}

Text p.177
  Recognizing the string aabbaa:
      (q, aabbaa, Z) ┣ (q, abbaa, aZ)
                     ┣ (q, bbaa, aaZ)
                     ┣ (q, baa, baaZ)
                     ┣ (q, baa, SbaaZ)
                     ┣ (q, aa, bSbaaZ)
                     ┣ (q, aa, SaaZ)
                     ┣ (q, a, aSaaZ)
                     ┣ (q, a, SaZ)
                     ┣ (q, ε, aSaZ)
                     ┣ (q, ε, SZ)
                     ┣ (p, ε, ε)
  ∴ L = {ωω^R | ω ∈ {a, b}*}. The empty string is included, via δ(q, ε, ε) followed by δ(q, ε, SZ).

▶ Le(P): the set of strings accepted by a PDA that empties its stack, i.e. the strings for which the stack is empty once all input symbols have been read.
  ∴ Le(P) = {ω | (q0, ω, Z0) ┣* (q, ε, ε), q ∈ Q}.
▶ Construction of P' with Le(P') = L(P): (text p.178)
    P = (Q, Σ, Γ, δ, q0, Z0, F)
    ===> P' = (Q ∪ {qe, q'}, Σ, Γ ∪ {Z'}, δ', q', Z', ∅),
    where δ' is defined by:
      1) for all q ∈ Q, a ∈ Σ ∪ {ε}, Z ∈ Γ: δ'(q, a, Z) includes every element of δ(q, a, Z).
      2) δ'(q', ε, Z') = {(q0, Z0Z')}.   Z' : bottom marker
      3) for all q ∈ F, Z ∈ Γ ∪ {Z'}: δ'(q, ε, Z) includes (qe, ε).
      4) for all Z ∈ Γ ∪ {Z'}: δ'(qe, ε, Z) = {(qe, ε)}.

V.2 Context-Free Languages and PDA Languages
☞ A language is accepted by a PDA if and only if it is a context-free language.
    L(CFG) = L(PDA)
▶ CFG <===> PDA
  (i) CFG ===> PDA (for a given context-free grammar G, we can construct a PDA accepting L(G))
      ① Top-down method: leftmost derivation, A ⇒ α
      ② Bottom-up method: rightmost derivation in reverse, α is reduced to A
  (ii) PDA ===> CFG

Text p.179
▶ Top-down method
  ☞ Construction, from a CFG G, of a PDA R with Le(R) = L(G)
  For a given G = (VN, VT, P, S),
  construct R = ({q}, VT, VN ∪ VT, δ, q, S, ∅),
  where δ is defined by:
    1) if A → α ∈ P, then δ(q, ε, A) includes (q, α).
    2) for each a ∈ VT, δ(q, a, a) = {(q, ε)}.
  ex) G = ({E, T, F}, {a, *, +, (, )}, P, E),
      P : E → E + T | T
          T → T * F | F
          F → ( E ) | a
  ===> R = ({q}, Σ, Γ, δ, q, E, ∅), with Σ = VT and Γ = VN ∪ VT,
      where δ is:
        1) δ(q, ε, E) = {(q, E + T), (q, T)}
        2) δ(q, ε, T) = {(q, T * F), (q, F)}
        3) δ(q, ε, F) = {(q, (E)), (q, a)}
        4) δ(q, t, t) = {(q, ε)}, t ∈ {a, +, *, (, )}.

  Recognizing the string a + a * a:
      (q, a + a * a, E) ┣ (q, a + a * a, E + T)
                        ┣ (q, a + a * a, T + T)
                        ┣ (q, a + a * a, F + T)
                        ┣ (q, a + a * a, a + T)
                        ┣ (q, + a * a, + T)
                        ┣ (q, a * a, T)
                        ┣ (q, a * a, T * F)
                        ┣ (q, a * a, F * F)
                        ┣ (q, a * a, a * F)
                        ┣ (q, * a, * F)
                        ┣ (q, a, F)
                        ┣ (q, a, a)
                        ┣ (q, ε, ε)
  ※ The stack top is the left end of the third component.
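The nondeterministic choices of the top-down PDA can be explored by breadth-first search, as in the sketch below. Because the grammar is left-recursive, an unbounded search would expand E forever, so the sketch adds a pruning rule that is an assumption of this example and not part of the construction: since the grammar is ε-free, every stack symbol must eventually match at least one input symbol, and a stack longer than the remaining input can be discarded. Input is written without blanks, and `accepts_top_down` is a hypothetical name.

```python
from collections import deque

GRAMMAR = {
    "E": ["E+T", "T"],
    "T": ["T*F", "F"],
    "F": ["(E)", "a"],
}


def accepts_top_down(word, grammar=GRAMMAR, start="E"):
    """Simulate the one-state PDA R; the stack top is the leftmost character."""
    queue = deque([(word, start)])
    seen = set(queue)
    while queue:
        rest, stack = queue.popleft()
        if not rest and not stack:
            return True  # input consumed and stack empty
        if not stack or len(stack) > len(rest):
            continue     # pruning, valid only for epsilon-free grammars
        top, below = stack[0], stack[1:]
        if top in grammar:        # rule 1: expand a nonterminal
            moves = [(rest, rhs + below) for rhs in grammar[top]]
        elif rest[0] == top:      # rule 2: match a terminal
            moves = [(rest[1:], below)]
        else:
            moves = []
        for config in moves:
            if config not in seen:
                seen.add(config)
                queue.append(config)
    return False


for w in ["a+a*a", "a*(a+a)", "a+", ""]:
    print(repr(w), accepts_top_down(w))
```

A practical top-down parser replaces this search with a deterministic choice of production, which is why left recursion is normally eliminated first.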

▶ Bottom-up method
  ☞ CFG ===> extended PDA (rightmost derivation)
  ex) G = ({E, T, F}, {a, *, +, (, )}, P, E),
      P : E → E + T | T
          T → T * F | F
          F → ( E ) | a
  ===> R = ({q, r}, VT, VN ∪ VT ∪ {$}, δ, q, $, {r})
      δ : 1) δ(q, t, ε) = {(q, t)}, t ∈ {a, +, *, (, )}   ⇒ shift
          2) δ(q, ε, E + T) = {(q, E)}                     ⇒ reduce
             δ(q, ε, T) = {(q, E)}
             δ(q, ε, T * F) = {(q, T)}
             δ(q, ε, F) = {(q, T)}
             δ(q, ε, (E)) = {(q, F)}
             δ(q, ε, a) = {(q, F)}
          3) δ(q, ε, $E) = {(r, ε)}

  Recognizing the string a + a * a:
      (q, a + a * a, $) ┣ (q, + a * a, $ a)
                        ┣ (q, + a * a, $ F)
                        ┣ (q, + a * a, $ T)
                        ┣ (q, + a * a, $ E)
                        ┣ (q, a * a, $ E +)
                        ┣ (q, * a, $ E + a)
                        ┣ (q, * a, $ E + F)
                        ┣ (q, * a, $ E + T)
                        ┣ (q, a, $ E + T *)
                        ┣ (q, ε, $ E + T * a)
                        ┣ (q, ε, $ E + T * F)
                        ┣ (q, ε, $ E + T)
                        ┣ (q, ε, $ E)
                        ┣ (r, ε, ε)
  ※ The stack top is the right end of the third component.
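The shift and reduce moves of the extended PDA can be simulated in the same breadth-first style. The sketch below is illustrative: the stack is a string whose rightmost character is the top, input is written without blanks, and `accepts_bottom_up` is a hypothetical name. The search terminates for this grammar because shifts consume input and the single-symbol reductions F → T → E cannot cycle.

```python
from collections import deque

RULES = [("E", "E+T"), ("E", "T"), ("T", "T*F"),
         ("T", "F"), ("F", "(E)"), ("F", "a")]


def accepts_bottom_up(word, rules=RULES, start="E"):
    """Simulate the extended PDA R; the stack top is the rightmost character."""
    queue = deque([(word, "$")])
    seen = set(queue)
    while queue:
        rest, stack = queue.popleft()
        if not rest and stack == "$" + start:
            return True  # rule 3: (q, eps, $E) moves to the final state r
        moves = []
        if rest:  # rule 1: shift the next input symbol
            moves.append((rest[1:], stack + rest[0]))
        for lhs, rhs in rules:  # rule 2: reduce a right-hand side on top
            if stack.endswith(rhs):
                moves.append((rest, stack[:-len(rhs)] + lhs))
        for config in moves:
            if config not in seen:
                seen.add(config)
                queue.append(config)
    return False


for w in ["a+a*a", "(a)", "a+", ")a("]:
    print(repr(w), accepts_bottom_up(w))
```

An LR parser makes the same shift and reduce moves but chooses between them deterministically using a parsing table.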

Text p.183
▶ Construction, from a PDA P, of a CFG G with L(G) = Le(P)
  Given PDA P = (Q, Σ, Γ, δ, q0, Z0, F)
  ===> construct CFG G = (VN, VT, P, S), where
    (1) VN = {[qZr] | q, r ∈ Q, Z ∈ Γ} ∪ {S}.
    (2) VT = Σ.
    (3) P : ① if δ(q, a, Z) contains (r, X1...Xk) for some k ≥ 1, add
              [qZsk] → a[rX1s1][s1X2s2] ... [sk-1Xksk]
              to P for every sequence s1, s2, ..., sk ∈ Q.
            ② if δ(q, a, Z) contains (r, ε), add the production [qZr] → a to P.
            ③ for every q ∈ Q, add S → [q0Z0q] to P.
    (4) S : start symbol.
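The three rules can be generated mechanically, as in the following sketch. The sample PDA is hypothetical and chosen only for illustration: it has states p and q, stack symbols Z and X, and accepts 0^n 1^n (n ≥ 1) by empty stack. State and stack symbol names are single characters so that a triple such as [pXq] is unambiguous. An empty string stands for ε, so rule ② with a = ε yields an ε-production.

```python
from itertools import product


def pda_to_cfg(states, delta, start_state, bottom):
    """Productions of a CFG G with L(G) = Le(P), built from triples [qZr]."""
    def nt(q, z, r):
        return f"[{q}{z}{r}]"

    rules = []
    for (q, a, z), targets in delta.items():
        prefix = (a,) if a else ()
        for r, pushed in targets:
            if not pushed:  # rule 2: the move pops Z
                rules.append((nt(q, z, r), prefix))
                continue
            # rule 1: one production for every choice of s1..sk
            for s in product(states, repeat=len(pushed)):
                sources = (r,) + s[:-1]
                body = tuple(nt(f, x, t) for f, x, t in zip(sources, pushed, s))
                rules.append((nt(q, z, s[-1]), prefix + body))
    for q in states:  # rule 3: start productions
        rules.append(("S", (nt(start_state, bottom, q),)))
    return rules


STATES = ("p", "q")
DELTA = {
    ("p", "0", "Z"): [("p", "XZ")],
    ("p", "0", "X"): [("p", "XX")],
    ("p", "1", "X"): [("q", "")],
    ("q", "1", "X"): [("q", "")],
    ("q", "", "Z"): [("q", "")],
}
rules = pda_to_cfg(STATES, DELTA, "p", "Z")
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs) or "ε")
```

Many of the generated nonterminals are useless, so the result is normally cleaned up afterwards with the useless-production removal of section III.2.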

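Text p.183
▶ Conclusion: the following statements about a language L are equivalent.
  1) L is the language L(G) generated by a CFG G.
  2) L is the language L(P) accepted by a PDA P.
  3) L is the language Le(P) accepted by a PDA P.
  4) L is the language L(P) accepted by an extended PDA P.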
