1 / 17

제 4 장 어휘 분석

컴파일러 입문. 제 4 장 어휘 분석. Lexical Analysis the process by which the compiler groups certain strings of characters into individual tokens. Lexical Analyzer  Scanner  Lexer. 4.1 서 론. Token 문법적으로 의미 있는 최소 단위 Token - a single syntactic entity(terminal symbol).

zev
Download Presentation

제 4 장 어휘 분석

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 컴파일러 입문 제 4 장 어휘 분석

  2. Lexical Analysis the process by which the compiler groups certain strings of characters into individual tokens. Lexical Analyzer  Scanner  Lexer 4.1 서 론

  3. Token • 문법적으로 의미 있는 최소 단위 Token - a single syntactic entity(terminal symbol). Token Number - string 처리의 효율성 위한 integer number. Token Value - numeric value or string value. ex) if(a>10) ... Token Number : 32 7 4 25 5 8 Token Value : 0 0 ‘a’ 0 10 0

  4. Token classes • Special form - language designer 1. Keyword --- const, else, if, int, ... 2. Operator symbols --- +, -, *, /, ++, -- etc. 3. Delimiters --- ;, ,, (, ), [, ] etc. • General form - programmer 4. identifier --- stk, ptr, sum, ... 5. constant --- 526, 3.0, 0.1234e-10, ‘c’, “string” etc. • Token Structure - represented by regular expression. ex) id = (l + _)( l + d + _)*

  5. Interaction of Lexical Analyzer with Parser • Lexical Analyzer is the procedureof Syntax Analyzer. L.A. Finite Automata. S.A. Pushdown Automata. • Token type • scanner가 parser에게 넘겨주는 토큰 형태.(token number, token value) ex) if ( x > y ) x = 10 ; (32,0) (7,0) (4,x) (25,0) (4,y) (8,0) (4,x) (23,0) (5,10) (20,0)

  6. The reasons for separating the analysis phase of compiling into lexical analysis(scanning) and syntax analysis(parsing). 1. modular construction - simpler design. 2. compiler efficiency is improved. 3. compiler portability is enhanced. • Parsing table • Parser의 행동(Shift, Reduce, Accept, Error)을 결정. • Token number는 Parsing table의 index.

  7. Symbol table의 용도 • L.A와 S.A시 identifier에 관한 정보를 수집하여 저장. • Semantic analysis와 Code generation시에 사용. • name + attributes ex) Hashed symbol table • chapter 12 참조

  8. Specification of token structure - RE Specification of PL - CFG Scanner design steps 1. describe the structure of tokens inre. 2. or, directly design a transition diagram for the tokens. 3. and program a scanner according to the diagram. 4. moreover, we verify the scanner action through regular language theory. Character classification letter : a | b | c... | z | A | B | C |…| Z l digit : 0 | 1 | 2... | 9d special character : + | - | * | / | . | , | ... 4.2 토큰 인식

  9. Transition diagram Regular grammar S  lA | _A A lA | dA | _A | ε Regular expression S = lA + _A = (l + _)A A = lA + dA + _A + ε = (l + d + _)A + ε = (l + d + _)*  S = (l + _)( l + d + _)* 4.2.1 Identifier Recognition

  10. n : non-zero digit o : octal digit h : hexa digit 4.2.2 Integer number Recognition • Form : 10진수, 8진수, 16진수로 구분되어진다. 10진수 : 0이 아닌 수 시작 8진수 : 0으로 시작, 16진수 : 0x, 0X로 시작 • Transition diagram

  11. Regular grammar S  nA | 0B A  dA | ε B  oC | xD | XD | ε C  oC | ε D  hE E  hE | ε • Regular expression E = hE + ε = h*ε = h* D = hE = hh* = h+ C = oC + ε = o* B = oC + xD + XD + ε = o+ + (x + X)D = o+ + (x + X)h+ + ε A = dA + ε = d* S = nA + 0B = nd* + 0(o+ + (x + X)h+ + ε) = nd* + 0 + 0o+ + 0(x + X)h+ ∴ S = nd* + 0 + 0o+ + 0(x + X)h+

  12. 4.2.3 Real number Recognition • Form : Fixed-point number & Floating-point number • Transition diagram • Regular grammar S  dA D  dE | +F | -G A  dA | .B E  dE |ε B  dCF  dE C  dC | eD |ε G  dE

  13. Regular expression E = dE + ε = d* F = dE = dd* = d+ G = dE = dd* = d+ D = dE + '+'F + -G = dd* + '+'d+ + -d + = d+ + '+'d+ + -d+ = (ε+ '+' +-)d + C = dC + eD + ε = dC+e(ε + '+' +-)d+ + e = d*(e(ε + '+' +-) d+ + ε) B = dC=dd*(e(ε + '+' +-)d+ +ε) = d++(e(ε + '+' +-) d+ +ε) A = dA + .B = d*.d+(e(ε + '+' +-)d+ + ε) S = dA = dd*. d+(e(ε + '+' +-) d+ +ε) = d+.d+(e(ε + '+' +-) d+ + ε) = d+.d++ d+.d+e(ε + '+' +-) d+ 참고 Terminal +를 ‘+’로 표기.

  14. Form : a sequence of characters between a pair of double quotes. Transition diagram where, a = char_set - {", \} and c = char_set Regular grammar S  "A A  aA | "B | \C B ε C  cA 4.2.4 String Constant Recognition

  15. Regular expression A = aA + " B + \C = aA + " + \cA = (a + \c)A + " = (a + \c)* " S = " A = "(a + \c)*" • ∴ S ="(a + \c)*"

  16. Transition diagram where, a = char_set - {*} and b = char_set - {*, /}. Regular grammar S  /A A  *B B  aB | *C C  *C | bB | /D D ε 4.2.5 Comment Recognition

  17. Regular expression C = *C + bB + /D = **(bB + /) B = aB + ***(bB + /) = aB + ***bB + ***/ = (a + *** b)B + ***/= (a + ***b)****/ A = *B = *(a + ***b)****/  S = /A =/* (a + ***b)****/ • A program which recognizes a comment statement. do { while (ch !='*') ch = getchar(); ch = getchar(); } while (ch != '/');

More Related