Compiler construction principles implementation techniques
Sponsored Links
This presentation is the property of its rightful owner.
1 / 59

Compiler Construction Principles & Implementation Techniques PowerPoint PPT Presentation


  • 118 Views
  • Uploaded on
  • Presentation posted in: General

Compiler Construction Principles & Implementation Techniques. Dr. Ying JIN Associate Professor Oct. 2007. Role of Parsing in a Compiler. Target Code Generation. Lexical Analysis scanning. Intermediate Code Optimization. Syntax Analysis Parsing. Intermediate Code Generation.

Download Presentation

Compiler Construction Principles & Implementation Techniques

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Compiler Construction Principles & Implementation Techniques

Dr. Ying JIN

Associate Professor

Oct. 2007


Role of Parsing in a Compiler

Target Code Generation

Lexical Analysis scanning

Intermediate Code Optimization

Syntax Analysis Parsing

Intermediate Code Generation

Semantic Analysis

analysis/front end

synthesis/back end


What will be introduced

  • About Parsing

    • General information about a Parser (parsing)

      • Functional requirement (input, output, main function)

      • Process

    • General techniques in developing a scanner

      • How to define the syntax of a programming language?

        • Why not regular expressions? (not enough)

        • Context free grammar (上下文无关文法)

      • How to implement a parser with respect to its definition of syntax?

        • Top-down Parsing

        • Bottom-up Parsing


Knowledge Relation Graph

Top-down

using

Syntax definition

Context Free Grammar

implement

basing on

Bottom-up

Develop a

Parser


§ 3 Context Free Grammar & Parsing

3.1 The Parsing Process (语法分析过程)

3.2 Context-free Grammars (上下文无关文法)

3.3 Parse Trees and Abstract Syntax Tree

(语法分析树和抽象语法树)

3.4 Ambiguous (二义性)

3.5 Syntax of Sample Language (简单语言的语法)


3.1 The Parsing Process

  • Function of a Parser

    • Input : Token / TokenList

    • Output: internal representation of syntactic structure

    • Process

      • Read tokens

      • Establish syntactic structure – parse tree/syntax tree, according to the syntax definition (context free grammar);

      • Check syntactical errors


Syntactic Structures

  • Rules for describing the structure of a well-defined program

  • (1) Program

  • (2) Declaration

    • - Constant declaration

    • - Type declaration

    • - Variable Declaration

    • - Procedure/function declaration

  • (3) Body

  • (4) Statements (assignment, conditional, loop, function call)

  • (5) Expressions (arithmetic, logical, boolean)


Syntax Errors

  • Different Types of Syntax Errors

    • Following token for a syntactic structure is wrong

      (后继单词错)

    • Identifier/constant error (标识符或者常量错)

    • Keyword error(关键字错)

    • Start token for a syntactic structure is wrong(开始单词错)

    • Unbalanced parentheses(括号配对错)


Syntax Errors

后继单词错

int GetMax( int x ; int y)

{ if (x>y) {return x }

else return y;

} }

vod main()

{

real 10, y;

10 + ;

GetMax( x, y) ;

}

括号配对错

关键字错,开始单词错

标识符错

开始单词错


Dealing with Errors

  • Once an error is detected, how to deal with it?

    • Quit immediately, not practical

    • Error recovery

    • Error repair

    • Error correction

    • There is no “perfect” way to do it!


Different Types of Parsing Methods

  • Universal parsing methods for any grammars

    • Cocke-Younger-Kasami algorithm

    • Earley’s algorithm

  • Top-down parsing methods (limited grammars)- predictive

    • Recursive descendent parsing (递归下降法)

    • LL(k) -- k=1

  • Bottom-up parsing methods (limited grammars) – shift-reduce

    • SLR(k)

    • LR(k)

    • LALR(k)

    • Operator-precedence parsing (简单优先关系法)

inefficient

k=1


§ 3 Context Free Grammar & Parsing

3.1 The Parsing Process (语法分析过程)

3.2 Context-free Grammars (上下文无关文法)

3.3 Parse Trees and Abstract Syntax Tree

(语法分析树和抽象语法树)

3.4 Ambiguous (二义性)

3.5 Syntax of Sample Language (简单语言的语法)


3.2 Context-free Grammars

  • What is a Grammar?

  • Chomsky classification of Grammars;

  • Context Free Grammar (some concepts)


What is a Grammar?

  • To define syntactic structure?

  • A grammar G is a quadruple (VT,VN,S,P)

    • VT is a finite set of terminal symbols(有限的终极符集合)

    • VN是is a finite set of non-terminal symbols(有限的非终极符集合)

    • S is start symbol,S VN

    • P is a set of production rules(产生式的集合),

      each production rule has following form:

        ,where , (VTVN )*


文法的分类

  • O型文法: 也称为短语文法,其产生式具有形式:

    →,其中,(VTVN)*,并且至少含一个非终极符;

  • 1型文法: 也称为上下文有关文法。它是0型文法的特例,即要求||  || (S→除外,但S不得出现于产生式右部)

  • 2型文法: 也称为上下文无关文法。它是1型文法的特例,即要求产生式左部是一个非终极符: A→

  • 3型文法: 也称为正则文法。它是2型文法的特例,即其产生式的右部至多有两个符号,而且具有下面形式之一: A →a ,A →a B, 其中A,BVN ,aVT


文法的分类

  • 描述能力

    O型文法 > 1型文法> 2型文法> 3型文法

  • 对应自动机

    • O型文法:图灵机

    • 1型文法:线性有界自动机

    • 2型文法:下推自动机

    • 3型文法:有限自动机


Context Free Grammar (CFG)

  • 定义为四元组(VT,VN,S,P)

  • VT是有限的终极符集合

  • VN是有限的非终极符集合

  • S是开始符,S VN

  • P是产生式的集合,且具有下面的形式:

    AX1X2…Xn

    其中AVN,Xi(VTVN) ,右部可空。


Example

VT = {i, n, +, *, (, )}

P:

E  T

E  E + T

T  F

T  T * F

F  (E)

F  i

F  n

VN = {E, T, F }

S = E


  • Derivation (直接推导):

    if there is a production A, we can have A , where  represents one step derivation (用 A →一步推导). We can say that  is derived from A;

  • 的含义是,使用一条规则,代替左边的某个符号,产生右端的符号串


  •  + : represents one or more steps derivation (通过一步或多步可推导出 )

  •  *  : 表示 通过0步或多步可推导出


Example

P:

(1) E  T

(2) E  E + T

(3) T  F

(4) T  T * F

(5) F  (E)

(6) F  i

(7) F  n

  • (E)  (E+T)

  • E + (i+n)

  • E+E+T * E+i+F


  • G = (VT,VN, S, P)

  • 句型:if S * ,则称符号串为G的句型。我们用SF(G)表示文法G的所有句型的集合;

  • Sentence(句子):只包含终极符的句型被称为G的句子

  • Language(语言):

    L( G ) = { u | S + u , u  VT* }

    the set of all sentences of G;


Example

P:

(1) E  T

(2) E  E + T

(3) T  F

(4) T  T * F

(5) F  (E)

(6) F  i

(7) F  n

  • 句型: E, T, E+T, F, T*F, i, n, (E),

    …….

  • 句子: i, n, (i), (n), i+i, i+n, ……

  • 语言:{i, n, (i), (n), i+i, i+n , ……}


  • Leftmost(rightmost) derivation 最左(右)推导:

    如果进行推导时选择的是句型中的最左(右)非终极符,则称这种推导为最左(右)推导,并用符号lm(r m)表示最左(右)推导。

  • 左(右)句型:

    用最左推导方式导出的句型,称为左句型,而用最右推导方式导出的句型,称为右句型(规范句型)。

  • conclusion:

    each sentence has its rightmost or leftmost derivation(但对句型此结论不成立)


Example

P:

(1) E  T

(2) E  E + T

(3) T  F

(4) T  T * F

(5) F  (E)

(6) F  i

(7) F  n

  • 最左推导:

    i+T*n +F lm i + F*n + F

  • 最右推导:

    i+T*n +Flm i + T*n +i

  • 左句型: i + i*F

  • 右句型: E+(i)

  • 特例: i + (T*i)


Some Notes

  • CFG (VT,VN, S, P) will be used to define syntactic structure of a programming language;

  • Normally VT will be set of tokens of the programming language;

  • one terminal symbol might be one token, one token type, or one symbol representing certain structure;

  • Non-terminal symbols act as intermediate representation of certain structure;

  • Productions are rules on how to derive syntactic structure;


Example(1)

  • Arithmetic Expressions

P:

Exp  Exp + Exp

Exp  Exp – Exp

Exp  Exp * Exp

Exp  Exp / Exp

Exp  (Exp)

Exp  id

Exp  num

VT = {id, num, (, ), +, -, *, /}

VN = {Exp}

S = Exp


Example(2)

  • Program

VT = {VarDec, TypeDec, ConstDec, MainFun, FunDec }

VN = {Program, Dec, Decs}

S = Program

P: Program  Decs MainFun Decs

Dec 

Dec  VarDec Dec  ConstDec

Dec  FunDec Dec TypeDec

Decs  Dec Decs  Dec Decs


Example(3)

  • Variable Declaration

VT = {Type, id, ; , , }

VN = {VarDec, VarDef, VarDefList, idList}

S = VarDec

P: VarDec  

VarDec  VarDef ;

VarDec  VarDef; VarDec

P: 变量声明 

变量声明  一个变量定义;

变量声明  一个变量定义 ; 变量声明

变量定义 类型 标识符序列

标识符序列 一个标识符

标识符序列 一个标识符 , 标识符序列

VarDef  Type idList idList id

idList  id, idList


Extended BNF (扩展巴克斯范式)

  • Extended Notations

    • Optional |

      A   |  | 

      A   A   A  

    • Repetition * or {}

      A  A |  (left recursive) A    *

      A   A |  (right recursive) A   * 


Some algorithms on Grammar Transformation

  • 文法等价变化: L(G1) = L(G2)

  • 增补文法: the start symbol does not appear in the right part of any productions; (定理2.1)

    • Z  S


Some algorithms on Grammar Transformation

  • 文法G中除了S以外的空产生式(A)可以消除; (定理2.2)

    • 找出所有可以推导出的非终极符,记为S;

    • 对于所有产生式

      • A  X1 X2 …… Xi-1XiXi+1……Xn

      • Xi S

      • 增加A  X1 X2 …… Xi-1Xi+1……Xn

      • 直到没有新的产生式产生

      • 删除对应空产生式, 除了S


Example

SA a BB

A BB

Sa BB

S | A a BB

A  B B | a

B   |b

S a BB

A B

S a B

S A a B

SA a B

SA a

{S, B, A}

Sa B

S aB

S a

S a

S Aa


Some algorithms on Grammar Transformation

  • 消除无用产生式(定理2.3)

    • 无用产生式: A , A不出现在任何句型中;

    • How to find such non-terminal symbols?

    • 生成会出现在句型中的非终极符集合SS

      • SS = {S}

      •  Si SS,

      • Si 1 ,……, Si n

      • 把1 ,……, n中的所有非终极符加入SS中;

      • 重复直到没有新的加入;

    • 不属于SS的非终极符,它们的产生式是无用产生式,删除掉;


Example

S | A a BB

A  B B | a

B   |b

C  c

{S}

{S, A, B}

{S, A, B}

C  c是无用产生式


Some algorithms on Grammar Transformation

  • 消除特型产生式: (定理2.4)

  • 特型产生式: A  B

  • Algorithm

    • 对每个非终极符A,构造 SA = {B| A+B, BVN}

    • 如果C  SA,而且C  不是特型产生式,则增加 A ;

    • 删除特型产生式;

    • 删除无用产生式;


Example

S: {A,B}

S  a

S  b

  • S | A

  • A  B | a

  • B  b

A: {B}

A  b

  • S | a | b

  • A  a | b

  • B  b


Some algorithms on Grammar Transformation

  • 消除公共前缀(left factoring)

  • 公共前缀

    • A  1 | … | n| 1|…| m

  • 提取公因子

    • A  A’| 1|…| m

    • A’  1 | … | n


Some algorithms on Grammar Transformation

  • 消除左递归(left recursion)

    • 直接左递归:A  A1 | … | A n|  1|…| m

    • 消除方法:

      A  1A’|…| mA’

      A  1A’ | … | nA’|


Example

P:

(1) E  T

(2) E  E + T

(3) T  F

(4) T  T * F

(5) F  (E)

(6) F  i

(7) F  n

E  T E’

E’  +T E’ | 

E  E + T | T

 = + T

1 = T


Some algorithms on Grammar Transformation

  • 消除左递归(left recursion)

    • 间接左递归:

    • 消除方法:

      • Pre-conditions

      • Algorithm

S  A b

A  S a | b

1:S

2:A

A  Aba | b

A  bA’

A’  baA’ | 


§ 3 Context Free Grammar & Parsing

3.1 The Parsing Process (语法分析过程)

3.2 Context-free Grammars (上下文无关文法)

3.3 Parse Trees and Abstract Syntax Tree

(语法分析树和抽象语法树)

3.4 Ambiguous (二义性)

3.5 Syntax of Sample Language (简单语言的语法)


3.3 Parse Trees & Abstract Syntax Tree

  • Problem: Derivation(推导) is a way to construct a sentence from the start symbol;

    • Many derivation for the same sentence;

    • Not uniquely represent the structure of the sentence

  • Parse tree: one way to represent the structure of a sentence;


Example

sentence : i + i * n

P:

(1) E  T

(2) E  E + T

(3) T  F

(4) T  T * F

(5) F  (E)

(6) F  i

(7) F  n

Several derivations for it

Parse Tree


Parse Tree

  • A labeled tree for a CFG

  • The root must be labeled with the start symbol;

  • Each node has a symbol associated with it;

  • Each leaf must be labeled with a terminal symbol ;

  • For each node which is associated with a non-terminal symbol A, has n sons, from left to right they are associated with symbols B1, …, Bn, then there must be a production

    A  B1 … Bn


Abstract Syntax Tree

  • Problem with Parse tree

    • Includes much more than necessary nodes

  • Abstract Syntax Tree

    • contains only those nodes necessary for compilation

sentence : i + i * n


§ 3 Context Free Grammar & Parsing

3.1 The Parsing Process (语法分析过程)

3.2 Context-free Grammars (上下文无关文法)

3.3 Parse Trees and Abstract Syntax Tree

(语法分析树和抽象语法树)

3.4 Ambiguous (二义性)

3.5 Syntax of Sample Language (简单语言的语法)


3.4 Ambiguous Grammar

  • 二义性文法

    • For a Grammar G, if there exists one sentence which has more than one parse tree, G is called ambiguous Grammar;


Example

P:

Exp  Exp + Exp

Exp  Exp – Exp

Exp  Exp * Exp

Exp  Exp / Exp

Exp  (Exp)

Exp  id

Exp  num

id + id + id


§ 3 Context Free Grammar & Parsing

3.1 The Parsing Process (语法分析过程)

3.2 Context-free Grammars (上下文无关文法)

3.3 Parse Trees and Abstract Syntax Tree

(语法分析树和抽象语法树)

3.4 Ambiguous (二义性)

3.5 Syntax of Sample Language (简单语言的语法)


3.5 Syntax of Sample Language

CFG for ToyL Programming Language


Toy language

var

int x, y;

……

  • Syntax (structure of program)

x := a+b;

if x>0 then …

else … ;

while x<10 {…};

read(x);

write(x+y) ;

variable declaration

{

}

sequence of statements

+, -, *, /, ( , )


CFG for ToyL

VT = { id, num,

+, -, *, /, >, <, =, {, }, : , ; , ( , ),

var, if, then, else, while, read, write,

int, real, bool, ass}

VN = { Prg, VarDec, VarDefList, VarDef, Body, IdList,

Stms, Type, Stm, Exp}

S = Prg


CFG for ToyL

Program: Prg VarDec Body

Variable Declaration:

VarDec  

VarDec  var VarDefList ;

Variable Definition List:

VarDefList  VarDef;

VarDec  VarDef ; VarDec


CFG for ToyL

Variable Definition:

VarDef  Type IdList

IdList  id

IdList  id, IdList

Type  int

Type  real

Type  bool


CFG for ToyL

Body:

Body { Stms }

Stms  Stm

Stm  Stm; Stms


CFG for ToyL

Statement:

Stm  Assig | if-S | while-S | read-S | write-S

Assig  id ass Exp

If-S  if Exp then Stms else Stms

while-S  while Exp Body

read-S  read (id)

write-S  write (Exp)


CFG for ToyL

Expressions:

Exp  E | relE

relE  E > E | E < E | E=E

P:

(1) E  T

(2) E  E + T

(3) E  E - T

(4) T  F

(5) T  T * F

(6) T  T / F

(7) F  (E)

(8) F  i

(9) F  n


Homework

  • Define the syntax of following C statements with CFG

    • Assignment

    • If statement

    • While statement

    • Case statement

  • Assuming that CFGs for expressions and variables have already defined;


  • Login