Compiler construction principles implementation techniques
This presentation is the property of its rightful owner.
Sponsored Links
1 / 59

Compiler Construction Principles & Implementation Techniques PowerPoint PPT Presentation


  • 107 Views
  • Uploaded on
  • Presentation posted in: General

Compiler Construction Principles & Implementation Techniques. Dr. Ying JIN Associate Professor Oct. 2007. Role of Parsing in a Compiler. Target Code Generation. Lexical Analysis scanning. Intermediate Code Optimization. Syntax Analysis Parsing. Intermediate Code Generation.

Download Presentation

Compiler Construction Principles & Implementation Techniques

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Compiler construction principles implementation techniques

Compiler Construction Principles & Implementation Techniques

Dr. Ying JIN

Associate Professor

Oct. 2007


Role of parsing in a compiler

Role of Parsing in a Compiler

Target Code Generation

Lexical Analysis scanning

Intermediate Code Optimization

Syntax Analysis Parsing

Intermediate Code Generation

Semantic Analysis

analysis/front end

synthesis/back end


What will be introduced

What will be introduced

  • About Parsing

    • General information about a Parser (parsing)

      • Functional requirement (input, output, main function)

      • Process

    • General techniques in developing a scanner

      • How to define the syntax of a programming language?

        • Why not regular expressions? (not enough)

        • Context free grammar (上下文无关文法)

      • How to implement a parser with respect to its definition of syntax?

        • Top-down Parsing

        • Bottom-up Parsing


Knowledge relation graph

Knowledge Relation Graph

Top-down

using

Syntax definition

Context Free Grammar

implement

basing on

Bottom-up

Develop a

Parser


3 context free grammar parsing

§ 3 Context Free Grammar & Parsing

3.1 The Parsing Process (语法分析过程)

3.2 Context-free Grammars (上下文无关文法)

3.3 Parse Trees and Abstract Syntax Tree

(语法分析树和抽象语法树)

3.4 Ambiguous (二义性)

3.5 Syntax of Sample Language (简单语言的语法)


3 1 the parsing process

3.1 The Parsing Process

  • Function of a Parser

    • Input : Token / TokenList

    • Output: internal representation of syntactic structure

    • Process

      • Read tokens

      • Establish syntactic structure – parse tree/syntax tree, according to the syntax definition (context free grammar);

      • Check syntactical errors


Syntactic structures

Syntactic Structures

  • Rules for describing the structure of a well-defined program

  • (1) Program

  • (2) Declaration

    • - Constant declaration

    • - Type declaration

    • - Variable Declaration

    • - Procedure/function declaration

  • (3) Body

  • (4) Statements (assignment, conditional, loop, function call)

  • (5) Expressions (arithmetic, logical, boolean)


Syntax errors

Syntax Errors

  • Different Types of Syntax Errors

    • Following token for a syntactic structure is wrong

      (后继单词错)

    • Identifier/constant error (标识符或者常量错)

    • Keyword error(关键字错)

    • Start token for a syntactic structure is wrong(开始单词错)

    • Unbalanced parentheses(括号配对错)


Syntax errors1

Syntax Errors

后继单词错

int GetMax( int x ; int y)

{ if (x>y) {return x }

else return y;

} }

vod main()

{

real 10, y;

10 + ;

GetMax( x, y) ;

}

括号配对错

关键字错,开始单词错

标识符错

开始单词错


Dealing with errors

Dealing with Errors

  • Once an error is detected, how to deal with it?

    • Quit immediately, not practical

    • Error recovery

    • Error repair

    • Error correction

    • There is no “perfect” way to do it!


Different types of parsing methods

Different Types of Parsing Methods

  • Universal parsing methods for any grammars

    • Cocke-Younger-Kasami algorithm

    • Earley’s algorithm

  • Top-down parsing methods (limited grammars)- predictive

    • Recursive descendent parsing (递归下降法)

    • LL(k) -- k=1

  • Bottom-up parsing methods (limited grammars) – shift-reduce

    • SLR(k)

    • LR(k)

    • LALR(k)

    • Operator-precedence parsing (简单优先关系法)

inefficient

k=1


3 context free grammar parsing1

§ 3 Context Free Grammar & Parsing

3.1 The Parsing Process (语法分析过程)

3.2 Context-free Grammars (上下文无关文法)

3.3 Parse Trees and Abstract Syntax Tree

(语法分析树和抽象语法树)

3.4 Ambiguous (二义性)

3.5 Syntax of Sample Language (简单语言的语法)


3 2 context free grammars

3.2 Context-free Grammars

  • What is a Grammar?

  • Chomsky classification of Grammars;

  • Context Free Grammar (some concepts)


What is a grammar

What is a Grammar?

  • To define syntactic structure?

  • A grammar G is a quadruple (VT,VN,S,P)

    • VT is a finite set of terminal symbols(有限的终极符集合)

    • VN是is a finite set of non-terminal symbols(有限的非终极符集合)

    • S is start symbol,S VN

    • P is a set of production rules(产生式的集合),

      each production rule has following form:

        ,where , (VTVN )*


Compiler construction principles implementation techniques

文法的分类

  • O型文法: 也称为短语文法,其产生式具有形式:

    →,其中,(VTVN)*,并且至少含一个非终极符;

  • 1型文法: 也称为上下文有关文法。它是0型文法的特例,即要求||  || (S→除外,但S不得出现于产生式右部)

  • 2型文法: 也称为上下文无关文法。它是1型文法的特例,即要求产生式左部是一个非终极符: A→

  • 3型文法: 也称为正则文法。它是2型文法的特例,即其产生式的右部至多有两个符号,而且具有下面形式之一: A →a ,A →a B, 其中A,BVN ,aVT


Compiler construction principles implementation techniques

文法的分类

  • 描述能力

    O型文法 > 1型文法> 2型文法> 3型文法

  • 对应自动机

    • O型文法:图灵机

    • 1型文法:线性有界自动机

    • 2型文法:下推自动机

    • 3型文法:有限自动机


Context free grammar cfg

Context Free Grammar (CFG)

  • 定义为四元组(VT,VN,S,P)

  • VT是有限的终极符集合

  • VN是有限的非终极符集合

  • S是开始符,S VN

  • P是产生式的集合,且具有下面的形式:

    AX1X2…Xn

    其中AVN,Xi(VTVN) ,右部可空。


Example

Example

VT = {i, n, +, *, (, )}

P:

E  T

E  E + T

T  F

T  T * F

F  (E)

F  i

F  n

VN = {E, T, F }

S = E


Compiler construction principles implementation techniques

  • Derivation (直接推导):

    if there is a production A, we can have A , where  represents one step derivation (用 A →一步推导). We can say that  is derived from A;

  • 的含义是,使用一条规则,代替左边的某个符号,产生右端的符号串


Compiler construction principles implementation techniques

  •  + : represents one or more steps derivation (通过一步或多步可推导出 )

  •  *  : 表示 通过0步或多步可推导出


Example1

Example

P:

(1) E  T

(2) E  E + T

(3) T  F

(4) T  T * F

(5) F  (E)

(6) F  i

(7) F  n

  • (E)  (E+T)

  • E + (i+n)

  • E+E+T * E+i+F


Compiler construction principles implementation techniques

  • G = (VT,VN, S, P)

  • 句型:if S * ,则称符号串为G的句型。我们用SF(G)表示文法G的所有句型的集合;

  • Sentence(句子):只包含终极符的句型被称为G的句子

  • Language(语言):

    L( G ) = { u | S + u , u  VT* }

    the set of all sentences of G;


Example2

Example

P:

(1) E  T

(2) E  E + T

(3) T  F

(4) T  T * F

(5) F  (E)

(6) F  i

(7) F  n

  • 句型: E, T, E+T, F, T*F, i, n, (E),

    …….

  • 句子: i, n, (i), (n), i+i, i+n, ……

  • 语言:{i, n, (i), (n), i+i, i+n , ……}


Compiler construction principles implementation techniques

  • Leftmost(rightmost) derivation 最左(右)推导:

    如果进行推导时选择的是句型中的最左(右)非终极符,则称这种推导为最左(右)推导,并用符号lm(r m)表示最左(右)推导。

  • 左(右)句型:

    用最左推导方式导出的句型,称为左句型,而用最右推导方式导出的句型,称为右句型(规范句型)。

  • conclusion:

    each sentence has its rightmost or leftmost derivation(但对句型此结论不成立)


Example3

Example

P:

(1) E  T

(2) E  E + T

(3) T  F

(4) T  T * F

(5) F  (E)

(6) F  i

(7) F  n

  • 最左推导:

    i+T*n +F lm i + F*n + F

  • 最右推导:

    i+T*n +Flm i + T*n +i

  • 左句型: i + i*F

  • 右句型: E+(i)

  • 特例: i + (T*i)


Some notes

Some Notes

  • CFG (VT,VN, S, P) will be used to define syntactic structure of a programming language;

  • Normally VT will be set of tokens of the programming language;

  • one terminal symbol might be one token, one token type, or one symbol representing certain structure;

  • Non-terminal symbols act as intermediate representation of certain structure;

  • Productions are rules on how to derive syntactic structure;


Example 1

Example(1)

  • Arithmetic Expressions

P:

Exp  Exp + Exp

Exp  Exp – Exp

Exp  Exp * Exp

Exp  Exp / Exp

Exp  (Exp)

Exp  id

Exp  num

VT = {id, num, (, ), +, -, *, /}

VN = {Exp}

S = Exp


Example 2

Example(2)

  • Program

VT = {VarDec, TypeDec, ConstDec, MainFun, FunDec }

VN = {Program, Dec, Decs}

S = Program

P: Program  Decs MainFun Decs

Dec 

Dec  VarDec Dec  ConstDec

Dec  FunDec Dec TypeDec

Decs  Dec Decs  Dec Decs


Example 3

Example(3)

  • Variable Declaration

VT = {Type, id, ; , , }

VN = {VarDec, VarDef, VarDefList, idList}

S = VarDec

P: VarDec  

VarDec  VarDef ;

VarDec  VarDef; VarDec

P: 变量声明 

变量声明  一个变量定义;

变量声明  一个变量定义 ; 变量声明

变量定义 类型 标识符序列

标识符序列 一个标识符

标识符序列 一个标识符 , 标识符序列

VarDef  Type idList idList id

idList  id, idList


Extended bnf

Extended BNF (扩展巴克斯范式)

  • Extended Notations

    • Optional |

      A   |  | 

      A   A   A  

    • Repetition * or {}

      A  A |  (left recursive) A    *

      A   A |  (right recursive) A   * 


Some algorithms on grammar transformation

Some algorithms on Grammar Transformation

  • 文法等价变化: L(G1) = L(G2)

  • 增补文法: the start symbol does not appear in the right part of any productions; (定理2.1)

    • Z  S


Some algorithms on grammar transformation1

Some algorithms on Grammar Transformation

  • 文法G中除了S以外的空产生式(A)可以消除; (定理2.2)

    • 找出所有可以推导出的非终极符,记为S;

    • 对于所有产生式

      • A  X1 X2 …… Xi-1XiXi+1……Xn

      • Xi S

      • 增加A  X1 X2 …… Xi-1Xi+1……Xn

      • 直到没有新的产生式产生

      • 删除对应空产生式, 除了S


Example4

Example

SA a BB

A BB

Sa BB

S | A a BB

A  B B | a

B   |b

S a BB

A B

S a B

S A a B

SA a B

SA a

{S, B, A}

Sa B

S aB

S a

S a

S Aa


Some algorithms on grammar transformation2

Some algorithms on Grammar Transformation

  • 消除无用产生式(定理2.3)

    • 无用产生式: A , A不出现在任何句型中;

    • How to find such non-terminal symbols?

    • 生成会出现在句型中的非终极符集合SS

      • SS = {S}

      •  Si SS,

      • Si 1 ,……, Si n

      • 把1 ,……, n中的所有非终极符加入SS中;

      • 重复直到没有新的加入;

    • 不属于SS的非终极符,它们的产生式是无用产生式,删除掉;


Example5

Example

S | A a BB

A  B B | a

B   |b

C  c

{S}

{S, A, B}

{S, A, B}

C  c是无用产生式


Some algorithms on grammar transformation3

Some algorithms on Grammar Transformation

  • 消除特型产生式: (定理2.4)

  • 特型产生式: A  B

  • Algorithm

    • 对每个非终极符A,构造 SA = {B| A+B, BVN}

    • 如果C  SA,而且C  不是特型产生式,则增加 A ;

    • 删除特型产生式;

    • 删除无用产生式;


Example6

Example

S: {A,B}

S  a

S  b

  • S | A

  • A  B | a

  • B  b

A: {B}

A  b

  • S | a | b

  • A  a | b

  • B  b


Some algorithms on grammar transformation4

Some algorithms on Grammar Transformation

  • 消除公共前缀(left factoring)

  • 公共前缀

    • A  1 | … | n| 1|…| m

  • 提取公因子

    • A  A’| 1|…| m

    • A’  1 | … | n


Some algorithms on grammar transformation5

Some algorithms on Grammar Transformation

  • 消除左递归(left recursion)

    • 直接左递归:A  A1 | … | A n|  1|…| m

    • 消除方法:

      A  1A’|…| mA’

      A  1A’ | … | nA’|


Example7

Example

P:

(1) E  T

(2) E  E + T

(3) T  F

(4) T  T * F

(5) F  (E)

(6) F  i

(7) F  n

E  T E’

E’  +T E’ | 

E  E + T | T

 = + T

1 = T


Some algorithms on grammar transformation6

Some algorithms on Grammar Transformation

  • 消除左递归(left recursion)

    • 间接左递归:

    • 消除方法:

      • Pre-conditions

      • Algorithm

S  A b

A  S a | b

1:S

2:A

A  Aba | b

A  bA’

A’  baA’ | 


3 context free grammar parsing2

§ 3 Context Free Grammar & Parsing

3.1 The Parsing Process (语法分析过程)

3.2 Context-free Grammars (上下文无关文法)

3.3 Parse Trees and Abstract Syntax Tree

(语法分析树和抽象语法树)

3.4 Ambiguous (二义性)

3.5 Syntax of Sample Language (简单语言的语法)


3 3 parse trees abstract syntax tree

3.3 Parse Trees & Abstract Syntax Tree

  • Problem: Derivation(推导) is a way to construct a sentence from the start symbol;

    • Many derivation for the same sentence;

    • Not uniquely represent the structure of the sentence

  • Parse tree: one way to represent the structure of a sentence;


Example8

Example

sentence : i + i * n

P:

(1) E  T

(2) E  E + T

(3) T  F

(4) T  T * F

(5) F  (E)

(6) F  i

(7) F  n

Several derivations for it

Parse Tree


Parse tree

Parse Tree

  • A labeled tree for a CFG

  • The root must be labeled with the start symbol;

  • Each node has a symbol associated with it;

  • Each leaf must be labeled with a terminal symbol ;

  • For each node which is associated with a non-terminal symbol A, has n sons, from left to right they are associated with symbols B1, …, Bn, then there must be a production

    A  B1 … Bn


Abstract syntax tree

Abstract Syntax Tree

  • Problem with Parse tree

    • Includes much more than necessary nodes

  • Abstract Syntax Tree

    • contains only those nodes necessary for compilation

sentence : i + i * n


3 context free grammar parsing3

§ 3 Context Free Grammar & Parsing

3.1 The Parsing Process (语法分析过程)

3.2 Context-free Grammars (上下文无关文法)

3.3 Parse Trees and Abstract Syntax Tree

(语法分析树和抽象语法树)

3.4 Ambiguous (二义性)

3.5 Syntax of Sample Language (简单语言的语法)


3 4 ambiguous grammar

3.4 Ambiguous Grammar

  • 二义性文法

    • For a Grammar G, if there exists one sentence which has more than one parse tree, G is called ambiguous Grammar;


Example9

Example

P:

Exp  Exp + Exp

Exp  Exp – Exp

Exp  Exp * Exp

Exp  Exp / Exp

Exp  (Exp)

Exp  id

Exp  num

id + id + id


3 context free grammar parsing4

§ 3 Context Free Grammar & Parsing

3.1 The Parsing Process (语法分析过程)

3.2 Context-free Grammars (上下文无关文法)

3.3 Parse Trees and Abstract Syntax Tree

(语法分析树和抽象语法树)

3.4 Ambiguous (二义性)

3.5 Syntax of Sample Language (简单语言的语法)


3 5 syntax of sample language

3.5 Syntax of Sample Language

CFG for ToyL Programming Language


Toy language

Toy language

var

int x, y;

……

  • Syntax (structure of program)

x := a+b;

if x>0 then …

else … ;

while x<10 {…};

read(x);

write(x+y) ;

variable declaration

{

}

sequence of statements

+, -, *, /, ( , )


Cfg for toyl

CFG for ToyL

VT = { id, num,

+, -, *, /, >, <, =, {, }, : , ; , ( , ),

var, if, then, else, while, read, write,

int, real, bool, ass}

VN = { Prg, VarDec, VarDefList, VarDef, Body, IdList,

Stms, Type, Stm, Exp}

S = Prg


Cfg for toyl1

CFG for ToyL

Program: Prg VarDec Body

Variable Declaration:

VarDec  

VarDec  var VarDefList ;

Variable Definition List:

VarDefList  VarDef;

VarDec  VarDef ; VarDec


Cfg for toyl2

CFG for ToyL

Variable Definition:

VarDef  Type IdList

IdList  id

IdList  id, IdList

Type  int

Type  real

Type  bool


Cfg for toyl3

CFG for ToyL

Body:

Body { Stms }

Stms  Stm

Stm  Stm; Stms


Cfg for toyl4

CFG for ToyL

Statement:

Stm  Assig | if-S | while-S | read-S | write-S

Assig  id ass Exp

If-S  if Exp then Stms else Stms

while-S  while Exp Body

read-S  read (id)

write-S  write (Exp)


Cfg for toyl5

CFG for ToyL

Expressions:

Exp  E | relE

relE  E > E | E < E | E=E

P:

(1) E  T

(2) E  E + T

(3) E  E - T

(4) T  F

(5) T  T * F

(6) T  T / F

(7) F  (E)

(8) F  i

(9) F  n


Homework

Homework

  • Define the syntax of following C statements with CFG

    • Assignment

    • If statement

    • While statement

    • Case statement

  • Assuming that CFGs for expressions and variables have already defined;


  • Login