Compiler Structures

241-437, Semester 1, 2011-2012

  • Objective
    • describe general syntax analysis, grammars, parse trees, FIRST and FOLLOW sets

4. Syntax Analysis

Overview

1. What is a Syntax Analyzer?

2. What is a Grammar?

3. Parse Trees

4. Types of CFG Parsing

5. Syntax Analysis Sets

In this lecture

[Figure: the phases of a compiler. Front End: Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Int. Code Generator → Intermediate Code. Back End: Code Optimizer → Target Code Generator → Target Lang. Prog. This lecture covers the Syntax Analyzer.]

1. What is a Syntax Analyzer?

Source: if (a == 0) a = b;

The Lexical Analyzer turns the source into a sequence of tokens:

if  (  a  ==  0  )  a  =  b  ;

The Syntax Analyzer reads the tokens and builds a parse tree:

IF
├── EQ
│   ├── a
│   └── 0
└── ASSIGN
    ├── a
    └── b
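As a sketch of this step (not the course's implementation), the hypothetical Python function `parse_if` takes the token list above and returns the parse tree as nested tuples labelled `IF`, `EQ` and `ASSIGN`. It assumes tokens arrive as plain strings and accepts only the single statement shape `if ( x == y ) x = y ;`.

```python
def parse_if(tokens):
    """Turn the tokens of `if ( x == y ) x = y ;` into a tree of tuples.

    Hypothetical helper: it recognises only this one statement shape.
    """
    pos = 0

    def expect(kind=None):
        # Consume one token; if `kind` is given, the token must equal it.
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if kind is not None and tok != kind:
            raise SyntaxError(f"expected {kind!r}, found {tok!r}")
        pos += 1
        return tok

    expect("if")
    expect("(")
    left = expect()
    expect("==")
    right = expect()
    expect(")")
    target = expect()
    expect("=")
    value = expect()
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("extra tokens after the statement")
    return ("IF", ("EQ", left, right), ("ASSIGN", target, value))


tokens = ["if", "(", "a", "==", "0", ")", "a", "=", "b", ";"]
print(parse_if(tokens))  # ('IF', ('EQ', 'a', '0'), ('ASSIGN', 'a', 'b'))
```

A token list that breaks the expected shape, such as `["if", "a"]`, raises `SyntaxError`, which is the analyzer's way of reporting a grammatically incorrect statement.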

Syntax Analyses that we do

[Figure: the sentence "I gave Jim the card" broken into grammar types/categories. sentence → pronoun "I" (subject) + verb phrase; verb phrase → verb "gave" (action) + proper noun "Jim" (indirect object) + noun phrase (object); noun phrase → article "the" + noun "card".]

  • Identify the function of each word
  • Recognize if a sentence is grammatically correct
Languages
  • We use a natural language to communicate
    • its grammar rules are very complex
    • the rules don’t cover important things
  • We use a formal language to define a programming language
    • its grammar rules are fairly simple
    • the rules cover almost everything
2. What is a Grammar?
  • A grammar is a notation for defining a language, and is made from 4 parts:
    • the terminal symbols
    • the syntactic categories (nonterminal symbols)
      • e.g. statement, expression, noun, verb
    • the grammar rules (productions)
      • e.g. A => B1 B2 ... Bn
    • the starting nonterminal
      • the top-most syntactic category for this grammar

We define a grammar G as a 4-tuple:

G = (T, N, P, S)

    • T = terminal symbols
    • N = nonterminal symbols
    • P = productions/rules
    • S = starting nonterminal
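One way to hold G = (T, N, P, S) in a program, as a sketch only: the `Grammar` class and `check_grammar` function are hypothetical names, and storing each production as a (head, body) pair with the body as a tuple of symbols (an empty tuple for ε) is an assumption of this example. The check reports anything that stops the 4-tuple from being well formed.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    """G = (T, N, P, S); each production in P is a (head, body) pair."""
    T: frozenset
    N: frozenset
    P: tuple
    S: str


def check_grammar(g: Grammar) -> list:
    """Return a list of problems with g; an empty list means it is well formed."""
    problems = []
    if g.T & g.N:
        problems.append(f"symbols are both terminal and nonterminal: {sorted(g.T & g.N)}")
    if g.S not in g.N:
        problems.append(f"start symbol {g.S!r} is not a nonterminal")
    for head, body in g.P:
        if head not in g.N:
            problems.append(f"production head {head!r} is not a nonterminal")
        for sym in body:
            if sym not in g.T | g.N:
                problems.append(f"unknown symbol {sym!r} in {head} => {' '.join(body)}")
    return problems


# The grammar of Example 1: S => 0, S => 0 R, R => 1 S
g1 = Grammar(
    T=frozenset({"0", "1"}),
    N=frozenset({"S", "R"}),
    P=(("S", ("0",)), ("S", ("0", "R")), ("R", ("1", "S"))),
    S="S",
)
print(check_grammar(g1))  # []
```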
2.1. Example 1
  • Consider the grammar:

T = {0, 1}

N = {S, R}

P = { S => 0
      S => 0 R
      R => 1 S }

S is the starting nonterminal

The right hand sides of productions usually use a mix of terminals and nonterminals.

Is “01010” in the language?
  • Start with an S rule:

Rule          String Generated
--            S
S => 0 R      0 R
R => 1 S      0 1 S
S => 0 R      0 1 0 R
R => 1 S      0 1 0 1 S
S => 0        0 1 0 1 0

  • No more rules can be applied since there are no more nonterminals left in the string.

Yes, it is in the language.
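The derivation above was found by hand. A brute-force sketch can search for a leftmost derivation automatically; it is an illustration under stated assumptions, not a practical parser. The hypothetical `leftmost_derivation` function assumes every symbol is a single character, treats the keys of the `productions` dict as the nonterminals, and assumes every production body contains at least one terminal (true for Examples 1 and 2), so each step adds a terminal and the search always stops.

```python
from collections import deque


def leftmost_derivation(productions, start, target):
    """Breadth-first search for a leftmost derivation of `target` from `start`.

    Every symbol is one character; a symbol is a nonterminal exactly when it
    is a key of `productions` (nonterminal -> list of bodies). Assumes every
    body contains a terminal. Returns the sentential forms from `start` to
    `target`, or None if `target` is not in the language.
    """
    def leftmost_nt(form):
        # Position of the leftmost nonterminal, or len(form) if there is none.
        return next((k for k, sym in enumerate(form) if sym in productions), len(form))

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        i = leftmost_nt(form)
        if i == len(form):
            continue                      # all terminals, but not the target
        for body in productions[form[i]]:
            new = form[:i] + body + form[i + 1:]
            j = leftmost_nt(new)
            n_terminals = sum(sym not in productions for sym in new)
            # Terminals before the leftmost nonterminal can no longer change, and
            # terminals are never removed, so drop forms that cannot reach target.
            if new[:j] != target[:j] or n_terminals > len(target) or new in seen:
                continue
            seen.add(new)
            queue.append(path + [new])
    return None


example1 = {"S": ["0", "0R"], "R": ["1S"]}   # S => 0 | 0 R, R => 1 S
for form in leftmost_derivation(example1, "S", "01010"):
    print(form)   # S, 0R, 01S, 010R, 0101S, 01010 (one per line)
```

The same function, given the rules of Example 2 written as single-character strings, reproduces that example's hand derivation as well; a string outside the language, such as "011", gives `None`.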

Example 2
  • Consider the grammar:

T = {a, b, c, d, z}

N = {S, R, U, V}

P = { S => R U z | z
      R => a | b R
      U => d V U | c
      V => b | c }

S is the starting nonterminal

The notation:

X => Y | Z

is shorthand for the two rules:

X => Y
X => Z

  • Read ‘|’ as ‘or’.
Is “adbdbcz” in the language?

Rule            String Generated
--              S
S => R U z      R U z
R => a          a U z
U => d V U      a d V U z
V => b          a d b U z
U => d V U      a d b d V U z
V => b          a d b d b U z
U => c          a d b d b c z

Yes!

This grammar has choices about how to rewrite the string: a string such as R U z contains more than one nonterminal, so any of them could be rewritten next (the derivation above always rewrites the leftmost one).

Example 3: Sums

e.g. 5 + 6 - 2

  • The grammar:

T = {+, -, 0, 1, 2, 3, ..., 9}

N = {L, D}

P = { L => L + D | L - D | D
      D => 0 | 1 | 2 | ... | 9 }

L is the starting nonterminal

Example 4: Brackets
  • The grammar:

T = { '(', ')' }

N = {L}

P = { L => '(' L ')' L
      L => ε }

L is the starting nonterminal

ε means 'nothing' (the empty string).
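This grammar keeps '(' and ')' paired because each use of L => ( L ) L adds one of each. A minimal recursive-descent sketch of a recogniser for it (the name `balanced` is hypothetical; parsing techniques are covered properly in chapter 5) picks a rule by looking at the next character, and the call stack keeps track of how many brackets are still open.

```python
def balanced(s):
    """Recursive-descent recogniser for L => '(' L ')' L | ε (brackets only)."""
    pos = 0

    def parse_L():
        nonlocal pos
        if pos < len(s) and s[pos] == "(":
            # Next symbol is '(' so use L => ( L ) L
            pos += 1
            parse_L()
            if pos >= len(s) or s[pos] != ")":
                raise SyntaxError(f"missing ')' at position {pos}")
            pos += 1
            parse_L()
        # Otherwise use L => ε and consume nothing.

    try:
        parse_L()
    except SyntaxError:
        return False
    return pos == len(s)          # the whole input must be used


print(balanced("(())()"), balanced("(()"))   # True False
```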

2.2. Derivations

A sequence of the form:

w0 => w1 => … => wn

where each step rewrites one nonterminal using one production, is a derivation of wn from w0 (or w0 =>* wn).

Example:

L
=> ( L ) L      rule L => ( L ) L
=> ( ) L        rule L => ε
=> ( )          rule L => ε

L =>* ( )

This means that the sentence ( ) can be derived from L.
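The derivation above can be replayed mechanically. In this sketch the hypothetical `replay_leftmost` function applies a given list of rules one at a time, always to the leftmost nonterminal, and returns every sentential form w0, w1, ..., wn; symbols are strings and an empty body stands for ε (assumptions of the example).

```python
def replay_leftmost(start, nonterminals, rules):
    """Apply each (head, body) rule in turn to the leftmost nonterminal.

    Symbols are strings; an empty body stands for ε. Returns the sentential
    forms of the derivation as space-separated strings.
    """
    form = [start]
    forms = [start]
    for head, body in rules:
        # Find the leftmost nonterminal in the current form.
        i = next((k for k, sym in enumerate(form) if sym in nonterminals), None)
        if i is None or form[i] != head:
            raise ValueError(f"{head} is not the leftmost nonterminal of {' '.join(form)!r}")
        form = form[:i] + list(body) + form[i + 1:]
        forms.append(" ".join(form) if form else "ε")
    return forms


steps = replay_leftmost("L", {"L"}, [("L", ["(", "L", ")", "L"]), ("L", []), ("L", [])])
print(" => ".join(steps))  # L => ( L ) L => ( ) L => ( )
```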

2.3. Kinds of Grammars
  • There are 4 main kinds of grammar, of increasing expressive power:
    • regular (type 3) grammars
    • context-free (type 2) grammars
    • context-sensitive (type 1) grammars
    • unrestricted (type 0) grammars
  • They vary in the kinds of productions they allow.
Regular Grammars

S => w T
T => x T
T => a

  • Every production is of the form:

A => a | a B | ε

    • A, B are nonterminals, a is a terminal
  • These are sometimes called right linear rules because if a nonterminal appears in the rule body, then it must appear last.
  • Regular grammars are equivalent to REs: they describe exactly the same languages.
Example
  • Integer => + UInt | - UInt | 0 Digits | 1 Digits | ... | 9 Digits
  • UInt => 0 Digits | 1 Digits | ... | 9 Digits
  • Digits => 0 Digits | 1 Digits | ... | 9 Digits | ε
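One way to see the equivalence with REs: treat each nonterminal of a right-linear grammar as a state of a finite automaton, so a rule A => a B becomes a move from A to B on a, and A => ε makes A an accepting state. The sketch below (the names `TRANSITIONS`, `ACCEPTING` and `is_integer` are hypothetical) runs the Integer grammar that way and checks it against the Python RE `[+-]?[0-9]+`, which describes the same strings.

```python
import re

DIGITS = "0123456789"

# Right-linear rules as automaton moves: (state, terminal) -> next state.
TRANSITIONS = {("Integer", "+"): "UInt", ("Integer", "-"): "UInt"}
for d in DIGITS:
    TRANSITIONS[("Integer", d)] = "Digits"   # Integer => d Digits
    TRANSITIONS[("UInt", d)] = "Digits"      # UInt => d Digits
    TRANSITIONS[("Digits", d)] = "Digits"    # Digits => d Digits
ACCEPTING = {"Digits"}                       # Digits => ε


def is_integer(s):
    """Run the grammar as a finite automaton, starting at the start symbol Integer."""
    state = "Integer"
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


INTEGER_RE = re.compile(r"[+-]?[0-9]+")      # the equivalent RE

for s in ["42", "-7", "+007", "", "+", "4-2"]:
    assert is_integer(s) == bool(INTEGER_RE.fullmatch(s))
print(is_integer("-7"))  # True
```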
Context-Free Grammars (CFGs)

A => a
A => a B c d
B => a e

  • Every production is of the form:

A => δ

    • A is a nonterminal, δ can be any sequence of nonterminals and terminals (possibly empty)
  • The Syntax Analyzer uses CFGs.
2.4. REs for Syntax Analysis?
  • Why not use REs to describe the syntax of a programming language?
    • they don’t have enough power to describe nested structures
  • Examples:
    • nested blocks, if statements, balanced braces
  • We need the ability to 'count', which can be implemented with CFGs but not REs.
3. Parse Trees
  • A parse tree is a graphical way of showing how productions are used to generate a string.
  • The syntax analyzer creates a parse tree to store information about the program being compiled.
Example
  • The grammar:

T = { a, b }

N = { S }

P = { S => S S | a S b | a b | b a }

S is the starting nonterminal

Parse Tree for “aabbba”

The root of the tree is the start symbol S. At each step, expand one nonterminal leaf:

1. Expand the root S using S => S S
2. Expand the left S using S => a S b
3. Expand the S inside a S b using S => a b
4. Expand the right S using S => b a

Stop when there are no more nonterminals in leaf positions. Read off the string by reading the leaves left to right:

          S
        /   \
       S     S
     / | \   | \
    a  S  b  b  a
      / \
     a   b

The leaves, left to right, are a a b b b a.
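A parse tree can be stored as nodes with a list of children. In this sketch (the `Node` class and `node` helper are hypothetical), the tree for “aabbba” is built by hand and the string is read off from its leaves, left to right.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Return the leaf symbols from left to right."""
        if not self.children:
            return [self.symbol]
        return [leaf for child in self.children for leaf in child.leaves()]


def node(symbol, *children):
    """Build a node; plain strings among the children become leaf nodes."""
    return Node(symbol, [c if isinstance(c, Node) else Node(c) for c in children])


# S => S S, then the left S => a S b (its inner S => a b) and the right S => b a
tree = node("S",
            node("S", "a", node("S", "a", "b"), "b"),
            node("S", "b", "a"))
print("".join(tree.leaves()))  # aabbba
```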

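3.1. Ambiguity

A grammar is ambiguous if it allows two (or more) parse trees for the same string.

E => E + E
E => E - E
E => 0 | … | 9

Two parse trees for 2 - 3 + 4:

          E                        E
        / | \                    / | \
       E  +  E                  E  -  E
     / | \   |                  |   / | \
    E  -  E  4                  2  E  +  E
    |     |                        |     |
    2     3                        3     4

The left tree groups the string as (2 - 3) + 4, the right tree as 2 - (3 + 4).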

The two derivations:

E => E + E              E => E - E
  => E - E + E            => 2 - E
  => 2 - E + E            => 2 - E + E
  => 2 - 3 + E            => 2 - 3 + E
  => 2 - 3 + 4            => 2 - 3 + 4

Fixing Ambiguity
  • An ambiguous grammar can sometimes be made unambiguous:

E => E + T | E - T | T

T => 0 | … | 9

  • We'll look at some techniques in chapter 5.
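With these rules, 2 - 3 + 4 has only one parse tree, grouped as (2 - 3) + 4. A sketch that builds it, assuming single-digit T values and trees as nested tuples (left, operator, right); `parse_E` is a hypothetical name. Because E => E + T is left-recursive, the sketch replaces the recursion with a loop that keeps making the tree built so far the left child of the next operator.

```python
DIGITS = "0123456789"


def parse_E(text):
    """Build the one parse tree of `text` under E => E + T | E - T | T, T => 0 | ... | 9."""
    s = text.replace(" ", "")
    if not s or s[0] not in DIGITS:
        raise SyntaxError("expected a digit T at position 0")
    tree = s[0]                                  # E => T, T => digit
    for k in range(1, len(s), 2):
        op = s[k]
        if op not in "+-" or k + 1 >= len(s) or s[k + 1] not in DIGITS:
            raise SyntaxError(f"expected '+ T' or '- T' at position {k}")
        tree = (tree, op, s[k + 1])              # E => E op T: old tree is the left child
    return tree


print(parse_E("2 - 3 + 4"))  # (('2', '-', '3'), '+', '4')
```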
4. Types of CFG Parsing
  • Top-down (chapter 5)
    • recursive descent (predictive) parsing
    • LL methods
  • Bottom-up (chapter 6)
    • operator precedence parsing
    • LR methods
      • SLR, canonical LR, LALR
4.1. A Statement Block Grammar
  • The grammar:

T = {begin, end, simplestmt, ;}

N = {B, SS, S}

P = { B => begin SS end
      SS => S ; SS | ε
      S => simplestmt | begin SS end }

B is the starting nonterminal

Parse Tree

Rules:

B => begin SS end
SS => S ; SS
SS => ε
S => simplestmt
S => begin SS end

Parse tree for begin simplestmt ; simplestmt ; end:

B
├── begin
├── SS
│   ├── S
│   │   └── simplestmt
│   ├── ;
│   └── SS
│       ├── S
│       │   └── simplestmt
│       ├── ;
│       └── SS
│           └── ε
└── end

4.2. Top Down (LL) Parsing

Input: begin simplestmt ; simplestmt ; end

Using the statement block grammar, the parse tree is built from the root B downwards, always expanding the leftmost nonterminal. The nodes are expanded in this order:

1. B => begin SS end
2. SS => S ; SS
3. S => simplestmt
4. SS => S ; SS
5. S => simplestmt
6. SS => ε
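The numbered order above is what a recursive-descent (top-down) parser produces. A sketch for the statement block grammar, assuming the input is already a list of token strings; `top_down` and the `parse_*` helpers are hypothetical names. Each nonterminal becomes a function, and the choice for SS is made by peeking at the next token.

```python
def top_down(tokens):
    """Recursive-descent parser for the statement block grammar.

    Returns the productions in the order the parser expands them, or raises
    SyntaxError. Each nonterminal B, SS, S gets its own function.
    """
    pos = 0
    order = []

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def match(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def parse_B():
        order.append("B => begin SS end")
        match("begin")
        parse_SS()
        match("end")

    def parse_SS():
        # An S starts with simplestmt or begin; on anything else use SS => ε.
        if peek() in ("simplestmt", "begin"):
            order.append("SS => S ; SS")
            parse_S()
            match(";")
            parse_SS()
        else:
            order.append("SS => ε")

    def parse_S():
        if peek() == "simplestmt":
            order.append("S => simplestmt")
            match("simplestmt")
        else:
            order.append("S => begin SS end")
            match("begin")
            parse_SS()
            match("end")

    parse_B()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r} after the block")
    return order


for step in top_down("begin simplestmt ; simplestmt ; end".split()):
    print(step)
```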

4.3. Bottom-up (LR) Parsing

Input: begin simplestmt ; simplestmt ; end

Using the statement block grammar, the parse tree is built from the leaves upwards, starting at the left of the input. The nodes are created in this order:

1. S => simplestmt (the first simplestmt)
2. S => simplestmt (the second simplestmt)
3. SS => ε (the empty SS just before end)
4. SS => S ; SS (the inner SS)
5. SS => S ; SS (the outer SS)
6. B => begin SS end
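A real LR parser drives its shift and reduce steps from a generated table (chapter 6). As a simplified sketch only, the hypothetical `shift_reduce` function hard-codes the decisions for this one grammar; it is not a table-driven LR parser. It shifts tokens onto a stack and reduces whenever the top of the stack matches a rule body, and its list of reductions comes out in the same order as the numbered list above.

```python
def shift_reduce(tokens):
    """Hand-coded shift-reduce recogniser for the statement block grammar.

    Returns the reductions in the order they happen, or None if the input is
    rejected. The decisions below are written for this grammar only.
    """
    stack, rest, reductions = [], list(tokens), []
    while True:
        look = rest[0] if rest else "$"          # one token of lookahead
        if stack[-1:] == ["simplestmt"]:
            stack[-1:] = ["S"]
            reductions.append("S => simplestmt")
        elif stack[-3:] == ["S", ";", "SS"]:
            stack[-3:] = ["SS"]
            reductions.append("SS => S ; SS")
        elif stack[-3:] == ["begin", "SS", "end"]:
            # Only the outermost block, with no input left, becomes B.
            if len(stack) == 3 and look == "$":
                stack[-3:] = ["B"]
                reductions.append("B => begin SS end")
            else:
                stack[-3:] = ["S"]
                reductions.append("S => begin SS end")
        elif look == "end" and stack[-1:] in (["begin"], [";"]):
            # An empty statement list is only possible just before end.
            stack.append("SS")
            reductions.append("SS => ε")
        elif rest:
            stack.append(rest.pop(0))            # shift
        else:
            break                                # nothing to shift or reduce
    return reductions if stack == ["B"] else None


for r in shift_reduce("begin simplestmt ; simplestmt ; end".split()):
    print(r)
```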

5. Syntax Analysis Sets
  • Syntax analyzers for top-down (LL) and bottom-up (LR) parsing utilize two types of sets:
    • FIRST sets
    • FOLLOW sets
  • These sets are generated from the programming language CFG.
5.1. The FIRST Sets
  • FIRST( <non-terminal> ) =

set of all terminals that can begin a string derived from that non-terminal (plus ε if the non-terminal can derive ε)

  • Example:

S => ping
S => begin S end

FIRST(S) = { ping, begin }

More Mathematically
  • A is a non-terminal.
  • FIRST(A) =
    • { c | A =>* c w , c is a terminal } ∪ { ε } if A =>* ε
  • w is the rest of the terminals and nonterminals after 'c'
Building FIRST Sets
  • For each non-terminal A, FIRST(A) =

FIRST_SEQ(α) ∪ FIRST_SEQ(β) ∪ ...

for all productions A => α, A => β, ...

    • α, β are the bodies of the productions
FIRST_SEQ()
  • FIRST_SEQ(ε) = { ε }
  • FIRST_SEQ(c w) = { c }, if c is a terminal
  • FIRST_SEQ(A w)

= FIRST(A), if ε ∉ FIRST(A)

= (FIRST(A) - {ε}) ∪ FIRST_SEQ(w), if ε ∈ FIRST(A)

    • w is a sequence of terminals and non-terminals, and possibly empty
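Because FIRST sets can depend on each other, a program usually applies these rules repeatedly until no set changes (a fixed point). A sketch under these assumptions: the grammar is a dict from each nonterminal to a list of bodies, a body is a list of symbols, `[]` stands for ε, and any symbol that is not a key is a terminal; `first_sets` is a hypothetical name.

```python
EPS = "ε"


def first_sets(grammar):
    """Compute FIRST(A) for every nonterminal A by repeating the rules to a fixed point.

    grammar maps each nonterminal to a list of bodies; a body is a list of
    symbols and [] stands for ε. Symbols that are not keys are terminals.
    """
    first = {a: set() for a in grammar}

    def first_seq(body):
        # FIRST_SEQ of a sequence, following the three FIRST_SEQ rules.
        result = set()
        for sym in body:
            if sym not in grammar:            # FIRST_SEQ(c w) = {c}
                result.add(sym)
                return result
            result |= first[sym] - {EPS}      # FIRST(A) - {ε} ...
            if EPS not in first[sym]:         # ... and stop unless A can derive ε
                return result
        result.add(EPS)                       # the whole sequence can derive ε
        return result

    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            for body in bodies:
                new = first_seq(body) - first[a]
                if new:
                    first[a] |= new
                    changed = True
    return first


# The grammar of FIRST() Example 2
example2 = {
    "P": [["i"], ["c"], ["n", "T", "S"]],
    "Q": [["P"], ["a", "S"], ["b", "S", "c", "S", "T"]],
    "R": [["b"], []],
    "S": [["c"], ["R", "n"], []],
    "T": [["R", "S", "q"]],
}
for a, s in first_sets(example2).items():
    print(f"FIRST({a}) = {sorted(s)}")
```

The loop visits the rules in any order, so it does not need the hand-chosen order (P and R first, then Q, S, T) used in the examples.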
FIRST() Example 1

S => a S e
S => B
B => b B e
B => C
C => c C e
C => d

  • Start with FIRST(C), since its rules only start with terminals: FIRST(C) = {c,d}
  • Do FIRST(B) now that we know FIRST(C): FIRST(B) = {b,c,d}
  • Do FIRST(S) now that we know FIRST(B): FIRST(S) = {a,b,c,d}

FIRST() Example 2

P => i | c | n T S
Q => P | a S | b S c S T
R => b | ε
S => c | R n | ε
T => R S q

  • Start with P and R, since their rules only start with terminals or ε: FIRST(P) = {i,c,n}, FIRST(R) = {b,ε}
  • Do FIRST(Q) now that we know FIRST(P): FIRST(Q) = {i,c,n,a,b}
  • Do FIRST(S) now that we know FIRST(R): FIRST(S) = {c,b,n,ε}
    • Note: S => R n => n because R =>* ε
  • Do FIRST(T) now that we know FIRST(R) and FIRST(S): FIRST(T) = {b,c,n,q}
    • Note: T => R S q => S q => q because both R and S =>* ε

FIRST() Example 3

S => a S e | S T S
T => R S e | Q
R => r S r | ε
Q => S T | ε

FIRST(S) = {a}
FIRST(T) = {r, a, ε}
FIRST(R) = {r, ε}
FIRST(Q) = {a, ε}

Order: 1) R, S   2) Q   3) T

5.2. The FOLLOW Sets
  • FOLLOW( <non-terminal> ) =
    • set of all the terminals that can appear immediately after <non-terminal> in some string derived from the start symbol
    • the set includes $ (end of input) if <non-terminal> can appear at the end of such a string
Example:

S => bing A bong | ping A pong | zing A

A => ha

  • FOLLOW(A) = { bong, pong, $ }
More Mathematically
  • A is a non-terminal.
  • FOLLOW(A) =

{ c in terminals | S =>* . . . A c . . . } ∪ { $ } if S =>* . . . A

. . . is a sequence of terminals and non-terminals

=>* is any number of => expansions (including none)

Building FOLLOW() Sets
  • To make the FOLLOW(A) set, apply rules 1-4:
    1. for all productions (B => . . . A β), add FIRST_SEQ(β) - {ε}
    2. for all (B => . . . A β) where ε ∈ FIRST_SEQ(β), add FOLLOW(B)
    3. for all (B => . . . A), add FOLLOW(B)
    4. if A is the start symbol, then add { $ }
  • β is a sequence of terminals and non-terminals
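Like FIRST, the FOLLOW rules are usually applied over and over until no set changes. This sketch (the name `first_follow` is hypothetical) grows FIRST and FOLLOW together in one loop, using the same grammar format as an assumption: a dict from nonterminal to a list of bodies, `[]` for ε, and every non-key symbol a terminal. For rule 3, an empty β is handled by rule 2, since FIRST_SEQ of an empty sequence is {ε}.

```python
EPS, END = "ε", "$"


def first_follow(grammar, start):
    """Compute FIRST and FOLLOW for every nonterminal, repeating until nothing changes."""
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    follow[start].add(END)                            # rule 4

    def first_seq(body):
        result = set()
        for sym in body:
            if sym not in grammar:
                result.add(sym)
                return result
            result |= first[sym] - {EPS}
            if EPS not in first[sym]:
                return result
        result.add(EPS)
        return result

    changed = True
    while changed:
        changed = False
        for b, bodies in grammar.items():
            for body in bodies:
                new_first = first_seq(body) - first[b]
                if new_first:
                    first[b] |= new_first
                    changed = True
                # Every occurrence B => ... A beta of a nonterminal A in this body
                for i, a in enumerate(body):
                    if a not in grammar:
                        continue
                    beta = first_seq(body[i + 1:])    # {ε} when beta is empty
                    add = beta - {EPS}                # rule 1
                    if EPS in beta:                   # rules 2 and 3
                        add |= follow[b]
                    if not add <= follow[a]:
                        follow[a] |= add
                        changed = True
    return first, follow


# The grammar of FOLLOW() Example 3
example3 = {
    "S": [["T", "E1"]],
    "E1": [["+", "T", "E1"], []],
    "T": [["F", "T1"]],
    "T1": [["*", "F", "T1"], []],
    "F": [["(", "S", ")"], ["id"]],
}
first, follow = first_follow(example3, "S")
for a in example3:
    print(f"FOLLOW({a}) = {sorted(follow[a])}")
```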
Small Examples
  • What is in FOLLOW(A) for the productions:

B => A C

C => s

  • FOLLOW(A) gets FIRST_SEQ(C) == FIRST(C) == { s }
    • uses rule 1

What is in FOLLOW(A) for the productions:

C => B r

B => t A

  • FOLLOW(A) gets FOLLOW(B) == { r }
    • uses rule 3
FOLLOW() Example 1

S => a S e | B
B => b B C f | C
C => c C g | d | ε

(Here e is a terminal; ε is the empty string.)

FIRST(C) = {c,d,ε}
FIRST(B) = {b,c,d,ε}
FIRST(S) = {a,b,c,d,ε}

  • S is the start symbol, and S is followed by e in S => a S e: FOLLOW(S) = {$,e}
  • FOLLOW(B) = (FIRST_SEQ(C f) - {ε}) ∪ FOLLOW(S) = {c,d,f,$,e}
  • FOLLOW(C) = {f,g} ∪ FOLLOW(B) = {f,g,c,d,$,e}

FOLLOW() Example 2

S => ( A ) | ε
A => T E
E => & T E | ε
T => ( A ) | a | b | c

FIRST(T) = {(,a,b,c}
FIRST(E) = {&,ε}
FIRST(A) = {(,a,b,c}
FIRST(S) = {(,ε}

FOLLOW(S) = {$}
FOLLOW(A) = {)}
FOLLOW(E) = FOLLOW(A) ∪ FOLLOW(E) = {)}
FOLLOW(T) = (FIRST_SEQ(E) - {ε}) ∪ FOLLOW(A) ∪ FOLLOW(E) = {&,)}

FOLLOW() Example 3

S => T E1
E1 => + T E1 | ε
T => F T1
T1 => * F T1 | ε
F => ( S ) | id

FIRST(F) = FIRST(T) = FIRST(S) = {(,id}
FIRST(T1) = {*,ε}
FIRST(E1) = {+,ε}

FOLLOW(S) = {$,)}
FOLLOW(E1) = FOLLOW(S) ∪ FOLLOW(E1) = {$,)}
FOLLOW(T) = (FIRST(E1) - {ε}) ∪ FOLLOW(S) ∪ FOLLOW(E1) = {+,$,)}
FOLLOW(T1) = FOLLOW(T) = {+,$,)}
FOLLOW(F) = (FIRST(T1) - {ε}) ∪ FOLLOW(T) ∪ FOLLOW(T1) = {*,+,$,)}

FOLLOW() Example 4

S => A B C | A D
A => a | a A
B => b | c | ε
C => D a C
D => b b | c c

FIRST(D) = FIRST(C) = {b,c}
FIRST(B) = {b,c,ε}
FIRST(A) = FIRST(S) = {a}

FOLLOW(S) = {$}
FOLLOW(D) = {a,$}
FOLLOW(A) = {b,c}
FOLLOW(B) = {b,c}
FOLLOW(C) = {$}