CSC 415: Translators and Compilers Spring 2009

Chapter 4: Syntactic Analysis

Syntactic Analysis
  • Sub-phases of Syntactic Analysis
  • Grammars Revisited
  • Parsing
  • Abstract Syntax Trees
  • Scanning
  • Case Study: Syntactic Analysis in the Triangle Compiler
Structure of a Compiler
[Figure: Source code → Lexical Analyzer → tokens → Parser & Semantic Analyzer → parse tree → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code; a Symbol Table is used alongside the phases]
Syntactic Analysis
  • Main function
    • Parse source program to discover its phrase structure
    • Recursive-descent parsing
    • Constructing an AST
    • Scanning to group characters into tokens
Sub-phases of Syntactic Analysis
  • Scanning (or lexical analysis)
    • Source program transformed to a stream of tokens
      • Identifiers
      • Literals
      • Operators
      • Keywords
      • Punctuation
    • Comments and blank spaces discarded
  • Parsing
    • To determine the source program's phrase structure
    • Source program is input as a stream of tokens (from the Scanner)
    • Treats each token as a terminal symbol
  • Representation of phrase structure
    • AST
Lexical Analysis – A Simple Example

    let var y: Integer
    in !new year
    y := y+1

  • Scan the file character by character, group the characters into words and punctuation (tokens), and remove white space and comments
  • Tokens for this example:

    let   var   y   :   Integer   in   y   :=   y   +   1

Note: "!new year" does not appear in the list of tokens. In Triangle, "!" begins a comment that runs to the end of the line, and comments are removed along with white space.

Creating Tokens – Mini-Triangle Example

    let var y: Integer
    in !new year
    y := y+1

[Figure: Input → Converter → character string held in a Buffer (l e t S v a r S y : S I n t e g e r S i n . . . , where S = space) → Scanner → token stream:
  let (let), var (var), y (Ident.), : (colon), Integer (Ident.), in (in), y (Ident.), := (becomes), y (Ident.), + (op.), 1 (Intlit.), eot]
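The Scanner box in the figure can be sketched in Java. This is a minimal illustration under stated assumptions, not Triangle's actual Scanner class: the class name MiniScanner, the Kind names (loosely following the figure's labels), and the operator set are hypothetical, only the three keywords in the example are recognized, and the code needs Java 17 (records and switch expressions). Note that ":" versus ":=" already needs one character of lookahead.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniScanner {
    // Hypothetical token kinds, named after the labels in the figure.
    enum Kind { LET, VAR, IN, IDENTIFIER, INTLITERAL, OPERATOR, COLON, BECOMES, EOT }

    record Token(Kind kind, String spelling) {
        @Override public String toString() { return kind + "(" + spelling + ")"; }
    }

    private final String src;
    private int pos = 0;

    MiniScanner(String src) { this.src = src; }

    // Current character, or '\0' once the input is exhausted.
    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    // Discard blanks, tabs, newlines and '!' comments (which run to end of line).
    private void skipSeparators() {
        while (pos < src.length()) {
            char c = peek();
            if (c == '!') {
                while (pos < src.length() && peek() != '\n') pos++;
            } else if (Character.isWhitespace(c)) {
                pos++;
            } else {
                return;
            }
        }
    }

    Token scan() {
        skipSeparators();
        if (pos >= src.length()) return new Token(Kind.EOT, "");
        int start = pos;
        char c = peek();
        if (Character.isLetter(c)) {                 // identifier or keyword
            while (Character.isLetterOrDigit(peek())) pos++;
            String word = src.substring(start, pos);
            Kind kind = switch (word) {
                case "let" -> Kind.LET;
                case "var" -> Kind.VAR;
                case "in" -> Kind.IN;
                default -> Kind.IDENTIFIER;
            };
            return new Token(kind, word);
        }
        if (Character.isDigit(c)) {                  // integer literal
            while (Character.isDigit(peek())) pos++;
            return new Token(Kind.INTLITERAL, src.substring(start, pos));
        }
        if (c == ':') {                              // ':' or ':=' (one character of lookahead)
            pos++;
            if (peek() == '=') { pos++; return new Token(Kind.BECOMES, ":="); }
            return new Token(Kind.COLON, ":");
        }
        if ("+-*/<>".indexOf(c) >= 0) {              // single-character operators
            pos++;
            return new Token(Kind.OPERATOR, String.valueOf(c));
        }
        throw new IllegalStateException("unexpected character '" + c + "' at " + pos);
    }

    // Scan the whole input, ending with an EOT token.
    static List<Token> tokenize(String src) {
        MiniScanner scanner = new MiniScanner(src);
        List<Token> tokens = new ArrayList<>();
        Token t;
        do {
            t = scanner.scan();
            tokens.add(t);
        } while (t.kind() != Kind.EOT);
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("let var y: Integer\nin !new year\ny := y+1"));
    }
}
```

Running main on the example program prints [LET(let), VAR(var), IDENTIFIER(y), COLON(:), IDENTIFIER(Integer), IN(in), IDENTIFIER(y), BECOMES(:=), IDENTIFIER(y), OPERATOR(+), INTLITERAL(1), EOT()], the same eleven tokens plus eot as in the figure.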

Tokens in Triangle

    // punctuation...
    DOT = 21, ".",
    COLON = 22, ":",
    SEMICOLON = 23, ";",
    COMMA = 24, ",",
    BECOMES = 25, ":=",
    IS = 26, "~",

    // brackets...
    LPAREN = 27, "(",
    RPAREN = 28, ")",
    LBRACKET = 29, "[",
    RBRACKET = 30, "]",
    LCURLY = 31, "{",
    RCURLY = 32, "}",

    // special tokens...
    EOT = 33, "",
    ERROR = 34, "<error>",

    // literals, identifiers, operators...
    INTLITERAL = 0, "<int>",
    CHARLITERAL = 1, "<char>",
    IDENTIFIER = 2, "<identifier>",
    OPERATOR = 3, "<operator>",

    // reserved words - must be in alphabetical order...
    ARRAY = 4, "array",
    BEGIN = 5, "begin",
    CONST = 6, "const",
    DO = 7, "do",
    ELSE = 8, "else",
    END = 9, "end",
    FUNC = 10, "func",
    IF = 11, "if",
    IN = 12, "in",
    LET = 13, "let",
    OF = 14, "of",
    PROC = 15, "proc",
    RECORD = 16, "record",
    THEN = 17, "then",
    TYPE = 18, "type",
    VAR = 19, "var",
    WHILE = 20, "while",
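Keeping the reserved words in alphabetical order means a scanned word can be classified with a binary search over their spellings. The Java sketch below shows one way to exploit that ordering; it is not necessarily how Triangle's Token class does it, the class and method names are hypothetical, and the token codes are the ones listed above.

```java
import java.util.Arrays;

public class ReservedWords {
    // Reserved-word spellings in alphabetical order, matching codes ARRAY = 4 .. WHILE = 20.
    static final String[] RESERVED = {
        "array", "begin", "const", "do", "else", "end", "func", "if", "in",
        "let", "of", "proc", "record", "then", "type", "var", "while"
    };
    static final int IDENTIFIER = 2;
    static final int FIRST_RESERVED = 4;   // code of ARRAY

    // Returns the reserved-word code for 'spelling', or IDENTIFIER if it is not reserved.
    // Arrays.binarySearch is only valid because RESERVED is sorted.
    static int classify(String spelling) {
        int index = Arrays.binarySearch(RESERVED, spelling);
        return index >= 0 ? FIRST_RESERVED + index : IDENTIFIER;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"let", "while", "y", "Integer"}) {
            System.out.println(w + " -> " + classify(w));
        }
    }
}
```

Here "let" maps to 13 and "while" to 20, while "y" and "Integer" both map to IDENTIFIER (2): Integer is a predeclared type name in Triangle, not a reserved word. A linear scan that stops as soon as it passes the word's alphabetical position would use the ordering in the same way.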

Grammars Revisited
  • Context free grammars
    • Generates a set of sentences
    • Each sentence is a string of terminal symbols
    • An unambiguous sentence has a unique phrase structure embodied in its syntax tree
  • Develop parsers from context-free grammars
Regular Expressions
  • A regular expression (RE) is a convenient notation for expressing a set of strings of terminal symbols
  • Main features
    • ‘|’ separates alternatives
    • ‘*’ indicates that the previous item may be repeated zero or more times
    • ‘(’ and ‘)’ are grouping parentheses
  • ε: the empty string, a special string of length 0
Regular Expression Basics
  • Algebraic Properties
    • | is commutative and associative
      • r|s = s|r
      • r|(s|t) = (r|s)|t
    • Concatenation is associative
      • (rs)t = r(st)
    • Concatenation distributes over |
      • r(s|t) = rs|rt
      • (s|t)r = sr|tr
    • ε is the identity for concatenation
      • εr = r
      • rε = r
    • * is idempotent
      • r** = r*
    • Relation between * and ε
      • r* = (r|ε)*
Regular Expression Basics
  • Common Extensions
    • r+: one or more occurrences of r, same as rr*
    • r^k: k repetitions of r
      • r^3 = rrr
    • ~r: the characters not in the expression r
      • ~[\t\n]
    • [x-y]: a range of characters
      • [0-9a-z]
    • r?: zero or one occurrence of r (used for optional fields of an expression)
Regular Expression Example
  • Regular Expression for Representing Months
    • Examples of legal inputs
      • January represented as 1 or 01
      • October represented as 10
    • First try: (0|1|ε)[0-9], i.e., 0, 1, or ε followed by a digit between 0 and 9
      • Matches all legal inputs? Yes

        1, 2, 3, …, 10, 11, 12, 01, 02, …, 09

      • Matches any illegal inputs? Yes

        0, 00, 18

Regular Expression Example
  • Regular Expression for Representing Months
    • Examples of legal inputs
      • January represented as 1 or 01
      • October represented as 10
    • Second try: [1-9]|(0[1-9])|(1[0-2])
      • Any digit between 1 and 9, or 0 followed by any digit between 1 and 9, or 1 followed by any digit between 0 and 2
      • Matches all legal inputs? Yes

        1, 2, 3, …, 10, 11, 12, 01, 02, …, 09

      • Matches any illegal inputs? No
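Both attempts can be checked with java.util.regex. In the sketch below (class name hypothetical), the first try's (0|1|ε) is written as the optional character class [01]?, and Matcher.matches() requires the whole string to match, which is what the slides assume.

```java
import java.util.List;
import java.util.regex.Pattern;

public class MonthRegex {
    // First try: (0|1|epsilon)[0-9]; the empty alternative makes the prefix optional.
    static final Pattern FIRST_TRY = Pattern.compile("[01]?[0-9]");
    // Second try: [1-9]|(0[1-9])|(1[0-2])
    static final Pattern SECOND_TRY = Pattern.compile("[1-9]|0[1-9]|1[0-2]");

    static boolean first(String s) { return FIRST_TRY.matcher(s).matches(); }
    static boolean second(String s) { return SECOND_TRY.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : List.of("1", "01", "10", "12", "0", "00", "18", "13")) {
            System.out.printf("%-3s first=%-5b second=%b%n", s, first(s), second(s));
        }
    }
}
```

The output confirms the slides: the first try accepts the illegal inputs 0, 00 and 18, while the second try accepts only 1 to 12 and 01 to 09.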
Regular Expression Example
  • Regular Expression for Floating Point Numbers
    • Examples of legal inputs
      • 1.0, 0.2, 3.14159, -1.0, 2.7e8, 1.0E-6, -2.5e+5
      • Assume that a 0 is required before the decimal point of numbers less than 1, and that extra leading zeros are allowed, so numbers such as 0011 or 0003.14159 are legal
    • Building the regular expression
      • Assume

        digit → 0|1|2|3|4|5|6|7|8|9

      • Handle simple decimals such as 1.0, 0.2, 3.14159

        digit+.digit+   (1 or more digits, followed by ., followed by 1 or more digits)

      • Add an optional sign (only minus, no plus)

        (-|ε)digit+.digit+   or, equivalently,   -?digit+.digit+

Regular Expression Example
  • Regular Expression for Floating Point Numbers (cont.)
    • Building the regular expression (cont.)
      • Format for the exponent

        (E|e)(+|-)?(digit+)

      • Adding it as an optional expression to the decimal part

        (-|ε)digit+.digit+((E|e)(+|-)?(digit+))?
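The finished RE translates almost directly into java.util.regex syntax. The main difference is that '.' and '+' are operators in that syntax, so the literal decimal point must be escaped and the exponent sign is placed in a character class. A minimal sketch with a hypothetical class name:

```java
import java.util.regex.Pattern;

public class FloatRegex {
    // (-|epsilon)digit+.digit+((E|e)(+|-)?digit+)? in java.util.regex syntax.
    // "\\." is a literal dot; inside [+-] both characters are literals.
    static final Pattern FLOAT = Pattern.compile("-?[0-9]+\\.[0-9]+([Ee][+-]?[0-9]+)?");

    static boolean isFloat(String s) { return FLOAT.matcher(s).matches(); }

    public static void main(String[] args) {
        String[] samples = {"1.0", "0.2", "3.14159", "-1.0", "2.7e8", "1.0E-6",
                            "-2.5e+5", "0003.14159", ".5", "1.", "+1.0", "2.7e"};
        for (String s : samples) {
            System.out.println(s + " -> " + isFloat(s));
        }
    }
}
```

All seven legal examples from the slide match; ".5" (no leading 0), "1." (no fraction digits), "+1.0" (plus sign not allowed) and "2.7e" (exponent without digits) are rejected.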

Extended BNF
  • Extended BNF (EBNF)
    • Combination of BNF and REs
    • N ::= X, where N is a nonterminal symbol and X is an extended RE, i.e., an RE constructed from both terminal and nonterminal symbols
    • EBNF
      • Right-hand side may use |, *, (, )
      • Right-hand side may contain both terminal and nonterminal symbols
Example EBNF

    Expression ::= primary-Expression (Operator primary-Expression)*
    primary-Expression ::= Identifier
                         | ( Expression )
    Identifier ::= a|b|c|d|e
    Operator ::= +|-|*|/

Generates, for example:

    e
    a + b
    a - b - c
    a + (b * c)
    a + (b + c) / d
    a - (b - (c - (d - e)))

Grammar Transformations
  • Left Factorization

    XY | XZ is equivalent to X(Y | Z)

    single-Command ::= V-name := Expression
                     | if Expression then single-Command
                     | if Expression then single-Command else single-Command

    is left-factorized to

    single-Command ::= V-name := Expression
                     | if Expression then single-Command (ε | else single-Command)
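After left factorization, a parser no longer has to choose between the two if alternatives before reading them: it parses the common prefix once and then looks at a single token to decide whether the optional else part is present. A Java sketch under simplifying assumptions: the class and method names are hypothetical, V-name and Expression are reduced to single identifier tokens, and tokens are separated by blanks.

```java
import java.util.List;
import java.util.Set;

public class LeftFactored {
    private static final Set<String> RESERVED = Set.of("if", "then", "else", ":=");
    private final List<String> tokens;
    private int pos = 0;

    private LeftFactored(List<String> tokens) { this.tokens = tokens; }

    private String current() { return pos < tokens.size() ? tokens.get(pos) : "<eot>"; }

    private void accept(String expected) {
        if (!current().equals(expected))
            throw new IllegalStateException("expected " + expected + " but found " + current());
        pos++;
    }

    // Simplification: V-name and Expression are each a single identifier token.
    private String parseIdentifier() {
        String id = current();
        if (id.equals("<eot>") || RESERVED.contains(id))
            throw new IllegalStateException("expected an identifier but found " + id);
        pos++;
        return id;
    }

    // single-Command ::= V-name := Expression
    //                  | if Expression then single-Command (epsilon | else single-Command)
    private String parseSingleCommand() {
        if (current().equals("if")) {
            accept("if");
            String condition = parseIdentifier();
            accept("then");
            String thenPart = parseSingleCommand();
            // The factored tail: one token of lookahead picks epsilon or "else ...".
            if (current().equals("else")) {
                accept("else");
                return "If(" + condition + ", " + thenPart + ", " + parseSingleCommand() + ")";
            }
            return "If(" + condition + ", " + thenPart + ")";
        }
        String target = parseIdentifier();
        accept(":=");
        return "Assign(" + target + ", " + parseIdentifier() + ")";
    }

    // Parses a whole blank-separated command and returns a bracketed form of its structure.
    static String parse(String src) {
        LeftFactored p = new LeftFactored(List.of(src.trim().split("\\s+")));
        String result = p.parseSingleCommand();
        if (p.pos != p.tokens.size()) throw new IllegalStateException("unexpected " + p.current());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("if a then x := y"));
        System.out.println(parse("if a then if b then x := y else x := z"));
    }
}
```

The second call returns If(a, If(b, Assign(x, y), Assign(x, z))): because the inner call sees the else first, this way of coding the factored rule attaches a dangling else to the nearest if.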

Grammar Transformations
  • Elimination of left recursion

    N ::= X | N Y is equivalent to N ::= X (Y)*

    Identifier ::= Letter
                 | Identifier Letter
                 | Identifier Digit

    Identifier ::= Letter
                 | Identifier (Letter | Digit)

    Identifier ::= Letter (Letter | Digit)*
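The transformation matters for recursive descent: a method written straight from Identifier ::= Identifier Letter would call itself before consuming any input and never terminate, whereas the EBNF form becomes a simple loop. A minimal Java sketch that works directly on characters (hypothetical names; Character.isLetter also accepts non-ASCII letters, which a real scanner might restrict):

```java
public class IdentifierParser {
    // Identifier ::= Letter (Letter | Digit)*
    // Returns the index just past the identifier that starts at 'pos'.
    static int parseIdentifier(String s, int pos) {
        if (pos >= s.length() || !Character.isLetter(s.charAt(pos)))
            throw new IllegalArgumentException("expected a letter at position " + pos);
        pos++;                                              // Letter
        while (pos < s.length() && Character.isLetterOrDigit(s.charAt(pos)))
            pos++;                                          // (Letter | Digit)*
        return pos;
    }

    // True if the whole string is one Identifier.
    static boolean isIdentifier(String s) {
        try {
            return parseIdentifier(s, 0) == s.length();
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"y", "x1y2", "1x", "a_b", ""}) {
            System.out.println("\"" + s + "\" -> " + isIdentifier(s));
        }
    }
}
```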

Grammar Transformations
  • Substitution of nonterminal symbols

    Given N ::= X, we can substitute each occurrence of N with X
    iff N ::= X is nonrecursive and is the only production rule for N

    single-Command ::= for Control-Variable := Expression To-or-Downto
                           Expression do single-Command
                     | …
    Control-Variable ::= Identifier
    To-or-Downto ::= to
                   | downto

    becomes

    single-Command ::= for Identifier := Expression (to|downto)
                           Expression do single-Command
                     | …

Starter Sets
  • Starter set of an RE X
    • starters[[X]]
    • Set of terminal symbols that can start a string generated by X
  • Examples
    • starters[[his | her | its]] = {h, i}
    • starters[[(re)* set]] = {r, s}
Starter Sets
  • Precise and complete definition of starters:

    starters[[ε]] = {}
    starters[[t]] = {t}   where t is a terminal symbol
    starters[[X Y]] = starters[[X]] ∪ starters[[Y]]   if X generates ε
    starters[[X Y]] = starters[[X]]   if X does not generate ε
    starters[[X | Y]] = starters[[X]] ∪ starters[[Y]]
    starters[[X*]] = starters[[X]]

  • To generalize to the starter set of an extended RE, add
    • starters[[N]] = starters[[X]], where N is a nonterminal symbol defined by the production rule N ::= X
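The definition is directly recursive, so it can be transcribed almost rule for rule. The Java 17 sketch below (all type and method names hypothetical) represents an extended RE as a small tree, computes "generates ε" alongside starters, and applies both to the Expression grammar from the EBNF example. It assumes each nonterminal has exactly one rule and that the grammar has no left recursion; otherwise the recursion through nonterminals would not terminate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class Starters {
    interface RE {}
    record Eps() implements RE {}                    // the empty string
    record T(String symbol) implements RE {}         // terminal symbol t
    record N(String name) implements RE {}           // nonterminal symbol N
    record Seq(RE first, RE second) implements RE {} // X Y
    record Alt(RE left, RE right) implements RE {}   // X | Y
    record Star(RE body) implements RE {}            // X*

    private final Map<String, RE> rules;             // N ::= X, one rule per nonterminal

    Starters(Map<String, RE> rules) { this.rules = rules; }

    // Does X generate the empty string?
    boolean generatesEmpty(RE x) {
        if (x instanceof Eps || x instanceof Star) return true;
        if (x instanceof T) return false;
        if (x instanceof N n) return generatesEmpty(rules.get(n.name()));
        if (x instanceof Seq s) return generatesEmpty(s.first()) && generatesEmpty(s.second());
        Alt a = (Alt) x;
        return generatesEmpty(a.left()) || generatesEmpty(a.right());
    }

    // starters[[X]], following the definition rule by rule.
    Set<String> starters(RE x) {
        Set<String> result = new TreeSet<>();
        if (x instanceof T t) {
            result.add(t.symbol());
        } else if (x instanceof N n) {
            result.addAll(starters(rules.get(n.name())));
        } else if (x instanceof Seq s) {
            result.addAll(starters(s.first()));
            if (generatesEmpty(s.first())) result.addAll(starters(s.second()));
        } else if (x instanceof Alt a) {
            result.addAll(starters(a.left()));
            result.addAll(starters(a.right()));
        } else if (x instanceof Star st) {
            result.addAll(starters(st.body()));
        }                                            // Eps contributes the empty set
        return result;
    }

    // t1 | t2 | ... for single terminal symbols.
    static RE oneOf(String... symbols) {
        RE re = new T(symbols[0]);
        for (int i = 1; i < symbols.length; i++) re = new Alt(re, new T(symbols[i]));
        return re;
    }

    static Starters expressionGrammar() {
        Map<String, RE> g = new HashMap<>();
        g.put("Expression", new Seq(new N("primary-Expression"),
                new Star(new Seq(new N("Operator"), new N("primary-Expression")))));
        g.put("primary-Expression", new Alt(new N("Identifier"),
                new Seq(new T("("), new Seq(new N("Expression"), new T(")")))));
        g.put("Identifier", oneOf("a", "b", "c", "d", "e"));
        g.put("Operator", oneOf("+", "-", "*", "/"));
        return new Starters(g);
    }

    public static void main(String[] args) {
        Starters s = expressionGrammar();
        System.out.println(s.starters(new N("Expression")));
        RE reSet = new Seq(new Star(new Seq(new T("r"), new T("e"))),
                           new Seq(new T("s"), new Seq(new T("e"), new T("t"))));
        System.out.println(s.starters(reSet));
    }
}
```

main prints [(, a, b, c, d, e] for Expression and [r, s] for (re)* set, in agreement with the worked examples.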
Example Starter Set

    Expression ::= primary-Expression (Operator primary-Expression)*
    primary-Expression ::= Identifier
                         | ( Expression )
    Identifier ::= a|b|c|d|e
    Operator ::= +|-|*|/

    starters[[Expression]] = starters[[primary-Expression (Operator primary-Expression)*]]
                           = starters[[primary-Expression]]   (primary-Expression does not generate ε)
                           = starters[[Identifier]] ∪ starters[[( Expression )]]
                           = starters[[a | b | c | d | e]] ∪ { ( }
                           = {a, b, c, d, e, (}

Scanning (Lexical Analysis)
  • The purpose of scanning is to recognize tokens in the source program, i.e., to group input characters (the source program text) into tokens.
  • Difference between parsing and scanning:
    • Parsing groups terminal symbols, which are tokens, into larger phrases such as expressions and commands and analyzes the tokens for correctness and structure
    • Scanning groups individual characters into tokens
What Does a Scanner Do?
  • Handle keywords (reserved words)
    • Recognize identifiers and keywords
    • Option 1: match explicitly
      • Write a regular expression for each keyword
      • An identifier is any alphanumeric string that is not a keyword
    • Option 2: match as an identifier, then perform a lookup
      • No special regular expressions for keywords
      • When an identifier is found, look it up in a preloaded keyword table

How does Triangle handle keywords?

Discuss in terms of efficiency and ease of coding.

What Does a Scanner Do?
  • Remove white space
    • Tabs, spaces, new lines
  • Remove comments
    • Single line

      -- Ada comment

    • Multi-line, with start and end delimiters

      { Pascal comment }

      /* C comment */

    • Nested
    • Runaway comments
      • Unterminated comments can’t be detected until the end of the file
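Nested comments are where a scanner needs more than a regular expression: it has to count delimiters. The Java sketch below assumes /* and */ as the delimiters and, unlike C (whose comments do not nest), allows nesting; the class and method names are hypothetical. Reaching the end of the input while the nesting depth is still positive is exactly the runaway-comment case.

```java
public class CommentSkipper {
    // 'pos' must point at an opening delimiter "/*". Skips the comment, allowing
    // nested comments, and returns the index just past the matching closing delimiter.
    static int skipNestedComment(String src, int pos) {
        int depth = 0;
        int i = pos;
        while (i < src.length()) {
            if (src.startsWith("/*", i)) {
                depth++;
                i += 2;
            } else if (src.startsWith("*/", i)) {
                depth--;
                i += 2;
                if (depth == 0) return i;      // outermost comment closed
            } else {
                i++;
            }
        }
        // End of input reached while still inside a comment: a runaway comment.
        throw new IllegalStateException("unterminated comment starting at " + pos);
    }

    public static void main(String[] args) {
        String src = "/* outer /* inner */ still outer */x := 1";
        int end = skipNestedComment(src, 0);
        System.out.println("scanning resumes at: " + src.substring(end));
        try {
            skipNestedComment("/* never closed", 0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```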
What Does a Scanner Do?
  • Perform lookahead
    • Multi-character tokens

      1..10 vs. 1.10
      &, &&
      <, <=
      etc.

  • Challenging input languages
    • FORTRAN
      • Keywords are not reserved
      • Blanks are not delimiters
      • Example (comma vs. decimal point)

        DO10I=1,5   start of a DO loop (equivalent to a C for loop)
        DO10I=1.5   an assignment statement, assigning 1.5 to the variable DO10I
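The lookahead examples can be made concrete with a longest-match ("maximal munch") tokenizer. In the Java sketch below (hypothetical names; it handles only integers, reals, identifiers, .., <, <=, & and &&), deciding whether the dot in 1. belongs to a real literal requires peeking two characters ahead: the dot is part of the number only if a digit follows it.

```java
import java.util.ArrayList;
import java.util.List;

public class Lookahead {
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                // "1.10" is one real literal, but in "1..10" the first '.' starts the
                // '..' token: the '.' is part of the number only if a digit follows it.
                if (i + 1 < src.length() && src.charAt(i) == '.'
                        && Character.isDigit(src.charAt(i + 1))) {
                    i++;
                    while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                }
            } else if (Character.isLetter(c)) {
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
            } else if (src.startsWith("..", i) || src.startsWith("<=", i) || src.startsWith("&&", i)) {
                i += 2;                         // prefer the longer token
            } else {
                i++;                            // single-character token: '<', '&', '.', ...
            }
            tokens.add(src.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1..10"));
        System.out.println(tokenize("1.10"));
        System.out.println(tokenize("a<=b && c<d"));
    }
}
```

The three calls yield [1, .., 10], [1.10] and [a, <=, b, &&, c, <, d].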

What Does a Scanner Do?
  • Challenging input languages (cont.)
    • PL/I: keywords are not reserved

IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;

What Does a Scanner Do?
  • Error handling
    • An error token is passed to the parser, which reports the error
    • Recovery
      • Delete the characters of the current token that have been read so far, and restart scanning at the next unread character
      • Delete the first character of the current lexeme, and resume scanning from the next character
    • Examples of lexical errors:
      • 3.25e: bad format for a constant
      • Var#1: illegal character
    • Some errors that are not lexical errors
      • Mistyped keywords
        • Begim
      • Mismatched parentheses
      • Undeclared variables
Scanner Implementation
  • Issues
    • Simpler design – parser doesn’t have to worry about white space, etc.
    • Improve compiler efficiency – allows the construction of a specialized and potentially more efficient processor
    • Compiler portability is enhanced – input alphabet peculiarities and other device-specific anomalies can be restricted to the scanner
Scanner Implementation
  • What are the keywords in Triangle?
  • How are keywords and identifiers implemented in Triangle?
  • Is lookahead implemented in Triangle?
    • If so, how?
Structure of a Compiler
[Figure: Source code → Lexical Analyzer → tokens → Parser → parse tree → Semantic Analyzer → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code; a Symbol Table is used alongside the phases]

Parsing
  • Given an unambiguous, context-free grammar, parsing is
    • Recognition of an input string, i.e., deciding whether or not the input string is a sentence of the grammar
    • Parsing of an input string, i.e., recognition of the input string plus determination of its phrase structure. The phrase structure can be represented by a syntax tree, or otherwise.

The grammar must be unambiguous so that every sentence of the grammar has exactly one syntax tree.

Parsing
  • The syntax of programming language constructs is described by context-free grammars.
  • Advantages of unambiguous, context-free grammars
    • A precise, yet easy-to-understand, syntactic specification of the programming language
    • For certain classes of grammars, an efficient parser that determines whether a source program is syntactically well formed can be constructed automatically.
    • They impart a structure to a programming language that is useful for translating source programs into correct object code and for detecting errors.
    • It is easier to add new constructs to the language if the implementation is based on a grammatical description of the language
Parsing
[Figure: sequence of tokens → parser → syntax tree]
  • Check the syntax (structure) of a program and create a tree representation of the program
  • Programming languages have non-regular constructs
    • Nesting
    • Recursion
  • Context-free grammars are used to express the syntax for programming languages
Context-Free Grammars
  • Consists of
    • A set of tokens or terminal symbols
    • A set of nonterminal symbols
    • A set of rules or productions, which express the legal relationships between symbols
    • A start or goal symbol
  • Example:
    • expr → expr - digit
    • expr → expr + digit
    • expr → digit
    • digit → 0|1|2|…|9
  • Tokens: -, +, 0, 1, 2, …, 9
  • Nonterminals: expr, digit
  • Start symbol: expr
Context-Free Grammars
  • expr → expr - digit
  • expr → expr + digit
  • expr → digit
  • digit → 0|1|2|…|9

Example input:

3 + 8 - 2

[Figure: parse tree for 3 + 8 - 2: expr(expr(expr(digit(3)) + digit(8)) - digit(2))]

Checking for Correct Syntax
  • Given a grammar for a language and a program, how do you know whether the syntax of the program is legal?
  • A legal program can be derived from the start symbol of the grammar

The grammar must be unambiguous and context-free

Deriving a String
  • The derivation begins with the start symbol
  • At each step of a derivation, the right-hand side of a grammar rule is used to replace a nonterminal symbol
  • Continue replacing nonterminals until only terminal symbols remain

  • Rule 1: expr → expr - digit
  • Rule 2: expr → expr + digit
  • Rule 3: expr → digit
  • Rule 4: digit → 0|1|2|…|9

Example input:

3 + 8 - 2

    expr ⇒ expr - digit         (Rule 1)
         ⇒ expr - 2             (Rule 4)
         ⇒ expr + digit - 2     (Rule 2)
         ⇒ expr + 8 - 2         (Rule 4)
         ⇒ digit + 8 - 2        (Rule 3)
         ⇒ 3 + 8 - 2            (Rule 4)

Rightmost Derivation
  • The rightmost nonterminal is replaced in each step

  • Rule 1: expr → expr - digit
  • Rule 2: expr → expr + digit
  • Rule 3: expr → digit
  • Rule 4: digit → 0|1|2|…|9

Example input:

3 + 8 - 2

    expr ⇒ expr - digit         (Rule 1)
         ⇒ expr - 2             (Rule 4)
         ⇒ expr + digit - 2     (Rule 2)
         ⇒ expr + 8 - 2         (Rule 4)
         ⇒ digit + 8 - 2        (Rule 3)
         ⇒ 3 + 8 - 2            (Rule 4)

Leftmost Derivation
  • The leftmost nonterminal is replaced in each step

  • Rule 1: expr → expr - digit
  • Rule 2: expr → expr + digit
  • Rule 3: expr → digit
  • Rule 4: digit → 0|1|2|…|9

Example input:

3 + 8 - 2

    expr ⇒ expr - digit             (Rule 1)
         ⇒ expr + digit - digit     (Rule 2)
         ⇒ digit + digit - digit    (Rule 3)
         ⇒ 3 + digit - digit        (Rule 4)
         ⇒ 3 + 8 - digit            (Rule 4)
         ⇒ 3 + 8 - 2                (Rule 4)

Leftmost Derivation
  • The leftmost nonterminal is replaced in each step

[Figure: the leftmost derivation of 3 + 8 - 2 shown alongside the syntax tree it builds, expr(expr(expr(digit(3)) + digit(8)) - digit(2)), with numbers marking the derivation step at which each part of the tree is created]

Bottom-Up Parsing
  • Parser examines terminal symbols of the input string, in order from left to right
  • Reconstructs the syntax tree from the bottom (terminal nodes) up (toward the root node)
  • Bottom-up parsing reduces a string w to the start symbol of the grammar.
    • At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.
Bottom-Up Parsing
  • Types of bottom-up parsing algorithms
    • Shift-reduce parsing
      • Input symbols are shifted onto a stack; whenever the top of the stack matches the right side of a production, that sub-string is reduced to the symbol on the left of the production. If the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.
    • LR(k) parsing
      • The L is for left-to-right scanning of the input, the R is for constructing a rightmost derivation in reverse, and the k is for the number of input symbols of lookahead that are used in making parsing decisions.
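A shift-reduce recognizer for the expr grammar used in this chapter fits in a few dozen lines of Java. This is a hand-written sketch, not an LR(k) parser: instead of a generated parse table it picks handles with rules that happen to suffice for this grammar (in particular, digit is reduced to expr only when it is the only symbol on the stack). The class and method names are hypothetical, and each input character is treated as one token.

```java
import java.util.ArrayList;
import java.util.List;

public class ShiftReduce {
    // Grammar: expr -> expr - digit | expr + digit | digit ;  digit -> 0 | 1 | ... | 9
    // Returns a trace of the stack contents after every shift and reduce step;
    // throws if the input cannot be reduced to the start symbol expr.
    static List<String> parse(String input) {
        String in = input.replace(" ", "");
        List<String> stack = new ArrayList<>();
        List<String> trace = new ArrayList<>();
        int pos = 0;
        while (true) {
            if (reduce(stack)) {                       // reduce whenever a handle is on top
                trace.add("reduce -> " + String.join(" ", stack));
            } else if (pos < in.length()) {            // otherwise shift the next symbol
                stack.add(String.valueOf(in.charAt(pos++)));
                trace.add("shift  -> " + String.join(" ", stack));
            } else {
                break;                                 // nothing to reduce, nothing left to shift
            }
        }
        if (stack.size() != 1 || !stack.get(0).equals("expr"))
            throw new IllegalArgumentException("not a sentence: " + input);
        return trace;
    }

    // Applies at most one reduction to the top of the stack.
    private static boolean reduce(List<String> st) {
        int n = st.size();
        if (n == 0) return false;
        String top = st.get(n - 1);
        if (top.length() == 1 && Character.isDigit(top.charAt(0))) {  // digit -> 0..9
            st.set(n - 1, "digit");
            return true;
        }
        if (!top.equals("digit")) return false;
        if (n == 1) {                                                // expr -> digit
            st.set(0, "expr");
            return true;
        }
        if (n >= 3 && st.get(n - 3).equals("expr")
                && (st.get(n - 2).equals("+") || st.get(n - 2).equals("-"))) {
            st.subList(n - 3, n).clear();                            // expr -> expr +/- digit
            st.add("expr");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        parse("3 + 8 - 2").forEach(System.out::println);
    }
}
```

For 3 + 8 - 2 the trace has eleven steps and ends with "reduce -> expr". Reading the stack contents (followed by the unread input) backwards gives expr, expr - digit, expr - 2, expr + digit - 2, expr + 8 - 2, digit + 8 - 2, 3 + 8 - 2: the rightmost derivation in reverse. An input such as 3 + - 2 is left unreduced and rejected.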
Bottom-Up Parsing Example: 3 + 8 - 2
  • expr → expr - digit
  • expr → expr + digit
  • expr → digit
  • digit → 0|1|2|…|9

Example input:

3 + 8 - 2

[Figure, over two slides: the syntax tree is built upward from the terminal symbols 3, +, 8, -, 2, first grouping the digits under digit nodes and then under expr nodes, until the root is reached: expr(expr(expr(digit(3)) + digit(8)) - digit(2))]
Bottom-Up Parsing Example: abbcde
  • S → aABe
  • A → Abc | b
  • B → d

Example input:

abbcde

    abbcde → aAbcde    (b reduced to A by A → b)
    aAbcde → aAde      (Abc reduced to A by A → Abc)
    aAde → aABe        (d reduced to B by B → d)
    aABe → S           (aABe reduced to S by S → aABe)

[Figure, over four slides: the syntax tree grows upward with each reduction until the root S is reached]

Bottom-Up Parsing Example: the cat sees a rat.
  • Sentence → Subject Verb Object.
  • Subject → I | a Noun | the Noun
  • Object → me | a Noun | the Noun
  • Noun → cat | mat | rat
  • Verb → like | is | see | sees

Example input:

the cat sees a rat.

    the cat sees a rat.     →  the Noun sees a rat.
    the Noun sees a rat.    →  Subject sees a rat.
    Subject sees a rat.     →  Subject Verb a rat.
    Subject Verb a rat.     →  Subject Verb a Noun.
    Subject Verb a Noun.    →  Subject Verb Object.
    Subject Verb Object.    →  Sentence

What would happen if we chose ‘Subject → a Noun’ instead of ‘Object → a Noun’?

[Figure, over six slides: the syntax tree grows upward with each reduction until the root Sentence is reached]
Top-Down Parsing
  • The parser examines the terminal symbols of the input string, in order from left to right.
  • The parser reconstructs its syntax tree from the top (root node) down (towards the terminal nodes).

An attempt to find the leftmost derivation for an input string

Top-Down Parsers
  • General rules for top-down parsers
    • Start with just a stub for the root node
    • At each step, the parser takes the leftmost stub
    • If the stub is labeled by terminal symbol t, the parser connects it to the next input terminal symbol, which must be t. (If not, the parser has detected a syntactic error.)
    • If the stub is labeled by nonterminal symbol N, the parser chooses one of the production rules N::= X1…Xn, and grows branches from the node labeled by N to new stubs labeled X1,…, Xn (in order from left to right).
    • Parsing succeeds when and if the whole input string is connected up to the syntax tree.
Top-Down Parsing
  • Two forms
    • Backtracking parsers
      • Guess which rule to apply; back up and change choices if they cannot proceed
    • Predictive parsers
      • Predict which rule to apply by using lookahead tokens

Backtracking parsers are not very efficient. We will cover predictive parsers.

Predictive Parsers
  • Many types
    • LL(1) parsing
      • The first L is for scanning the input from left to right; the second L is for producing a leftmost derivation; the 1 is for using one input symbol of lookahead
      • Table-driven, with an explicit stack of grammar symbols in place of recursion
    • Recursive-descent parsing
      • Uses recursive subroutines to traverse the parse tree
Predictive Parsers (Lookahead)
  • term → num term'
  • term' → '+' num term' | '-' num term' | ε
  • num → '0'|'1'|'2'|…|'9'

Example input:

7 + 3 - 2

  • Lookahead in predictive parsing
    • The lookahead token (the next token in the input) is used to determine which rule should be used next
    • For example: after num matches 7, the lookahead + selects term' → '+' num term'; after num matches 3, the lookahead - selects term' → '-' num term'; after num matches 2, no input remains, so term' → ε is chosen

[Figure, over three slides: the parse tree for 7 + 3 - 2 grown top-down: term(num(7) term'(+ num(3) term'(- num(2) term'(ε))))]
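Because the alternatives of term' have distinct starters ('+', '-', and the ε-alternative chosen on anything else), one token of lookahead is enough to pick a rule without backtracking. A Java sketch of a predictive parser for this grammar (hypothetical names; single-character tokens; '$' is used internally to mean end of input) that returns the tree it builds in bracketed form:

```java
public class PredictiveTerm {
    private final String in;     // input with blanks removed; one character per token
    private int pos = 0;

    private PredictiveTerm(String src) { this.in = src.replace(" ", ""); }

    // The lookahead token; '$' stands for end of input.
    private char lookahead() { return pos < in.length() ? in.charAt(pos) : '$'; }

    // term -> num term'
    private String parseTerm() {
        String num = parseNum();
        return "term(" + num + " " + parseTermTail() + ")";
    }

    // term' -> '+' num term' | '-' num term' | epsilon   (chosen by the lookahead)
    private String parseTermTail() {
        char op = lookahead();
        if (op == '+' || op == '-') {
            pos++;
            String num = parseNum();
            return "term'(" + op + " " + num + " " + parseTermTail() + ")";
        }
        return "term'(eps)";     // any other lookahead selects the epsilon alternative
    }

    // num -> '0' | '1' | ... | '9'
    private String parseNum() {
        char c = lookahead();
        if (!Character.isDigit(c))
            throw new IllegalStateException("expected a digit at " + pos + " but found " + c);
        pos++;
        return "num(" + c + ")";
    }

    static String parse(String src) {
        PredictiveTerm p = new PredictiveTerm(src);
        String tree = p.parseTerm();
        if (p.pos != p.in.length())
            throw new IllegalStateException("unexpected input at " + p.pos);
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("7 + 3 - 2"));
    }
}
```

For 7 + 3 - 2 this returns term(num(7) term'(+ num(3) term'(- num(2) term'(eps)))), the same shape as the tree in the figure (with eps standing for ε).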

Recursive-Descent Parsing
  • Top-down parsing algorithm
    • Consists of a group of methods (programs) parseN, one for each nonterminal symbol N of the grammar.
    • The task of each method parseN is to parse a single N-phrase
    • These parsing methods cooperate to parse complete sentences
Recursive-Descent Parsing Example: the cat sees a rat.
  • Sentence → Subject Verb Object.
  • Subject → I | a Noun | the Noun
  • Object → me | a Noun | the Noun
  • Noun → cat | mat | rat
  • Verb → like | is | see | sees

Example input:

the cat sees a rat.

  • Decide which production rule to apply. There is only one for Sentence, rule #1.
  • This step creates four stubs: Subject, Verb, Object, and "."

[Figure, over seven slides: the parser repeatedly expands the leftmost stub (Subject → the Noun, Noun → cat, Verb → sees, Object → a Noun, Noun → rat) and matches each terminal against the input, growing the syntax tree from the root Sentence down to the words the cat sees a rat .]

recursive descent parser for micro english
Recursive-Descent Parser for Micro-English
  • Sentence  Subject Verb Object.
  • Subject  I | a Noun | the Noun
  • Object  me | a Noun | the Noun
  • Noun  cat | mat | rat
  • Verb  like | is | see | sees

ParseSentence

ParseSubject

ParseObject

ParseVerb

ParseNoun

recursive descent parser for micro english1
Recursive-Descent Parser for Micro-English
  • Sentence  Subject Verb Object.
  • Subject  I | a Noun | the Noun
  • Object  me | a Noun | the Noun
  • Noun  cat | mat | rat
  • Verb  like | is | see | sees

ParseSentence

parseSubject

parseVerb

parseObject

parseEnd

Sentence 

Subject

Verb

Object

.

recursive descent parser for micro english2
Recursive-Descent Parser for Micro-English
  • Sentence  Subject Verb Object.
  • Subject  I | a Noun | the Noun
  • Object  me | a Noun | the Noun
  • Noun  cat | mat | rat
  • Verb  like | is | see | sees

Subject 

ParseSubject

if input = “I”

accept

else if input =“a”

accept

parseNoun

else if input = “the”

accept

parseNoun

else error

I

|

a

Noun

|

the

Noun

recursive descent parser for micro english3
Recursive-Descent Parser for Micro-English
  • Sentence  Subject Verb Object.
  • Subject  I | a Noun | the Noun
  • Object  me | a Noun | the Noun
  • Noun  cat | mat | rat
  • Verb  like | is | see | sees

ParseNoun

if input = “cat”

accept

else if input =“mat”

accept

else if input = “rat”

accept

else error

Noun 

cat

|

mat

|

rat

recursive descent parser for micro english4
Recursive-Descent Parser for Micro-English

Object 

  • Sentence  Subject Verb Object.
  • Subject  I | a Noun | the Noun
  • Object  me | a Noun | the Noun
  • Noun  cat | mat | rat
  • Verb  like | is | see | sees

ParseObject

if input = “me”

accept

else if input =“a”

accept

parseNoun

else if input = “the”

accept

parseNoun

else error

me

|

a

Noun

|

the

Noun

recursive descent parser for micro english5
Recursive-Descent Parser for Micro-English

Verb 

  • Sentence  Subject Verb Object.
  • Subject  I | a Noun | the Noun
  • Object  me | a Noun | the Noun
  • Noun  cat | mat | rat
  • Verb  like | is | see | sees

ParseVerb

if input = “like”

accept

else if input =“is”

accept

else if input = “see”

accept

else if input = “sees”

accept

else error

like

|

is

|

see

|

sees

Recursive-Descent Parser for Micro-English
  • Sentence ::= Subject Verb Object .
  • Subject ::= I | a Noun | the Noun
  • Object ::= me | a Noun | the Noun
  • Noun ::= cat | mat | rat
  • Verb ::= like | is | see | sees

parseEnd
    if input = "."
        accept
    else error

[Figure: parseEnd handles the terminating "." of Sentence ::= Subject Verb Object .]
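The six methods above can be gathered into one class. The Java sketch below is an illustration under stated assumptions, not the course's own code: the "scanner" simply splits the input on whitespace (so the final "." must be written as a separate word), syntax errors are thrown as exceptions, `accepts` is a hypothetical convenience wrapper, and parseEnd additionally checks that no input remains after the ".".

```java
import java.util.Arrays;
import java.util.List;

// Recursive-descent recognizer for Micro-English (sketch).
// Assumption: tokens are whitespace-separated words; "<eof>" marks end of input.
class MicroEnglishParser {
    private final List<String> words;
    private int position = 0;
    private String currentToken;

    MicroEnglishParser(String sentence) {
        this.words = Arrays.asList(sentence.trim().split("\\s+"));
    }

    // Stand-in for the scanner: fetch the next word into currentToken.
    private void nextToken() {
        currentToken = position < words.size() ? words.get(position++) : "<eof>";
    }

    // accept: the current token must be the expected terminal.
    private void accept(String expected) {
        if (!currentToken.equals(expected))
            throw new IllegalStateException("expected " + expected + " but found " + currentToken);
        nextToken();
    }

    // acceptIt: the caller has already checked the current token.
    private void acceptIt() { nextToken(); }

    private void syntaxError(String where) {
        throw new IllegalStateException("syntax error in " + where + " at " + currentToken);
    }

    // Sentence ::= Subject Verb Object .
    private void parseSentence() {
        parseSubject();
        parseVerb();
        parseObject();
        parseEnd();
    }

    // Subject ::= I | a Noun | the Noun
    private void parseSubject() {
        switch (currentToken) {
            case "I": acceptIt(); break;
            case "a": case "the": acceptIt(); parseNoun(); break;
            default: syntaxError("Subject");
        }
    }

    // Object ::= me | a Noun | the Noun
    private void parseObject() {
        switch (currentToken) {
            case "me": acceptIt(); break;
            case "a": case "the": acceptIt(); parseNoun(); break;
            default: syntaxError("Object");
        }
    }

    // Noun ::= cat | mat | rat
    private void parseNoun() {
        switch (currentToken) {
            case "cat": case "mat": case "rat": acceptIt(); break;
            default: syntaxError("Noun");
        }
    }

    // Verb ::= like | is | see | sees
    private void parseVerb() {
        switch (currentToken) {
            case "like": case "is": case "see": case "sees": acceptIt(); break;
            default: syntaxError("Verb");
        }
    }

    // The terminating "."; the end-of-input check is an added assumption.
    private void parseEnd() {
        accept(".");
        if (!currentToken.equals("<eof>")) syntaxError("End");
    }

    // Public entry point: prime currentToken, then parse the start symbol.
    void parse() {
        nextToken();
        parseSentence();
    }

    // Hypothetical helper: true if the sentence is well-formed Micro-English.
    static boolean accepts(String sentence) {
        try {
            new MicroEnglishParser(sentence).parse();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For example, `accepts("the rat sees me .")` succeeds, while `accepts("me see a cat .")` fails in parseSubject because "me" is not in the starter set of Subject.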

Systematic Development of a Recursive-Descent Parser
  • Given a (suitable) context-free grammar:
    • Express the grammar in EBNF, with a single production rule for each nonterminal symbol, and perform any necessary grammar transformations
      • Always eliminate left recursion
      • Always left-factorize whenever possible
    • Transcribe each EBNF production rule N ::= X to a parsing method parseN, whose body is determined by X
    • Make the parser consist of:
      • A private variable currentToken
      • The private parsing methods developed in the previous step
      • Private auxiliary methods accept and acceptIt, both of which call the scanner
      • A public parse method that first calls the scanner to store the first input token in currentToken and then calls parseS, where S is the start symbol of the grammar
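The left-recursion step and the class layout can be shown together. Mini-Triangle's rule Command ::= Command ; Command is left-recursive, so a parseCommand that began by calling parseCommand would never terminate; rewritten in EBNF as Command ::= single-Command ( ; single-Command )* it becomes a loop. The Java sketch below assumes a drastically simplified, hypothetical single-Command ::= Identifier := Integer-Literal, a hypothetical `WordScanner` that classifies whitespace-separated words, and a `commandsParsed` counter added only so the result can be observed.

```java
// Sketch of the parser layout for the (hypothetical, simplified) grammar
//   Command        ::= single-Command ( ; single-Command )*
//   single-Command ::= Identifier := Integer-Literal
class CommandParser {
    enum Kind { IDENTIFIER, BECOMES, INTLITERAL, SEMICOLON, EOT }

    static final class Token {
        final Kind kind;
        final String spelling;
        Token(Kind kind, String spelling) { this.kind = kind; this.spelling = spelling; }
    }

    // Hypothetical scanner: classifies whitespace-separated words into tokens.
    static final class WordScanner {
        private final String[] words;
        private int next = 0;
        WordScanner(String source) {
            String trimmed = source.trim();
            words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }
        Token scan() {
            if (next >= words.length) return new Token(Kind.EOT, "");
            String w = words[next++];
            if (w.equals(":=")) return new Token(Kind.BECOMES, w);
            if (w.equals(";")) return new Token(Kind.SEMICOLON, w);
            if (w.matches("[0-9]+")) return new Token(Kind.INTLITERAL, w);
            if (w.matches("[A-Za-z][A-Za-z0-9]*")) return new Token(Kind.IDENTIFIER, w);
            throw new IllegalStateException("bad token " + w);
        }
    }

    private final WordScanner scanner;
    private Token currentToken;       // the single token of lookahead
    private int commandsParsed = 0;   // only to make the sketch observable

    CommandParser(String source) { scanner = new WordScanner(source); }

    // accept: check the current token's kind, then fetch the next token.
    private void accept(Kind expected) {
        if (currentToken.kind != expected)
            throw new IllegalStateException("expected " + expected + ", found " + currentToken.kind);
        currentToken = scanner.scan();
    }

    // acceptIt: fetch the next token without checking.
    private void acceptIt() { currentToken = scanner.scan(); }

    // Command ::= single-Command ( ; single-Command )*
    private void parseCommand() {
        parseSingleCommand();
        while (currentToken.kind == Kind.SEMICOLON) {
            acceptIt();
            parseSingleCommand();
        }
    }

    // single-Command ::= Identifier := Integer-Literal
    private void parseSingleCommand() {
        accept(Kind.IDENTIFIER);
        accept(Kind.BECOMES);
        accept(Kind.INTLITERAL);
        commandsParsed++;
    }

    // Public parse: prime currentToken, parse the start symbol, demand end of text.
    int parse() {
        currentToken = scanner.scan();
        parseCommand();
        accept(Kind.EOT);
        return commandsParsed;
    }
}
```

Here `new CommandParser("x := 1 ; y := 2").parse()` returns 2. The while loop is safe because ";" cannot start anything that follows a Command in this toy grammar (only end of text can).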
Quote of the Week
  • “C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows away your whole leg.”
    • Bjarne Stroustrup
Quote of the Week

Did you really say that?

Dr. Bjarne Stroustrup:

Yes, I did say something along the lines of C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows your whole leg off. What people tend to miss is that what I said about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don't suspect it.

I also said, "Within C++, there is a much smaller and cleaner language struggling to get out." For example, that quote can be found on page 207 of The Design and Evolution of C++. And no, that smaller and cleaner language is not Java or C#. The quote occurs in a section entitled "Beyond Files and Syntax". I was pointing out that the C++ semantics is much cleaner than its syntax. I was thinking of programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.

Converting EBNF Production Rules to Parsing Methods
  • For production rule N ::= X
    • Convert the production rule to a parsing method named parseN:
        private void parseN() {
            parse X
        }
    • Refine parse ε (the empty string) to a dummy statement
    • Refine parse t (where t is a terminal symbol) to accept(t) or acceptIt()
    • Refine parse N (where N is a nonterminal symbol) to a call of the corresponding parsing method, parseN()
    • Refine parse X Y to
        {
            parse X
            parse Y
        }
    • Refine parse X | Y to
        switch (currentToken.kind) {
            cases in starters[[X]]:
                parse X
                break;
            cases in starters[[Y]]:
                parse Y
                break;
            default:
                report a syntax error
        }

Converting EBNF Production Rules to Parsing Methods
  • For X | Y
    • Choose parse X only if the current token is one that can start an X-phrase
    • Choose parse Y only if the current token is one that can start a Y-phrase
      • starters[[X]] and starters[[Y]] must be disjoint
  • For X*
    • Refine parse X* to
        while (currentToken.kind is in starters[[X]])
            parse X
      • starters[[X]] must be disjoint from the set of tokens that can follow X* in this particular context
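Both refinements can be seen in one small parser. The Java sketch below uses a hypothetical left-recursion-free rewrite of the Mini-Triangle expression rules, Expression ::= primary-Expression ( Operator primary-Expression )* and primary-Expression ::= Integer-Literal | Identifier | Operator primary-Expression | ( Expression ). Tokens are whitespace-separated words classified by a stand-in `kindOf` function, and `parse` returns a parenthesised string (an assumption, chosen only to make the phrase structure visible).

```java
// Sketch: X | Y as a switch over starter sets, X* as a while loop.
class ExpressionParser {
    enum Kind { INTLITERAL, IDENTIFIER, OPERATOR, LPAREN, RPAREN, EOT, ERROR }

    private final String[] words;
    private int next = 0;
    private String currentToken;

    ExpressionParser(String source) {
        String s = source.trim();
        words = s.isEmpty() ? new String[0] : s.split("\\s+");
    }

    // Stand-in for a real scanner: classify a word into a token kind.
    private static Kind kindOf(String t) {
        if (t.equals("<eot>")) return Kind.EOT;
        if (t.equals("(")) return Kind.LPAREN;
        if (t.equals(")")) return Kind.RPAREN;
        if (t.matches("[0-9]+")) return Kind.INTLITERAL;
        if (t.matches("[A-Za-z][A-Za-z0-9]*")) return Kind.IDENTIFIER;
        if (t.matches("[-+*/<>=]")) return Kind.OPERATOR;
        return Kind.ERROR;
    }

    private Kind currentKind() { return kindOf(currentToken); }

    private void acceptIt() { currentToken = next < words.length ? words[next++] : "<eot>"; }

    private void accept(Kind expected) {
        if (currentKind() != expected)
            throw new IllegalStateException("expected " + expected + ", found " + currentToken);
        acceptIt();
    }

    // X* loop guarded by starters[[Operator primary-Expression]] = {Operator}.
    private String parseExpression() {
        String tree = parsePrimaryExpression();
        while (currentKind() == Kind.OPERATOR) {
            String op = currentToken;
            acceptIt();
            tree = "(" + tree + " " + op + " " + parsePrimaryExpression() + ")";
        }
        return tree;
    }

    // X | Y switch: the alternatives' starter sets are disjoint.
    private String parsePrimaryExpression() {
        String spelling = currentToken;
        switch (currentKind()) {
            case INTLITERAL:
            case IDENTIFIER:
                acceptIt();
                return spelling;
            case OPERATOR:
                acceptIt();
                return "(" + spelling + " " + parsePrimaryExpression() + ")";
            case LPAREN:
                acceptIt();
                String inner = parseExpression();
                accept(Kind.RPAREN);
                return inner;
            default:
                throw new IllegalStateException("syntax error at " + spelling);
        }
    }

    // Returns a fully parenthesised rendering of the phrase structure.
    String parse() {
        acceptIt();
        String tree = parseExpression();
        accept(Kind.EOT);
        return tree;
    }
}
```

With this grammar, `a + b * c` is grouped as `((a + b) * c)`: the loop simply combines operands left to right. The loop condition is safe because, in this sketch grammar, only ")" or end of text can follow an Expression, so no Operator can follow the X*.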
Converting EBNF Production Rules to Parsing Methods
  • A grammar that satisfies both these conditions is called an LL(1) grammar
  • Recursive-descent parsing is suitable only for LL(1) grammars
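The X | Y condition can be checked mechanically. The Java sketch below computes starters[[N]] for each nonterminal by fixed-point iteration and reports whether every pair of alternatives has disjoint starters. It is a simplified sketch under loud assumptions: no alternative may generate the empty string (so the starters of X1 X2 ... Xn are just the starters of X1), any symbol that is not a key of the grammar map counts as a terminal, and the X* condition (disjointness from following tokens) is not checked, so it tests only one half of the LL(1) definition. The method and grammar names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: starter sets and the X | Y disjointness check.
// A grammar maps each nonterminal to its alternatives (lists of symbols).
class Ll1Check {
    // Starters of one alternative, assuming its first symbol cannot be empty.
    private static Set<String> startersOf(List<String> alternative,
                                          Map<String, List<List<String>>> grammar,
                                          Map<String, Set<String>> known) {
        String first = alternative.get(0);
        return grammar.containsKey(first) ? known.get(first) : Collections.singleton(first);
    }

    // Keep adding starters until no set grows (this also terminates on left recursion).
    static Map<String, Set<String>> starters(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> known = new HashMap<>();
        for (String n : grammar.keySet()) known.put(n, new HashSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : grammar.entrySet()) {
                Set<String> target = known.get(rule.getKey());
                for (List<String> alt : rule.getValue()) {
                    if (target.addAll(new HashSet<>(startersOf(alt, grammar, known)))) changed = true;
                }
            }
        }
        return known;
    }

    // The X | Y condition: every pair of alternatives must have disjoint starters.
    static boolean alternativesDisjoint(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> known = starters(grammar);
        for (List<List<String>> alts : grammar.values()) {
            for (int i = 0; i < alts.size(); i++) {
                for (int j = i + 1; j < alts.size(); j++) {
                    Set<String> overlap = new HashSet<>(startersOf(alts.get(i), grammar, known));
                    overlap.retainAll(startersOf(alts.get(j), grammar, known));
                    if (!overlap.isEmpty()) return false;
                }
            }
        }
        return true;
    }

    static Map<String, List<List<String>>> microEnglish() {
        Map<String, List<List<String>>> g = new HashMap<>();
        g.put("Sentence", List.of(List.of("Subject", "Verb", "Object", ".")));
        g.put("Subject", List.of(List.of("I"), List.of("a", "Noun"), List.of("the", "Noun")));
        g.put("Object", List.of(List.of("me"), List.of("a", "Noun"), List.of("the", "Noun")));
        g.put("Noun", List.of(List.of("cat"), List.of("mat"), List.of("rat")));
        g.put("Verb", List.of(List.of("like"), List.of("is"), List.of("see"), List.of("sees")));
        return g;
    }

    // Cut-down left-recursive rule; Identifier and Expression are treated as terminals here.
    static Map<String, List<List<String>>> leftRecursiveCommand() {
        Map<String, List<List<String>>> g = new HashMap<>();
        g.put("Command", List.of(List.of("Command", ";", "Command"),
                                 List.of("Identifier", ":=", "Expression")));
        return g;
    }
}
```

Micro-English passes the check, while the left-recursive Command rule fails it: both alternatives start with Identifier, which is one way of seeing why left recursion must be eliminated before writing a recursive-descent parser.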
Error Repair
  • Good programming languages are designed with a relatively large “distance” between syntactically correct programs, to increase the likelihood that conceptual mistakes are caught as syntactic errors.
  • Error repair usually occurs at two levels:
    • Local: repairs mistakes with little global import, such as missing semicolons and undeclared variables.
    • Scope: repairs the program text so that scopes are correct. Errors of this kind include unbalanced parentheses and begin/end blocks.
Error Repair
  • Repair actions can be divided into insertions and deletions. Typically the compiler uses some lookahead and backtracking in attempting to make progress in the parse. There is great variation among compilers, though some languages (such as PL/C) carry a tradition of good error repair. Goals of error repair are:
    • No input should cause the compiler to collapse
    • Illegal constructs are flagged
    • Frequently occurring errors are repaired gracefully
    • Minimal stuttering or cascading of errors.

LL-style parsing lends itself well to error repair, since the compiler uses the grammar’s rules to predict what should occur next in the input.
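One simple way to meet the "no collapse" and "minimal cascading" goals in a recursive-descent parser is panic-mode recovery. This technique is swapped in here for illustration; the slides do not say it is the method they have in mind. The Java sketch below assumes the toy grammar Command ::= single-Command ( ; single-Command )* with single-Command ::= Identifier := Integer-Literal. On a syntax error it records the predicted token in a message, skips input up to the next ";" (a hypothetical choice of synchronizing token) or the end, and continues.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of panic-mode recovery: report, skip to a synchronizing token, carry on.
class RecoveringParser {
    private final String[] tokens;
    private int next = 0;
    private String currentToken;
    final List<String> errors = new ArrayList<>();

    RecoveringParser(String source) {
        String s = source.trim();
        tokens = s.isEmpty() ? new String[0] : s.split("\\s+");
    }

    private static final class SyntaxError extends RuntimeException {
        SyntaxError(String message) { super(message); }
    }

    private void acceptIt() { currentToken = next < tokens.length ? tokens[next++] : "<eot>"; }

    // The grammar predicts what must come next; the message names that prediction.
    private void acceptMatching(String regex, String what) {
        if (!currentToken.matches(regex))
            throw new SyntaxError("expected " + what + " but found " + currentToken);
        acceptIt();
    }

    // single-Command ::= Identifier := Integer-Literal
    private void parseSingleCommand() {
        acceptMatching("[A-Za-z][A-Za-z0-9]*", "identifier");
        acceptMatching(":=", ":=");
        acceptMatching("[0-9]+", "integer literal");
    }

    // On error: record it, then skip tokens until ";" or end of input.
    private void parseSingleCommandWithRecovery() {
        try {
            parseSingleCommand();
        } catch (SyntaxError e) {
            errors.add(e.getMessage());
            while (!currentToken.equals(";") && !currentToken.equals("<eot>")) acceptIt();
        }
    }

    // Returns every error found; an empty list means the input was well-formed.
    List<String> parse() {
        acceptIt();
        parseSingleCommandWithRecovery();
        while (currentToken.equals(";")) {
            acceptIt();
            parseSingleCommandWithRecovery();
        }
        if (!currentToken.equals("<eot>")) errors.add("unexpected " + currentToken);
        return errors;
    }
}
```

For the input `x := ; y 2 ; z := 3` the sketch reports exactly two errors, one per faulty command, and still parses `z := 3` correctly.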

Triangle single-Command

single-Command ::=
                 | V-name := Expression
                 | Identifier ( Actual-Parameter-Sequence )
                 | begin Command end
                 | let Declaration in single-Command
                 | if Expression then single-Command
                      else single-Command
                 | while Expression do single-Command

V-name ::= Identifier
         | V-name . Identifier
         | V-name [ Expression ]

Identifier ::= Letter (Letter | Digit)*

Letter ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
         |A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z

Digit ::= 0|1|2|3|4|5|6|7|8|9
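The same transcription rules apply at the character level: the Identifier rule becomes one check for a Letter followed by a while loop for (Letter | Digit)*. The Java sketch below is hypothetical (the class and method names are not from the slides); it restricts Letter to ASCII a-z and A-Z and Digit to 0-9, exactly as the rules above do, rather than using `Character.isLetter`, which would also accept non-ASCII letters.

```java
// Sketch: a scanning method transcribed from Identifier ::= Letter (Letter | Digit)*
class IdentifierScanner {
    private final String text;
    private int pos;

    IdentifierScanner(String text, int start) { this.text = text; this.pos = start; }

    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    private boolean atEnd() { return pos >= text.length(); }

    // Returns the identifier's spelling starting at the current position,
    // or throws if the current character is not a Letter.
    String scanIdentifier() {
        int start = pos;
        if (atEnd() || !isLetter(text.charAt(pos)))
            throw new IllegalStateException("identifier must start with a letter at " + pos);
        pos++;                                              // Letter
        while (!atEnd() && (isLetter(text.charAt(pos)) || isDigit(text.charAt(pos))))
            pos++;                                          // (Letter | Digit)*
        return text.substring(start, pos);
    }

    // Position of the first character after the scanned identifier.
    int position() { return pos; }
}
```

Scanning `count2:=0` from position 0 yields `count2` and stops at the ":" because it is neither a Letter nor a Digit.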

Starter Sets
  • Starter set for a regular expression (RE)
    • starters[[X]] is the set of terminal symbols that can start a string generated by X
  • Example

starters[[single-Command]] = { Identifier, begin, let, if, while }

      • What about V-name vs. Identifier? Both the assignment and the procedure call start with an Identifier.
        • Use lookahead: after the Identifier, the next token decides. ( means a procedure call; :=, ., or [ means the Identifier begins a V-name.
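The lookahead trick can be sketched as follows. After accepting the shared Identifier, the parser inspects the next token: "(" selects the procedure call, and anything else must continue a V-name and end in ":=". The Java sketch assumes a heavily cut-down grammar: only the Identifier and while alternatives, V-name rewritten without left recursion as Identifier ( . Identifier | [ Expression ] )*, and a hypothetical Expression ::= Identifier | Integer-Literal. It returns the AST class name the command would produce, and it does not check for end of input.

```java
// Sketch: distinguishing V-name := Expression from Identifier ( Expression ).
class SingleCommandParser {
    private final String[] tokens;
    private int next = 0;
    private String currentToken;

    SingleCommandParser(String source) {
        tokens = source.trim().split("\\s+");
        acceptIt();                                   // prime currentToken
    }

    private void acceptIt() { currentToken = next < tokens.length ? tokens[next++] : "<eot>"; }

    private void accept(String expected) {
        if (!currentToken.equals(expected))
            throw new IllegalStateException("expected " + expected + ", found " + currentToken);
        acceptIt();
    }

    // Keywords used in this sketch are not identifiers.
    private static boolean isIdentifier(String t) {
        return t.matches("[A-Za-z][A-Za-z0-9]*") && !t.equals("while") && !t.equals("do");
    }

    private void parseIdentifier() {
        if (!isIdentifier(currentToken)) throw new IllegalStateException("expected identifier, found " + currentToken);
        acceptIt();
    }

    // Hypothetical Expression ::= Identifier | Integer-Literal
    private void parseExpression() {
        if (isIdentifier(currentToken) || currentToken.matches("[0-9]+")) acceptIt();
        else throw new IllegalStateException("expected expression, found " + currentToken);
    }

    String parseSingleCommand() {
        if (isIdentifier(currentToken)) {
            parseIdentifier();                        // first token shared by both alternatives
            if (currentToken.equals("(")) {           // lookahead: procedure call
                acceptIt();
                parseExpression();
                accept(")");
                return "CallCommand";
            }
            // otherwise the Identifier began a V-name: ( . Identifier | [ Expression ] )*
            while (currentToken.equals(".") || currentToken.equals("[")) {
                if (currentToken.equals(".")) { acceptIt(); parseIdentifier(); }
                else { acceptIt(); parseExpression(); accept("]"); }
            }
            accept(":=");
            parseExpression();
            return "AssignCommand";
        } else if (currentToken.equals("while")) {
            acceptIt();
            parseExpression();
            accept("do");
            parseSingleCommand();
            return "WhileCommand";
        }
        throw new IllegalStateException("cannot start a single-Command: " + currentToken);
    }
}
```

For example, `putint ( x )` is recognised as a CallCommand, while `a [ 1 ] . f := 2` is recognised as an AssignCommand even though both begin with an Identifier.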
Mini-Triangle Production Rules

Program ::= Command                                  Program (1.14)

Command ::= V-name := Expression                     AssignCommand (1.15a)
         | Identifier ( Expression )                 CallCommand (1.15b)
         | Command ; Command                         SequentialCommand (1.15c)
         | if Expression then Command
              else Command                           IfCommand (1.15d)
         | while Expression do Command               WhileCommand (1.15e)
         | let Declaration in Command                LetCommand (1.15f)

Expression ::= Integer-Literal                       IntegerExpression (1.16a)
            | V-name                                 VnameExpression (1.16b)
            | Operator Expression                    UnaryExpression (1.16c)
            | Expression Operator Expression         BinaryExpression (1.16d)

V-name ::= Identifier                                SimpleVname (1.17)

Declaration ::= const Identifier ~ Expression        ConstDeclaration (1.18a)
             | var Identifier : Type-denoter         VarDeclaration (1.18b)
             | Declaration ; Declaration             SequentialDeclaration (1.18c)

Type-denoter ::= Identifier                          SimpleTypeDenoter (1.19)

Abstract Syntax Trees
  • An explicit representation of the source program’s phrase structure
  • AST for Mini-Triangle
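One common way to represent these ASTs in Java is one class per labelled production rule, with an abstract class for each nonterminal. The sketch below follows the labels of rules (1.14) to (1.19); the field names (v, e, c1, c2, and so on) and the `Terminal` base class are hypothetical choices, not taken from the slides.

```java
// Sketch: one Java class per Mini-Triangle AST node type.
abstract class AST {}

// Terminal nodes record their spelling.
abstract class Terminal extends AST {
    final String spelling;
    Terminal(String spelling) { this.spelling = spelling; }
}
final class Identifier extends Terminal { Identifier(String s) { super(s); } }
final class IntegerLiteral extends Terminal { IntegerLiteral(String s) { super(s); } }
final class Operator extends Terminal { Operator(String s) { super(s); } }

final class Program extends AST {                       // (1.14)
    final Command c;
    Program(Command c) { this.c = c; }
}

abstract class Command extends AST {}
final class AssignCommand extends Command {             // (1.15a) V := E
    final Vname v; final Expression e;
    AssignCommand(Vname v, Expression e) { this.v = v; this.e = e; }
}
final class CallCommand extends Command {               // (1.15b) I ( E )
    final Identifier i; final Expression e;
    CallCommand(Identifier i, Expression e) { this.i = i; this.e = e; }
}
final class SequentialCommand extends Command {         // (1.15c) C1 ; C2
    final Command c1, c2;
    SequentialCommand(Command c1, Command c2) { this.c1 = c1; this.c2 = c2; }
}
final class IfCommand extends Command {                 // (1.15d) if E then C1 else C2
    final Expression e; final Command c1, c2;
    IfCommand(Expression e, Command c1, Command c2) { this.e = e; this.c1 = c1; this.c2 = c2; }
}
final class WhileCommand extends Command {              // (1.15e) while E do C
    final Expression e; final Command c;
    WhileCommand(Expression e, Command c) { this.e = e; this.c = c; }
}
final class LetCommand extends Command {                // (1.15f) let D in C
    final Declaration d; final Command c;
    LetCommand(Declaration d, Command c) { this.d = d; this.c = c; }
}

abstract class Expression extends AST {}
final class IntegerExpression extends Expression {      // (1.16a)
    final IntegerLiteral il;
    IntegerExpression(IntegerLiteral il) { this.il = il; }
}
final class VnameExpression extends Expression {        // (1.16b)
    final Vname v;
    VnameExpression(Vname v) { this.v = v; }
}
final class UnaryExpression extends Expression {        // (1.16c) O E
    final Operator o; final Expression e;
    UnaryExpression(Operator o, Expression e) { this.o = o; this.e = e; }
}
final class BinaryExpression extends Expression {       // (1.16d) E1 O E2
    final Expression e1, e2; final Operator o;
    BinaryExpression(Expression e1, Operator o, Expression e2) { this.e1 = e1; this.o = o; this.e2 = e2; }
}

abstract class Vname extends AST {}
final class SimpleVname extends Vname {                 // (1.17)
    final Identifier i;
    SimpleVname(Identifier i) { this.i = i; }
}

abstract class Declaration extends AST {}
final class ConstDeclaration extends Declaration {      // (1.18a) const I ~ E
    final Identifier i; final Expression e;
    ConstDeclaration(Identifier i, Expression e) { this.i = i; this.e = e; }
}
final class VarDeclaration extends Declaration {        // (1.18b) var I : T
    final Identifier i; final TypeDenoter t;
    VarDeclaration(Identifier i, TypeDenoter t) { this.i = i; this.t = t; }
}
final class SequentialDeclaration extends Declaration { // (1.18c) D1 ; D2
    final Declaration d1, d2;
    SequentialDeclaration(Declaration d1, Declaration d2) { this.d1 = d1; this.d2 = d2; }
}

abstract class TypeDenoter extends AST {}
final class SimpleTypeDenoter extends TypeDenoter {     // (1.19)
    final Identifier i;
    SimpleTypeDenoter(Identifier i) { this.i = i; }
}
```

Under these assumptions, the program `let var x : Integer in x := 1` would be built as a Program whose Command is a LetCommand holding a VarDeclaration and an AssignCommand.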
Abstract Syntax Trees

Program ::= Command                          Program (1.14)

  • Program ASTs (P):
      Program (1.14): one subtree, a Command AST C
  • Command ASTs (C):
      AssignCommand (1.15a): subtrees V (V-name) and E (Expression)
      CallCommand (1.15b): subtrees Identifier (labelled with its spelling) and E
      SequentialCommand (1.15c): subtrees C1 and C2

Command ::= V-name := Expression             AssignCommand (1.15a)
         | Identifier ( Expression )         CallCommand (1.15b)
         | Command ; Command                 SequentialCommand (1.15c)

Abstract Syntax Trees
  • Command ASTs (C):
      IfCommand (1.15d): subtrees E, C1, and C2
      WhileCommand (1.15e): subtrees E and C
      LetCommand (1.15f): subtrees D (Declaration) and C

Command ::= ...
         | if Expression then Command
              else Command                   IfCommand (1.15d)
         | while Expression do Command       WhileCommand (1.15e)
         | let Declaration in Command        LetCommand (1.15f)
