Csc 415 translators and compilers spring 2009


CSC 415: Translators and Compilers, Spring 2009

Chapter 4

Syntactic Analysis



Syntactic Analysis

  • Sub-phases of Syntactic Analysis

  • Grammars Revisited

  • Parsing

  • Abstract Syntax Trees

  • Scanning

  • Case Study: Syntactic Analysis in the Triangle Compiler


Structure of a Compiler

  • Source code → Lexical Analyzer → tokens

  • tokens → Parser & Semantic Analyzer → parse tree

  • parse tree → Intermediate Code Generation → intermediate representation

  • intermediate representation → Optimization → intermediate representation

  • intermediate representation → Assembly Code Generation → Assembly code

  • Symbol Table (drawn alongside the phases)



Syntactic Analysis

  • Main function

    • Parse source program to discover its phrase structure

    • Recursive-descent parsing

    • Constructing an AST

    • Scanning to group characters into tokens



Sub-phases of Syntactic Analysis

  • Scanning (or lexical analysis)

    • Source program transformed to a stream of tokens

      • Identifiers

      • Literals

      • Operators

      • Keywords

      • Punctuation

    • Comments and blank spaces discarded

  • Parsing

    • To determine the source program's phrase structure

    • Source program is input as a stream of tokens (from the Scanner)

    • Treats each token as a terminal symbol

  • Representation of phrase structure

    • AST



Lexical Analysis – A Simple Example

let var y: Integer

in !new year

y := y+1

Note: !new year does not appear in list of tokens. Comments are removed along with white spaces.

  • Scan the file character by character and group characters into words and punctuation (tokens), remove white space and comments

  • Tokens for this example:

    let

    var

    y

    :

    Integer

    in

    y

    :=

    y

    +

    1
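The grouping of characters into tokens described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token kind names (`identifier`, `intlit`, `becomes`, `colon`, `op`, `eot`) and the function `scan` are hypothetical and are not the Triangle compiler's own scanner; comments are assumed to start with `!` and run to the end of the line, as in the example.

```python
import re

# Hypothetical keyword set for this sketch (only the words used in the example).
KEYWORDS = {"let", "var", "in"}

TOKEN_RE = re.compile(r"""
    (?P<skip>\s+|![^\n]*)             # white space, or a comment up to end of line
  | (?P<intlit>[0-9]+)                # integer literal
  | (?P<word>[A-Za-z][A-Za-z0-9]*)    # identifier or keyword
  | (?P<becomes>:=)                   # must be tried before the single colon
  | (?P<colon>:)
  | (?P<op>[+\-*/<>=])
""", re.VERBOSE)


def scan(source):
    """Return a list of (kind, spelling) pairs, ending with an eot token."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"illegal character {source[pos]!r} at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "skip":
            continue                  # white space and comments are discarded
        spelling = match.group()
        if kind == "word":
            kind = spelling if spelling in KEYWORDS else "identifier"
        tokens.append((kind, spelling))
    tokens.append(("eot", ""))
    return tokens


EXAMPLE = "let var y: Integer\nin !new year\ny := y+1"
for kind, spelling in scan(EXAMPLE):
    print(kind, spelling)
```

The comment `!new year` never reaches the token list, which matches the note on the slide.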


Creating Tokens – Mini-Triangle Example

let var y: Integer

in !new year

y := y+1

[Figure: the Input Converter delivers the source text as a character string in a buffer (S = space); the Scanner groups the characters into tokens.]

Tokens produced by the Scanner (kind: spelling):

  • let: let

  • var: var

  • Ident.: y

  • colon: :

  • Ident.: Integer

  • in: in

  • Ident.: y

  • becomes: :=

  • Ident.: y

  • op.: +

  • Intlit.: 1

  • eot


Tokens in Triangle

// literals, identifiers, operators...
INTLITERAL = 0, "<int>",
CHARLITERAL = 1, "<char>",
IDENTIFIER = 2, "<identifier>",
OPERATOR = 3, "<operator>",

// reserved words - must be in alphabetical order...
ARRAY = 4, "array",
BEGIN = 5, "begin",
CONST = 6, "const",
DO = 7, "do",
ELSE = 8, "else",
END = 9, "end",
FUNC = 10, "func",
IF = 11, "if",
IN = 12, "in",
LET = 13, "let",
OF = 14, "of",
PROC = 15, "proc",
RECORD = 16, "record",
THEN = 17, "then",
TYPE = 18, "type",
VAR = 19, "var",
WHILE = 20, "while",

// punctuation...
DOT = 21, ".",
COLON = 22, ":",
SEMICOLON = 23, ";",
COMMA = 24, ",",
BECOMES = 25, ":=",
IS = 26, "~",

// brackets...
LPAREN = 27, "(",
RPAREN = 28, ")",
LBRACKET = 29, "[",
RBRACKET = 30, "]",
LCURLY = 31, "{",
RCURLY = 32, "}",

// special tokens...
EOT = 33, "",
ERROR = 34, "<error>"


Grammars Revisited

  • Context-free grammars

    • A context-free grammar generates a set of sentences

    • Each sentence is a string of terminal symbols

    • An unambiguous sentence has a unique phrase structure embodied in its syntax tree

  • Develop parsers from context-free grammars


Regular Expressions

  • A regular expression (RE) is a convenient notation for expressing a set of strings of terminal symbols

  • Main features

    • ‘|’ separates alternatives

    • ‘*’ indicates that the previous item may be repeated zero or more times

    • ‘(’ and ‘)’ are grouping parentheses

  • ε is the empty string, a special string of length 0



Regular Expression Basics

  • Algebraic Properties

    • | is commutative and associative

      • r|s = s|r

      • r|(s|t) = (r|s)|t

    • Concatenation is associative

      • (rs)t = r(st)

    • Concatenation distributes over |

      • r(s|t) = rs|rt

      • (s|t)r = sr|tr

    • ε is the identity for concatenation

      • εr = r

      • rε = r

    • * is idempotent

      • r** = r*

      • r* = (r|ε)*


Regular Expression Basics

  • Common Extensions

    • r+ : one or more of expression r, same as rr*

    • r^k : k repetitions of r

      • r^3 = rrr

    • ~r : the characters not in the expression r

      • ~[\t\n]

    • r-z : range of characters

      • [0-9a-z]

    • r? : zero or one copy of expression r (used for fields of an expression that are optional)



Regular Expression Example

  • Regular Expression for Representing Months

    • Examples of legal inputs

      • January represented as 1 or 01

      • October represented as 10

    • First try: (0|1|ε)[0-9], that is, 0, 1, or ε followed by a digit between 0 and 9

      • Matches all legal inputs? Yes

        1, 2, 3, …, 10, 11, 12, 01, 02, …, 09

      • Matches any illegal inputs? Yes

        0, 00, 18



Regular Expression Example

  • Regular Expression for Representing Months

    • Examples of legal inputs

      • January represented as 1 or 01

      • October represented as 10

    • Second Try: [1-9]|(0[1-9])|(1[0-2])

      • Any number between 1 and 9 or 0 followed by any number between 1 and 9 or 1 followed by any number between 0 and 2

      • Matches all legal inputs? Yes

        1, 2, 3, …, 10, 11, 12, 01, 02, …, 09

      • Matches any illegal inputs? No
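Both attempts can be checked mechanically. The sketch below uses Python's `re` module; since `re` has no ε symbol, the optional group `(0|1)?` stands in for `(0|1|ε)`. The names `FIRST_TRY`, `SECOND_TRY` and `accepts` are made up for this illustration.

```python
import re

FIRST_TRY = re.compile(r"(0|1)?[0-9]")           # (0|1|ε)[0-9]
SECOND_TRY = re.compile(r"[1-9]|0[1-9]|1[0-2]")  # [1-9]|(0[1-9])|(1[0-2])


def accepts(pattern, text):
    """True if the whole of text is matched by pattern."""
    return pattern.fullmatch(text) is not None


# Every legal spelling of a month: 1..12 and 01..09.
LEGAL = [str(m) for m in range(1, 13)] + [f"{m:02d}" for m in range(1, 10)]
ILLEGAL = ["0", "00", "18"]

for name, pattern in [("first try", FIRST_TRY), ("second try", SECOND_TRY)]:
    missed = [s for s in LEGAL if not accepts(pattern, s)]
    leaked = [s for s in ILLEGAL if accepts(pattern, s)]
    print(name, "| legal inputs rejected:", missed, "| illegal inputs accepted:", leaked)
```

`fullmatch` is used rather than `match` so that an input such as 18 is not accepted merely because its first character matches.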



Regular Expression Example

  • Regular Expression for Floating Point Numbers

    • Examples of legal inputs

      • 1.0, 0.2, 3.14159, -1.0, 2.7e8, 1.0E-6, -2.5e+5

      • Assume that a 0 is required before the decimal point of numbers less than 1, and that extra leading zeros are not ruled out, so numbers such as 0011 or 0003.14159 are legal

    • Building the regular expression

      • Assume

        digit → 0|1|2|3|4|5|6|7|8|9

      • Handle simple decimals such as 1.0, 0.2, 3.14159

        digit+ . digit+ : 1 or more digits, followed by ., followed by 1 or more digits

      • Add an optional sign (only minus, no plus)

        (-|ε) digit+ . digit+   or   -? digit+ . digit+



Regular Expression Example

  • Regular Expression for Floating Point Numbers (cont.)

    • Building the regular expression (cont.)

      • Format for the exponent

        (E|e)(+|-)?(digit+)

      • Adding it as an optional expression to the decimal part

        (-|ε) digit+ . digit+ ((E|e)(+|-)?(digit+))?
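The finished expression can be tried out with Python's `re` module. This sketch assumes that the `.` in the slide's expression means a literal decimal point (so it is escaped as `\.`) and that `digit` is `[0-9]`; the names `FLOAT` and `is_float` are hypothetical.

```python
import re

# -? digit+ . digit+ ((E|e)(+|-)?(digit+))?   with a literal decimal point
FLOAT = re.compile(r"-?[0-9]+\.[0-9]+((E|e)(\+|-)?[0-9]+)?")


def is_float(text):
    """True if the whole of text is a floating point number in the slide's format."""
    return FLOAT.fullmatch(text) is not None


for sample in ["1.0", "0.2", "3.14159", "-1.0", "2.7e8", "1.0E-6", "-2.5e+5",
               "0003.14159", ".5", "1.", "+1.0"]:
    print(sample, is_float(sample))
```

Inputs such as `.5`, `1.` and `+1.0` are rejected because the expression requires digits on both sides of the point and allows only a minus sign.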



Extended BNF

  • Extended BNF (EBNF)

    • Combination of BNF and RE

    • N::=X, where N is a nonterminal symbol and X is an extended RE, i.e., an RE constructed from both terminal and nonterminal symbols

    • EBNF

      • Right hand side may use |, *, (, )

      • Right hand side may contain both terminal and nonterminal symbols


Example EBNF

Expression ::= primary-Expression (Operator primary-Expression)*
primary-Expression ::= Identifier
                     | ( Expression )
Identifier ::= a|b|c|d|e
Operator ::= +|-|*|/

Generates

e

a + b

a – b – c

a + (b * c)

a + (b + c) / d

a – (b – (c – (d – e)))


Grammar Transformations

  • Left Factorization

    XY | XZ is equivalent to X(Y | Z)

    Before:

    single-Command ::= V-name := Expression
                     | if Expression then single-Command
                     | if Expression then single-Command else single-Command

    After:

    single-Command ::= V-name := Expression
                     | if Expression then single-Command (ε | else single-Command)


Grammar Transformations

  • Elimination of left recursion

    N ::= X | NY is equivalent to N ::= X(Y)*

    Before:

    Identifier ::= Letter
                 | Identifier Letter
                 | Identifier Digit

    After left factorization:

    Identifier ::= Letter
                 | Identifier (Letter | Digit)

    After eliminating left recursion:

    Identifier ::= Letter (Letter | Digit)*


Grammar Transformations

  • Substitution of nonterminal symbols

    Given N ::= X, we can substitute each occurrence of N with X iff N ::= X is nonrecursive and is the only production rule for N

    Before:

    single-Command ::= for Control-Variable := Expression To-or-Downto Expression do single-Command
                     | …
    Control-Variable ::= Identifier
    To-or-Downto ::= to
                   | downto

    After:

    single-Command ::= for Identifier := Expression (to | downto) Expression do single-Command
                     | …


Starter Sets

  • Starter set of an RE X

    • starters[[X]]

    • Set of terminal symbols that can start a string generated by X

  • Examples

    • starters[[his | her | its]] = {h, i}

    • starters[[(re)* set]] = {r, s}


Starter Sets

  • Precise and complete definition of starters:

    starters[[ε]] = {}
    starters[[t]] = {t}, where t is a terminal symbol
    starters[[X Y]] = starters[[X]] ∪ starters[[Y]], if X generates ε
    starters[[X Y]] = starters[[X]], if X does not generate ε
    starters[[X | Y]] = starters[[X]] ∪ starters[[Y]]
    starters[[X*]] = starters[[X]]

  • To generalize to the starter set of an extended RE, add

    • starters[[N]] = starters[[X]], where N is a nonterminal symbol defined by the production rule N ::= X


Example Starter Set

Expression ::= primary-Expression (Operator primary-Expression)*
primary-Expression ::= Identifier
                     | ( Expression )
Identifier ::= a|b|c|d|e
Operator ::= +|-|*|/

starters[[Expression]] = starters[[primary-Expression (Operator primary-Expression)*]]
= starters[[primary-Expression]]
= starters[[Identifier]] ∪ starters[[( Expression )]]
= starters[[a | b | c | d | e]] ∪ {(}
= {a, b, c, d, e, (}
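The definition of starters translates almost line for line into code. The sketch below is an illustration only: the tuple encoding of extended REs (`"eps"`, `"t"`, `"seq"`, `"alt"`, `"star"`, `"nt"`) and the helper names are assumptions made for this example, and the functions assume the grammar has no left recursion (otherwise they would not terminate).

```python
def generates_empty(x, rules):
    """True if the extended RE x can generate the empty string ε."""
    kind = x[0]
    if kind in ("eps", "star"):
        return True
    if kind == "t":
        return False
    if kind == "seq":
        return all(generates_empty(part, rules) for part in x[1:])
    if kind == "alt":
        return any(generates_empty(part, rules) for part in x[1:])
    return generates_empty(rules[x[1]], rules)       # kind == "nt"


def starters(x, rules):
    """Set of terminal symbols that can start a string generated by x."""
    kind = x[0]
    if kind == "eps":
        return set()
    if kind == "t":
        return {x[1]}
    if kind == "seq":                                 # X Y ...
        result = set()
        for part in x[1:]:
            result |= starters(part, rules)
            if not generates_empty(part, rules):
                break                                 # later parts cannot start the string
        return result
    if kind == "alt":                                 # X | Y ...
        return set().union(*(starters(part, rules) for part in x[1:]))
    if kind == "star":                                # X*
        return starters(x[1], rules)
    return starters(rules[x[1]], rules)               # nonterminal N ::= X


def word(spelling):
    """Encode a word such as 'his' as a sequence of one-character terminals."""
    return ("seq", *[("t", c) for c in spelling])


RULES = {
    "Expression": ("seq", ("nt", "primary-Expression"),
                   ("star", ("seq", ("nt", "Operator"), ("nt", "primary-Expression")))),
    "primary-Expression": ("alt", ("nt", "Identifier"),
                           ("seq", ("t", "("), ("nt", "Expression"), ("t", ")"))),
    "Identifier": ("alt", *[("t", c) for c in "abcde"]),
    "Operator": ("alt", *[("t", c) for c in "+-*/"]),
}

print(sorted(starters(("nt", "Expression"), RULES)))
print(sorted(starters(("alt", word("his"), word("her"), word("its")), {})))
print(sorted(starters(("seq", ("star", word("re")), word("set")), {})))
```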


Scanning (Lexical Analysis)

  • The purpose of scanning is to recognize tokens in the source program, that is, to group input characters (the source program text) into tokens.

  • Difference between parsing and scanning:

    • Parsing groups terminal symbols, which are tokens, into larger phrases such as expressions and commands and analyzes the tokens for correctness and structure

    • Scanning groups individual characters into tokens



What Does a Scanner Do?

  • Handle keywords (reserved words)

    • The scanner must recognize both identifiers and keywords; there are two approaches

    • Match explicitly

      • Write a regular expression for each keyword

      • An identifier is any alphanumeric string that is not a keyword

    • Match as an identifier, perform lookup

      • No special regular expressions for keywords

      • When an identifier is found, perform lookup into preloaded keyword table
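The second approach (match as an identifier, then look it up) can be sketched as follows. The keyword set is taken from the reserved words in the token table earlier in these slides; the function name `classify` and the kind name `identifier` are assumptions for this illustration, and nothing here is meant to describe how the Triangle scanner itself is coded.

```python
# Preloaded keyword table (the reserved words listed in the Triangle token table).
KEYWORDS = frozenset({
    "array", "begin", "const", "do", "else", "end", "func", "if", "in",
    "let", "of", "proc", "record", "then", "type", "var", "while",
})


def classify(spelling):
    """Given a scanned alphanumeric spelling, decide keyword vs. identifier."""
    # One table lookup replaces a separate regular expression per keyword.
    return spelling if spelling in KEYWORDS else "identifier"


for spelling in ["while", "whilst", "let", "Integer", "y"]:
    print(spelling, "->", classify(spelling))
```

The scanner's matching rules stay small (one rule for all alphanumeric words), at the cost of one lookup per word.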

How does Triangle handle keywords?

Discuss in terms of efficiency and ease to code.



What Does a Scanner Do?

  • Remove white space

    • Tabs, spaces, new lines

  • Remove comments

    • Single line

      -- Ada comment

    • Multi-line, start and end delimiters

      { Pascal comment }

      /* C comment */

    • Nested

    • Runaway comments

      • Nonterminated comments can’t be detected till end of file



What Does a Scanner Do?

  • Perform look ahead

    • Multi-character tokens

      1..10 vs. 1.10

      &, &&

      <, <=

      etc

  • Challenging input languages

    • FORTRAN

      • Keywords not reserved

      • Blanks are not a delimiter

      • Example (comma vs. decimal)

        DO10I=1,5 : start of a do loop (equivalent to a C for loop)

        DO10I=1.5 : an assignment statement, assigning to the variable DO10I
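The multi-character cases above (1..10 vs. 1.10, & vs. &&, < vs. <=) all come down to peeking at the next character before deciding where a token ends. A minimal sketch, with hypothetical kind names (`int`, `real`, `dotdot`, `op`) and a hypothetical function `scan_with_lookahead`:

```python
def scan_with_lookahead(text):
    """Tokenize digits, '..', '<', '<=', '&' and '&&' using one character of look-ahead."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            # Look ahead: a '.' belongs to the number only if a digit follows it.
            if text[j:j + 1] == "." and text[j + 1:j + 2].isdigit():
                j += 1
                while j < len(text) and text[j].isdigit():
                    j += 1
                tokens.append(("real", text[i:j]))
            else:
                tokens.append(("int", text[i:j]))
            i = j
        elif ch == "." and nxt == ".":
            tokens.append(("dotdot", ".."))
            i += 2
        elif ch in ("<", "&"):
            # Look ahead to choose between '<' and '<=', or '&' and '&&'.
            follower = "=" if ch == "<" else "&"
            if nxt == follower:
                tokens.append(("op", ch + nxt))
                i += 2
            else:
                tokens.append(("op", ch))
                i += 1
        else:
            raise ValueError(f"illegal character {ch!r} at offset {i}")
    return tokens


print(scan_with_lookahead("1..10"))
print(scan_with_lookahead("1.10"))
print(scan_with_lookahead("1 <= 2 && 3 < 4"))
```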



What Does a Scanner Do?

  • Challenging input languages (cont.)

    • PL/I, keywords not reserved

      IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;



What Does a Scanner Do?

  • Error Handling

    • Error token passed to parser which reports the error

    • Recovery

      • Delete characters from current token which have been read so far, restart scanning at next unread character

      • Delete the first character of the current lexeme and resume scanning from next character.

    • Examples of lexical errors:

      • 3.25e : bad format for a constant

      • Var#1 : illegal character

    • Some errors that are not lexical errors

      • Mistyped keywords

        • Begim

      • Mismatched parentheses

      • Undeclared variables



Scanner Implementation

  • Issues

    • Simpler design – parser doesn’t have to worry about white space, etc.

    • Improve compiler efficiency – allows the construction of a specialized and potentially more efficient processor

    • Compiler portability is enhanced – input alphabet peculiarities and other device-specific anomalies can be restricted to the scanner



Scanner Implementation

  • What are the keywords in Triangle?

  • How are keywords and identifiers implemented in Triangle?

  • Is look ahead implemented in Triangle?

    • If so, how?


Structure of a Compiler

[Figure: the compiler pipeline once more (Source code, Lexical Analyzer, tokens, parse tree, Intermediate Code Generation, Optimization, Assembly Code Generation, Assembly code, Symbol Table), this time with the Parser and the Semantic Analyzer drawn as separate boxes.]



Parsing

  • Given an unambiguous, context free grammar, parsing is

    • Recognition of an input string, i.e., deciding whether or not the input string is a sentence of the grammar

    • Parsing of an input string, i.e., recognition of the input string plus determination of its phrase structure. The phrase structure can be represented by a syntax tree, or otherwise.

The grammar must be unambiguous so that every sentence of the grammar has exactly one syntax tree.



Parsing

  • The syntax of programming language constructs is described by context-free grammars.

  • Advantages of unambiguous, context-free grammars

    • A precise, yet easy-to-understand, syntactic specification of the programming language

    • For certain classes of grammars we can automatically construct an efficient parser that determines if a source program is syntactically well formed.

    • Imparts a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors.

    • Easier to add new constructs to the language if the implementation is based on a grammatical description of the language


Parsing

[Figure: sequence of tokens → parser → syntax tree]

  • Check the syntax (structure) of a program and create a tree representation of the program

  • Programming languages have non-regular constructs

    • Nesting

    • Recursion

  • Context-free grammars are used to express the syntax for programming languages



Context-Free Grammars

  • Comprised of

    • A set of tokens or terminal symbols

    • A set of non-terminal symbols

    • A set of rules or productions which express the legal relationships between symbols

    • A start or goal symbol

  • Example:

    • expr → expr - digit

    • expr → expr + digit

    • expr → digit

    • digit → 0|1|2|…|9

  • Tokens: -,+,0,1,2,…,9

  • Non-terminals: expr, digit

  • Start symbol: expr


Context-Free Grammars

  • expr → expr - digit

  • expr → expr + digit

  • expr → digit

  • digit → 0|1|2|…|9

Example input: 3 + 8 - 2

Syntax tree (children indented under their parent):

expr
    expr
        expr
            digit
                3
        +
        digit
            8
    -
    digit
        2



Checking for Correct Syntax

  • Given a grammar for a language and a program, how do you know if the syntax of the program is legal?

  • A legal program can be derived from the start symbol of the grammar

Grammar must be unambiguous and context-free


Deriving a String

  • The derivation begins with the start symbol

  • At each step of a derivation the right hand side of a grammar rule is used to replace a non-terminal symbol

  • Continue replacing non-terminals until only terminal symbols remain

  • Rule 1: expr → expr - digit

  • Rule 2: expr → expr + digit

  • Rule 3: expr → digit

  • Rule 4: digit → 0|1|2|…|9

Example input: 3 + 8 - 2

expr ⇒ expr - digit (Rule 1)
     ⇒ expr - 2 (Rule 4)
     ⇒ expr + digit - 2 (Rule 2)
     ⇒ expr + 8 - 2 (Rule 4)
     ⇒ digit + 8 - 2 (Rule 3)
     ⇒ 3 + 8 - 2 (Rule 4)


Rightmost Derivation

  • The rightmost non-terminal is replaced in each step

  • Rule 1: expr → expr - digit

  • Rule 2: expr → expr + digit

  • Rule 3: expr → digit

  • Rule 4: digit → 0|1|2|…|9

Example input: 3 + 8 - 2

Rule 1: expr ⇒ expr - digit
Rule 4: expr - digit ⇒ expr - 2
Rule 2: expr - 2 ⇒ expr + digit - 2
Rule 4: expr + digit - 2 ⇒ expr + 8 - 2
Rule 3: expr + 8 - 2 ⇒ digit + 8 - 2
Rule 4: digit + 8 - 2 ⇒ 3 + 8 - 2


Leftmost Derivation

  • The leftmost non-terminal is replaced in each step

  • Rule 1: expr → expr - digit

  • Rule 2: expr → expr + digit

  • Rule 3: expr → digit

  • Rule 4: digit → 0|1|2|…|9

Example input: 3 + 8 - 2

Rule 1: expr ⇒ expr - digit
Rule 2: expr - digit ⇒ expr + digit - digit
Rule 3: expr + digit - digit ⇒ digit + digit - digit
Rule 4: digit + digit - digit ⇒ 3 + digit - digit
Rule 4: 3 + digit - digit ⇒ 3 + 8 - digit
Rule 4: 3 + 8 - digit ⇒ 3 + 8 - 2
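The leftmost derivation of 3 + 8 - 2 can be replayed mechanically: at each step, find the leftmost non-terminal and replace it with the right hand side of the chosen rule. The sketch below is illustrative only; `leftmost_derivation`, `NONTERMINALS` and `STEPS` are names invented here, and the rule choices are supplied by hand rather than discovered by a parser.

```python
NONTERMINALS = {"expr", "digit"}


def leftmost_derivation(start, steps):
    """Apply each (lhs, rhs) production to the leftmost non-terminal in turn."""
    form = [start]
    history = [" ".join(form)]
    for lhs, rhs in steps:
        # Locate the leftmost non-terminal in the current sentential form.
        index = next(i for i, symbol in enumerate(form) if symbol in NONTERMINALS)
        if form[index] != lhs:
            raise ValueError(f"leftmost non-terminal is {form[index]}, not {lhs}")
        form[index:index + 1] = rhs
        history.append(" ".join(form))
    return history


# Rules 1, 2, 3, 4, 4, 4, in the order used for the input 3 + 8 - 2.
STEPS = [
    ("expr", ["expr", "-", "digit"]),   # Rule 1
    ("expr", ["expr", "+", "digit"]),   # Rule 2
    ("expr", ["digit"]),                # Rule 3
    ("digit", ["3"]),                   # Rule 4
    ("digit", ["8"]),                   # Rule 4
    ("digit", ["2"]),                   # Rule 4
]

for sentential_form in leftmost_derivation("expr", STEPS):
    print(sentential_form)
```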


Leftmost Derivation

  • The leftmost non-terminal is replaced in each step

Step 1 (Rule 1): expr ⇒ expr - digit
Step 2 (Rule 2): expr - digit ⇒ expr + digit - digit
Step 3 (Rule 3): expr + digit - digit ⇒ digit + digit - digit
Step 4 (Rule 4): digit + digit - digit ⇒ 3 + digit - digit
Step 5 (Rule 4): 3 + digit - digit ⇒ 3 + 8 - digit
Step 6 (Rule 4): 3 + 8 - digit ⇒ 3 + 8 - 2

[Figure: syntax tree for 3 + 8 - 2 with its nodes numbered 1 to 6 to match the derivation steps.]



Bottom-Up Parsing

  • Parser examines terminal symbols of the input string, in order from left to right

  • Reconstructs the syntax tree from the bottom (terminal nodes) up (toward the root node)

  • Bottom-up parsing reduces a string w to the start symbol of the grammar.

    • At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.



Bottom-Up Parsing

  • Types of bottom-up parsing algorithms

    • Shift-reduce parsing

      • At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.

    • LR(k) parsing

      • L is for left-to-right scanning of the input, the R is for constructing a right-most derivation in reverse, and the k is for the number of input symbols of look-ahead that are used in making parsing decisions.


Bottom-Up Parsing Example: 3 + 8 - 2

  • expr → expr - digit

  • expr → expr + digit

  • expr → digit

  • digit → 0|1|2|…|9

Example input: 3 + 8 - 2

[Figure: successive snapshots of the syntax tree being built upward from the tokens 3, +, 8, -, 2, starting with digit nodes over the numerals and then expr nodes above them.]


Bottom-Up Parsing Example: 3 + 8 - 2

[Figure: the remaining snapshots for 3 + 8 - 2, ending with the complete syntax tree rooted at expr.]


Bottom-Up Parsing Example: abbcde

  • S → aABe

  • A → Abc | b

  • B → d

Example input: abbcde

abbcde ⇒ aAbcde   (reduce the first b to A)

[Figure: the input a b b c d e with an A node above the first b.]


Bottom-Up Parsing Example: abbcde

  • S → aABe

  • A → Abc | b

  • B → d

Example input: abbcde

aAbcde ⇒ aAde   (reduce Abc to A)

[Figure: a second A node above the first A and the following b c.]


Bottom-Up Parsing Example: abbcde

  • S → aABe

  • A → Abc | b

  • B → d

Example input: abbcde

aAde ⇒ aABe   (reduce d to B)

[Figure: a B node added above d.]


Bottom-Up Parsing Example: abbcde

  • S → aABe

  • A → Abc | b

  • B → d

Example input: abbcde

aABe ⇒ S   (reduce aABe to S)

[Figure: the complete syntax tree, with S at the root above a, A, B and e.]
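The four reduction steps for abbcde can be reproduced by a small program. Note the assumption: the sketch below is not a shift-reduce or LR parser; it simply searches exhaustively (depth first) for any sequence of reductions that turns the input into S, which is practical only for tiny examples like this one. The names `PRODUCTIONS` and `reductions` are invented for the illustration.

```python
PRODUCTIONS = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]


def reductions(form, goal="S", seen=None):
    """Return a list of sentential forms from form to goal, or None if there is none."""
    if form == goal:
        return [form]
    seen = set() if seen is None else seen
    if form in seen:
        return None                    # already explored this form without success
    seen.add(form)
    for lhs, rhs in PRODUCTIONS:
        start = form.find(rhs)
        while start != -1:
            # Replace one occurrence of the right side by the left side.
            reduced = form[:start] + lhs + form[start + len(rhs):]
            rest = reductions(reduced, goal, seen)
            if rest is not None:
                return [form] + rest
            start = form.find(rhs, start + 1)
    return None


print(reductions("abbcde"))
print(reductions("abcde"))
```

Trying the reductions in a different order can lead to dead ends (for example, reducing the second b to A as well), which is why a real bottom-up parser needs a disciplined way of choosing the sub-string to reduce.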


Bottom-Up Parsing Example: the cat sees a rat.

  • Sentence → Subject Verb Object .

  • Subject → I | a Noun | the Noun

  • Object → me | a Noun | the Noun

  • Noun → cat | mat | rat

  • Verb → like | is | see | sees

Example input: the cat sees a rat.

the cat sees a rat.  ⇒  the Noun sees a rat.   (reduce cat to Noun)

[Figure: the input words with a Noun node above cat.]


Bottom-Up Parsing Example: the cat sees a rat.

the Noun sees a rat.  ⇒  Subject sees a rat.   (reduce the Noun to Subject)

[Figure: a Subject node added above the and Noun.]


Bottom-Up Parsing Example: the cat sees a rat.

Subject sees a rat.  ⇒  Subject Verb a rat.   (reduce sees to Verb)

[Figure: a Verb node added above sees.]


Bottom-Up Parsing Example: the cat sees a rat.

Subject Verb a rat.  ⇒  Subject Verb a Noun.   (reduce rat to Noun)

[Figure: a second Noun node added above rat.]


Bottom-Up Parsing Example: the cat sees a rat.

Subject Verb a Noun.  ⇒  Subject Verb Object.   (reduce a Noun to Object)

[Figure: an Object node added above a and the second Noun.]

What would happen if we chose ‘Subject → a Noun’ instead of ‘Object → a Noun’?


Bottom-Up Parsing Example: the cat sees a rat.

Subject Verb Object.  ⇒  Sentence   (reduce Subject Verb Object . to Sentence)

[Figure: the complete syntax tree, with Sentence at the root above Subject, Verb, Object and the full stop.]



Top-Down Parsing

  • The parser examines the terminal symbols of the input string, in order from left to right.

  • The parser reconstructs its syntax tree from the top (root node) down (towards the terminal nodes).

An attempt to find the leftmost derivation for an input string



Top-Down Parsers

  • General rules for top-down parsers

    • Start with just a stub for the root node

    • At each step the parser takes the leftmost stub

    • If the stub is labeled by terminal symbol t, the parser connects it to the next input terminal symbol, which must be t. (If not, the parser has detected a syntactic error.)

    • If the stub is labeled by nonterminal symbol N, the parser chooses one of the production rules N::= X1…Xn, and grows branches from the node labeled by N to new stubs labeled X1,…, Xn (in order from left to right).

    • Parsing succeeds when and if the whole input string is connected up to the syntax tree.



Top-Down Parsing

  • Two forms

    • Backtracking parsers

      • Guess which rule to apply; back up and change the choice if the parse cannot proceed

    • Predictive Parsers

      • Predict which rule to apply by using look-ahead tokens

Backtracking parsers are not very efficient. We will cover predictive parsers.



Predictive Parsers

  • Many types

    • LL(1) parsing

      • The first L is for scanning the input from left to right; the second L is for producing a left-most derivation; the 1 is for using one input symbol of look-ahead

      • Table driven with an explicit stack to maintain the parse tree

    • Recursive-descent parsing

      • Uses recursive subroutines to traverse the parse tree


Predictive Parsers (Lookahead)

  • term → num term’

  • term’ → ‘+’ num term’ | ‘-’ num term’ | ε

  • num → ‘0’|‘1’|‘2’|…|‘9’

Example input: 7 + 3 - 2

  • Lookahead in predictive parsing

    • The lookahead token (next token in the input) is used to determine which rule should be used next

    • For example:

[Figure: partial parse tree in which term expands to num term’; num matches 7, and the lookahead token + decides how term’ is expanded.]


Predictive Parsers (Lookahead)

[Figure: the parse tree for 7 + 3 - 2 after further steps: the lookahead + selects term’ → ‘+’ num term’, num matches 3, and the next lookahead token is -.]


Predictive Parsers (Lookahead)

[Figure: the completed parse tree for 7 + 3 - 2: the lookahead - selects term’ → ‘-’ num term’, num matches 2, and at the end of the input the last term’ is expanded to ε.]
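The way the lookahead token picks a rule for term’ can be sketched directly in code. This is an illustrative recognizer only: `parse_term` and its helper names are invented here, each non-blank character is treated as one token, and the function returns the list of alternatives chosen for term’ so the decisions are visible.

```python
def parse_term(text):
    """Recognize: term -> num term' ; term' -> '+' num term' | '-' num term' | ε."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0
    choices = []                      # alternative chosen at each term' expansion

    def lookahead():
        return tokens[pos] if pos < len(tokens) else None

    def num():
        nonlocal pos
        token = lookahead()
        if token is None or token not in "0123456789":
            raise SyntaxError(f"digit expected, found {token!r}")
        pos += 1

    def term_rest():                  # the nonterminal term'
        nonlocal pos
        token = lookahead()
        if token in ("+", "-"):       # lookahead selects '+' num term' or '-' num term'
            choices.append(token)
            pos += 1
            num()
            term_rest()
        else:                         # anything else selects the ε alternative
            choices.append("ε")

    num()
    term_rest()
    if lookahead() is not None:
        raise SyntaxError(f"unexpected token {lookahead()!r}")
    return choices


print(parse_term("7 + 3 - 2"))
```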


Recursive-Descent Parsing

  • Top-down parsing algorithm

    • Consists of a group of methods (programs) parseN, one for each nonterminal symbol N of the grammar.

    • The task of each method parseN is to parse a single N-phrase

    • These parsing methods cooperate to parse complete sentences


Recursive-Descent Parsing

  • Sentence → Subject Verb Object .

  • Subject → I | a Noun | the Noun

  • Object → me | a Noun | the Noun

  • Noun → cat | mat | rat

  • Verb → like | is | see | sees

Example input: the cat sees a rat.

[Figure: Sentence at the root with four stubs (Subject, Verb, Object and the full stop) above the input the cat sees a rat .]

  • Decide which production rule to apply. There is only one, rule 1.

  • This step creates four stubs.


Recursive-Descent Parsing (continued)

[Figures: six further frames repeat the grammar and the example input while the syntax tree grows from left to right until it covers the whole input: Subject is expanded to the Noun (with Noun matching cat), Verb matches sees, Object is expanded to a Noun (with Noun matching rat), and the full stop is matched last.]


Recursive-Descent Parser for Micro-English

  • Sentence → Subject Verb Object .

  • Subject → I | a Noun | the Noun

  • Object → me | a Noun | the Noun

  • Noun → cat | mat | rat

  • Verb → like | is | see | sees

One parsing method per nonterminal symbol:

  • parseSentence

  • parseSubject

  • parseObject

  • parseVerb

  • parseNoun


Recursive-Descent Parser for Micro-English

Sentence → Subject Verb Object .

parseSentence
    parseSubject
    parseVerb
    parseObject
    parseEnd


Recursive-Descent Parser for Micro-English

Subject → I | a Noun | the Noun

parseSubject
    if input = "I"
        accept
    else if input = "a"
        accept
        parseNoun
    else if input = "the"
        accept
        parseNoun
    else error


Recursive-Descent Parser for Micro-English

Noun → cat | mat | rat

parseNoun
    if input = "cat"
        accept
    else if input = "mat"
        accept
    else if input = "rat"
        accept
    else error


Recursive-Descent Parser for Micro-English

Object → me | a Noun | the Noun

parseObject
    if input = "me"
        accept
    else if input = "a"
        accept
        parseNoun
    else if input = "the"
        accept
        parseNoun
    else error


Recursive descent parser for micro english5

Recursive-Descent Parser for Micro-English

  • Sentence ::= Subject Verb Object .

  • Subject ::= I | a Noun | the Noun

  • Object ::= me | a Noun | the Noun

  • Noun ::= cat | mat | rat

  • Verb ::= like | is | see | sees

Verb ::= like | is | see | sees

parseVerb
    if input = “like”
        accept
    else if input = “is”
        accept
    else if input = “see”
        accept
    else if input = “sees”
        accept
    else error


Recursive descent parser for micro english6

Recursive-Descent Parser for Micro-English

  • Sentence ::= Subject Verb Object .

  • Subject ::= I | a Noun | the Noun

  • Object ::= me | a Noun | the Noun

  • Noun ::= cat | mat | rat

  • Verb ::= like | is | see | sees

End: the terminal “.” that closes a Sentence

parseEnd
    if input = “.”
        accept
    else error
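The pseudocode on the preceding slides can be put together into a small Java recognizer. This is only a sketch under stated assumptions: the class name MicroEnglishParser, the helper isSentence, and the whitespace-splitting "scanner" are illustrative choices, not part of the slides. Each nonterminal of the Micro-English grammar gets one parse method, exactly as in the slides.

```java
import java.util.Arrays;
import java.util.List;

// Recognizer for Micro-English with one parse method per nonterminal.
// All names here are illustrative; the slides only give pseudocode.
class MicroEnglishParser {
    private final List<String> tokens; // input words, split on whitespace
    private int pos = 0;               // index of the current token

    MicroEnglishParser(String sentence) {
        // Assumption: every token, including the final ".", is separated by whitespace.
        this.tokens = Arrays.asList(sentence.trim().split("\\s+"));
    }

    // The current token, or "<eof>" once the input is exhausted.
    private String current() {
        return pos < tokens.size() ? tokens.get(pos) : "<eof>";
    }

    // Consume the current token if it is the expected one; otherwise signal a syntax error.
    private void accept(String expected) {
        if (!current().equals(expected)) {
            throw new IllegalStateException(
                "expected '" + expected + "' but found '" + current() + "'");
        }
        pos++;
    }

    // Sentence ::= Subject Verb Object .
    private void parseSentence() {
        parseSubject();
        parseVerb();
        parseObject();
        parseEnd();
    }

    // Subject ::= I | a Noun | the Noun
    private void parseSubject() {
        switch (current()) {
            case "I":   accept("I"); break;
            case "a":   accept("a"); parseNoun(); break;
            case "the": accept("the"); parseNoun(); break;
            default:    throw new IllegalStateException("bad subject: " + current());
        }
    }

    // Object ::= me | a Noun | the Noun
    private void parseObject() {
        switch (current()) {
            case "me":  accept("me"); break;
            case "a":   accept("a"); parseNoun(); break;
            case "the": accept("the"); parseNoun(); break;
            default:    throw new IllegalStateException("bad object: " + current());
        }
    }

    // Noun ::= cat | mat | rat
    private void parseNoun() {
        switch (current()) {
            case "cat": case "mat": case "rat": accept(current()); break;
            default: throw new IllegalStateException("bad noun: " + current());
        }
    }

    // Verb ::= like | is | see | sees
    private void parseVerb() {
        switch (current()) {
            case "like": case "is": case "see": case "sees": accept(current()); break;
            default: throw new IllegalStateException("bad verb: " + current());
        }
    }

    private void parseEnd() { accept("."); }

    // True if the whole text is one Micro-English sentence with no trailing input.
    static boolean isSentence(String text) {
        MicroEnglishParser p = new MicroEnglishParser(text);
        try {
            p.parseSentence();
            return p.pos == p.tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSentence("the cat sees a rat .")); // true
        System.out.println(isSentence("the cat sees ."));       // false
    }
}
```

Note that the parser checks syntax only: "I sees the mat ." is accepted, because the grammar does not encode agreement between subject and verb.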


Systematic development of a recursive descent parser

Systematic Development of a Recursive-Descent Parser

  • Given a (suitable) context-free grammar

    • Express the grammar in EBNF, with a single production rule for each nonterminal symbol, and perform any necessary grammar transformations

      • Always eliminate left recursion

      • Always left-factorize whenever possible

    • Transcribe each EBNF production rule N::=X to a parsing method parseN, whose body is determined by X

    • Make the parser consist of:

      • A private variable currentToken

      • Private parsing methods developed in the previous step

      • Private auxiliary methods accept and acceptIt, both of which call the scanner

      • A public parse method that calls parseS (where S is the start symbol of the grammar), having first called the scanner to store the first input token in currentToken


Quote of the week

Quote of the Week

  • “C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows away your whole leg.”

    • Bjarne Stroustrup


Quote of the week1

Quote of the Week

Did you really say that?

Dr. Bjarne Stroustrup:

Yes, I did say something along the lines of C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows your whole leg off. What people tend to miss is that what I said about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don't suspect it.

I also said, "Within C++, there is a much smaller and cleaner language struggling to get out." For example, that quote can be found on page 207 of The Design and Evolution of C++. And no, that smaller and cleaner language is not Java or C#. The quote occurs in a section entitled "Beyond Files and Syntax". I was pointing out that the C++ semantics is much cleaner than its syntax. I was thinking of programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.


Converting ebnf production rules to parsing methods

Converting EBNF Production Rules to Parsing Methods

  • For production rule N ::= X

    • Convert the production rule to a parsing method named parseN

      private void parseN () {
          parse X
      }

    • Refine parse ε to a dummy statement

    • Refine parse t (where t is a terminal symbol) to accept(t) or acceptIt()

    • Refine parse N (where N is a nonterminal symbol) to a call of the corresponding parsing method

      parseN();

    • Refine parse X Y to

      {
          parse X
          parse Y
      }

    • Refine parse X | Y to

      switch (currentToken.kind) {
          cases in starters[[X]]:
              parse X
              break;
          cases in starters[[Y]]:
              parse Y
              break;
          default:
              report a syntax error
      }


Converting ebnf production rules to parsing methods1

Converting EBNF Production Rules to Parsing Methods

  • For X | Y

    • Choose parse X only if the current token is one that can start an X-phrase

    • Choose parse Y only if the current token is one that can start a Y-phrase

      • starters[[X]] and starters[[Y]] must be disjoint

  • For X*

    • Choose

      while (currentToken.kind is in starters[[X]])
          parse X

      • starters[[X]] must be disjoint from the set of tokens that can follow X* in this particular context


Converting ebnf production rules to parsing methods2

Converting EBNF Production Rules to Parsing Methods

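  • A grammar that satisfies both of these conditions (disjoint starter sets for every X | Y, and starters[[X]] disjoint from the tokens that can follow every X*) is called an LL(1) grammar

  • Recursive-descent parsing is suitable only for LL(1) grammars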
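The conversion rules can be illustrated on a toy grammar. The grammar below, the class name ListParser, the Kind enumeration, and the tiny built-in scanner are all assumptions made for this sketch; none of them come from the slides or from the Triangle compiler. The grammar is LL(1): the alternatives of Item have disjoint starter sets, and the starter of the repeated part (a comma) cannot follow the repetition.

```java
// Hypothetical toy grammar (not from the slides), already in EBNF:
//   List ::= Item ( , Item )*
//   Item ::= Identifier | Integer-Literal | ( List )
class ListParser {
    enum Kind { IDENTIFIER, INTLITERAL, COMMA, LPAREN, RPAREN, EOT, ERROR }

    private final String input;
    private int pos = 0;
    private Kind currentToken; // the slides' currentToken, reduced to its kind

    ListParser(String input) { this.input = input; }

    // Minimal scanner: skips blanks, then returns the kind of the next token.
    private Kind scan() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
        if (pos >= input.length()) return Kind.EOT;
        char c = input.charAt(pos);
        if (Character.isLetter(c)) {
            while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
            return Kind.IDENTIFIER;
        }
        if (Character.isDigit(c)) {
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            return Kind.INTLITERAL;
        }
        pos++;
        switch (c) {
            case ',': return Kind.COMMA;
            case '(': return Kind.LPAREN;
            case ')': return Kind.RPAREN;
            default:  return Kind.ERROR;
        }
    }

    // accept(t): check that the current token is t, then fetch the next token.
    private void accept(Kind expected) {
        if (currentToken != expected) {
            throw new IllegalStateException("expected " + expected + ", found " + currentToken);
        }
        currentToken = scan();
    }

    // acceptIt(): the caller already knows the token kind, so just fetch the next one.
    private void acceptIt() { currentToken = scan(); }

    // List ::= Item ( , Item )*
    // X* becomes a while loop guarded by starters[[ , Item ]] = { , }
    private void parseList() {
        parseItem();
        while (currentToken == Kind.COMMA) {
            acceptIt();
            parseItem();
        }
    }

    // Item ::= Identifier | Integer-Literal | ( List )
    // X | Y becomes a switch on the (disjoint) starter sets.
    private void parseItem() {
        switch (currentToken) {
            case IDENTIFIER:
            case INTLITERAL:
                acceptIt();
                break;
            case LPAREN:
                acceptIt();
                parseList();
                accept(Kind.RPAREN);
                break;
            default:
                throw new IllegalStateException("syntax error at " + currentToken);
        }
    }

    // Public entry point: load the first token, parse the start symbol, expect end of text.
    public void parse() {
        currentToken = scan();
        parseList();
        accept(Kind.EOT);
    }

    // Convenience wrapper for experiments: true if the text parses without error.
    static boolean accepts(String text) {
        try {
            new ListParser(text).parse();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For example, ListParser.accepts("a, 12, (b, (c))") holds, while "a, , b" and "(a, b" are rejected. acceptIt is used where the switch or loop guard has already inspected the token; accept is used where a specific token is required.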


Error repair

Error Repair

  • Good programming languages are designed with a relatively large “distance” between syntactically correct programs, to increase the likelihood that conceptual mistakes are caught as syntax errors.

  • Error repair usually occurs at two levels:

    • Local: repairs mistakes with little global import, such as missing semicolons and undeclared variables.

    • Scope: repairs the program text so that scopes are correct. Errors of this kind include unbalanced parentheses and begin/end blocks.


Error repair1

Error Repair

  • Repair actions can be divided into insertions and deletions. Typically the compiler uses some lookahead and backtracking in attempting to make progress in the parse. There is great variation among compilers, though some languages (PL/C) carry a tradition of good error repair. Goals of error repair are:

    • No input should cause the compiler to collapse

    • Illegal constructs are flagged

    • Frequently occurring errors are repaired gracefully

    • Minimal stuttering or cascading of errors.

LL-style parsing lends itself well to error repair, since the compiler uses the grammar’s rules to predict what should occur next in the input.
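A minimal sketch of the two repair actions, insertion and deletion, might look as follows. The grammar, the class name RepairingParser, and the error-message wording are hypothetical and chosen only for illustration; real compilers use considerably more elaborate strategies. Because the parser predicts what must come next, it can report a missing semicolon and continue as if it were present (insertion), or drop a token that cannot start a statement (deletion).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical grammar for illustration:  Program ::= ( Identifier ; )*
class RepairingParser {
    private final String[] tokens;               // whitespace-separated input tokens
    private int pos = 0;                         // index of the current token
    final List<String> errors = new ArrayList<>(); // one message per repaired error

    RepairingParser(String text) {
        String t = text.trim();
        this.tokens = t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    private boolean atEnd() { return pos >= tokens.length; }

    private static boolean isIdentifier(String s) {
        return s.matches("[A-Za-z][A-Za-z0-9]*");
    }

    // Parses the whole input; never throws, so no input makes the parser collapse.
    void parseProgram() {
        while (!atEnd()) {
            if (isIdentifier(tokens[pos])) {
                pos++; // accept the identifier
                if (!atEnd() && tokens[pos].equals(";")) {
                    pos++; // accept the semicolon
                } else {
                    // Insertion repair: report, then behave as if the ';' were present.
                    errors.add("missing ';' before token " + pos);
                }
            } else {
                // Deletion repair: report, then drop the offending token.
                errors.add("unexpected '" + tokens[pos] + "' at token " + pos);
                pos++;
            }
        }
    }
}
```

Every loop iteration consumes at least one token, so the parse always terminates, and each mistake in the input yields a single message rather than a cascade. For the input "a ; b c ;" the sketch records exactly one error (the missing semicolon after b).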


Triangle single command

Triangle single-Command

single-Command ::=
                 | V-name := Expression
                 | Identifier ( Actual-Parameter-Sequence )
                 | begin Command end
                 | let Declaration in single-Command
                 | if Expression then single-Command else single-Command
                 | while Expression do single-Command

V-name         ::= Identifier
                 | V-name . Identifier
                 | V-name [ Expression ]

Identifier     ::= Letter (Letter | Digit)*

Letter         ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
                 | A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z

Digit          ::= 0|1|2|3|4|5|6|7|8|9


Starter sets2

Starter Sets

  • Starter set of a regular expression (RE)

    • starters[[X]] is the set of terminal symbols that can start a string generated by X

  • Example

    starters[[single-Command]] = { Identifier, begin, let, if, while }

    • What about V-name vs. Identifier?

      • Both the assignment and the procedure call start with an Identifier, so when an Identifier is encountered, use lookahead to check for := or (.
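Starter sets can be computed mechanically. The sketch below does so for the Micro-English grammar from earlier in the presentation; the class name StarterSets, the map-based grammar encoding, and the restriction to grammars without empty alternatives are assumptions made for this example. It repeats a simple pass over all rules until no starter set grows any further.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class StarterSets {
    // Computes starters[[N]] for every nonterminal N.
    // Assumption: no alternative is empty, so only the first symbol of each alternative matters.
    static Map<String, Set<String>> compute(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> starters = new LinkedHashMap<>();
        for (String nonterminal : grammar.keySet()) {
            starters.put(nonterminal, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) { // iterate until no set grows
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : grammar.entrySet()) {
                Set<String> target = starters.get(rule.getKey());
                for (List<String> alternative : rule.getValue()) {
                    String first = alternative.get(0);
                    if (grammar.containsKey(first)) {
                        // Nonterminal: inherit its starters (copied, in case first is the rule itself).
                        changed |= target.addAll(new ArrayList<>(starters.get(first)));
                    } else {
                        // Terminal: it is its own starter.
                        changed |= target.add(first);
                    }
                }
            }
        }
        return starters;
    }

    // The Micro-English grammar, one entry per nonterminal, one inner list per alternative.
    static Map<String, List<List<String>>> microEnglish() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("Sentence", Arrays.asList(Arrays.asList("Subject", "Verb", "Object", ".")));
        g.put("Subject", Arrays.asList(
            Arrays.asList("I"), Arrays.asList("a", "Noun"), Arrays.asList("the", "Noun")));
        g.put("Object", Arrays.asList(
            Arrays.asList("me"), Arrays.asList("a", "Noun"), Arrays.asList("the", "Noun")));
        g.put("Noun", Arrays.asList(
            Arrays.asList("cat"), Arrays.asList("mat"), Arrays.asList("rat")));
        g.put("Verb", Arrays.asList(
            Arrays.asList("like"), Arrays.asList("is"), Arrays.asList("see"), Arrays.asList("sees")));
        return g;
    }

    public static void main(String[] args) {
        System.out.println(compute(microEnglish()).get("Sentence")); // [I, a, the]
    }
}
```

Here starters[[Sentence]] equals starters[[Subject]], namely { I, a, the }, because every sentence begins with a subject. Handling empty alternatives would additionally require tracking which nonterminals can generate the empty string.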


Mini triangle production rules

Mini-Triangle Production Rules

Program      ::= Command                                   Program                (1.14)

Command      ::= V-name := Expression                      AssignCommand          (1.15a)
               | Identifier ( Expression )                 CallCommand            (1.15b)
               | Command ; Command                         SequentialCommand      (1.15c)
               | if Expression then Command else Command   IfCommand              (1.15d)
               | while Expression do Command               WhileCommand           (1.15e)
               | let Declaration in Command                LetCommand             (1.15f)

Expression   ::= Integer-Literal                           IntegerExpression      (1.16a)
               | V-name                                    VnameExpression        (1.16b)
               | Operator Expression                       UnaryExpression        (1.16c)
               | Expression Operator Expression            BinaryExpression       (1.16d)

V-name       ::= Identifier                                SimpleVname            (1.17)

Declaration  ::= const Identifier ~ Expression             ConstDeclaration       (1.18a)
               | var Identifier : Type-denoter             VarDeclaration         (1.18b)
               | Declaration ; Declaration                 SequentialDeclaration  (1.18c)

Type-denoter ::= Identifier                                SimpleTypeDenoter      (1.19)


Abstract syntax trees

Abstract Syntax Trees

  • An explicit representation of the source program’s phrase structure

  • AST for Mini-Triangle


Abstract syntax trees1

Abstract Syntax Trees

  • Program ASTs (P):

    Program ::= Command                         Program            (1.14)

    [Figure: a Program node with a single child, C]

  • Command ASTs (C):

    [Figure: three trees. AssignCommand with children V and E (1.15a); CallCommand with children Identifier (a leaf carrying its spelling) and E (1.15b); SequentialCommand with children C1 and C2 (1.15c)]

    Command ::= V-name := Expression            AssignCommand      (1.15a)
              | Identifier ( Expression )       CallCommand        (1.15b)
              | Command ; Command               SequentialCommand  (1.15c)


Abstract syntax trees2

Abstract Syntax Trees

  • Command ASTs (C):

    [Figure: three trees. IfCommand with children E, C1, and C2 (1.15d); WhileCommand with children E and C (1.15e); LetCommand with children D and C (1.15f)]

    Command ::= if Expression then Command else Command   IfCommand     (1.15d)
              | while Expression do Command               WhileCommand  (1.15e)
              | let Declaration in Command                LetCommand    (1.15f)
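One way to represent these ASTs in Java is with one class per node label and one field per child. The classes below are a sketch for a few of the Mini-Triangle node kinds only; the class and field names (including the base class Ast and the demo class AstDemo) are assumptions for illustration and are not claimed to match the Triangle compiler's own source. The toString methods exist only so the tree can be printed.

```java
// Common superclass of all AST nodes (name assumed for this sketch).
abstract class Ast { }
abstract class Command extends Ast { }
abstract class Expression extends Ast { }

// Leaf node: an identifier together with its spelling.
class Identifier extends Ast {
    final String spelling;
    Identifier(String spelling) { this.spelling = spelling; }
    @Override public String toString() { return "Identifier(" + spelling + ")"; }
}

// V-name ::= Identifier                                (1.17)
class SimpleVname extends Ast {
    final Identifier identifier;
    SimpleVname(Identifier identifier) { this.identifier = identifier; }
    @Override public String toString() { return "SimpleVname(" + identifier + ")"; }
}

// Expression ::= Integer-Literal                       (1.16a)
class IntegerExpression extends Expression {
    final String spelling;
    IntegerExpression(String spelling) { this.spelling = spelling; }
    @Override public String toString() { return "IntegerExpression(" + spelling + ")"; }
}

// Expression ::= V-name                                (1.16b)
class VnameExpression extends Expression {
    final SimpleVname vname;
    VnameExpression(SimpleVname vname) { this.vname = vname; }
    @Override public String toString() { return "VnameExpression(" + vname + ")"; }
}

// Command ::= V-name := Expression                     (1.15a)
class AssignCommand extends Command {
    final SimpleVname vname;
    final Expression expression;
    AssignCommand(SimpleVname vname, Expression expression) {
        this.vname = vname;
        this.expression = expression;
    }
    @Override public String toString() {
        return "AssignCommand(" + vname + ", " + expression + ")";
    }
}

// Command ::= Command ; Command                        (1.15c)
class SequentialCommand extends Command {
    final Command first;
    final Command second;
    SequentialCommand(Command first, Command second) {
        this.first = first;
        this.second = second;
    }
    @Override public String toString() {
        return "SequentialCommand(" + first + ", " + second + ")";
    }
}

// Command ::= while Expression do Command              (1.15e)
class WhileCommand extends Command {
    final Expression condition;
    final Command body;
    WhileCommand(Expression condition, Command body) {
        this.condition = condition;
        this.body = body;
    }
    @Override public String toString() {
        return "WhileCommand(" + condition + ", " + body + ")";
    }
}

class AstDemo {
    // Builds, by hand, the AST for the command:  while b do n := 1
    static Command example() {
        return new WhileCommand(
            new VnameExpression(new SimpleVname(new Identifier("b"))),
            new AssignCommand(new SimpleVname(new Identifier("n")), new IntegerExpression("1")));
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

In a full parser, each parsing method would return the AST node for the phrase it has just parsed instead of returning nothing, so that parseCommand, for instance, would assemble a WhileCommand from the results of parsing its expression and its body.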

