Parsing


# Parsing - PowerPoint PPT Presentation

Parsing. Programming Language Concepts Lecture 6. Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida. Context-Free Grammars.


### Parsing

Programming Language Concepts

Lecture 6

Prepared by

Manuel E. Bermúdez, Ph.D.

Associate Professor

University of Florida

Context-Free Grammars
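• Definition: A context-free grammar (CFG) is a quadruple G = (Φ, Σ, P, S), where Φ is the set of nonterminals, Σ the set of terminals, P the set of productions, and S ∈ Φ the start symbol. All productions are of the form A → α, for A ∈ Φ and α ∈ (Φ ∪ Σ)*.
• Re-writing using grammar rules:
• βAγ => βαγ if A → α (derivation).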
String Derivations
• Left-most derivation: At each step, the left-most nonterminal is re-written.
• Right-most derivation: At each step, the right-most nonterminal is re-written.
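
The two derivation orders can be sketched in a few lines of Python. This is only an illustration under assumptions: the toy grammar E → E + T | T, T → i and the helper `derive` are hypothetical, and each derivation is given as a list of production indices to apply.

```python
# Toy grammar (an assumption for illustration): E -> E + T | T,  T -> i
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["i"]],
}

def derive(start, choices, leftmost=True):
    """Apply one production per step to the left-most (or right-most) nonterminal."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Positions of all nonterminals in the current sentential form.
        positions = [k for k, sym in enumerate(form) if sym in GRAMMAR]
        pos = positions[0] if leftmost else positions[-1]
        rhs = GRAMMAR[form[pos]][choice]
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(" ".join(form))
    return steps

left = derive("E", [0, 1, 0, 0], leftmost=True)
right = derive("E", [0, 0, 1, 0], leftmost=False)
print(" => ".join(left))   # E => E + T => T + T => i + T => i + i
print(" => ".join(right))  # E => E + T => E + i => T + i => i + i
```

Both runs end in the same sentence; only the order in which nonterminals are re-written differs.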
Derivation Trees

Derivation trees:

Describe re-writes, independently of the order (left-most or right-most).

• Each tree branch matches a production rule in the grammar.
Derivation Trees

Notes:

• Leaves are terminals.
• Bottom contour is the sentence.
• Left recursion causes left branching.
• Right recursion causes right branching.
Goal of Parsing
• Examine input string, determine whether it's legal.
• Equivalent to building derivation tree.
• Added benefit: tree embodies syntactic structure of input.
• Therefore, tree should be unique.
Ambiguous Grammars
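• Definition: A CFG is ambiguous if there exist two different right-most derivations (or, equivalently, two different left-most derivations) for some sentence z. Both derivations must be of the same kind: one left-most and one right-most derivation of the same tree do not count as two.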
• (Equivalent) Definition: A CFG is ambiguous if there exist two different derivation trees for some sentence z.
Ambiguous Grammars

Classic ambiguities:

• Simultaneous left/right recursion:

E → E + E

→ i

• Dangling else problem:

S → if E then S

→ if E then S else S
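
To make the first ambiguity concrete, the sketch below counts the derivation trees of a sentence under E → E + E | i. The function name `count_trees` and the token-list input format are assumptions made for this illustration; a count above 1 means the sentence is ambiguous.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count derivation trees for tokens under E -> E + E | i."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways E derives tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "i" else 0
        total = 0
        # Try every '+' as the operator of the root production E -> E + E.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] == "+":
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_trees("i + i".split()))      # 1
print(count_trees("i + i + i".split()))  # 2
```

The sentence i + i + i has two trees, one branching left and one branching right, which is exactly the simultaneous left/right recursion problem.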

Operator Precedence and Associativity
• Let’s build a CFG for expressions consisting of:
• elementary identifier i.
• + and - (binary ops) have lowest precedence, and are left associative.
• * and / (binary ops) have middle precedence, and are right associative.
• + and - (unary ops) have highest precedence, and are right associative.
Corresponding Grammar for Expressions

E → E + T

→ E - T

→ T

E consists of T's, separated by -'s and +'s (lowest precedence).

T → F * T

→ F / T

→ F

T consists of F's, separated by *'s and /'s (next precedence).

F → - F

→ + F

→ P

F consists of a single P, preceded by +'s and -'s (next precedence).

P → '(' E ')'

→ i

P consists of a parenthesized E, or a single i (highest precedence).

Operator Precedence and Associativity
• Operator precedence:
• The lower in the grammar, the higher the precedence.
• Operator Associativity:
• Tie breaker for precedence.
• Left recursion in the grammar means
• left associativity of the operator,
• left branching in the tree.
• Right recursion in the grammar means
• right associativity of the operator,
• right branching in the tree.
Building Derivation Trees

Sample Input :

- + i - i * ( i + i ) / i + i

(Human) derivation tree construction:

• Bottom-up.
• On each pass, scan entire expression, process operators with highest precedence (parentheses are highest).
• Lowest precedence operators are last, at the top of tree.
Abstract Syntax Trees
• AST is a condensed version of the derivation tree.
• No noise (intermediate nodes).
• String-to-tree transduction grammar:
• rules of the form A → ω => 's'.
• Build 's' tree node, with one child per tree from each nonterminal in ω.
Example

E → E + T => +

→ E - T => -

→ T

T → F * T => *

→ F / T => /

→ F

F → - F => neg

→ + F => +

→ P

P → '(' E ')'

→ i => i
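
One way to see this transduction grammar at work is a small hand-written recursive-descent parser that returns the AST as nested tuples. This is a sketch, not the method used in the lecture: the name `parse_ast`, the tuple representation, and the use of a loop to handle the left-recursive E rules are all assumptions of this example.

```python
def parse_ast(tokens):
    """Return an AST (nested tuples) for a token list, following the rules above."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_E():
        # E -> E + T | E - T | T: left recursion becomes a loop (left-associative).
        node = parse_T()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            node = (op, node, parse_T())
        return node

    def parse_T():
        # T -> F * T | F / T | F: right recursion (right-associative).
        left = parse_F()
        if peek() in ("*", "/"):
            op = peek()
            advance()
            return (op, left, parse_T())
        return left

    def parse_F():
        # F -> - F => neg | + F => + | P
        if peek() == "-":
            advance()
            return ("neg", parse_F())
        if peek() == "+":
            advance()
            return ("+", parse_F())
        return parse_P()

    def parse_P():
        # P -> '(' E ')' builds no node of its own; P -> i builds the leaf 'i'.
        if peek() == "(":
            advance()
            node = parse_E()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if peek() == "i":
            advance()
            return "i"
        raise SyntaxError(f"unexpected token {peek()!r}")

    tree = parse_E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_ast("i - i - i".split()))  # ('-', ('-', 'i', 'i'), 'i')
print(parse_ast("i * i * i".split()))  # ('*', 'i', ('*', 'i', 'i'))
```

The two printed trees show the associativity rules directly: the binary - nests to the left, while * nests to the right. Parentheses leave no trace in the AST.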

String-to-Tree Transduction
• We transduce from vocabulary of input symbols, to vocabulary of tree node names.
• Could eliminate construction of unary + node, anticipating semantics.

F → - F => neg

→ + F // no more unary + node

→ P

The Game of Syntactic Dominoes
• The grammar:

E → E+T

→ T

T → P*T

→ P

P → (E)

→ i

• The playing pieces: An arbitrary supply of each piece (one per grammar rule).
• The game board:
• Start domino at the top.
• Bottom dominoes are the "input."
The Game of Syntactic Dominoes
• Game rules:
• Add game pieces to the board.
• Match the flat parts and the symbols.
• Lines are infinitely elastic.
• Object of the game:
• Connect start domino with the input dominoes.
• Leave no unmatched flat parts.
Parsing Strategies
• Same as for the game of syntactic dominoes.
• “Top-down” parsing: start at the start symbol, work toward the input string.
• “Bottom-up” parsing: start at the input string, work towards the goal symbol.
• In either strategy, the input can be processed left-to-right or right-to-left.
Top-Down Parsing
• Attempt a left-most derivation, by predicting the re-write that will match the remaining input.
• Use a string (a stack, really) from which the input can be derived.
Top-Down Parsing

At every step, two alternatives:

• The string (the stack) begins with a terminal t. Match t against the first input symbol.
• The string (the stack) begins with a nonterminal A. Consult an OPF (Omniscient Parsing Function) to determine which production for A would lead to a match with the first symbol of the input.

The OPF does the “predicting” in such a predictive parser.

Classical Top-Down Parsing Algorithm

Push (Stack, S);
while not Empty (Stack) do
  if Top(Stack) ∈ Σ   (the top of the stack is a terminal)
  then if Top(Stack) = Head(input)
       then input := tail(input)
            Pop(Stack)
       else error (Stack, input)
  else P := OPF (Stack, input)
       Push (Pop(Stack), RHS(P))
od
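
The loop above can be tried out in Python if the OPF is replaced by a lookup table. Everything specific here is an assumption for illustration: the toy grammar S → b S d | (empty), the end marker `$`, and the names `OPF` and `predictive_parse` are not from the slides.

```python
END = "$"  # hypothetical end-of-input marker

# Hand-written OPF table for the toy grammar  S -> b S d | (empty)
OPF = {
    ("S", "b"): ["b", "S", "d"],
    ("S", "d"): [],
    ("S", END): [],
}
NONTERMINALS = {"S"}

def predictive_parse(tokens, start="S"):
    """Return the productions of a left-most derivation, or raise SyntaxError."""
    remaining = list(tokens) + [END]
    stack = [END, start]  # the top of the stack is the end of the list
    used = []
    while stack:
        top = stack.pop()
        lookahead = remaining[0]
        if top not in NONTERMINALS:
            # Terminal on top: it must match the first input symbol.
            if top != lookahead:
                raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
            remaining.pop(0)
        else:
            # Nonterminal on top: the table predicts which production to use.
            rhs = OPF.get((top, lookahead))
            if rhs is None:
                raise SyntaxError(f"no production for {top!r} on {lookahead!r}")
            used.append((top, rhs))
            stack.extend(reversed(rhs))  # push RHS so its first symbol is on top
    if remaining:
        raise SyntaxError("input continues after the end marker")
    return used

for production in predictive_parse(list("bbdd")):
    print(production)
```

Unlike the pseudocode, this sketch pops the top symbol first and then pushes the right-hand side in reverse, which has the same effect as Push (Pop(Stack), RHS(P)).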

Top-Down Parsing
• Most parsing methods impose bounds on the amount of stack lookback and input lookahead. For programming languages, a common choice is (1,1).
• We must define OPF (A,t), where A is the top element of the stack, and t is the first symbol on the input.
• Storage requirements: O(n²), where n is the size of the grammar vocabulary (a few hundred).

LL(1) Grammars

Definition:

A CFG G is LL(1) (Left-to-right, Left-most, one-symbol lookahead)

iff for all nonterminals A, and for all A → α, A → β with α ≠ β,

Select (A → α) ∩ Select (A → β) = ∅

• Previous example: Grammar is not LL(1).
• More later on why, and what to do about it.
Example:

S → A {b,}

→ {d, }

Disjoint!

Grammar is LL(1)!

(At most) one production per entry.
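
The disjointness test and the "one production per entry" property can be checked mechanically once the Select sets are known. The sketch below assumes the Select sets are supplied by hand; the helper names, the `$` end marker, and the toy grammar S → b S d | (empty) are hypothetical. The second table uses the E rules of the dominoes grammar, whose Select sets are both {(, i}.

```python
from itertools import combinations

def is_ll1(select):
    """select maps (nonterminal, rhs) -> set of lookahead symbols."""
    by_lhs = {}
    for (lhs, _rhs), symbols in select.items():
        by_lhs.setdefault(lhs, []).append(symbols)
    # LL(1): Select sets of any two productions for the same nonterminal are disjoint.
    return all(
        not (s1 & s2)
        for sets in by_lhs.values()
        for s1, s2 in combinations(sets, 2)
    )

def build_table(select):
    """Build the parse table: (nonterminal, lookahead) -> list of candidate right-hand sides."""
    table = {}
    for (lhs, rhs), symbols in select.items():
        for t in symbols:
            table.setdefault((lhs, t), []).append(rhs)
    return table

# Hand-computed Select sets (assumptions of this example).
TOY_SELECT = {
    ("S", ("b", "S", "d")): {"b"},
    ("S", ()): {"d", "$"},
}
EXPR_SELECT = {
    ("E", ("E", "+", "T")): {"(", "i"},
    ("E", ("T",)): {"(", "i"},
}

print(is_ll1(TOY_SELECT))   # True
print(is_ll1(EXPR_SELECT))  # False
```

For the toy grammar every table entry holds exactly one production. For the E rules the entries for ( and i each hold two, which is why the left-recursive expression grammar is not LL(1).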
