## PowerPoint Slideshow about 'Parsing' - kuper


### Parsing

Programming Language Concepts

Lecture 6

Prepared by

Manuel E. Bermúdez, Ph.D.

Associate Professor

University of Florida

Context-Free Grammars

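- Definition: A context-free grammar (CFG) is a quadruple $G = (\Phi, \Sigma, P, S)$, where all productions are of the form $A \to \omega$, for $A \in \Phi$ and $\omega \in (\Phi \cup \Sigma)^*$. Here $\Phi$ is the set of nonterminals, $\Sigma$ the set of terminals, $P$ the set of productions, and $S \in \Phi$ the start symbol.
- Re-writing using grammar rules:
- $\beta A \gamma \Rightarrow \beta \omega \gamma$ if $A \to \omega$ (derivation).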
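The quadruple and the one-step re-writing relation can be modeled directly as data. The following is a minimal sketch under stated assumptions. The class `CFG`, its methods, and the toy grammar in the usage lines are hypothetical illustrations, not part of the lecture. Symbols are plain strings, and each right-hand side is a tuple of symbols.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset   # Φ
    terminals: frozenset      # Σ
    productions: tuple        # P: pairs (A, ω), ω a tuple of symbols
    start: str                # S

    def is_well_formed(self):
        """Every production has the form A → ω with A in Φ and ω over Φ ∪ Σ."""
        vocab = self.nonterminals | self.terminals
        return self.start in self.nonterminals and all(
            a in self.nonterminals and all(x in vocab for x in omega)
            for a, omega in self.productions)

    def derives_in_one_step(self, before, after):
        """True if before = βAγ and after = βωγ for some production A → ω."""
        before, after = tuple(before), tuple(after)
        for k, sym in enumerate(before):
            for a, omega in self.productions:
                if a == sym and before[:k] + omega + before[k + 1:] == after:
                    return True
        return False

# Hypothetical toy grammar: E → E + T | T,  T → i
G = CFG(frozenset({"E", "T"}), frozenset({"+", "i"}),
        (("E", ("E", "+", "T")), ("E", ("T",)), ("T", ("i",))), "E")
print(G.is_well_formed())                                      # True
print(G.derives_in_one_step(("E", "+", "T"), ("T", "+", "T")))  # True
print(G.derives_in_one_step(("E",), ("i",)))                    # False: two steps
```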

String Derivations

- Left-most derivation: At each step, the left-most nonterminal is re-written.
- Right-most derivation: At each step, the right-most nonterminal is re-written.
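The two derivation orders differ only in which nonterminal gets re-written at each step. Here is a small sketch, assuming a hypothetical toy grammar E → E + T | T, T → i stored as a dict. The function `derive` and its `choices` list (the index of the alternative used at each step) are illustrative names, not from the lecture.

```python
# Hypothetical toy grammar: E → E + T | T,  T → i
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("i",)],
}

def derive(choices, start="E", rightmost=False):
    """Re-write one nonterminal per step and return every sentential form.
    choices[k] is the index of the alternative applied at step k."""
    form = [start]
    forms = [" ".join(form)]
    for choice in choices:
        positions = [k for k, sym in enumerate(form) if sym in GRAMMAR]
        if not positions:
            raise ValueError("no nonterminal left to re-write")
        # Left-most derivation: first nonterminal; right-most: last one.
        pos = positions[-1] if rightmost else positions[0]
        rhs = GRAMMAR[form[pos]][choice]
        form = form[:pos] + list(rhs) + form[pos + 1:]
        forms.append(" ".join(form))
    return forms

print(derive([0, 1, 0, 0]))
# ['E', 'E + T', 'T + T', 'i + T', 'i + i']
print(derive([0, 0, 1, 0], rightmost=True))
# ['E', 'E + T', 'E + i', 'T + i', 'i + i']
```

Both derivations apply the same three productions to reach `i + i`; only the order differs.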

Derivation Trees

Derivation trees describe re-writes independently of the order (left-most or right-most) in which they are applied.

- Each tree branch matches a production rule in the grammar.

Derivation Trees

Notes:

- Leaves are terminals.
- Bottom contour is the sentence.
- Left recursion causes left branching.
- Right recursion causes right branching.
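The notes above can be checked on a concrete tree. This sketch uses a hypothetical `Node` class and helper names. It builds the derivation tree for `i + i + i` under the left-recursive rules E → E + T | T, T → i. It then reads off the bottom contour and prints a labeled-bracket form that shows the left branching.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    symbol: str
    children: list[Node] = field(default_factory=list)

def frontier(node):
    """The leaves, read left to right: the tree's bottom contour."""
    if not node.children:
        return [node.symbol]
    leaves = []
    for child in node.children:
        leaves.extend(frontier(child))
    return leaves

def bracket(node):
    """Labeled-bracket form, which makes the branching direction visible."""
    if not node.children:
        return node.symbol
    return "[" + node.symbol + " " + " ".join(bracket(c) for c in node.children) + "]"

def ti():
    return Node("T", [Node("i")])  # the subtree for T → i

# Derivation tree for i + i + i under E → E + T | T, T → i
tree = Node("E", [Node("E", [Node("E", [ti()]), Node("+"), ti()]), Node("+"), ti()])
print(" ".join(frontier(tree)))  # i + i + i
print(bracket(tree))             # [E [E [E [T i]] + [T i]] + [T i]]
```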

Goal of Parsing

- Examine input string, determine whether it's legal.
- Equivalent to building derivation tree.
- Added benefit: tree embodies syntactic structure of input.
- Therefore, tree should be unique.

Ambiguous Grammars

- Definition: A CFG is ambiguous if there exist two different right-most derivations, or two different left-most derivations, for some sentence z. (A left-most and a right-most derivation of the same tree do not count.)
- (Equivalent) Definition: A CFG is ambiguous if there exist two different derivation trees for some sentence z.

Ambiguous Grammars

Classic ambiguities:

- Simultaneous left/right recursion:
E → E + E

→ i

- Dangling else problem:
S → if E then S

→ if E then S else S

→
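The first classic ambiguity can be made concrete by counting derivation trees. The sketch below assumes the grammar E → E + E | i and a hypothetical helper `count_trees`. For each `+` that could be the top operator, it multiplies the counts for the left and right parts. It does not cover the dangling-else case.

```python
from functools import lru_cache

def count_trees(tokens):
    """Number of distinct derivation trees for tokens under E → E + E | i."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(lo, hi):
        # E → i covers exactly one token "i".
        count = 1 if hi - lo == 1 and tokens[lo] == "i" else 0
        # E → E + E: try every "+" as the top-level operator.
        for k in range(lo + 1, hi - 1):
            if tokens[k] == "+":
                count += trees(lo, k) * trees(k + 1, hi)
        return count

    return trees(0, len(tokens))

print(count_trees("i + i + i".split()))      # 2
print(count_trees("i + i + i + i".split()))  # 5
```

Any count above 1 is a witness of ambiguity. Here `i + i + i` already has two trees, one grouping to the left and one to the right.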

Operator Precedence and Associativity

- Let’s build a CFG for expressions consisting of:
- elementary identifier i.
- + and - (binary ops) have lowest precedence, and are left associative.
- * and / (binary ops) have middle precedence, and are right associative.
- + and - (unary ops) have highest precedence, and are right associative.

Corresponding Grammar for Expressions

E → E + T

→ E - T

→ T

(E consists of T's, separated by -'s and +'s; lowest precedence.)

T → F * T

→ F / T

→ F

(T consists of F's, separated by *'s and /'s; next precedence.)

F → - F

→ + F

→ P

(F consists of a single P, preceded by +'s and -'s; next precedence.)

P → '(' E ')'

→ i

(P consists of a parenthesized E, or a single i; highest precedence.)

Operator Precedence and Associativity

- Operator precedence:
- The lower in the grammar, the higher the precedence.

- Operator Associativity:
- Tie breaker for operators of equal precedence.
- Left recursion in the grammar means
- left associativity of the operator,
- left branching in the tree.

- Right recursion in the grammar means
- right associativity of the operator,
- right branching in the tree.
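Associativity is not cosmetic: it changes values. The sketch below uses arbitrary sample numbers and hypothetical helper names. It folds a list of operands the two ways the grammar can group them. The lecture's grammar makes / right associative (T → F / T), so `i / i / i` groups as `i / (i / i)`.

```python
import operator
from functools import reduce

def left_assoc(op, operands):
    """((a op b) op c) ...: the grouping produced by left recursion (E → E - T)."""
    return reduce(op, operands)

def right_assoc(op, operands):
    """a op (b op (c ...)): the grouping produced by right recursion (T → F / T)."""
    return reduce(lambda acc, x: op(x, acc), reversed(operands))

print(left_assoc(operator.sub, [8, 4, 2]))       # 2    (8 - 4) - 2
print(right_assoc(operator.sub, [8, 4, 2]))      # 6    8 - (4 - 2)
print(left_assoc(operator.truediv, [8, 4, 2]))   # 1.0  (8 / 4) / 2
print(right_assoc(operator.truediv, [8, 4, 2]))  # 4.0  8 / (4 / 2)
```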

Building Derivation Trees

Sample Input: - + i - i * ( i + i ) / i + i

(Human) derivation tree construction:

- Bottom-up.
- On each pass, scan entire expression, process operators with highest precedence (parentheses are highest).
- Lowest precedence operators are last, at the top of tree.
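A mechanical counterpart to the hand construction is a recursive-descent parser that fully parenthesizes its input, grouping exactly as the expression grammar does. This is a sketch, not the lecture's method. The name `parenthesize` is hypothetical, and tokens are assumed to be blank-separated. A recursive-descent procedure for E → E + T would call itself forever, so that left-recursive rule is rewritten as a loop that keeps wrapping the tree built so far. The loop preserves the left grouping.

```python
def parenthesize(text):
    """Fully parenthesize an expression, grouping exactly as the grammar does."""
    tokens = text.split()  # assumes tokens are separated by blanks
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expect(tok):
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        advance()

    def parse_E():
        # E → E + T | E - T | T. Left recursion becomes a loop; each pass
        # wraps the tree so far: left branching, left associativity.
        left = parse_T()
        while peek() in ("+", "-"):
            op = advance()
            left = f"({left} {op} {parse_T()})"
        return left

    def parse_T():
        # T → F * T | F / T | F. Right recursion: right associativity.
        left = parse_F()
        if peek() in ("*", "/"):
            op = advance()
            return f"({left} {op} {parse_T()})"
        return left

    def parse_F():
        # F → - F | + F | P. Unary operators, highest operator precedence.
        if peek() in ("+", "-"):
            op = advance()
            return f"({op}{parse_F()})"
        return parse_P()

    def parse_P():
        # P → '(' E ')' | i
        if peek() == "(":
            advance()
            inner = parse_E()
            expect(")")
            return inner
        expect("i")
        return "i"

    result = parse_E()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result

print(parenthesize("- + i - i * ( i + i ) / i + i"))
# (((-(+i)) - (i * ((i + i) / i))) + i)
```

The outermost pair of parentheses belongs to the last binary +, the lowest-precedence operator, which sits at the top of the tree.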

Abstract Syntax Trees

- AST is a condensed version of the derivation tree.
- No noise (intermediate nodes).
- String-to-tree transduction grammar:
- rules of the form A → ω => 's'.

- Build 's' tree node, with one child per tree from each nonterminal in ω.

Example

E → E + T => +

→ E - T => -

→ T

T → F * T => *

→ F / T => /

→ F

F → - F => neg

→ + F => +

→ P

P → '(' E ')'

→ i => i

Sample Input: - + i - i * ( i + i ) / i + i

String-to-Tree Transduction

- We transduce from the vocabulary of input symbols to the vocabulary of tree node names.
- Could eliminate construction of unary + node, anticipating semantics.
F → - F => neg

→ + F // no more unary + node

→ P
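The transduction grammar can be run as a parser that emits tree nodes instead of strings. This is a minimal sketch. The names `build_ast` and `keep_unary_plus` are hypothetical. Trees are assumed to be tuples `(node_name, child, ...)`, and the leaf `i` is the plain string `"i"`. Chain rules (E → T, T → F, F → P) and P → '(' E ')' return their child unchanged, so no noise nodes appear. Setting `keep_unary_plus=False` drops the unary + node, as in the variant above. Unary and binary `+` share the node name `+` and differ only in arity.

```python
def build_ast(text, keep_unary_plus=True):
    tokens = text.split()  # assumes blank-separated tokens
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def E():
        # E → E + T => +  |  E - T => -  |  T   (left recursion as a loop)
        node = T()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, T())
        return node

    def T():
        # T → F * T => *  |  F / T => /  |  F
        node = F()
        if peek() in ("*", "/"):
            op = advance()
            return (op, node, T())
        return node

    def F():
        # F → - F => neg  |  + F => + (optional)  |  P
        if peek() == "-":
            advance()
            return ("neg", F())
        if peek() == "+":
            advance()
            child = F()
            return ("+", child) if keep_unary_plus else child
        return P()

    def P():
        # P → '(' E ')' builds no node; P → i => i builds a leaf.
        if peek() == "(":
            advance()
            node = E()
            advance(")")
            return node
        advance("i")
        return "i"

    tree = E()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree

print(build_ast("- + i - i * ( i + i ) / i + i"))
# ('+', ('-', ('neg', ('+', 'i')), ('*', 'i', ('/', ('+', 'i', 'i'), 'i'))), 'i')
print(build_ast("- + i - i * ( i + i ) / i + i", keep_unary_plus=False))
# ('+', ('-', ('neg', 'i'), ('*', 'i', ('/', ('+', 'i', 'i'), 'i'))), 'i')
```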

The Game of Syntactic Dominoes

- The grammar:
E → E+T

→ T

T → P*T

→ P

P → (E)

→ i

- The playing pieces: an arbitrary supply of each piece (one kind of piece per grammar rule).
- The game board:
- Start domino at the top.
- Bottom dominoes are the "input."

The Game of Syntactic Dominoes

- Game rules:
- Add game pieces to the board.
- Match the flat parts and the symbols.
- Lines are infinitely elastic.

- Object of the game:
- Connect start domino with the input dominoes.
- Leave no unmatched flat parts.

Parsing Strategies

- Same as for the game of syntactic dominoes.
- “Top-down” parsing: start at the start symbol, work toward the input string.
- “Bottom-up” parsing: start at the input string, work towards the goal symbol.

- In either strategy, the input can be processed left-to-right or right-to-left.

Top-Down Parsing

- Attempt a left-most derivation, by predicting the re-write that will match the remaining input.
- Use a string (a stack, really) from which the input can be derived.

Top-Down Parsing

Start with S on the stack.

At every step, two alternatives:

- α (the stack) begins with a terminal t. Match t against the first input symbol.
- α begins with a nonterminal A. Consult an OPF (Omniscient Parsing Function) to determine which production for A would lead to a match with the first symbol of the input.
The OPF does the “predicting” in such a predictive parser.

Classical Top-Down Parsing Algorithm

Push (Stack, S);

while not Empty (Stack) do

if Top(Stack) ∈ Σ

then if Top(Stack) = Head(input)

then input := tail(input)

Pop(Stack)

else error (Stack, input)

else P:= OPF (Stack, input)

Push (Pop(Stack), RHS(P))

od
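The classical algorithm becomes runnable once the OPF is a finite table indexed by (top nonterminal, first input symbol). The sketch below is under stated assumptions. The table entries are derived for the grammar S → A, A → bAd | ε from the LL(1) example later in this section. The end-of-input marker `⊥` is appended to the input. The names `OPF`, `NONTERMINALS` and `parse` are hypothetical. Unlike the pseudocode, it also checks that all input has been consumed when the stack empties.

```python
END = "⊥"  # end-of-input marker appended to the input

# The OPF as a table: (top nonterminal, first input symbol) -> RHS to push.
# Entries follow the Select sets of the grammar S → A, A → bAd | ε.
OPF = {
    ("S", "b"): ["A"],
    ("S", END): ["A"],
    ("A", "b"): ["b", "A", "d"],
    ("A", "d"): [],
    ("A", END): [],
}
NONTERMINALS = {"S", "A"}

def parse(text, start="S"):
    tokens = list(text) + [END]       # one character per input symbol
    stack = [start]                   # Push(Stack, S); top is stack[-1]
    pos = 0
    while stack:
        top, head = stack[-1], tokens[pos]
        if top not in NONTERMINALS:   # top is a terminal
            if top != head:
                return False          # error(Stack, input)
            pos += 1                  # input := tail(input)
            stack.pop()
        else:
            rhs = OPF.get((top, head))  # P := OPF(Stack, input)
            if rhs is None:
                return False          # no production predicts a match
            stack.pop()
            stack.extend(reversed(rhs))  # leftmost RHS symbol ends on top
    return tokens[pos] == END         # accept only if the input is used up

print(parse("bbdd"))  # True
print(parse("bbd"))   # False
```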

Top-Down Parsing

- Most parsing methods impose bounds on the amount of stack lookback and input lookahead. For programming languages, a common choice is (1,1): one stack symbol and one input symbol.
- We must define OPF (A,t), where A is the top element of the stack, and t is the first symbol on the input.
- Storage requirements: O(n²), where n is the size of the grammar vocabulary (a few hundred).

LL(1) Grammars

Definition:

A CFG G is LL(1) (Left-to-right, Left-most, one-symbol lookahead)

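iff for all $A \in \Phi$, and for all $A \to \alpha$, $A \to \beta$ with $\alpha \neq \beta$,

$\mathrm{Select}(A \to \alpha) \cap \mathrm{Select}(A \to \beta) = \emptyset$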

- Previous example: Grammar is not LL(1).
- More later on why, and what to do about it.

Example:

S → A {b, ⊥}

A → bAd {b}

→ ε {d, ⊥}

Disjoint!

Grammar is LL(1)!

(At most) one production per entry.
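The Select sets above can be computed mechanically. The procedure finds First and Follow sets by fixed-point iteration. Select(A → ω) is then First(ω), plus Follow(A) when ω can derive ε. The following is a minimal sketch. The function names, the tuple encoding of productions, and the markers `ε`/`⊥` as plain strings are assumptions of the sketch. The second grammar in the test is the dominoes grammar, which confirms it is not LL(1).

```python
EPS = "ε"  # marks that a symbol string can derive the empty string
END = "⊥"  # end-of-input marker

def first_of(symbols, first, nonterminals):
    """First set of a symbol string; contains EPS if the whole string can vanish."""
    result = set()
    for sym in symbols:
        if sym not in nonterminals:
            result.add(sym)            # a terminal starts the string
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result              # sym cannot vanish: stop here
    result.add(EPS)
    return result

def select_sets(productions, start):
    """Return (lhs, rhs, Select set) for each production, in order."""
    nonterminals = {lhs for lhs, _ in productions}
    first = {a: set() for a in nonterminals}
    follow = {a: set() for a in nonterminals}
    follow[start].add(END)
    changed = True
    while changed:                     # iterate until no set grows
        changed = False
        for lhs, rhs in productions:
            f = first_of(rhs, first, nonterminals)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            for k, sym in enumerate(rhs):
                if sym not in nonterminals:
                    continue
                rest = first_of(rhs[k + 1:], first, nonterminals)
                new = rest - {EPS}
                if EPS in rest:        # sym can end the RHS
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    result = []
    for lhs, rhs in productions:
        f = first_of(rhs, first, nonterminals)
        sel = f - {EPS}
        if EPS in f:                   # RHS can vanish: add Follow(lhs)
            sel |= follow[lhs]
        result.append((lhs, rhs, sel))
    return result

def is_ll1(productions, start):
    """LL(1) iff the Select sets of each nonterminal's alternatives are disjoint."""
    seen = {}
    for lhs, _, sel in select_sets(productions, start):
        taken = seen.setdefault(lhs, set())
        if taken & sel:
            return False
        taken |= sel
    return True

EXAMPLE = [("S", ("A",)), ("A", ("b", "A", "d")), ("A", ())]
for lhs, rhs, sel in select_sets(EXAMPLE, "S"):
    print(lhs, "→", " ".join(rhs) or EPS, sorted(sel))
print(is_ll1(EXAMPLE, "S"))  # True
```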
