Context-Free Grammars

Lecture 7

http://webwitch.dreamhost.com/grammar.girl/

Outline
  • Scanner vs. parser
    • Why regular expressions are not enough
  • Grammars (context-free grammars)
    • grammar rules
    • derivations
    • parse trees
    • ambiguous grammars
    • useful examples
  • Reading:
    • Sections 4.1 and 4.2

CS 536 Fall 2000

The Functionality of the Parser
  • Input: sequence of tokens from lexer
  • Output: parse tree of the program
    • parse tree is generated if the input is a legal program
    • if input is an illegal program, syntax errors are issued
  • Note:
    • Instead of a parse tree, some parsers directly produce:
      • abstract syntax tree (AST) + symbol table (as in P3), or
      • intermediate code, or
      • object code
    • In the following lectures, we’ll assume that a parse tree is generated.

Example
  • The program:
    • x * y + z
  • Input to parser:
    • ID TIMES ID PLUS ID
    • we’ll write tokens as follows:
    • id * id + id
  • Output of parser:
    • the parse tree (each node's children are listed left to right beneath it):
      • E
        • E
          • E
            • id
          • *
          • E
            • id
        • +
        • E
          • id
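The step from program text to parser input can be sketched as a tiny scanner. This is only an illustration: the token names ID, TIMES and PLUS come from the slide, but the patterns, the `TOKEN_SPEC` table and the `tokenize` function are assumptions made for this sketch.

```python
import re

# Hypothetical token table; the names match the slide's ID / TIMES / PLUS.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_]\w*"),
    ("TIMES", r"\*"),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return the list of token names for text, ignoring whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append(m.lastgroup)
        pos = m.end()
    return tokens

print(" ".join(tokenize("x * y + z")))  # prints: ID TIMES ID PLUS ID
```

The parser never sees the names x, y and z here, only the token kinds.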

Why are regular expressions not enough?

TEST YOURSELF #1

  • Write an automaton that accepts strings
    • “a”, “(a)”, “((a))”, and “(((a)))”
    • “a”, “(a)”, “((a))”, “(((a)))”, … “(^k a )^k” (k opening parentheses, then a, then k closing parentheses, for any k ≥ 0)

Why are regular expressions not enough?

TEST YOURSELF #2

  • What programs are generated by the following regular expression?

digit+ ( ( “+” | “-” | “*” | “/” ) digit+ )*

  • What important properties does this regular expression fail to express?
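One way to experiment with this question is to try the regular expression on a few inputs. The sketch below transcribes it into Python's `re` syntax; the name `matches` and the sample strings are assumptions chosen for illustration.

```python
import re

# digit+ ( ( "+" | "-" | "*" | "/" ) digit+ )*
EXPR = re.compile(r"\d+(?:[+\-*/]\d+)*")

def matches(s):
    """True if the whole string s is generated by the regular expression."""
    return EXPR.fullmatch(s) is not None

for s in ["1+2*3", "42", "(1+2)*3", "1+"]:
    print(s, matches(s))
```

The flat sequences are accepted and the parenthesized one is rejected. Even for an accepted string, the match says nothing about which operator applies first.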

What must the parser do?
  • Recognizer: not all strings of tokens are programs
    • must distinguish between valid and invalid strings of tokens
  • Translator: must expose program structure
    • e.g., associativity and precedence
    • hence must return the parse tree

We need:

    • A language for describing valid strings of tokens
      • context-free grammars
      • (analogous to regular expressions in the scanner)
    • A method for distinguishing valid from invalid strings of tokens (and for building the parse tree)
      • the parser
      • (analogous to the state machine in the scanner)

We need context-free grammars (CFGs)
  • Example: Simple Arithmetic Expressions
    • In English:
      • An integer is an arithmetic expression.
      • If exp1 and exp2 are arithmetic expressions, then so are the following:

exp1 - exp2

exp1 / exp2

( exp1 )

  • the corresponding CFG (left); we’ll write tokens as follows (right):

exp → INTLITERAL            E → intlit

exp → exp MINUS exp         E → E - E

exp → exp DIVIDE exp        E → E / E

exp → LPAREN exp RPAREN     E → ( E )

Reading the CFG
  • The grammar has five terminal symbols:
    • intlit, -, /, (, )
    • terminals of a grammar = tokens returned by the scanner.
  • The grammar has one non-terminal symbol:
    • E
    • non-terminals describe valid sequences of tokens
  • The grammar has four productions or rules,
    • each of the form: E → α
      • left-hand side = a single non-terminal.
      • right-hand side (α) = either
        • a sequence of one or more terminals and/or non-terminals, or
        • ε (an empty production); again, the book uses a different symbol
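The counts above can be checked mechanically. A minimal sketch, assuming productions are stored as (left-hand side, right-hand side) pairs; the name `PRODUCTIONS` and this representation are choices made for the example only.

```python
# Each production is (left-hand side, right-hand side as a tuple of symbols).
PRODUCTIONS = [
    ("E", ("intlit",)),
    ("E", ("E", "-", "E")),
    ("E", ("E", "/", "E")),
    ("E", ("(", "E", ")")),
]

# Non-terminals are exactly the symbols that appear on a left-hand side.
nonterminals = {lhs for lhs, _ in PRODUCTIONS}
# Every other symbol used on a right-hand side is a terminal.
terminals = {sym for _, rhs in PRODUCTIONS for sym in rhs} - nonterminals

print(sorted(nonterminals))
print(sorted(terminals))
print(len(PRODUCTIONS))
```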

Example, revisited
  • Note:
    • a more compact way to write previous grammar:

E → intlit | E - E | E / E | ( E )

or

E → intlit

| E - E

| E / E

| ( E )

A formal definition of CFGs
  • A CFG consists of
    • A set of terminals T
    • A set of non-terminals N
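    • A start symbol S (a non-terminal)
    • A set of productions, each of the form X → Y1 Y2 … Yn, where X ∈ N and each Yi ∈ N ∪ T (n = 0 gives the empty production X → ε)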

Notational Conventions
  • In these lecture notes
    • Non-terminals are written upper-case
    • Terminals are written lower-case
    • The start symbol is the left-hand side of the first production

The Language of a CFG

The language defined by a CFG is the set of strings that can be derived from the start symbol of the grammar.

Derivation: Read productions as rules:

X → Y1 … Yn means X can be replaced by Y1 … Yn

Derivation: key idea
  1. Begin with a string consisting of the start symbol “S”
  2. Replace any non-terminal X in the string by the right-hand side of some production for X
  3. Repeat (2) until there are no non-terminals in the string
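The three steps can be replayed in code. A minimal sketch, assuming the expression grammar E → id | E + E | E * E | ( E ) with start symbol E; the names `GRAMMAR`, `step` and `leftmost_nonterminal`, and the particular sequence of choices, are assumptions for illustration.

```python
GRAMMAR = {
    "E": [("id",), ("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")")],
}

def step(form, index, rhs):
    """Replace the non-terminal at form[index] by rhs (one derivation step)."""
    symbol = form[index]
    if rhs not in GRAMMAR.get(symbol, []):
        raise ValueError(f"no production {symbol} -> {' '.join(rhs)}")
    return form[:index] + rhs + form[index + 1:]

def leftmost_nonterminal(form):
    """Index of the first non-terminal in form, or None if there is none."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            return i
    return None

form = ("E",)                       # step 1: start symbol
choices = [("E", "+", "E"), ("E", "*", "E"), ("id",), ("id",), ("id",)]
trace = [form]
for rhs in choices:                 # steps 2 and 3: replace until done
    form = step(form, leftmost_nonterminal(form), rhs)
    trace.append(form)

for f in trace:
    print(" ".join(f))
```

The sketch always picks the left-most non-terminal, but step 2 allows any non-terminal to be replaced.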

Derivation: an example

CFG:

E → id

E → E + E

E → E * E

E → ( E )

Is the string id * id + id in the language defined by the grammar?

derivation:

E ⇒ E + E ⇒ E * E + E ⇒ id * E + E ⇒ id * id + E ⇒ id * id + id

Terminals
  • Terminals are so called because there are no rules for replacing them
  • Once generated, terminals are permanent
  • Therefore, terminals are the tokens of the language

The Language of a CFG (Cont.)

More formally, write

X1 … Xi … Xn ⇒ X1 … Xi-1 Y1 … Ym Xi+1 … Xn

if there is a production

Xi → Y1 … Ym

The Language of a CFG (Cont.)

Write

X1 … Xn ⇒* Y1 … Ym

if

X1 … Xn ⇒ … ⇒ Y1 … Ym

in 0 or more steps

The Language of a CFG

Let G be a context-free grammar with start symbol S. Then the language of G is:

L(G) = { a1 … an | S ⇒* a1 … an and every ai is a terminal }
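For a small grammar, the short strings of its language can be enumerated by searching over derivations. A minimal sketch, assuming the expression grammar E → id | E + E | E * E | ( E ); the function `language_up_to` and its length bound are assumptions made so that the search terminates.

```python
from collections import deque

GRAMMAR = {
    "E": [("id",), ("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")")],
}
START = "E"

def language_up_to(max_len):
    """All terminal strings of at most max_len tokens derivable from START.

    Only the left-most non-terminal is expanded; that is enough, because
    every derivable string has a left-most derivation. Pruning by length is
    safe here because no production of this grammar shrinks a string
    (there are no empty productions).
    """
    found = set()
    seen = {(START,)}
    queue = deque([(START,)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if idx is None:                  # only terminals left: a member
            found.add(" ".join(form))
            continue
        for rhs in GRAMMAR[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

print(sorted(language_up_to(3)))
print("id * id + id" in language_up_to(5))
```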

Examples

Strings of balanced parentheses

The grammar:

same as

Arithmetic Example

Simple arithmetic expressions:

Some elements of the language:

Notes

The idea of a CFG is a big step. But:

  • Membership in a language is “yes” or “no”
    • we also need parse tree of the input!
    • furthermore, we must handle errors gracefully
  • Need an “implementation” of CFG’s,
    • i.e. the parser
    • we’ll create the parser using a parser generator
      • available generators: CUP, bison, yacc

More Notes
  • Form of the grammar is important
    • Many grammars generate the same language
    • Parsers are sensitive to the form of the grammar
  • Example:

E → E + E

| E - E

| intlit

is not suitable for an LL(1) parser (a common kind of parser).

Stay tuned, you will soon understand why.

Derivations and Parse Trees

A derivation is a sequence of production applications: S ⇒ … ⇒ …

A derivation can be drawn as a tree

  • Start symbol is the tree’s root
  • For a production X → Y1 … Yn, add children Y1, …, Yn to node X

Derivation Example
  • Grammar: E → E + E | E * E | ( E ) | id
  • String: id * id + id

Derivation Example (Cont.)

Parse tree (each node's children are listed left to right beneath it):

  • E
    • E
      • E
        • id
      • *
      • E
        • id
    • +
    • E
      • id

Derivation in Detail (1)

id * id + id

E

Notes on Derivations
  • A parse tree has
    • Terminals at the leaves
    • Non-terminals at the interior nodes
  • An in-order traversal of the leaves is the original input
  • The parse tree shows the association of operations; the input string does not
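These properties are easy to see on a concrete tree. A minimal sketch, assuming a parse tree is stored as nested (label, children) pairs with terminals as plain strings; this representation and the function `leaves` are choices made for the example.

```python
# A node is (label, [children]); a leaf is a plain string (a terminal).
tree = ("E", [
    ("E", [
        ("E", ["id"]),
        "*",
        ("E", ["id"]),
    ]),
    "+",
    ("E", ["id"]),
])

def leaves(node):
    """Yield the terminals at the leaves, left to right."""
    if isinstance(node, str):
        yield node
    else:
        _label, children = node
        for child in children:
            yield from leaves(child)

print(" ".join(leaves(tree)))  # prints: id * id + id
```

Reading the leaves gives back the flat input, while the nesting of the tuples records that the multiplication is grouped before the addition.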

Left-most and Right-most Derivations
  • The example is a left-most derivation
    • At each step, replace the left-most non-terminal
  • There is an equivalent notion of a right-most derivation

Right-most Derivation in Detail (1)

id * id + id

E

Derivations and Parse Trees
  • Note that right-most and left-most derivations have the same parse tree
  • The difference is the order in which branches are added
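To see both orders side by side, the two derivations can be read off one and the same tree. A minimal sketch, assuming the nested (label, children) tree representation below; the function `derivation` and its `rightmost` flag are hypothetical names for this example.

```python
# A node is (label, [children]); a leaf is a plain string (a terminal).
tree = ("E", [
    ("E", [("E", ["id"]), "*", ("E", ["id"])]),
    "+",
    ("E", ["id"]),
])

def derivation(tree, rightmost=False):
    """List the sentential forms of the left-most (or right-most) derivation."""
    form = [tree]            # mix of unexpanded subtrees and terminal strings
    steps = [[tree[0]]]
    while True:
        pending = [i for i, n in enumerate(form) if not isinstance(n, str)]
        if not pending:
            return [" ".join(s) for s in steps]
        i = pending[-1] if rightmost else pending[0]
        form[i:i + 1] = form[i][1]      # expand: replace node by its children
        steps.append([n if isinstance(n, str) else n[0] for n in form])

left = derivation(tree)
right = derivation(tree, rightmost=True)
for l, r in zip(left, right):
    print(f"{l:<16} {r}")
```

Both lists start at E and end at id * id + id; only the intermediate forms differ.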

Summary of Derivations
  • We are not just interested in whether s ∈ L(G)
    • We need a parse tree for s (because we need to build the AST)
  • A derivation defines a parse tree
    • But one parse tree may have many derivations
  • Left-most and right-most derivations are important in parser implementation

Ambiguity
  • Grammar: E → E + E | E * E | ( E ) | id
  • String: id * id + id

Ambiguity (Cont.)

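This string has two parse trees (each node's children are listed left to right beneath it)

Tree 1, grouping (id * id) + id:

  • E
    • E
      • E
        • id
      • *
      • E
        • id
    • +
    • E
      • id

Tree 2, grouping id * (id + id):

  • E
    • E
      • id
    • *
    • E
      • E
        • id
      • +
      • E
        • id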

TEST YOURSELF #3

Question 1:

  • for each of the two parse trees, find the corresponding left-most derivation

Question 2:

  • for each of the two parse trees, find the corresponding right-most derivation

Ambiguity (Cont.)
  • A grammar is ambiguous if, for some string, any of the following holds (the three conditions are equivalent):
    • there is more than one parse tree
    • there is more than one right-most derivation
    • there is more than one left-most derivation
  • Ambiguity is BAD
    • Leaves meaning of some programs ill-defined
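Ambiguity can be checked for a particular string by counting its parse trees. A minimal sketch, assuming the grammar E → E + E | E * E | ( E ) | id; the function `count_parse_trees` is a hypothetical helper that tries every production and every split point, with memoization.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Number of parse trees for tokens under E -> E+E | E*E | (E) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways to derive tokens[i:j] from E.
        n = 0
        if j - i == 1 and tokens[i] == "id":
            n += 1                                  # E -> id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)                # E -> ( E )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):             # E -> E op E, split at k
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))

print(count_parse_trees("id * id + id".split()))   # prints: 2
print(count_parse_trees("id + id".split()))        # prints: 1
```

A count above 1 for any string shows the grammar is ambiguous; a count of 1 for one string proves nothing about the grammar as a whole.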

Dealing with Ambiguity
  • There are several ways to handle ambiguity
  • The most direct method is to rewrite the grammar unambiguously
  • The rewritten grammar enforces precedence of * over +
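A minimal sketch of such a rewrite, assuming the common layered form E → E + T | T, T → T * F | F, F → ( E ) | id. This grammar is an assumption and may differ from the one on the slide, and the names `parse`, `parse_e`, `parse_t` and `parse_f` are hypothetical. The left-recursive rules are implemented with loops, and the result is a nested tuple showing the grouping rather than a full parse tree.

```python
# Assumed grammar (a standard layering, not taken from the slides):
#   E -> E + T | T
#   T -> T * F | F
#   F -> ( E ) | id

def parse(tokens):
    """Parse a token list; return a nested tuple showing the grouping."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def parse_f():
        if peek() == "(":
            expect("(")
            node = parse_e()
            expect(")")
            return node
        expect("id")
        return "id"

    def parse_t():
        node = parse_f()
        while peek() == "*":             # T -> T * F, left-associative
            expect("*")
            node = ("*", node, parse_f())
        return node

    def parse_e():
        node = parse_t()
        while peek() == "+":             # E -> E + T, left-associative
            expect("+")
            node = ("+", node, parse_t())
        return node

    tree = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("id * id + id".split()))   # ('+', ('*', 'id', 'id'), 'id')
print(parse("id + id * id".split()))   # ('+', 'id', ('*', 'id', 'id'))
```

In both outputs the multiplication is nested inside the addition, whichever operator comes first in the input.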
