PowerPoint Slideshow about 'Grammars, constituency and order' - myra

Grammars, constituency and order
  • A grammar describes the legal strings of a language in terms of constituency and order.
  • For example, a grammar for a fragment of English might say that a legal sentence consists of
    • a noun phrase (subject),
    • followed by a verb phrase (predicate).
  • This rule is commonly written as
    • S → NP VP
Constituents of constituents
  • The constituents of constituents may be described by other rules.
  • They would refine, for example, the initial decomposition
    • [the dog] [chased a cat]
  • into a complete decomposition
    • [[the] [dog]] [[chased] [[a] [cat]]]
  • according to the following rules:
Grammar rules for a fragment of English
  • S → NP VP
  • NP → Det N
  • VP → V NP
  • N → dog
  • N → cat
  • Det → the
  • Det → a
  • V → chased
Parse trees (derivation trees)
  • Hierarchical decompositions of sentences are more commonly expressed by special trees, known as parse trees or derivation trees.
  • For our sample sentence, we would have the parse tree below
Parse tree for an English sentence

            S
         /     \
     NP            VP
    /  \         /    \
  Det    N     V        NP
   |     |     |       /  \
  the   dog  chased  Det   N
                      |    |
                      a   cat

Sentences generated by grammars
  • The grammar with the rules above would also allow, or generate, sentences like
    • a dog chased a cat
    • the dog chased a dog
    • a cat chased the dog

  • since parse trees can be constructed for each of these sentences.
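
Because this grammar has no recursive rules, the set of sentences it generates is finite and can be listed mechanically. The following Python sketch is an illustration only; the dictionary encoding and the names RULES and expand are assumptions made here, not part of the slides.

```python
from itertools import product

# The eight rules of the English fragment, grouped by left-hand side.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "N": [["dog"], ["cat"]],
    "Det": [["the"], ["a"]],
    "V": [["chased"]],
}

def expand(symbol):
    """Return every list of words derivable from `symbol`.

    Terminates only because this particular grammar has no recursive rules.
    """
    if symbol not in RULES:          # a terminal (an English word)
        return [[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # Combine every expansion of each RHS symbol, preserving order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([w for part in parts for w in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))                        # 16
print("a cat chased the dog" in sentences)   # True
```

There are 4 noun phrases and 1 verb, hence 4 × 1 × 4 = 16 sentences in total.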

Context-free grammars (CFGs)
  • In the example above, the alphabet Σ consisted of the set of English words.
  • A grammar also needs to specify symbols aside from Σ, and rules, so more precisely …
CFGs defined
  • … a context-free grammar (CFG) consists of
    • a set T of terminal symbols (analogous to Σ)
    • a set V of other variables (or nonterminals)
    • a start symbol S, which is a member of V
    • a collection P of rules (or productions), each with
      • a left-hand side (LHS) from V, and
      • a right-hand side (RHS) from (V ∪ T)*
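
The four components can be written down as a small data structure. This is a minimal sketch under stated assumptions: the class name CFG, its field names, and the requirement that T and V be disjoint are choices made for this illustration (the disjointness is standard, but the slide does not state it).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    terminals: frozenset      # T
    variables: frozenset      # V
    start: str                # S, a member of V
    productions: tuple        # P, as (lhs, rhs) pairs; rhs is a tuple of symbols

    def __post_init__(self):
        if self.terminals & self.variables:
            raise ValueError("T and V must be disjoint (assumed here)")
        if self.start not in self.variables:
            raise ValueError("start symbol must be in V")
        symbols = self.terminals | self.variables
        for lhs, rhs in self.productions:
            if lhs not in self.variables:
                raise ValueError(f"LHS {lhs!r} is not a variable")
            if not all(s in symbols for s in rhs):
                raise ValueError(f"RHS {rhs!r} uses an unknown symbol")

# The English fragment from the earlier slides.
english = CFG(
    terminals=frozenset({"the", "a", "dog", "cat", "chased"}),
    variables=frozenset({"S", "NP", "VP", "Det", "N", "V"}),
    start="S",
    productions=(
        ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP")),
        ("N", ("dog",)), ("N", ("cat",)),
        ("Det", ("the",)), ("Det", ("a",)), ("V", ("chased",)),
    ),
)
```

An empty tuple as a right-hand side represents the empty string, since the RHS ranges over (V ∪ T)*.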
Context freedom
  • The notion of context freedom means that any category may be expanded in accordance with the rules no matter where it appears.
  • So for example, the noun phrases that are legal subjects are the same as those that are legal objects
    • that is, the NP in S → NP VP (subject) and the NP in VP → V NP (object) are expanded by the same rules.
Rules for a CFG for L(0(1+2)*)
  • S → 0X
  • X → λ
  • X → YX
  • Y → 1
  • Y → 2
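
One way to gain confidence in these rules is to generate every string of bounded length from S and compare the result with a regular-expression match. The sketch below is an illustration, not part of the slides: the empty Python string stands for the empty right-hand side, the pattern 0[12]* is the Python spelling of 0(1+2)*, and the leftmost-rewriting search terminates for this particular grammar because a form never holds more than two variables.

```python
import re
from itertools import product

RULES = {"S": ["0X"], "X": ["", "YX"], "Y": ["1", "2"]}  # "" is the empty RHS

def generate(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    found, stack, seen = set(), ["S"], {"S"}
    while stack:
        form = stack.pop()
        # Position of the leftmost variable, if any.
        i = next((k for k, c in enumerate(form) if c in RULES), None)
        if i is None:
            found.add(form)          # no variables left: a generated string
            continue
        for rhs in RULES[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(c not in RULES for c in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return found

pattern = re.compile(r"0[12]*")
expected = {
    "".join(p)
    for n in range(5)
    for p in product("012", repeat=n)
    if pattern.fullmatch("".join(p))
}
print(generate(4) == expected)   # True
```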
CFGs for palindromes
  • A CFG for even-length palindromes over {0,1}:
    • S → λ | 0S0 | 1S1
  • A CFG for odd-length palindromes over {0,1}:
    • S → 0 | 1 | 0S0 | 1S1
  • Here we use the common convention allowing several rules with the same LHS to be combined into one, with vertical bars separating the RHSs.
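
Each alternative of these two grammars translates directly into one case of a recursive recognizer. The following is a sketch for illustration; the function names even_pal and odd_pal are assumptions.

```python
def even_pal(s):
    """Recognizer mirroring S → λ | 0S0 | 1S1."""
    if s == "":
        return True                                   # S → λ
    # S → 0S0 or S → 1S1: matching outer symbols around a shorter S.
    return len(s) >= 2 and s[0] == s[-1] and s[0] in "01" and even_pal(s[1:-1])

def odd_pal(s):
    """Recognizer mirroring S → 0 | 1 | 0S0 | 1S1."""
    if s in ("0", "1"):
        return True                                   # S → 0 or S → 1
    return len(s) >= 3 and s[0] == s[-1] and s[0] in "01" and odd_pal(s[1:-1])

print(even_pal("0110"), even_pal("010"))   # True False
print(odd_pal("010"), odd_pal("0110"))     # True False
```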
Rules for other 1-variable CFGs
  • for all palindromes over {0,1}:
    • S → λ | 0 | 1 | 0S0 | 1S1
  • for nonempty sequences of balanced parentheses:
    • S → ( ) | ( S ) | SS
  • for {0ⁿ1ⁿ | n ≥ 0}
    • S → λ | 0S1
  • for { x ∈ {0,1}* | x has as many 0's as 1's}
    • S → λ | 0S1 | 1S0 | SS
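
The last grammar is the least obvious of the four, so an exhaustive check on short strings may be reassuring. The sketch below is an illustration under assumptions: the function name derives is hypothetical, and the SS alternative is tried only with two nonempty halves, which loses nothing (an empty half adds no new strings) and guarantees termination.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def derives(s):
    """True iff s is derivable from S using S → λ | 0S1 | 1S0 | SS."""
    if s == "":
        return True                                        # S → λ
    if len(s) >= 2 and (s[0], s[-1]) in {("0", "1"), ("1", "0")} \
            and derives(s[1:-1]):
        return True                                        # S → 0S1 | 1S0
    # S → SS, split into two nonempty parts.
    return any(derives(s[:k]) and derives(s[k:]) for k in range(1, len(s)))

# Compare against the defining property for every string of length <= 8.
ok = all(
    derives(w) == (w.count("0") == w.count("1"))
    for n in range(9)
    for w in map("".join, product("01", repeat=n))
)
print(ok)   # True
```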
Parse trees and grammars
  • A parse tree is legal for a CFG iff it satisfies each correspondence:
    • root ↔ start symbol
    • parent node ↔ LHS of a grammar rule
    • child node ↔ symbol from the RHS of a rule whose LHS is the parent node
    • leaf ↔ terminal symbol (or λ)
  • Also, the ordering of children of a node must match the ordering of the RHS symbols in the corresponding rule.
Partial parse (derivation) trees
  • To represent a parse in progress, it is convenient to allow leaves labeled by nonterminal symbols (and perhaps to relax the constraint on roots)
  • In any case, the left-to-right sequence of leaf labels (ignoring those labeled by λ) is called the yield of the parse tree
    • so the yield of a complete parse tree is a string of terminals
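
The yield can be computed by a left-to-right traversal of the leaves. In this sketch a tree is a (label, children) pair and a leaf has an empty child list; that representation, the helper t, and the function name tree_yield are all assumptions made for illustration.

```python
def t(label, *children):
    """Build a tree node; a node without children is a leaf."""
    return (label, list(children))

def tree_yield(tree):
    """Left-to-right leaf labels of a parse tree, skipping λ leaves."""
    label, children = tree
    if not children:                       # a leaf
        return [] if label == "λ" else [label]
    return [leaf for child in children for leaf in tree_yield(child)]

# Complete parse tree for "the dog chased a cat".
full = t("S",
         t("NP", t("Det", t("the")), t("N", t("dog"))),
         t("VP", t("V", t("chased")),
                 t("NP", t("Det", t("a")), t("N", t("cat")))))

# Partial tree: the VP has not been expanded yet, so it is still a leaf.
partial = t("S",
            t("NP", t("Det", t("the")), t("N", t("dog"))),
            t("VP"))

print(" ".join(tree_yield(full)))      # the dog chased a cat
print(" ".join(tree_yield(partial)))   # the dog VP
```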
Notational conventions
  • Lower case letters are interpreted as for DFAs
    • those near the beginning of the alphabet represent terminals; those near the end of the alphabet represent strings of terminals
  • Capital letters represent nonterminals (variables)
  • Greek letters represent strings of variables and terminals
    • so a generic rule looks like A → γ
Derivations and rewrite rules
  • CFG rules are also rewrite rules.
  • Here the rule S → NP VP would allow rewriting of S as NP VP
  • Intuitively, G generates a string x iff x can be derived from S by repeated rewriting
  • For example, we get the legal derivation

    • S => NP VP => Det N VP => the N VP => the dog VP => the dog V NP => the dog chased NP => the dog chased Det N => the dog chased a N => the dog chased a cat

Leftmost and rightmost derivations
  • For every parse tree there are unique leftmost and rightmost derivations
  • The rightmost derivation corresponding to the parse tree above is
    • S => NP VP => NP V NP => NP V Det N => NP V Det cat => NP V a cat => NP chased a cat => Det N chased a cat => Det dog chased a cat => the dog chased a cat
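
Both derivations can be read off the parse tree mechanically: start with the root and repeatedly replace the leftmost (or rightmost) unexpanded variable by its children. The sketch below assumes a (label, children) tree representation and hypothetical names t and derivation; it does not handle λ leaves.

```python
def t(label, *children):
    """Build a tree node; a node without children is a terminal leaf."""
    return (label, list(children))

TREE = t("S",
         t("NP", t("Det", t("the")), t("N", t("dog"))),
         t("VP", t("V", t("chased")),
                 t("NP", t("Det", t("a")), t("N", t("cat")))))

def derivation(tree, leftmost=True):
    """Sentential forms of the leftmost (or rightmost) derivation for a tree."""
    form = [tree]                              # list of (label, children) nodes
    steps = [" ".join(node[0] for node in form)]
    while True:
        # Indices of nodes that still have children, i.e. unexpanded variables.
        pending = [i for i, (_, kids) in enumerate(form) if kids]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = form[i][1]             # rewrite the node as its children
        steps.append(" ".join(node[0] for node in form))

print(" => ".join(derivation(TREE, leftmost=True)))
print(" => ".join(derivation(TREE, leftmost=False)))
```

Each call prints one chain of ten sentential forms, from S down to the sentence itself.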

Derivations and parse trees
  • All but the simplest parse trees will have other associated derivations besides the leftmost and rightmost.
  • For every derivation there is a unique associated parse tree.
Derivations and sentential forms
  • The => relation used above can be defined precisely by saying that
    • αAβ => αγβ iff there is a rule A → γ in G
      • we may subscript the => symbol by G if there’s doubt about which grammar is being used.
  • Then using the symbol =>* for the reflexive transitive closure of the => relation, we say
    • a sentential form for G is a string α from (V ∪ T)* such that S =>* α
Context-free languages (CFLs)
  • Fact: A CFG G with start symbol S licenses a parse tree for w iff S =>* w
  • Def) L(G) (the language generated by G) is
    • {x | G generates x}, or equivalently
    • {x | G’s start symbol derives x}, or
    • {x ∈ T* | x is a sentential form for G}

  • A language generated by a context-free grammar is called a context-free language
Ambiguous grammars
  • Here’s a 1-variable CFG for a subset of algebraic expressions:
    • E → x | y | E+E | E*E | (E)
  • Note that this grammar allows multiple parse trees for some strings, like x+y*y.
  • A grammar with this property is said to be ambiguous.
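
The ambiguity can be made concrete by counting parse trees. The sketch below is an illustration with a hypothetical function name, count_trees; it tries every way of applying a rule of E → x | y | E+E | E*E | (E) at the root and multiplies the counts for the subexpressions.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(s):
    """Number of distinct parse trees deriving s from E."""
    total = 1 if s in ("x", "y") else 0                # E → x | y
    if len(s) >= 3 and s[0] == "(" and s[-1] == ")":
        total += count_trees(s[1:-1])                  # E → (E)
    for k, ch in enumerate(s):
        if ch in "+*":                                 # E → E+E | E*E
            total += count_trees(s[:k]) * count_trees(s[k + 1:])
    return total

print(count_trees("x+y*y"))     # 2
print(count_trees("(x+y)*y"))   # 1
```

The two trees for x+y*y correspond to the groupings x+(y*y) and (x+y)*y.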
An unambiguous grammar for algebraic expressions
  • Rules for an unambiguous grammar for the above language are given below:
    • E → E + T | T
    • T → T * F | F
    • F → x | y | ( E )
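
A parser built on these three rules shows the effect of the layering: * ends up below + in the tree, and both operators group to the left. This is a sketch under assumptions: it is a hand-written recursive-descent parser (a technique chosen here, not one named in the slides), the left-recursive rules are handled with loops, and the result is a simplified nested tuple rather than the full parse tree.

```python
def parse(text):
    """Parse text using E → E+T | T, T → T*F | F, F → x | y | (E)."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def factor():                      # F → x | y | ( E )
        nonlocal pos
        ch = peek()
        if ch in ("x", "y"):
            pos += 1
            return ch
        if ch == "(":
            pos += 1
            inner = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return inner
        raise SyntaxError(f"unexpected {ch!r} at position {pos}")

    def term():                        # T → T * F | F, as a loop
        nonlocal pos
        node = factor()
        while peek() == "*":
            pos += 1
            node = ("*", node, factor())
        return node

    def expr():                        # E → E + T | T, as a loop
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(text):
        raise SyntaxError(f"unexpected {text[pos]!r} at position {pos}")
    return tree

print(parse("x+y*y"))   # ('+', 'x', ('*', 'y', 'y'))
print(parse("x+y+x"))   # ('+', ('+', 'x', 'y'), 'x')
```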
Inherent ambiguity
  • Ambiguity is common in natural languages.
    • But we don't want it in programming languages!
  • Often ambiguity can be removed.
    • i.e., a grammar can be replaced by an unambiguous one, as seen above
  • But there are languages for which all grammars are ambiguous.
  • These languages are said to be inherently ambiguous.
Regular languages and CFLs
  • We’ve already seen examples of CFLs that aren’t regular languages
  • But it's fairly easy to show that all regular languages are context-free.
  • The languages {a}, {λ}, and ∅ have grammars with respective productions
    • S → a
    • S → λ
    • [no productions]
All regular languages are CFLs
  • Suppose L1 and L2 have grammars with respective start symbols S1 and S2 (and no variables in common).
  • Then we may get grammars with a new start symbol S for their union, for their concatenation, and for L1* by adding the respective productions
    • S → S1 | S2
    • S → S1S2
    • S → λ | S1S
  • So all regular languages are CFLs
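
The three constructions amount to adding one or two productions for a fresh start symbol. In this sketch a grammar is a (start, rules) pair with rules mapping each variable to a list of right-hand sides; that encoding and the function names are assumptions, as is the requirement that the two grammars share no variables and that neither uses the new start symbol.

```python
def union(g1, g2, start="S"):
    """Grammar for L1 ∪ L2: add S → S1 | S2."""
    (s1, r1), (s2, r2) = g1, g2
    return start, {**r1, **r2, start: [[s1], [s2]]}

def concat(g1, g2, start="S"):
    """Grammar for L1 L2: add S → S1 S2."""
    (s1, r1), (s2, r2) = g1, g2
    return start, {**r1, **r2, start: [[s1, s2]]}

def star(g1, start="S"):
    """Grammar for L1*: add S → λ | S1 S (the empty list is the λ rule)."""
    s1, r1 = g1
    return start, {**r1, start: [[], [s1, start]]}

G1 = ("S1", {"S1": [["a"]]})      # generates {a}
G2 = ("S2", {"S2": [["b"]]})      # generates {b}

print(union(G1, G2))
print(concat(G1, G2))
print(star(G1))
```

Note that the star construction recurses on the new start symbol S itself, which is what allows any number of repetitions.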
Grammars for regular languages
  • Any regular language can be generated by a special type of CFG.
  • Def) A right-linear grammar is a CFG where the RHS of each rule has the form xB or x,
    • for x ∈ T* and B ∈ V
  • Fact: Right-linear grammars generate all and only regular languages
Finding a grammar for a regular language
  • For a DFA M, consider the grammar G with
    • T = Σ, V = Q, and start symbol S = q0
    • a rule qi → ajqk for each aj move from qi to qk
    • a rule qi → aj for each aj move from qi to qk where qk ∈ F
    • a rule q0 → λ if q0 ∈ F, so that G also generates the empty string
  • An easy induction shows that δ*(q,x) = p iff q =>* xp
    • and that, for nonempty x, δ*(q,x) ∈ F iff q =>* x
  • So L(G) = L(M)
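
The construction is short enough to write out directly. In this sketch the DFA is given as a dictionary from (state, symbol) pairs to states; that encoding, the function name, and the example DFA (one possible machine for 0(1+2)*) are assumptions made for illustration. The λ rule for an accepting start state is included so that the empty string is covered when the DFA accepts it.

```python
def dfa_to_right_linear(delta, start, accepting):
    """Return (start_variable, rules); each rule is (lhs, rhs_tuple)."""
    rules = []
    if start in accepting:
        rules.append((start, ()))              # q0 → λ when q0 is accepting
    for (q, a), p in sorted(delta.items()):
        rules.append((q, (a, p)))              # qi → aj qk
        if p in accepting:
            rules.append((q, (a,)))            # qi → aj when qk is accepting
    return start, rules

# Example DFA: q0 --0--> q1, q1 loops on 1 and 2, everything else goes to "dead".
DELTA = {(q, a): "dead" for q in ("q0", "q1", "dead") for a in "012"}
DELTA.update({("q0", "0"): "q1", ("q1", "1"): "q1", ("q1", "2"): "q1"})

start, rules = dfa_to_right_linear(DELTA, "q0", {"q1"})
for lhs, rhs in rules:
    print(lhs, "→", " ".join(rhs) or "λ")
```

Rules that mention the dead state are harmless, since no terminal string can be derived from it.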
DFAs for right-linear grammars
  • Conversely, let G be a right-linear grammar
  • If all strings x on RHSs have length 1, then the construction above can be reversed
    • and the proof above still holds
  • If not, then the construction can be modified by adding extra states as in Linz, pp. 91-2
  • In either case a DFA can be obtained for L(G)
Regular grammars
  • Left-linear grammars may be defined by analogy with right-linear grammars
    • every rule must have a RHS of the form Bx or x
  • Fact: Left-linear grammars generate all and only regular languages
  • A CFG is a regular grammar iff it is right-linear or left-linear
    • so a language has a regular grammar iff it is regular
Backus-Naur form (BNF)
  • Grammars for programming languages generally use a variant of our CFG notation called BNF.
  • In BNF the symbol ::= is used instead of the rightward pointing arrow.
  • In BNF, terminal symbols may be given in bold face, or nonterminals may be delimited by angle brackets, e.g.
    • <identifier> ::= <letter> <digits>
Common BNF conventions
  • The vertical bar convention
  • [ ] brackets
    • for optionality (0 or 1 times)
  • { } braces
    • for indefinite repetition (0 or more times)
  • ( ) parentheses
    • for removing ambiguity, e.g., (a|b)c vs. a | bc
A sample grammar in BNF
  • <conditional> ::= if <test> then <block> [ else <block> ] endif
  • <block> ::= begin [<statements>] end
  • <statements> ::= { <statement> }
  • <test> ::= <var> <op> <var>
  • <statement> ::= <var> = <var>
  • <var> ::= x | y
  • <op> ::= = | /=
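
The bracket and brace conventions map naturally onto an optional branch and a loop in a hand-written recognizer. The sketch below is an illustration only: it assumes tokens are separated by whitespace, and the function name accepts_conditional and its helpers are hypothetical.

```python
def accepts_conditional(text):
    """Return True iff the whitespace-separated tokens form a <conditional>."""
    toks = text.split()
    pos = 0

    def eat(*choices):
        """Consume the next token if it is one of `choices`."""
        nonlocal pos
        if pos < len(toks) and toks[pos] in choices:
            pos += 1
            return True
        return False

    def var():                               # <var> ::= x | y
        return eat("x", "y")

    def statement():                         # <statement> ::= <var> = <var>
        return var() and eat("=") and var()

    def block():                             # begin [<statements>] end
        if not eat("begin"):
            return False
        # { <statement> }: repeat while the next token can start a statement.
        while pos < len(toks) and toks[pos] in ("x", "y"):
            if not statement():
                return False
        return eat("end")

    def test():                              # <test> ::= <var> <op> <var>
        return var() and eat("=", "/=") and var()

    def conditional():
        if not (eat("if") and test() and eat("then") and block()):
            return False
        if eat("else") and not block():      # [ else <block> ] is optional
            return False
        return eat("endif")

    return conditional() and pos == len(toks)

print(accepts_conditional("if x = y then begin x = y end endif"))   # True
print(accepts_conditional("if x then begin end endif"))             # False
```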