## PowerPoint Slideshow about 'Grammars, constituency and order' - myra


Grammars, constituency and order

- A grammar describes the legal strings of a language in terms of constituency and order.
- For example, a grammar for a fragment of English might say that a legal sentence consists of
- a noun phrase (subject),
- followed by a verb phrase (predicate).
- This rule is commonly written as
- S → NP VP

Constituents of constituents

- The constituents of constituents may be described by other rules.
- They would refine, for example, the initial decomposition
- [the dog] [chased a cat]
- into a complete decomposition
- [[the] [dog]] [[chased] [[a] [cat]]]
- according to the following rules:

Grammar rules for a fragment of English

- S → NP VP
- NP → Det N
- VP → V NP
- N → dog
- N → cat
- Det → the
- Det → a
- V → chased

Parse trees (derivation trees)

- Hierarchical decompositions of sentences are more commonly expressed by special trees, known as parse trees or derivation trees.
- For our sample sentence, we would have the parse tree below.

Parse tree for an English sentence

              S
           /     \
       NP           VP
      /  \        /    \
    Det   N     V        NP
     |    |     |       /  \
    the  dog  chased  Det   N
                       |    |
                       a   cat

Sentences generated by grammars

- The grammar with the rules above would also allow, or generate, sentences like
- a dog chased a cat
- the dog chased a dog
- a cat chased the dog

since parse trees could be constructed for these sentences.
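
As a rough illustration of what "generate" means here, the sketch below enumerates every sentence of the English fragment. The dictionary encoding and the names `RULES` and `expand` are assumptions made for this example, and the approach only terminates because none of these rules is recursive.

```python
from itertools import product

# Rules of the English fragment: each variable maps to its right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "N": [["dog"], ["cat"]],
    "Det": [["the"], ["a"]],
    "V": [["chased"]],
}

def expand(symbol):
    """Return every word sequence derivable from symbol (finite for this grammar)."""
    if symbol not in RULES:              # a terminal, i.e. an English word
        return [[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # Combine one expansion of each RHS symbol, keeping the RHS order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))                    # 16
print("a cat chased the dog" in sentences)
```

There are 4 noun phrases and 1 verb, so the fragment generates 4 × 1 × 4 = 16 sentences.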

Context-free grammars (CFGs)

- In the example above, the alphabet Σ consisted of the set of English words.
- A grammar also needs to specify symbols aside from Σ, and rules, so more precisely …

CFGs defined

- … a context-free grammar (CFG) consists of
- a set T of terminal symbols (analogous to Σ)
- a set V of variables (or nonterminals)
- a start symbol S, which is a member of V
- a collection P of rules (or productions), each with
- a left-hand side (LHS) from V, and
- a right-hand side (RHS) from (V ∪ T)*
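
One possible way to hold the four components in a program is sketched below. The class name `CFG`, its field names, and the requirement that T and V be disjoint are assumptions of this example (disjointness is the usual convention, though the definition above does not state it).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    terminals: frozenset      # T
    variables: frozenset      # V
    start: str                # S
    productions: tuple        # P, as (lhs, rhs) pairs; each rhs is a tuple of symbols

    def is_well_formed(self):
        """Check the conditions of the definition (plus the assumed disjointness of T and V)."""
        if self.terminals & self.variables:
            return False
        if self.start not in self.variables:
            return False
        symbols = self.terminals | self.variables
        return all(
            lhs in self.variables and all(s in symbols for s in rhs)
            for lhs, rhs in self.productions
        )

ENGLISH = CFG(
    terminals=frozenset({"the", "a", "dog", "cat", "chased"}),
    variables=frozenset({"S", "NP", "VP", "N", "Det", "V"}),
    start="S",
    productions=(
        ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP")),
        ("N", ("dog",)), ("N", ("cat",)),
        ("Det", ("the",)), ("Det", ("a",)), ("V", ("chased",)),
    ),
)
print(ENGLISH.is_well_formed())
```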

Context freedom

- The notion of context freedom means that any category may be expanded in accordance with the rules no matter where it appears.
- So for example, the noun phrases that are legal subjects are the same as those that are legal objects
- that is, both are NPs: the NP of the S → NP VP rule and the NP of the VP → V NP rule.

Rules for a CFG for L(0(1+2)*)

- S → 0X
- X → λ
- X → YX
- Y → 1
- Y → 2

CFGs for palindromes

- A CFG for even-length palindromes over {0,1}:
- S → λ | 0S0 | 1S1
- A CFG for odd-length palindromes over {0,1}:
- S → 0 | 1 | 0S0 | 1S1
- Here we use the common convention allowing several rules with the same LHS to be combined into one, with vertical bars separating the RHSs.

Rules for other 1-variable CFGs

- for all palindromes over {0,1}:
- S → λ | 0 | 1 | 0S0 | 1S1
- for nonempty sequences of balanced parentheses:
- S → ( ) | ( S ) | SS
- for {0ⁿ1ⁿ | n ≥ 0}
- S → λ | 0S1
- for { x ∈ {0,1}* | x has as many 0's as 1's }
- S → λ | 0S1 | 1S0 | SS
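
The last grammar can be checked by brute force on short strings. The sketch below is a hand-written recognizer for that one grammar, not a general parser; the function name `derives` is an assumption of this example. For the rule S → SS it only tries splits into two nonempty parts, since a split with an empty part would just ask the same question again.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def derives(w):
    """True iff S =>* w under S → λ | 0S1 | 1S0 | SS (w is a string over {0,1})."""
    if w == "":
        return True                                   # S → λ
    if w[0] != w[-1] and derives(w[1:-1]):            # S → 0S1 or S → 1S0
        return True
    # S → SS, split into two nonempty parts
    return any(derives(w[:i]) and derives(w[i:]) for i in range(1, len(w)))

# Compare with the intended language on all strings of length at most 8.
for n in range(9):
    for chars in product("01", repeat=n):
        w = "".join(chars)
        assert derives(w) == (w.count("0") == w.count("1"))
print("grammar matches 'as many 0s as 1s' up to length 8")
```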

Parse trees and grammars

- A parse tree is legal for a CFG iff it satisfies each correspondence:
- root ↔ start symbol
- parent node ↔ LHS of a grammar rule
- child node ↔ symbol from the RHS of a rule whose LHS is the parent node
- leaf ↔ terminal symbol (or λ)
- Also, the ordering of children of a node must match the ordering of the RHS symbols in the corresponding rule.
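
The correspondences above translate fairly directly into a checker. The sketch below assumes a tree is a pair `(label, children)` with an empty child list for leaves, and uses the English fragment as the grammar; all names are illustrative. Leaves labeled λ are not modeled, since that fragment has no λ rules.

```python
RULES = {
    ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP")),
    ("N", ("dog",)), ("N", ("cat",)),
    ("Det", ("the",)), ("Det", ("a",)), ("V", ("chased",)),
}
VARIABLES = {lhs for lhs, _ in RULES}
TERMINALS = {s for _, rhs in RULES for s in rhs} - VARIABLES

def nodes_ok(tree):
    """Every leaf is a terminal; every parent and its ordered children match a rule."""
    label, children = tree
    if not children:
        return label in TERMINALS
    rhs = tuple(child[0] for child in children)       # children in left-to-right order
    return (label, rhs) in RULES and all(nodes_ok(c) for c in children)

def is_legal(tree, start="S"):
    """Root must be the start symbol, and all nodes must be consistent with the rules."""
    return tree[0] == start and nodes_ok(tree)

def leaf(word):
    return (word, [])

SUBJECT = ("NP", [("Det", [leaf("the")]), ("N", [leaf("dog")])])
PREDICATE = ("VP", [("V", [leaf("chased")]),
                    ("NP", [("Det", [leaf("a")]), ("N", [leaf("cat")])])])

print(is_legal(("S", [SUBJECT, PREDICATE])))   # True
print(is_legal(("S", [PREDICATE, SUBJECT])))   # False: children in the wrong order
```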

Partial parse (derivation) trees

- It's convenient to allow representation of the progress of a parse by allowing leaves to be labeled by a nonterminal symbol (and perhaps ignoring the constraint on roots).
- In any case, the left-to-right sequence of leaf labels (ignoring those labeled by λ) is called the yield of the parse tree
- so the yield of a complete parse tree is a string of terminals

Notational conventions

- Lower case letters are interpreted as for DFAs
- those near the beginning of the alphabet represent terminals; those near the end of the alphabet represent strings
- Capital letters represent nonterminals (variables)
- Greek letters represent strings of variables and terminals
- so a generic rule looks like A → γ

Derivations and rewrite rules

- CFG rules are also rewrite rules.
- Here the rule S → NP VP would allow rewriting of S as NP VP
- Intuitively, G generates a string x iff x can be derived from S by repeated rewriting
- For example, we get the legal derivation

S => NP VP => Det N VP => the N VP => the dog VP
=> the dog V NP => the dog chased NP => the dog chased Det N
=> the dog chased a N => the dog chased a cat

Leftmost and rightmost derivations

- For every parse tree there are unique leftmost and rightmost derivations
- The rightmost derivation corresponding to the parse tree above is
- S => NP VP => NP V NP => NP V Det N => NP V Det cat => NP V a cat => NP chased a cat => Det N chased a cat => Det dog chased a cat => the dog chased a cat
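
Both derivations can be read off a parse tree mechanically: keep a list of nodes as the current sentential form and repeatedly replace the leftmost (or rightmost) node that still has children by those children. The sketch below assumes trees are `(label, children)` pairs; the function name `derivation` is illustrative.

```python
def leaf(word):
    return (word, [])

TREE = ("S", [
    ("NP", [("Det", [leaf("the")]), ("N", [leaf("dog")])]),
    ("VP", [("V", [leaf("chased")]),
            ("NP", [("Det", [leaf("a")]), ("N", [leaf("cat")])])]),
])

def derivation(tree, leftmost=True):
    """List the sentential forms of the leftmost or rightmost derivation for tree."""
    form = [tree]                                     # current form, as a list of nodes
    steps = [" ".join(node[0] for node in form)]
    while True:
        # Positions of nodes that still have children, i.e. unexpanded nonterminals.
        pending = [i for i, (_, kids) in enumerate(form) if kids]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = form[i][1]                    # rewrite the node as its children
        steps.append(" ".join(node[0] for node in form))

print(" => ".join(derivation(TREE, leftmost=True)))
print(" => ".join(derivation(TREE, leftmost=False)))
```

Each call yields ten sentential forms for this tree, one per rule application plus the initial S.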

Derivations and parse trees

- All but the simplest parse trees will have other associated derivations besides the leftmost and rightmost.
- For every derivation there is a unique associated parse tree.

Derivations and sentential forms

- The => relation used above can be defined precisely by saying that
- αAβ => αγβ iff there is a rule A → γ in G
- We may subscript the => symbol by G if there's doubt about which grammar is being used.
- Then using the symbol =>* for the reflexive transitive closure of the => relation, we say
- a sentential form for G is a string α from (V ∪ T)* such that S =>* α
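
The one-step relation and its closure can be sketched directly for the English fragment. The names `step` and `sentential_forms` are assumptions of this example, and the exhaustive search only terminates because that grammar has no recursive rules.

```python
RULES = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
    "N": [("dog",), ("cat",)],
    "Det": [("the",), ("a",)],
    "V": [("chased",)],
}

def step(form):
    """All forms reachable from form in one => step: rewrite one variable by one of its rules."""
    results = set()
    for i, symbol in enumerate(form):
        for rhs in RULES.get(symbol, []):
            results.add(form[:i] + rhs + form[i + 1:])
    return results

def sentential_forms(start="S"):
    """All forms reachable from start in zero or more steps."""
    seen, frontier = {(start,)}, [(start,)]
    while frontier:
        form = frontier.pop()
        for nxt in step(form) - seen:
            seen.add(nxt)
            frontier.append(nxt)
    return seen

FORMS = sentential_forms()
print(("NP", "chased", "a", "cat") in FORMS)          # True
print(("dog", "the", "VP") in FORMS)                  # False
```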

Context-free languages (CFLs)

- Fact: A CFG G with start symbol S licenses a parse tree for w iff S =>* w
- Def) L(G) (the language generated by G) is {x | G generates x}, or equivalently {x | G's start symbol derives x}, or {x ∈ T* | x is a sentential form for G}.

- A language generated by a context-free grammar is called a context-free language

Ambiguous grammars

- Here’s a 1-variable CFG for a subset of algebraic expressions:
- E → x | y | E+E | E*E | (E)
- Note that this grammar allows multiple parse trees for some strings, like x+y*y.
- A grammar with this property is said to be ambiguous.
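
The ambiguity can be made concrete by counting parse trees. The sketch below is written specifically for the rules E → x | y | E+E | E*E | (E); the function name `count_trees` is an assumption of this example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(w):
    """Number of parse trees for the string w under E → x | y | E+E | E*E | (E)."""
    total = 1 if w in ("x", "y") else 0               # E → x, E → y
    if len(w) >= 2 and w[0] == "(" and w[-1] == ")":
        total += count_trees(w[1:-1])                 # E → (E)
    for i, ch in enumerate(w):
        if ch in "+*":                                # E → E+E or E → E*E, operator at position i
            total += count_trees(w[:i]) * count_trees(w[i + 1:])
    return total

print(count_trees("x+y*y"))     # 2: one tree groups x+(y*y), the other (x+y)*y
print(count_trees("(x+y)*y"))   # 1
```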

An unambiguous grammar for algebraic expressions

- Rules for an unambiguous grammar for the above language are given below:
- E → E + T | T
- T → T * F | F
- F → x | y | ( E )
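
To see the effect of the three-level grammar, the sketch below collects every parse tree of a string under these rules as nested tuples. It is a brute-force search over split points, not an efficient parser, and the names `parses` and `binary` are assumptions of this example.

```python
from functools import lru_cache

def binary(parent, left, op, right, w):
    """Trees for a rule parent → left op right, trying every occurrence of op in w."""
    return [(parent, l, op, r)
            for i, ch in enumerate(w) if ch == op
            for l in parses(left, w[:i])
            for r in parses(right, w[i + 1:])]

@lru_cache(maxsize=None)
def parses(var, w):
    """All parse trees by which var (one of E, T, F) derives the string w."""
    trees = []
    if var == "E":
        trees += [("E", t) for t in parses("T", w)]   # E → T
        trees += binary("E", "E", "+", "T", w)        # E → E + T
    elif var == "T":
        trees += [("T", f) for f in parses("F", w)]   # T → F
        trees += binary("T", "T", "*", "F", w)        # T → T * F
    else:
        if w in ("x", "y"):
            trees.append(("F", w))                    # F → x | y
        if len(w) >= 2 and w[0] == "(" and w[-1] == ")":
            trees += [("F", "(", e, ")") for e in parses("E", w[1:-1])]  # F → ( E )
    return tuple(trees)

only = parses("E", "x+y*y")
print(len(only))        # 1
print(only[0][2])       # "+": the product y*y sits below the sum
```

For x+y*y the single tree applies E → E + T at the root, so the multiplication is grouped first.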

Inherent ambiguity

- Ambiguity is common in natural languages.
- But we don't want it in programming languages!
- Often ambiguity can be removed.
- i.e., a grammar can be replaced by an unambiguous one, as seen above
- But there are languages for which all grammars are ambiguous.
- These languages are said to be inherently ambiguous.

Regular languages and CFLs

- We’ve already seen examples of CFLs that aren’t regular languages
- But it's fairly easy to show that all regular languages are context-free.
- The languages {a}, {λ}, and ∅ have grammars with respective productions
- S → a
- S → λ
- [no productions]

All regular languages are CFLs

- Suppose L1 and L2 have grammars with respective start symbols S1 and S2.
- Then we may get grammars with start symbol S for their union, for their concatenation, and for L1* by adding the respective productions
- S → S1 | S2
- S → S1S2
- S → λ | S1S
- So all regular languages are CFLs
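
The three constructions can be tried out on small grammars. In the sketch below a grammar is a pair (start symbol, rules), terminals are single characters, the two input grammars are assumed to use disjoint variable names, and the new start symbol is assumed to be fresh. The star rule is taken as S → λ | S1S. The helper `language` is an assumed bounded enumerator that iterates to a fixed point over strings of limited length.

```python
def union(g1, g2, start="S"):
    (s1, r1), (s2, r2) = g1, g2
    return start, {**r1, **r2, start: [(s1,), (s2,)]}         # S → S1 | S2

def concat(g1, g2, start="S"):
    (s1, r1), (s2, r2) = g1, g2
    return start, {**r1, **r2, start: [(s1, s2)]}             # S → S1S2

def star(g1, start="S"):
    s1, r1 = g1
    return start, {**r1, start: [(), (s1, start)]}            # S → λ | S1S

def language(grammar, max_len):
    """All terminal strings of length <= max_len generated by the grammar."""
    start, rules = grammar
    lang = {var: set() for var in rules}
    changed = True
    while changed:                                            # iterate until nothing new appears
        changed = False
        for var, rhss in rules.items():
            for rhs in rhss:
                strings = {""}
                for sym in rhs:
                    choices = lang[sym] if sym in rules else {sym}
                    strings = {s + c for s in strings for c in choices
                               if len(s + c) <= max_len}
                if not strings <= lang[var]:
                    lang[var] |= strings
                    changed = True
    return lang[start]

A = ("S1", {"S1": [("a",)]})      # grammar for {a}
B = ("S2", {"S2": [("b",)]})      # grammar for {b}
print(sorted(language(union(A, B), 3)))    # ['a', 'b']
print(sorted(language(concat(A, B), 3)))   # ['ab']
print(sorted(language(star(A), 3)))        # ['', 'a', 'aa', 'aaa']
```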

Grammars for regular languages

- Any regular language can be generated by a special type of CFG.
- Def) A right-linear grammar is a CFG where the RHS of each rule has the form xB or x,
- for x ∈ T* and B ∈ V
- Fact: Right-linear grammars generate all and only regular languages

Finding a grammar for a regular language

- For a DFA M, consider the grammar G with
- T = Σ, V = Q and S = q0
- a rule qi → aj qk for each aj move from qi to qk
- a rule qi → aj for each aj move from qi to qk where qk ∈ F
- An easy induction shows that δ*(q,x) = p iff q =>* xp
- and that, for nonempty x, δ*(q,x) = p with p ∈ F iff q =>* x
- So L(G) = L(M), apart from λ: if q0 ∈ F, the rule q0 → λ must be added for G to generate λ
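
The construction can be sketched for a concrete machine. The DFA below (strings over {0,1} that end in 1) and all names are assumptions chosen for this example; its start state is not accepting, so the λ case does not arise. Rules are stored as triples (qi, a, qk), with `None` in place of qk for a rule of the form qi → a.

```python
from itertools import product

# Hypothetical DFA: q1 means "the last symbol read was 1".
DELTA = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}
START, FINAL = "q0", {"q1"}

def grammar_from_dfa(delta, final):
    """Right-linear rules for the DFA, following the two rule schemes above."""
    rules = set()
    for (qi, a), qk in delta.items():
        rules.add((qi, a, qk))                # qi → a qk
        if qk in final:
            rules.add((qi, a, None))          # qi → a
    return rules

def generates(rules, var, x):
    """True iff var =>* x using rules of the form A → aB or A → a."""
    if not x:
        return False                          # these rules never derive λ
    for lhs, a, nxt in rules:
        if lhs != var or a != x[0]:
            continue
        if nxt is None and len(x) == 1:
            return True
        if nxt is not None and generates(rules, nxt, x[1:]):
            return True
    return False

def accepts(delta, start, final, x):
    state = start
    for a in x:
        state = delta[(state, a)]
    return state in final

RULES = grammar_from_dfa(DELTA, FINAL)
for n in range(1, 7):
    for chars in product("01", repeat=n):
        x = "".join(chars)
        assert generates(RULES, START, x) == accepts(DELTA, START, FINAL, x)
print("grammar and DFA agree on all nonempty strings up to length 6")
```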

DFAs for right-linear grammars

- Conversely, let G be a right-linear grammar
- If all strings x on RHSs have length 1, then the construction above can be reversed
- and the proof above still holds
- If not, then the construction can be modified by adding extra states as in Linz, pp. 91-2
- In either case a DFA can be obtained for L(G)

Regular grammars

- Left-linear grammars may be defined by analogy with right-linear grammars
- every rule must have a RHS of the form Bx or x
- Fact: Left-linear grammars generate all and only regular languages
- A CFG is a regular grammar iff it is right-linear or left-linear
- so a language has a regular grammar iff it is regular

Backus-Naur form (BNF)

- Grammars for programming languages generally use a variant of our CFG notation called BNF.
- In BNF the symbol ::= is used instead of the rightward pointing arrow.
- In BNF, terminal symbols may be given in bold face, or nonterminals may be delimited by angle brackets, e.g.
- <identifier> ::= <letter> <digits>

Common BNF conventions

- The vertical bar convention
- [ ] brackets
- for optionality (0 or 1 times)
- { } braces
- for indefinite repetition (0 or more times)
- ( ) parentheses
- for grouping, which removes ambiguity, e.g., (a|b)c vs. a | bc

A sample grammar in BNF

- <conditional> ::= if <test> then <block> [ else <block> ] endif
- <block> ::= begin [<statements>] end
- <statements> ::= { <statement> }
- <test> ::= <var> <op> <var>
- <statement> ::= <var> = <var>
- <var> ::= x | y
- <op> ::= = | /=
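
A grammar in this style maps naturally onto a recursive-descent recognizer, with one function per nonterminal, an `if` for each [ ] and a `while` for each { }. The sketch below is one such recognizer for the sample grammar; the technique, the whitespace-separated tokens, and the name `is_conditional` are assumptions of this example.

```python
def is_conditional(text):
    """True iff text (whitespace-separated tokens) is a <conditional> of the sample grammar."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(*options):
        nonlocal pos
        if peek() not in options:
            raise SyntaxError(f"expected one of {options} at token {pos}")
        pos += 1

    def var():                       # <var> ::= x | y
        expect("x", "y")

    def test():                      # <test> ::= <var> <op> <var>
        var()
        expect("=", "/=")            # <op> ::= = | /=
        var()

    def block():                     # <block> ::= begin [<statements>] end
        expect("begin")
        while peek() in ("x", "y"):  # <statements> ::= { <statement> }
            var()                    # <statement> ::= <var> = <var>
            expect("=")
            var()
        expect("end")

    try:
        expect("if")
        test()
        expect("then")
        block()
        if peek() == "else":         # [ else <block> ]
            expect("else")
            block()
        expect("endif")
        return pos == len(tokens)    # no trailing tokens allowed
    except SyntaxError:
        return False

print(is_conditional("if x = y then begin x = y y = x end else begin end endif"))  # True
print(is_conditional("if x then begin end endif"))                                  # False
```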
