
Context-Free Grammars


galvans


  1. Context-Free Grammars

  2. Strings
     A string is a finite sequence, possibly empty, of symbols drawn from some alphabet Σ.
     • ε (also written λ or “”) is the empty string.
     • Σ* is the set of all possible strings over an alphabet Σ.

  3. Languages A language is a (possibly infinite) set of finite length strings over a finite alphabet.

  4. Languages and Machines

  5. Regular Languages
     A finite state machine (FSM) to accept a*b*: [diagram not reproduced]
     An FSM to accept AⁿBⁿ = {aⁿbⁿ : n ≥ 0}? None exists, because AⁿBⁿ is not regular.

  6. Regular Expressions
     (a | b)* abba (a | b)*
     \b[A-Za-z0-9_%-]+@[A-Za-z0-9_%-]+(\.[A-Za-z]+){1,4}\b

  7. Floating Point Numbers
     Example strings: +3.0, 3.0, 0.3E1, 0.3E+1, -0.3E+1, -3E8
     The language is accepted by the FSM: [diagram not reproduced]
     A regular expression: [not reproduced]
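The slide's own regular expression did not survive in this transcript. As a minimal sketch, here is one Python pattern that accepts all six example strings. The pattern itself is an assumption (optional sign, digits, optional fraction, optional exponent) and may be stricter or looser than the original, and `FLOAT_RE` and `is_float` are illustrative names.

```python
import re

# Assumed pattern, not taken from the slide:
# optional sign, digits, optional ".digits", optional "E" + optional sign + digits.
FLOAT_RE = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")

def is_float(s: str) -> bool:
    """Return True if the whole string matches the assumed pattern."""
    return FLOAT_RE.fullmatch(s) is not None

examples = ["+3.0", "3.0", "0.3E1", "0.3E+1", "-0.3E+1", "-3E8"]
print(all(is_float(s) for s in examples))
```

`re.fullmatch` is used so that a string such as `3.` or `E8`, which only partly fits the pattern, is rejected.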

  8. When We Need More Power
     Balanced parentheses: (())() is balanced; ()( is not.

  9. Rewrite Systems and Grammars
     A rewrite system (or production system or rule-based system) is:
     ● a list of rules, and
     ● an algorithm for applying them.
     Each rule has a left-hand side (lhs) and a right-hand side (rhs).

  10. Grammars
      A grammar is a production system designed to describe a language. So it has:
      ● a list of rules, and
      ● an algorithm for applying them.
      Each rule has a left-hand side (lhs) and a right-hand side (rhs).
      Example rules:
      S → aSb
      aS → ε
      aSb → bSabSa

  11. A Grammar Interpreter
      generate(G: grammar, w: initial string) =
      1. Set working-string to w.
      2. Until no working symbols remain do:
         Match the lhs of some rule against some part of working-string.
         Replace the matched part of working-string with the rhs of the rule that was matched.
      3. Return working-string.
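The interpreter above can be sketched in Python as follows. This is one possible reading rather than the slide's definition: "working symbols" are taken to be the nonterminals, every symbol is a single character, and the rule and match position are chosen at random. The parameters `nonterminals`, `rng` and `max_steps` are additions for illustration.

```python
import random

def generate(rules, w, nonterminals, rng=None, max_steps=1000):
    """Rewrite w with randomly chosen rules until no nonterminal remains.

    rules is a list of (lhs, rhs) string pairs; each symbol is one character.
    """
    rng = rng or random.Random()
    working = w
    for _ in range(max_steps):
        # Step 2 exit test: stop when no working (nonterminal) symbols remain.
        if not any(sym in nonterminals for sym in working):
            return working
        # Find every place where some rule's lhs matches part of the string.
        matches = [
            (i, lhs, rhs)
            for lhs, rhs in rules
            for i in range(len(working) - len(lhs) + 1)
            if working.startswith(lhs, i)
        ]
        if not matches:
            raise ValueError(f"no rule applies to {working!r}")
        # Replace the matched part with the rhs of the chosen rule.
        i, lhs, rhs = rng.choice(matches)
        working = working[:i] + rhs + working[i + len(lhs):]
    raise RuntimeError("step limit reached before all nonterminals were removed")

# Rules S → aSb and S → ε, with ε written as the empty string.
print(generate([("S", "aSb"), ("S", "")], "S", {"S"}))
```

The step limit is there only because a random derivation is not guaranteed to terminate for every grammar.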

  12. Context-Free Grammars
      No restrictions on the form of the right-hand sides:
      S → abDeFGab
      But require a single nonterminal on the left-hand side: a rule may begin S → but not ASB →.
      Generally a unique starting symbol is included as part of the grammar definition.
      Also, generally, some symbols are marked as terminal symbols.

  13. AⁿBⁿ

  14. AⁿBⁿ
      S → aSb
      S → ε

  15. Balanced Parentheses

  16. Balanced Parentheses
      S → (S)
      S → SS
      S → ε

  17. Context-Free Grammars
      A context-free grammar G is a quadruple, (V, Σ, R, S), where:
      ● V is the rule alphabet, which contains nonterminals and terminals,
      ● Σ (the set of terminals) is a subset of V,
      ● R (the set of rules) is a finite subset of (V − Σ) × V*,
      ● S (the start symbol) is an element of V − Σ.
      Example: ({S, a, b}, {a, b}, {S → aSb, S → ε}, S)
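As an illustration of the definition, the quadruple can be written down as a small Python record together with a check of the three set conditions. The class name `CFG`, the field name `sigma` and the method `check` are hypothetical, and representing each rule as a (lhs, rhs-tuple) pair is just one convenient encoding.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    V: frozenset       # rule alphabet: nonterminals and terminals
    sigma: frozenset   # terminals
    R: frozenset       # rules as (lhs, rhs) pairs, rhs a tuple of symbols
    S: str             # start symbol

    def check(self) -> bool:
        """Check that sigma ⊆ V, S ∈ V − sigma, and R ⊆ (V − sigma) × V*."""
        nonterminals = self.V - self.sigma
        return (
            self.sigma <= self.V
            and self.S in nonterminals
            and all(
                lhs in nonterminals and all(x in self.V for x in rhs)
                for lhs, rhs in self.R
            )
        )

# The example grammar: ({S, a, b}, {a, b}, {S → aSb, S → ε}, S).
anbn = CFG(
    V=frozenset({"S", "a", "b"}),
    sigma=frozenset({"a", "b"}),
    R=frozenset({("S", ("a", "S", "b")), ("S", ())}),
    S="S",
)
print(anbn.check())
```

The empty tuple stands for ε. Finiteness of R is automatic here because a Python set is always finite.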

  18. Definition of a Context-Free Language A language L is context-free iff it is generated by some context-free grammar G.

  19. Where Context-Free Grammars Get Their Power
      Self-embedding rules:
      ● S → aSb is self-embedding.
      ● S → aS is recursive but not self-embedding.
      ● S → aT, T → Sa is self-embedding.

  20. PalEven = {wwᴿ : w ∈ {a, b}*}

  21. PalEven = {wwᴿ : w ∈ {a, b}*}
      G = ({S, a, b}, {a, b}, R, S), where: R =

  22. PalEven = {wwᴿ : w ∈ {a, b}*}
      G = ({S, a, b}, {a, b}, R, S), where:
      R = { S → aSa
            S → bSb
            S → ε }.
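To see how the three rules mirror the structure of the strings, here is a small recognizer sketch that undoes one rule application per recursive call. The function name `in_pal_even` is illustrative, and this hand-written check is specific to this grammar, not a general parsing method.

```python
def in_pal_even(s: str) -> bool:
    """Decide membership in PalEven by peeling off matching outer symbols."""
    # S → ε
    if s == "":
        return True
    # S → aSa or S → bSb: first and last symbols must be the same terminal.
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "ab":
        return in_pal_even(s[1:-1])
    return False

print(in_pal_even("abba"), in_pal_even("aba"))
```

An odd-length string such as `aba` is rejected because the recursion ends on a single symbol, which no rule can produce.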

  23. WcW = {wcw : w ∈ {a, b}*}

  24. WcW = {wcw : w ∈ {a, b}*}
      Why might we care?

  25. WcW = {wcw : w ∈ {a, b}*}
      Why might we care? Type checking:
      string winniethepooh;
      winniethepooh = “bearofverylittlebrain”;

  26. Arithmetic Expressions
      E → E + E
      E → E * E
      E → id
      E → (E)

  27. BNF
      A notation for writing practical context-free grammars.
      • The symbol | should be read as “or”.
        Example: S → aSb | bSa | SS | ε
      • Allow a nonterminal symbol to be any sequence of characters surrounded by angle brackets.
        Examples of nonterminals: <program>, <variable>

  28. BNF for a Java Fragment
      <block> ::= {<stmt-list>} | {}
      <stmt-list> ::= <stmt> | <stmt-list> <stmt>
      <stmt> ::= <block>
               | while (<cond>) <stmt>
               | if (<cond>) <stmt>
               | do <stmt> while (<cond>);
               | <assignment-stmt>;
               | return
               | return <expression>
               | <method-invocation>;

  29. Grammar vs Prose Why use a grammar? Why not just write prose? http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2

  30. Spam Generation These production rules yield 1,843,200 possible spellings. How Many Ways Can You Spell V1@gra? By Brian Hayes American Scientist, July-August 2007 http://www.americanscientist.org/issues/pub/how-many-ways-can-you-spell-v1gra

  31. HTML
      <ul>
        <li>Item 1, which will include a sublist</li>
        <ul>
          <li>First item in sublist</li>
          <li>Second item in sublist</li>
        </ul>
        <li>Item 2</li>
      </ul>
      A grammar:
      /* Text is a sequence of elements. */
      HTMLtext → Element HTMLtext | ε
      Element → UL | LI | … (and other kinds of elements that are allowed in the body of an HTML document)
      /* The <ul> and </ul> tags must match. */
      UL → <ul> HTMLtext </ul>
      /* The <li> and </li> tags must match. */
      LI → <li> HTMLtext </li>

  32. English
      S → NP VP
      NP → the Nominal | a Nominal | Nominal | ProperNoun | NP PP
      Nominal → N | Adjs N
      N → cat | dogs | bear | girl | chocolate | rifle
      ProperNoun → Chris | Fluffy
      Adjs → Adj Adjs | Adj
      Adj → young | older | smart
      VP → V | V NP | VP PP
      V → like | likes | thinks | shoots | smells
      PP → Prep NP
      Prep → with

  33. A Bigger English Grammar Generating fake conference papers: SCIgen

  34. Outside-In Structure and RNA Folding

  35. A Grammar for RNA Folding
      <family> → <tail> <stemloop>            [1]
      <tail> → <base> <base> <base>           [1]
      <stemloop> → C <stemloop-5> G           [.23]
      <stemloop> → G <stemloop-5> C           [.23]
      <stemloop> → A <stemloop-5> U           [.23]
      <stemloop> → U <stemloop-5> A           [.23]
      <stemloop> → G <stemloop-5> U           [.03]
      <stemloop> → U <stemloop-5> G           [.03]
      <stemloop-5> → …

  36. Proving the Correctness of a Grammar
      AⁿBⁿ = {aⁿbⁿ : n ≥ 0}
      G = ({S, a, b}, {a, b}, R, S), R = { S → aSb, S → ε }
      ● Prove that G generates only strings in L.
      ● Prove that G generates all the strings in L.

  37. Proving the Correctness of a Grammar
      To prove that G generates only strings in L, imagine the process by which G generates a string as the following loop:
      1. st := S.
      2. Until no nonterminals are left in st do:
         2.1. Apply some rule in R to st.
      3. Output st.
      Then we construct a loop invariant I and show that:
      ● I is true when the loop begins,
      ● I is maintained at each step through the loop, and
      ● I ∧ (st contains only terminal symbols) → st ∈ L.

  38. Proving the Correctness of a Grammar
      AⁿBⁿ = {aⁿbⁿ : n ≥ 0}. G = ({S, a, b}, {a, b}, R, S), R = { S → aSb, S → ε }.
      ● Prove that G generates only strings in L:
        Let I = (#a(st) = #b(st)) ∧ (st ∈ a*(S ∪ ε)b*).
      ● Prove that G generates all strings in L:
        By induction on the length of st.
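The invariant can also be spot-checked mechanically. The sketch below explores every derivation of the AⁿBⁿ grammar up to a fixed number of steps and asserts the invariant on each working string. It is a bounded check, not a substitute for the proof, and the names `RULES`, `SHAPE`, `invariant` and `check_invariant` are illustrative.

```python
import re

RULES = [("S", "aSb"), ("S", "")]   # S → aSb, S → ε
SHAPE = re.compile(r"a*S?b*")       # st ∈ a*(S ∪ ε)b*

def invariant(st: str) -> bool:
    """I = (#a(st) = #b(st)) and (st has the shape a*(S ∪ ε)b*)."""
    return st.count("a") == st.count("b") and SHAPE.fullmatch(st) is not None

def check_invariant(max_steps: int = 10) -> set:
    """Explore all derivations for max_steps rounds; return terminal strings seen."""
    frontier = {"S"}
    generated = set()
    for _ in range(max_steps):
        next_frontier = set()
        for st in frontier:
            assert invariant(st), st
            if "S" not in st:
                generated.add(st)        # only terminals remain: st is output
                continue
            i = st.index("S")            # the shape guarantees exactly one S
            for _lhs, rhs in RULES:
                next_frontier.add(st[:i] + rhs + st[i + 1:])
        frontier = next_frontier
    return generated

print(sorted(check_invariant(5), key=len))
```

Every string returned contains only terminals and satisfies the invariant, so it has the form aⁿbⁿ, which is the third condition of the loop-invariant argument.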

  39. Structure (Parse Trees)
      Context-free languages: we care about structure.
      Parse tree for 3 + 5 * 7 (children are indented under their parent):
      E
        E
          id (3)
        +
        E
          E
            id (5)
          *
          E
            id (7)

  40. Structure in English
      Parse tree for "the smart cat smells chocolate" (children are indented under their parent):
      S
        NP
          the
          Nominal
            Adjs
              Adj
                smart
            N
              cat
        VP
          V
            smells
          NP
            Nominal
              N
                chocolate

  41. Parsing • Top down • Bottom up

  42. Generative Capacity Because parse trees matter, it makes sense, given a grammar G, to distinguish between: ● G’s weak generative capacity, defined to be the set of strings, L(G), that G generates, and ● G’s strong generative capacity, defined to be the set of parse trees that G generates.

  43. Ambiguity A grammar is ambiguous iff there is at least one string in L(G) for which G produces more than one parse tree. For most applications of context-free grammars, this is a problem.

  44. An Arithmetic Expression Grammar
      E → E + E    (1)
      E → E * E    (2)
      E → (E)      (3)
      E → id       (4)
      Example string: 2 + 3 * 5

  45. An Arithmetic Expression Grammar
      E → E + E    (1)
      E → E * E    (2)
      E → (E)      (3)
      E → id       (4)
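One way to make the ambiguity concrete is to count parse trees. The sketch below counts, for a token list, how many distinct parse trees rules (1) to (4) allow, by trying every rule at the root of every span. The function name `count_parses` and the tokenization (each operand written as the token `id`) are assumptions for illustration.

```python
from functools import lru_cache

def count_parses(tokens) -> int:
    """Count parse trees for tokens under E → E + E | E * E | (E) | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of parse trees that derive toks[i:j] from E.
        n = 0
        if j - i == 1 and toks[i] == "id":
            n += 1                                   # rule (4): E → id
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            n += count(i + 1, j - 1)                 # rule (3): E → (E)
        for k in range(i + 1, j - 1):
            if toks[k] in ("+", "*"):                # rules (1), (2): E → E op E
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(toks))

print(count_parses("id + id * id".split()))   # the shape of 2 + 3 * 5
```

Any result greater than 1 means the string has more than one parse tree, so the grammar is ambiguous.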

  46. Inherent Ambiguity Both of the following problems are undecidable: • Given an arbitrary context-free grammar G, is G ambiguous? • Given an arbitrary context-free language L, is L inherently ambiguous?

  47. Arithmetic Expressions
      E → E + E
      E → E * E
      E → (E)
      E → id
      Problem 1: Associativity
      The string id * id * id has two parse trees, one grouping it as [id * id] * id and the other as id * [id * id].

  48. Arithmetic Expressions
      E → E + E
      E → E * E
      E → (E)
      E → id
      Problem 2: Precedence
      The string id * id + id has two parse trees, one grouping it as [id * id] + id and the other as id * [id + id].

  49. Arithmetic Expressions - A Better Way
      E → E + T
      E → T
      T → T * F
      T → F
      F → (E)
      F → id
      Examples: id + id * id, id * id * id

  50. Arithmetic Expressions - A Better Way
      E → E + T
      E → T
      T → T * F
      T → F
      F → (E)
      F → id
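A hand-written parser shows how the E/T/F layering fixes both problems. The sketch below is a recursive-descent parser in which the left-recursive rules E → E + T and T → T * F are implemented as loops, a standard substitution rather than a literal transcription of the rules. It returns nested tuples that show the grouping rather than full parse trees with E, T and F nodes. The function name `parse` and the token format are assumptions.

```python
def parse(tokens):
    """Parse a token list with E → E + T | T, T → T * F | F, F → (E) | id."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def expr():      # E → E + T | T, written as a loop so + groups to the left
        node = term()
        while peek() == "+":
            eat("+")
            node = ("+", node, term())
        return node

    def term():      # T → T * F | F, one level below E so * binds tighter
        node = factor()
        while peek() == "*":
            eat("*")
            node = ("*", node, factor())
        return node

    def factor():    # F → (E) | id
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        eat("id")
        return "id"

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("id + id * id".split()))
print(parse("id * id * id".split()))
```

Each input now has exactly one grouping: `*` is applied before `+`, and repeated operators group to the left.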
