1 / 32

CSC 3130: Automata theory and formal languages

Fall 2009. The Chinese University of Hong Kong. CSC 3130: Automata theory and formal languages. Context-free languages. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Context-free grammar. This is an a different model for describing languages

peony
Download Presentation

CSC 3130: Automata theory and formal languages

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fall 2009 The Chinese University of Hong Kong CSC 3130: Automata theory and formal languages Context-free languages Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130

  2. Context-free grammar • This is an a different model for describing languages • The language is specified by productions (substitution rules) that tell how strings can be obtained, e.g. • Using these rules, we can derive strings like this: A → 0A1A → BB → # A, B are variables 0, 1, # are terminals A is the start variable A  0A1  00A11  000A111  000B111  000#111

  3. ART NOUN PREP ART NOUN VERB ART NOUN CMPLX-NOUN CMPLX-NOUN CMPLX-NOUN NOUN-PHRASE PREP-PHRASE CMPLX-VERB VERB-PHRASE NOUN-PHRASE SENTENCE Natural languages • CFGs were first used for natural languages a girl with a flower likes the boy

  4. Some examples • We can describe parts of English like this: ARTICLE → aARTICLE → theNOUN → boyNOUN → girlNOUN → flowerVERB → likesVERB → touchesVERB → seesPREP → with SENTENCE → NOUN-PHRASE VERB-PHRASENOUN-PHRASE → CMPLX-NOUNNOUN-PHRASE → CMPLX-NOUN PREP-PHRASEVERB-PHRASE → CMPLX-VERBVERB-PHRASE → CMPLX-VERB PREP-PHRASEPREP-PHRASE → PREP CMPLX-NOUNCMPLX-NOUN → ARTICLE NOUNCMPLX-VERB → VERB NOUN-PHRASECMPLX-VERB → VERB variables: SENTENCE, NOUN-PHRASE, … terminals: a, the, boy, girl, flower, likes, touches, sees, with start variable: SENTENCE

  5. Derivations (10) (11) (12) (13) (14) (15) (16) (17) (18) ARTICLE → aARTICLE → theNOUN → boyNOUN → girlNOUN → flowerVERB → likesVERB → touchesVERB → seesPREP → with SENTENCE → NOUN-PHRASE VERB-PHRASENOUN-PHRASE → CMPLX-NOUNNOUN-PHRASE → CMPLX-NOUN PREP-PHRASEVERB-PHRASE → CMPLX-VERBVERB-PHRASE → CMPLX-VERB PREP-PHRASEPREP-PHRASE → PREP CMPLX-NOUNCMPLX-NOUN → ARTICLE NOUNCMPLX-VERB → VERB NOUN-PHRASECMPLX-VERB → VERB (1) (2) (3) (4) (5) (6) (7) (8) (9) SENTENCE  NOUN-PHRASE VERB-PHRASE (1)  CPLX-NOUN VERB-PHRASE (2)  ARTICLE NOUN VERB-PHRASE (7)  a NOUN VERB-PHRASE (10)  a boy VERB-PHRASE (12)  a boy CPLX-VERB (4)  a boy VERB (9)  a boy sees (17)

  6. Programming languages • Context-free grammars are also used to describe (parts of) programming languages • For instance, expressions like (2 + 3) * 5 or 3 + 8 + 2 * 7 can be described by the CFG E  E + E E  E * E E (E) E 0 E  1 … E  9 Variables:E Terminals: + * ( ) 0 1 2 3 4 5 6 7 8 9

  7. Motivation for studying CFGs • Context-free grammars are essential for understanding the meaning of computer programs code: (2 + 3) * 5 E  E * E  (E) * E  (E + E) * E  (2 + E) * E  (2 + 3) * E  (2 + 3) * 5 meaning: “add 2 and 3, and then multiply by 5”

  8. Definition of context-free grammar • A context-free grammar (CFG) is a 4-tuple (V, S, R, S) where • V is a finite set of variables or non-terminals • S is a finite set of terminals (VS = ) • R is a set of productions or substitution rules of the formwhere A is a variable V and a is a string with variables and terminals • S is a variable called the start variable A → a

  9. Shorthand notation for productions • When we have multiple productions with the same variable on the left likewe can write this in shorthand as E  E + E E  E * E E  (E) E N N  0N N  1N N  0 N  1 Variables: E, N Terminals: +, *, (, ), 0, 1 Start variable: E E  E + E | E * E | (E)| 0 | 1 N  0N | 1N | 0 | 1

  10. Derivation • A derivation is a sequential application of productions: E  E * E  (E) * E  (E) * N  (E + E ) * N  (E + E ) * 1  (E + N) * 1  (N + N) * 1  (N + 1N) * 1  (N + 10) * 1  (1 + 10) * 1 a  b means b can be obtainedfrom a with one production derivation * a  b means b can be obtainedfrom a after zero or moreproductions

  11. Language of a CFG • The language of a CFG is the set of all strings of terminals that can be derived from the start variable • Such languages are called context-free * L(G) = { |   S* and S   }

  12. Example 1 • Can you derive: • What is the language of this CFG? A → 0A1 | BB → # variables: A, Bterminals: 0, 1, # start variable: A 00#11 00#111 00##11 L = {0n#1n: n ≥ 0}

  13. Example 2 S  SS | (S) |  • Give derivations of (), (()()) • How about ())? convention: variables in uppercase, terminals in lowercase, start variable first • S  (S) (2) •  () (3) • S  (S) •  (SS) •  ((S)S) •  ((S)(S)) •  (()(S)) •  (()())

  14. Design examples L = {0n1n | n 0} These strings have recursive structure: 000000111111 0000011111 00001111 000111 0011 01 e 0S1|  S 

  15. Design examples L = numbers without leading zeros 0, 109, 2, 23 e, 01, 003 allowed not allowed S → 0|LN 1052870032 N → NA|e any number A → 0|L leading digit L → 1|2|3|4|5|6|7|8|9

  16. Design examples L = {0n1n0m1m | n 0, m 0} These strings have two parts: L = L1L2 L1 = {0n1n | n 0} S  S1S1 L2 = {0m1m | m 0} S1 0S11|  rules for L1: S1 0S11|  L2 is the same asL1

  17. Design examples L = {0n1m0m1n | n 0, m 0} These strings have nested structure: outer part: 0n1n inner part: 1m0m S  0S1|I I  1I0| 

  18. Design examples L = {x: xhas two 0-blocks with same number of 0s} 01011, 001011001, 10010101001 01001000, 01111 allowed not allowed 10010011010010110 A: e, or ends in 1 initial part middle part final part C: e, or begins with 1 A B C

  19. Design examples 10010011010010110 A: e, or ends in 1 C: e, or begins with 1 A B C S → ABC B has recursive structure: A → e | U1 00110100 U → 0U | 1U | e D C → e | 1U B → 0D0 | 0B0 at least one 0 D → 1U1 | 1

  20. Context-free versus regular • Write a CFG for the language (0 + 1)*111 • Can you do so for every regular language? • Proof: • S  A111A  e | 0A | 1A Every regular language is context-free regularexpression NFA DFA

  21. From regular to context-free CFG regular expression Æ grammar with no rules e S→ e S →a a (alphabet symbol) S→ S1 | S2 E1 + E2 S→ S1S2 E1E2 S→ SS1 | e E1* In all cases, S becomes the newstart symbol

  22. Context-free versus regular • Is every context-free language regular? • No! We already saw some examples: • This language is context-free but not regular L = {0n1n: n ≥ 0} S → 0S1 | e

  23. Parse tree • Derivations can also be represented using parse trees E E  E + E | E - E | (E) | V V  x | y | z E + E E  E + E  V + E  x + E  x + (E)  x + (E - E)  x + (V - E)  x + (y - E)  x + (y - V)  x + (y - z) V ( E ) x - E E V V y z

  24. Left derivation • Always derive the leftmost variable first: • Corresponds to a left-to-right traversal of parse tree E E  E + E  V + E  x + E  x + (E)  x + (E - E)  x + (V - E)  x + (y - E)  x + (y - V)  x + (y - z) E + E V ( E ) - x E E V V y z

  25. Ambiguity • A grammar is ambiguous if some strings have more than one parse tree • Example: E  E + E | E - E | (E) | V V  x | y | z E E E + E E + E x + y + z E + E V V E + E V V z x V V x y y z

  26. Why ambiguity matters • The parse tree represents the intended meaning: E E E + E E + E x + y + z E + E V V E + E V V z x V V x y y z “first add y and z, and then add this to x” “first add x and y, and then add z to this”

  27. Why ambiguity matters • Suppose we also had multiplication: E  E + E | E - E | E  E | (E) | V V  x | y | z  E E E + E E * E x  y + z  E E V V E + E V V z x V V x y y z “first y + z, then x ” “first x  y, then + z”

  28. Examples • Which of these are ambiguous: L = numbers without leading 0s L = {0n1n | n 0} S → 0|LN S  0S1|  N → NA|e A → 0|P L = {0n1n0m1m | n, m 0} L → 1|2|3|4|5|6|7|8|9 S  S1S1 S1 0S11|  L = {x: xhas two 0-blocks with same number of 0s} L = {0n1m0m1n | n, m 0} B → 0D0 | 0B0 S → ABC D → 1U1 | 1 A → e | U1 S  0S1|I U → 0U | 1U | e C → e | 1U I  1I0| 

  29. Disambiguation • Sometimes we can rewrite the grammar to remove the ambiguity • Rewrite grammar so  cannot be broken by +: E  E + E | E - E | E  E | (E) | V V  x | y | z E  T | E + T | E - TT  F | T  F F  (E) | V V  x | y | z T stands for term:x * (y + z)F stands for factor: x, (y + z) A term always splits into factors A factor is either a variable or a parenthesized expression

  30. Disambiguation • Example E E  T | E + T | E - TT  F | T  F F  (E) | V V  x | y | z E T T F F T F V V V x  y + z

  31. Disambiguation • In general, disambiguation is not possible • There exist inherently ambiguous languages:Every CFG for this language is ambiguous • There is no general procedure that can tell if a grammar is ambiguous • However, grammars used in programming languages can typically be disambiguated

  32. Analysis example • What is the language? • Is it ambiguous? e ab baba abbbaa a bba • S  aB | bA • A  a | aS | bAA • B  b | bS | aBB

More Related