
CS240 Language Theory and Automata Fall 2008




Presentation Transcript


    1. CS240 • Language Theory and Automata • Fall 2008

    2. Context-Free Grammars We have seen finite automata and regular expressions as equivalent means to describe languages. They describe simple languages only and can't do, e.g., {aⁿbⁿ}. Context-free grammars are a more powerful way to describe languages: they can describe recursive structure.

    3. CFGs First used to describe human languages. There is natural recursion among things like nouns, verbs, prepositions and their phrases, because, e.g., noun phrases can appear inside verb phrases and prepositional phrases, and vice versa. An important application of CFGs: specification and compilation of programming languages. A parser describes and analyzes the syntax of a program, and algorithms exist to construct a parser from a CFG.

    4. Context-free Languages and Pushdown Automata CFLs are described by CFGs and include the regular languages plus others. Pushdown automata are to CFLs as finite automata are to RLs: equivalent, i.e., they recognize the same languages.

    5. Example Production rules: substitutions. Non-terminals: variables that can have substitutions. Terminals: symbols that are part of the alphabet; no substitutions. Start variable: the left side of the top-most rule.

    6. Generating strings Use a grammar to describe a language by generating each string as follows: (1) Write down the start variable. (2) Find a variable that is written down and a rule that starts with that variable (i.e., the variable is on the left of the arrow), and replace the variable with the right side of that rule. (3) Repeat step 2 until no variables remain.
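The three-step procedure above can be sketched in Python. This is a minimal sketch: the dictionary encoding of rules, the function name, and the stand-in grammar S → aSb | ε for {aⁿbⁿ} are all illustrative choices, not from the slides.

```python
import random

# Hypothetical grammar for {a^n b^n}: S -> aSb | epsilon.
# Each variable maps to a list of bodies; [] is the empty (epsilon) body.
ANBN = {"S": [["a", "S", "b"], []]}

def generate(grammar, start="S", rng=None):
    """Generate one string: write down the start variable, then repeatedly
    replace some written-down variable by the body of one of its rules
    until no variables remain."""
    rng = rng or random.Random()
    form = [start]
    while True:
        i = next((k for k, sym in enumerate(form) if sym in grammar), None)
        if i is None:                      # step 3: no variables remain
            return "".join(form)
        body = rng.choice(grammar[form[i]])
        form[i:i + 1] = body               # step 2: replace variable by body
```

Every string this sketch produces is some number of a's followed by the same number of b's, since the only rules are S → aSb and S → ε.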

    7. Example Derivation A sequence of substitutions is called a derivation. Derivation of aaacbbb: A ⇒ aAb ⇒ aaAbb ⇒ aaaAbbb ⇒ aaaBbbb ⇒ aaacbbb

    8. CFG for English
    <sentence> → <noun-phrase> <verb-phrase>
    <noun-phrase> → <cmplx-noun> | <cmplx-noun> <prep-phrase>
    <verb-phrase> → <cmplx-verb> | <cmplx-verb> <prep-phrase>
    <prep-phrase> → <prep> <cmplx-noun>
    <cmplx-noun> → <article> <noun>
    <cmplx-verb> → <verb> | <verb> <noun-phrase>
    <article> → a | the
    <noun> → boy | girl | flower
    <verb> → touches | likes | sees
    <prep> → with
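Written down as data, this grammar can drive a tiny generator. The encoding and the deterministic always-pick-the-first-alternative strategy below are illustrative choices, not part of the slide:

```python
# The slide's English CFG; each head maps to a list of bodies (symbol lists).
ENGLISH = {
    "<sentence>": [["<noun-phrase>", "<verb-phrase>"]],
    "<noun-phrase>": [["<cmplx-noun>"], ["<cmplx-noun>", "<prep-phrase>"]],
    "<verb-phrase>": [["<cmplx-verb>"], ["<cmplx-verb>", "<prep-phrase>"]],
    "<prep-phrase>": [["<prep>", "<cmplx-noun>"]],
    "<cmplx-noun>": [["<article>", "<noun>"]],
    "<cmplx-verb>": [["<verb>"], ["<verb>", "<noun-phrase>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["boy"], ["girl"], ["flower"]],
    "<verb>": [["touches"], ["likes"], ["sees"]],
    "<prep>": [["with"]],
}

def derive_first(start="<sentence>"):
    """Derive one sentence by always expanding the leftmost variable
    with the first alternative of its rule."""
    form = [start]
    while any(s in ENGLISH for s in form):
        i = next(k for k, s in enumerate(form) if s in ENGLISH)
        form[i:i + 1] = ENGLISH[form[i]][0]
    return " ".join(form)
```

With the alternatives in this order, `derive_first()` yields "a boy touches"; replacing the first-choice policy with a random one generates the rest of the language.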

    9. Formal CFG Notation Productions = rules of the form head → body. The head is a variable; the body is a string of zero or more variables and/or terminals. Start symbol = variable that represents "the language." Notation: G = (V, Σ, P, S), where V = variables, Σ = terminals, P = productions, S = start symbol.

    10. Derivations αAβ ⇒ αγβ whenever there is a production A → γ. (Subscript with the name of the grammar, e.g., ⇒G, if necessary.) Example: abbAS ⇒ abbaAbS. α ⇒* β means string α can become β in zero or more derivation steps. Examples: abbAS ⇒* abbAS (zero steps); abbAS ⇒* abbaAbS (one step); abbAS ⇒* abbaabb (three steps).
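A single step αAβ ⇒ αγβ is mechanical to verify. A sketch, assuming the grammar S → AS | ε, A → Ab | aAb | ab that a later slide uses for this same example; the function name is invented:

```python
# Sentential forms are lists of symbols; variables are the grammar's keys.
GRAMMAR = {"S": [["A", "S"], []], "A": [["A", "b"], ["a", "A", "b"], ["a", "b"]]}

def derives_in_one_step(form1, form2):
    """True iff form1 => form2: some variable occurrence in form1 is
    replaced by the body of one of its productions."""
    for i, sym in enumerate(form1):
        for body in GRAMMAR.get(sym, ()):
            if form1[:i] + body + form1[i + 1:] == form2:
                return True
    return False
```

For example, `derives_in_one_step(list("abbAS"), list("abbaAbS"))` is True via the production A → aAb.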

    11. Language of a CFG L(G) = the set of terminal strings w such that S ⇒*G w, where S is the start symbol. Notation: a, b, … are terminals; …, y, z are strings of terminals; Greek letters α, β, … are strings of variables and/or terminals, often called sentential forms; A, B, … are variables; …, Y, Z are variables or terminals; S is typically the start symbol.

    12. Leftmost/Rightmost Derivations We have a choice of variable to replace at each step. Derivations may appear different only because we make the same replacements in a different order. To avoid such differences, we may restrict the choice. A leftmost derivation always replaces the leftmost variable in a sentential form and yields left-sentential forms. Rightmost derivation is defined analogously. ⇒lm, ⇒rm, etc., are used to indicate that derivations are leftmost or rightmost.

    13. Example A simple example generates strings of a's and b's such that each block of a's is followed by at least as many b's: S → AS | ε, A → Ab | aAb | ab. Note the vertical bar separates different bodies for the same head.

    14. Example Leftmost derivation: S ⇒ AS ⇒ AbS ⇒ abbS ⇒ abbAS ⇒ abbaAbS ⇒ abbaabbS ⇒ abbaabb. Rightmost derivation: S ⇒ AS ⇒ AAS ⇒ AA ⇒ AaAb ⇒ Aaabb ⇒ Abaabb ⇒ abbaabb. Note we derived the same string.

    15. Derivation Trees Nodes = variables, terminals, or ε. Variables appear at interior nodes; terminals and ε at leaves. A leaf can be ε only if it is the only child of its parent. A node and its children, from the left, must form the head and body of a production.
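Derivation trees can be represented as nested (label, children) pairs, with the yield read off the leaves left to right. A sketch; the tuple encoding and the example tree (for aabb under a hypothetical grammar S → aSb | ε) are illustrative:

```python
def yield_of(tree):
    """Concatenate leaf labels from the left; an ε-leaf contributes nothing."""
    label, children = tree
    if not children:                       # a leaf: a terminal or ε
        return "" if label == "ε" else label
    return "".join(yield_of(c) for c in children)

# Tree for aabb: S -> aSb twice, then S -> ε.
# Note the ε-leaf is the only child of its parent, as the slide requires.
TREE = ("S", [("a", []),
              ("S", [("a", []), ("S", [("ε", [])]), ("b", [])]),
              ("b", [])])
```

Here `yield_of(TREE)` reconstructs the derived string "aabb".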

    16. Example

    17. Ambiguous Grammars A CFG is ambiguous if one or more terminal strings have multiple leftmost derivations from the start symbol. Equivalently: multiple rightmost derivations, or multiple parse trees. Example: Consider S → AS | ε, A → Ab | aAb | ab. The string aabbb has the following two leftmost derivations from S: S ⇒ AS ⇒ aAbS ⇒ aAbbS ⇒ aabbbS ⇒ aabbb and S ⇒ AS ⇒ AbS ⇒ aAbbS ⇒ aabbbS ⇒ aabbb. Intuitively, we can use A → Ab first or second to generate the extra b.
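Ambiguity of a particular string can be checked by brute force: enumerate all leftmost derivations and count how many produce it. A sketch for the grammar above; the pruning bound is specific to this grammar (every A eventually yields at least two terminals), and the function name is invented:

```python
# Grammar from the slide: S -> AS | ε, A -> Ab | aAb | ab.
GRAMMAR = {"S": [["A", "S"], []], "A": [["A", "b"], ["a", "A", "b"], ["a", "b"]]}

def count_leftmost(grammar, start, target):
    """Count distinct leftmost derivations of `target` by exhaustive search."""
    count = 0
    def expand(form):
        nonlocal count
        i = next((k for k, s in enumerate(form) if s in grammar), None)
        if i is None:                              # no variables: a terminal string
            count += "".join(form) == target
            return
        # Prune: every A yields at least two terminals, so stop once the
        # guaranteed length exceeds the target's (keeps the search finite).
        terminals = sum(1 for s in form if s not in grammar)
        if terminals + 2 * form.count("A") > len(target):
            return
        if not target.startswith("".join(form[:i])):
            return                                 # terminal prefix already wrong
        for body in grammar[form[i]]:              # expand the leftmost variable
            expand(form[:i] + body + form[i + 1:])
    expand([start])
    return count
```

Here `count_leftmost(GRAMMAR, "S", "aabbb")` returns 2, matching the two leftmost derivations shown above, while a string like abb has only one.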

    18. Inherently Ambiguous Languages A CFL L is inherently ambiguous if every CFG for L is ambiguous. Such languages exist; see the book. Example: The language of our example grammar is not inherently ambiguous, even though the grammar is ambiguous: change the grammar to force the extra b's to be generated last.

    19. Why Care? Ambiguity of the grammar implies that at least some strings in its language have different structures (parse trees). Thus, such a grammar is unlikely to be useful for a programming language, because two structures for the same string (program) imply two different meanings (two different executables) for this program. Common example: the easiest grammars for arithmetic expressions are ambiguous and need to be replaced by more complex, unambiguous grammars. An inherently ambiguous language would be absolutely unsuitable as a programming language, because we would not have any way of finding a unique structure for all its programs.

    20. Expression Grammar

    21. Pushdown Automata Add a stack to an FA. Typically non-deterministic. An automaton equivalent to CFGs.

    22. Example Notation for "transition diagrams": a, Z / X1X2…Xk. Meaning: on input a, with Z on top of the stack, consume the a, make this state transition, and replace the Z on top of the stack by X1X2…Xk (with X1 at the top).

    24. Formal PDA P = (Q, Σ, Γ, δ, q0, Z0, F). Q, Σ, q0, and F have their meanings from FAs. Γ = stack alphabet. Z0 in Γ = start symbol (the symbol initially on the stack). δ = transition function: it takes a state, an input symbol (or ε), and a stack symbol, and gives you a finite number of choices of: a new state (possibly the same) and a string of stack symbols (or ε) to replace the top stack symbol.

    25. Instantaneous Descriptions (IDs) For an FA, the only thing of interest is its state. For a PDA, we want to know its state and the entire contents of its stack. It is also convenient to maintain a fiction that there is an input string waiting to be read. Represented by an ID (q, w, α), where q = state, w = waiting input, and α = stack (top on the left).

    26. Moves of the PDA If δ(q, a, X) contains (p, α), then (q, aw, Xβ) ⊢ (p, w, αβ). Extend to ⊢* to represent 0, 1, or many moves. Subscript by the name of the PDA, if necessary. Input string w is accepted if (q0, w, Z0) ⊢* (p, ε, α) for some accepting state p and any stack string α. L(P) = the set of strings accepted by P.
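The move relation ⊢ can be simulated directly by searching over IDs. A sketch of a nondeterministic PDA simulator with acceptance by final state; the transition table is a hypothetical PDA for {0ⁿ1ⁿ : n ≥ 1}, standing in for the slides' own example, whose diagram isn't reproduced in the transcript:

```python
# delta: (state, input symbol or "" for ε, stack top) -> set of (state, pushed string)
DELTA = {
    ("q", "0", "Z"): {("q", "XZ")},   # first 0: push an X
    ("q", "0", "X"): {("q", "XX")},   # later 0s: push more Xs
    ("q", "1", "X"): {("p", "")},     # first 1: start popping
    ("p", "1", "X"): {("p", "")},     # later 1s: keep popping
    ("p", "", "Z"): {("f", "Z")},     # all matched: ε-move to the final state
}

def accepts(w, start="q", stack="Z", finals=("f",)):
    """Search over IDs (state, remaining input, stack); stack top is stack[0]."""
    todo, seen = [(start, w, stack)], set()
    while todo:
        q, rest, st = todo.pop()
        if (q, rest, st) in seen:
            continue
        seen.add((q, rest, st))
        if q in finals and rest == "":
            return True               # accepting state with all input consumed
        if st == "":
            continue                  # no moves are possible on an empty stack
        symbols = {rest[0]} if rest else set()
        for a in symbols | {""}:      # try consuming one symbol, or an ε-move
            for p, push in DELTA.get((q, a, st[0]), ()):
                todo.append((p, rest[len(a):], push + st[1:]))
    return False
```

Because the simulator explores every nondeterministic choice, a string is accepted exactly when some sequence of moves reaches a final state on empty input.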

    27. Example

    28. Acceptance by Empty Stack Another one of those technical conveniences: when we prove that PDAs and CFGs accept the same languages, it helps to assume that the stack is empty whenever acceptance occurs. N(P) = the set of strings w such that (q0, w, Z0) ⊢* (p, ε, ε) for some state p. Note p need not be in F; in fact, if we talk about N(P) only, then we need not even specify a set of accepting states.

    29. Example For our previous example, to accept by empty stack: add a new transition δ(p, ε, Z0) = {(p, ε)}. That is, when starting to look for a new a-b block, the PDA has the option to pop the last symbol off the stack instead. p is no longer an accepting state; in fact, there are no accepting states.

    30. A language is L(P1) for some PDA P1 if and only if it is N(P2) for some PDA P2. We can show this with constructive proofs.

    31. Final State → Empty Stack Given P1 = (Q, Σ, Γ, δ, q0, Z0, F), construct P2: Introduce a new start state p0 and a new bottom-of-stack marker X0. The first move of P2: replace X0 by Z0X0 and go to state q0. The presence of X0 prevents P2 from "accidentally" emptying its stack and accepting when P1 did not accept. Then P2 simulates P1, i.e., give P2 all the transitions of P1. Introduce a new state r that keeps popping the stack of P2 until it is empty. If (the simulated) P1 is in an accepting state, give P2 the additional choice of going to state r on ε input, and thus emptying its stack without reading any more input.

    32. Empty Stack → Final State Given P2 = (Q, Σ, Γ, δ, q0, Z0, F), construct P1: Introduce a new start state p0 and a new bottom-of-stack marker X0. The first move of P1: replace X0 by Z0X0 and go to state q0. Then P1 simulates P2, i.e., give P1 all the transitions of P2. Introduce a new state r for P1; it is the only accepting state. If P1 ever sees X0 on top of the stack, it knows the simulated P2 has emptied its stack and accepted, so P1 goes to state r on ε input.

    33. Equivalence of CFGs and PDAs The title says it all. We'll show a language L is L(G) for some CFG G if and only if it is N(P) for some PDA P.

    34. Only If (CFG to PDA) Let L = L(G) for some CFG G = (V, Σ, P, S). Idea: have PDA A simulate leftmost derivations in G, where a left-sentential form (LSF) is represented by: the sequence of input symbols that A has consumed from its input, followed by A's stack (top leftmost). Example: If (q, abcd, S) ⊢* (q, cd, ABC), then the LSF represented is abABC.

    35. Moves of A If a terminal a is on top of the stack, then there had better be an a waiting on the input; if so, A consumes the a from the input and pops it from the stack. The LSF represented doesn't change! If a variable B is on top of the stack, then PDA A has a choice of replacing B on the stack by the body of any production with head B.
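These two kinds of moves (expand a variable on ε-input, match a terminal) can be sketched as a direct simulation of the constructed one-state PDA, with acceptance by empty stack. The grammar for {aⁿbⁿ} below is a stand-in example, and the function name is invented:

```python
# Stand-in grammar for {a^n b^n}: S -> aSb | ε ([] is the ε body).
ANBN = {"S": [["a", "S", "b"], []]}

def accepts_by_empty_stack(grammar, w, start="S"):
    """Simulate the PDA built from `grammar`; the stack top is st[0].
    On a variable atop the stack, replace it by some body (an ε-move);
    on a terminal atop the stack, match it against the next input symbol."""
    todo, seen = [(w, (start,))], set()
    while todo:
        rest, st = todo.pop()
        if (rest, st) in seen:
            continue
        seen.add((rest, st))
        if not st:
            if rest == "":
                return True            # empty stack with all input consumed
            continue
        top, below = st[0], st[1:]
        if top in grammar:             # expand the variable on top (ε-move)
            for body in grammar[top]:
                todo.append((rest, tuple(body) + below))
        elif rest and rest[0] == top:  # match the terminal on top
            todo.append((rest[1:], below))
    return False
```

With this grammar the search stays finite, since each expansion either dies on a terminal mismatch or is gated by the remaining input; the simulation accepts exactly the strings the grammar derives.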

    36. APPENDIX

    37. Equivalence of Parse Trees, Leftmost, and Rightmost Derivations The following statements about a grammar G = (V, Σ, P, S) and a terminal string w are all equivalent: (1) S ⇒* w (i.e., w is in L(G)); (2) S ⇒*lm w; (3) S ⇒*rm w; (4) there is a parse tree for G with root S and yield (the labels of the leaves, from the left) w. Obviously (2) and (3) each imply (1).

    38. Parse Tree Implies LM/RM Derivations Generalize all the statements to talk about an arbitrary variable A in place of S, except that now (1) no longer means w is in L(G). Induction on the height of the parse tree. Basis: Height 1: the tree is root A and leaves w = a1, a2, …, ak. A → w must be a production, so A ⇒lm w and A ⇒rm w.

    39. Induction: Height > 1: the tree is root A with children X1, X2, …, Xk. Those Xi's that are variables are roots of shorter trees; thus, the IH says that they have LM derivations of their yields. Construct an LM derivation of w from A by starting with A ⇒lm X1X2…Xk, then using LM derivations from each Xi that is a variable, in order from the left. The RM derivation is analogous.

    40. Derivations to Parse Trees Induction on the length of the derivation. Basis: one step; there is an obvious parse tree. Induction: more than one step. Let the first step be A ⇒ X1X2…Xk. Subsequent changes can be reordered so that all changes to X1 and the sentential forms that replace it are done first, then those for X2, and so on (i.e., we can rewrite the derivation as an LM derivation). The derivations from those Xi's that are variables are all shorter than the given derivation, so the IH applies: there are parse trees for each of these derivations. Make the roots of these trees be children of a new root labeled A.

    41. Example Consider the derivation S ⇒ AS ⇒ AAS ⇒ AA ⇒ A1A ⇒ A10A1 ⇒ 0110A1 ⇒ 0110011. The sub-derivation from the first A is: A ⇒ A1 ⇒ 011. The sub-derivation from S is: S ⇒ AS ⇒ A ⇒ 0A1 ⇒ 0011. Each has a parse tree; put them together with a new root S.

    42. Only-If Proof (i.e., Grammar → PDA) Prove by induction on the number of steps in the leftmost derivation S ⇒*lm α that, for any x, (q, wx, S) ⊢* (q, x, β), where wβ = α and β is the suffix of α that begins at the leftmost variable (β = ε if there is no variable). Also prove the converse: if (q, wx, S) ⊢* (q, x, β), then S ⇒*lm wβ. The inductive proofs are in the book. As a consequence, if y is a terminal string, then S ⇒* y iff (q, y, S) ⊢* (q, ε, ε), i.e., y is in L(G) iff y is in N(A).
