
5. Context-Free Grammars and Languages

5. Context-Free Grammars and Languages. CIS 5513 - Automata and Formal Languages – Pei Wang. Languages and grammars. Regular expression: constants and operators Grammar: variables and rewriting rules Difference: whether to give a pattern a name


Presentation Transcript


  1. 5. Context-Free Grammars and Languages CIS 5513 - Automata and Formal Languages – Pei Wang

  2. Languages and grammars. Regular expression: constants and operators. Grammar: variables and rewriting rules. Difference: whether a pattern is given a name. Example: binary palindromes do not form a regular language, but can be specified as P → ɛ | 0 | 1 | 0P0 | 1P1, where ‘P’ is a variable, ‘→’ is the production symbol, and ‘|’ separates alternatives.
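The palindrome grammar can be read directly as a recursive membership test. The following is a minimal sketch, not part of the original slides; the function name `is_palindrome_by_grammar` is made up for illustration, and each branch mirrors one group of productions.

```python
def is_palindrome_by_grammar(w: str) -> bool:
    """Decide whether w is derivable from P -> ε | 0 | 1 | 0P0 | 1P1."""
    if any(ch not in "01" for ch in w):
        return False                      # only the terminals 0 and 1 are allowed
    if len(w) <= 1:
        return True                       # P -> ε, P -> 0, P -> 1
    if w[0] == w[-1]:
        # P -> 0P0 or P -> 1P1: strip the matching ends and recurse on the middle
        return is_palindrome_by_grammar(w[1:-1])
    return False


print(is_palindrome_by_grammar("0110"))
print(is_palindrome_by_grammar("01"))
```

The recursion follows the grammar rather than comparing `w` with its reverse, which makes the correspondence between productions and cases explicit.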

  3. Context-free grammar A Context-Free Grammar (CFG) G is defined as G = (V, T, P, S): • V: the set of variables (non-terminals, syntactic categories, each as a language) • T: the set of terminal symbols (alphabet) • P: the set of productions (rules) that each has a variable (head) and a string (body) • S: the start symbol (as the whole language)
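The four-tuple G = (V, T, P, S) maps naturally onto a small data structure. The sketch below is one possible encoding, assumed for illustration only: the class name `CFG`, its field names, and the `is_well_formed` check are not from the slides. It encodes the palindrome grammar from slide 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V: non-terminals
    terminals: frozenset   # T: the alphabet
    productions: tuple     # P: pairs (head, body), where body is a tuple of symbols
    start: str             # S: the start symbol

    def is_well_formed(self) -> bool:
        """Check the basic side conditions of the definition."""
        symbols = self.variables | self.terminals
        return (
            self.start in self.variables
            and not (self.variables & self.terminals)   # V and T are disjoint
            and all(
                head in self.variables and all(s in symbols for s in body)
                for head, body in self.productions
            )
        )


# P -> ε | 0 | 1 | 0P0 | 1P1, written as five separate productions
PALINDROMES = CFG(
    variables=frozenset({"P"}),
    terminals=frozenset({"0", "1"}),
    productions=(
        ("P", ()),
        ("P", ("0",)),
        ("P", ("1",)),
        ("P", ("0", "P", "0")),
        ("P", ("1", "P", "1")),
    ),
    start="P",
)

print(PALINDROMES.is_well_formed())
```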

  4. Example of CFG. A simple arithmetic expression consists of identifiers connected by the ‘+’ and ‘*’ operators: E → I | E + E | E * E | (E); I → a | b | Ia | Ib | I0 | I1. Formally, each alternative is a separate production; ‘|’ is only shorthand for listing them together. In E → E + E, the three E’s may stand for different strings. The effect of the star operator is achieved by recursion (e.g., I → Ia).

  5. Derivation using a CFG. A CFG defines a language that consists of the strings of terminals derived from the start symbol using the production rules. • Derivation: from the start symbol to the terminals • Recursive inference: from the terminals to the start symbol

  6. Example of recursive inference

  7. Example of derivation. Here ‘⇒’ means “derives in one step”. With a ‘*’ above it, it means “derives in zero or more steps”; with a ‘G’ below it, it means “derives by grammar G”.

  8. Leftmost/rightmost derivation. A leftmost (or rightmost) derivation restricts which variable is rewritten at each step: it is always the leftmost (or rightmost) variable in the current string.
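A leftmost derivation can be traced mechanically by always rewriting the first variable in the current string. The sketch below uses the expression grammar from slide 4; the helper names `leftmost_step` and `leftmost_derivation` and the target string `a*b0` are chosen here for illustration and do not come from the slides.

```python
PRODUCTIONS = {
    "E": [["I"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]],
    "I": [["a"], ["b"], ["I", "a"], ["I", "b"], ["I", "0"], ["I", "1"]],
}


def leftmost_step(form, body):
    """Rewrite the leftmost variable in `form` using the production body `body`."""
    for i, sym in enumerate(form):
        if sym in PRODUCTIONS:
            if body not in PRODUCTIONS[sym]:
                raise ValueError(f"{sym} -> {''.join(body)} is not a production")
            return form[:i] + body + form[i + 1:]
    raise ValueError("no variable left to rewrite")


def leftmost_derivation(start, bodies):
    """Apply the given bodies in order and return every sentential form."""
    forms = [[start]]
    for body in bodies:
        forms.append(leftmost_step(forms[-1], body))
    return ["".join(form) for form in forms]


steps = leftmost_derivation("E", [
    ["E", "*", "E"],   # E -> E * E
    ["I"],             # E -> I
    ["a"],             # I -> a
    ["I"],             # E -> I
    ["I", "0"],        # I -> I0
    ["b"],             # I -> b
])
print(" => ".join(steps))
```

A rightmost derivation would differ only in scanning `form` from the right end.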

  9. Context-free language. L(G), the set of terminal strings derivable from the start symbol S of a CFG G, is called a context-free language (CFL). Any string of variables and terminals derived from S is a “sentential form”, which is a “left” (or “right”) sentential form if it is produced by a leftmost (or rightmost) derivation.

  10. CFG and regular language. A CFG specifies a regular language if it is in one of the following two forms: • Right-linear: all of its rules have the form P → ε, P → a, or P → aQ • Left-linear: all of its rules have the form P → ε, P → a, or P → Qa. A right-linear grammar maps directly to an ε-NFA; a left-linear grammar corresponds to the reversal of a right-linear one.
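The mapping from a right-linear grammar to an NFA can be sketched as follows: each variable becomes a state, P → aQ becomes a transition from P to Q on a, P → a becomes a transition to an extra accepting state, and P → ε makes P accepting. The rule encoding, the function name `accepts`, and the sample grammar S → 0S | 1S | 1 (binary strings ending in 1) are assumptions made for this example.

```python
def accepts(rules, start, w):
    """Simulate the NFA induced by a right-linear grammar.

    rules: list of (head, terminal, next_variable) triples, where
    (P, a, Q) encodes P -> aQ, (P, a, None) encodes P -> a,
    and (P, "", None) encodes P -> ε.
    """
    FINAL = object()                       # extra accepting state for P -> a
    accepting = {head for head, a, q in rules if a == "" and q is None}
    accepting.add(FINAL)
    current = {start}                      # set of NFA states reached so far
    for ch in w:
        current = {
            FINAL if q is None else q
            for head, a, q in rules
            if head in current and a == ch
        }
    return bool(current & accepting)


ENDS_IN_ONE = [("S", "0", "S"), ("S", "1", "S"), ("S", "1", None)]
print(accepts(ENDS_IN_ONE, "S", "0101"))
print(accepts(ENDS_IN_ONE, "S", "10"))
```

Because every rule consumes exactly one terminal or none, the subset simulation needs no separate ε-closure step in this encoding.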

  11. Exercises for Section 5.1.1 • 5.1.1(a): define the CFG of { 0^n 1^n | n ≥ 1 } • 5.1.1(b): define the CFG of { a^i b^j c^k | i ≠ j or j ≠ k } Solutions: http://infolab.stanford.edu/~ullman/ialcsols/sol5.html#sol51 Alternative solution of 5.1.1(b): S → AD | EC; A → ɛ | aA; B → ɛ | bB; C → ɛ | cC; D → bB | cC | bDc; E → aA | bB | aEb
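The alternative solution of 5.1.1(b) can be sanity-checked by brute force: generate every terminal string of bounded length by recursive inference (bottom-up, from bodies to heads) and compare the result with the set definition. This is only a finite check up to a chosen length, not a proof; the function names and the bound of 6 are assumptions of this sketch.

```python
from itertools import product

# Each body is a string of one-character symbols; "" stands for ε.
GRAMMAR = {
    "S": ["AD", "EC"],
    "A": ["", "aA"],
    "B": ["", "bB"],
    "C": ["", "cC"],
    "D": ["bB", "cC", "bDc"],
    "E": ["aA", "bB", "aEb"],
}


def language_up_to(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`."""
    known = {var: set() for var in grammar}
    changed = True
    while changed:                          # iterate to a fixed point
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                partial = {""}
                for sym in body:
                    choices = known[sym] if sym in grammar else {sym}
                    partial = {
                        p + c for p in partial for c in choices
                        if len(p) + len(c) <= max_len
                    }
                if not partial <= known[head]:
                    known[head] |= partial
                    changed = True
    return known[start]


def expected(max_len):
    """{ a^i b^j c^k | i != j or j != k }, restricted to length <= max_len."""
    return {
        "a" * i + "b" * j + "c" * k
        for i, j, k in product(range(max_len + 1), repeat=3)
        if i + j + k <= max_len and (i != j or j != k)
    }


print(language_up_to(GRAMMAR, "S", 6) == expected(6))
```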

  12. Exercises for Section 5.1.2 Solutions: http://infolab.stanford.edu/~ullman/ialcsols/sol5.html#sol51

  13. Parse trees. A derivation can be expressed as a parse tree.

  14. Equivalent statements about CFG. The sequence of leaves of a parse tree, read from left to right, is the yield of the tree, which is the terminal string derived from the start symbol.
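Reading off the yield is a left-to-right traversal of the leaves. In the sketch below a parse tree is encoded as a nested `(label, children)` pair with plain strings as leaves; this encoding and the function name `tree_yield` are assumptions made for illustration.

```python
def tree_yield(tree):
    """Concatenate the leaf labels of a parse tree from left to right."""
    if isinstance(tree, str):          # a leaf: a terminal (or "" for ε)
        return tree
    _label, children = tree
    return "".join(tree_yield(child) for child in children)


# Parse tree for a*b0 using E -> E * E, E -> I, I -> a, I -> I0, I -> b
tree = ("E", [
    ("E", [("I", ["a"])]),
    "*",
    ("E", [("I", [("I", ["b"]), "0"])]),
])
print(tree_yield(tree))
```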

  15. Parsers. Parsing, or syntactic analysis, is the process of analyzing a string of symbols according to the rules of a formal grammar. A parser is a program that generates parse trees from input strings according to a given grammar. In UNIX, the YACC command takes a CFG as input, and the output is a fragment of C code that can generate a parse tree.

  16. Ambiguity in CFG. A CFG is “ambiguous” if some string is the yield of two or more different parse trees. For example, the grammar of arithmetic expressions allows E + E * E to be parsed in two ways, corresponding to the two possible orders of the operators. The mere existence of different derivations does not imply ambiguity, because several derivations can correspond to the same parse tree.
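Ambiguity can be made concrete by counting parse trees. The sketch below uses a simplified version of the expression grammar, E → a | E + E | E * E, with the single identifier `a` and no parentheses; this simplification and the function name `count_parse_trees` are assumptions of the example.

```python
from functools import lru_cache


def count_parse_trees(w: str) -> int:
    """Count the parse trees of w under E -> a | E + E | E * E."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees with root E and yield w[i:j].
        total = 1 if w[i:j] == "a" else 0          # E -> a
        for k in range(i + 1, j - 1):              # k: position of the root operator
            if w[k] in "+*":                       # E -> E + E or E -> E * E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(w))


print(count_parse_trees("a+a*a"))
```

A count above 1 for any single string is enough to show that the grammar is ambiguous.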

  17. Removing ambiguity. No algorithm can decide whether an arbitrary CFG is ambiguous, and no algorithm can remove ambiguity from every CFG. Some ambiguity can be removed by revising the CFG, for example by giving + and * separate precedence levels in expressions.
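A standard textbook way to separate the two operators is the layered grammar E → T | E + T, T → F | T * F, F → I | (E); whether this is exactly the grammar shown on the original slide is an assumption. The sketch below parses according to that layering and returns an abstract syntax tree as nested tuples. It assumes the input is already split into tokens and treats any alphanumeric token as an identifier; the left-recursive rules are implemented as loops.

```python
def parse(tokens):
    """Parse under E -> T | E + T, T -> F | T * F, F -> identifier | ( E )."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():                          # F -> identifier | ( E )
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok is not None and tok.isalnum():
            advance()
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():                            # T -> F | T * F (left-associative)
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                            # E -> T | E + T (left-associative)
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree


print(parse(list("a+b*c")))
```

Because `*` is handled one level below `+`, each input has only one possible tree, with `*` binding tighter.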

  18. Unique derivation. In an unambiguous grammar, each string in the language has exactly one leftmost derivation and exactly one rightmost derivation. Therefore, although a variable can have more than one production, only one of them leads to the given string at each step of its leftmost derivation. For a given CFG, a string has two distinct parse trees if and only if it has two distinct leftmost derivations from the start symbol.

  19. Inherent ambiguity. A CFL is “inherently ambiguous” if all its grammars are ambiguous. Example: L = {a^n b^n c^m d^m} ∪ {a^n b^m c^m d^n}, where m and n are positive integers. It is easy to write a CFG that generates the two types of strings separately, but it will give the string “aabbccdd” two leftmost derivations, as well as two parse trees.
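The two parse trees of “aabbccdd” can be counted with a small program. The grammar below is one natural grammar for L, with S → AB covering the first type of string and S → C the second; it is assumed here for illustration and may differ from the one on the original slide. The counting functions rely on this grammar having no ε-productions.

```python
from functools import lru_cache

GRAMMAR = {
    "S": [["A", "B"], ["C"]],
    "A": [["a", "A", "b"], ["a", "b"]],      # a^n b^n
    "B": [["c", "B", "d"], ["c", "d"]],      # c^m d^m
    "C": [["a", "C", "d"], ["a", "D", "d"]], # a^n (b^m c^m) d^n
    "D": [["b", "D", "c"], ["b", "c"]],      # b^m c^m
}


def count_trees(w, start="S"):
    """Count the parse trees of w under GRAMMAR."""

    @lru_cache(maxsize=None)
    def derive(sym, i, j):
        # Number of parse trees with root `sym` and yield w[i:j].
        if sym not in GRAMMAR:                       # terminal symbol
            return 1 if w[i:j] == sym else 0
        return sum(match(tuple(body), i, j) for body in GRAMMAR[sym])

    @lru_cache(maxsize=None)
    def match(body, i, j):
        # Number of ways the symbol sequence `body` can yield w[i:j].
        if len(body) == 1:
            return derive(body[0], i, j)
        # Every symbol yields at least one character (no ε-productions).
        return sum(
            derive(body[0], i, k) * match(body[1:], k, j)
            for k in range(i + 1, j)
        )

    return derive(start, 0, len(w))


print(count_trees("aabbccdd"))
print(count_trees("aabbcd"))
```

Strings with n = m belong to both halves of the union, which is where the second tree comes from.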

  20. Inherent ambiguity: example

  21. Exercises for Section 5.4 Exercise 5.4.3: Find an unambiguous grammar for the above language Solutions: http://infolab.stanford.edu/~ullman/ialcsols/sol5.html#sol54

  22. Applications of CFG Examples: • Mathematical language • Logical language • Markup language • Programming language • Natural language
