1 / 38

Formal Languages and Grammars

Formal Languages and Grammars. Xiaoyin Wang CS 5363 Spring 2019. Last Class. Compilers: Introduction Why Compilers? Input and Output Structure of Compilers Compiler Design. Today’s Class. Formal Languages What are languages? What are grammars? Phrase-Structure Grammars

eugenef
Download Presentation

Formal Languages and Grammars

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Formal Languages and Grammars Xiaoyin Wang CS 5363 Spring 2019

  2. Last Class • Compilers: Introduction • Why Compilers? • Input and Output • Structure of Compilers • Compiler Design

  3. Today’s Class Formal Languages What are languages? What are grammars? Phrase-Structure Grammars Classification of Grammars (and corresponding languages) 3

  4. Intro to Languages • English grammar tells us if a given combination of words is a valid sentence. • The syntax of a sentence concerns its form while the semantics concerns its meaning, e.g. • the mouse wrote a poem • From a syntax point of view this is a valid sentence. 4

  5. Intro to Languages • From a semantics point of view not so easy…perhaps in Disney land • Natural languages (English, French, etc) have very complex rules of syntax and not necessarily well-defined. • Me too. • Here you are. 5

  6. Formal Language • Formal language • well-defined • finite set of rules • The key problem: • How to express infinite sentences with finite set of rules? • The answer is recursion • Example: Natural Numbers {1, 2, 3, …} 6

  7. Grammars A formal grammar G is compact, precise mathematical definition of a language L. Not a list of examples Finite Fixed Typically use recursion to resolve infinity

  8. Grammars A grammar implies an algorithm that would generate all legal sentences of the language. Often, it takes the form of a set of recursive definitions. An example of recursive definition: <expression> -> <expression> + 1 <expression> -> 1 {1, 1+1, 1+1+1, 1+1+1+1, …}

  9. What is Grammar? • Answer two key questions: • Is a combination of words a valid sentence in a formal language? • How can we generate the valid sentences of a formal language? • Grammars provide models for both natural languages and programming languages. 9

  10. Grammars Example: A grammar that generates a subset of the English language

  11. A derivation of “the boy sleeps”:

  12. A derivation of “a dog runs”:

  13. Language of the grammar L = { “a boy runs”, “a boy sleeps”, “the boy runs”, “the boy sleeps”, “a dog runs”, “a dog sleeps”, “the dog runs”, “the dog sleeps” }

  14. Notation Variable or Non-terminal Symbols of the vocabulary Terminal Symbols of the vocabulary Production rule

  15. A vocabulary / alphabet, V is a finite non-empty set of elements called symbols. Example: V = {a, b, c, A, B, C, S} A word/sentence over V is a string of finite length of elements of V. Example: Aba The empty/null string, ε is the string with no symbols. Basic Terminology

  16. V* is the set of all words over V. Example: V* = {Aba, BBa, bAA, cab …} A language over V is a subset of V*. We can give some criteria for a word to be in a language. Basic Terminology

  17. Phrase-Structure Grammars A popular way to specify a grammar recursively A phrase-structure grammar (abbr. PSG) G = (V,T,S,P) is a 4-tuple, in which: V is a vocabulary (set of symbols) The “template vocabulary” of the language. TV is a set of symbols called terminals Actual symbols of the language. Also, N :≡ V − T is a set of special “symbols” called non-terminals. (Representing concepts like “noun”)

  18. Phrase-Structure Grammars SN is a special non-terminal, the start symbol. in our example the start symbol was “sentence”. P is a set of productions (to be defined). Rules for substituting one sentence fragment for another Every production rule must contain at least one non-terminal on its left side.

  19. Derivation • Let G=(V,T,S,P) be a phrase-structure grammar • Let w0=lz0r (the concatenation of l, z0, and r) and w1=lz1r be strings over V • If z0 z1 is a production of G we say that w1 is directly derivable from w0 and we write wo => w1

  20. Derivation If w0, w1, …., wn are strings over V such that w0 =>w1,w1=>w2,…, wn-1 => wn, then we say that wn is derivable from w0, and write w0=>*wn. The sequence of steps used to obtain wn from wo is called a derivation.

  21. Let G = (V, T, S, P) V = {a, b, A,S} T = {a, b} S is a start symbol P = {S→ aA, A→ aa, A → b} Process: L(G)T = {S} L(G)T = {S, aA} L(G)T = {S, aA, aaa, ab} L(G) = {aaa, ab} Example

  22. Language Let G(V,T,S,P) be a phrase-structure grammar. The language generated by G (or the language of G) denoted by L(G) , is the set of all strings of terminals that are derivable from the starting state S. L(G)= {w  T* | S =>*w} 28

  23. EXAMPLE: Let G = (V, T, S, P), where V = {a, b, A, B,S} T = {a, b}, S is a start symbol P = {S→ Ab | aB, A→ aS, B→ Sb, A → a, B→ b}. G is a Phrase-Structure Grammar. Languages of PSG What sentences can be generated with this grammar?

  24. Languages of PSG P = {S→ Ab | aB, A→ aS, B→ Sb, A → a, B→ b}. Language of the grammar: S => Ab => ab S => Ab => aSb => aAbb => aabb … S => aB => aSb => aaBb => aabb … L = {anbn, n>=1} How to prove? Mathematical Induction

  25. Languages of PSG P = {S→ Ab | aB, A→ aS, B→ Sb, A → a, B→ b} L1 = {anbn, n>=1} (1) Proving L1 L(P)

  26. Languages of PSG P = {S→ Ab | aB, A→ aS, B→ Sb, A → a, B→ b} L1 = {anbn, n>=1} (2) Proving L(P) L1 Using derivation steps as N Within x+2 steps, enumerating all productions: within 2 steps we can only derive: aS1b, ab, S1 can derive at most x more steps, so S1 => w and any w belongs L1, And awb also belongs to L1

  27. Another PSG Example – English Fragment We have G = (V, T, S, P), where: V = {(sentence), (noun phrase), (verb phrase), (article), (adjective), (noun), (verb), (adverb), a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly} T = {a, the, large, hungry, rabbit, mathematician,eats, hops, quickly, wildly} S = (sentence) P = (see next slide)

  28. Productions for our Language P = { (sentence)→ (noun phrase)(verb phrase) (noun phrase) → (article) (adjective) (noun) (noun phrase) → (article) (noun) (verb phrase) → (verb) (adverb) (verb phrase) → (verb) (article) → a (article) → the (adjective) → large (adjective) → hungry (noun) → rabbit (noun) → mathematician (verb) → eats (verb) → hops (adverb) → quickly (adverb) → wildly }

  29. A Sample Sentence Derivation (sentence) (noun phrase) (verb phrase) (article) (adj.) (noun) (verb phrase) (art.) (adj.) (noun) (verb) (adverb) the(adj.) (noun) (verb) (adverb) the large(noun) (verb) (adverb) the large rabbit (verb) (adverb) the large rabbithops(adverb) the large rabbit hops quickly On each step,we apply a production to a fragment of the previous sentence template to get a new sentence template. Finally, we end up with a sequence of terminals (real words), that is, a sentence of our language L.

  30. Defining the PSG Types Type 0: Phase-structure grammars – no restrictions on the production rules Type 1: Context-Sensitive PSG: All after fragments are either longer than the corresponding before fragments, or empty: if b → a, then|b| < |a|  a = ε.

  31. Defining the PSG Types Type 2: Context-Free PSG: All before fragments have length 1 and are non-terminals: if b → a, then|b| = 1 (b N). Type 3: Regular PSGs: All before fragments have length 1 and nonterminals All after fragments are either single terminals, or a pair of a terminal followed by a nonterminal. if b → a, thena T  a  TN.

  32. Types of Grammars -Chomsky hierarchy of languages Venn Diagram of Grammar Types: Type 0 – Phrase-structure Grammars Type 1 – Context-Sensitive Type 2 –Context-Free Type 3 –Regular

  33. Examples: Regular Grammars Only the last symbol at the right hand side can be non-terminal E.g. P = {S → aB, B → bB, B → a} Generates the language ab*a The restriction can be either on the right or left E.g., P = {S → Ba, B → Bb, B → a} Also called linear grammars Why regular language is not suitable for programming language?

  34. Examples: Context Free Grammars Only one non-terminal on the left E.g. P = {S → aSa, S → b} Generates the language anban, n>=1 Widely used as programming languages

  35. Examples: Context Sensitive Grammars For the English Grammar An apple and A book <noun_phrase> → <article> <noun> <noun> → <vowel_noun> | <other_noun> <article> <vowel_noun> → an <vowel_noun> <article> <other_noun> → a <other_noun>

  36. Today’s Class Formal Languages What are languages? What are grammars? Phrase-Structure Grammars Classification of Grammars (and corresponding languages) 42

  37. Next Class Automatons and Regular Languages From Grammar to Automatons NFA and DFA Regular expressions and regular languages 43

More Related