
Discrete Maths

Discrete Maths, 242-213, Semester 2, 2013-2014. 14. Grammars. Objectives: to introduce grammars and show their importance for defining programming languages; to show the connection between REs and grammars. Overview: Why Grammars? • Languages • Using a Grammar • Parse Trees

manny

Presentation Transcript


  1. Discrete Maths 242-213, Semester 2, 2013-2014 • Objectives • to introduce grammars and show their importance for defining programming languages; • to show the connection between REs and grammars 14. Grammars

  2. Overview • Why Grammars? • Languages • Using a Grammar • Parse Trees • Ambiguous Grammars • Kinds of Grammars • More Information

  3. 1. Why Grammars? • Grammars are the standard way of defining programming languages. • Tools exist for semi-automatically translating grammars into compilers (e.g. JavaCC, lex, yacc, ANTLR) • this saves weeks of work

  4. 2. Languages • We use a natural language to communicate • its grammar rules are very complex • the rules don’t cover important things • We use a formal language to define a programming language • its grammar rules are fairly simple • the rules cover almost everything continued

  5. A formal language is a set of legal strings. • The strings are legal if they correctly use the language’s alphabet and grammar rules. • The alphabet is often called the language’s terminal symbols (or terminals).

  6. Example 1 • Alphabet (terminals) = {1, 2, 3} • Using the grammar rules (not shown here; see later), the language is: L1 = { 11, 12, 13, 21, 22, 23, 31, 32, 33} • L1 is the set of strings of length 2.

  7. Example 2 • Terminals = {1, 2, 3} • Using different grammar rules, the language is: L2 = { 111, 222, 333} • L2 is the set of strings of length 3, where all the terminals are the same.

  8. Example 3 • Terminals = {1, 2, 3} • Using different grammar rules, the language is: L3 = {2, 12, 22, 32, 112, 122, 132, ...} • L3 is the set of strings whose numerical value is divisible by 2.
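The three example languages are small enough to list by brute force. The Python sketch below filters every string over {1, 2, 3} with a direct membership test rather than with grammar rules (which the slides have not shown yet), and it cuts L3 off at length 3 because L3 is infinite. The names `strings_up_to`, `L1`, `L2` and `L3` are illustrative.

```python
from itertools import product

TERMINALS = "123"

def strings_up_to(max_len):
    """All non-empty strings over the terminals {1, 2, 3}, shortest first."""
    for n in range(1, max_len + 1):
        for chars in product(TERMINALS, repeat=n):
            yield "".join(chars)

# L1: strings of length 2
L1 = [s for s in strings_up_to(2) if len(s) == 2]
# L2: strings of length 3 whose symbols are all the same
L2 = [s for s in strings_up_to(3) if len(s) == 3 and len(set(s)) == 1]
# L3, cut off at length 3: strings whose numerical value is divisible by 2
L3 = [s for s in strings_up_to(3) if int(s) % 2 == 0]

print(L1)
print(L2)
print(L3[:7])
```

Over this alphabet a value is even exactly when its last symbol is 2, which is why every string in L3 ends in 2.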

  9. 3. Using a Grammar • A grammar is a notation for defining a language, and is made from 4 parts: • the terminal symbols • the syntactic categories (nonterminal symbols) • e.g. statement, expression, noun, verb • the grammar rules (productions) • e.g. A => B1 B2 ... Bn • the starting nonterminal • the top-most syntactic category for this grammar continued

  10. We define a grammar G as a 4-tuple: G = (T, N, P, S) • T = terminal symbols • N = nonterminal symbols • P = productions • S = starting nonterminal

  11. 3.1. Example 1 • Consider the grammar: T = {0, 1} N = {S, R} P = { S => 0 S => 0 R R => 1 S } S is the starting nonterminal the right hand sides of productions usually use a mix of terminals and nonterminals

  12. Is “01010” in the language? • Start with an S rule:

        Rule        String Generated
        --          S
        S => 0 R    0 R
        R => 1 S    0 1 S
        S => 0 R    0 1 0 R
        R => 1 S    0 1 0 1 S
        S => 0      0 1 0 1 0

  • No more rules can be applied since there are no more nonterminals left in the string. Yes, it is in the language.
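The derivation table can be replayed mechanically. This Python sketch stores Example 1 as the 4-tuple G = (T, N, P, S), applies each listed rule to the leftmost occurrence of its nonterminal, and records every sentential form. The helpers `apply` and `derive` are hypothetical names chosen for this illustration; with only one nonterminal in each form here, "leftmost" simply means "the one that is there".

```python
# Example 1 as a 4-tuple G = (T, N, P, S); each production is (lhs, rhs).
T = {"0", "1"}
N = {"S", "R"}
P = [("S", ("0",)), ("S", ("0", "R")), ("R", ("1", "S"))]
S = "S"

def apply(form, rule):
    """Rewrite the leftmost occurrence of the rule's nonterminal in form."""
    lhs, rhs = rule
    if rule not in P:
        raise ValueError(f"not a production of G: {rule}")
    i = form.index(lhs)          # raises ValueError if lhs is absent
    return form[:i] + list(rhs) + form[i + 1:]

def derive(rules):
    """Apply the rules in order, starting from S; return every sentential form."""
    forms = [[S]]
    for rule in rules:
        forms.append(apply(forms[-1], rule))
    return forms

steps = [("S", ("0", "R")), ("R", ("1", "S")), ("S", ("0", "R")),
         ("R", ("1", "S")), ("S", ("0",))]
forms = derive(steps)
for f in forms:
    print(" ".join(f))
final = "".join(forms[-1])
print(final, all(sym in T for sym in forms[-1]))   # only terminals remain
```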

  13. Example 2 • Consider the grammar: T = {a, b, c, d, z} N = {S, R, U, V} P = { S => R U z | z R => a | b R U => d V U | c V => b | c } S is the starting nonterminal

  14. The notation: X => Y | Z is shorthand for the two rules: X => Y and X => Z • Read ‘|’ as ‘or’.

  15. Is “adbdbcz” in the language?

        Rule          String Generated
        --            S
        S => R U z    R U z
        R => a        a U z
        U => d V U    a d V U z
        V => b        a d b U z
        U => d V U    a d b d V U z
        V => b        a d b d b U z
        U => c        a d b d b c z

  • Yes! This grammar has choices about how to rewrite the string.

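  16. Is “abdbcz” in the language? No.

        Rule          String Generated
        --            S
        S => R U z    R U z
        R => a        a U z
        which U rule?

  • S => z cannot produce the string, and R => b R would make it start with ‘b’, so S => R U z and R => a are forced. • U must now be replaced by something beginning with a ‘b’, but the only U rules are: U => d V U | c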
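Both questions can be answered by searching over derivations instead of by hand. The Python sketch below does a breadth-first search over leftmost derivations for Example 2. It assumes that no production has an empty right-hand side (true for this grammar), so sentential forms never shrink and any form longer than the target can be discarded; it also discards forms whose fixed terminal prefix already disagrees with the target. The function name `in_language` is illustrative.

```python
from collections import deque

# Example 2: each nonterminal maps to its list of right-hand sides.
P = {
    "S": [("R", "U", "z"), ("z",)],
    "R": [("a",), ("b", "R")],
    "U": [("d", "V", "U"), ("c",)],
    "V": [("b",), ("c",)],
}

def in_language(target, start="S"):
    """Breadth-first search over leftmost derivations.

    Assumes no production has an empty right-hand side, so a sentential
    form never shrinks and forms longer than the target can be discarded.
    """
    seen = set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Find the leftmost nonterminal; everything before it is fixed.
        i = next((k for k, sym in enumerate(form) if sym in P), None)
        if i is None:
            if "".join(form) == target:
                return True
            continue
        if not target.startswith("".join(form[:i])):
            continue  # the fixed prefix already disagrees with the target
        for rhs in P[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= len(target) and new not in seen:
                seen.add(new)
                queue.append(new)
    return False

print(in_language("adbdbcz"))  # True
print(in_language("abdbcz"))   # False
```

The prefix check is the same argument the slide makes by hand: once "a U z" is reached, every U rule puts a ‘d’ or ‘c’ after the ‘a’, never a ‘b’.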

  17. 3.2. BNF • BNF is a shorthand notation for productions • Backus Normal Form, or • Backus-Naur Form • We have already used ‘|’: X => Y1 | Y2 | ... | Yn John Backus (1924 – 2007) Peter Naur (1928 – 2016) continued

  18. X => Y [Z] is shorthand for two rules: X => Y and X => Y Z • [Z] means 0 or 1 occurrences of Z. continued

  19. X => Y { Z } is shorthand for an infinite number of rules: X => Y, X => Y Z, X => Y Z Z, X => Y Z Z Z, ... • { Z } means 0 or more occurrences of Z.
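To see exactly which plain productions the two shorthands stand for, they can be expanded mechanically. The Python helpers below (`expand_optional` and `expand_repeat` are hypothetical names, not part of any BNF tool) return lists of right-hand sides; because { Z } abbreviates infinitely many rules, `expand_repeat` stops after a chosen number of repetitions.

```python
def expand_optional(y, z):
    """X => Y [Z] stands for X => Y and X => Y Z."""
    return [y, y + z]

def expand_repeat(y, z, max_reps):
    """X => Y { Z } stands for X => Y, X => Y Z, X => Y Z Z, ...
    Only the first max_reps + 1 rules are listed, since the family is infinite."""
    return [y + z * k for k in range(max_reps + 1)]

# Symbols are kept in lists so that multi-letter names such as Digit work too.
for rhs in expand_optional(["Y"], ["Z"]):
    print("X =>", " ".join(rhs))
for rhs in expand_repeat(["Y"], ["Z"], 3):
    print("X =>", " ".join(rhs))
```

In a plain grammar the infinite family is normally replaced by recursion through a fresh nonterminal, for example X => Y W and W => Z W | ε, where W is a new name chosen for the purpose.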

  20. 3.3. A Grammar for Expressions • Consider the grammar: T = { 0, 1, 2,..., 9, +, -, *, /, (, ) } N = { Expr, Number } P = { Expr => Number Expr => ( Expr ) Expr => Expr + Expr | Expr - Expr | Expr * Expr | Expr / Expr } Expr is the starting nonterminal

  21. Defining Number • The RE definition for a number is: number = digit digit*, digit = [0-9] • The productions for Number are: Number => Digit { Digit }, Digit => 0 | 1 | 2 | 3 | … | 9 or: Number => Number Digit | Digit, Digit => 0 | 1 | 2 | 3 | ... | 9

  22. Using Productions • Expand Expr into (125-2)*3 Expr => Expr * Expr => ( Expr ) * Expr => ( Expr - Expr ) * Expr => ( Number - Number ) * Number : => ( 125 - 2 ) * 3 continued

  23. Expand Number into 125 Number => Number Digit => Number Digit Digit => Digit Digit Digit => 1 2 5
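The expansion of Number follows a fixed pattern, so it can be generated for any digit string. The Python sketch below (the function name `number_derivation` is illustrative) first checks the string against the RE digit digit* using `re.fullmatch`, then lists the sentential forms in the same order as the slide: one Number => Number Digit step per extra digit, then Number => Digit, then all the Digit symbols replaced at once.

```python
import re

def number_derivation(digits):
    """Sentential forms for Number => Number Digit | Digit, ending in the digits.

    Mirrors the slide: peel off one Digit per step, then replace all the
    Digit symbols by the actual digits in a single final step.
    """
    if not re.fullmatch(r"[0-9][0-9]*", digits):   # the RE: digit digit*
        raise ValueError("not a number")
    n = len(digits)
    forms = [["Number"] + ["Digit"] * k for k in range(n)]   # Number, Number Digit, ...
    forms.append(["Digit"] * n)                               # Number => Digit
    forms.append(list(digits))                                # each Digit => its digit
    return [" ".join(f) for f in forms]

for form in number_derivation("125"):
    print(form)
```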

  24. 3.4. Grammars are not Unique • Two grammars that do the same thing: Balanced => ε, Balanced => ( Balanced ) Balanced and: Balanced => ε, Balanced => ( Balanced ), Balanced => Balanced Balanced • Both generate the same strings, e.g.: (()(())) () ε (()()) • (ε is the empty string.)
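A derivation search is awkward here because Balanced => Balanced Balanced can grow forever without producing any brackets. A safer way to compare the two grammars, sketched below in Python, is a bounded least-fixpoint computation: repeatedly apply every production to the strings found so far, discard anything longer than n, and stop when nothing new appears. An empty tuple `()` stands for the empty right-hand side. This only checks agreement up to length n; it is evidence, not a proof, that the languages are equal. The name `strings_up_to` is illustrative.

```python
from itertools import product

def strings_up_to(grammar, start, n):
    """Every terminal string of length <= n derivable from start.

    Computes a least fixpoint: keep applying every production to the strings
    found so far (discarding anything longer than n) until nothing new appears.
    Symbols that are not keys of grammar are terminals.
    """
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                # Each symbol contributes a set of possible strings.
                parts = [lang[s] if s in grammar else {s} for s in body]
                for combo in product(*parts):   # product copies its inputs first
                    w = "".join(combo)
                    if len(w) <= n and w not in lang[nt]:
                        lang[nt].add(w)
                        changed = True
    return lang[start]

B = "Balanced"
g1 = {B: [(), ("(", B, ")", B)]}
g2 = {B: [(), ("(", B, ")"), (B, B)]}

s1 = strings_up_to(g1, B, 8)
s2 = strings_up_to(g2, B, 8)
print(s1 == s2)                                   # same strings up to length 8
print(sorted(s for s in s1 if len(s) == 4))
```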

  25. 4. Parse Trees • A parse tree is a graphical way of showing how productions are used to generate a string. • Data structures representing parse trees are used inside compilers to store information about the program being compiled.

  26. Example 1 • Consider the grammar: T = { a, b } N = { S } P = { S => S S | a S b | a b | b a } S is the starting nonterminal

  27. Parse Tree for “aabbba” (figure: at each step the circled symbol is expanded) • The root of the tree is the start symbol S. • Expand S using S => S S. • Expand the left S using S => a S b. continued

  28. • Expand the inner S using S => a b. • Expand the right-hand S using S => b a. continued

  29. Stop when there are no more nonterminals in leaf positions. Read off the string by reading the leaves left to right. The finished tree, written with brackets, is S[ S[ a S[ a b ] b ] S[ b a ] ], whose leaves give “aabbba”.
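A parse tree is easy to hold as a nested data structure. In the Python sketch below, an interior node is a pair (symbol, list of children) and a leaf is a plain string; this encoding, and the helper names `node`, `leaves` and `has_nonterminal_leaf`, are choices made for the illustration. It rebuilds the tree for “aabbba” from slides 27 to 29 and reads the leaves from left to right.

```python
# A parse tree node is (symbol, [children]); a leaf is just a string.
def node(symbol, *children):
    return (symbol, list(children))

# The tree for "aabbba":
#   S => S S, left S => a S b, inner S => a b, right S => b a
tree = node("S",
            node("S", "a", node("S", "a", "b"), "b"),
            node("S", "b", "a"))

def leaves(t):
    """Read the leaves from left to right."""
    if isinstance(t, str):
        return t
    _, children = t
    return "".join(leaves(c) for c in children)

def has_nonterminal_leaf(t, nonterminals=frozenset({"S"})):
    """True if some leaf is still a nonterminal, i.e. the tree is unfinished."""
    if isinstance(t, str):
        return t in nonterminals
    return any(has_nonterminal_leaf(c, nonterminals) for c in t[1])

print(leaves(tree))                # aabbba
print(has_nonterminal_leaf(tree))  # False
```

The stopping rule on slide 29 corresponds to `has_nonterminal_leaf` returning False: an unfinished tree such as node("S", "S", "S") still has S leaves.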

  30. Example 2 • Consider the grammar: T = { a, +, *, (, ) } N = { E, T, F } P = { E => T | T + E T => F | F * T F => a | ( E ) } E is the starting nonterminal

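  31. Is “a+a*a” in the Language? • The root is E. • Expand using E => T + E. • Expand the left T using T => F. continued

  32. Continue expansion until no nonterminal leaves remain. The finished tree, written with brackets, is E[ T[ F[ a ] ] + E[ T[ F[ a ] * T[ F[ a ] ] ] ] ], whose leaves give “a+a*a”, so the string is in the language.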
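Because each alternative of this grammar can be chosen by looking at the next character (E and T are right-recursive, and F starts with either ‘a’ or ‘(’), one common way to build its parse trees is recursive descent: one function per nonterminal. The Python sketch below is such a parser under that assumption; trees are nested tuples (symbol, child, ...), and the name `parse` is illustrative.

```python
def parse(s):
    """Recursive-descent parser for E => T | T + E, T => F | F * T, F => a | ( E ).

    Returns a parse tree of nested tuples (symbol, child, ...), or raises
    SyntaxError if s is not in the language.
    """
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1
        return ch

    # Tuple elements are evaluated left to right, so input is consumed in order.
    def E():
        t = T()
        if peek() == "+":          # choose E => T + E
            return ("E", t, expect("+"), E())
        return ("E", t)            # choose E => T

    def T():
        f = F()
        if peek() == "*":          # choose T => F * T
            return ("T", f, expect("*"), T())
        return ("T", f)            # choose T => F

    def F():
        if peek() == "(":          # choose F => ( E )
            return ("F", expect("("), E(), expect(")"))
        return ("F", expect("a"))  # choose F => a

    tree = E()
    if pos != len(s):
        raise SyntaxError(f"unexpected {s[pos]!r} at position {pos}")
    return tree

print(parse("a+a*a"))
```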

  33. 5. Ambiguous Grammars • A grammar is ambiguous when a string can be represented by more than one parse tree • it means that the string has more than one “meaning” in the language • e.g. a variant of the last grammar example: P = { E => E + E | E * E | ( E ) | a }

  34. Parse Trees for “a+a*a” (figure: two different trees for the same string) • E[ E[ a ] + E[ E[ a ] * E[ a ] ] ] and • E[ E[ E[ a ] + E[ a ] ] * E[ a ] ] continued

  35. The two parse trees allow a string like “5+5*5” to be read in two different ways: • 5 + 25 (the left-hand tree) • 10 * 5 (the right-hand tree)
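The ambiguity can be made concrete by enumerating every parse tree. The Python sketch below handles the bracket-free part of the grammar (E => E + E | E * E | a): it tries every operator as the root, recursively parses both sides, and evaluates each tree with a given value for a. The helpers `all_parses` and `value` are illustrative names, and the input is assumed to be a tuple that alternates operands and operators.

```python
from functools import lru_cache

def all_parses(tokens):
    """Every parse tree of tokens under E => E + E | E * E | a.

    tokens alternates operands and operators, e.g. ("a", "+", "a", "*", "a").
    (The E => ( E ) rule is not needed because the input has no brackets.)
    """
    @lru_cache(maxsize=None)
    def parses(i, j):                      # trees for tokens[i:j]
        if j - i == 1:
            return [tokens[i]]
        trees = []
        for k in range(i + 1, j - 1, 2):   # k indexes an operator used as the root
            for left in parses(i, k):
                for right in parses(k + 1, j):
                    trees.append((left, tokens[k], right))
        return trees
    return parses(0, len(tokens))

def value(tree, a):
    """Evaluate a tree, substituting the number a for every leaf."""
    if tree == "a":
        return a
    left, op, right = tree
    l, r = value(left, a), value(right, a)
    return l + r if op == "+" else l * r

trees = all_parses(("a", "+", "a", "*", "a"))
for t in trees:
    print(t, value(t, 5))
```

With a = 5 the two trees give 30 (5 + 25) and 50 (10 * 5), matching the two readings above. Longer inputs have many more trees: with three operators there are already five.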

  36. Why is Ambiguity Bad? • In a programming language, a string with more than one meaning means that the compiler and run-time system will not know how to process it. • e.g. in C: x = 5 + 5 * 5; // what is the value in x? • (C avoids the problem: its grammar gives * higher precedence than +, so x is 30.)

  37. 6. Kinds of Grammars • There are 4 main kinds of grammar, of increasing expressive power: • regular (type 3) grammars • context-free (type 2) grammars • context-sensitive (type 1) grammars • unrestricted (type 0) grammars • They vary in the kinds of productions they allow. Avram Noam Chomsky (1928 – )

  38. 6.1. Regular Grammars • e.g. S => w T, T => x T, T => a • Every production is of the form: A => a | a B | ε • A, B are nonterminals, a is a terminal • These are sometimes called right linear rules because if a nonterminal appears in the rule body, then it must appear last. • Regular grammars are equivalent to REs (and also to automata).

  39. An Equivalence Diagram (figure): regular grammars, automata and REs all have the same expressive power.

  40. Example • Integer => + UInt | - UInt | 0 Digits | 1 Digits | ... | 9 Digits • UInt => 0 Digits | 1 Digits | ... | 9 Digits • Digits => 0 Digits | 1 Digits | ... | 9 Digits | ε
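The equivalence between regular grammars and automata can be seen directly in code: treat each nonterminal as a state, turn every rule A => a B into a transition from A to B on a, and make A accepting when it has the rule A => ε. The Python sketch below applies this to the Integer grammar, tracking the set of possible states as in an NFA. The names `RULES`, `NULLABLE` and `accepts` are illustrative, and rules of the form A => a (which would need an extra final state) are not handled because Integer has none.

```python
# Right-linear grammar for Integer: each nonterminal maps to a list of
# (terminal, next nonterminal) pairs; NULLABLE holds nonterminals with an ε rule.
DIGITS = "0123456789"
RULES = {
    "Integer": [("+", "UInt"), ("-", "UInt")] + [(d, "Digits") for d in DIGITS],
    "UInt":    [(d, "Digits") for d in DIGITS],
    "Digits":  [(d, "Digits") for d in DIGITS],
}
NULLABLE = {"Digits"}

def accepts(s, start="Integer"):
    """Treat each nonterminal as an automaton state and read s left to right.

    A => a B becomes a transition from A to B on a; A => ε makes A accepting.
    """
    states = {start}
    for ch in s:
        states = {nxt for st in states for (t, nxt) in RULES[st] if t == ch}
        if not states:
            return False
    return bool(states & NULLABLE)

for s in ["42", "-7", "+007", "+", "", "4-2"]:
    print(repr(s), accepts(s))
```

The grammar accepts an optional sign followed by one or more digits, and it allows leading zeros.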

  41. 6.2. Context-Free Grammars • e.g. A => a, A => aBcd, B => ae • Every production is of the form: A => δ • A is a nonterminal, δ can be any number of nonterminals or terminals • Most of our examples have been context-free grammars • used widely to define programming languages • they subsume regular grammars

  42. 6.3. Context-Sensitive Grammars A => a11A => aB2dB2 => ae • Every production is of the form: α => δ • α, δ can contain any number of terminals and nonterminals • α must contain at least 1 nonterminal • size(δ) >= size(α) • δ cannot be ε continued

  43. Context-sensitive rules allow the grammar to specify a context for a rewrite • e.g. A1a0 => 1b00 • the string 2A1a01 becomes 21b001 • Context-sensitive grammars are more powerful than context-free grammars because of this context ability.

  44. Example • The language: E = {012, 001122, 000111222, ... } or, in brief, E = {0^n 1^n 2^n | n >= 1} cannot be generated by any context-free grammar, but it can be generated by a context-sensitive grammar: S => 0 A 1 2 | 0 1 2 • A => 0 A 1 C | 0 1 C • C 1 => 1 C • C 2 => 2 2

  45. Rewrite S to 001122 •
        S => 0 A 1 2
        0 A 1 2 => 0 0 1 C 1 2      (using A => 0 1 C)
        0 0 1 C 1 2 => 0 0 1 1 C 2  (using C 1 => 1 C)
        0 0 1 1 C 2 => 0 0 1 1 2 2  (using C 2 => 2 2)
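Because no right side of a context-sensitive rule is shorter than its left side, a sentential form never shrinks, so every form longer than a chosen bound can be thrown away and a breadth-first search over all rewrites must terminate. The Python sketch below uses that property to list every terminal string of length at most 9 that the grammar on slide 44 generates. The representation (productions as pairs of symbol tuples) and the name `generate` are choices made for this illustration.

```python
from collections import deque

# The context-sensitive grammar for { 0^n 1^n 2^n | n >= 1 }.
# Each production is a (left side, right side) pair of symbol tuples.
RULES = [
    (("S",), ("0", "A", "1", "2")),
    (("S",), ("0", "1", "2")),
    (("A",), ("0", "A", "1", "C")),
    (("A",), ("0", "1", "C")),
    (("C", "1"), ("1", "C")),
    (("C", "2"), ("2", "2")),
]
NONTERMINALS = {"S", "A", "C"}

def generate(max_len):
    """All terminal strings of length <= max_len derivable from S.

    No right side is shorter than its left side, so a sentential form never
    shrinks; forms longer than max_len can be dropped and the search ends.
    """
    seen = {("S",)}
    queue = deque(seen)
    results = set()
    while queue:
        form = queue.popleft()
        if not NONTERMINALS.intersection(form):
            results.add("".join(form))
            continue
        for lhs, rhs in RULES:
            n = len(lhs)
            for i in range(len(form) - n + 1):       # every place lhs occurs
                if form[i:i + n] == lhs:
                    new = form[:i] + rhs + form[i + n:]
                    if len(new) <= max_len and new not in seen:
                        seen.add(new)
                        queue.append(new)
    return sorted(results, key=len)

print(generate(9))   # ['012', '001122', '000111222']
```

Forms in which a C cannot yet move (for example two C symbols side by side) are simply left in the search; only forms with no nonterminals at all are reported.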

  46. 6.4. Unrestricted Grammars A => e11A => aB2 => aeA • Every production is of the form: α => δ • α, δ can contain any number of terminals and nonterminals; α must contain at least 1 nonterminal • no restrictions on size(δ) • it may be smaller than size(α) • δ can be ε • Also called phrase-structure grammars. • More general than context-sensitive grammars.
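The four definitions differ only in what a production may look like, so a grammar can be classified by checking its productions against each form in turn. The Python sketch below follows the definitions on slides 38, 41, 42 and 46 literally: productions are pairs of symbol tuples, an empty right side stands for ε, and a grammar gets the most restrictive type that every one of its productions fits. Because these definitions forbid ε in context-sensitive rules, a grammar that mixes ε rules with context rules comes out as type 0. The names `fits` and `grammar_type` are illustrative.

```python
def fits(lhs, rhs, kind, nonterminals):
    """Does production lhs => rhs have the form allowed for this kind of grammar?
    lhs and rhs are tuples of symbols; an empty rhs stands for ε."""
    nt = lambda sym: sym in nonterminals
    one_nt_lhs = len(lhs) == 1 and nt(lhs[0])
    if kind == 3:   # regular: A => ε | a | a B
        return one_nt_lhs and (
            len(rhs) == 0
            or (len(rhs) == 1 and not nt(rhs[0]))
            or (len(rhs) == 2 and not nt(rhs[0]) and nt(rhs[1])))
    if kind == 2:   # context-free: A => any string of symbols
        return one_nt_lhs
    if kind == 1:   # context-sensitive: size(δ) >= size(α), so δ is never ε
        return any(map(nt, lhs)) and len(rhs) >= len(lhs)
    return any(map(nt, lhs))   # unrestricted: α just needs a nonterminal

def grammar_type(productions, nonterminals):
    """Most restrictive type (3, 2, 1 or 0) that every production fits."""
    for kind in (3, 2, 1, 0):
        if all(fits(l, r, kind, nonterminals) for l, r in productions):
            return kind
    raise ValueError("some left side has no nonterminal")

example1 = [(("S",), ("0",)), (("S",), ("0", "R")), (("R",), ("1", "S"))]
csg = [(("S",), ("0", "A", "1", "2")), (("S",), ("0", "1", "2")),
       (("A",), ("0", "A", "1", "C")), (("A",), ("0", "1", "C")),
       (("C", "1"), ("1", "C")), (("C", "2"), ("2", "2"))]
unrestricted = [(("S",), ("0", "A", "1", "2")), (("S",), ()),
                (("A",), ("0", "A", "1", "C")), (("A",), ()),
                (("C", "1"), ("1", "C")), (("C", "2"), ("2", "2"))]
print(grammar_type(example1, {"S", "R"}))          # 3
print(grammar_type(csg, {"S", "A", "C"}))          # 1
print(grammar_type(unrestricted, {"S", "A", "C"})) # 0
```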

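  47. Example • The language: E = {ε, 012, 001122, 000111222, ... } or, in brief, E = {0^n 1^n 2^n | n >= 0} • Because E contains ε, it cannot be generated by a context-sensitive grammar as defined above (δ cannot be ε), but it can be generated by an unrestricted grammar: S => 0 A 1 2 | ε • A => 0 A 1 C | ε • C 1 => 1 C • C 2 => 2 2 • The ε rules are the new features.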

  48. Rewrite S to 012 • S => 0 A 1 2 • 0 A 1 2 => 0 1 2, using A => ε

  49. 6.5. Why so many Grammar Kinds? • More powerful grammars are more expressive, but also harder to implement efficiently • a trade-off between expressive power and efficient implementation continued

  50. For example, most compilers have two grammar-based components: • the lexical analyzer • uses REs (regular grammars) to recognize basic syntactic categories (tokens) such as identifier and number • the syntax analyzer • uses (context-free) grammars to deal with complex syntactic categories such as loops and expressions
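The two-stage split can be sketched in a few lines of Python. In the illustration below, the lexical analyzer uses one regular expression (via the standard `re` module) to turn text into (category, text) tokens, and the syntax analyzer is a recursive-descent parser for an unambiguous context-free expression grammar written with the { } repetition from BNF. The token categories, the grammar and the names `tokenize` and `parse` are assumptions made for this example, not taken from any particular compiler.

```python
import re

# Lexical analyzer: one RE per basic category (hypothetical token set).
TOKEN_RE = re.compile(
    r"\s*(?:(?P<number>[0-9]+)|(?P<identifier>[A-Za-z_][A-Za-z0-9_]*)|(?P<op>[-+*/()]))")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        kind = m.lastgroup                 # name of the group that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens

def parse(tokens):
    """Syntax analyzer: recursive descent for
    Expr => Term { (+|-) Term }, Term => Factor { (*|/) Factor },
    Factor => number | identifier | ( Expr ).  Returns a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos][1] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        tree = term()
        while peek() in ("+", "-"):
            op = advance()[1]
            tree = (op, tree, term())      # left-associative
        return tree

    def term():
        tree = factor()
        while peek() in ("*", "/"):
            op = advance()[1]
            tree = (op, tree, factor())
        return tree

    def factor():
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        kind, text = advance()
        if kind in ("number", "identifier"):
            return text
        if text == "(":
            tree = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return tree
        raise SyntaxError(f"unexpected {text!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos][1]!r}")
    return tree

print(tokenize("x1 + 25*(y - 3)"))
print(parse(tokenize("x1 + 25*(y - 3)")))
```

Layering Expr, Term and Factor in this way is what gives * higher precedence than +, so the grammar assigns each string a single tree.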
