
Grammars, Languages and Parse Trees


Presentation Transcript


  1. Grammars, Languages and Parse Trees

  2. Language • Let V be an alphabet or vocabulary • V* is the set of all strings over V • A language L is a subset of V*, i.e., L ⊆ V* • L may be finite or infinite • Programming language • Set of all possible programs (each valid program is one, possibly very long, string) • Programs with syntax errors are not in the set • Infinite number of programs
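The definitions on this slide can be tried out on a tiny alphabet. The sketch below is only an illustration: the two-symbol alphabet, the helper name `strings_over`, and the sample language ("strings that start with a") are assumptions made for the example, not part of the slides.

```python
from itertools import product

def strings_over(alphabet, max_len):
    """Yield every string over `alphabet` with length 0..max_len.

    V* itself is infinite, so this only enumerates a finite slice of it.
    """
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

V = ["a", "b"]                                # hypothetical alphabet
slice_of_v_star = list(strings_over(V, 2))    # "", "a", "b", "aa", ...

# A hypothetical language L: all strings over V that start with "a".
L = {s for s in slice_of_v_star if s.startswith("a")}

# L is a subset of (this slice of) V*.
print(sorted(L), L <= set(slice_of_v_star))
```

A language in this sense is just a set of strings, so membership tests and subset checks are ordinary set operations.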

  3. Language Representation • Finite • Enumerate all sentences • Infinite language • Cannot be specified by enumeration • Use a generative device, i.e., a grammar • Specifies the set of all legal sentences • Defined recursively (or inductively)

  4. Sample Grammar • Simple arithmetic expressions (E) • Basis Rules: • A Variable is an E • An Integer is an E • Inductive Rules: • If E1 and E2 are Es, so is (E1 + E2) • If E1 and E2 are Es, so is (E1 * E2) • Examples: x, y, 3, 12, (x + y), (z * (x + y)), ((z * (x + y)) + 12)
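The basis and inductive rules can be read as a recipe for generating expressions. The following sketch applies the two inductive rules to a small basis set; the function name `grow` and the choice of basis elements are assumptions for illustration only.

```python
def grow(expressions):
    """Apply both inductive rules once to every ordered pair of expressions."""
    result = set(expressions)
    for e1 in expressions:
        for e2 in expressions:
            result.add(f"({e1} + {e2})")   # If E1 and E2 are Es, so is (E1 + E2)
            result.add(f"({e1} * {e2})")   # If E1 and E2 are Es, so is (E1 * E2)
    return result

# Basis rules: a variable is an E, an integer is an E (sample values only).
basis = {"x", "y", "z", "12"}

round1 = grow(basis)    # contains (x + y), (z * 12), ...
round2 = grow(round1)   # contains (z * (x + y)), ...

print("(x + y)" in round1, "(z * (x + y))" in round2)
```

Each round only adds strings, and no finite number of rounds exhausts the language, which is why an infinite language needs a generative definition rather than an enumeration.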

  5. Production Rules • Use symbols (aka syntactic categories) and meta-symbols to define basis and inductive rules • For our example: • Basis rules: E → V, E → I • Inductive rules: E → (E + E), E → (E * E)

  6. Formal Definition of a Grammar • G = (VN, VT, S, Φ), where • VN, VT: sets of non-terminal and terminal symbols • S ∈ VN, a start symbol • Φ = a finite set of relations from (VT ∪ VN)+ to (VT ∪ VN)* • An element (α, β) of Φ is written as α → β and is called a production rule or a rewrite rule

  7. Sample Grammar Revisited • (1) E → V | I | (E + E) | (E * E) • (2) V → L | VL | VD • (3) I → D | ID • (4) D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • (5) L → x | y | z • VN: E, V, I, D, L • VT: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, x, y, z, (, ), +, * • S = E • Φ: rules 1-5
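One way to make the four components of a grammar concrete is to store the productions as data and derive the symbol sets from them. This is a minimal sketch, assuming every grammar symbol is a single character and that spaces in the right-hand sides can be dropped; the names `GRAMMAR`, `START`, `nonterminals`, and `terminals` are illustrative.

```python
# Productions of the sample grammar; each alternative is a string of
# one-character symbols (spaces from the slides are omitted).
GRAMMAR = {
    "E": ["V", "I", "(E+E)", "(E*E)"],
    "V": ["L", "VL", "VD"],
    "I": ["D", "ID"],
    "D": list("0123456789"),
    "L": ["x", "y", "z"],
}
START = "E"

# Non-terminals are exactly the symbols that have productions.
nonterminals = set(GRAMMAR)

# Terminals are all remaining symbols that appear on a right-hand side.
terminals = {
    symbol
    for alternatives in GRAMMAR.values()
    for alternative in alternatives
    for symbol in alternative
} - nonterminals

print(sorted(nonterminals))
print(sorted(terminals))
```

Computing the terminal set this way picks up the parentheses and the two operators along with the digits and letters, since they appear in productions but are never rewritten.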

  8. Another Simple Grammar • Symbols: • S: sentence • V: verb • O: object • A: article • N: noun • SP: subject phrase • VP: verb phrase • NP: noun phrase • Rules: • S → SP VP • SP → A N • A → a | the • N → monkey | banana | tree • VP → V O • V → ate | climbs • O → NP • NP → A N
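Because this grammar has no recursive rules, its language is finite and can be enumerated outright. The sketch below does that; the names `RULES` and `expand` are assumptions for the example, and the approach would not terminate on a recursive grammar.

```python
from itertools import product

# Each non-terminal maps to its alternatives; an alternative is a list of symbols.
RULES = {
    "S": [["SP", "VP"]],
    "SP": [["A", "N"]],
    "A": [["a"], ["the"]],
    "N": [["monkey"], ["banana"], ["tree"]],
    "VP": [["V", "O"]],
    "V": [["ate"], ["climbs"]],
    "O": [["NP"]],
    "NP": [["A", "N"]],
}

def expand(symbol):
    """Return every sequence of terminal words derivable from `symbol`."""
    if symbol not in RULES:            # a terminal word derives only itself
        return [[symbol]]
    results = []
    for alternative in RULES[symbol]:
        parts = [expand(s) for s in alternative]
        for combo in product(*parts):  # pick one expansion for each symbol
            results.append([word for piece in combo for word in piece])
    return results

sentences = {" ".join(words) for words in expand("S")}
print(len(sentences))
print("a banana climbs the tree" in sentences)
```

The grammar accepts "a banana climbs the tree" just as readily as "a monkey ate a banana": syntax alone says nothing about whether a sentence makes sense.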

  9. Context-Free Grammar • A context-free grammar is a grammar with the following restriction: • The relation Φ is a finite set of relations from VN to (VT ∪ VN)+ • The left-hand side of a production is a single non-terminal • The right-hand side of any production cannot be empty • Context-free grammars generate context-free languages • With slight variations, essentially all programming languages are context-free languages • We will focus on context-free grammars
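The two restrictions can be checked mechanically. The following sketch assumes productions are given as (left-hand side, right-hand side) pairs of symbol tuples; the function name `is_context_free` and the three small rule sets are hypothetical examples, not grammars from the slides.

```python
def is_context_free(productions, nonterminals):
    """Check the restrictions stated above for every production."""
    for lhs, rhs in productions:
        # The left-hand side must be exactly one non-terminal.
        if len(lhs) != 1 or lhs[0] not in nonterminals:
            return False
        # The right-hand side must not be empty.
        if len(rhs) == 0:
            return False
    return True

NONTERMINALS = {"S", "A", "B"}   # hypothetical non-terminal set

# S -> a S b | a b
cf_rules = [(("S",), ("a", "S", "b")), (("S",), ("a", "b"))]

# S -> A B, A B -> B A   (two symbols on a left-hand side)
non_cf_rules = [(("S",), ("A", "B")), (("A", "B"), ("B", "A"))]

# S -> (empty)           (excluded by the definition used here)
empty_rhs_rules = [(("S",), ())]

print(is_context_free(cf_rules, NONTERMINALS))
print(is_context_free(non_cf_rules, NONTERMINALS))
print(is_context_free(empty_rhs_rules, NONTERMINALS))
```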

  10. More Grammars Which are context-free?

  11. Direct Derivative • Let G = (VN, VT, S, Φ) be a grammar • Let α, β ∈ (VN ∪ VT)* • β is said to be a direct derivative of α, written α ⇒ β, if there are strings φ1 and φ2 such that α = φ1 L φ2, β = φ1 λ φ2, L ∈ VN, and L → λ is a production of G • We go from α to β using a single rule

  12. Examples of Direct Derivatives • G = (VN, VT, S, Φ), where: • VN = {I, L, D} • VT = {a, b, …, z, 0, 1, …, 9} • S = I • Φ = { I → L | ID | IL, L → a | b | … | z, D → 0 | 1 | … | 9 }
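For this identifier grammar, the direct derivatives of a string can be computed by trying every production at every position. This is a sketch under the assumption that each symbol is one character (upper case for non-terminals, lower-case letters and digits for terminals); `PRODUCTIONS` and `direct_derivatives` are illustrative names.

```python
import string

PRODUCTIONS = {
    "I": ["L", "ID", "IL"],
    "L": list(string.ascii_lowercase),   # a | b | ... | z
    "D": list(string.digits),            # 0 | 1 | ... | 9
}

def direct_derivatives(alpha):
    """Return every string reachable from `alpha` by applying one rule once."""
    results = set()
    for i, symbol in enumerate(alpha):
        # Terminals have no productions, so .get() yields nothing for them.
        for rhs in PRODUCTIONS.get(symbol, []):
            results.add(alpha[:i] + rhs + alpha[i + 1:])
    return results

print(sorted(direct_derivatives("I")))       # the three alternatives for I
print("aD" in direct_derivatives("LD"))      # L rewritten to a
print(direct_derivatives("ab1"))             # no non-terminals left to rewrite
```

A string made only of terminals has no direct derivatives, which is what makes it a finished sentence rather than an intermediate form.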

  13. Derivation • Let G = (VN, VT, S, Φ) be a grammar • A string α produces ω, or ω reduces to α, or ω is a derivation of α, written α ⇒+ ω, if there are strings φ1, …, φn (n ≥ 1) such that: α ⇒ φ1 ⇒ φ2 ⇒ … ⇒ φn-1 ⇒ φn = ω • We go from α to ω using several rules

  14. Example of Derivation • E → V | I | (E + E) | (E * E) • V → L | VL | VD • I → D | ID • D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • L → x | y | z • Is ( ( z * ( x + y ) ) + 12 ) derivable? • E ⇒ ( E + E ) ⇒ ( ( E * E ) + E ) ⇒ ( ( E * ( E + E ) ) + E ) ⇒+ ( ( V * ( V + V ) ) + I ) ⇒+ ( ( L * ( L + L ) ) + ID ) ⇒+ ( ( z * ( x + y ) ) + DD ) ⇒+ ( ( z * ( x + y ) ) + 12 ) (each ⇒+ step applies several rules at once) • How about: • ( x + 2 ) • ( 21 * ( x4 + 7 ) ) • 3 * z • 2y
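A derivation can be verified by checking that each consecutive pair of strings is related by exactly one rule application. The sketch below does this for the expression grammar, assuming one-character symbols with spaces removed; `is_direct_derivative` and `is_derivation` are hypothetical helper names.

```python
PRODUCTIONS = {
    "E": ["V", "I", "(E+E)", "(E*E)"],
    "V": ["L", "VL", "VD"],
    "I": ["D", "ID"],
    "D": list("0123456789"),
    "L": ["x", "y", "z"],
}

def is_direct_derivative(alpha, beta):
    """True if `beta` results from rewriting one non-terminal of `alpha` once."""
    for i, symbol in enumerate(alpha):
        for rhs in PRODUCTIONS.get(symbol, []):
            if alpha[:i] + rhs + alpha[i + 1:] == beta:
                return True
    return False

def is_derivation(chain):
    """True if every step in `chain` is a direct derivative of the one before."""
    return len(chain) >= 2 and all(
        is_direct_derivative(a, b) for a, b in zip(chain, chain[1:])
    )

# One possible derivation of ( x + 2 ), one rule per step.
chain = ["E", "(E+E)", "(V+E)", "(L+E)", "(x+E)", "(x+I)", "(x+D)", "(x+2)"]
print(is_derivation(chain))

# Skipping intermediate steps is not a chain of direct derivatives.
print(is_derivation(["E", "(V+I)"]))
```

The chain shows that ( x + 2 ) is in the language; a checker like this confirms a proposed derivation but does not find one on its own.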

  15. Grammar-generated Language • If G is a grammar with start symbol S, a sentential form is any derivative of S • The language L generated by a grammar G is the set of all sentential forms whose symbols are all terminals: L(G) = {σ | S ⇒+ σ and σ ∈ VT*}

  16. Example of Language • Let G = (VN, VT, S, Φ), where: • VN = {I, L, D} • VT = {a, b, …, z, 0, 1, …, 9} • S = I • Φ = { I → L | ID | IL, L → a | b | … | z, D → 0 | 1 | … | 9 } • L(G) = {abc12, x, m934897773645, a1b2c3, …} • I ⇒ ID ⇒ IDD ⇒ ILDD ⇒ ILLDD ⇒ LLLDD ⇒ aLLDD ⇒ abLDD ⇒ abcDD ⇒ abc1D ⇒ abc12
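The set L(G) can be explored by searching through sentential forms and keeping those that contain only terminals. The sketch below uses a deliberately reduced alphabet (letters a, b and digits 0, 1) so the output stays small; that reduction, the length bound, and the name `sentences` are assumptions made for the example.

```python
from collections import deque

# Identifier grammar with a reduced terminal alphabet (assumption).
PRODUCTIONS = {
    "I": ["L", "ID", "IL"],
    "L": ["a", "b"],
    "D": ["0", "1"],
}

def sentences(start, max_len):
    """Breadth-first search for all terminal strings of length <= max_len."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        if not any(symbol in PRODUCTIONS for symbol in form):
            found.add(form)              # all symbols are terminals
            continue
        for i, symbol in enumerate(form):
            for rhs in PRODUCTIONS.get(symbol, []):
                nxt = form[:i] + rhs + form[i + 1:]
                # No rule shrinks a form, so longer forms can be pruned safely.
                if len(nxt) <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return found

print(sorted(sentences("I", 2)))
```

Every generated string starts with a letter, matching the sample members of L(G) listed on the slide.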

  17. Syntax Analysis: Parsing • The parse of a sentence is the construction of a derivation for that sentence • The parsing of a sentence results in • acceptance or rejection • and, if acceptance, then also a parse tree • We are looking for an algorithm to parse a sentence (i.e., to parse a program) and produce a parse tree

  18. Parse Trees • A parse tree is composed of • interior nodes representing elements of VN • leaf nodes representing elements of VT • For each interior node N, the transition from N to its children represents the application of one production rule

  19. Parse Tree Construction • Top-down • Start with the root (start symbol) • Proceed downward to leaves using productions • Bottom-up • Start from leaves • Proceed upward to the root • Although these seem like reasonable approaches to develop a parsing algorithm, we’ll see later that neither is ideal, so we’ll find a better way!

  20. Top down • A → V | I | (A + A) | (A * A) • V → L | VL | VD • I → D | ID • D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • L → x | y | z • Sentence: ( ( z * ( x + y ) ) + 12 ) • Start at the root A and expand downward, level by level: • A • ( A + A ) • ( ( A * A ) + A ) • ( ( A * ( A + A ) ) + I ) • ( ( V * ( V + V ) ) + I D ) • ( ( L * ( L + L ) ) + D D ) • ( ( z * ( x + y ) ) + 1 2 )
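A top-down construction for this grammar can be sketched as a small hand-written recursive-descent parser. This is an illustration only, not the "better way" the slides promise later: it assumes spaces are insignificant, replaces the left-recursive rules for V and I with loops, and represents the tree as nested tuples; `parse`, `parse_A`, and `accepts` are hypothetical names.

```python
LETTERS = "xyz"
DIGITS = "0123456789"

def parse_A(s, i):
    """Parse one A starting at index i; return (tree, next_index)."""
    if i < len(s) and s[i] == "(":                 # A -> (A + A) | (A * A)
        left, i = parse_A(s, i + 1)
        if i >= len(s) or s[i] not in "+*":
            raise SyntaxError(f"expected + or * at position {i}")
        op = s[i]
        right, i = parse_A(s, i + 1)
        if i >= len(s) or s[i] != ")":
            raise SyntaxError(f"expected ) at position {i}")
        return ("A", "(", left, op, right, ")"), i + 1
    if i < len(s) and s[i] in LETTERS:             # A -> V, V -> L | VL | VD
        node = ("V", ("L", s[i]))
        i += 1
        while i < len(s) and (s[i] in LETTERS or s[i] in DIGITS):
            kind = "L" if s[i] in LETTERS else "D"
            node = ("V", node, (kind, s[i]))
            i += 1
        return ("A", node), i
    if i < len(s) and s[i] in DIGITS:              # A -> I, I -> D | ID
        node = ("I", ("D", s[i]))
        i += 1
        while i < len(s) and s[i] in DIGITS:
            node = ("I", node, ("D", s[i]))
            i += 1
        return ("A", node), i
    raise SyntaxError(f"unexpected input at position {i}")

def parse(text):
    """Return the parse tree for `text` or raise SyntaxError."""
    s = text.replace(" ", "")                      # simplification: drop spaces
    tree, end = parse_A(s, 0)
    if end != len(s):
        raise SyntaxError(f"unexpected input at position {end}")
    return tree

def accepts(text):
    try:
        parse(text)
        return True
    except SyntaxError:
        return False

for candidate in ["((z * (x + y)) + 12)", "(x + 2)", "(21 * (x4 + 7))", "3 * z", "2y"]:
    print(candidate, accepts(candidate))
```

Interior tuples are labelled with non-terminals and the innermost strings are terminals, mirroring the interior-node and leaf distinction of a parse tree. The unparenthesized `3 * z` and the digit-first `2y` are rejected because no production generates them.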

  21. Bottom up • A → V | I | (A + A) | (A * A) • V → L | VL | VD • I → D | ID • D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • L → x | y | z • Sentence: ( ( z * ( x + y ) ) + 12 ) • Start at the leaves and reduce upward, level by level: • ( ( z * ( x + y ) ) + 1 2 ) • ( ( L * ( L + L ) ) + D D ) • ( ( V * ( V + V ) ) + I D ) • ( ( A * ( A + A ) ) + I ) • ( ( A * A ) + A ) • ( A + A ) • A

  22. Lexical Analyzer and Parser • A syntactically correct program will run. Will it do what you want? • Compare: "a monkey ate a banana" / "a banana climbs the tree" • Lexical analyzers • Input: symbols of length 1 • Output: classified tokens • Parsers • Input: classified tokens • Output: parse tree (i.e., syntactically correct program)
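The lexical-analyzer half of this split can be sketched with a regular expression that groups single characters into classified tokens. The token class names (`VAR`, `INT`, `PUNCT`) and the function `tokenize` are assumptions for illustration; the character sets follow the expression grammar used earlier in the deck.

```python
import re

# One alternative per token class; leading whitespace is skipped.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<VAR>[xyz][xyz0-9]*)|(?P<INT>[0-9]+)|(?P<PUNCT>[()+*]))"
)

def tokenize(text):
    """Turn a character string into a list of (class, lexeme) tokens."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        kind = match.lastgroup              # name of the group that matched
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens

print(tokenize("(x4 + 12)"))
print(tokenize("2y"))
```

The lexer happily splits `2y` into an integer followed by a variable; deciding that this token sequence is not a valid expression is the parser's job.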

  23. Backus-Naur Form (BNF) • A traditional meta-language to represent grammars for programming languages • Every non-terminal is enclosed in < and > • Instead of the symbol →, we use ::= • Example • I → L | ID | IL • L → a | b | … | z • D → 0 | 1 | … | 9 • <I> ::= <L> | <I><D> | <I><L> • <L> ::= a | b | … | z • <D> ::= 0 | 1 | … | 9 • WHY?
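The translation into BNF is mechanical, as the following sketch suggests. It assumes productions are stored as lists of symbols and keeps the elided ranges as a literal "..." placeholder rather than expanding them; `to_bnf` is a hypothetical helper name. One common answer to the closing question is that the angle brackets mark non-terminals unambiguously, which matters once symbol names are longer than one character.

```python
def to_bnf(productions):
    """Render productions as BNF lines, bracketing every non-terminal."""
    nonterminals = set(productions)
    lines = []
    for lhs, alternatives in productions.items():
        rendered = [
            "".join(f"<{s}>" if s in nonterminals else s for s in alternative)
            for alternative in alternatives
        ]
        lines.append(f"<{lhs}> ::= " + " | ".join(rendered))
    return lines

PRODUCTIONS = {
    "I": [["L"], ["I", "D"], ["I", "L"]],
    "L": [["a"], ["b"], ["..."], ["z"]],   # "..." kept as a placeholder
    "D": [["0"], ["1"], ["..."], ["9"]],
}

for line in to_bnf(PRODUCTIONS):
    print(line)
```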
