1 / 32

Compilation: Backus-Naur Form (BNF) and Context Free Grammars (CFGs)

Compilation: Backus-Naur Form (BNF) and Context Free Grammars (CFGs). We care about. Completeness of specification Determination of legal expressions Resolution of ambiguities Avoid small mistakes causing major errors. BNF/CFG. Backus-Naur Form Context Free Grammars

Download Presentation

Compilation: Backus-Naur Form (BNF) and Context Free Grammars (CFGs)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Compilation: Backus-Naur Form (BNF) and Context Free Grammars (CFGs)

  2. We care about • Completeness of specification • Determination of legal expressions • Resolution of ambiguities • Avoid small mistakes causing major errors

  3. BNF/CFG • Backus-Naur Form • Context Free Grammars • Similar mechanisms for specifying legal syntax • Both focused on production rules

  4. A language is a set S of sentences composed of words from a vocabularyΣ formed according to rules of grammar G. (informally) • e.g.: Let Σ = {Bill, Tom, likes, eats, cheese, rice}

  5. <sentence> is <subject><verb><object> <subject> is Bill or Tom <verb> is likes or eats <object> is cheese or rice • To see if a sentence is legal, we parse it (possibly using a parse tree)

  6. <sentence> is <subject><verb><object> <subject> is Bill or Tom <verb> is likes or eats <object> is cheese or rice • To see if a sentence is legal, we parse it (possibly using a parse tree) • <> around non-terminal symbols – things that are not in the vocabulary • Other words are terminal symbols – things in the vocabulary

  7. Definitions • An alphabet or vocabularyΣ is a (usually finite) set of symbols (characters, words)

  8. Definitions • An alphabet or vocabularyΣ is a (usually finite) set of symbols (characters, words) • A string or sentence is a sequence of (0 or more) symbols from an alphabet or vocabulary

  9. Definitions • An alphabet or vocabularyΣ is a (usually finite) set of symbols (words, characters) • A string or sentence is a sequence of (0 or more) symbols from an alphabet • The empty string λ has no characters • A dictionary over an alphabet Σ, denoted Σ*, is the set of all strings formed from the symbols of Σ • A language L (over an alphabet Σ) is a subset of Σ*, restricted by certain sytactical rules (of grammar)

  10. Strings • The length of a string x, |x|, is the total number of symbols in x. • The concatenation of string x and string y, xy, is the string composed of symbols of x followed by symbols of y

  11. Strings • The length of a string x, |x|, is the total number of symbols in x. • The concatenation of string x and string y, xy, is the string composed of symbols of x followed by symbols of y • Let x = 001 and y = 01001 • What is |x|? • What is |y|? • What is xy? • What is |xy|?

  12. Strings • The length of a string x, |x|, is the total number of symbols in x. • The concatenation of string x and string y, xy, is the string composed of symbols of x followed by symbols of y • xi is the string of i instances of x concatenated • x* (Kleene closure) is the set of all concatenations of 0 or more instances of x • x+ (positive closure) is the set of all concatenations of 1 or more instances of x

  13. Sets of strings • A = {0, 1}* • B = {01, 1000, 1} • C = {01, 1} • Describe the following sets of strings in words • A • AB • C* • B2 • C+

  14. Grammars • A (phrase-structure) grammar (PSG) is a quadruple (N, T, P, S) where • N is the set of non-terminal symbols • T (=Σ) is the set of terminal symbols • P is the set of productions (rewrite rules) • S (in N) is the start symbol • The language L(G) defined by the grammar G is the set of sentences that can be derived from the start symbol S by applying productions in P • A sentential form of G is S or any string that can be derived from S • A sentence of G is a sentential form that contains only terminals

  15. N = {<sentence>, <subject>, <verb>, <object>} • T = {Bill, Tom, likes, eats, cheese, rice} • S = <sentence> • P = { <sentence> <subject><verb><object> <subject> Bill <subject> Tom <verb> likes <verb> eats <object> cheese <object> rice }

  16. BNF (Backus-Naur Form) • A type of grammar and a notation for expressing them. • 4 meta-symbols: ::=, <, >, | • ::= replaces • <, > surround non-teminals • | indicates alternatives • Things not in <, > are terminals • Left-hand side MUST be a single non-terminal • This is a very important restriction! • This restriction on its own classifies a PSG as a CFG

  17. The grammar using BNF N = {<sentence>, <subject>, <verb>, <object>} • T = {Bill, Tom, likes, eats, cheese, rice} • S = <sentence> • P = { <sentence> ::= <subject><verb><object> <subject> ::= Bill | Tom <verb> ::= likes | eats <object> ::= cheese | rice }

  18. How does this apply to programming languages? • https://docs.python.org/3/reference/grammar.html • http://inst.eecs.berkeley.edu/~cs164/fa11/python-grammar.html

  19. Extended BNF • [ ] enclose optional items • { } enclose a string which can be repeated 0 or more times • { }+ enclose a string which can be repeated 1 or more times • { }ij enclose a string which can be repeated between i and j times (inclusive) • You may sometimes see slightly different notation; non-terminals denoted by uppercase letters, instead of ::=, “” around terminals, etc. Use your noggin :) • When would you want one or the other?

  20. <sentence> ::= <subject><verb><object> <subject> ::= Bill | Tom <verb> ::= likes | eats <object> ::= cheese | rice • To see if a sentence is legal, we parse it (possibly using a parse tree)

  21. Ambiguity • A grammar that produces more than one derivation for the same sentence is ambiguous • Eg: E E+E | E*E | (E) | -E | id • There are two derivations for id+id*id – find them both.

  22. Ambiguity continued • There are some languages for which there is no unambiguous grammar • HOWEVER there are some rules of thumb we can use to help us deal with many ambiguous grammars • Ambiguity often arises if the right side of a production rule contains 2 or more occurrences of the same non-terminal • Ambiguity can cause problems if the grammar needs to observe rules of precedence or associativity

  23. E E+E | E*E | (E) | -E | id

  24. E E+E | E*E | (E) | -E | id • E E+T

  25. E E+E | E*E | (E) | -E | id

  26. Derivations • A leftmost derivation is one in which only the leftmost non-terminal in a sentential form is replaced at each step. • A rightmost derivation is one in which only the rightmost non-terminal in a sentential form is replaced at each step • Remember ambiguity? Leftmost and rightmost derivations are usually unique • Generally pick leftmost or rightmost and stick with it • if they are different the language is ambiguous (not iff)

  27. Leftmost/Rightmost derivation example

  28. Parse trees are central to compiler theory • They allow us to identify the production rule corresponding to the chunk of code, and from there replace that chunk of code with a chunk of assembly.

  29. Phases of compilation lexical analysis (tokenization) – grouping the characters into tokens (int, {, x, etc) (linear scan) Syntax analysis (parsing) – grouping tokens into expressions or statements ( int x = 10;) (recursion; CFG) semantic analysis (Syntax-directed translation), - type checking, implicit typecasting, check indices to arrays, variables declared before use, etc. Necessary because most programming languages can't be completely captured by CFG code generation (next) – actually generating assembler code code optimization – looking for ways to make the assembler faster

  30. Code generation: Using the parse tree to generate assembler code To prune the leaves of a parse tree means to eliminate all the leaves of a node, replacing the leaves (and their parent) with the intended “meaning” (a value or a chunk of code)

  31. Compilation vs interpretation An interpreted language is a programming language for which most of its implementations execute instructions directly, without previously compiling a program into machine-language instructions. The interpreter executes the program directly, translating each statement into a sequence of one or more subroutines already compiled into machine code.

More Related