Understanding Context-Free Grammars and Language Specifications
This document explores the concept of context-free grammars (CFGs) and their application in specifying programming languages. It defines CFGs as a 4-tuple, comprising variables, terminal symbols, production rules, and a start symbol. Various examples illustrate the specification of languages, including arithmetic expressions and string patterns. Additionally, it addresses issues of ambiguity in grammars, emphasizing the importance of unambiguous definitions for programming languages to ensure clarity in semantics. Formal notations like Backus-Naur Form (BNF) are also discussed as standard methods for representing CFGs.
CS 480/680 – Comparative Languages
Specifying Languages
Specifying a Language
• Informal methods
• Textbooks, tutorials, etc.
• Formal definitions
• Needed for exactness
• Compiler writers, etc.
• Like technical specifications for design
• Syntax – what expressions are legal?
• Semantics – what should they do?
Language Specification
Context Free Grammars
• Definition: A context-free grammar (CFG) is a 4-tuple, G = (V, Σ, R, S)
• V = variables (non-terminal symbols)
• Σ = terminal symbols (alphabet)
• R = production rules
• S = start symbol, S ∈ V
• V, Σ, and R are all finite
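The 4-tuple definition above maps directly onto a small data structure. The following is a minimal sketch, not part of the original slides: the class name `CFG`, the helper `is_well_formed`, and the toy grammar S → aSb | ε are all hypothetical, and the check that V and Σ are disjoint is an added assumption that the slide does not state.

```python
from typing import NamedTuple


class CFG(NamedTuple):
    variables: frozenset  # V
    terminals: frozenset  # Σ
    rules: tuple          # R, as (head, body) pairs; body is a tuple of symbols
    start: str            # S


def is_well_formed(g: CFG) -> bool:
    """Check the side conditions of the definition."""
    if g.start not in g.variables:      # S must be a member of V
        return False
    if g.variables & g.terminals:       # assumption: V and Σ are disjoint
        return False
    symbols = g.variables | g.terminals
    # Every rule rewrites one variable into a string over V ∪ Σ.
    return all(
        head in g.variables and all(s in symbols for s in body)
        for head, body in g.rules
    )


# Hypothetical toy grammar: S -> aSb | ε (the empty tuple stands for ε).
example = CFG(
    variables=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    rules=(("S", ("a", "S", "b")), ("S", ())),
    start="S",
)
print(is_well_formed(example))  # True
```

Finiteness comes for free here, since Python sets and tuples are always finite.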
A Context Free Grammar
• V = {A, B}
• Σ = {a, b}
• R = { A → aAa, A → B, B → bBb, B → ε }
• S = A
Sample derivation (the rule applied at each step is shown in brackets):
A ⇒ aAa [A → aAa]
⇒ aaAaa [A → aAa]
⇒ aaBaa [A → B]
⇒ aabBbaa [B → bBb]
⇒ aabbBbbaa [B → bBb]
⇒ aabbbbaa [B → ε]
What language does this grammar specify?
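One way to explore the question is to enumerate the short strings the grammar can produce. The sketch below is an illustration only: it hard-codes the rules A → aAa | B and B → bBb | ε as read from the slide, and the names `RULES` and `generate` are hypothetical.

```python
from collections import deque

# Rules as read from the slide; "" stands for ε.
RULES = {"A": ["aAa", "B"], "B": ["bBb", ""]}


def generate(start="A", max_len=6):
    """Breadth-first search over sentential forms.

    Returns every terminal string of length <= max_len, shortest first.
    """
    seen, results = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any.
        idx = next((i for i, c in enumerate(form) if c in RULES), None)
        if idx is None:
            results.add(form)  # no variables left: a string of the language
            continue
        for body in RULES[form[idx]]:
            new = form[:idx] + body + form[idx + 1:]
            terminals = sum(c not in RULES for c in new)
            # Terminals never disappear, so longer forms can be pruned.
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=lambda s: (len(s), s))


print(generate(max_len=4))  # ['', 'aa', 'bb', 'aaaa', 'abba', 'bbbb']
```

Raising `max_len` to 8 includes the string from the sample derivation, `aabbbbaa`.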
Another Example CFG
• V = {A}
• Σ = {a, b}
• R = { A → aAa, A → bAb, A → a, A → b, A → ε }
• S = A
What language does this grammar specify?
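Because every rule of this grammar either wraps A in a matching pair of terminals or ends the derivation, membership can be tested by peeling symbols off both ends. The following is a sketch under that reading of the rules (A → aAa | bAb | a | b | ε); the function name `derives` is hypothetical.

```python
def derives(s: str) -> bool:
    """Return True if A =>* s under A -> aAa | bAb | a | b | ε."""
    if s in ("", "a", "b"):
        # A -> ε, A -> a, A -> b end the derivation.
        return True
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "ab":
        # A -> aAa or A -> bAb: strip the matching outer pair and recurse.
        return derives(s[1:-1])
    return False


for word in ["", "a", "abba", "aba", "ab", "abab"]:
    print(repr(word), derives(word))
```

Trying a handful of strings by hand against this function is a quick way to form a guess about the language.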
More examples
• Write a CFG for each of the following languages:
“All strings consisting of one or more a’s, followed by twice as many b’s.”
“Strings with more a’s than b’s.”
• There is an entire class devoted to formal specifications of languages: CS 466/666 – Introduction to Formal Languages
A CFG for Integer Arithmetic Expressions
• V = {<num>, <digit>, <op>, <expr>}
• Σ = {(, ), 0…9, +, −, ×, ÷}
• R =
<expr> → <num> | <expr> <op> <expr> | (<expr>)
<num> → <digit><num> | <digit>
<digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<op> → + | − | × | ÷
• S = <expr>
Derivation of an Expression
<expr> ⇒ <expr> <op> <expr>
⇒ (<expr>) <op> <expr>
⇒ (<expr>) + <expr>
⇒ (<expr> <op> <expr>) + <expr>
⇒ (<expr> × <expr>) + <expr>
⇒ (<num> × <expr>) + <expr>
⇒ (<digit><num> × <expr>) + <expr>
⇒ (<digit><digit> × <expr>) + <expr>
⇒ (<digit><digit> × <num>) + <expr>
⇒ (7<digit> × <num>) + <expr>
⇒ (73 × <num>) + <expr>
⇒ (73 × <digit>) + <expr>
⇒ (73 × 4) + <expr>
⇒ (73 × 4) + <num>
⇒ (73 × 4) + <digit>
⇒ (73 × 4) + 9
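A derivation like this one can be replayed mechanically: each step names a variable and one of its rule bodies, and the first occurrence of that variable is rewritten. The sketch below is illustrative only. The operator glyphs did not survive in the transcript, so ASCII `+ - * /` is used as a stand-in (an assumption), and the names `RULES`, `STEPS`, and `derive` are hypothetical.

```python
# Rules of the integer-expression grammar; ASCII operators are an assumption.
RULES = {
    "<expr>": ["<num>", "<expr> <op> <expr>", "(<expr>)"],
    "<num>": ["<digit><num>", "<digit>"],
    "<digit>": list("0123456789"),
    "<op>": ["+", "-", "*", "/"],
}


def derive(start, steps):
    """Apply each (head, body) step to the first occurrence of head.

    Returns the list of sentential forms, starting with `start`.
    """
    forms = [start]
    for head, body in steps:
        if body not in RULES[head]:
            raise ValueError(f"{head} -> {body} is not a rule")
        if head not in forms[-1]:
            raise ValueError(f"{head} does not occur in {forms[-1]}")
        forms.append(forms[-1].replace(head, body, 1))
    return forms


STEPS = [
    ("<expr>", "<expr> <op> <expr>"), ("<expr>", "(<expr>)"), ("<op>", "+"),
    ("<expr>", "<expr> <op> <expr>"), ("<op>", "*"), ("<expr>", "<num>"),
    ("<num>", "<digit><num>"), ("<num>", "<digit>"), ("<expr>", "<num>"),
    ("<digit>", "7"), ("<digit>", "3"), ("<num>", "<digit>"), ("<digit>", "4"),
    ("<expr>", "<num>"), ("<num>", "<digit>"), ("<digit>", "9"),
]

forms = derive("<expr>", STEPS)
print("\n=> ".join(forms))
print(forms[-1])  # (73 * 4) + 9
```

Every step applies exactly one rule, so the replay doubles as a check that the derivation is legal under the grammar.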
Parse Trees
• The derivation of an expression can also be expressed as a tree
• This parse tree can help to resolve the interpretation of an expression
• A compiler reads in the source code and produces a parse tree before generating code
Example Parse Tree
• A simple CFG: E → E – E | 0 | 1
• Derivation: E ⇒ E – E ⇒ E – E – E ⇒ 1 – E – E ⇒ 1 – 0 – E ⇒ 1 – 0 – 1
[Figure: two parse trees for 1 – 0 – 1, one grouping it as (1 – 0) – 1 and the other as 1 – (0 – 1)]
Since there are two parse trees for this expression, the grammar is ambiguous. (Note: the order of substitution is not the issue.)
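The ambiguity can be confirmed by counting parse trees rather than drawing them. The sketch below counts the trees for a string under E → E – E | 0 | 1 by trying every minus sign as the root of the tree. It writes the operator as ASCII `-` and assumes input without spaces; the name `count_trees` is hypothetical.

```python
from functools import lru_cache


def count_trees(tokens: str) -> int:
    """Number of parse trees for `tokens` under E -> E - E | 0 | 1."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees deriving tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] in "01" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] == "-":
                # tokens[k] is the root operator: E -> E - E.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_trees("1-0"))    # 1
print(count_trees("1-0-1"))  # 2
```

The two trees for `1-0-1` evaluate to different values, (1 - 0) - 1 = 0 and 1 - (0 - 1) = 2, which is why the ambiguity matters for semantics.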
Ambiguity
• If some expression has two or more parse trees, the grammar is syntactically ambiguous
• Programming languages should be specified by unambiguous grammars
• Otherwise it is difficult to determine the semantics of a syntactically correct statement, such as: a = b + c * d;
• Conventions (like operator precedence) can be used to choose a single interpretation when the grammar is syntactically ambiguous
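As an illustration of precedence settling the interpretation, Python's own parser can be asked how it groups the example statement. This uses Python's grammar and the standard `ast` module, not the grammar from these slides, and the sample values for b, c, and d are arbitrary.

```python
import ast

# Parse the statement from the slide (without the trailing semicolon).
tree = ast.parse("a = b + c * d")
top = tree.body[0].value  # the expression on the right-hand side

print(type(top.op).__name__)        # Add
print(type(top.right.op).__name__)  # Mult

# The two candidate groupings give different results:
b, c, d = 2, 3, 4
print((b + c) * d)  # 20
print(b + (c * d))  # 14
```

The multiplication sits below the addition in the tree, so Python reads the statement as a = b + (c * d).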
Disambiguating a grammar
• We can disambiguate our simple grammar by adding explicit parentheses: E → (E – E) | 0 | 1
• Derivation: E ⇒ (E – E) ⇒ ((E – E) – E) ⇒ ((1 – E) – E) ⇒ ((1 – 0) – E) ⇒ ((1 – 0) – 1)
• You can often remove ambiguity in a grammar by imposing state in the derivation.
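With explicit parentheses, a parser never has to choose between groupings: the first symbol decides which rule applies. The following recursive-descent sketch for E → (E – E) | 0 | 1 writes the operator as ASCII `-` and assumes input without spaces; the name `parse` and the nested-tuple tree shape are hypothetical choices.

```python
def parse(s: str):
    """Parse s under E -> (E - E) | 0 | 1.

    Returns a nested tuple (left, "-", right) or an int; raises ValueError
    if s is not in the language.
    """

    def expr(i):
        if i < len(s) and s[i] in "01":
            return int(s[i]), i + 1
        if i < len(s) and s[i] == "(":
            left, i = expr(i + 1)
            if i >= len(s) or s[i] != "-":
                raise ValueError(f"expected '-' at position {i}")
            right, i = expr(i + 1)
            if i >= len(s) or s[i] != ")":
                raise ValueError(f"expected ')' at position {i}")
            return (left, "-", right), i + 1
        raise ValueError(f"unexpected input at position {i}")

    tree, end = expr(0)
    if end != len(s):
        raise ValueError("trailing input")
    return tree


print(parse("((1-0)-1)"))  # ((1, '-', 0), '-', 1)
print(parse("(1-(0-1))"))  # (1, '-', (0, '-', 1))
```

The unparenthesized string `1-0-1` is rejected, since it is no longer in the language.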
An ambiguous grammar
• S → aSb | aSbb | ε
• Language: L = {a^n b^m | 0 ≤ n ≤ m ≤ 2n}
• The number of b’s is between the number of a’s and twice the number of a’s
• aabbb can be generated two ways
• Disambiguating:
• Step 1: Produce all a’s with matching b’s
• Step 2: Produce all extra b’s
• S → aSb | A | ε
• A → aAbb | abb
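Both claims on this slide can be checked by counting parse trees. The sketch below does so with one counting function per grammar, reading the rules as S → aSb | aSbb | ε for the ambiguous grammar and S → aSb | A | ε, A → aAbb | abb for the disambiguated one. The function names are hypothetical.

```python
def count_ambiguous(s: str) -> int:
    """Parse trees for s under S -> aSb | aSbb | ε."""
    if s == "":
        return 1  # S -> ε
    total = 0
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b":
        total += count_ambiguous(s[1:-1])  # S -> aSb
    if len(s) >= 3 and s[0] == "a" and s.endswith("bb"):
        total += count_ambiguous(s[1:-2])  # S -> aSbb
    return total


def count_a(s: str) -> int:
    """Parse trees for s under A -> aAbb | abb."""
    total = 1 if s == "abb" else 0  # A -> abb
    if len(s) >= 3 and s[0] == "a" and s.endswith("bb"):
        total += count_a(s[1:-2])  # A -> aAbb
    return total


def count_unambiguous(s: str) -> int:
    """Parse trees for s under S -> aSb | A | ε."""
    total = 1 if s == "" else 0  # S -> ε
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b":
        total += count_unambiguous(s[1:-1])  # S -> aSb
    return total + count_a(s)  # S -> A


print(count_ambiguous("aabbb"))    # 2
print(count_unambiguous("aabbb"))  # 1
```

Running the second counter over every string a^n b^m with n ≤ m ≤ 2n for small n gives exactly one tree each, which is consistent with the new grammar being unambiguous, though it is a spot check rather than a proof.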
BNF
• Backus-Naur Form
• A standard notation for CFGs, often used in specifying languages
• Non-terminals (variables) are enclosed in <>, e.g. <expression>, <number>
• <empty> = ε
• ::= is the production symbol (→)
• | is used for “or”
BNF Example
• <real-number> ::= <integer-part> . <fraction>
• <integer-part> ::= <digit> | <integer-part> <digit>
• <fraction> ::= <digit> | <digit><fraction>
• <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Can we generate the number “.7” from this grammar?
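The question can be tested with a small recognizer. In this grammar both <integer-part> and <fraction> derive exactly the non-empty digit strings, so the sketch below splits the input at the first period and checks each side. The function names are hypothetical.

```python
DIGITS = set("0123456789")


def is_digit_string(s: str) -> bool:
    """True for one or more digits: what <integer-part> and <fraction> derive."""
    return len(s) >= 1 and all(c in DIGITS for c in s)


def is_real_number(s: str) -> bool:
    """True if <real-number> derives s."""
    integer_part, dot, fraction = s.partition(".")
    return dot == "." and is_digit_string(integer_part) and is_digit_string(fraction)


print(is_real_number("3.14"))  # True
print(is_real_number(".7"))    # False
print(is_real_number("7."))    # False
```

The string `.7` is rejected because <integer-part> has no empty alternative: every derivation of it produces at least one digit.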
Extended BNF
• Makes some constructs easier to specify
• No more powerful than BNF
• Rules:
• { } = “zero or more”
• [ ] = “optional” or, equivalently, “zero or one”
• | = “or”
• ( ) are used for grouping
Arithmetic Expressions
In BNF:
• <expression> ::= <expression> + <term> | <expression> – <term> | <term>
• <term> ::= <term> * <factor> | <term> / <factor> | <factor>
• <factor> ::= number | name | (<expression>)
In EBNF:
• <expression> ::= <term> { (+| – ) <term> }
• <term> ::= <factor> { (*| / ) <factor> }
• <factor> ::= ‘(’ <expression> ‘)’ | number | name
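The EBNF form translates almost line for line into a recursive-descent evaluator: each `{ ... }` becomes a `while` loop and each non-terminal becomes a function. The sketch below makes several assumptions that are not in the slides: it handles only integer `number` tokens (no `name`), writes minus as ASCII `-`, and evaluates while parsing instead of building a tree. The names `tokenize` and `evaluate` are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split text into number, operator, and parenthesis tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expression():  # <expression> ::= <term> { (+|-) <term> }
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():  # <term> ::= <factor> { (*|/) <factor> }
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():  # <factor> ::= '(' <expression> ')' | number
        tok = peek()
        if tok == "(":
            advance()
            value = expression()
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return value
        if tok is not None and tok.isdigit():
            advance()
            return int(tok)
        raise ValueError(f"unexpected token {tok!r}")

    result = expression()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return result


print(evaluate("2 + 3 * 4"))    # 14
print(evaluate("(2 + 3) * 4"))  # 20
print(evaluate("1 - 0 - 1"))    # 0
```

The loops accumulate from left to right, so `1 - 0 - 1` is read as (1 - 0) - 1, matching the left-recursive BNF rules, and the separate <term> level gives `*` and `/` higher precedence than `+` and `-`.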