### Specifying Languages

CS 480/680 – Comparative Languages

Specifying a Language

- Informal methods
  - Textbooks, tutorials, etc.
- Formal definitions
  - Needed for exactness (compiler writers, etc.)
  - Like technical specifications for design
  - Syntax – what expressions are legal?
  - Semantics – what should they do?

Language Specification

Context Free Grammars

- Definition: A context-free grammar (CFG) is a 4-tuple, G = (V, Σ, R, S)
  - V = variables (non-terminal symbols)
  - Σ = terminal symbols (alphabet)
  - R = production rules
  - S = start symbol, S ∈ V
- V, Σ, R, S are all finite
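The 4-tuple definition above can be written down directly as a data structure. The following is a minimal sketch, not a standard library type: the names `CFG`, `is_well_formed`, and `toy` are hypothetical, and the requirement that V and Σ be disjoint is a common convention assumed here rather than something stated on the slide.

```python
from typing import NamedTuple

class CFG(NamedTuple):
    variables: frozenset   # V
    terminals: frozenset   # Σ
    rules: tuple           # R, as (head, body) pairs; body is a tuple of symbols
    start: str             # S

def is_well_formed(g: CFG) -> bool:
    """Check the side conditions of the definition (plus assumed disjointness)."""
    symbols = g.variables | g.terminals
    return (
        g.start in g.variables                       # S ∈ V
        and g.variables.isdisjoint(g.terminals)      # assumption: V and Σ do not overlap
        and all(
            head in g.variables and all(s in symbols for s in body)
            for head, body in g.rules
        )
    )

# Hypothetical toy grammar: S → aSb | ε (the empty tuple stands for ε)
toy = CFG(
    variables=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    rules=(("S", ("a", "S", "b")), ("S", ())),
    start="S",
)
```

Because Python containers are finite by construction, the finiteness condition holds automatically in this representation.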

A Context Free Grammar

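- V = {A, B}
- Σ = {a, b}
- R = {A → aAa, A → B, B → bBb, B → ε}
- S = A

A sample derivation, with the rule applied at each step:

- A ⇒ aAa (A → aAa)
- ⇒ aaAaa (A → aAa)
- ⇒ aaBaa (A → B)
- ⇒ aabBbaa (B → bBb)
- ⇒ aabbBbbaa (B → bBb)
- ⇒ aabbbbaa (B → ε)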

What language does this grammar specify?
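One way to explore the question is to enumerate the short strings the grammar generates. The sketch below does a breadth-first search over sentential forms, always expanding the leftmost variable. The names `RULES` and `generate` are hypothetical, variables are assumed to be single uppercase letters, and the pruning relies on every rule body in this grammar containing at most one variable.

```python
from collections import deque

# Rules of the example grammar; "" stands for ε.
RULES = {"A": ["aAa", "B"], "B": ["bBb", ""]}

def generate(start="A", max_len=6):
    """Return all terminal strings of length <= max_len derivable from start."""
    seen, results = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is all terminals.
        idx = next((i for i, c in enumerate(form) if c in RULES), None)
        if idx is None:
            results.add(form)
            continue
        for body in RULES[form[idx]]:
            new = form[:idx] + body + form[idx + 1:]
            # Terminals are never removed, so forms with too many can be dropped.
            terminal_count = sum(c not in RULES for c in new)
            if terminal_count <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=lambda s: (len(s), s))

print(generate(max_len=4))
```

Looking at the printed strings for increasing values of `max_len` is a quick way to form a conjecture about the language before trying to prove it.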

Another Example CFG

- V = {A}
- Σ = {a, b}
- R = {A → aAa, A → bAb, A → a, A → b, A → ε}
- S = A

What language does this grammar specify?
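Because every recursive rule in this grammar wraps the variable in one terminal on each side, membership can be tested by peeling matching symbols off both ends. The function below is a sketch of that idea; the name `derives` is hypothetical, and it is written for this specific grammar rather than for CFGs in general.

```python
def derives(s: str) -> bool:
    """True if s can be derived from A under A → aAa | bAb | a | b | ε."""
    # Base cases: A → ε, A → a, A → b
    if s in ("", "a", "b"):
        return True
    # Recursive cases: A → aAa and A → bAb strip one matching symbol from each end.
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "ab":
        return derives(s[1:-1])
    return False
```

Trying it on a few strings such as `"abba"`, `"aba"`, and `"ab"` gives a feel for which strings the grammar accepts.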

More examples

- Write a CFG for each of the following languages:
  - “All strings consisting of one or more a’s, followed by twice as many b’s.”
  - “Strings with more a’s than b’s.”
- There is an entire class devoted to formal specifications of languages: CS 466/666 – Introduction to Formal Languages

A CFG for Integer Arithmetic Expressions

- V = {<num>, <digit>, <op>, <expr>}
- Σ = {(, ), 0…9, +, –, ×, ÷}
- R:
  - <expr> → <num> | <expr> <op> <expr> | (<expr>)
  - <num> → <digit><num> | <digit>
  - <digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  - <op> → + | – | × | ÷
- S = <expr>

Derivation of an Expression

- <expr> <expr> <op><expr> (<expr>) <op><expr> (<expr>) + <expr> (<expr> <op> <expr>) + <expr> (<expr> <expr>) + <expr> (<num> <expr>) + <expr> (<digit><num> <expr>) + <expr> (<digit><digit> <num>) + <expr> (7<digit> <num>) + <expr> (73 <num>) + <expr> (73 <digit>) + <expr> (73 4) + <expr> (73 4) + <num> (73 4) + <digit> (73 4) + 9

Parse Trees

- The derivation of an expression can also be expressed as a tree
- This parse tree can help to resolve the interpretation of an expression
- A compiler reads in the source code, and produces a parse tree before generating code.

Example Parse Tree

- A simple CFG: E → E – E | 0 | 1
- E ⇒ E – E ⇒ E – E – E ⇒ 1 – E – E ⇒ 1 – 0 – E ⇒ 1 – 0 – 1

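The expression 1 – 0 – 1 has two parse trees under this grammar:

- (1 – 0) – 1: the root E expands to E – E; its left child expands again to E – E, deriving 1 and 0, and its right child derives 1.
- 1 – (0 – 1): the root E expands to E – E; its left child derives 1, and its right child expands again to E – E, deriving 0 and 1.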

Since there are two parse trees for this expression, the grammar is ambiguous.

(Note: the order of substitution is not the issue.)
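The ambiguity can be made concrete by enumerating every parse of a token list. The sketch below tries each minus sign as the top-level operator of E → E – E and recurses on both sides; each parse tree corresponds to one full parenthesization. The function name `parses` is hypothetical, and the ASCII hyphen `-` stands in for the minus sign.

```python
def parses(tokens):
    """Return every parse of tokens under E → E - E | 0 | 1, fully parenthesized."""
    if len(tokens) == 1:
        # Base rules: E → 0 and E → 1
        return [tokens[0]] if tokens[0] in ("0", "1") else []
    results = []
    for k, tok in enumerate(tokens):
        if tok == "-":
            # Treat this '-' as the root of the tree: E → E - E
            for left in parses(tokens[:k]):
                for right in parses(tokens[k + 1:]):
                    results.append(f"({left} - {right})")
    return results

for tree in parses(list("1-0-1")):
    print(tree, "=", eval(tree))
```

For `1-0-1` this prints two parenthesizations, `(1 - (0 - 1))` with value 2 and `((1 - 0) - 1)` with value 0, so the two parse trees also disagree on the meaning of the expression.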

Ambiguity

- If some expression has two or more distinct parse trees, the grammar is syntactically ambiguous
- Programming languages should be specified by unambiguous grammars
- Otherwise it is difficult to determine the semantics of a syntactically correct statement
- a = b + c * d;
- Conventions (like operator precedence) can be used to clarify syntactically ambiguous grammars

Disambiguating a grammar

- We can disambiguate our simple grammar by adding explicit parentheses: E → (E – E) | 0 | 1
- E ⇒ (E – E) ⇒ ((E – E) – E) ⇒ ((1 – E) – E) ⇒ ((1 – 0) – E) ⇒ ((1 – 0) – 1)
- In general, you can remove ambiguity in a grammar by imposing state in the derivation.

An ambiguous grammar

- S → aSb | aSbb | ε
- Language: $L = \{a^n b^m \mid 0 \le n \le m \le 2n\}$
  - The number of b’s is between the number of a’s and twice the number of a’s
- aabbb can be generated two ways:
  - S ⇒ aSb ⇒ aaSbbb ⇒ aabbb
  - S ⇒ aSbb ⇒ aaSbbb ⇒ aabbb
- Disambiguating:
  - Step 1: Produce all a’s with matching b’s
  - Step 2: Produce all extra b’s.
- S → aSb | A | ε, A → aAbb | abb
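The effect of the rewrite can be checked by counting parse trees for the same string under both grammars. The sketch below hard-codes each grammar as a small recursive counter; the function names `count_ambiguous`, `count_S`, and `count_A` are hypothetical, and the counters are specific to these two grammars rather than a general CFG parser.

```python
def count_ambiguous(s: str) -> int:
    """Number of parse trees for s under S → aSb | aSbb | ε."""
    if s == "":
        return 1                                  # S → ε
    n = 0
    if s.startswith("a") and s.endswith("b"):
        n += count_ambiguous(s[1:-1])             # S → aSb
    if s.startswith("a") and s.endswith("bb"):
        n += count_ambiguous(s[1:-2])             # S → aSbb
    return n

def count_A(s: str) -> int:
    """Number of parse trees for s under A → aAbb | abb."""
    n = 1 if s == "abb" else 0                    # A → abb
    if len(s) >= 3 and s[0] == "a" and s.endswith("bb"):
        n += count_A(s[1:-2])                     # A → aAbb
    return n

def count_S(s: str) -> int:
    """Number of parse trees for s under S → aSb | A | ε."""
    n = 1 if s == "" else 0                       # S → ε
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b":
        n += count_S(s[1:-1])                     # S → aSb
    return n + count_A(s)                         # S → A

print(count_ambiguous("aabbb"), count_S("aabbb"))
```

For `aabbb` the first grammar yields two parse trees and the second yields one, while both accept the same strings.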

BNF

- Backus-Naur Form
- A standard notation for CFG’s, often used in specifying languages
- Non-terminals (variables) are enclosed in <>
  - <expression>, <number>
  - <empty> = ε
- ::= is the production symbol (→)
- | is used for “or”

BNF Example

- <real-number> ::= <integer-part> . <fraction>
- <integer-part> ::= <digit> | <integer-part> <digit>
- <fraction> ::= <digit> | <digit><fraction>
- <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Can we generate the number “.7” from this grammar?
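One way to explore questions like this is to turn the grammar into a membership test. In the BNF above, both <integer-part> and <fraction> derive one or more digits, so the sketch below checks for a non-empty digit run on each side of a single dot. The names `is_digit_run` and `is_real_number` are hypothetical.

```python
DIGITS = set("0123456789")

def is_digit_run(s: str) -> bool:
    """<integer-part> and <fraction> each derive exactly the non-empty digit strings."""
    return len(s) >= 1 and all(c in DIGITS for c in s)

def is_real_number(s: str) -> bool:
    """<real-number> ::= <integer-part> . <fraction>"""
    head, dot, tail = s.partition(".")   # split at the first '.'
    return dot == "." and is_digit_run(head) and is_digit_run(tail)

for candidate in ["3.14", ".7", "7.", "42"]:
    print(repr(candidate), is_real_number(candidate))
```

Under this reading, `"3.14"` is accepted, while `".7"`, `"7."`, and `"42"` are rejected, because every <integer-part> and <fraction> must contain at least one digit and the dot is mandatory.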

Extended BNF

- Makes some constructs easier to specify
- No more powerful than BNF
- Rules:
- { } = “zero or more”
- [ ] = “optional” or, equivalently “zero or one”
- | = “or”
- ( ) are used for grouping

Arithmetic Expressions

- <expression> ::= <expression> + <term> | <expression> – <term> | <term>
- <term> ::= <term> * <factor> | <term> / <factor> | <factor>
- <factor> ::= number | name | (<expression>)

- <expression> ::= <term> { (+| – ) <term> }
- <term> ::= <factor> { (*| / ) <factor> }
- <factor> ::= ‘(’ <expression> ‘)’ | number | name
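The EBNF form maps almost line for line onto a recursive-descent parser: each rule becomes a function, and each `{ ... }` repetition becomes a `while` loop. The sketch below evaluates expressions as it parses. The names `tokenize`, `Parser`, and `evaluate` are hypothetical, ASCII `-` is used for the minus sign, numbers are assumed to be unsigned integers, names are looked up in a caller-supplied dictionary, and `/` is Python's true division.

```python
import re

def tokenize(text):
    # Numbers, names, or any single non-space character (operators, parentheses).
    return re.findall(r"[0-9]+|[A-Za-z_][A-Za-z_0-9]*|\S", text)

class Parser:
    def __init__(self, text, env=None):
        self.tokens = tokenize(text)
        self.pos = 0
        self.env = env or {}

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):
        # <expression> ::= <term> { (+|-) <term> }
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # <term> ::= <factor> { (*|/) <factor> }
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # <factor> ::= '(' <expression> ')' | number | name
        tok = self.take()
        if tok == "(":
            value = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isascii() and tok.isdigit():
            return int(tok)
        if tok is not None and tok in self.env:
            return self.env[tok]
        raise SyntaxError(f"unexpected token: {tok!r}")

def evaluate(text, env=None):
    parser = Parser(text, env)
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token: {parser.peek()!r}")
    return value

print(evaluate("b + c * d", {"b": 2, "c": 3, "d": 4}))
print(evaluate("1 - 0 - 1"))
```

Because `term` is called from inside `expression`, multiplication and division bind tighter than addition and subtraction, and the loops accumulate left to right, so `1 - 0 - 1` is read as `(1 - 0) - 1`. The grammar's structure, rather than a separate convention, fixes both precedence and associativity.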


