Programming Languages


Language Syntax This lecture introduces the lexical structure of programming languages; context-free grammars and their description in BNF; the representation of syntactic structure using trees; the issues that arise in constructing BNFs for a programming language; and EBNFs and syntax diagrams.

Language Syntax • Syntax is the structure of a language. • One of the great advances in programming languages has been the development of a formal system for describing syntax that is now almost universally in use. • In the 1950s Noam Chomsky developed the idea of context-free grammars, and John Backus, with contributions by Peter Naur, developed Backus-Naur form (BNF), a notational system for describing these grammars.

Lexical Structures • The lexical structure of a programming language is the structure of its words, or tokens. • Typically, the scanning phase of a translator collects sequences of characters from the input program into tokens, which are then processed by a parsing phase that determines the syntactic structure.

Tokens • Typical token categories include the following: • Reserved words, sometimes called keywords, such as "begin", "if", and "while". • Constants or literals, such as 42 (a numeric constant) or "hello" (a string constant). • Special symbols, such as ";", "<=", or "+". • Identifiers, such as x24, monthly_balance, or write.
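The four token categories above can be illustrated with a small scanner. This is a minimal sketch, not the scanner of any particular translator: the reserved-word set, the category names, and the function name scan are assumptions chosen for illustration, and only the special symbols mentioned on the slide (plus "<") are recognized.

```python
import re

# Hypothetical reserved-word set, taken from the examples on the slide.
RESERVED = {"begin", "if", "while"}

# Each pair is (category, regular expression). Order matters: earlier
# patterns win when two alternatives match at the same position.
TOKEN_SPEC = [
    ("STRING", r'"[^"\n]*"'),      # string constant such as "hello"
    ("NUMBER", r"\d+"),            # numeric constant such as 42
    ("NAME", r"[A-Za-z_]\w*"),     # reserved word or identifier
    ("SYMBOL", r"<=|[;+<]"),       # special symbols; "<=" is tried before "<"
    ("SKIP", r"\s+"),              # whitespace only separates tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def scan(text):
    """Collect the characters of `text` into (category, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "NAME":
            # A name is a reserved word only if it is in the fixed set.
            kind = "RESERVED" if lexeme in RESERVED else "IDENTIFIER"
        tokens.append((kind, lexeme))
    return tokens


if __name__ == "__main__":
    for token in scan('if x24 <= 42 ; write "hello"'):
        print(token)
```

Reserved words and identifiers share one pattern and are separated afterwards by a set lookup, which is a common way to keep the patterns simple; the parsing phase would then consume the resulting token list.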

Context-Free Grammars And BNFs • We begin the description of grammars and BNFs with an example. • In English, we can express simple sentences with the following rules:
1. <sentence> ::= <noun-phrase> <verb-phrase> .
2. <noun-phrase> ::= <article> <noun>
3. <article> ::= a | the
4. <noun> ::= girl | dog
5. <verb-phrase> ::= <verb> <noun-phrase>
6. <verb> ::= sees | pets

Context-Free Grammars And BNFs • Thus we could construct, or derive, the sentence "the girl sees a dog." as follows:
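One way to make the derivation concrete is to replay it mechanically. The sketch below encodes the six rules as a dictionary and always replaces the leftmost nonterminal; that leftmost order, the one-word lookahead used to choose between alternatives, and the names GRAMMAR and derive are assumptions made for illustration. The lookahead happens to suffice for this grammar but not for grammars in general.

```python
# The six rules of the sentence grammar: each nonterminal maps to a list of
# alternatives, and each alternative is a sequence of symbols.
GRAMMAR = {
    "<sentence>": [["<noun-phrase>", "<verb-phrase>", "."]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>": [["sees"], ["pets"]],
}


def derive(target, start="<sentence>"):
    """Return the sentential forms of a leftmost derivation of `target`."""
    form = [start]
    steps = [list(form)]
    while True:
        # Position of the leftmost nonterminal, or None if none remain.
        i = next((k for k, sym in enumerate(form) if sym in GRAMMAR), None)
        if i is None:
            break
        # Everything left of position i is already a terminal word, so the
        # replacement must eventually produce the word target[i].
        wanted = target[i] if i < len(target) else None
        alternative = next(
            (alt for alt in GRAMMAR[form[i]]
             if alt[0] in GRAMMAR or alt[0] == wanted),
            None,
        )
        if alternative is None:
            raise ValueError(f"cannot derive {wanted!r} from {form[i]}")
        form[i:i + 1] = alternative
        steps.append(list(form))
    if form != list(target):
        raise ValueError("the target is not a sentence of the grammar")
    return steps


if __name__ == "__main__":
    for step in derive("the girl sees a dog .".split()):
        print("=>", " ".join(step))
```

Each printed line is one sentential form; the first is the start symbol <sentence> and the last is the finished sentence, with exactly one rule applied between consecutive lines.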

Context-Free Grammars And BNFs • A context-free grammar consists of a series of grammar rules as described; each rule consists of a left-hand side that is a single structure name, • followed by a right-hand side consisting of a sequence of items that can be symbols or other structure names. • The names for structures (like <sentence>) are called nonterminals, since they are broken down into further structures.

Productions • The words or token symbols are also called terminals, since they are never broken down. • Grammar rules are also called productions, since they "produce" the strings of the language using derivations. • Productions are in Backus-Naur form if they are written using only the metasymbols "::=", "|", "<", and ">". • (Sometimes parentheses are also allowed to group things together.)

Context-free? • Why is such a grammar context-free? • The simple reason is that the nonterminals appear singly on the left-hand sides of productions. • This means that each nonterminal can be replaced by any right-hand side alternative, no matter where the nonterminal might appear. • In other words, there is no context under which only certain replacements can occur.

Context-free? (2) • We shall adopt the view that anything not expressible using context-free grammars is a semantic, not a syntactic, issue.

Context-sensitivity • As an example of context sensitivity, note that an article that appears at the beginning of a sentence in the preceding grammar should be capitalized. • One way of doing this is to rewrite the first rule as:
<sentence> ::= <beginning> <noun-phrase> <verb-phrase> '.'
• and then add the context-sensitive rule:
<beginning> <article> ::= The | A

Context-sensitivity (2) • Now the derivation would look as follows:
<sentence> -> <beginning> <noun-phrase> <verb-phrase>. (new rule 1)
-> <beginning> <article> <noun> <verb-phrase>. (rule 2)
-> The <noun> <verb-phrase>. (new context-sensitive rule)
-> ...

BNF form • Context-free grammars have been studied extensively by formal language theorists and are now so well understood that it is natural to express the syntax of any programming language in BNF. • Doing so makes it easier to write translators for the language, since the parsing stage can be automated.

Syntax-directed Semantics • Syntax establishes structure, not meaning. • But the meaning of a sentence (or program) must be related to its syntax. • To make use of the syntactic structure of a program to determine its semantics we must have a way of expressing this structure as determined by a derivation. • A standard method for doing this is with a parse tree.

Parse Tree • The parse tree describes graphically the replacement process in a derivation. • For example, the parse tree for the sentence "the girl sees a dog." is as follows:
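The tree for this sentence can also be rebuilt programmatically from the six rules of the sentence grammar. The following is a minimal sketch: the (label, children) tuple representation, the indented text layout, and the names parse and show are assumptions made for illustration, and the parser simply tries the alternatives of each rule in order, which is sufficient for this small grammar only.

```python
# The sentence grammar: nonterminal -> list of alternative right-hand sides.
GRAMMAR = {
    "<sentence>": [["<noun-phrase>", "<verb-phrase>", "."]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>": [["sees"], ["pets"]],
}


def parse(symbol, words, pos):
    """Try to derive a prefix of words[pos:] from `symbol`.

    Returns (tree, next_pos) on success and None on failure. A tree is a
    (label, children) pair; a terminal leaf has an empty child list.
    """
    if symbol not in GRAMMAR:                  # terminal: must match the input
        if pos < len(words) and words[pos] == symbol:
            return (symbol, []), pos + 1
        return None
    for alternative in GRAMMAR[symbol]:        # try each right-hand side
        children, cur = [], pos
        for part in alternative:
            result = parse(part, words, cur)
            if result is None:
                break                          # this alternative fails
            subtree, cur = result
            children.append(subtree)
        else:
            return (symbol, children), cur     # every part matched
    return None


def show(tree, depth=0):
    """Return the tree as indented lines, one node per line."""
    label, children = tree
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(show(child, depth + 1))
    return lines


if __name__ == "__main__":
    tree, _ = parse("<sentence>", "the girl sees a dog .".split(), 0)
    print("\n".join(show(tree)))
```

Interior nodes of the printed tree are nonterminals and leaves are terminals; reading the leaves from top to bottom gives back the sentence, and each interior node together with its children corresponds to one replacement step of the derivation.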

A Simple Arithmetic Expression Grammar • A typical simple example of the use of a context-free grammar in programming languages is the description of simple integer arithmetic expressions with addition and multiplication:
<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | <number>
<number> ::= <number> <digit> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

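Ambiguity • A grammar for which two distinct parse trees are possible for the same string is ambiguous. • For example, under the preceding expression grammar the string 3 + 4 * 5 has two parse trees: one groups it as (3 + 4) * 5 and the other as 3 + (4 * 5).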
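The ambiguity can be checked by enumerating every parse tree that the rules <exp> ::= <exp> + <exp> | <exp> * <exp> | <number> allow for a token sequence. This is a minimal sketch under stated assumptions: parentheses are left out, numbers are taken to be already grouped into single tokens by a scanner, trees are plain tuples, and the names parses and evaluate are hypothetical.

```python
def parses(tokens):
    """Return every parse tree of `tokens` under the ambiguous rules.

    A tree is either an int (a <number>) or a tuple (operator, left, right).
    """
    if len(tokens) == 1:
        return [int(tokens[0])]
    trees = []
    # In a well-formed sequence the operators sit at the odd positions.
    # Any of them may be the root of the tree, which is exactly what
    # <exp> ::= <exp> + <exp> | <exp> * <exp> permits.
    for i in range(1, len(tokens), 2):
        for left in parses(tokens[:i]):
            for right in parses(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees


def evaluate(tree):
    """Compute the value that a given parse tree assigns to the expression."""
    if isinstance(tree, int):
        return tree
    operator, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if operator == "+" else a * b


if __name__ == "__main__":
    for tree in parses("3 + 4 * 5".split()):
        print(tree, "=", evaluate(tree))
```

For 3 + 4 * 5 the enumeration finds two trees, one with + at the root (value 23) and one with * at the root (value 35), so the choice of parse tree changes the meaning of the expression.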

Precedence and Associativity • The revised, unambiguous grammar for simple arithmetic expressions that expresses both precedence and associativity is:
<exp> ::= <exp> + <term> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= ( <exp> ) | <number>
<number> ::= <number> <digit> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• These rules give * higher precedence than +, because a <term> must be completed before it can take part in an addition; and the left-recursive productions make both operators left-associative.

EBNFs • Extended BNF (EBNF) adopts a special notation for grammar rules that expresses more clearly the repetitive nature of their structures; curly brackets enclose an item that may be repeated zero or more times:
<exp> ::= <term> { + <term> }
<term> ::= <factor> { * <factor> }
<factor> ::= ( <exp> ) | <number>
<number> ::= <digit> { <digit> }
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• We assume that any operator involved in a curly bracket repetition is left-associative.
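EBNF rules map quite directly onto a recursive-descent parser: each nonterminal becomes a function and each curly bracket repetition becomes a loop. The sketch below is one possible rendering, not a definitive implementation; the tuple tree representation and the names parse and value are assumptions, and whitespace is simply discarded for brevity, so no separate scanner is used.

```python
DIGITS = "0123456789"


def parse(text):
    """Parse `text` with one function per EBNF rule and return a tree.

    A tree is an int (a <number>) or a tuple (operator, left, right).
    """
    chars = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return chars[pos] if pos < len(chars) else None

    def advance():
        nonlocal pos
        pos += 1

    def exp():                       # <exp> ::= <term> { + <term> }
        tree = term()
        while peek() == "+":
            advance()
            tree = ("+", tree, term())   # earlier result becomes the left child
        return tree

    def term():                      # <term> ::= <factor> { * <factor> }
        tree = factor()
        while peek() == "*":
            advance()
            tree = ("*", tree, factor())
        return tree

    def factor():                    # <factor> ::= ( <exp> ) | <number>
        if peek() == "(":
            advance()
            tree = exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return tree
        return number()

    def number():                    # <number> ::= <digit> { <digit> }
        if peek() is None or peek() not in DIGITS:
            raise SyntaxError(f"expected a digit at position {pos}")
        result = 0
        while peek() is not None and peek() in DIGITS:
            result = result * 10 + int(peek())
            advance()
        return result

    tree = exp()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree


def value(tree):
    """Evaluate a tree produced by parse()."""
    if isinstance(tree, int):
        return tree
    operator, left, right = tree
    return value(left) + value(right) if operator == "+" else value(left) * value(right)


if __name__ == "__main__":
    print(parse("1 + 2 + 3"))
    print(value(parse("3 + 4 * 5")))
```

Because each loop folds the tree built so far into the left operand of the next operator, repetitions come out left-associative, matching the assumption stated above; and because exp calls term, which calls factor, multiplication binds more tightly than addition.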

Syntax Diagrams • A useful graphical representation for a grammar rule is the syntax diagram.
