CMP 339/692 Programming Languages Day 6 Thursday, February 16, 2012. Rhys Eric Rosholt. Office: Office Phone: Web Site: Email Address:. Gillet Hall  Room 304 7189608663 http://comet.lehman.cuny.edu/rosholt/ rhys.rosholt @ lehman.cuny.edu. Chapter 3. Describing Syntax and Semantics.
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Rhys Eric Rosholt
Office:
Office Phone:
Web Site:
Email Address:
Gillet Hall  Room 304
7189608663
http://comet.lehman.cuny.edu/rosholt/
rhys.rosholt @ lehman.cuny.edu
Chapter 3
Describing Syntax and Semantics
Introduction
The General Problem of Describing Syntax
Formal Methods of Describing Syntax
Attribute Grammars
Describing the Meanings of Programs: Dynamic Semantics
Definition: A sentence is an ordered string of characters over some alphabet
Definition: A language is a set of sentences
Definition: A lexeme is the lowest level syntactic unit of a language
(e.g., *, sum,begin)
Definition: A token is a category of lexemes
(e.g., identifier)
A formal grammar G = (N,Σ,P,S) is a quadtuple such that
N is a finite set of nonterminal symbols
Σis a finite set of terminal symbols,
disjoint from N
P is a finite set of production rules of the formαNβ → γ
S ЄN the start symbol
The language of a formal grammar G, denoted as L(G), is the set of all strings over Σ that can be generated by starting with the start symbol S and then applying the production rules in P until no nonterminal symbols are present.
Recognizers
A device that reads input strings of the language
and decides whether the input strings
belong to the language
Generators
A device that generates sentences of a language
which are used to compare
with the syntax of a particular sentence
<id_list> > ident  ident, <id_list>
<if_stmt> > if <logic_expr> then <stmt>
<stmt> > <single_stmt>
 begin <stmt_list> end
<id_list> > ident  ident, <id_list>
<program> > <stmts>
<stmts> > <stmt>  <stmt> ; <stmts>
<stmt> > <var> = <expr>
<var> > a  b  c  d
<expr> ><term> + <term>
<term>  <term>
<term> > <var>  const
<program> => <stmts>
=> <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const
A hierarchical representation of a derivation
<program>
<stmts>
<stmt>
<var>
=
<expr>
a
<term>
+
<term>
<var>
const
b
A grammar is ambiguous
if and only if
it generates a sentential form
that has two or more distinct parse trees
<expr> <expr> <op> <expr>  const
<op> /  
<expr>
<expr>
<expr>
<op>
<op>
<expr>
<expr>
<op>
<expr>
<expr>
<op>
<expr>
<expr>
<op>
<expr>
const

const
/
const
const

const
/
const
If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity
<expr> <expr>  <term>  <term>
<term> <term> / const  const
<expr>
<expr>

<term>
<term>
<term>
/
const
const
const
Operator associativity can also be indicated by a grammar.
ambiguous:
<expr> > <expr> + <expr>  const
unambiguous:
<expr> > <expr> + const  const
<expr>
<expr>
<expr>
+
const
<expr>
+
const
const
Optional parts are placed in brackets []
<proc_call> > ident [(<expr_list>)]
Alternative parts of RHSs are placed inside parentheses and separated via vertical bars
<term> → <term>(+) const
Repetitions (0 or more) are placed inside braces {}
<ident> → letter {letterdigit}
BNF
<expr> ><expr> + <term>
<expr>  <term>
<term>
<term> ><term> * <factor>
<term> / <factor>
<factor>
EBNF
<expr> ><term> {(+) <term> }
<term> ><factor> {(*/) <factor> }
Rhys Eric Rosholt
Office:
Office Phone:
Web Site:
Email Address:
Gillet Hall  Room 304
7189608663
http://comet.lehman.cuny.edu/rosholt/
rhys.rosholt @ lehman.cuny.edu