Programming Languages and Compilers (CS 421)

Munawar Hafiz, 2219 SC, UIUC
http://www.cs.illinois.edu/class/cs421/
Based in part on slides by Mattox Beckman, as updated by Vikram Adve and Gul Agha

Type Inference - Example
• Eliminate type variables from the derivation of (fun x -> fun f -> f x):
  [f : δ ; x : β] |- f : φ → ε      [f : δ ; x : β] |- x : φ
  [f : δ ; x : β] |- (f x) : ε
  [x : β] |- (fun f -> f x) : γ
  [ ] |- (fun x -> fun f -> f x) : α
• α ≡ (β → γ); γ ≡ (δ → ε); δ ≡ (φ → ε); φ ≡ β

Type Inference Algorithm
Let has_type (Γ, e, τ) = S
• Γ is a typing environment
• e is an expression
• τ is a (generalized) type
• S is a set of equations between generalized types
• Idea: S is the set of constraints on type variables necessary for Γ |- e : τ
• Let Unif(S) be a substitution of generalized types for type variables solving S
• Solution: Unif(S)(Γ) |- e : Unif(S)(τ)

Type Inference Algorithm
has_type (Γ, exp, τ) =
• Case exp of
  • Var v --> return {τ ≡ Γ(v)}
  • Const c --> return {τ ≡ σ} where Γ |- c : σ by the constant rules
  • fun x -> e -->
    • Let α, β be fresh variables
    • Let S = has_type ([x : α] + Γ, e, β)
    • Return {τ ≡ α → β} ∪ S

Type Inference Algorithm (cont)
• Case exp of
  • App (e1 e2) -->
    • Let α be a fresh variable
    • Let S1 = has_type(Γ, e1, α → τ)
    • Let S2 = has_type(Γ, e2, α)
    • Return S1 ∪ S2

Type Inference Algorithm (cont)
• Case exp of
  • If e1 then e2 else e3 -->
    • Let S1 = has_type(Γ, e1, bool)
    • Let S2 = has_type(Γ, e2, τ)
    • Let S3 = has_type(Γ, e3, τ)
    • Return S1 ∪ S2 ∪ S3
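
The has_type cases above can be sketched in Python as a constraint generator. This is a minimal illustration under assumptions that are not in the slides: expressions are tagged tuples, type variables are strings beginning with an apostrophe, a function type is the tuple ("->", domain, range), and the "constant rules" cover only booleans and integers through a hypothetical CONST_TYPES table.

```python
import itertools

# Assumed representation (not from the slides):
#   type variable  -> string starting with "'", e.g. "'t0"
#   base type      -> "bool" or "int"
#   function type  -> ("->", domain, range)
#   expressions    -> ("var", v) | ("const", c) | ("fun", x, e)
#                     | ("app", e1, e2) | ("if", e1, e2, e3)
_counter = itertools.count()
CONST_TYPES = {bool: "bool", int: "int"}  # hypothetical constant rules


def fresh():
    """Return a type variable that has not been used before."""
    return f"'t{next(_counter)}"


def has_type(gamma, exp, tau):
    """Return the list of equations (pairs of types) needed for gamma |- exp : tau."""
    tag = exp[0]
    if tag == "var":
        return [(tau, gamma[exp[1]])]
    if tag == "const":
        return [(tau, CONST_TYPES[type(exp[1])])]
    if tag == "fun":
        _, x, body = exp
        alpha, beta = fresh(), fresh()
        s = has_type({**gamma, x: alpha}, body, beta)
        return [(tau, ("->", alpha, beta))] + s
    if tag == "app":
        _, e1, e2 = exp
        alpha = fresh()
        s1 = has_type(gamma, e1, ("->", alpha, tau))
        s2 = has_type(gamma, e2, alpha)
        return s1 + s2
    if tag == "if":
        _, e1, e2, e3 = exp
        s1 = has_type(gamma, e1, "bool")
        s2 = has_type(gamma, e2, tau)
        s3 = has_type(gamma, e3, tau)
        return s1 + s2 + s3
    raise ValueError(f"unknown expression: {exp!r}")
```

Calling has_type({}, ("fun", "x", ("fun", "f", ("app", ("var", "f"), ("var", "x")))), "'a") yields four equations, one per rule application in the derivation of fun x -> fun f -> f x. Solving them is the job of unification.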

Unification Problem
Given a set of pairs of terms (“equations”) {(s1, t1), (s2, t2), …, (sn, tn)} (the unification problem), does there exist a substitution σ (the unification solution) of terms for variables such that σ(si) = σ(ti), for all i = 1, …, n?

Unification Algorithm
• Let S = {(s1, t1), (s2, t2), …, (sn, tn)} be a unification problem.
• Case S = { }: Unif(S) = Identity function (i.e., no substitution)
• Case S = {(s, t)} ∪ S’: Four main steps

Unification Algorithm
• Delete: if s = t (they are the same term) then Unif(S) = Unif(S’)
• Decompose: if s = f(q1, … , qm) and t = f(r1, … , rm) (same f, same m!), then Unif(S) = Unif({(q1, r1), …, (qm, rm)} ∪ S’)
• Orient: if t = x is a variable, and s is not a variable, Unif(S) = Unif({(x, s)} ∪ S’)

Unification Algorithm
• Eliminate: if s = x is a variable, and x does not occur in t (the occurs check), then
  • Let φ = {x ↦ t}
  • Let ψ = Unif(φ(S’))
  • Unif(S) = {x ↦ ψ(t)} o ψ
• Note: {x ↦ a} o {y ↦ b} = {y ↦ ({x ↦ a}(b))} o {x ↦ a} if y not in a
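
The four steps (Delete, Decompose, Orient, Eliminate) can be sketched as a recursive Python function. The representation is an assumption made for illustration: a variable is a string, an application f(q1, …, qm) is the tuple ("f", q1, …, qm), a substitution is a dict from variables to terms, and failure is reported by returning None.

```python
def is_var(t):
    """Variables are plain strings; applications are tuples (assumed encoding)."""
    return isinstance(t, str)


def apply(sub, t):
    """Apply a substitution (dict) to a term."""
    if is_var(t):
        return sub.get(t, t)
    return (t[0],) + tuple(apply(sub, arg) for arg in t[1:])


def occurs(x, t):
    """Occurs check: does variable x appear anywhere in term t?"""
    if is_var(t):
        return x == t
    return any(occurs(x, arg) for arg in t[1:])


def unify(eqs):
    """Return a substitution solving the list of equations, or None if none exists."""
    if not eqs:
        return {}                                  # empty problem: identity
    (s, t), rest = eqs[0], eqs[1:]
    if s == t:                                     # Delete
        return unify(rest)
    if not is_var(s) and not is_var(t):            # Decompose
        if s[0] != t[0] or len(s) != len(t):
            return None                            # different f or different m
        return unify(list(zip(s[1:], t[1:])) + rest)
    if not is_var(s):                              # Orient
        return unify([(t, s)] + rest)
    if occurs(s, t):                               # Eliminate, after occurs check
        return None
    phi = {s: t}
    psi = unify([(apply(phi, a), apply(phi, b)) for a, b in rest])
    if psi is None:
        return None
    return {**psi, s: apply(psi, t)}               # {x -> psi(t)} composed with psi
```

Because x does not occur in t or in φ(S’), the composition {x ↦ ψ(t)} o ψ can be stored as a single dict. For example, unify([(("f", "x"), ("f", ("g", "y", "z"))), (("g", "y", ("f", "y")), "x")]) binds z to f(y) and x to g(y, f(y)).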

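Example
• S = {(f(x), f(g(y,z))), (g(y,f(y)), x)}
• Solved by {x ↦ g(y,f(y))} o {z ↦ f(y)}
• First equation: f(g(y,f(y))) = f(g(y,f(y))), after substituting for x on the left and z on the right
• Second equation: g(y,f(y)) = g(y,f(y)), after substituting for x on the right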

Example of Failure
• S = {(f(x,g(y)), f(h(y),x))}
• Decompose: S -> {(x,h(y)), (g(y),x)}
• Orient: S -> {(x,h(y)), (x,g(y))}
• Substitute: S -> {(h(y), g(y))} with {x ↦ h(y)}
• No rule to apply! Decompose fails because h and g are different function symbols.

Example Regular Expressions
• (0 ∨ 1)*1
  • The set of all strings of 0’s and 1’s ending in 1, {1, 01, 11, …}
• a*b(a*)
  • The set of all strings of a’s and b’s with exactly one b
• ((01) ∨ (10))*
  • You tell me
• Regular expressions (equivalently, regular grammars) are important for lexing, i.e., breaking strings into recognized words
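
The three expressions above can be tried out with Python's standard re module, where choice (∨) is written |. This is only a sketch for experimenting. The names ENDS_IN_ONE, ONE_B, BLOCKS, and matches are made up for this example.

```python
import re

ENDS_IN_ONE = re.compile(r"(0|1)*1")    # (0 ∨ 1)*1
ONE_B = re.compile(r"a*ba*")            # a*b(a*)
BLOCKS = re.compile(r"((01)|(10))*")    # ((01) ∨ (10))*


def matches(pattern, text):
    """True when the whole string, not just a prefix, is in the language."""
    return pattern.fullmatch(text) is not None
```

For instance, matches(ENDS_IN_ONE, "0101") and matches(ONE_B, "aabaa") hold, while matches(ONE_B, "abab") does not because the string contains two b's. Trying strings such as "0110" and "0011" against BLOCKS is one way to work out the third language.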

Example FSA
[Figure: state diagram of a finite state automaton with a marked start state and two final states; transitions are labeled 0 and 1.]

Ocamllex Regular Expression
• Single quoted characters for letters: 'a'
• _ (underscore): matches any character
• eof: special “end of file” marker
• Concatenation same as usual
• "string": concatenation of sequence of characters
• e1 | e2: choice (what was e1 ∨ e2)

Ocamllex Regular Expression
• [c1 - c2]: choice of any character between first and second inclusive, as determined by character codes
• [^c1 - c2]: choice of any character NOT in set
• e*: same as before
• e+: same as e e*
• e?: option (what was e ∨ ε)

Ocamllex Regular Expression • e1 # e2: the characters in e1 but not in e2; e1 and e2 must describe just sets of characters • ident: abbreviation for earlier reg exp in let ident = regexp • e1 as id: binds the result of e1 to id to be used in the associated action

Sample Grammar
• Language: Parenthesized sums of 0’s and 1’s
• <Sum> ::= 0
• <Sum> ::= 1
• <Sum> ::= <Sum> + <Sum>
• <Sum> ::= (<Sum>)

BNF Derivations
• Pick a rule and substitute:
  • Rule: <Sum> ::= <Sum> + <Sum>
  • Step: <Sum> => <Sum> + <Sum>

Example cont.
• 1 * 1 + 0:
  <exp>
    <factor>
      <bin>
        1
      *
      <exp>
        <factor>
          <bin>
            1
        +
        <factor>
          <bin>
            0
• Fringe of tree is string generated by grammar

Example: Ambiguous Grammar
• 0 + 1 + 0 has two parse trees
• Tree 1, grouping (0 + 1) + 0:
  <Sum>
    <Sum>
      <Sum>
        0
      +
      <Sum>
        1
    +
    <Sum>
      0
• Tree 2, grouping 0 + (1 + 0):
  <Sum>
    <Sum>
      0
    +
    <Sum>
      <Sum>
        1
      +
      <Sum>
        0
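
One way to see the ambiguity concretely is to enumerate every parse tree that the sample <Sum> grammar assigns to a string. The sketch below does this by brute force over token spans. The tree encoding (nested tuples) and the function name parse_trees are assumptions made for illustration, and the input is assumed to contain only single-character tokens with no spaces.

```python
from functools import lru_cache


def parse_trees(text):
    """Return every parse tree of text under
    <Sum> ::= 0 | 1 | <Sum> + <Sum> | (<Sum>)."""
    tokens = tuple(text)

    @lru_cache(maxsize=None)
    def trees(i, j):
        # All trees deriving tokens[i:j] from <Sum>.
        span = tokens[i:j]
        result = []
        if len(span) == 1 and span[0] in ("0", "1"):
            result.append(span[0])                       # <Sum> ::= 0 | 1
        if len(span) >= 3 and span[0] == "(" and span[-1] == ")":
            result.extend(("paren", t) for t in trees(i + 1, j - 1))
        for k in range(i + 1, j - 1):                    # <Sum> ::= <Sum> + <Sum>
            if tokens[k] == "+":
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        result.append(("+", left, right))
        return tuple(result)

    return list(trees(0, len(tokens)))
```

parse_trees("0+1+0") returns two trees, ("+", "0", ("+", "1", "0")) and ("+", ("+", "0", "1"), "0"), matching the two groupings of the sum. Adding parentheses, as in "(0+1)+0", leaves a single tree.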

Two Major Sources of Ambiguity
• Lack of determination of operator precedence
• Lack of determination of operator associativity
• Not the only sources of ambiguity

How to Enforce Associativity
• Have at most one recursive call per production
• When two or more recursive calls would be natural, leave the right-most one for right associativity, the left-most one for left associativity

Example • <Sum> ::= 0 | 1 | <Sum> + <Sum> | (<Sum>) • Becomes • <Sum> ::= <Num> | <Num> + <Sum> • <Num> ::= 0 | 1 | (<Sum>)
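
Because the rewritten grammar has only one recursive call per production, and that call is the right-most one, it maps directly onto a recursive-descent parser that builds right-associative trees. The following is a minimal sketch. The function names, the tuple encoding of trees, and the single-character tokenizer are assumptions made for this example.

```python
def parse_sum(tokens, pos=0):
    """<Sum> ::= <Num> | <Num> + <Sum>"""
    left, pos = parse_num(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_sum(tokens, pos + 1)   # the one recursive call
        return ("+", left, right), pos
    return left, pos


def parse_num(tokens, pos):
    """<Num> ::= 0 | 1 | (<Sum>)"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok in ("0", "1"):
        return tok, pos + 1
    if tok == "(":
        inner, pos = parse_sum(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return inner, pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    """Parse a whole string; every non-space character is one token."""
    tokens = [c for c in text if not c.isspace()]
    tree, pos = parse_sum(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree
```

parse("0 + 1 + 0") returns ("+", "0", ("+", "1", "0")), the right-associative grouping, and it is the only tree this grammar allows. Left grouping now has to be requested explicitly, as in parse("(0 + 1) + 0").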

Operator Precedence • Operators of highest precedence evaluated first (bind more tightly). • Precedence for infix binary operators given in following table • Needs to be reflected in grammar

Precedence in Grammar
• Higher precedence translates to longer derivation chain
• Example:
  <exp> ::= <id> | <exp> + <exp> | <exp> * <exp>
• Becomes
  <exp> ::= <mult_exp> | <exp> + <mult_exp>
  <mult_exp> ::= <id> | <mult_exp> * <id>

Problems for Recursive-Descent Parsing
• Left Recursion: A ::= Aw translates to a subroutine that loops forever
• Indirect Left Recursion: A ::= Bw together with B ::= Av causes the same problem

Problems for Recursive-Descent Parsing
• Parser must always be able to choose the next action based only on the very next token
• Pairwise Disjointedness Test: Can we always determine which rule (in the non-extended BNF) to choose based on just the first token?

Pairwise Disjointedness Test
• For each rule A ::= y, calculate FIRST(y) = {a | y =>* aw} ∪ {ε | y =>* ε}
• For each pair of rules A ::= y and A ::= z, require FIRST(y) ∩ FIRST(z) = { }
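
The test can be sketched in Python by computing FIRST sets with a fixed-point iteration and then intersecting them rule by rule. The grammar encoding is an assumption for illustration: a dict maps each non-terminal to a list of right-hand sides, each right-hand side is a list of symbols, the empty list stands for ε, and any symbol that is not a key of the dict is treated as a terminal.

```python
EPS = "ε"


def first_of(symbols, grammar, first):
    """FIRST of a symbol sequence, given current FIRST sets of the non-terminals."""
    result = set()
    for sym in symbols:
        if sym not in grammar:            # terminal: it is the first token
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:         # sym cannot vanish, so stop here
            return result
    result.add(EPS)                       # every symbol can derive ε
    return result


def first_sets(grammar):
    """Compute FIRST for every non-terminal by iterating until nothing changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                new = first_of(prod, grammar, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def pairwise_disjoint(grammar):
    """True when no two rules for the same non-terminal share a FIRST element."""
    first = first_sets(grammar)
    for prods in grammar.values():
        sets = [first_of(p, grammar, first) for p in prods]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:
                    return False
    return True
```

As a usage example, a grammar such as {"expr": [["term"], ["term", "+", "expr"]], "term": [["n"]]} fails the test because both <expr> rules start with n, whereas {"expr": [["term", "e"]], "e": [["+", "term", "e"], []], "term": [["n"]]} passes.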

Factoring Grammar
• Test too strong: Can’t handle <expr> ::= <term> [ ( + | - ) <expr> ]
• Answer: Add new non-terminal and replace above rules by
  <expr> ::= <term><e>
  <e> ::= + <term><e>
  <e> ::= ε
• You are delaying the decision point

Example
• Both <A> and <B> have problems:
  <S> ::= <A> a <B> b
  <A> ::= <A> b | b
  <B> ::= a <B> | a
• Transform grammar to:
  <S> ::= <A> a <B> b
  <A> ::= b<A1>
  <A1> ::= b<A1> | ε
  <B> ::= a<B1>
  <B1> ::= a<B1> | ε
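
The rewrite applied to <A> follows a general pattern for immediate left recursion: rules of the form A ::= A α | β become A ::= β A1 and A1 ::= α A1 | ε. A minimal Python sketch of that pattern follows. It assumes right-hand sides are lists of symbols with the empty list standing for ε, that appending "1" to the non-terminal's name gives an unused name, and it handles neither indirect left recursion nor the common-prefix factoring applied to <B>.

```python
def remove_immediate_left_recursion(nt, prods):
    """Rewrite  nt ::= nt α | β  as  nt ::= β nt1  and  nt1 ::= α nt1 | ε."""
    recursive = [p[1:] for p in prods if p and p[0] == nt]    # the α parts
    others = [p for p in prods if not p or p[0] != nt]        # the β parts
    if not recursive:
        return {nt: prods}                  # nothing to do
    new_nt = nt + "1"                       # assumed to be a fresh name
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }
```

For the rules <A> ::= <A> b | b, encoded as remove_immediate_left_recursion("A", [["A", "b"], ["b"]]), the result is {"A": [["b", "A1"]], "A1": [["b", "A1"], []]}, which corresponds to <A> ::= b<A1> and <A1> ::= b<A1> | ε.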
