410 likes | 546 Views
This document explores the balance between purely functional programming and the need for mutable state, particularly within OCaml. It emphasizes that while functional programming simplifies reasoning and guarantees data persistence, real-world applications often require side effects and mutable structures for efficiency. Examples such as hash-tables illustrate the trade-offs involved. The document also discusses the implications of mutable references, the challenges of reasoning about mutable state, and offers practical considerations for implementing imperative constructs within a functional paradigm.
E N D
CS51: References and Mutation Greg Morrisett
Thus far… • We have considered the (almost) purely functional subset of Ocaml. • We’ve had a few side effects: printing & raising exceptions. • Two reasons for this emphasis: • Reasoning about functional code is easier. • Both formal reasoning (a la the substitution model) and informal reasoning. • In part, because anything you can prove stays true. • e.g., 3 is a member of set S. • This is because the data structures are persistent. • They don’t change – we build new ones and let the garbage collector reclaim the unused old ones. • To convince you that you don’t need side effects for many things where you previously thought you did. • e.g., there’s no need for a loop to have a mutable counter that we update each time. • You can implement functional data structures like 2-3 trees with reasonable space and time.
But alas… • Purely functional code is pointless. • The whole reason we write code is to have some effect on the world. • For example, the Ocaml top-level loop prints out your result. • Without that printing (a side effect), how would you know that your functions computed the right thing? • Some algorithms or data structures need mutable state. • Example: hash-tables have (essentially) constant-time access and update. • The best functional dictionaries have either: • logarithmic access & update • constant access & linear update • constant update & linear access • Don’t forget that we give up something for this: we can’t go back and look at previous versions of the dictionary. We can do this in a functional setting. • Another Example: Robinson’s unification algorithm • A critical part of the Ocaml type-inference engine. • Also used in other kinds of program analyses.
Still: • Reasoning about mutable state (and side effects more generally) is much harder. • Types don’t help much. • These effects don’t show up in interfaces. • If I insert x into S, call a function f, and then ask member(x,S) will it evaluate to true? • I need to go off and look at f to determine if it modifies S. • Worse, in a concurrent setting, I need to look at every other function that any other thread may be executing to see if it modifies S. • Sharing mutable state is fraught with peril. • The “aliasing” problem. • Moral: use mutable data structures only where necessary. • This will also be true when you use Java or C/C++ or Python or … • It’s harder to respect this abstraction boundary in non-functional languages. • But like all abstraction boundaries, it pays dividends down the road.
References • New type: t ref • Think of it as a pointer to a box that holds a t value. • The pointer can be shared. • The contents of the box can be read or written. • To create a fresh box: ref 42 • allocates a new box, initializes its contents to 42, and returns a pointer to that box. • so similar to: int *r = malloc(sizeofint) ; *r = 42 ; return r; • To read the contents: !r • if r points to a box containing 42, then returns 42. • similar to *r in C/C++ • To write the contents: r := 42 • updates the box that r points to so that it contains 42. • similar to *r = 42 in C/C++
Example let c = ref 0 ;; let x = !c ;; (* x will be 0 *) c := 42 ;; let y = !c ;; (* y will be 42. x will still be 0! *)
Another Example let c = ref 0 ;; let next() = let v = !cin (c := v+1 ; v)
Another Example A semicolon is used to sequence expressions. let c = ref 0 ;; let next() = let v = !cin (c := v+1 ; v)
Another Example So this updates c and then returns v. let c = ref 0 ;; let next() = let v = !cin (c := v+1 ; v)
Equivalent to: let c = ref 0 ;; let next() = let v = !cin let _ = c := v+1 in v
Equivalent to: let c = ref 0 ;; let next() = let v = !cin let dummy : unit = c := v+1 in v
Another Example let c = ref 0 ;; let next() = let v = !cin (c := v+1 ; v) next() ;; (* returns 0 *) next() ;; (* returns 1 *) (next()) + (next()) ;; (* returns ??? *)
Aliasing let c = ref 0 ;; let x = c ;; c := 42 ;; !x ;;
Aliasing 0 let c = ref 0 ;; let x = c ;; c := 42 ;; !x ;;
Aliasing 0 let c = ref 0 ;; let x = c ;; c := 42 ;; !x ;;
Aliasing 42 let c = ref 0 ;; let x = c ;; c := 42 ;; !x ;;
Aliasing 42 let c = ref 0 ;; let x = c ;; c := 42 ;; !x ;; (* evaluates to 42 *)
Imperative Stacks module type IMP_STACK = sig type ‘a stack valempty : unit -> ‘a stack valpush : ‘a -> ‘a stack -> unit valpop : ‘a stack -> ‘a option end
Imperative Stacks When you see “unit” as the return type, you know the function is being executed for its side effects. (Like void in C/C++/Java.) module type IMP_STACK = sig type ‘a stack valempty : unit -> ‘a stack valpush : ‘a -> ‘a stack -> unit valpop : ‘a stack -> ‘a option end
Imperative Stacks Unfortunately, we can’t always tell from the type that there are side-effects going on. It’s a good idea to document them explicitly. module type IMP_STACK = sig type ‘a stack valempty : unit -> ‘a stack valpush : ‘a -> ‘a stack -> unit valpop : ‘a stack -> ‘a option end
Imperative Stacks module ImpStack : IMP_STACK = struct type ‘a stack = (‘a list) ref let empty() : ‘a stack = ref [] let push(x:’a)(s:’a stack) : unit = s := x::(!s) let pop(s:’a stack) : ‘a option = match !swith | [] -> None | h::t -> (s := t ; Some h) end
Imperative Queues module type IMP_QUEUE = sig type ‘a queue valempty : unit -> ‘a queue valenq : ‘a -> ‘a queue -> unit valdeq : ‘a queue -> ‘a option end
Imperative Queues module ImpQueue : IMP_QUEUE = struct type ‘a queue = {front:‘a list ref ; rear :’a list ref} let empty() = {front=ref[] ; rear=ref[]} let enqxq = q.rear := x::(!(q.rear)) let deqq = match !(q.front) with | h::t -> (q.front := t; Some h) | [] -> (match rev (!(q.rear)) with | h::t -> (q.front := t; Some h) | [] -> None) end
Alternative? module ImpQueue2 : IMP_QUEUE = struct type ‘a mlist = Nil | Cons of ‘a * ((‘a mlist) ref) type ‘a queue = {front:’amlist ref ; rear :’a mlist ref} let empty() = {front=ref Nil ; rear=ref Nil} let enqxq = match !q.rearwith | Cons(h,t) -> (t := Cons(x,ref Nil) ; q.rear := !t) | Nil -> (q.front := Cons(x,ref Nil); q.rear:= !(q.front)) let deqq = match !q.frontwith | Cons(h,t) -> (q.front := !t ; Some h) | Nil -> None end
Better module ImpQueue2 : IMP_QUEUE = struct type ‘a mlist = Nil | Cons of ‘a * ((‘a mlist) ref) type ‘a queue = {front:’amlist ref ; rear :’a mlist ref} let empty() = {front=ref Nil ; rear=ref Nil} let enqxq = match !q.rearwith | Cons(h,t) -> (assert (!t = Nil); t := Cons(x,ref Nil) ; q.rear := !t) | Nil -> (assert (!(q.front) = Nil); q.front := Cons(x,ref Nil); q.rear = !(q.front)) let deqq = match !q.frontwith | Cons(h,t) -> (q.front := t ; (match !twith Nil -> q.rear := Nil | Cons(_,_) -> ()) ; Some h) | Nil -> None end
Mutable Lists type ‘a mlist = Nil | Cons of ‘a * ((‘a mlist) ref) let reclength(m:’amlist) : int = match mwith | Nil -> 0 | Cons(h,t) -> 1 + length(!t)
Fraught with Peril type ‘a mlist = Nil | Cons of ‘a * ((‘a mlist) ref) let recmlength(m:’amlist) : int = match mwith | Nil -> 0 | Cons(h,t) -> 1 + length(!t) let r = ref Nil ;; let m = Cons(3,r) ;; r := m ;; mlengthm ;;
Another Example: let recmappendxsys = match xswith | Nil -> () | Cons(h,t) -> (match !twith | Nil -> t := ys | Cons(_,_) asm -> mappendmys)
Mutable Append Example: 1 2 3 4 5 6 let recmappendxsys = match xswith | Nil -> () | Cons(h,t) -> (match !twith | Nil -> t := ys | Cons(_,_) asm -> mappendmys) ;; let xs = Cons(1,ref (Cons 2, ref (Cons 3, ref Nil))) ;; let ys = Cons(4,ref (Cons 5, ref (Cons 6, ref Nil))) ;; mappendxsys ;;
Mutable Append Example: xs 1 2 3 ys 4 5 6 let recmappendxsys = match xswith | Nil -> () | Cons(h,t) -> (match !twith | Nil -> t := ys | Cons(_,_) asm -> mappendmys) ;; let as = Cons(1,ref (Cons 2, ref (Cons 3, ref Nil))) ;; let bs = Cons(4,ref (Cons 5, ref (Cons 6, ref Nil))) ;; mappend as bs ;;
Mutable Append Example: xs 1 2 3 ys 4 5 6 let recmappendxsys = match xswith | Nil -> () | Cons(h,t) -> (match !twith | Nil -> t := ys | Cons(_,_) asm -> mappendmys) ;; let as = Cons(1,ref (Cons 2, ref (Cons 3, ref Nil))) ;; let bs = Cons(4,ref (Cons 5, ref (Cons 6, ref Nil))) ;; mappend as bs ;;
Mutable Append Example: xs 1 2 3 ys 4 5 6 let recmappendxsys = match xswith | Nil -> () | Cons(h,t) -> (match !twith | Nil -> t := ys | Cons(_,_) asm -> mappendmys) ;; let as = Cons(1,ref (Cons 2, ref (Cons 3, ref Nil))) ;; let bs = Cons(4,ref (Cons 5, ref (Cons 6, ref Nil))) ;; mappend as bs ;;
Mutable Append Example: xs 1 2 3 ys 4 5 6 let recmappendxsys = match xswith | Nil -> () | Cons(h,t) -> (match !twith | Nil -> t := ys | Cons(_,_) asm -> mappendmys) ;; let as = Cons(1,ref (Cons 2, ref (Cons 3, ref Nil))) ;; let bs = Cons(4,ref (Cons 5, ref (Cons 6, ref Nil))) ;; mappend as bs ;;
Another Example: let recmappendxsys = match xswith | Nil -> () | Cons(h,t) -> (match !twith | Nil -> t := y | Cons(_,_) asm -> mappendmys) let m = Cons(1,ref Nil);; mappendmm ;; mlengthm ;;
Mutable Append Example: xs ys 1 let recmappendxsys = match xswith | Nil -> () | Cons(h,t) -> (match !twith | Nil -> t := ys | Cons(_,_) asm -> mappendmys) ;; let m = Cons(1,ref Nil);; mappendmm ;;
Mutable Append Example: xs ys 1 let recmappendxsys = match xswith | Nil -> () | Cons(h,t) -> (match !twith | Nil -> t := ys | Cons(_,_) asm -> mappendmys) ;; let m = Cons(1,ref Nil);; mappendmm ;;
Morals • Mutable data structures can lead to asymptotic efficiency improvements. • e.g., our example queues with worst-case constant-time enq/deq • But they are much harder to get right. • mostly because we must think about aliasing. • updating in one place may actually have a big effect on other places. • writing and enforcing invariants becomes more important. • e.g., assertions we used in the queue example • cycles in data structures aren’t a problem until we introduce refs. • must write operations much more carefully to avoid looping • we haven’t even gotten to the multi-threaded part. • So use refs when you must, but try hard to avoid it.
Is it possible to avoid all state? • Yes! (in single-threaded programs) • We can pass in the old values of references and return their new values. • Consider the difference between our functional stacks and our imperative ones: • fnl_push : ‘a -> ‘a stack -> ‘a stack • imp_push : ‘a -> ‘a stack -> unit • In general, we can pass in to and out of each function a dictionary that records the current values of references. • But then accessing or updating a reference takes O(lgn) time. • In a world where we do updates in place, we get O(1) time. • (actually only true if you think memory access has constant time.) • In practice, doesn’t solve the problems of aliasing and local reasoning.
How/when to use state? • In general, I try to write the functional version first. • e.g., prototype • don’t have to worry about sharing and updates • don’t have to worry about race conditions • replacement is easy (substitution model!) • Sometimes you find you can’t afford it, for big-O reasons. • example: routing tables need to be fast in a switch • constant time lookup, update (hash-table) • When I do use state, I try to encapsulate it behind an interface. • must think explicitly about sharing and invariants • write these down, write assertions to test them • if encapsulated in a module, these tests can be localized
Arrays and Mutable Fields • In addition to refs, Ocaml provides arrays and mutable fields within records. • in fact a T ref is just a {mutable contents : T} record. • For arrays, we have: • A.(i) to read the ith element of the array A • A.(i) <- 42 to write the ith element of the array A • Array.make : int -> ‘a -> ‘a array • Array.make 42 ‘x’ creates an array of length 42 with all elements initialized to the character ‘x’. • See the reference manual for more operations.