Parametric Polymorphism for Popular Programming Languages

Andrew Kennedy, Microsoft Research Cambridge

Or: Forall for all
Andrew Kennedy, Microsoft Research Cambridge (joint work with Don Syme)

Curriculum Vitae for FOOLs http://research.microsoft.com/~akenn

Parametric polymorphism
Parameterize types and code by types
• Concept: Strachey (1967)
• Language: ML (Milner, 1975), Clu (Liskov, 1975)
• Foundations: System F (Girard, 1971), Polymorphic lambda calculus (Reynolds, 1974)
• Engineering benefits are well-known (code re-use & strong typing)
• Implementation techniques are well-researched

Polymorphic Programming Languages
Standard ML, Eiffel, O’Caml, C++, Ada, Clu, GJ, Haskell, Mercury, Miranda, Pizza

Widely-used Polymorphic Programming Languages
C++

Widely-used Strongly-typed Polymorphic Programming Languages

In 2004?
C#
Visual Basic?
Java
Cobol, Fortran, …?

This talk
The .NET “generics” project:
• What was challenging?
• What was surprising?
• What’s left?

What is the .NET CLR (Common Language Runtime)?
• For our purposes: the CLR
  • Executes MS-IL (Intermediate Language) programs using just-in-time or way-ahead-of-time compilation
  • Provides an object-oriented common type system
  • Provides managed services: garbage collection, stack-walking, reflection, persistence, remote objects
  • Ensures security through type-checking (verification) and code access security (permissions + stack inspection)
  • Supports multiple source languages and interop between them

Themes
• Design: Can multiple languages be accommodated by a single design? What were the design trade-offs?
• Implementation: How can run-time types be implemented efficiently?
• Theory: How expressive is it?
• Practice: Would you like to program in it?
• Future: Have we done enough?

Timeline of generics project
May 1999    Don Syme presents proposal to C# and CLR teams
Feb 2000    Initial prototype of extension to CLR
Feb 2001    Product release of .NET v1.0
Jan 2002    Our code is integrated into the product team’s code base
Nov 2002    Anders Hejlsberg announces generics at OOPSLA’02
late 2004?  Product release of .NET v1.2 with generics

Design

Design for multiple languages
C++: Give me template specialization
C++: Can I write class C<T> : T
C#: Just give me decent collection classes
C++: And template meta-programming
Visual Basic: Don’t touch my language!
Java: Run-time types please
Eiffel: All generic types covariant please
ML: Functors are cool!
Haskell: Rank-n types? Existentials? Kinds? Type classes?
Scheme: Why should I care?

Some design goals
• Simplicity: Don’t surprise the programmer with odd restrictions
• Consistency: Fit with the object model of .NET
• Separate compilation: Type-check once, instantiate anywhere

Non-goals
• C++ style template meta-programming: Leave this to source-language compilers
• Higher-order polymorphism, existentials: Hey, let’s get the basics right first!

What’s in the design?
• Type parameterization for all declarations
  • classes, e.g. class Set<T>
  • interfaces, e.g. interface IComparable<T>
  • structs, e.g. struct HashBucket<K,D>
  • methods, e.g. static void Reverse<T>(T[] arr)
  • delegates (“first-class methods”), e.g. delegate void Action<T>(T arg)

What’s in the design (2)?
• Bounds on type parameters
  • single class bound (“must extend”), e.g. class Grid<T> where T : Control
  • multiple interface bounds (“must implement”), e.g. class Set<T> where T : IComparable<T>

Simplicity => no odd restrictions
    interface IComparable<T> { int CompareTo(T other); }
    class Set<T> : IEnumerable<T> where T : IComparable<T>
    {
        private TreeNode<T> root;
        public static Set<T> empty = new Set<T>();
        public void Add(T x) { … }
        public bool HasMember(T x) { … }
    }
    Set<Set<int>> s = new Set<Set<int>>();
• Interfaces and superclass can be instantiated
• Bounds can reference type parameter (“F-bounded polymorphism”)
• Even statics can use type parameter
• Type arguments can be value or reference types

Consistency => preserve types at run-time
• Type-safe serialization:
    Object obj = formatter.Deserialize(file);
    LinkedList<int> list = (LinkedList<int>) obj;
• Interop with legacy code:
    // Just wrap existing Stack until we get round to re-implementing it
    class GStack<T> {
        Stack st;
        public void Push(T x) { st.Push(x); }
        public T Pop() { return (T) st.Pop(); }
    …
• Reflection:
    object obj; …
    Type ty = obj.GetType().GetGenericArguments()[0];
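The serialization and reflection points above can be tried out with a small sketch. It assumes the shipped System.Collections.Generic.LinkedList<T> as a stand-in for the slide's LinkedList<int>; the class RuntimeTypeDemo and its two helper methods are hypothetical names introduced only for illustration, and no real formatter is involved.

```csharp
using System;
using System.Collections.Generic;

static class RuntimeTypeDemo
{
    // First type argument of obj's run-time type, or null when that type
    // is not an instantiated generic type.
    public static Type FirstTypeArgument(object obj)
    {
        Type ty = obj.GetType();
        return ty.IsGenericType ? ty.GetGenericArguments()[0] : null;
    }

    // True when the checked cast to LinkedList<int> succeeds.
    public static bool CastsToIntList(object obj)
    {
        try { var list = (LinkedList<int>)obj; return list != null; }
        catch (InvalidCastException) { return false; }
    }

    static void Main()
    {
        object a = new LinkedList<int>();     // stands in for a deserialized object
        object b = new LinkedList<string>();

        Console.WriteLine(FirstTypeArgument(a));  // System.Int32
        Console.WriteLine(FirstTypeArgument(b));  // System.String
        Console.WriteLine(CastsToIntList(a));     // True
        Console.WriteLine(CastsToIntList(b));     // False
    }
}
```

Because the instantiation is part of the object's run-time type, the cast distinguishes LinkedList<int> from LinkedList<string>; under a type-erasing design both casts would succeed.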

Separate compilation => restrict generic definitions
• No dispatch through a type parameter
    class C<T> { void meth() { T.othermeth(); } }  // don’t know what’s in T
• No inheritance from a type parameter
    class Weird<T> : T { … }  // don’t know what’s in T
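For contrast with the rejected definitions above, here is a minimal sketch of what "type-check once, instantiate anywhere" permits: members reached through a declared bound. BoundsDemo and Max are hypothetical names; IComparable<T> is the standard interface.

```csharp
using System;

static class BoundsDemo
{
    // The bound tells the compiler that every T has CompareTo(T), so the
    // call below is checked once, at the definition, for all instantiations.
    public static T Max<T>(T a, T b) where T : IComparable<T>
    {
        return a.CompareTo(b) >= 0 ? a : b;
    }

    static void Main()
    {
        Console.WriteLine(Max(3, 7));             // 7
        Console.WriteLine(Max("pear", "apple"));  // pear
    }
}
```

Without the where clause, a.CompareTo(b) would be rejected at the definition, for the same reason T.othermeth() is: nothing is known about T.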

Implementation

Compiling polymorphism, as was
Two main techniques:
• Specialize code for each instantiation
  • C++ templates, MLton & SML.NET monomorphization
  • Advantage: good performance
  • Drawback: code bloat
• Share code for all instantiations
  • Either use a single representation for all types (ML, Haskell)
  • Or restrict instantiations to “pointer” types (Java)
  • Advantage: no code bloat
  • Drawback: poor performance (extra boxing operations required on primitive values)
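The boxing cost of the shared, single-representation approach can be illustrated with a rough sketch. It uses the non-generic ArrayList as a stand-in for a collection with one uniform object representation and List<int> for a specialized instantiation; BoxingDemo and its methods are hypothetical names, and no timing claims are made.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class BoxingDemo
{
    // Uniform representation: every element is an object reference.
    public static int SumBoxed(ArrayList xs)
    {
        int total = 0;
        foreach (object o in xs) total += (int)o;  // unbox each element
        return total;
    }

    // Specialized representation: elements are stored as raw ints.
    public static int SumGeneric(List<int> xs)
    {
        int total = 0;
        foreach (int x in xs) total += x;          // no boxing or unboxing
        return total;
    }

    static void Main()
    {
        var boxed = new ArrayList { 1, 2, 3 };     // each int is boxed on Add
        var direct = new List<int> { 1, 2, 3 };
        Console.WriteLine(SumBoxed(boxed));        // 6
        Console.WriteLine(SumGeneric(direct));     // 6
    }
}
```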

Compiling polymorphism in the Common Language Runtime
• Polymorphism is built in to the intermediate language (IL) and the execution engine
• CLR performs “just-in-time” type specialization
• Code sharing avoids bloat
• Performance is (almost) as good as hand-specialized code

Code sharing
• Rule: share field layout and code if type arguments have the same representation
• Examples:
  • Representation and code for methods in Set<string> can also be used for Set<object> (string and object are both 32-bit pointers)
  • Representation and code for Set<long> are different from those for Set<int> (int uses 32 bits, long uses 64 bits)

Exact run-time types
• We want to support
    if (x is Set<string>) { ... }
    else if (x is Set<Component>) { ... }
• But representation and code are shared between compatible instantiations, e.g. Set<string> and Set<Component>
• So there’s a conflict to resolve…
• …and we don’t want to add lots of overhead to languages that don’t use run-time types (ML, Haskell)
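A runnable version of the type test above, as a sketch: it substitutes the library type HashSet<T> for the slide's Set<T> and Uri for Component, both purely as assumptions for illustration. ExactTypeDemo and Describe are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class ExactTypeDemo
{
    // Both tested instantiations are at reference types, so by the sharing
    // rule stated earlier they are candidates for shared code, yet the
    // tests still tell them apart.
    public static string Describe(object x)
    {
        if (x is HashSet<string>) return "set of strings";
        else if (x is HashSet<Uri>) return "set of URIs";
        else return "something else";
    }

    static void Main()
    {
        Console.WriteLine(Describe(new HashSet<string>()));  // set of strings
        Console.WriteLine(Describe(new HashSet<Uri>()));     // set of URIs
        Console.WriteLine(Describe(new HashSet<int>()));     // something else
    }
}
```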

Object representation in the CLR
• Normal object representation: [ vtable ptr | fields ]; type = vtable pointer
• Array representation: [ vtable ptr | element type | no. of elements | elements ]; type is inside object

Object representation for generics
• Array-style: store the instantiation directly in the object?
  • extra word (possibly more for multi-parameter types) per object instance
  • e.g. every list cell in ML or Haskell would use an extra word
• Alternative: make vtable copies, store instantiation info in the vtable
  • extra space (vtable size) per type instantiation
  • expect no. of instantiations << no. of objects
  • so we chose this option

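Object representation for generics
[Diagram: x : Set<string> and y : Set<object> each consist of a vtable ptr followed by fields. Each object points to its own vtable, whose Add, HasMember and ToArray slots point to the same shared code for Add, HasMember and ToArray. The vtable for x additionally records the instantiation string; the vtable for y records object.]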

Type parameters in shared code
• Run-time types with embedded type parameters, e.g.
    class TreeSet<T> { void Add(T item) { ..new TreeNode<T>(..).. } }
Q: Where do we get T from if the code for Add is shared?
A: It’s always obtainable from instantiation info in this object
Q: How do we look up type rep for TreeNode<T> efficiently at run-time?
A: We keep a “dictionary” of such type reps in the vtable for TreeSet<T>

Dictionaries in action
    class Set<T> {
        …
        public void Add(T x) { … …new TreeNode<T>()… }
        public T[] ToArray() { … …new T[]… }
    }
    Set<string> s = new Set<string>();
    s.Add("a");
    Set<Set<string>> ss = new Set<Set<string>>();
    ss.Add(s);
    Set<string>[] ssa = ss.ToArray();
    string[] sa = s.ToArray();

Dictionaries in action (vtable contents after each statement runs)
• After Set<string> s = new Set<string>(): vtable for Set<string> holds …vtable slots…, string
• After s.Add("a"): vtable for Set<string> holds …vtable slots…, string, TreeNode<string>
• After Set<Set<string>> ss = new Set<Set<string>>(): vtable for Set<Set<string>> holds …vtable slots…, Set<string>
• After ss.Add(s): vtable for Set<Set<string>> holds …vtable slots…, Set<string>, TreeNode<Set<string>>
• After Set<string>[] ssa = ss.ToArray(): vtable for Set<Set<string>> holds …vtable slots…, Set<string>, TreeNode<Set<string>>, Set<string>[]
• After string[] sa = s.ToArray(): vtable for Set<string> holds …vtable slots…, string, TreeNode<string>, string[]

x86 code for new TreeNode<T>
    ; Retrieve dictionary entry from vtable
    mov  ESI, dword ptr [EDI]
    mov  EAX, dword ptr [ESI+24]
    mov  EAX, dword ptr [EAX]
    add  EAX, 4
    mov  dword ptr [EBP-0CH], EAX
    mov  EAX, dword ptr [EBP-0CH]
    mov  EBX, dword ptr [EAX]
    ; If non-null then skip
    test EBX, EBX
    jne  SHORT G_M003_IG06
G_M003_IG05:
    ; Look up handle the slow way
    push dword ptr [EBP-0CH]
    push ESI
    mov  EDX, 0x1b000002
    mov  ECX, 0x903ea0
    call @RuntimeHandle
    jmp  SHORT G_M003_IG07
G_M003_IG06:
    mov  EAX, EBX
G_M003_IG07:
    ; Create the object with run-time type
    mov  ECX, EAX
    call @newClassSmall
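The fast-path/slow-path shape of the listing above (load a slot, test for null, fall back to a slow lookup that fills the slot) can be modelled in C# itself. This is only a hypothetical model: TypeDictionary, its slot numbering and the SlowLookups counter are invented for illustration and are not the CLR's data structures. Reflection's MakeGenericType plays the role of the slow lookup, and LinkedListNode<> stands in for TreeNode<>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a per-instantiation dictionary of type handles.
sealed class TypeDictionary
{
    private readonly Type typeArg;   // instantiation info, e.g. string
    private readonly Type[] slots;   // entries start out null, filled lazily

    public int SlowLookups { get; private set; }

    public TypeDictionary(Type typeArg, int slotCount)
    {
        this.typeArg = typeArg;
        this.slots = new Type[slotCount];
    }

    // openType is an open generic type such as typeof(LinkedListNode<>).
    public Type Lookup(int slot, Type openType)
    {
        Type cached = slots[slot];
        if (cached != null) return cached;           // fast path: entry present

        SlowLookups++;                               // slow path: build the type
        cached = openType.MakeGenericType(typeArg);
        slots[slot] = cached;                        // remember it for next time
        return cached;
    }
}

static class DictionaryDemo
{
    static void Main()
    {
        var dictForString = new TypeDictionary(typeof(string), 2);
        Type first = dictForString.Lookup(0, typeof(LinkedListNode<>));
        Type second = dictForString.Lookup(0, typeof(LinkedListNode<>));

        Console.WriteLine(first == typeof(LinkedListNode<string>));  // True
        Console.WriteLine(ReferenceEquals(first, second));           // True
        Console.WriteLine(dictForString.SlowLookups);                // 1
    }
}
```

Only the first use of a slot pays for the slow lookup; later uses cost a load and a null test, which is the property the lazy scheme relies on.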

Is it worth it?
• With no dictionaries, just run-time look-up:
  • new Set<T>() is 10x to 100x slower than normal object creation
• With lazy dictionary look-up:
  • new Set<T>() is ~10% slower than normal object creation

Shared code for polymorphic methods
• Polymorphic methods
  • Specialize per instantiation on demand
  • Again share code between instantiations where possible
  • Run-time types issue solved by “dictionary-passing” style

Performance
• Non-generic quicksort:
    void Quicksort(object[] arr, IComparer comp)
• Generic quicksort:
    void GQuicksort<T>(T[] arr, GIComparer<T> comp)
• Compare on element types int, string, double
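To make the two signatures concrete, here is a sketch of what such a benchmark pair might look like. The bodies are an ordinary middle-pivot quicksort written for this illustration, not the code that was measured, and the standard IComparer<T> is assumed in place of the slide's GIComparer<T>. SortDemo is a hypothetical name.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class SortDemo
{
    // Non-generic version: elements are objects, so value types are boxed.
    public static void Quicksort(object[] arr, IComparer comp)
    {
        Sort(arr, 0, arr.Length - 1, comp);
    }

    static void Sort(object[] a, int lo, int hi, IComparer comp)
    {
        if (lo >= hi) return;
        object pivot = a[lo + (hi - lo) / 2];
        int i = lo, j = hi;
        while (i <= j)
        {
            while (comp.Compare(a[i], pivot) < 0) i++;
            while (comp.Compare(a[j], pivot) > 0) j--;
            if (i <= j) { object t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
        }
        Sort(a, lo, j, comp);
        Sort(a, i, hi, comp);
    }

    // Generic version: same algorithm, elements keep their own representation.
    public static void GQuicksort<T>(T[] arr, IComparer<T> comp)
    {
        GSort(arr, 0, arr.Length - 1, comp);
    }

    static void GSort<T>(T[] a, int lo, int hi, IComparer<T> comp)
    {
        if (lo >= hi) return;
        T pivot = a[lo + (hi - lo) / 2];
        int i = lo, j = hi;
        while (i <= j)
        {
            while (comp.Compare(a[i], pivot) < 0) i++;
            while (comp.Compare(a[j], pivot) > 0) j--;
            if (i <= j) { T t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
        }
        GSort(a, lo, j, comp);
        GSort(a, i, hi, comp);
    }

    static void Main()
    {
        object[] boxed = { 3, 1, 2 };          // ints boxed into object[]
        int[] ints = { 3, 1, 2 };
        Quicksort(boxed, Comparer.Default);
        GQuicksort(ints, Comparer<int>.Default);
        Console.WriteLine(string.Join(",", boxed));  // 1,2,3
        Console.WriteLine(string.Join(",", ints));   // 1,2,3
    }
}
```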

Performance
[Figure: results for the quicksort comparison; not recoverable from the extracted text]

Theory

Transposing F to C#
• As musical keys, F and C♯ are far apart
• As programming languages, (System) F and (Generic) C♯ are far apart
• But: Polymorphism in Generic C♯ is as expressive as polymorphism in System F

System F and C♯

System F into C♯
• Despite the differences, we can formalize a translation from System F into (Generic) C♯ that
  • is fully type-preserving (no loss of information)
  • is sound (preserves program behaviour)
  • makes crucial use of the fact that polymorphic virtual methods express first-class polymorphism
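One way to get a feel for the last point is to hand-encode a single System F type. The sketch below represents the Church-numeral type ∀X. (X → X) → X → X as an interface with one generic method. It is an illustration in the spirit of the translation, not the formal translation itself, and all names (IChurch, Zero, Succ, ChurchDemo) are hypothetical.

```csharp
using System;

// Models the System F type  ∀X. (X → X) → X → X.
// The quantifier becomes a generic method on an ordinary interface, so a
// value of this type can be stored, passed around and instantiated later.
interface IChurch
{
    X Apply<X>(Func<X, X> s, X z);
}

sealed class Zero : IChurch
{
    public X Apply<X>(Func<X, X> s, X z) { return z; }
}

sealed class Succ : IChurch
{
    private readonly IChurch pred;
    public Succ(IChurch pred) { this.pred = pred; }
    public X Apply<X>(Func<X, X> s, X z) { return s(pred.Apply(s, z)); }
}

static class ChurchDemo
{
    public static IChurch Two() { return new Succ(new Succ(new Zero())); }

    static void Main()
    {
        IChurch two = Two();
        // The same first-class value is used at two type instantiations.
        Console.WriteLine(two.Apply<int>(n => n + 1, 0));          // 2
        Console.WriteLine(two.Apply<string>(s => s + "!", "hi"));  // hi!!
    }
}
```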

Polymorphic virtual methods
• Define an interface or abstract class:
    interface Sorter { void Sort<T>(T[] a, IComparer<T> c); }
• Implement the interface:
    class QuickSort : Sorter { ... }
    class MergeSort : Sorter { ... }
• Use instances at many type instantiations:
    void TestSorter(Sorter s, int[] ia, string[] sa) {
        s.Sort<int>(ia, IntComparer);
        s.Sort<string>(sa, StringComparer);
    }
    TestSorter(new QuickSort(), ...);
    TestSorter(new MergeSort(), ...);
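A complete, compilable variant of this pattern is sketched below. Since the slide elides the bodies of QuickSort and MergeSort, the sketch uses two different hypothetical implementations (InsertionSorter and LibrarySorter) behind a hypothetical ISorter interface, and takes its comparers from the standard library.

```csharp
using System;
using System.Collections.Generic;

interface ISorter
{
    void Sort<T>(T[] a, IComparer<T> c);   // polymorphic virtual method
}

sealed class InsertionSorter : ISorter
{
    public void Sort<T>(T[] a, IComparer<T> c)
    {
        for (int i = 1; i < a.Length; i++)
        {
            T x = a[i];
            int j = i - 1;
            while (j >= 0 && c.Compare(a[j], x) > 0) { a[j + 1] = a[j]; j--; }
            a[j + 1] = x;
        }
    }
}

sealed class LibrarySorter : ISorter
{
    public void Sort<T>(T[] a, IComparer<T> c) { Array.Sort(a, c); }
}

static class SorterDemo
{
    // The callee chooses the type arguments; the caller only supplies an
    // object whose Sort method works at every T.
    public static string TestSorter(ISorter s, int[] ia, string[] sa)
    {
        s.Sort<int>(ia, Comparer<int>.Default);
        s.Sort<string>(sa, StringComparer.Ordinal);
        return string.Join(",", ia) + " " + string.Join(",", sa);
    }

    static void Main()
    {
        Console.WriteLine(TestSorter(new InsertionSorter(),
            new[] { 3, 1, 2 }, new[] { "b", "c", "a" }));  // 1,2,3 a,b,c
        Console.WriteLine(TestSorter(new LibrarySorter(),
            new[] { 3, 1, 2 }, new[] { "b", "c", "a" }));  // 1,2,3 a,b,c
    }
}
```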

Compare:
• Define an SML signature:
    signature Sorter = sig val Sort : 'a array * ('a * 'a -> order) -> unit end
• Define structures that match the signature:
    structure QuickSort :> Sorter = ...
    structure MergeSort :> Sorter = ...
• Use structures at many type instantiations:
    functor TestSorter(S : Sorter) = struct
      fun test (ia, sa) = (S.Sort(ia, Int.compare); S.Sort(sa, String.compare))
    end
    structure TestQS = TestSorter(QuickSort); TestQS.test(...);
    structure TestMS = TestSorter(MergeSort); TestMS.test(...);

Or (Russo first-class modules):
• Define an SML signature:
    signature Sorter = sig val Sort : 'a array * ('a * 'a -> order) -> unit end
• Define structures that match the signature:
    structure QuickSort :> Sorter = ...
    structure MergeSort :> Sorter = ...
• Use a function to test the structures:
    fun TestSorter (s, ia, sa) =
      let structure S as Sorter = s
      in (S.Sort(ia, Int.compare); S.Sort(sa, String.compare)) end
    TestSorter ([structure QuickSort as Sorter], ...);
    TestSorter ([structure MergeSort as Sorter], ...);

Observations
• Translation from System F to C# is global
  • generates new class names for (families of) polymorphic types
• The generics design for Java (GJ) also supports polymorphic virtual methods
• C++ has “template methods” but not virtual ones
  • for good reason: it compiles by expansion
• Distinctiveness of polymorphic virtual methods shows up in (type-passing) implementations (e.g. CLR)
  • requires execution-time type application
