
Formalization of Generics for the .NET Common Language Runtime

Dachuan Yu (Yale University), Andrew Kennedy and Don Syme (Microsoft Research Cambridge)


Presentation Transcript


  1. Formalization of Generics for the .NET Common Language Runtime. Dachuan Yu (Yale University); Andrew Kennedy, Don Syme (Microsoft Research Cambridge)

  2. Introduction • Upcoming revision of Microsoft .NET platform includes support for parametric polymorphism (“generics”) in • Programming languages C#, Visual Basic, Managed C++ • Common Language Runtime (the “virtual machine”) • Visual Studio (Integrated Development Environment) • Libraries • Previous work (PLDI’01) described implementation techniques used in the CLR • Now we formalize the polymorphic intermediate language and aspects of the implementation

  3. CLR: The big picture [Diagram. Source level: C# program, Visual Basic program, SML.NET program, each passed through its own compiler (C# compiler, Visual Basic compiler, SML.NET compiler) to produce IL. Common Language Runtime components: loader & JIT front-end, JIT IL, JIT code-gen, native interop, remoting, garbage collector, security, threads, exception handling. Other labels: native binary, machine code.]


  5. High-level design of generics • Type parameterization for all declarations • classes, e.g. class Set<T> • interfaces, e.g. interface IComparable<T> • structs, e.g. struct HashBucket<K,D> • methods, e.g. static void Reverse<T>(T[] arr) • delegates (“first-class methods”), e.g. delegate void Action<T>(T arg)
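The declaration forms listed on slide 5 can be tried together in a few lines of C#. This is only a minimal sketch: the slide gives declaration headers, so every body, the `Sketch` namespace, and the `Demo` class are assumptions added for illustration. The types are declared in their own namespace so that they do not clash with the library types of the same name.

```csharp
using System.Collections.Generic;

namespace Sketch
{
    // Generic interface (the single method is an illustrative assumption).
    interface IComparable<T> { int CompareTo(T other); }

    // Generic class (body assumed; the slide shows only "class Set<T>").
    class Set<T>
    {
        private readonly List<T> items = new List<T>();
        public void Add(T x) { if (!items.Contains(x)) items.Add(x); }
        public int Count { get { return items.Count; } }
    }

    // Generic struct with two type parameters.
    struct HashBucket<K, D>
    {
        public K Key;
        public D Data;
    }

    // Generic delegate ("first-class method").
    delegate void Action<T>(T arg);

    static class Demo
    {
        // Generic static method from the slide: reverses the array in place.
        public static void Reverse<T>(T[] arr)
        {
            for (int i = 0, j = arr.Length - 1; i < j; i++, j--)
            {
                T tmp = arr[i];
                arr[i] = arr[j];
                arr[j] = tmp;
            }
        }

        static void Main()
        {
            int[] xs = { 1, 2, 3 };
            Reverse(xs);                                   // T is inferred as int
            Action<int> print = n => System.Console.WriteLine(n);
            foreach (int n in xs) print(n);                // prints 3, 2, 1 on separate lines
        }
    }
}
```

The same `Reverse<T>` body serves every element type, which is the property the rest of the talk is concerned with implementing efficiently.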

  6. Good design => Tricky Implementation
     • Unrestricted instantiation
         List<string> ls = new List<string>();    // reference types
         List<double> ld = …                       // primitive types
         List<Pair<string,double>> lsd = …         // struct types
     • Full support for run-time types
         if (x is Set<string>) { ... }             // type-test
         y = (List<T>) z;                          // checked cast
     • Recursion in instantiations
         class List<T> : ICloneable<List<T>>       // finite
         class C<T> { C<C<T>> fld; }               // infinite
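The run-time type features on slide 6 can be observed directly in C#. The sketch below is illustrative only; the class and method names (`RuntimeTypes`, `IsStringList`, `AsList`) are assumptions, not names from the talk. It shows that a type test distinguishes instantiations of the same generic class and that a cast at an open type `List<T>` is checked against the actual instantiation.

```csharp
using System;
using System.Collections.Generic;

static class RuntimeTypes
{
    // Type test: succeeds only when x is a List<string>, not any other List<...>.
    public static bool IsStringList(object x)
    {
        return x is List<string>;
    }

    // Checked cast at an open type: throws InvalidCastException on a mismatch.
    public static List<T> AsList<T>(object z)
    {
        return (List<T>)z;
    }

    static void Main()
    {
        object ls = new List<string>();
        object ld = new List<double>();

        Console.WriteLine(IsStringList(ls));   // True
        Console.WriteLine(IsStringList(ld));   // False

        try
        {
            AsList<string>(ld);                // List<double> is not a List<string>
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("cast rejected");
        }
    }
}
```

Because `AsList<T>` must know `T` when it runs, shared code for this method needs some run-time representation of the type argument.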

  7. Why formalize? • In previous work (POPL’01, Gordon & Syme) the aim was a type soundness proof for a subset of IL (Baby IL) • Our aims are different: • Implementation techniques used in the CLR product are subtle and difficult to get right (=> bugs, perhaps security holes) • We’d like to validate those techniques • Current JIT- and pre-compilers are not type-preserving • Our formalization provides a basis for typed compiler intermediate languages for more capable and robust compilers • It’s also difficult to express and apply optimizations • Formalization makes this easier • By-product is a generic variant on Baby IL

  8. Formalization: the big picture
     BILG classes and methods (BILG = “Baby IL with Generics”, a tiny subset of MS-IL)
       ↓ translation:
         • Specialize generic classes and methods
         • Share instantiations w.r.t. data representation
         • Introduce types-as-values
         • Optimize use of types-as-values
     BILC classes and methods (BILC = “Baby IL with Constrained generics”, a typed intermediate language more suitable for code generation)

  9. Illustrative example, in C#
       class ArrayUtils {
         static List<T> ArrayToList<T>(T[] arr) { …new List<T>()… }
       }
       class List<T> {
         virtual List<T> Append(object obj) { …(List<T>) obj… …new ListCell<T>… }
       }
     • ArrayToList: want to share generated code over different instantiations of T. Pass type parameters at runtime? Look up type representations at runtime?
     • List: want to share generated code over different instantiations of T. Look up type representations at runtime? How do we know what T is?

  10. Source Language: BILG • “Baby IL with Generics” • Purely functional, à la Featherweight Java (Igarashi, Pierce, Wadler) • Primitive types & generic classes • Inheritance-based subtyping • Generic methods (static and virtual) • Type-case operation (isinst) inspects run-time type of object • No overloading, no interfaces, no abstract methods, no structs (“value classes”), no delegates, no boxing, no null values, no heap, no bounded polymorphism • Just enough to demonstrate most of the implementation techniques! • Typing rules & big-step semantics in paper • Easier to work with big-step • ¬∃v. e ⇓ v taken as definition of divergence

  11. Source language: BILG
      (type)        T,U ::= X | int32 | int64 | I
      (inst type)   I   ::= C<T1,…,Tn>
      (class def)   cd  ::= class C<X1,…,Xn> : I { T1 f1; …; Tm fm; md1 … mdk }
      (method def)  md  ::= static T m<X1,…,Xn>(T1,…,Tm) { e; }
                          | virtual T m<X1,…,Xn>(T1,…,Tm) { e; }
      (method ref)  M   ::= I::m<T1,…,Tn>
      (expr)        e   ::= ldc.i4 i4 | ldc.i8 i8 | ldarg x | e1 … en newobj I | e ldfld I::f
                          | e1 … en call M | e e1 … en callvirt M | e isinst I or e

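  12. BILG typing and evaluation for isinst
      Typing:
      $$\frac{E \vdash e : I \qquad E \vdash e' : I'}{E \vdash e\ \mathtt{isinst}\ I'\ \mathtt{or}\ e' : I}$$
      Evaluation when the test succeeds:
      $$\frac{fr \vdash e \Downarrow I'(f_1=v_1,\ldots,f_n=v_n) \qquad \vdash I' <: I}{fr \vdash e\ \mathtt{isinst}\ I\ \mathtt{or}\ e' \Downarrow I'(f_1=v_1,\ldots,f_n=v_n)}$$
      Evaluation when the test fails:
      $$\frac{fr \vdash e \Downarrow I'(f_1=v_1,\ldots,f_n=v_n) \qquad \vdash \neg(I' <: I) \qquad fr \vdash e' \Downarrow v'}{fr \vdash e\ \mathtt{isinst}\ I\ \mathtt{or}\ e' \Downarrow v'}$$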

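  13. BILG typing and evaluation for isinst
      Typing:
      $$\frac{E \vdash e : I \qquad E \vdash e' : I'}{E \vdash e\ \mathtt{isinst}\ I'\ \mathtt{or}\ e' : I}$$
      Evaluation when the test succeeds:
      $$\frac{fr \vdash e \Downarrow I'(f_1=v_1,\ldots,f_n=v_n) \qquad \vdash I' <: I}{fr \vdash e\ \mathtt{isinst}\ I\ \mathtt{or}\ e' \Downarrow I'(f_1=v_1,\ldots,f_n=v_n)}$$
      Evaluation when the test fails:
      $$\frac{fr \vdash e \Downarrow I'(f_1=v_1,\ldots,f_n=v_n) \qquad \vdash \neg(I' <: I) \qquad fr \vdash e' \Downarrow v'}{fr \vdash e\ \mathtt{isinst}\ I\ \mathtt{or}\ e' \Downarrow v'}$$
      Observe: • Types affect evaluation • They cannot be erased • They serve static and dynamic purposes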
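The two evaluation rules for isinst can be read as a small interpreter step. The following is a rough sketch under simplifying assumptions, not the formalization from the paper: `InstType`, `ObjValue`, and `BigStep` are hypothetical names, an instantiated type is modelled as a name plus a superclass link, and object fields are omitted. It shows why the type I in the expression cannot be erased: the evaluator has to consult it.

```csharp
using System;

// Hypothetical model of an instantiated type I: a name plus its superclass.
sealed class InstType
{
    public readonly string Name;
    public readonly InstType Parent;   // null for the root type

    public InstType(string name, InstType parent) { Name = name; Parent = parent; }

    // I' <: I holds when I is reachable from I' by following superclass links.
    public bool IsSubtypeOf(InstType other)
    {
        for (InstType t = this; t != null; t = t.Parent)
            if (t.Name == other.Name) return true;
        return false;
    }
}

// A BILG-style object value: it carries its exact run-time type (fields omitted).
sealed class ObjValue
{
    public readonly InstType RuntimeType;
    public ObjValue(InstType runtimeType) { RuntimeType = runtimeType; }
}

static class BigStep
{
    // "e isinst I or e'": v is the value of e; orElse is evaluated only if the test fails.
    public static ObjValue IsInst(ObjValue v, InstType target, Func<ObjValue> orElse)
    {
        return v.RuntimeType.IsSubtypeOf(target) ? v : orElse();
    }

    static void Main()
    {
        var obj = new InstType("object", null);
        var listInt = new InstType("List<int32>", obj);
        var listLong = new InstType("List<int64>", obj);

        var v = new ObjValue(listInt);
        var fallback = new ObjValue(listLong);

        Console.WriteLine(ReferenceEquals(IsInst(v, listInt, () => fallback), v));   // True
        Console.WriteLine(ReferenceEquals(IsInst(v, listLong, () => fallback), v));  // False
    }
}
```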

  14. Target Language: BILC • Similar to BILG, but adds • Representation constraints on type parameters • ref: “must be a reference type” • i4: “must be a 32-bit integer” • i8: “must be a 64-bit integer” • Types-as-values • R_T is a value representing closed type T • The value R_T has singleton type Rep(T), interpreted as “is a value representing the type T” • Construct reps for open types: mkrep_C<T1,…,Tn>(e1,…,en) creates a type-rep for C<T1,…,Tn> given type-reps for T1,…,Tn • Semantics given by small-step reduction relation

  15. Target language: BILC (subset)
      (type)            T,U ::= X | int32 | int64 | I
      (inst type)       I   ::= C<T1,…,Tn>
      (extended types)  τ   ::= T | Rep(T)
      (constraint)      s   ::= ref | i4 | i8
      (class def)       cd  ::= class C<X1:s1,…,Xn:sn> : I { T1 f1; …; Tm fm; md1 … mdk }
      (method def)      md  ::= static T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; }
                              | virtual T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; }
      (method ref)      M   ::= I::m<T1,…,Tn>
      (expr)            e   ::= i4 | i8 | x | I(e,e1,…,en) | e ldfld I::f | e1 … en call M | e e1 … en callvirt M
                              | e isinst_I e or e | R_T | mkrep_C<T1,…,Tn>(e1,…,en)

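  16. Some typing and reduction rules
      Typing of mkrep:
      $$\frac{E \vdash C\langle T_1,\ldots,T_n\rangle\ \mathrm{ok} \qquad E \vdash e_1 : \mathrm{Rep}(T_1)\ \cdots\ E \vdash e_n : \mathrm{Rep}(T_n)}{E \vdash \mathtt{mkrep}_{C\langle T_1,\ldots,T_n\rangle}(e_1,\ldots,e_n) : \mathrm{Rep}(C\langle T_1,\ldots,T_n\rangle)}$$
      Typing of isinst:
      $$\frac{E \vdash e : I' \qquad E \vdash e' : \mathrm{Rep}(I) \qquad E \vdash e'' : I}{E \vdash e\ \mathtt{isinst}_I\ e'\ \mathtt{or}\ e'' : I}$$
      “Reflected subtyping”: $R_I \prec R_{I'}$ iff $I <: I'$
      Reduction of isinst:
      $$\frac{v = I(w,v_1,\ldots,v_n) \qquad w \prec w'}{\vdash (v\ \mathtt{isinst}_T\ w\ \mathtt{or}\ v') \to v}$$
      $$\frac{v = I(w,v_1,\ldots,v_n) \qquad w \nprec w'}{\vdash (v\ \mathtt{isinst}_T\ w\ \mathtt{or}\ v') \to v'}$$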

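  17. Some typing and reduction rules
      Typing of mkrep:
      $$\frac{E \vdash C\langle T_1,\ldots,T_n\rangle\ \mathrm{ok} \qquad E \vdash e_1 : \mathrm{Rep}(T_1)\ \cdots\ E \vdash e_n : \mathrm{Rep}(T_n)}{E \vdash \mathtt{mkrep}_{C\langle T_1,\ldots,T_n\rangle}(e_1,\ldots,e_n) : \mathrm{Rep}(C\langle T_1,\ldots,T_n\rangle)}$$
      Typing of isinst:
      $$\frac{E \vdash e : I' \qquad E \vdash e' : \mathrm{Rep}(I) \qquad E \vdash e'' : I}{E \vdash e\ \mathtt{isinst}_I\ e'\ \mathtt{or}\ e'' : I}$$
      Reduction of isinst:
      $$\frac{v = I(w,v_1,\ldots,v_n) \qquad w \prec w'}{\vdash (v\ \mathtt{isinst}_T\ w\ \mathtt{or}\ v') \to v}$$
      $$\frac{v = I(w,v_1,\ldots,v_n) \qquad w \nprec w'}{\vdash (v\ \mathtt{isinst}_T\ w\ \mathtt{or}\ v') \to v'}$$
      Observe: • Types do not affect evaluation • They can be erased • They serve only static purposes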

  18. Example
      • Static generic method in BILG:
          static List<T> Conv<T>(object a) { …a isinst List<T>…
      • Translated to BILC:
          Specialized code for T = int32:
            static List_i Conv_i(object a) { …a isinst_Tree_i R_Tree_i…
          Specialized code for T = int64:
            static List_l Conv_l(object a) { …a isinst_Tree_l R_Tree_l…
          Code shared for reference types:
            static List_r<T> Conv_r<T:ref>(Rep(T) r, object a) { …a isinst_List_r<T> (mkrep_List_r<T>(r))…
      • In Conv_r, r is an extra parameter representing T, and mkrep looks up or creates the type rep at runtime
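The shared translation `Conv_r` can be imitated in ordinary C# by passing a `System.Type` where BILC passes a value of type Rep(T). This is only an analogy and a sketch: `SharedCode`, `MkRepList`, and `ConvShared` are hypothetical names, reflection stands in for the runtime's internal type representations, and the `orElse` argument plays the role of the alternative branch of isinst.

```csharp
using System;
using System.Collections.Generic;

static class SharedCode
{
    // Stand-in for mkrep_List<T>(r): build a rep for List<T> from a rep for T.
    static Type MkRepList(Type repT)
    {
        return typeof(List<>).MakeGenericType(repT);
    }

    // Analogue of Conv_r: one body for all reference instantiations.
    // T is not a static type parameter here; it arrives as the value repT.
    public static object ConvShared(Type repT, object a, object orElse)
    {
        Type repList = MkRepList(repT);                    // rep built on every call
        return repList.IsInstanceOfType(a) ? a : orElse;   // a isinst List<T> or orElse
    }

    static void Main()
    {
        object input = new List<string>();
        object fallback = new List<object>();

        object hit = ConvShared(typeof(string), input, fallback);
        object miss = ConvShared(typeof(Uri), input, fallback);

        Console.WriteLine(ReferenceEquals(hit, input));      // True
        Console.WriteLine(ReferenceEquals(miss, fallback));  // True
    }
}
```

The call to `MkRepList` inside the method body corresponds to the repeated type-rep construction that the next slide sets out to avoid.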

  19. We need more…
      • So far: specialization, sharing, and separation of run-time types from static types
      • But mkrep is a costly operation, requiring type-rep creation at runtime
      • Idea: instead of passing representations for type parameters, pass representations of the types that we actually need:
          static List_r<T> Conv_r<T:ref>(Rep(List_r<T>) r, object a) { …a isinst_List_r<T> (r)…
        Here r is an extra parameter representing List<T>
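The idea on slide 19 can be sketched in the same reflection-based style: the caller supplies the representation of List<T> itself, so the shared body performs no type construction. As before, `System.Type` is only a stand-in for Rep(List<T>), and `PrecomputedRep` and `ConvShared` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class PrecomputedRep
{
    // The parameter already represents List<T>; nothing is built inside the body.
    public static object ConvShared(Type repListOfT, object a, object orElse)
    {
        return repListOfT.IsInstanceOfType(a) ? a : orElse;
    }

    static void Main()
    {
        Type rep = typeof(List<string>);   // obtained once by the caller
        object fallback = new List<string>();

        object a1 = new List<string>();
        object a2 = new List<int>();

        Console.WriteLine(ReferenceEquals(ConvShared(rep, a1, fallback), a1));        // True
        Console.WriteLine(ReferenceEquals(ConvShared(rep, a2, fallback), fallback));  // True
    }
}
```

The cost moves to the call site, which can obtain the representation once and reuse it across calls.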

  20. We need more… • In general, we need many type-reps in a single method body • So we pass around dictionaries of type-reps • What type does a dictionary of type-reps have? • At its simplest, it is just a tuple, e.g. Rep(List<X>) × Rep(Vec<Vec<X>>) is the type of a two-slot dictionary containing type-reps for List<X> and Vec<Vec<X>> • In general, dictionaries may contain cycles (e.g. for mutually recursive methods), so we need recursive values and their types • Worse still, polymorphic recursion requires “infinite” dictionaries • Simpler: use name-based types for dictionaries • reps for methods: Rep(M), R_M, mkrep_M(e1,…,en) • statically: each Rep-type determines a particular tuple of other Rep-types • dynamically: each type-rep R_T or method-rep R_M determines a tuple of type-rep/method-rep values
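A two-slot dictionary of the kind described on slide 20 can be sketched as an array of type representations that is built once per instantiation and then reused. The code below is an illustration under stated assumptions, not the CLR's mechanism: `RepDictionaries` and `For` are hypothetical names, `System.Type` stands in for type-reps, `List<List<X>>` replaces the slide's `Vec<Vec<X>>` because `Vec` is not a library type, and the cache is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

static class RepDictionaries
{
    // One dictionary (tuple of type-reps) per instantiation X, created on first use.
    static readonly Dictionary<Type, Type[]> cache = new Dictionary<Type, Type[]>();

    // Slot 0 holds the rep of List<X>; slot 1 holds the rep of List<List<X>>.
    public static Type[] For(Type x)
    {
        Type[] slots;
        if (!cache.TryGetValue(x, out slots))
        {
            Type listX = typeof(List<>).MakeGenericType(x);
            Type listListX = typeof(List<>).MakeGenericType(listX);
            slots = new[] { listX, listListX };
            cache[x] = slots;
        }
        return slots;
    }

    static void Main()
    {
        Type[] d = For(typeof(string));
        Console.WriteLine(d[0] == typeof(List<string>));         // True
        Console.WriteLine(d[1] == typeof(List<List<string>>));   // True
        Console.WriteLine(ReferenceEquals(d, For(typeof(string))));  // True: reused, not rebuilt
    }
}
```

A method body that needs several type-reps then takes one dictionary argument and indexes into it, instead of taking one rep per type.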

  21. Target language: BILC (full)
      (type)         T,U ::= X | int32 | int64 | I
      (inst type)    I   ::= C<T1,…,Tn>
      (ext type)     τ   ::= T | Rep(T) | Rep(M)
      (constraint)   s   ::= ref | i4 | i8
      (class def)    cd  ::= class C<X1:s1,…,Xn:sn> : I { T1 f1; …; Tm fm; md1 … mdk } with τ1,…,τp
      (method def)   md  ::= static T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; } with τ1,…,τp
                           | virtual T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; }
      (method ref)   M   ::= I::m<T1,…,Tn>
      (expr)         e   ::= i4 | i8 | x | I(e,e1,…,en) | e ldfld I::f | e1 … en call M | e e1 … en callvirt M
                           | e isinst_I e or e | R_T | R_M | mkrep_C<T1,…,Tn>(e1,…,en)
                           | mkrep_C<T1,…,Tn>::m<U1,…,Uk>(e1,…,en,e1,…,ek) | objdict_i e | mdict_i e

  22. Translation scheme • Static generic methods: • Extra dictionary parameter associated with method • Accessed using mdict_i(e) • Virtual methods in generic classes: • Obtain dictionary through type of object • Accessed using objdict_i(e) • Generic virtual methods: • Dictionary type not known statically (body could be overridden) • So pass reps for type parameters and construct type-reps at runtime using mkrep

  23. In the paper… • Complete formalization of BILG, BILC, and a translation • Theorems: • Translation preserves types • Translation preserves behaviour • And in forthcoming technical report: • Full proofs • Type erasure theorem: types in BILC do not affect evaluation

  24. Future work • Extend BILG and the translation to cover more features • Value classes (structs) • Would satisfy representation constraint of form [s1,…,sn] where s1,…,sn are constraints on the fields’ representations • Now have unbounded number of specializations • All methods on generic structs whose code is shared take a dictionary parameter • Need treatment of boxing • Flexible specialization policies • Less sharing: e.g. full specialization of selected types • More sharing: e.g. share all instantiations of C<T> by boxing and unboxing appropriately (cf ML)

  25. Future work: structural typing • Flexible specialization interacts badly with run-time types based on name-equivalence • Instead, describe dictionaries using structural typing: • Products: Rep(List<X>) × Rep(X) is a two-slot dictionary with type-reps for List<X> and X • Circular dictionaries => Recursive types, e.g. μD. Rep(Vec<X>) × (Rep(Set<X>) × D) • Polymorphic recursion in code => Higher-kinded recursive types, e.g. (μD. λX. Rep(Vec<X>) × D(Set<X>)) string

  26. Related work • Rep(T) • Crary, Weirich, Morrisett: “Intensional polymorphism in type-erasure semantics” • Dictionary-passing for polymorphism implementation • Saha and Shao (ML) • Viroli and Natali (Java)

  27. Questions?
