A Brief Introduction to CLI/CLR

A Brief Introduction to CLI/CLR. Massimo Ancona, DISI, Università di Genova. Texts: J. Gough, Compiling for the .NET Common Language Runtime (CLR), .NET Series, B. Meyer, Editor; J. Richter, CLR via C#, Microsoft Press.

CLR - Common Language Runtime The CLR has been designed with three objectives: • portability (write once, run anywhere), • reliability (make operations predictable), • reusability (object orientation and parametric code [generics]). GenCLI aims to meet all three of these objectives.

CLR 2 The CLR machine consists of the CTS (the .NET Common Type System) and the CLR instructions. The CTS defines all the data types and constructs supported by the .NET runtime environment (RTE), while the CLR instructions define a virtual stack-based machine.
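
To make the idea of a virtual stack-based machine concrete, here is a minimal sketch of a toy evaluator. The instruction names (`ldc`, `add`, `mul`) and the tuple encoding are assumptions chosen for illustration; they are loosely modelled on IL but are not the real CLR instruction set.

```python
def run(program):
    """Evaluate a list of toy stack instructions and return the final stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "ldc":            # push a constant operand
            stack.append(instr[1])
        elif op == "add":          # pop two operands, push their sum
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "mul":          # pop two operands, push their product
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


# Hypothetical program computing (2 + 3) * 4 with no named temporaries.
program = [("ldc", 2), ("ldc", 3), ("add",), ("ldc", 4), ("mul",)]
result = run(program)
```

Operands are never named: every instruction takes its inputs from the top of the stack and leaves its result there, which is the property the slide refers to.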

Execution Model CLR 3 Code generators for .NET emit CIL (IL for short), either as a text file for subsequent assembly or directly into a file or memory buffer. CIL code consists of instructions for a virtual machine and is always executed indirectly by means of a just-in-time (JIT) compiler. The JIT compiler translates IL instructions into machine code for the specific computer on which the program is to be executed. Program executable modules, called assemblies, are usually demand-loaded and are just-in-time compiled (JIT-ed) at the time of loading.

Execution Model CLR 4 At load time each assembly is subject to some form of checking. The execution engine is able to ensure that the assembly is memory-safe. Programs that are intended to pass the verification checks are said to be written in verifiable code.

Verifiable Code CLR 5 Verifiable code must conform to several requirements. First of all, dynamically allocated memory must be managed data. This means that all objects must be allocated from the garbage-collected heap and must be self-describing: the GC must be able to discern the exact type of an object by inspecting its encoding.

Verifiable Code CLR 6 • Operations on data must be performed in such a way that the verifier is able to statically prove that the operation is safe for the type of object. • Method calls must pass arguments that conform to the statically specified method signature. • For most programming languages, not all programs can be translated into verifiable code. In such cases, programmers who wish their programs to pass verification must restrict themselves to a subset of the language.

Verifiable Code CLR 7 Programming constructs that can cause problems are, for example, union types (variant types) and pointer arithmetic. As well as managed data, we speak of managed code: code that is executed by the CLR, as opposed to ordinary native-code execution. An erroneous address computation allows an arbitrary memory location to be overwritten.

Verifiable Code CLR 8 An erroneous address computation can be generated by: • Accessing a deallocated memory location • Accessing a non-existing array element • Treating a pointer of one type as another • Sending wrongly typed arguments to a function

Memory Safety by Design 0 • How can we design languages and RTEs for which every semantically correct source program can be compiled into a memory-safe executable program? • One approach is to define a statically typed (or strongly typed) programming language, e.g. Modula-2. • The .NET system provides a framework for memory-safe programming. Several aspects of .NET contribute toward this outcome: • Dynamically allocated data in verifiable code is garbage collected. • Every datum is of known type at runtime.

Memory Safety by Design 1 Objects of reference type are allocated from a heap called the managed heap. The managed heap is garbage collected and the CLR provides instructions for managing it in a safe way. Value types are not allocated on the managed heap. However, an object of value type can be converted to a reference type by using the boxing mechanism: a copy of the object value is allocated on the managed heap and its address is returned as a reference type.
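
The boxing mechanism described above can be sketched as follows. This is only a toy model under stated assumptions: `ManagedHeap`, `box`, `unbox`, and `Point` are hypothetical names, integer handles stand in for references, and `copy.copy` stands in for the value copy that boxing performs.

```python
import copy
from dataclasses import dataclass


@dataclass
class Point:
    """Stands in for a value type (hypothetical example type)."""
    x: int
    y: int


class ManagedHeap:
    """Toy heap: boxing copies a value into a cell and returns a handle."""

    def __init__(self):
        self._cells = {}
        self._next = 0

    def box(self, value):
        ref = self._next
        self._next += 1
        self._cells[ref] = copy.copy(value)   # the heap holds a copy
        return ref

    def unbox(self, ref):
        return copy.copy(self._cells[ref])    # unboxing copies the value out


heap = ManagedHeap()
p = Point(1, 2)
ref = heap.box(p)
p.x = 99                    # changing the original does not affect the box
unboxed = heap.unbox(ref)
```

Because the box holds a copy, later changes to the original value are not visible through the reference.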

Memory Safety by Design 2 How can languages and runtimes be designed so that every semantically correct source program can be compiled into a memory-safe executable program? The .NET execution engine ensures that the generated code is safe by performing a verification process. It checks that every method is called with the correct number of parameters, and that each parameter passed is of the correct type.
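
The call check described above can be illustrated with a minimal sketch. The `SIGNATURES` table, the method names, and the exact-match rule are assumptions made for the example; a real verifier also accepts arguments that are merely assignment-compatible with the declared parameter types.

```python
# Hypothetical table of statically declared method signatures.
SIGNATURES = {
    "Add": ("int32", "int32"),
    "Concat": ("string", "string"),
}


def check_call(method, arg_types):
    """Return True if the call matches the declared signature exactly."""
    expected = SIGNATURES.get(method)
    if expected is None:
        return False                       # unknown method
    if len(arg_types) != len(expected):
        return False                       # wrong number of parameters
    return all(a == e for a, e in zip(arg_types, expected))


ok = check_call("Add", ["int32", "int32"])
bad_count = check_call("Add", ["int32"])
bad_type = check_call("Concat", ["string", "int32"])
```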

Memory Safety by Design 3 To be safe, the generated code must allocate dynamic objects only as managed data on the managed heap, by means of specific CLR instructions. Code generated for .NET is always executed indirectly via a JIT (just-in-time) compiler, which translates the code produced by a .NET compiler into native machine code; safety checks are performed at load time, just before JIT translation.

Memory Safety by Design 4 .NET resolves these problems by a combination of load-time and runtime checking. The load-time verifier computes the types of all data used by the IL code of a program. This involves significant computation based on the control flow graph: the verifier checks that all data … This also involves access to multiple assemblies, because consistency of argument types between method caller and callee may cut across PEM (program executable module) boundaries.
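
A minimal sketch of how a verifier can compute types without running the code: it simulates the evaluation stack using type names instead of values. The sketch handles straight-line code only, so it ignores the control flow graph mentioned above, and it restricts `add` to `int32` operands; both are simplifying assumptions. The opcode names `ldc.i4`, `ldstr`, and `add` exist in IL, but the tuple encoding is hypothetical.

```python
def verify(program):
    """Return True if every instruction finds operands of the expected type."""
    stack = []                         # abstract stack of type names
    for instr in program:
        op = instr[0]
        if op == "ldc.i4":             # pushes a 32-bit integer constant
            stack.append("int32")
        elif op == "ldstr":            # pushes a string reference
            stack.append("string")
        elif op == "add":              # simplified: int32 operands only
            if len(stack) < 2:
                return False           # stack underflow
            b = stack.pop()
            a = stack.pop()
            if a != "int32" or b != "int32":
                return False           # operand type mismatch
            stack.append("int32")
        else:
            return False               # unknown instruction
    return True


good = verify([("ldc.i4", 1), ("ldc.i4", 2), ("add",)])
mismatch = verify([("ldc.i4", 1), ("ldstr", "a"), ("add",)])
underflow = verify([("add",)])
```

No instruction is executed: the ill-typed programs are rejected purely from the types that each instruction would leave on the stack.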

CTS • CTS provides three sets of types: • primitive types, managed by the compiler, • reference types, allocated on the managed heap, and • value types

CTS Types Hierarchy

CTS 3 The CLS (Common Language Specification, a subset of the CTS) defines the requirements that a language must meet in order to be classified as a safe .NET language. Programs generated by a compiler for such a language must be written in verifiable code in order to pass the verification process. Example: GenCLI generates only verifiable high-level IL, making the RPython compiler a de facto .NET compiler.

CTS Generics 1 The CTS allows the creation of generic reference types as well as generic value types. In addition, the CLR allows the creation of generic classes, interfaces, and delegates. Moreover, the CLR allows the creation of generic methods defined in a reference type, value type, or interface.

CTS Generics 2 • Adding generics to the CLR required: • creating new IL instructions that are aware of type arguments, • recording type names and methods with generic parameters in the metadata tables, • modifying languages, compilers, and the JIT compiler to process the new type-argument-aware IL instructions.

CLR Assemblies 1 Combining Managed Modules into Assemblies (p. 6) The CLR does not actually work with modules; it works with assemblies. An assembly is a logical grouping of one or more modules or resource files. An assembly is the smallest unit of reuse, security, and versioning. It supports the separation of types and resources into separate files, which are then used by consumers of the assembly.

Mapping Oberon-2 to CLR The record types of Oberon-2 need to be mapped in some way to the class constructs of the CTS. Oberon-2 does not make a declarative distinction between value and reference aggregate types. Record types always have value semantics, and pointer types always have reference semantics. Our choice is the following.

Mapping Oberon One of the most relevant features of the CLR (.NET 2.0) is generics. With generics, .NET languages can easily create type-safe, reusable code. The term generics means parameterized types. A parameterized type is a class, interface, method, or delegate in which the type of data upon which it operates is specified as a parameter. A class, interface, method, or delegate that operates on a parameterized type is called a generic class, interface, method, or delegate.
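
As an illustration of a parameterized type, here is a minimal sketch of a generic container. It is written with Python's `typing.Generic`, which is only an analogy: in Python the type argument is checked by external static tools, whereas the CLR records generic parameters in metadata and enforces them itself. The `Stack` class is a hypothetical example.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")   # the type parameter


class Stack(Generic[T]):
    """A stack whose element type is supplied as a parameter."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()


# The same code is reused for two different element types.
ints: Stack[int] = Stack()
ints.push(1)
ints.push(2)

names: Stack[str] = Stack()
names.push("clr")
```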

Mapping Oberon-2 to CLR Record types that are neither extensible [i.e., that have no heirs] nor extensions of another type are implemented as value classes. If a program declares a pointer to such a record type, the pointer type is implemented as a reference class with a single field whose type is the value class. This reference class is an explicitly boxed occurrence of the embedded value class. It has at least one advantage over the automatically boxed classes manipulated by the "box" and "unbox" instructions: the fields of the boxed value can be accessed without unboxing.

CTS X+1 Procedures that are bound to such a record type [the equivalent of methods in Oberon-2] are implemented as (non-virtual) instance methods of the value class. Procedures bound to a type that is a pointer to the record are implemented as (non-virtual) instance methods of the explicitly boxed class:

    MODULE ValCls;
      IMPORT CPmain;
      TYPE
        RecTyp = RECORD c: CHAR END;
        PtrTyp = POINTER TO RecTyp;
      PROCEDURE (IN r: RecTyp) Foo(), NEW;
      END Foo;
      PROCEDURE (r: PtrTyp) Bar(), NEW;
      END Bar;
    END ValCls.

There is an interesting artefact of this design. Procedures [methods] bound to the record type and to the pointer-to-record type are bound to the same underlying type in the source semantics, but to separate types in the implementation. This may seem curious, but no ambiguity can arise [example].
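
The mapping can be mimicked in a short sketch. Python has no value types, so this only mirrors the shape of the design: `RecTyp` stands in for the value class, `PtrTyp` for the explicitly boxed reference class with a single field, and the method bodies and the field name `boxed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecTyp:
    """Stands in for the value class generated for the record type."""
    c: str = "\0"

    def foo(self) -> str:       # procedure bound to the record type
        return "Foo on RecTyp"


@dataclass
class PtrTyp:
    """Explicit box: a reference class with one field of the value class."""
    boxed: RecTyp

    def bar(self) -> str:       # procedure bound to the pointer type
        return "Bar on PtrTyp"


p = PtrTyp(RecTyp("a"))
p.boxed.c = "b"                 # field access through the box, no unboxing
foo_result = p.boxed.foo()      # Foo is found on the embedded value class
bar_result = p.bar()            # Bar is found on the box itself
```

Each procedure lives on exactly one of the two implementation classes, which is why a call can always be resolved without ambiguity.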

PL0 29

    ssym['+']:=plus;    ssym['-']:=minus;
    ssym['*']:=times;   ssym['/']:=slash;
    ssym['(']:=lparen;  ssym[')']:=rparen;
    ssym['=']:=eql;     ssym[',']:=comma;
    ssym['.']:=period;  ssym['#']:=neq;
    ssym['<']:=lss;     ssym['>']:=gtr;
    ssym['%']:=leq;     ssym['@']:=geq;
    ssym[';']:=semicolon;

PL0 30

    (* instruction mnemonics *)
    mnemonic[lit]:='LIT '; mnemonic[opr]:='OPR ';
    mnemonic[lod]:='LOD '; mnemonic[sto]:='STO ';
    mnemonic[cal]:='CAL '; mnemonic[int]:='INT ';
    mnemonic[jmp]:='JMP '; mnemonic[jpc]:='JPC ';
    (* symbol sets that can start a declaration, statement, or factor *)
    declbegsys:=[constsym,varsym,procsym];
    statbegsys:=[beginsym,callsym,ifsym,whilesym];
    facbegsys:=[ident,number,lparen];
    (* open the source and output files, reset the scanner state *)
    RESET(in,'pl0','pgm'); err:=0;
    cc:=0; ll:=0; ch:=' '; kk:=al;
    REWRITE(cout,'PL0','asm');
    getsym;
    mysys:=[period]+declbegsys+statbegsys;
    block(0,0,mysys(*[period]+declbegsys+statbegsys*));
    WRITELN('END COMPILATION');
    IF sym<>period THEN error(9) FI;
    WRITECODE; CLOSE(cout);
    (* run the interpreter only if compilation succeeded *)
    IF err = 0 THEN WRITE('CICCIO'); interpret
    ELSE WRITE('Errors IN PL/0 PROGRAM') FI;
    WRITELN
    END.

Hendren93register Interference graph G=(V,E) (Chaitin). Each vertex in G corresponds to a live range of a program variable. An edge joins two vertices of the graph if the two vertices interfere, that is, if the corresponding live ranges overlap in time. More precisely, one is live at a definition point of the other.

Hen 93

Hen 93 Definition. An interval graph (intersection graph) G (IG = Interval Graph) is defined by a set of intervals on the real line as follows: • Each interval I is associated with a vertex v of V. • There is an edge e = (v, w) ∈ E if and only if the intervals I_v and I_w, associated with v and w respectively, have a nonempty intersection: I_v ∩ I_w ≠ ∅.
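
The definition translates directly into code. The following is a minimal sketch that assumes closed intervals given as `(start, end)` pairs; the function name `interval_graph` and the sample live ranges are hypothetical.

```python
from itertools import combinations


def interval_graph(intervals):
    """Build the interval graph as a dict mapping each vertex to its neighbours.

    intervals maps a vertex name to a closed interval (start, end).
    """
    graph = {v: set() for v in intervals}
    for v, w in combinations(intervals, 2):
        (a1, b1), (a2, b2) = intervals[v], intervals[w]
        # Two closed intervals intersect iff the later start
        # does not exceed the earlier end.
        if max(a1, a2) <= min(b1, b2):
            graph[v].add(w)
            graph[w].add(v)
    return graph


# Hypothetical live ranges of four variables.
live = {"a": (0, 4), "b": (2, 6), "c": (5, 9), "d": (7, 8)}
g = interval_graph(live)
```

With these sample ranges, `a` overlaps only `b`, `b` overlaps `a` and `c`, and `c` overlaps `b` and `d`.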

Hen 93 A vertex of the graph has degree k if it has k neighbouring vertices (vertices directly connected to it). Chaitin's method colors the graph with m colors so that any two adjacent vertices have different colors. A coloring of the interference graph with k colors defines a feasible solution with k registers.
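
A minimal sketch of the simplify-and-select idea behind Chaitin-style coloring: vertices of degree less than k are removed one at a time and pushed on a stack, then popped and given the lowest color not used by an already-colored neighbour. The function name and sample graphs are hypothetical, and spilling is not modelled: when no vertex of degree less than k remains, the sketch simply gives up and returns `None`.

```python
def chaitin_color(graph, k):
    """Return a dict vertex -> color in range(k), or None if simplification fails."""
    work = {v: set(ns) for v, ns in graph.items()}   # mutable working copy
    stack = []
    while work:
        # Simplify: pick any vertex with fewer than k remaining neighbours.
        candidates = sorted(v for v in work if len(work[v]) < k)
        if not candidates:
            return None          # a real allocator would pick a spill candidate
        v = candidates[0]
        stack.append(v)
        for n in work.pop(v):
            work[n].discard(v)
    colors = {}
    while stack:
        # Select: at most k - 1 neighbours are colored, so a color is free.
        v = stack.pop()
        used = {colors[n] for n in graph[v] if n in colors}
        colors[v] = min(c for c in range(k) if c not in used)
    return colors


# Hypothetical interference graphs: a path and a triangle.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}

path_colors = chaitin_color(path, 2)
triangle_colors = chaitin_color(triangle, 2)
```

The path can be colored with two registers, while the triangle cannot be simplified with k = 2, which in a full allocator is the point where a live range would be spilled to memory.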
