Extending Run-time Bytecode Specialization to Object-Oriented Languages

Reynald Affeldt (アフェルト　レナルド)

Method Specialization • Legend: known constructs, unknown constructs • User input to BCS: Arith/add with binding times x:S, y:D
Original program (generic code):
    class Arith {
        int add (int x, int y) { return x + y; }
    }
    Arith obj = new Arith ();
    int x = 42;
    int y = 666;
    int x_plus_y = obj.add (x, y);
Client program (specialized code):
    class Arith_Templ {
        int add_spec (int y) { return 42 + y; }
    }
    Arith obj = new Arith ();
    int x = 42;
    int y = 666;
    Arith_Abs obj_spec = specialize (obj, x);
    int x_plus_y = obj_spec.add_spec (y);

Overview of BCS • The specialization process is divided into: • Binding-time analysis (compile-time) • Code generator generation (compile-time) • Code generator (specialization-time) • Run-time specialization (partial evaluation) • Code generation (specialization) happens at run-time • Offline • Binding-time analysis happens at compile-time • Bytecode-level

Outline • Imperativity related issues • Pointer lifting • Side-effects • Object related issues • Partially static objects • Virtual dispatching • Object construction • Experiments • TODO

Imperativity Related Issues • References • Side-effects

Pointer Lifting • Lifted reference: • “A reference known at specialization-time that must be remembered to carry out some operation at execution-time” • Concretely its value must be embedded in the specialized method

Pointer Lifting • Legend: known constructs, unknown constructs, lifted constructs • Binding times for Arith/add: pair:S, fst:S, snd:D
Analysis-time:
    class Arith {
        int add (Pair pair) { return pair.fst + pair.snd; }
    }
Specialization-time (generic code plus a heap chunk):
    class Arith {
        int add (Pair pair) { return pair.fst + pair.snd; }
    }
Execution-time (specialized code plus a heap chunk; 0x1234 stands for the lifted reference):
    class Arith_Abs {
        int add_spec () { return 42 + 0x1234.snd; }
    }

Pointer Lifting • Specialization-time: • Save the lifted reference into the spec-time register • Execution-time: • Indexed access to the spec-time register • Generate aconst_null if the lifted reference is null
Figure: the specialized code (iconst_1, COPY 0, getfield fst: Point) reaches the lifted references stored in the spec-time register through a residualized access index.
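The two phases above can be sketched in plain Java. This is a minimal sketch, assuming a list-backed register: the names SpecTimeRegister, save, copy, NULL_INDEX and ArithSpec are hypothetical and not part of BCS. An ordinary object stands in for the generated bytecode, and copy(index) plays the role of the COPY instruction.

```java
import java.util.ArrayList;
import java.util.List;

// A pair whose fst field is static (S) and whose snd field is dynamic (D).
final class Pair {
    final int fst;
    int snd;
    Pair(int fst, int snd) { this.fst = fst; this.snd = snd; }
}

// Hypothetical spec-time register: lifted references are saved once, at
// specialization time, and read back by index at execution time.
final class SpecTimeRegister {
    static final int NULL_INDEX = -1; // assumption: null is never stored
    private final List<Object> slots = new ArrayList<>();

    int save(Object ref) {
        if (ref == null) return NULL_INDEX; // the slide emits aconst_null instead
        slots.add(ref);
        return slots.size() - 1;
    }

    Object copy(int index) {
        return index == NULL_INDEX ? null : slots.get(index);
    }
}

// Stand-in for the specialized method add_spec of the previous slide.
final class ArithSpec {
    private final SpecTimeRegister register;
    private final int fst;       // pair.fst, folded at specialization time
    private final int pairIndex; // index of the lifted reference to pair

    private ArithSpec(SpecTimeRegister register, int fst, int pairIndex) {
        this.register = register;
        this.fst = fst;
        this.pairIndex = pairIndex;
    }

    // Specialization-time: fold the static field, lift the reference.
    static ArithSpec specialize(SpecTimeRegister register, Pair pair) {
        return new ArithSpec(register, pair.fst, register.save(pair));
    }

    // Execution-time: indexed access to the register, then the dynamic read.
    int addSpec() {
        Pair lifted = (Pair) register.copy(pairIndex);
        return fst + lifted.snd;
    }
}
```

Because snd is read through the lifted reference on every call, a later update to pair.snd is visible to the specialized code, while fst stays fixed at its specialization-time value.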

Method Specialization Applications • Amortization requirement: • “The number of runs of the specialized code that compensate the specialization is reasonably small” • Applications: • Some moments are more critical than others • Amortized in a few runs • E.g.: Database requests • Extensive reuse of specialized code • Amortized in many runs • E.g.: Computer graphics • Run-time specialization differs from staged computation

Handling of Side-effects • Legend: known constructs • User input to BCS (side-effect analysis): Arith/addAndStore with binding times p:S, s:S
Client program (generic code):
    class Arith {
        void addAndStore (Pair p, Store s) { s.value = p.fst + p.snd; }
    }
    Arith obj = new Arith ();
    Pair pair = new Pair (1, 2);
    Store store = new Store ();
    int[] tab = {0, 1, 2};
    obj.addAndStore (pair, store);
Client program (specialized code):
    class Arith_Templ {
        void addAndStore_spec (Store s) { 0x1234.value = 3; }
    }
    Arith obj = new Arith ();
    Pair pair = new Pair (1, 2);
    Store store = new Store ();
    Arith_Abs obj_spec = specialize (obj, pair);
    obj_spec.addAndStore_spec (store);
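One reading of this slide is that the sum is computed at specialization time while the write to the store is kept in the residual code. The following is a minimal Java sketch under that assumption; ArithSpec, liftedStore and addAndStoreSpec are hypothetical names, and a plain object stands in for the generated bytecode.

```java
final class Pair {
    final int fst;
    final int snd;
    Pair(int fst, int snd) { this.fst = fst; this.snd = snd; }
}

final class Store {
    int value;
}

// Generic code, as on the slide.
final class Arith {
    void addAndStore(Pair p, Store s) { s.value = p.fst + p.snd; }
}

// Stand-in for the specialized code.
final class ArithSpec {
    private final Store liftedStore; // lifted reference (0x1234 on the slide)
    private final int sum;           // p.fst + p.snd, folded at specialization time

    private ArithSpec(Store liftedStore, int sum) {
        this.liftedStore = liftedStore;
        this.sum = sum;
    }

    // Specialization-time: the arithmetic is evaluated, the write is not performed.
    static ArithSpec specialize(Pair p, Store s) {
        return new ArithSpec(s, p.fst + p.snd);
    }

    // Execution-time: the side effect is replayed on every run.
    void addAndStoreSpec() {
        liftedStore.value = sum;
    }
}
```

Specializing leaves the store untouched; only running the specialized method performs the assignment, so the observable behavior matches the generic method.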

Skipped Points • Global variables • Instance fields of the enclosing object are treated as parameters • Class fields are C-like global variables • Aliasing • Aliases created by method calls are handled directly • What about aliasing in general? • Control flow • JVML is an unstructured language

Object Related Issues • Partially static object • Virtual dispatching • Object construction

Partially Static Objects • Supported partially static objects • Unsupported partially static objects
Figure (object diagrams annotated with binding times): ray: Ray: S with origin: Point: S and vector: Vector: D; ray: Ray: D with origin: Point: S and vector: Vector: D (shown twice).

Virtual Dispatching • Virtual dispatching (late binding) is: • A feature of object-oriented languages • A major source of inefficiency • Irrelevant virtual dispatching can be eliminated by: • Class-hierarchy analysis • Complemented by run-time specialization • Implementation: • Virtual dispatch is evaluated away during specialization • Optimization enabled: • Traditional compiler optimizations across virtual dispatching call sites

Virtual Dispatching • A generic Power class:
    class Power {
        int exp;
        Binary op;
        int neutral;
        int raise (int base) { return loop (base, exp); }
        int loop (int b, int e) {
            if (e == 0) return neutral;
            else return op.eval (b, loop (b, e-1));
        }
    }
Interaction diagram: PowerClient, Power, op.

Virtual Dispatching • The specializer code (code generator):
    class PowerSpecializer {
        // dispatcher method
        static … eval_gen (Binary op, …) {
            Class op_class = op.getClass ();
            if (op_class.isInstance (new Mul ())) {
                return eval_gen ((Mul) op, …);
            } else if (op_class.isInstance (new Add ())) {
                return eval_gen ((Add) op, …);
            }
        }
        // code generators
        static … eval_gen (Add op, …) { … }
        static … eval_gen (Mul op, …) { … }
    }

Virtual Dispatching • Specialized code (residual code):
    int raise_spec (int, …) {
        0 iload_1
        aload_2
        astore 4
        istore_3
        iload_1
        aload 4
        iload_3
        istore_2
        aload 4
        astore_1
        …
        ireturn
    }
Interaction diagrams: PowerClient calls raise_spec on Power_Templ, which computes b*b*b*b*1.
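The effect of evaluating virtual dispatch away can be imitated in plain Java. In this sketch, closures stand in for generated bytecode, which is an assumption of the example and not how BCS works; ClosureSpecializer and PowerMul4 are hypothetical names. The class of op is inspected once, at specialization time, and the residual code for Mul and Add contains no call to op.eval.

```java
import java.util.function.IntUnaryOperator;

interface Binary { int eval(int a, int b); }

final class Mul implements Binary {
    public int eval(int a, int b) { return a * b; }
}

final class Add implements Binary {
    public int eval(int a, int b) { return a + b; }
}

// Generic code, following the Power class of the slides.
final class Power {
    final int exp;
    final Binary op;
    final int neutral;

    Power(int exp, Binary op, int neutral) {
        this.exp = exp;
        this.op = op;
        this.neutral = neutral;
    }

    int raise(int base) { return loop(base, exp); }

    int loop(int b, int e) {
        return e == 0 ? neutral : op.eval(b, loop(b, e - 1));
    }
}

// Hypothetical specializer: unrolls the recursion on exp and resolves the
// class of op while building the residual function.
final class ClosureSpecializer {
    static IntUnaryOperator specialize(Power p) {
        int neutral = p.neutral;
        IntUnaryOperator residual = b -> neutral; // the e == 0 case
        for (int e = 0; e < p.exp; e++) {
            IntUnaryOperator inner = residual;
            if (p.op instanceof Mul) {
                residual = b -> b * inner.applyAsInt(b);
            } else if (p.op instanceof Add) {
                residual = b -> b + inner.applyAsInt(b);
            } else {
                Binary op = p.op; // unknown class: keep the virtual call
                residual = b -> op.eval(b, inner.applyAsInt(b));
            }
        }
        return residual;
    }
}

// Hand-written residual code for exp = 4, op = Mul, neutral = 1.
final class PowerMul4 {
    static int raiseSpec(int b) { return b * (b * (b * (b * 1))); }
}
```

For exp = 4, op = Mul and neutral = 1, both the closure chain and PowerMul4.raiseSpec compute b*b*b*b*1, the expression shown in the interaction diagram.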

Object Construction • Problem: How do we handle • ‘Dynamically’ allocated objects (new) • Statically constructed and operated on • From the BTA viewpoint: • Static local object: constructed by a constructor called with static arguments • Dynamic local object: constructed by a constructor called with at least one dynamic argument • From the CGG viewpoint: • Dynamic local object construction is residualized • Static local objects are CLONEd

Preserving Pointer Equality • Some points: • Don’t clone parameters • Don’t clone local references twice • Always clone escaping objects • Solutions: • Local references and parameters are distinguished in the spec-time register • A run-time register, one per run of a specialized method, records the objects cloned along the way

The Cloning Strategy • Specialization fills a spec-time register
Figure: each spec-time register entry holds a lifted reference into the static heap (specialization store), is reached through a residualized access index, and records whether it is a parameter or a local reference. Classes of the objects shown:
    class Point { int x; int y; }
    class Pair { Point fst; Point snd; }

The Cloning Strategy • Each specialized code instance builds a run-time register • Cloning • Initialization?
Figure: spec-time register, run-time register, CLONE 2.

The Cloning Strategy
Figure: the specialized code (iconst_1, CLONE 2, astore_0) reads entry 2 of the spec-time register and records the clone in the run-time register.
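The interplay of the two registers can be sketched as follows. This is a minimal Java sketch, assuming flat Point objects and array-backed registers; SpecTimeRegister, RunTimeRegister, cloneOf and copyOf are hypothetical names, and treating CLONE on a parameter as a plain COPY is one reading of "don't clone parameters". Escaping objects are not modelled.

```java
final class Point {
    int x;
    int y;
    Point(int x, int y) { this.x = x; this.y = y; }
    Point copyOf() { return new Point(x, y); }
}

// Filled once at specialization time; shared by every run of the
// specialized code.
final class SpecTimeRegister {
    private final Point[] refs;
    private final boolean[] parameter; // true: parameter, false: local reference

    SpecTimeRegister(Point[] refs, boolean[] parameter) {
        this.refs = refs.clone();
        this.parameter = parameter.clone();
    }

    int size() { return refs.length; }
    boolean isParameter(int index) { return parameter[index]; }
    Point copy(int index) { return refs[index]; } // COPY index
}

// Built afresh by each run of the specialized code.
final class RunTimeRegister {
    private final SpecTimeRegister spec;
    private final Point[] clones;

    RunTimeRegister(SpecTimeRegister spec) {
        this.spec = spec;
        this.clones = new Point[spec.size()];
    }

    // CLONE index
    Point cloneOf(int index) {
        if (spec.isParameter(index)) {
            return spec.copy(index); // parameters are never cloned
        }
        if (clones[index] == null) {
            clones[index] = spec.copy(index).copyOf(); // first use in this run
        }
        return clones[index]; // later uses see the same clone
    }
}
```

Within one run, two CLONE instructions on the same local reference yield the same object, so comparisons with == behave as in the generic code; two different runs get distinct clones.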

Is it Worth Doing? • Advantages: • Clean way to specify the BTA • Evaluates away pointer-specific operations • Drawbacks: • Inefficient in practice • Not easy to implement • But it also enables further optimizations: • Data sharing among instances of the same specialized code • Data splitting for stack allocation

Data Sharing • The following code concatenates two integers • Legend: known constructs, unknown constructs
    class Text {
        String add (int a, int b) {
            /* String tmp0 = new String ("a = " + a);
               String tmp1 = new String ("b = " + b);
               String ret = tmp0 + tmp1; */
            String tmp0 = new String (((new StringBuffer ("a = ")).append (a)).toString ());
            String tmp1 = new String (((new StringBuffer ("b = ")).append (b)).toString ());
            String ret = ((new StringBuffer (String.valueOf (tmp0))).append (tmp1)).toString ();
            return ret;
        }
    }

Data Sharing • Legend: known constructs, unknown constructs
    class Text {
        String add (int a, int b) {
            String tmp0 = new String (((new StringBuffer ("a = ")).append (a)).toString ());
            String tmp1 = new String (((new StringBuffer ("b = ")).append (b)).toString ());
            String ret = ((new StringBuffer (String.valueOf (tmp0))).append (tmp1)).toString ();
            return ret;
        }
    }

Data Sharing • Legend: unknown constructs, lifted constructs
    class Text {
        String add (int b) {
            String tmp0 = CLONE 0;
            String tmp1 = new String (((new StringBuffer ("b = ")).append (b)).toString ());
            String ret = ((new StringBuffer (String.valueOf (tmp0))).append (tmp1)).toString ();
            return ret;
        }
    }

Data Sharing • The specialized code looks like the listing below • In this example, each instance of the specialized code can actually share a single ‘local object’
    class Text {
        String add (int b) {
            String tmp0 = CLONE 0; // COPY 0 is also safe!
            String tmp1 = new String (((new StringBuffer ("b = ")).append (b)).toString ());
            String ret = ((new StringBuffer (String.valueOf (tmp0))).append (tmp1)).toString ();
            return ret;
        }
    }
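The sharing argument can be checked with a small Java sketch, assuming that a plain object stands in for the specialized code; TextSpec and specialize are hypothetical names. The string built from the static argument a is created once and reused by every call, which is safe here because java.lang.String is immutable and the method never exposes tmp0 for mutation.

```java
// Generic code, written with string concatenation for brevity.
final class Text {
    String add(int a, int b) {
        String tmp0 = "a = " + a;
        String tmp1 = "b = " + b;
        return tmp0 + tmp1;
    }
}

// Stand-in for the code specialized with respect to a.
final class TextSpec {
    private final String tmp0; // static part, built once at specialization time

    private TextSpec(String tmp0) { this.tmp0 = tmp0; }

    static TextSpec specialize(int a) {
        return new TextSpec("a = " + a);
    }

    String add(int b) {
        String tmp1 = "b = " + b; // dynamic part, rebuilt on every call
        return tmp0 + tmp1;       // tmp0 is shared (COPY), not cloned
    }
}
```

Had tmp0 been a mutable object modified by the method body, sharing would no longer be safe and a per-run clone would be required.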

Data Sharing • Cloning strategy extension: • * is also a good candidate for stack allocation optimizations

Data Splitting • List search
    class SearchList {
        int method (int n, int data, int key) {
            List list = new List ();
            list.key = 0;
            list.data = data++;
            List ptr = list;
            for (int i = 1; i < n; i++) {
                ptr.next = new List ();
                ptr = ptr.next;
                ptr.key = i;
                ptr.data = data++;
            }
            for (ptr = list; ptr.key != key; ptr = ptr.next)
                ;
            return ptr.data;
        }
    }

Data Splitting • Specialized method:
    class SearchList_Templ {
        int method (int data) {
            (CLONE 0).data = data++;
            …
            (CLONE n-1).data = data++;
            return (COPY key).data;
        }
    }

Data Splitting • Optimized specialized method:
    class SearchList_Templ {
        int method (int data) {
            int data_0, …, data_n-1;
            data_0 = data++;
            …
            data_n-1 = data++;
            return data_key;
        }
    }
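To make the optimized form concrete, here is a minimal Java sketch for one fixed choice of the static arguments, n = 3 and key = 1, which are values assumed for illustration. ListCell replaces the List class of the slides to avoid confusion with java.util.List, and SearchListSpecN3Key1 is a hypothetical, hand-written residual method.

```java
final class ListCell {
    int key;
    int data;
    ListCell next;
}

// Generic code, following the list search of the slides.
final class SearchList {
    int method(int n, int data, int key) {
        ListCell list = new ListCell();
        list.key = 0;
        list.data = data++;
        ListCell ptr = list;
        for (int i = 1; i < n; i++) {
            ptr.next = new ListCell();
            ptr = ptr.next;
            ptr.key = i;
            ptr.data = data++;
        }
        for (ptr = list; ptr.key != key; ptr = ptr.next) {
            // search for the cell holding the requested key
        }
        return ptr.data;
    }
}

// Residual code for n = 3, key = 1 after data splitting: the list cells are
// gone and each dynamic data field has become a local variable.
final class SearchListSpecN3Key1 {
    int method(int data) {
        int data0 = data++;
        int data1 = data++;
        int data2 = data++;
        return data1;
    }
}
```

The list structure and the search loop depend only on the static arguments, so they disappear entirely; only the dynamic data fields survive, as scalars that a compiler can keep on the stack.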

Experiments • Ray tracer: w.r.t. the objects in the scene (500x500, 51 objects, depth 4, primary rays only)

Experiments • Ray tracer: w.r.t. the objects in the scene and the observer position

Experiments • Mandelbrot set: w.r.t. z**5 + c

Conclusion • Closest related work: • Jspec (Harissa + Tempo + Assirah) • What has been done so far: • Support for previously discussed points • Miscellaneous fixes • Partial performance evaluation • Code soiling 

TODO • Evaluate the cloning strategy w.r.t. enabled optimizations • Mechanism to handle global variables • Mechanism to enable selective inlining • In particular, do not inline methods that access private fields • Reaching-definition analysis vs side-effect analysis • A third binding-time for partial evaluation of all partially static objects • Reuse someone else’s static analyses
