1. Intraprocedural Optimizations Jonathan Bachrach MIT AI Lab

2. Outline
• Goal: eliminate abstraction overhead using static analysis and program transformation
• Topics:
  • Intraprocedural type inference
  • Static method selection
  • Specialization and inlining
  • Static class prediction
  • Splitting
  • Box/unboxing
  • Common subexpression elimination
  • Overflow and range checks
  • Partial evaluation revisited
• Partially based on: Chambers’ “Efficient Implementation of Object-oriented Programming Languages” OOPSLA Tutorial

3. Running Example
    (dg + ((x <num>) (y <num>) => <num>))
    (dm + ((x <int>) (y <int>) => <int>)
      (%ib (%i+ (%iu x) (%iu y))))
    (dm + ((x <flo>) (y <flo>) => <flo>)
      (%fb (%f+ (%fu x) (%fu y))))
    (dm x2 ((x <num>) => <num>)
      (+ x x))
    (dm x2 ((x <int>) => <int>)
      (+ x x))
• Anatomy of Pure Proto Arithmetic: dispatch, boxing, overflow checks, actual instruction
• C Arithmetic: actual instruction

4. Biggest Inefficiencies
• Method dispatch
• Method calls
• Boxing
• Type checks
• Overflow and range checks
• Slot access
• Object creation

5. Intraprocedural Type Inference
• Goal: determine concrete class(es) of each variable and expression
• Standard data flow analysis through control graph
• Propagate bindings b -> { class … }
• Sources are literals, isa expressions, results of some primitives, and type declarations
• Form unions of bindings at merge points
• Narrow sets after typecases
• Assumes closed world (or at least final classes)

6. Type Inference Example
    (set x (isa <tab> …))              ;; x in { <tab> }
    (set y (table-growth-factor x))    ;; y in { <int> <flo> }
    (set z (if t x y))                 ;; z in { <tab> <int> <flo> }
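The propagation in this example can be sketched in Python. This is only an illustration under stated assumptions: programs are encoded as nested tuples, the statement forms are limited to `set` and a two-armed `if`, and the names `RESULT_CLASSES`, `infer`, and `infer_block` are hypothetical rather than part of Proto.

```python
# Hypothetical table of primitive result classes; a real compiler would
# derive this from declarations rather than a hard-coded dict.
RESULT_CLASSES = {"table-growth-factor": frozenset({"<int>", "<flo>"})}

def infer(expr, env):
    """Return the set of classes an expression may evaluate to."""
    if isinstance(expr, str):          # variable reference: look up its binding
        return env[expr]
    op = expr[0]
    if op == "isa":                    # (isa <class> ...) yields exactly that class
        return frozenset({expr[1]})
    if op == "if":                     # merge point: union of both arms
        return infer(expr[2], env) | infer(expr[3], env)
    return RESULT_CLASSES[op]          # primitive with known result classes

def infer_block(stmts):
    """Propagate bindings through a straight-line list of (set var expr) forms."""
    env = {}
    for (_set, var, expr) in stmts:
        env[var] = infer(expr, env)
    return env

program = [
    ("set", "x", ("isa", "<tab>")),
    ("set", "y", ("table-growth-factor", "x")),
    ("set", "z", ("if", "t", "x", "y")),
]
env = infer_block(program)
```

The sketch handles only straight-line code; loops would need iteration to a fixed point over the control graph.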

7. Narrowing Type Precision
    (if (isa? x <int>)
        (+ x 1)
        (+ x 37.0))
    ==>
    (if (isa? x <int>)
        (let (([x <int>] x)) (+ x 1))
        (let (([x !<int>] x)) (+ x 37.0)))

8. Static Method Selection
    (set x (isa <tab> …))              ;; x in { <tab> }
    (set y (table-growth-factor x))    ;; y in { <int> <flo> }
    (print out y)
• If only one class is statically possible, dispatch can be performed statically:
    (set y (<tab>:table-growth-factor x))
• If a few classes are statically possible, a typecase can be inserted:
    (sel (class-of y)
      ((<int>) (<int>:print out y))
      ((<flo>) (<flo>:print out y)))
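A rough Python sketch of this decision, assuming calls are encoded as tuples and that dispatch happens on a single argument position. The function `select`, the cutoff `MAX_CASES`, and the `<class>:generic` naming of statically bound methods are illustrative assumptions.

```python
MAX_CASES = 3  # assumed cutoff for "a few classes"; a tuning knob, not from the slides

def select(generic, args, pos, classes):
    """Rewrite a call to `generic`, dispatching on args[pos], given its inferred class set."""
    ordered = sorted(classes)
    if len(ordered) == 1:
        # Exactly one possible class: bind the method at compile time.
        return (f"{ordered[0]}:{generic}",) + tuple(args)
    if 1 < len(ordered) <= MAX_CASES:
        # A few possible classes: emit a typecase over the class of the argument.
        arms = tuple(((c,), (f"{c}:{generic}",) + tuple(args)) for c in ordered)
        return ("sel", ("class-of", args[pos])) + arms
    # Unknown or too many candidates: keep the dynamically dispatched call.
    return (generic,) + tuple(args)

static_call = select("table-growth-factor", ("x",), 0, {"<tab>"})
typecase = select("print", ("out", "y"), 1, {"<int>", "<flo>"})
```

Sorting the classes only makes the output deterministic; a real compiler would more likely order the arms by expected frequency.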

9. Type Check Removal
• Type inference can be used to remove type checks and casts
    (set x (isa <tab> …))    ;; x in { <tab> }
    (if (isa? x <tab>) (go) (stop))
    ==>
    (set x (isa <tab> …))    ;; x in { <tab> }
    (go)

10. Intraprocedural Type Inference Critique
• Pros:
  • Simple
  • Fast
  • Fewer dependents
• Cons:
  • Limited type precision
  • No result types
  • Incoming arg types
  • No slot types
  • Etc.

11. Specialization
• Q: How can we improve intraprocedural type inference precision?
• A: Specialization, which is the cloning of methods with narrowed argument types
• Improves type precision of callee by contextualizing body:
    (dm sqr ((x <num>) (y <num>)) (* x y))
    ==>
    (dm sqr ((x <int>) (y <int>)) (* x y))
    (dm sqr ((x <flo>) (y <flo>)) (* x y))
• Must make sure super calls still mean the same thing

12. Specialization of Constructors
• Crucial to get object creation to be fast
• Specialization can be used to build custom constructors
    (def <thingy> (isa <any>))
    (slot <thingy> thingy-x 0)
    (slot (t <thingy>) thingy-tracker (+ (thingy-x t) 1))
    (slot <thingy> thingy-cache (fab <tab>))
    (df thingy-isa (x tracker cache)
      (let ((thingy (clone <thingy>)))
        (unless (== x nul)
          (set (%slot-value thingy thingy-x) x))
        (set (%slot-value thingy thingy-tracker)
             (if (== tracker nul) (+ (thingy-x thingy) 1) tracker))
        (set (%slot-value thingy thingy-cache)
             (if (== cache nul) (fab <tab>) cache))))

13. Inlining
• Q: Can we do better?
• A: Inlining can improve on specialization by inserting the specialized body
• Improves type precision at call-site by contextualizing body (includes result types):
    (dm f ((x <int>) (y <int>)) (+ (g x y) 1))
    (dm g (x y) (+ x y))
    ==>
    (dm f ((x <int>) (y <int>)) (+ (+ x y) 1))
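A minimal Python sketch of inlining by substitution, assuming tuple-encoded expressions and a hypothetical `methods` table that maps a name to its parameters and body. Plain substitution is only safe when the arguments are variables or side-effect-free expressions and the inlined methods are not recursive; the sketch assumes both.

```python
def substitute(body, env):
    """Replace parameter names in body with the corresponding argument expressions."""
    if isinstance(body, tuple):
        return (body[0],) + tuple(substitute(e, env) for e in body[1:])
    return env.get(body, body)

def inline(expr, methods):
    """Inline every call whose head names a method in `methods` (non-recursive only)."""
    if not isinstance(expr, tuple):
        return expr
    head = expr[0]
    args = tuple(inline(a, methods) for a in expr[1:])
    if head in methods:
        params, body = methods[head]
        # All parameters are substituted at once, so arguments cannot
        # be mistaken for parameters of the callee.
        return substitute(body, dict(zip(params, args)))
    return (head,) + args

methods = {"g": (("x", "y"), ("+", "x", "y"))}
f_body = inline(("+", ("g", "x", "y"), 1), methods)
```

Once the body of `g` is visible inside `f`, the declared `<int>` types of `x` and `y` apply to the inner `+` as well, which is what makes later method selection possible.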

14. Synergy: Method Selection + Inlining
    (df f ((x <int>) (y <int>)) (+ x y))
    ;; method selection
    (df f ((x <int>) (y <int>)) (<int>:+ x y))
    ;; inlining
    (df f ((x <int>) (y <int>)) (%ib (%i+ (%iu x) (%iu y))))

15. Pitfalls of Inlining and Specialization
• Must control inlining and specialization carefully to avoid code bloat
• Inlining can be driven purely by syntactic size, trying never to increase size over the original call
• Class-centric specialization usually works by copying down inherited methods and tightening up self references (harder for multimethods)
• Can run inlining/specialization trials based on
  • Final static size
  • Performance feedback

16. Class Centric Specialization
    (def <point> (isa <any>))
    (slot <point> (point-x <int>) 0)
    (dm point-move ((p <point>) (offset <num>))
      (set (point-x p) (+ (point-x p) offset)))
    (def <color-point> (isa <point>))
    ==>
    (dm point-move ((p <color-point>) (offset <num>))
      (set (point-x p) (+ (point-x p) offset)))

17. Static Class Prediction
• Can improve type precision in cases where, for a given generic, a particular method is much more frequent
• Insert type check testing prediction
• Can narrow type precision along then and else branches
• Especially useful in combination with inlining

18. Static Class Prediction Example
    (df f (x)
      (let ((y (+ x 1)))
        (+ y 2)))
    ==>
    (df f (x)
      (let ((y (if (isa? x <int>) (+ x 1) (+ x 1))))
        (if (isa? y <int>) (+ y 2) (+ y 2))))
    ==>
    (df f (x)
      (let ((y (if (isa? x <int>) (<int>:+ x 1) (+ x 1))))
        (if (isa? y <int>) (<int>:+ y 2) (+ y 2))))

19. Synergy: Class Prediction + Method Selection + Inlining
    (df f (x)
      (let ((y (if (isa? x <int>) (+ x 1) (+ x 1))))
        (if (isa? y <int>) (+ y 2) (+ y 2))))
    ;; method selection
    (df f (x)
      (let ((y (if (isa? x <int>) (<int>:+ x 1) (+ x 1))))
        (if (isa? y <int>) (<int>:+ y 2) (+ y 2))))
    ;; inlining
    (df f (x)
      (let ((y (if (isa? x <int>) (%ib (%i+ (%iu x) %1)) (+ x 1))))
        (if (isa? y <int>) (%ib (%i+ (%iu y) %2)) (+ y 2))))

20. Splitting
• Problem: class prediction often leads to a bunch of redundant type tests
• Solution: split off whole sections of the graph specialized to a particular class of a variable
• Can split off entire loops
• Can specialize on other dataflow information

21. Splitting Example
    (df f (x)
      (let ((y (+ x 1)))
        (+ y 2)))
    ==>
    (df f (x)
      (if (isa? x <int>)
          (let ((y (+ x 1))) (+ y 2))
          (let ((y (+ x 1))) (+ y 2))))
    ==>
    (df f (x)
      (if (isa? x <int>)
          (let ((y (<int>:+ x 1))) (<int>:+ y 2))
          (let ((y (+ x 1))) (+ y 2))))

22. Splitting Downside
• Splitting can also lead to code bloat
• Must be intelligent about what to split
  • A priori knowledge (e.g., integers most frequent)
  • Actual performance

23. Box / Unboxing
    (df + ((x <int>) (y <int>) => <int>)
      (%ib (%i+ (%iu x) (%iu y))))
    (df f ((a <int>) (b <int>) => <int>)
      (+ (+ a b) a))
    ;; inlining +
    (df f ((a <int>) (b <int>) => <int>)
      (%ib (%i+ (%iu (%ib (%i+ (%iu a) (%iu b)))) (%iu a))))
    ;; remove box/unbox pair
    (df f ((a <int>) (b <int>) => <int>)
      (%ib (%i+ (%i+ (%iu a) (%iu b)) (%iu a))))
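The last step can be sketched as a bottom-up rewrite in Python over tuple-encoded expressions. The function name `remove_box_unbox` is hypothetical, and the rewrite assumes that boxing followed immediately by unboxing has no observable effect of its own, such as an overflow check that must be kept.

```python
def remove_box_unbox(expr):
    """Rewrite every (%iu (%ib e)) to e, working from the leaves upward."""
    if not isinstance(expr, tuple):
        return expr
    expr = tuple(remove_box_unbox(e) for e in expr)
    is_unbox = len(expr) == 2 and expr[0] == "%iu"
    if is_unbox and isinstance(expr[1], tuple) and len(expr[1]) == 2 and expr[1][0] == "%ib":
        return expr[1][1]          # unboxing a freshly boxed value: keep the raw value
    return expr

inlined = ("%ib", ("%i+",
                   ("%iu", ("%ib", ("%i+", ("%iu", "a"), ("%iu", "b")))),
                   ("%iu", "a")))
optimized = remove_box_unbox(inlined)
```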

24. Synergy: Splitting + Method Selection + Inlining + Box/Unboxing
    (df f (x)
      (if (isa? x <int>)
          (let ((y (+ x 1))) (+ y 2))
          (let ((y (+ x 1))) (+ y 2))))
    ;; method selection
    (df f (x)
      (if (isa? x <int>)
          (let ((y (<int>:+ x 1))) (<int>:+ y 2))
          (let ((y (+ x 1))) (+ y 2))))
    (df f (x)
      (if (isa? x <int>)
          (<int>:+ (<int>:+ x 1) 2)
          (let ((y (+ x 1))) (+ y 2))))
    ;; inlining
    (df f (x)
      (if (isa? x <int>)
          (%ib (%i+ (%iu (%ib (%i+ (%iu x) %1))) %2))
          (let ((y (+ x 1))) (+ y 2))))
    ;; box/unbox
    (df f (x)
      (if (isa? x <int>)
          (%ib (%i+ (%i+ (%iu x) %1) %2))
          (let ((y (+ x 1))) (+ y 2))))

25. Common Subexpression Elimination (CSE)
• Removes redundant computations
  • Constant slot or binding access
  • Stateless/side-effect-free function calls
• Examples
    (or (elt (cache x) 'a) (elt (cache x) 'b))
    ==>
    (let ((t (cache x)))
      (or (elt t 'a) (elt t 'b)))
    (if (< i 0)
        (if (< i 0) (go) (putz))
        (dance))
    ==>
    (if (< i 0) (go) (dance))
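A small Python sketch of the first example, assuming tuple-encoded expressions and a hypothetical whitelist `PURE` of side-effect-free functions. The temporaries `t0`, `t1`, … are assumed not to clash with user variables, and the sketch ignores evaluation-order questions such as the short-circuiting of `or`.

```python
PURE = {"cache"}  # assumed whitelist of stateless, side-effect-free functions

def is_pure(expr):
    """A call is pure if its head is whitelisted and all of its arguments are pure."""
    if not isinstance(expr, tuple):
        return True
    return expr[0] in PURE and all(is_pure(e) for e in expr[1:])

def subexprs(expr):
    """Yield every call expression inside expr, outermost first."""
    if isinstance(expr, tuple):
        yield expr
        for e in expr[1:]:
            yield from subexprs(e)

def replace(expr, target, name):
    """Replace every occurrence of target in expr with the variable name."""
    if expr == target:
        return name
    if isinstance(expr, tuple):
        return tuple(replace(e, target, name) for e in expr)
    return expr

def cse(expr):
    """Bind each repeated pure call to a temporary and reuse it."""
    bindings = []
    candidates = list(dict.fromkeys(e for e in subexprs(expr) if is_pure(e)))
    for sub in candidates:
        # Recount on the current expression: an earlier rewrite may have removed it.
        if sum(1 for e in subexprs(expr) if e == sub) > 1:
            name = f"t{len(bindings)}"
            expr = replace(expr, sub, name)
            bindings.append((name, sub))
    return ("let", tuple(bindings), expr) if bindings else expr

before = ("or", ("elt", ("cache", "x"), "'a"), ("elt", ("cache", "x"), "'b"))
after = cse(before)
```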

26. Overflow and Bounds Checks aka “Moon Challenge”
• Goal:
  • Support mathematical integers and bounds-checked collection access
  • Eliminate bounds and overflow checks
• Strategy:
  • Assume most integer arithmetic and collection accesses occur in a restricted loop context where the range can be readily inferred
  • Perform range analysis to remove checks
    • Bound variables from above by size of collection
    • Bound variables from below by zero
    • Induction step is 1+

27. Range Check Example
    (rep (((sum <int>) 0) ((i <int>) 0))
      (if (< i (len v))
          (let ((e (elt v i)))
            (rep (+ sum e) (+ i 1)))
          sum))
    ;; inlining bounds checks
    (rep (((sum <int>) 0) ((i <int>) 0))
      (if (< i (len v))
          (let ((e (if (or (< i 0) (>= i (len v)))
                       (sig ...)
                       (vref v i))))
            (rep (+ sum e) (+ i 1)))
          sum))
    ;; CSE
    (rep (((sum <int>) 0) ((i <int>) 0))
      (if (< i (len v))
          (let ((e (if (< i 0)
                       (sig ...)
                       (vref v i))))
            (rep (+ sum e) (+ i 1)))
          sum))
    ;; range analysis
    (rep (((sum <int>) 0) ((i <int>) 0))
      (if (< i (len v))
          (let ((e (vref v i)))
            (rep (+ sum e) (+ i 1)))
          sum))
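The reasoning that discharges both checks can be sketched in Python. This is a simplified illustration, not the analysis of any particular compiler: the `Range` record, `loop_var_range`, and `check_needed` are hypothetical names, and the sketch assumes a constant positive step, mathematical (non-wrapping) integers, and a collection whose length does not change inside the loop.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    lo: int            # inclusive numeric lower bound
    below: frozenset   # symbolic terms the variable is known to be strictly below

def loop_var_range(init, step, guard_upper):
    """Range of an induction variable inside the loop body.

    init: constant start value; step: constant increment, must be positive;
    guard_upper: the term s from the loop guard (< i s).
    """
    if step <= 0:
        raise ValueError("sketch only handles upward-counting loops")
    # By induction the variable never drops below init, and the guard
    # bounds it from above on entry to the body.
    return Range(lo=init, below=frozenset({guard_upper}))

def check_needed(rng, check):
    """check is ('<', constant) for i < constant, or ('>=', term) for i >= term."""
    op, rhs = check
    if op == "<" and rng.lo >= rhs:       # i >= lo >= rhs, so i < rhs cannot hold
        return False
    if op == ">=" and rhs in rng.below:   # i < rhs is known, so i >= rhs cannot hold
        return False
    return True

i_range = loop_var_range(init=0, step=1, guard_upper="(len v)")
lower_check = check_needed(i_range, ("<", 0))
upper_check = check_needed(i_range, (">=", "(len v)"))
```

In the slide's pipeline the upper check is removed by CSE against the loop guard and the lower check by range analysis; the sketch folds both into one query for brevity.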

28. Overflow Check Removal aka “Moon Challenge” Critique
• Pros:
  • Simple analysis
• Cons:
  • Could miss a number of cases
  • But then previous approaches (e.g., box/unbox) could be applied

29. Advanced Topic: Representation Selection
• Embed objects in others to remove indirections
• Change object representation over time
• Use minimum number of bits to represent enums
• Pack fields in objects

30. Advanced Topic: Algorithm Selection
• Goal: compiler determines that one algorithm is more appropriate for given data
  • Sorted data
  • Biased data
• Solution:
  • Embed statistics gathering in runtime
  • Add guards to code and split

31. Rule-based Compilation
• First millennium compilers were based on special rules for
  • Method selection
  • Pattern matching
  • Oft-used system functions like format
• Problems
  • Error prone
  • Don’t generalize to user code
• Challenge
  • Minimize number of rules
  • Competitive compiler speed
  • Produce competitive code

32. Partial Evaluation to the Rescue
• Holy grail idea:
  • Optimizations are manifest in code
  • Do previous optimizations with only p.e.
• Simplify compiler based on limited moves
  • Static eval and folding
  • Inlining
• Eliminate
  • Custom method selection
  • Custom constructor optimization
  • Etc.

33. Partial Eval Example
    (dm format (port msg (args …))
      (rep nxt ((i 0) (ai 0))
        (when (< i (len msg))
          (let ((c (elt msg i)))
            (if (= c #\%)
                (seq (print port (elt args ai))
                     (nxt (+ i 1) (+ ai 1)))
                (seq (write port c)
                     (nxt (+ i 1) ai)))))))
    (format out "%> " n)
• First millennium solution is to have a custom optimizer for format
    (seq (print port n) (write port "> "))
• Second millennium solution with partial evaluation
    (nxt 0 0)
    (seq (print port n) (nxt 1 1))
    (seq (print port n) (seq (write port #\>) (nxt 2 1)))
    (seq (print port n) (seq (write port #\>) (seq (write port #\space))))
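The residual code a partial evaluator would reach for this call can be reproduced with a hand-written Python sketch. It is not a general partial evaluator: `specialize_format` is a hypothetical function that runs the `format` loop at compile time over a statically known `msg` and emits only the operations that depend on the dynamic `port` and arguments.

```python
def specialize_format(msg, arg_names):
    """Unroll the format loop for a known msg, leaving only residual port operations."""
    residual = []
    ai = 0                                  # index of the next dynamic argument
    for c in msg:                           # msg is static, so this loop runs at compile time
        if c == "%":
            residual.append(("print", "port", arg_names[ai]))
            ai += 1
        else:
            residual.append(("write", "port", c))
    return ("seq",) + tuple(residual)

residual_code = specialize_format("%> ", ["n"])
```

The result is a flat `seq` rather than the nested one in the trace, and it writes the two trailing characters one at a time, so it matches the partial evaluation trace rather than the hand-tuned `(write port "> ")`.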

34. Partial Eval Challenge
• Inlining and static eval are slow
  • “Running” code through inlining
  • Need to compile oft-used optimizations
• Residual code is not necessarily efficient
  • Sometimes algorithmic change is necessary for optimal efficiency
  • Example: method selection uses class numbering and decision tree whereas straightforward code does naïve method sorting
• Perhaps there is a middle ground

35. Open Problems
• Automatic inlining, splitting, and specialization
• Efficient mathematical integers
• Constant determination
• Representation selection
• Algorithmic selection
• Efficient partial evaluation
• Super compiler that runs for days

36. Reading List
• Chambers: “Efficient Implementation of Object-oriented Programming Languages” OOPSLA Tutorial
• Chambers and Ungar: SELF papers
• Chambers et al.: Vortex papers