1 / 50

CSE P501 – Compiler Construction

CSE P501 – Compiler Construction. Optimization: Goals Eliminating Redundancy & Value N umbering Control flow graphs (CFGs ) Optimization Scopes: local, regional, global, whole-program Dominators See Cooper&Torczon chapter 8, 9. Code Improvement. Pick a better algorithm(!)

pakuna
Download Presentation

CSE P501 – Compiler Construction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSE P501 – Compiler Construction Optimization: Goals Eliminating Redundancy & Value Numbering Control flow graphs (CFGs) Optimization Scopes: local, regional, global, whole-program Dominators See Cooper&Torczon chapter 8, 9 Jim Hogg - UW - CSE - P501

  2. Code Improvement • Pick a better algorithm(!) • Eg: Quicksort rather than BubbleSort • Compilers ain't smart enough to choose better algorithms • Use machine resources effectively • Instruction selection & scheduling • Register allocation • More about these later… Jim Hogg - UW - CSE - P501

  3. “Optimization” • No optimizations are actually “optimal” • Best we can do is to improve things • Most (much?) (some?) of the time • Realistically: try to do better for common idioms both in the code and on the machine • Never to make it worse! Jim Hogg - UW - CSE - P501

  4. Example Optimization • Classic example: Array references in a loop • for (k = 0; k < n; k++) a[k] = 0; • Naive codegen for a[k] = 0 in loop body • moveax, 4// elemsize = 4 bytes • imuleax, [ebp+offsetk]// k * elemsize • add eax, [ebp+offseta]// &a[0] + k * elemsize • mov [eax], 0 // a[k] = 0 • Better! • moveax, [ebp+offseta] // &a[0], once-off • mov [eax], 0 // a[0]=0, a[1]=0, etc • add eax, 4 // eax = &a[1], &a[2], etc Note: pointers allow a user to do this directly in C or C++ Eg: for (p = a; p < a + n; ) *p++ = 0; Jim Hogg - UW - CSE - P501

  5. Example Optimization: A[row, col] column row-major layout row int A[row, col] = @A + (row * numcols + col) * 4 // 0-based A[row, col] = @A + (row – rowmin) * (colmax – colmin + 1) * elemsize + (col – colmin) * elemsize Jim Hogg - UW - CSE - P501

  6. Optimize A[,] = 0 for (i = 0; i < ncols; i++) { for (j = 0; j < nrows; j++) { a[i, j] = 0; } } • A[i, j] = @A + (i* ncols+ j) * 4 • Naive codegen for A[i, j] = 0 • moveax, [ebp+offseti] • imuleax, [ebp+offsetncols] • add eax, [ebp+offsetj] • imuleax, 4 • lea edx, [ebp+offsetA] • add eax, edx • mov [eax], 0 • Better! • lea eax, [ebp+offsetA] // &A[0], once-off • mov [eax], 0 // A[i, j] = 0 • add eax, 4 • Strength Reduction • C programmers can do this with pointers Jim Hogg - UW - CSE - P501

  7. Optimization Phase • Goal • Discover, at compile time, information about the runtime behavior of the program, and use that information to improve the generated code Jim Hogg - UW - CSE - P501

  8. Common Problems in Optimization • Strategy is typical of most compiler optimizations • Discover opportunities through program analysis • Modify the IR to take advantage of the opportunities • Historically, goal usually was to decrease execution time • Other possibilities: reduce space, power, … Jim Hogg - UW - CSE - P501

  9. Issues (1) • Safety – transformation must not change program meaning • Must generate correct results • Can’t generate spurious errors (ie, report error when there is none) • Optimizations must be conservative, and always correct • never 'take a risk': 99% != 100% Eg: • fetch &o.mand remember m's address • if call to m then call remembered m, so long as obj.vtbl == o.vtbl • Large part of analysis goes towards proving safety • However, it’s not black&white. Eg: • hoist bounds-check • elide exception thru DCE Jim Hogg - UW - CSE - P501

  10. Issues (2) • Profitability • If a transformation is possible, is it profitable? • Example: loop unrolling • Can increase efficiency by reducing loop overhead (fewer tests & branches) • Can eliminate duplicate operations done on separate iterations • But, increases code size Jim Hogg - UW - CSE - P501

  11. Issues (3) • Downside risks • Even if a transformation is worthwhile, consider its side-effects • Eg: • Transformation might need more temporaries, increasing register pressure • Increased code size could cause cache misses • Increased code size could increase paging (a disaster) Jim Hogg - UW - CSE - P501

  12. Eliminating Redundancy • An expression x+y is redundant at a program point iff: • along every path from the procedure’s entry, it has been evaluated and its constituent sub-expressions (x and y) have not been re-defined (ie, assigned new values) since that evaluation • If the compiler can prove the expression is redundant • Can store the result of the earlier evaluation: t123 = x + y; • Can replace the redundant computation with a reference to the earlier (stored) result, as in = t123 ... Jim Hogg - UW - CSE - P501

  13. Value Numbering • Technique for eliminating redundant expressions • Assign an identifying number VN(n) to each expression • VN(x+y) == VN(j) if x+y and j have the same value • Use hashing over value numbers for efficiency • But hashing is not crucial to the algorithm • Old idea (Balke 1968, Ershov 1954) • Invented for low-level, linear IRs • Equivalent methods exist for tree IRs. Eg: build a DAG Jim Hogg - UW - CSE - P501

  14. Redundancies? Basic Block a = x * y b = x * y a = 7 c = x * y x = 3 d = x * y 1 2 3 4 5 6 • Are any of these IR instructions "redundant"? • In 2, can I re-use a in place of b? • In 4, can I re-use a in place of c? • In 6, can I re-use a in place of d? Jim Hogg - UW - CSE - P501

  15. Local Value Numbering (LVN) a3 = x1 * y2 b3 = x1 * y2 a4 = 74 c3 = x1 * y2 x5 = 35 d6 = x5 * y2 • VN of a variable is not the actual value of that variable at runtime! • Two variables with same VN are guaranteed to have the same value at runtime • Two variables with different VN may have same value at runtime, by happenstance, but it's not guaranteed Jim Hogg - UW - CSE - P501

  16. Transform Code Remove Redundancies VN'd a3 = x1 * y2 b3 = x1 * y2 a4 = 74 c3 = x1 * y2 x5 = 35 d6 = x5 * y2 a3 = x1 * y2 b3= a3 a4 = 74 c3 = a3 x5 = 35 d6 = x5 * y2 • Oops!a3 was overwritten Jim Hogg - UW - CSE - P501

  17. Solution- SSA • Could remember a3 into a temporary • But realize that a4 is really a different variable from a3 • Eg: we could store a3 in register R7 and a4 in a different register R2; they are totally unrelated • Use subscript to denote different instances of a variable • Bump the lifetime subscript each time we assign into that variable • This is the essence of SSA form - see later a03 = x01 * y02 b03= a03 a14 = 74 c03 = a03 x15 = 35 d06 = x15 * y02 • Fixed: a03 still alive

  18. Extensions to LVN • Constant Folding • Note whether each operand is a constant • Evaluate any arithmetic of those constants at compile-time • Replace instruction with a load-immediate (eg: moveax, 42) • Algebraic Identities • x+0 = x, x - x = 0, etc, etc • Humans don't write these - but may be created by index calculation, etc • Commutativity • x8 = a5 + b7 • y9 = b7 + a5 • Recognize a+b = b+a; eg: order by increasing VNs a03 = x01 * y02 b03= a03 a14 = 74 c13 = a03 x15 = 35 d06 = x15 * y02

  19. LVN in Detail (1) Basic Block g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v 1 2 3 4 5 6 7 8 • Are any of these IR instructions "redundant"? • In 3 can I re-use already-calculated value of g? • What about 5 - can I replace x with a re-use of h from 2? • Any others? Jim Hogg - UW - CSE - P501

  20. LVN in Detail (2) Expression ValueNumber 1 2 3 3 x y <1,+,2> g Basic Block u v <4,-,5> h 4 5 6 6 g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v 1 2 3 3 x y <1,+,2> i • VN of a variable is not the actual value of that variable at runtime! • Two variables with same VN are guaranteed to have the same value at runtime • Two variables with different VN may have same value at runtime, by happenstance, but it's not guaranteed Jim Hogg CSE P 501

  21. LVN in Detail (3) Expression ValueNumber 6 2 3 3 x y <1,+,2> g g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v u v <4,-,5> h 4 7 5 6 6 1 2 3 3 x y <1,+,2> i 7 3 Jim Hogg CSE P 501

  22. LVN in Detail (4) Expression Expression ValueNumber ValueNumber 6 8 2 3 3 8 x y <1,+,2> g <7,-,5> <3,+,6> g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v 9 10 <3,+,8> u v <4,-,5> h 47 9 5 10 6 6 11 11 <9,+,10> w 1 2 3 3 x y <1,+,2> i 7 3 Jim Hogg CSE P 501

  23. LVN in Detail: Summary Exp VN g h i u v w x y 3 <1,+,2> <3,+,6> <3,+,8> <4,-,5> <7,-,5> <9,+,10> Basic Block 3 6 3 47 9 5 10 11 1 8 2 7 3 9 10 6 8 11 g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v Jim Hogg CSE P 501

  24. LVN in Detail - Superscript Notation Exp VN Basic Block g h i u v w x y <1,+,2> <3,+,6> <3,+,8> <4,-,5> <7,-,5> <9,+,10> 3 6 3 47 9 5 10 11 1 8 2 3 9 10 6 8 11 g3 = x1 + y2 h6 = u4 - v5 i3 = x1 + y2 u7 = 37 x8 = u7 – v5 u9 = g3 + h6 v10 = i3 + x8 w11 = u9 + v10 Jim Hogg CSE P 501

  25. LVN Algorithm foreachinstr in BasicBlock, with instr ≡ lhs = <rhs1,op,rhs2> lookup rhs1 in EVN-Table, giving vn1 if vn1 < 0 then {vn1 = ++V; insert <rhs1,vn1> into EVN-Table} lookup rhs2in EVN-Table, giving vn2 if vn2 < 0 then {vn2= ++V; insert <rhs2,vn2>into EVN-Table} lookup <vn1,op,vn2> in EVN-Table, giving vn if vn < 0 then insert <<vn1,op,vn2>,vn>into EVN-Table else update EVN-Table to hold <<vn1,op,vn2>,vn> next V is a sequence counter, starting at 1 BasicBlock EVN-Table g = x + y h = u - v . . . g h ... <1,+,2> <3,+,6> ... 3 6 ... 3 9 ... For efficiency, use hash table for EVN-Table Jim Hogg CSE P 501

  26. Basic Block • A basic block is a maximal length sequence of straight-line code • A branch-free sequence of instructions • Properties • Statements are executed sequentially • If any statement executes, they all do (barring exceptions) • In linear IR, first instruction of a basic block is called a leader • Procedure entry; jump targets; instruction following jump or call • Implementation • Array of quads (d = x op y) • Linked list of quads Jim Hogg - UW - CSE - P501

  27. Reminder: Flowgraph B0 B2 • A Flowgraph is: • Directed graph of Nodes and Edges • Node = Basic Block • Edge = any possible flow between nodes B3 B1 B4 B5 B6 Jim Hogg - UW - CSE - P501

  28. ENTRY Real Example Flowgraph B1 i = 1 j = 1 B2 Control Flow Graph ("CFG", again!) B3 t1 = 10 * i t2 = t1 + j t3 = 8 * t2 t4 = t3 - 88 a[t4] = 0 j = j + 1 if j <= 10 goto B3 • 3 loops total • 2 of the loops are nested B4 i = i + 1 if i <= 10 goto B2 B5 i = 1 B6 t5 = i - 1 t6 = 88 * t5 a[t6] = 1 i = i + 1 if i <= 10 goto B6 Most of the executions likely spent in loop bodies; that's where to focus efforts at optimization Jim Hogg - UW - CSE - P501 EXIT

  29. EBBs: Formal Definition • A set of blocks b1, b2, …, bn where b1 has multiple predecessors and each of the remaining blocks bi (2≤i≤n) have only bi-1 as its unique predecessor • The EBB is entered only at b1, but may have multiple exits • A single block bi can be the head of multiple EBBs (these EBBs form a tree rooted at bi) Jim Hogg - UW - CSE - P501

  30. Example Extended Basic Blocks (EBBs) B0 Intuition: Maximal straight-line sequence of Blocks, with no join points B2 EBBs cannot represent loops! B3 B1 B4 • { B0, B1 } • { B0, B2, B3 } • { B0, B2, B4 } • { B5 } • { B6 } B5 B6 Jim Hogg - UW - CSE - P501

  31. Optimization Scopes • Local • Within a Basic Block • Regional • Bigger than block, smaller than function • Eg: loop or Extended Basic Block (EBB) • Global • Whole function • Whole Program • Spans two or more functions • Maybe large part of a program (excludes libraries) • BasicBlock< Region < Function < Program Jim Hogg - UW - CSE - P501

  32. Local Optimization • Local optimizations ≡ within a Basic Block • Algebraic simplifications (eg: 2*x = x+x; x+0 = x) • Constant folding (eg: FeetPerMile/MinutesPerHour = 88) • Common sub-expression elimination (CSE) • Dead code elimination (DCE) (unreachable or unused) • Specialize computation based on context • Etc Jim Hogg - UW - CSE - P501

  33. Regional Optimization • Bigger than a Basic-Block; less than whole function • Eg: Loops • Hoist invariant code out of loop • Unroll-and-jam (unroll outer loop; merge inners) • Loop Interchange, for better caching • Array-index strength reduction • Eg: Extended-Basic-Blocks (EBBs) - no-join paths • Super-Local Value Numbering (SVN) • Etc EBB Jim Hogg - UW - CSE - P501

  34. Global Optimization • Misnomer - really means whole-function • Global common sub-expression elimination (CSE) • Register allocation • Choose which variables, for how long, are enregistered • Choose which registers - not so important with 'orthogonal' ISAs • Result memoization / caching • Remember last 50, say, results calculated; re-use where possible • Eg: can speed a naive Fibonacci calculation 10x or more • Etc Jim Hogg - UW - CSE - P501

  35. Whole-Program Optimization (WPO) • a.k.a Inter-Procedural Optimization (IPO) • Spans all, or much, of a program; certainly >1 function • At-odds with "separate compilation" • Inlining (assisted by profile 'training') • Hot/Cold code separation • Whole-Program DCE, or "tree shaking" • No program uses an entire library! • Remove unused classes and/or methods • Can make drastic reductions in program size Jim Hogg - UW - CSE - P501

  36. Local Value Numbering 1 block at a time Strong local results No cross-block effects LVN finds these redundancies LVN misses these Super-Local Value Numbering (SVN) A m = a + b n = a + b B C p = c + d r = c + d q = a + b r = c + d D E e = b + 18 s = a + b u = e + f e = a + 17 t = c + d u = e + f F v = a + b w = c + d x = e + f G y = a + b z = c + d Jim Hogg - UW - CSE - P501

  37. Apply LVN to EBBs: {A,B}, {A,C,D}, {A,C,E} Value Numbering over EBBs A m = a + b n = a + b • Final info from A is initial info for B, C • Final info from C is initial for D, E B C p = c + d r = c + d q = a + b r = c + d • Avoid re-analyzing A, C • Doesn’t help with F, G D E e = b + 18 s = a + b u = e + f e = a + 17 t = c + d u = e + f F v = a + b w = c + d x = e + f • LVN • SVN • Missed G y = a + b z = c + d Jim Hogg - UW - CSE - P501

  38. SSA Name Space (from before) a03 = x01 * y02 b03 = a03 a14 = 74 c03 = a03 x15 = 35 d06 = x15 * y02 • Unique name for each definition • Subscript denotes variable's SSA number • The variable's "generation" number • c03 = a03 now works, since a1= 7 happened to a 'different' variable

  39. SSA Name Space • Two Principles • Each name is defined (assigned-to) by exactly one instruction • Each operand refers to exactly one definition • Need to deal with merge points • Add Φ functions at merge points to reconcile names • Use subscripts on variable names for uniqueness Jim Hogg - UW - CSE - P501

  40. φ Functions a3 a6 A B a7 = φ(a3, a6) C • Result of a7 = φ(a3, a6) is either a3 or a6 depending on whether we reached node C from node A or from node B • Not really a mathematical function at all Jim Hogg - UW - CSE - P501

  41. Insert φ functions at join nodes F, G Little extra cost Still does nothing for F and G SVN with Bells & Whistles A m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Jim Hogg - UW - CSE - P501

  42. Still have not helped F, G Problem: multiple predecessors Must decide what facts hold in F and in G For G, combine B & F? Merging states is expensive Fall back on what we know Larger Scopes A m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Jim Hogg - UW - CSE - P501

  43. Dominators • Definition • xdominatesyiff every path from the entry of the flowgraph to y includes x • By definition, x dominates x • Associate a Dom set with each node • | Dom(x) |≥ 1 • Many uses in analysis and transformation • Finding loops, building SSA form, code motion Jim Hogg - UW - CSE - P501

  44. Dominator Sets A m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 Dom(n) n A B C D E F G {A} {A,B} {A,C} {A,C,D} {A,C,E} {A,C,F} {A,G} D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Jim Hogg - UW - CSE - P501

  45. Dominator Tree & IDom Dom(n) IDom(n) n A A B C D E F G {A} {A,B} {A,C} {A,C,D} {A,C,E} {A,C,F} {A,G} - A A C C C A B C G D E F • Closest dominator is called the Immediate Dominator • Denoted IDom(n) Jim Hogg - UW - CSE - P501

  46. Still looking for a way to handle F, G Use info from IDom(x) to start analysis of x Use C for F and A for G Dominator VNTechnique (DVNT) Dominator Value Numbering A m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Jim Hogg - UW - CSE - P501

  47. DVN algorithm • Use SVN on EBBs • Use scoped hash tables & SSA namespace as before • Start each node with EVN-Table from its IDOM • No values flow along back edges (i.e., loops) • Constant folding, algebraic identities as before Jim Hogg - UW - CSE - P501

  48. Advantages Finds more redundancies Little extra cost Shortcomings Misses some opportunities (common calculations in ancestors that are not IDOMs) Doesn’t handle loops or other back edges Dominator Value Numbering A m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 LVN SVN DVN Missed G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Jim Hogg - UW - CSE - P501

  49. The Story So Far… • Local algorithm: LVN • Superlocal extension: SVN • Some local methods extend cleanly to superlocal scopes • Dominator VN Technique (DVN) • All of these propagate along forward edges • None are global (can't cope with loops) Jim Hogg - UW - CSE - P501

  50. Next • Dataflow analysis • Global solution to redundant expression analysis • Transformations • Catalog of transforms a compiler can do with dataflow analysis Jim Hogg - UW - CSE - P501

More Related