
Classical Optimization


gerodi


  1. Classical Optimization
     • Types of classical optimizations
       • Operation level: one operation in isolation
       • Local: optimize pairs of operations in the same basic block (with or without dataflow analysis), e.g. peephole optimization
       • Global: optimize pairs of operations spanning multiple basic blocks; dataflow analysis is required in this case, e.g. reaching definitions, UD/DU chains, or SSA form
       • Loop: optimize the loop body and nested loops

  2. Local Constant Folding
     • Goal: eliminate unnecessary operations
     • Rules:
       • X is an arithmetic operation
       • If src1(X) and src2(X) are constants, replace X with the result of applying the operation
     • Example block:
         r7 = 4 + 1
         r5 = 2 * r4
         r6 = r5 * 2
       Here X is r7 = 4 + 1, with src1(X) = 4 and src2(X) = 1.
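The folding rule can be sketched in a few lines of Python. This is only an illustration: the tuple IR `(dest, op, src1, src2)`, in which an `int` source is a literal and a `str` source is a register name, and the function name `fold_constants` are assumptions made here, not something the slides define.

```python
import operator

# Hypothetical IR: (dest, op, src1, src2); int = literal, str = register.
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(block):
    """Replace arithmetic ops whose two sources are literals by a move."""
    out = []
    for dest, op, a, b in block:
        if op in ARITH and isinstance(a, int) and isinstance(b, int):
            # Both sources are known at compile time: compute the result now.
            out.append((dest, "mov", ARITH[op](a, b), None))
        else:
            out.append((dest, op, a, b))
    return out

# The example block from the slide.
block = [("r7", "+", 4, 1), ("r5", "*", 2, "r4"), ("r6", "*", "r5", 2)]
folded = fold_constants(block)
```

Only `r7 = 4 + 1` qualifies; the other two operations each read a register and are left alone.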

  3. Local Constant Combining
     • Goal: eliminate unnecessary operations
       • The first operation often becomes dead
     • Rules:
       • Operations X and Y are in the same basic block
       • X and Y each have at least one literal src
       • Y uses dest(X)
       • None of the srcs of X have defs between X and Y (excluding Y)
     • Example block:
         r7 = 5
         r5 = 2 * r4
         r6 = r5 * 2
       After combining, r6 = r5 * 2 becomes r6 = r4 * 4.
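A rough Python sketch of constant combining follows, restricted to the case where X and Y share the same commutative opcode (`+` or `*`). That restriction, the tuple IR `(dest, op, src1, src2)`, and the helper names are assumptions for illustration; machine-integer overflow is ignored.

```python
import operator

# Hypothetical IR: (dest, op, src1, src2); int = literal, str = register.
COMBINE = {"+": operator.add, "*": operator.mul}

def split(op):
    """Return (register, literal) if op has exactly one literal src, else None."""
    _, _, a, b = op
    if isinstance(a, str) and isinstance(b, int):
        return a, b
    if isinstance(a, int) and isinstance(b, str):
        return b, a
    return None

def combine_constants(block):
    out = list(block)
    for j, y in enumerate(out):
        if y[1] not in COMBINE or split(y) is None:
            continue
        y_reg, y_lit = split(y)
        # X is the most recent def of the register that Y reads.
        for i in range(j - 1, -1, -1):
            if out[i][0] == y_reg:
                break
        else:
            continue  # no def of y_reg in this block
        x = out[i]
        if x[1] != y[1] or split(x) is None:
            continue
        x_reg, x_lit = split(x)
        # The register src of X must not be redefined from X up to Y.
        if any(out[k][0] == x_reg for k in range(i, j)):
            continue
        out[j] = (y[0], y[1], x_reg, COMBINE[y[1]](x_lit, y_lit))
    return out

block = [("r7", "mov", 5, None), ("r5", "*", 2, "r4"), ("r6", "*", "r5", 2)]
combined = combine_constants(block)
```

After the rewrite, `r5 = 2 * r4` is still present; it only disappears if a later dead code elimination pass finds no remaining use of `r5`.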

  4. Local Strength Reduction
     • Goal: replace expensive operations with cheaper ones
     • Rules (example):
       • X is a multiplication operation where src1(X) or src2(X) is a constant integer literal of the form 2^k
       • Change X to use a shift operation
       • For k = 1, an add can be used
     • Example block:
         r7 = 5
         r5 = 2 * r4
         r6 = r4 * 4
       After strength reduction, r5 = 2 * r4 becomes r5 = r4 + r4 and r6 = r4 * 4 becomes r6 = r4 << 2.
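The multiply-by-power-of-two rule might look like this in Python. The tuple IR `(dest, op, src1, src2)` and the names `power_of_two_exponent` and `reduce_strength` are hypothetical, chosen only for this sketch.

```python
# Hypothetical IR: (dest, op, src1, src2); int = literal, str = register.

def power_of_two_exponent(n):
    """Return k if n == 2**k for some k >= 1, else None."""
    if isinstance(n, int) and n > 1 and n & (n - 1) == 0:
        return n.bit_length() - 1
    return None

def reduce_strength(block):
    out = []
    for dest, op, a, b in block:
        # Pick out the register and the literal, whichever order they come in.
        reg, lit = (a, b) if isinstance(a, str) else (b, a)
        k = None
        if op == "*" and isinstance(reg, str):
            k = power_of_two_exponent(lit)
        if k is None:
            out.append((dest, op, a, b))      # rule does not apply
        elif k == 1:
            out.append((dest, "+", reg, reg)) # x * 2  ->  x + x
        else:
            out.append((dest, "<<", reg, k))  # x * 2^k  ->  x << k
    return out

block = [("r7", "mov", 5, None), ("r5", "*", 2, "r4"), ("r6", "*", "r4", 4)]
reduced = reduce_strength(block)
```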

  5. Local Constant Propagation
     • Goal: replace register uses with literals (constants) in a single basic block
     • Rules:
       • Operation X is a move to a register with src1(X) a literal
       • Operation Y uses dest(X)
       • There is no def of dest(X) between X and Y (excluding defs at X and Y)
       • Replace dest(X) in Y with src1(X)
     • Example block:
         r1 = 5
         r2 = _x
         r3 = 7
         r4 = r4 + r1
         r1 = r1 + r2
         r1 = r1 + 1
         r3 = 12
         r8 = r1 - r2
         r9 = r3 + r5
         r3 = r2 + 1
         r7 = r3 - r1
         M[r7] = 0
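A single forward pass over the block is enough to apply these rules. The sketch below assumes a hypothetical tuple IR `(dest, op, src1, src2)` and runs on a shortened version of the slide's example block (the address move `r2 = _x` and the store are left out to keep the IR simple).

```python
# Hypothetical IR: (dest, op, src1, src2); int = literal, str = register.

def propagate_constants(block):
    known = {}  # register -> literal from the most recent literal move
    out = []
    for dest, op, a, b in block:
        # Substitute literals for registers whose value is known here.
        if isinstance(a, str):
            a = known.get(a, a)
        if isinstance(b, str):
            b = known.get(b, b)
        out.append((dest, op, a, b))
        # Any def of dest invalidates what was known about it ...
        known.pop(dest, None)
        # ... unless this op is itself a move of a literal.
        if op == "mov" and isinstance(a, int):
            known[dest] = a
    return out

block = [
    ("r1", "mov", 5, None),
    ("r3", "mov", 7, None),
    ("r4", "+", "r4", "r1"),
    ("r1", "+", "r1", "r2"),
    ("r1", "+", "r1", 1),
    ("r3", "mov", 12, None),
    ("r8", "-", "r1", "r2"),
    ("r9", "+", "r3", "r5"),
]
propagated = propagate_constants(block)
```

Note that `r3 = 7` is never used before `r3 = 12` overwrites it, so the use in `r9 = r3 + r5` receives 12.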

  6. Local Common Subexpression Elimination (CSE)
     • Goal: eliminate recomputations of an expression
       • More efficient code
       • Resulting moves can get copy propagated (see later)
     • Rules:
       • Operations X and Y have the same opcode and Y follows X
       • src(X) = src(Y) for all srcs
       • For all srcs, no def of a src between X and Y (excluding Y)
       • No def of dest(X) between X and Y (excluding X and Y)
       • Replace Y with move dest(Y) = dest(X)
     • Example block:
         r1 = r2 + r3
         r4 = r4 + 1
         r1 = 6
         r6 = r2 + r3
         r2 = r1 - 1
         r5 = r4 + 1
         r7 = r2 + r3
         r5 = r1 - 1
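One way to implement the local CSE rules is to track the set of available expressions while walking the block. The sketch assumes a hypothetical tuple IR `(dest, op, src1, src2)` and matches sources in the order written, so `r2 + r3` and `r3 + r2` are treated as different expressions.

```python
# Hypothetical IR: (dest, op, src1, src2); int = literal, str = register.

def local_cse(block):
    avail = {}  # (op, src1, src2) -> register currently holding that value
    out = []
    for dest, op, a, b in block:
        key = (op, a, b)
        holder = avail.get(key) if op != "mov" else None
        if holder is not None:
            out.append((dest, "mov", holder, None))  # reuse the earlier result
        else:
            out.append((dest, op, a, b))
        # The def of dest kills expressions that read dest or are held in dest.
        avail = {k: r for k, r in avail.items()
                 if dest not in k[1:] and r != dest}
        # Record the expression unless X overwrites one of its own sources.
        if op != "mov" and holder is None and dest not in (a, b):
            avail[key] = dest
    return out

# The example block from the slide.
block = [
    ("r1", "+", "r2", "r3"),
    ("r4", "+", "r4", 1),
    ("r1", "mov", 6, None),
    ("r6", "+", "r2", "r3"),
    ("r2", "-", "r1", 1),
    ("r5", "+", "r4", 1),
    ("r7", "+", "r2", "r3"),
    ("r5", "-", "r1", 1),
]
result = local_cse(block)
```

On this block only the last operation is rewritten (to `r5 = r2`); every other candidate pair is blocked by an intervening def of a source or of dest(X).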

  7. Dead Code Elimination
     • Goal: eliminate any operation whose result is never used
     • Rules (dataflow required)
       • X is an operation with no use in its DU chain, i.e. dest(X) is not live
       • Delete X if removable (not a mem store or branch)
     • Rules too simple!
       • Misses deletion of r4, even after deleting r7, since r4 is live in the loop
       • Better is to trace UD chains backwards from "critical" operations
     • Example basic blocks (one per line; control-flow edges not shown):
         r1 = 3; r2 = 10
         r4 = r4 + 1; r7 = r1 * r4
         r3 = r3 + 1
         r2 = 0
         r3 = r2 + r1
         M[r1] = r3
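The "trace backwards from critical operations" idea can be sketched as a mark-and-sweep pass. To stay short, the sketch below is flow-insensitive: it treats every def of a register as reaching every use, which is a conservative stand-in for real UD chains. The tuple IR `(dest, op, src1, src2)`, with `dest = None` for stores, is an assumption made for this example.

```python
# Hypothetical IR: (dest, op, src1, src2); stores have dest None.
CRITICAL = {"store", "branch"}

def eliminate_dead_code(ops):
    """Keep only ops reachable backwards from critical ops (mark and sweep)."""
    defs = {}  # register -> indices of all ops that define it
    for i, op in enumerate(ops):
        if op[0] is not None:
            defs.setdefault(op[0], []).append(i)
    live = set()
    work = [i for i, op in enumerate(ops) if op[1] in CRITICAL]
    while work:
        i = work.pop()
        if i in live:
            continue
        live.add(i)
        # Everything that may define a source of a live op is live too.
        for src in ops[i][2:]:
            if isinstance(src, str):
                work.extend(defs.get(src, []))
    return [op for i, op in enumerate(ops) if i in live]

# The operations of the slide's example, flattened into one list.
ops = [
    ("r1", "mov", 3, None),
    ("r2", "mov", 10, None),
    ("r4", "+", "r4", 1),
    ("r7", "*", "r1", "r4"),
    ("r3", "+", "r3", 1),
    ("r2", "mov", 0, None),
    ("r3", "+", "r2", "r1"),
    (None, "store", "r1", "r3"),
]
kept = eliminate_dead_code(ops)
```

Because nothing critical depends on `r7` or `r4`, both operations are dropped in one pass, including the self-referencing `r4 = r4 + 1` that a liveness-only rule would keep.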

  8. Local Backward Copy Propagation
     • Goal: propagate LHS of moves backward
       • Eliminates useless moves
     • Rules (dataflow required)
       • X and Y are in the same block
       • Y is a move to a register
       • dest(X) is a register that is not live out of the block
       • Y uses dest(X)
       • dest(Y) is not used or defined between X and Y (excluding X and Y)
       • No uses of dest(X) after the first redef of dest(Y)
       • Replace src(Y) on the path from X to Y with dest(Y) and remove Y
     • Example block:
         r1 = r8 + r9
         r2 = r9 + r1
         r4 = r2
         r6 = r2 + 1
         r9 = r1
         r7 = r6
         r5 = r6 + 1
         r4 = 0
         r8 = r2 + r7

  9. Global Constant Propagation
     • Goal: globally replace register uses with literals
     • Rules (dataflow required)
       • X is a move to a register with src1(X) a literal
       • Y uses dest(X)
       • dest(X) has only one def, at X, on the UD chains to Y
       • Replace dest(X) in Y with src1(X)
     • Example basic blocks (one per line; control-flow edges not shown):
         r1 = 4; r2 = 10
         r5 = 2; r7 = r1 * r5
         r3 = r3 + r5
         r2 = 0
         r3 = r2 + r1; r6 = r7 * r4
         M[r1] = r3

  10. Global Constant Propagation with SSA
     • Goal: globally replace register uses with literals
     • Rules (high level)
       • For operation X with a register src(X)
       • Find the def of src(X) in the chain
       • If the def is a move of a literal, src(X) is constant: done
       • If the RHS of the def is an operation, including a φ-node, recurse on all srcs
       • Apply the rule for the operation to determine whether src(X) is constant
     • Note: abstract values T (top) and ⊥ (bottom) are often used to indicate unknown values
     • Example basic blocks (one per line; control-flow edges not shown):
         r1 = 4; r2 = 10
         r5 = 2; r7 = r1 * r5
         r3 = r3 + r5
         r2 = 0
         r3 = r2 + r1; r6 = r7 * r4
         M[r1] = r3
     • Exercise: compute SSA form and propagate constants
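The recursive rule can be sketched as a small evaluator over SSA definitions. Everything specific here is an assumption for illustration: the SSA names (`r1_0`, `r2_2`, ...), the dictionary encoding of defs, and the convention that `TOP` means "no information yet" while `BOTTOM` means "not a constant". The program below is a made-up fragment, not the solution to the slide's exercise, and cycles through φ-nodes are resolved pessimistically to `BOTTOM`.

```python
import operator

TOP, BOTTOM = "TOP", "BOTTOM"  # assumed lattice convention, see lead-in
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def meet(x, y):
    """Combine two lattice values arriving at a phi-node."""
    if x == TOP:
        return y
    if y == TOP:
        return x
    return x if x == y else BOTTOM

def value_of(name, defs, memo):
    """Return the constant held by an SSA name, or BOTTOM if not constant."""
    if isinstance(name, int):
        return name              # literal source
    if name in memo:
        return memo[name]
    if name not in defs:
        return BOTTOM            # value defined outside the fragment
    memo[name] = BOTTOM          # pessimistic answer if the chain loops back
    op, *srcs = defs[name]
    vals = [value_of(s, defs, memo) for s in srcs]
    if op == "phi":
        result = TOP
        for v in vals:
            result = meet(result, v)
    elif op == "mov":
        result = vals[0]
    elif all(isinstance(v, int) for v in vals):
        result = ARITH[op](*vals)
    else:
        result = BOTTOM
    memo[name] = result
    return result

# Hypothetical SSA fragment: name -> (op, srcs...)
defs = {
    "r1_0": ("mov", 4),
    "r5_0": ("mov", 2),
    "r7_0": ("*", "r1_0", "r5_0"),
    "r2_0": ("mov", 10),
    "r2_1": ("mov", 0),
    "r2_2": ("phi", "r2_0", "r2_1"),
    "r3_0": ("+", "r2_2", "r1_0"),
    "r6_0": ("*", "r7_0", "r4_0"),
    "r9_0": ("phi", "r5_0", "r5_0"),
}
memo = {}
values = {name: value_of(name, defs, memo) for name in defs}
```

A φ-node whose inputs all carry the same constant stays constant (`r9_0`), while one that merges 10 and 0 (`r2_2`) drops to `BOTTOM` and takes its users with it.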

  11. Forward Copy Propagation
     • Goal: globally propagate RHS of moves forward
       • Reduces dependence chain
       • May be possible to eliminate moves
     • Rules (dataflow required)
       • X is a move with src1(X) a register
       • Y uses dest(X)
       • dest(X) has only one def, at X, on the UD chains to Y
       • src1(X) has no def on any path from X to Y
       • Replace dest(X) in Y with src1(X)
     • Example basic blocks (one per line; control-flow edges not shown):
         r1 = r2; r3 = r4
         r6 = r3 + 1
         r2 = 0
         r5 = r2 + r3

  12. Global Common Subexpression Elimination (CSE)
     • Goal: eliminate recomputations of an expression
     • Rules:
       • X and Y have the same opcode and X dominates Y
       • src(X) = src(Y) for all srcs
       • For all srcs, no def of a src on any path between X and Y (excluding Y)
       • Insert rx = dest(X) immediately after X for new register rx
       • Replace Y with move dest(Y) = rx
     • Example basic blocks (one per line; control-flow edges not shown):
         r1 = r2 * r6; r3 = r4 / r7
         r2 = r2 + 1
         r1 = r3 * 7
         r5 = r2 * r6; r8 = r4 / r7
         r9 = r3 * 7

  13. Loop Optimizations
     • Loops are the most important target for optimization
       • Programs spend much time in loops
     • Loop optimizations
       • Invariant code removal (a.k.a. code motion)
       • Global variable migration
       • Induction variable strength reduction
       • Induction variable elimination

  14. Code Motion
     • Goal: move loop-invariant computations to the preheader
     • Rules:
       • Operation X is in a block that dominates all exit blocks
       • X is the only operation to modify dest(X) in the loop body
       • All srcs of X have no defs in any of the basic blocks in the loop body
       • Move X to the end of the preheader
       • Note 1: if one src of X is a memory load, need to check for stores in the loop body
       • Note 2: X must be movable and not cause exceptions
     • Example basic blocks (one per line; control-flow edges not shown):
         preheader: r1 = 0
         header: r4 = M[r5]; r7 = r4 * 3
         r8 = r2 + 1; r7 = r8 * r4
         r3 = r2 + 1
         r1 = r1 + r7
         M[r1] = r3
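The invariance checks can be sketched for the simplest possible case: a loop whose body is a single basic block, so the dominance rule holds trivially. That simplification, the tuple IR `(dest, op, src1, src2)`, and the example loop are all assumptions made for this sketch; it is not the slide's multi-block example. Loads are conservatively kept in the loop whenever the loop contains any store (Note 1), and the pass is not iterated.

```python
from collections import Counter

# Hypothetical IR: (dest, op, src1, src2); stores have dest None.

def hoist_invariants(preheader, body):
    """Move loop-invariant ops from a single-block loop body to the preheader."""
    defs = Counter(op[0] for op in body if op[0] is not None)
    has_store = any(op[1] == "store" for op in body)
    hoisted, kept = [], []
    for op in body:
        dest, opcode, *srcs = op
        movable = opcode not in ("store", "branch") and not (
            opcode == "load" and has_store)
        single_def = dest is not None and defs[dest] == 1
        # A register src is invariant if nothing in the loop defines it.
        srcs_invariant = all(
            not isinstance(s, str) or defs[s] == 0 for s in srcs)
        if movable and single_def and srcs_invariant:
            hoisted.append(op)
        else:
            kept.append(op)
    return preheader + hoisted, kept

preheader = [("r1", "mov", 0, None)]
body = [
    ("r4", "load", "r5", None),   # stays: a store in the loop may alias
    ("r8", "+", "r2", 1),         # invariant: r2 has no def in the loop
    ("r7", "*", "r8", "r4"),
    ("r1", "+", "r1", "r7"),
    (None, "store", "r1", "r8"),
]
new_preheader, new_body = hoist_invariants(preheader, body)
```

With memory disambiguation proving that `M[r1]` never aliases `M[r5]`, the load could be hoisted as well; the sketch deliberately does not attempt that.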

  15. Global Variable Migration
     • Goal: assign a global variable to a register for the entire duration of a loop
     • Rules:
       • X is a load or store to M[x]
       • Address x of M[x] is not modified in the loop
       • Replace all M[x] in the loop by new register rx
       • Add rx = M[x] to the preheader
       • Add M[x] = rx to each loop exit
       • Memory disambiguation is required: all mem ops in the loop whose address can equal x must use the same address x
     • Example basic blocks (one per line; control-flow edges not shown):
         r4 = M[r5]; r4 = r4 + 1
         r8 = M[r5]; r7 = r8 * r4
         M[r5] = r4
         M[r5] = r7

  16. Loop Strength Reduction (1)
     • Goal: create basic IVs from derived IVs
     • Rules
       • X is a *, <<, +, or - operation
       • src1(X) is a basic IV
       • src2(X) is invariant
       • No other ops modify dest(X)
       • dest(X) != src(X) for all srcs
       • dest(X) is a register
     • Example basic blocks (one per line; control-flow edges not shown):
         preheader: (no operations shown)
         header: r5 = r4 - 3; r4 = r4 + 1
         r7 = r4 * r9
         r6 = r4 << 2
       Here X is r7 = r4 * r9, with src1(X) = r4, src2(X) = r9, and dest(X) = r7. Basic IV r4 has triple (r4, 1, ?).

  17. Loop Strength Reduction (2)
     • Transformation
       • Insert into the bottom of the preheader: new_reg = RHS(X)
       • If opcode(X) is not + or -, then insert into the bottom of the preheader: new_inc = inc(src1(X)) opcode(X) src2(X)
       • Else: new_inc = inc(src1(X))
       • Insert at each update of src1(X): new_reg += new_inc
       • Change X to: dest(X) = new_reg
     • Example basic blocks after transforming r7 (one per line; control-flow edges not shown):
         preheader: r1 = r4 * r9; r2 = 1 * r9
         header: r5 = r4 - 3; r4 = r4 + 1; r1 = r1 + r2
         r7 = r1
         r6 = r4 << 2
     • Exercise: apply strength reduction to r5 and r6
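A quick way to convince oneself that the transformation preserves behavior is to simulate both versions of the loop and compare the values of r7. The Python functions below are such a check, written for this purpose only; the function names, the `trips` parameter, and the straight-line treatment of the loop body are assumptions, while the register names follow the slide.

```python
def original(r4, r9, trips):
    """Loop before strength reduction: r7 is recomputed with a multiply."""
    seen = []
    for _ in range(trips):
        r4 = r4 + 1
        r7 = r4 * r9
        seen.append(r7)
    return seen

def reduced(r4, r9, trips):
    """Loop after strength reduction of r7 = r4 * r9."""
    r1 = r4 * r9          # preheader: new_reg = RHS(X)
    r2 = 1 * r9           # preheader: new_inc = inc(r4) * src2(X)
    seen = []
    for _ in range(trips):
        r4 = r4 + 1
        r1 = r1 + r2      # inserted at the update of the basic IV r4
        r7 = r1           # X has become a move
        seen.append(r7)
    return seen

before = original(0, 5, 4)
after = reduced(0, 5, 4)
```

The multiply now runs once in the preheader instead of once per iteration; inside the loop only an add and a move remain.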

  18. IV Elimination (1)
     • Goal: remove unnecessary basic IVs from the loop by substituting uses with another basic IV
     • Rules for IVs with the same increment and initial value:
       • Find two basic IVs x and y
       • x and y are in the same family and have the same increment and initial values
       • Incremented at the same place
       • x is not live at loop exit
       • For each basic block where x is defined, there are no uses of x between first/last def of x and last/first def of y
       • Replace uses of x with y
     • Example basic blocks (one per line; control-flow edges not shown):
         r1 = 0; r2 = 0
         r1 = r1 - 1; r2 = r2 - 1
         r9 = r2 + r4
         r7 = r1 * r9
         r4 = M[r1]
         M[r2] = r7
     • Exercise: apply IV elimination

  19. IV Elimination (2)
     • Many variants, from simple to complex:
       1. Trivial case: an IV that is never used except by its increment operations and is not live at loop exit
       2. IVs with the same increment and the same initial value
       3. IVs with the same increment whose initial values are a known constant offset from each other
       4. IVs with the same increment, but initial values unknown
       5. IVs with different increments and no info on initial values
     • Methods 1 and 2 are virtually free, so they are always applied
     • Methods 3 to 5 require preheader operations

  20. IV Elimination (3)
     • Example for method 4
       Before:
         preheader: r1 = ?; r2 = ?
         loop: r3 = M[r1+4]; r4 = M[r2+8]; …; r1 = r1 + 4; r2 = r2 + 4
       After:
         preheader: r1 = ?; r2 = ?; r5 = r2 - r1 + 8
         loop: r3 = M[r1+4]; r4 = M[r1+r5]; …; r1 = r1 + 4
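The method 4 example works because r2 - r1 stays constant when both IVs advance by the same increment, so r2 + 8 can always be rewritten as r1 + (r2 - r1 + 8). The Python check below simulates the two loops and compares the addresses they touch; the function names and the `trips` parameter are made up for this illustration, and the elided part of the loop body is assumed not to modify r1 or r2.

```python
def addresses_before(r1, r2, trips):
    """Addresses loaded per iteration with both IVs r1 and r2 in the loop."""
    trace = []
    for _ in range(trips):
        trace.append((r1 + 4, r2 + 8))
        r1 = r1 + 4
        r2 = r2 + 4
    return trace

def addresses_after(r1, r2, trips):
    """Same loop after eliminating r2 in favor of r1."""
    r5 = r2 - r1 + 8      # preheader operation capturing the unknown offset
    trace = []
    for _ in range(trips):
        trace.append((r1 + 4, r1 + r5))
        r1 = r1 + 4
    return trace

before = addresses_before(100, 260, 3)
after = addresses_after(100, 260, 3)
```

The loop loses one add per iteration at the cost of the extra preheader operation that computes r5.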
