
Ch 15. Procedure Optimizations

Ch 15. Procedure Optimizations. 2006.5.1, Advanced Compilers presentation. Presenter: 김영식.


Presentation Transcript


  1. Ch 15. Procedure Optimizations. 2006.5.1, Advanced Compilers presentation. Presenter: 김영식

  2. Overview
     • Tail-Call Optimization vs. Tail-Recursion Elimination
     • Procedure Integration vs. In-line Expansion
     • Leaf-routine Optimization vs. Shrink Wrapping

  3. Drawbacks of “Call”
     • Calling convention
       • caller: parameter passing, caller-saved registers, return address, branch
       • callee
         • prologue: save FP, compute SP, save callee-saved registers
         • epilogue: restore callee-saved registers, return value, SP, FP, branch
     • Optimization view
       • less chance to optimize across procedure boundaries, e.g. because of aliasing

  4. Definition
     • Tail-Call: the last action of f is a call to another procedure
         void f(int x) { ... g(x); (return;) }
     • Tail-Recursion: the last action of f is a call to f itself
         void f(int x) { ... f(x); (return;) }

  5. Effect of tail-recursion
     • eliminate procedure-call overhead
     • enable loop optimization
     Before:
         void insert_node(int n, struct node *l) {
             if (n > l->value)
                 if (l->next == NULL) make_node(l, n);
                 else insert_node(n, l->next);
         }
     After:
         void insert_node(int n, struct node *l) {
         Loop:
             if (n > l->value)
                 if (l->next == NULL) make_node(l, n);
                 else { l = l->next; goto Loop; }
         }

  6. Effect of tail-call
         void make_node(struct node *p, int n) {
         L0: struct node *q = malloc(...);
             p->next = q;
             ...
         }
         void insert_node(int n, struct node *l) {
             if (n > l->value)
                 if (l->next == NULL) make_node(l, n);   /* replace with "goto L0"? */
             ...
         }
     • Two problems with a high-level implementation
       1. It branches into the body of another procedure
       2. Parameters have local scope

  7. Low-level implementation
     With calls:
         make_node:
             ...
             return
         insert_node:
             ...
             if !r7 goto L2
             r2 ← r1
             r1 ← r4
             call make_node
             return
         L2: r2 ← r2*.next
             call insert_node
             return
     With the tail calls replaced by branches:
         make_node:
             ...
             return
         insert_node:
             ...
             if !r7 goto L2
             r2 ← r1
             r1 ← r4
             goto make_node
         L2: r2 ← r2*.next
             goto insert_node

  8. An issue about stack frame
     • Original stack frame implementation (function call)
     [Figure: stack with the frames of the caller's caller (main() or other), the caller (insert_node()), and the callee (make_node()), with fp and sp marking the callee's frame]

  9. An issue about stack frame
     • Stack frames after optimization
     [Figure: stack with the caller's caller and a single frame shared by caller and callee]
     • Result of optimization: one procedure (caller + callee)
     • The caller does not know the size of the stack frame the callee needs
     • If (stack frame size of callee) > (stack frame size of caller), either
       → allocate the remainder of the callee’s stack frame, or
       → deallocate the caller’s stack frame and let the callee’s standard procedure prologue allocate its own

  10. Determining tail-call
     • A call is a tail call if the routine performing it does nothing after the call returns except return itself
     • This is easy to check.

  11. Performing tail-recursion elimination
     • Replace the recursive call by
       • assignments of the proper values to the parameters, and
       • a branch to the beginning of the body of the procedure
     • Delete the ‘return’ after the recursive call
     Before:
         void replace(int n) {
             if (n >= 10) return;
             if (A[n] == 0) A[n] = 1;
             else replace(n + 1);
         }
     After:
         void replace(int n) {
         Loop:
             if (n >= 10) return;
             if (A[n] == 0) A[n] = 1;
             else { n = n + 1; goto Loop; }
         }
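The transformation on this slide can be checked by writing both forms side by side. The following is a minimal sketch, not the presenter's code: the array `A`, its size `N`, and the names `replace_rec` and `replace_loop` are assumptions made for illustration.

```c
#include <assert.h>

#define N 10
int A[N];   /* hypothetical array standing in for the slide's A */

/* Original form: the recursive call is the last action, so it is a tail call. */
void replace_rec(int n) {
    assert(n >= 0);
    if (n >= N) return;
    if (A[n] == 0) A[n] = 1;
    else replace_rec(n + 1);
}

/* After tail-recursion elimination: the call becomes an assignment to the
   parameter followed by a branch to the top of the body. */
void replace_loop(int n) {
    assert(n >= 0);
loop:
    if (n >= N) return;
    if (A[n] == 0) A[n] = 1;
    else { n = n + 1; goto loop; }
}
```

Both functions set the first zero element at or after index `n` to 1 and leave the rest untouched; the second does so without growing the stack.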

  12. Performing tail-call optimization (1)
     • Both procedure bodies should be visible to the compiler.
       • same compilation unit, or saved intermediate code
     • The compiler needs to know about the callee:
       • where it expects to find its parameters
       • where to branch to
       • its stack frame size

  13. Performing tail-call optimization (2)
     • Replace the call by three things:
       • evaluation of the arguments, putting them where the callee expects to find them
       • if the callee’s stack frame is larger than the caller’s, an instruction that extends the stack frame by the difference
       • a branch to the beginning of the body of the callee
     • Also, delete the ‘return’ after the call.
     Before:
         insert_node:
             ...
             r2 ← r1
             r1 ← r4
             call make_node
             return
     After:
         insert_node:
             ...
             r2 ← r1
             r1 ← r4
             goto make_node

  14. An issue about architecture
     • Alpha
       • both “jmp” and “jsr” (jump to subroutine) take registers as operands
     • MIPS
       • both “jal” (jump and link) and “j” use a 26-bit absolute target address
     • SPARC
       • “call”: 30-bit PC-relative word displacement
       • “ba” (branch always): 22-bit PC-relative word displacement
       • “jmpl”: 32-bit absolute address (sum of two registers)

  15. Procedure Integration
     • Also called ‘automatic inlining’
     • Replaces calls with copies of the procedure body
     • call: the effect of the procedure on aliased variables is unknown
     • local code: effects are exposed, which enables more optimization
     • Better than C++ ‘inline’, which relies on the user’s intuition
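The point about aliased variables can be illustrated with a small sketch. The names `counter`, `bump`, `with_call`, and `integrated` are hypothetical, and `integrated` is written by hand to show roughly what a compiler would have after integrating `bump` into its caller.

```c
#include <assert.h>
#include <stddef.h>

int counter;   /* global that the callee may modify through a pointer */

void bump(int *p) {
    assert(p != NULL);
    *p += 1;
}

/* With an opaque call, the compiler must assume bump() may change counter,
   so the value stored just before the call cannot be reused afterwards. */
int with_call(void) {
    counter = 1;
    bump(&counter);
    return counter;
}

/* After integrating the body of bump(), the effect on counter is visible
   as local code, and the result can be folded to the constant 2. */
int integrated(void) {
    counter = 1;
    counter += 1;
    return counter;
}
```

Both functions return the same value; the difference is only in how much the optimizer can see.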

  16. Issues of Procedure Integration (1)
     • Range of inlined procedures
       • across compilation units, intermediate-code representations need to be saved
     • Languages of caller and callee (across compilation units)
       • different languages require different parameter-passing conventions
       • an “external language_name procedure_name” declaration specifies the source language
     • Saving intermediate code of inlined routines
       • depends on the purpose of saving the intermediate code

  17. Issues of Procedure Integration (2)
     • A copy of the whole inlined procedure still needs to be compiled if
       • the address of the procedure has been taken
       • there may be calls from other compilation units that are currently invisible
     • Inlining recursive procedures
       • inlining until no calls remain could be an infinite process
       • it can be valuable to inline once or twice

  18. Which procedures are worth inlining? (1)
     • Goal: reduce execution time
     • Inlining every procedure
       • decreases the overhead cost of calls
       • increases object code size → more cache misses
       • compilation may terminate only by exhaustion of resources
     • We need heuristics or profiling feedback

  19. Which procedures are worth inlining? (2)
     Choose the procedure
     • whose body is small,
     • that is called from few call sites,
     • that is called inside a loop, and
     • whose call includes constant-valued parameters
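The four criteria above could be combined into a simple score. This is a hypothetical sketch only: the struct `call_site_info`, the thresholds, and the scoring rule are all invented for illustration and are not taken from the presentation or from any real compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one call site and its callee. */
struct call_site_info {
    int  body_size;       /* callee size, e.g. in intermediate-code instructions */
    int  num_call_sites;  /* how many places call this procedure */
    bool in_loop;         /* is this call inside a loop? */
    bool has_const_arg;   /* does this call pass a constant-valued parameter? */
};

/* Assumed thresholds; a real compiler would tune these or use profile data. */
enum { SMALL_BODY = 20, FEW_SITES = 2, MIN_SCORE = 2 };

/* One point per criterion from the slide; inline when enough criteria hold. */
bool worth_inlining(const struct call_site_info *c) {
    assert(c != NULL);
    int score = 0;
    if (c->body_size <= SMALL_BODY)      score++;
    if (c->num_call_sites <= FEW_SITES)  score++;
    if (c->in_loop)                      score++;
    if (c->has_const_arg)                score++;
    return score >= MIN_SCORE;
}
```

Equal weights are the simplest choice here; profiling feedback, as mentioned on the previous slide, would be one way to replace the fixed thresholds.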

  20. How to perform the inlining?
     • Three major issues
       • Different parameter-passing conventions
         • “external language_name procedure_name” declaration
         • call-by-reference in Fortran vs. call-by-value in C
       • Name conflicts
         • conflicts between source symbol names
         • detect conflicts and rename symbols of the called procedure
       • Static variables
         • make only one copy of each static variable
         • initialize it once

  21. In-Line Expansion
     • similar to procedure integration
     • low-level (assembly language, machine code)
     • enables high-level operations by providing templates
     • e.g. exchange the values of two registers
         ra ← ra xor rb
         rb ← ra xor rb
         ra ← ra xor rb
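The three-instruction exchange above can be traced in C. This is a small sketch with a hypothetical helper name, `xor_swap`; the pointers stand in for the two registers `ra` and `rb`.

```c
#include <assert.h>
#include <stdint.h>

/* Exchange two values with three xor operations and no temporary. */
void xor_swap(uint32_t *ra, uint32_t *rb) {
    /* If both names denote the same location, the first xor zeroes it. */
    assert(ra != rb);
    *ra = *ra ^ *rb;   /* ra holds a ^ b */
    *rb = *ra ^ *rb;   /* (a ^ b) ^ b = a, so rb now holds the old ra */
    *ra = *ra ^ *rb;   /* (a ^ b) ^ a = b, so ra now holds the old rb */
}
```

For example, starting from the values 3 and 5, the two locations hold 5 and 3 after the call.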

  22. In-Line Expansion
     • makes it possible to write operating systems and I/O device drivers in a high-level language
     • e.g. a template for DisableInterrupts(), which sets bit 15 in the PSW
         getpsw ra
         ori    ra,0x8000,ra
         setpsw ra

  23. Providing in-line expansion
     • make the assembly-language sequence into a template
     • the “inliner” performs the inlining
     • the template definition specifies real registers
     • register coalescing is needed
         .template ProcName, ArgBytes, regs=(r1,...,rn)
             ... instructions ...
         .end

  24. Leaf-Routine Optimization
     • leaf routine
       • a leaf node in the call graph of a program
       • a routine that calls no procedures
       • many procedures are leaf routines
     • leaf-routine optimization
       • simplify the way parameters are passed
       • remove the procedure prologue / epilogue
       • highly desirable, with little effort

  25. Candidates
     • the routine calls no others (obvious)
     • the rest is architecture-dependent
       • how many registers and how much stack space does the procedure require?
       • it requires no more registers than are available as caller-saved registers
       • it requires no stack space (given sufficient registers)

  26. Shrink Wrapping (1)
     • moving prologue and epilogue code to enclose the minimal part of the procedure
     • inside a loop
     • making many copies of code
     [Figure: two flowgraphs, each marked with prologue and epilogue placement]

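  27. Shrink Wrapping (2)
     [Figure: flowgraph with blocks “a > b”, “a ← 1”, “c ← a”, and “c ← b”, shown twice: once with prologue and epilogue around the whole procedure, and once with save and restore code placed inside the flowgraph]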

  28. Shrink Wrapping (3)
     • Define again:
       • move the prologue and epilogue code to enclose the minimal code segments
       • without placing them inside a loop
       • while maintaining correctness
     • data-flow analysis
       • similar to the ‘anticipation’ problem in PRE
       • similar to the ‘available expressions’ problem in global CSE

  29. Data-flow analysis (1)
     • a register is anticipatable at a point if all execution paths from that point contain definitions or uses of the register
     • a register is available at a point if all execution paths to that point include definitions or uses of it
     • for basic block i,
       • RUSE(i): set of registers used or defined in block i
       • RANTin(i), RANTout(i): sets of registers anticipatable on entry to / exit from block i
       • RAVin(i), RAVout(i): sets of registers available on entry to / exit from block i

  30. Data-flow analysis (2)
     • Anticipatable registers
       • backward problem
       • meet operator: ∩ (intersection)
       • initialization: RANTin(exit) = {}, RANTin(b) = U for every other block b
       • transfer function: RANTin(i) = RUSE(i) ∪ RANTout(i)
     • Available registers
       • forward problem
       • meet operator: ∩ (intersection)
       • initialization: RAVout(entry) = {}, RAVout(b) = U for every other block b
       • transfer function: RAVout(i) = RUSE(i) ∪ RAVin(i)
     • representation using bit vectors
       • on a machine with 32 registers, each bit vector is a single word
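The two analyses can be sketched with one 32-bit word per set, as the last bullet suggests. The control-flow graph, the `RUSE` values, and the solver name `solve` below are assumptions made for illustration. The sketch also assumes empty entry and exit blocks and uses the boundary values RANTout(exit) = {} and RAVin(entry) = {}, which under that assumption give the slide's RANTin(exit) = {} and RAVout(entry) = {}.

```c
#include <assert.h>
#include <stdint.h>

#define NBLOCKS    5
#define ENTRY      0
#define EXIT_BLOCK (NBLOCKS - 1)
#define U          0xFFFFFFFFu     /* the set of all 32 registers */

typedef uint32_t regset;           /* bit r set means register r is in the set */

/* Hypothetical CFG: 0 -> 1, 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 4. edge[i][j] = 1 for i -> j. */
const int edge[NBLOCKS][NBLOCKS] = {
    {0, 1, 0, 0, 0},
    {0, 0, 1, 1, 0},
    {0, 0, 0, 1, 0},
    {0, 0, 0, 0, 1},
    {0, 0, 0, 0, 0},
};

/* Hypothetical usage: block 2 touches r1, block 3 touches r2. */
regset RUSE[NBLOCKS] = {0, 0, 1u << 1, 1u << 2, 0};
regset RANTin[NBLOCKS], RANTout[NBLOCKS], RAVin[NBLOCKS], RAVout[NBLOCKS];

void solve(void) {
    int i, j, changed;
    assert(RUSE[ENTRY] == 0 && RUSE[EXIT_BLOCK] == 0);   /* assumed empty blocks */
    for (i = 0; i < NBLOCKS; i++) { RANTin[i] = U; RAVout[i] = U; }
    do {
        changed = 0;
        /* Backward problem: meet (intersection) over successors. */
        for (i = NBLOCKS - 1; i >= 0; i--) {
            regset out = (i == EXIT_BLOCK) ? 0 : U;
            for (j = 0; j < NBLOCKS; j++)
                if (edge[i][j]) out &= RANTin[j];
            regset in = RUSE[i] | out;                   /* transfer function */
            if (in != RANTin[i] || out != RANTout[i]) changed = 1;
            RANTout[i] = out;
            RANTin[i] = in;
        }
        /* Forward problem: meet (intersection) over predecessors. */
        for (i = 0; i < NBLOCKS; i++) {
            regset in = (i == ENTRY) ? 0 : U;
            for (j = 0; j < NBLOCKS; j++)
                if (edge[j][i]) in &= RAVout[j];
            regset out = RUSE[i] | in;                   /* transfer function */
            if (in != RAVin[i] || out != RAVout[i]) changed = 1;
            RAVin[i] = in;
            RAVout[i] = out;
        }
    } while (changed);
}
```

On this graph r2 is anticipatable from the entry onward, because every path reaches block 3, while r1 is anticipatable only at block 2 and is not available at the entry to block 3, because the edge 1 -> 3 bypasses block 2.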

  31. Data-flow analysis (3)
     • for register r and block i,
       • insert save code at the earliest point leading to contiguous blocks that use r
       • provided there is no previous save of r
     • by symmetry, the same applies to restore code

  32. Data-flow analysis (4)
     • This still suffers from two problems.
       • save / restore inside a loop
         • move the save and restore code outward to surround the loop
       • correctness
         • split the edge and move the save code onto it
     [Figure: flowgraph with blocks “a ← 1”, “c ← a”, and “c ← b”, showing save code placed on a split edge ahead of the epilogue]
