
Philip Brisk

Optimal Polynomial-Time Interprocedural Register Allocation for High-Level Synthesis and ASIP Design. Philip Brisk, Ajay K. Verma, Paolo Ienne. International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, November 6, 2007.


Presentation Transcript


  1. Optimal Polynomial-Time Interprocedural Register Allocation for High-Level Synthesis and ASIP Design. Philip Brisk, Ajay K. Verma, Paolo Ienne. International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, November 6, 2007

  2. Outline
    • Interprocedural Register Allocation
    • Related Work
    • Contribution
      • Optimal algorithm for interprocedural register allocation
    • Optimal algorithm
      • Runs in polynomial time; is scalable
    • Experimental Results
      • Optimal algorithm runs faster than heuristics
    • Conclusion

  3. Interprocedural Register Allocation
    • Register allocation in HLS/ASIP design
      • How many registers should be physically allocated?
    • Interprocedural version: consider the whole program
      • Each scalar variable is stored in a register
      • Variables whose lifetimes overlap require distinct registers
      • Goal: minimize the number of registers allocated
    • Modeled as a graph coloring problem
      • NP-complete for general graphs
      • Polynomial for certain classes of graphs

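  4. Interferences
    [Figure: example program with procedures Main, P, and Q, and its interference graph over variables V, X, Y, Z. Main assigns V, calls P, and later uses V, so V is live across the call; P calls Q.]
    • Local interferences
      • Variables in the same procedure
      • Overlapping lifetimes
    • Global interferences
      • Variables live across procedure calls
      • Propagate transitively along call chains: a variable live across a call to P also interferes with the variables of every procedure that P calls, directly or indirectly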
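The transitive propagation of global interferences can be sketched as a reachability walk over the call graph. This is a minimal sketch, assuming a hypothetical input format: each call is a `(caller, callee, live_across)` triple and each procedure's variables are given as a set. Placing X, Y, and Z in Q in the usage example is likewise an assumption made for illustration.

```python
from collections import defaultdict


def global_interferences(calls, local_vars):
    """Return global interference edges as frozensets {u, v}.

    Hypothetical input format (not from the slides):
      calls      -- list of (caller, callee, live_across) triples, where
                    live_across is the set of caller variables live across the call
      local_vars -- dict mapping each procedure to the set of its variables
    A variable live across a call interferes with every variable of the callee
    and of every procedure the callee reaches, directly or indirectly.
    """
    callees = defaultdict(set)
    for caller, callee, _ in calls:
        callees[caller].add(callee)

    def reachable(start):
        # Iterative DFS over the call graph; the visited set also guards against cycles.
        seen, stack = {start}, [start]
        while stack:
            proc = stack.pop()
            for nxt in callees[proc]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    edges = set()
    for _, callee, live_across in calls:
        for proc in reachable(callee):
            for u in live_across:
                for v in local_vars.get(proc, ()):
                    edges.add(frozenset((u, v)))
    return edges


# Usage: Main keeps V live across its call to P, and P calls Q.
calls = [("Main", "P", {"V"}), ("P", "Q", set())]
local_vars = {"Main": {"V"}, "P": set(), "Q": {"X", "Y", "Z"}}
print(sorted(tuple(sorted(e)) for e in global_interferences(calls, local_vars)))
# [('V', 'X'), ('V', 'Y'), ('V', 'Z')]
```

V never appears in a call to Q, yet it interferes with Q's variables, because Q runs while V is still live in Main.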

  5. Related Work
    • Interprocedural interference graph (IIG): G = (V, E)
      • V is the set of all variables in the program
      • E includes both local and global interferences
      • Goal: find a minimum coloring of G
    • Color the IIG with a heuristic [Vemuri et al., TODAES ’02]
    • Scalable approach [Beidas and Zhu, ASP-DAC ’05]
      • Color each procedure individually with a heuristic
      • Propagate global interferences at call points
      • Only build the local interference graph for each procedure

  6. Contribution
    • Chordal graphs can be colored optimally in O(|V| + |E|) time [Gavril, SIAM J. Comput., ’72]
    • The local interference graph of a procedure in SSA form is chordal
      • [Brisk et al., IWLS ’05, TCAD ’06]
      • [Hack et al., Info. Proc. Letters, ’06]
      • [Bouchez et al., MS Thesis, ENS-Lyon ’05]
    • Contribution:
      • A new SSA-based representation
      • Theorem: the IIG is chordal and can be colored optimally with a scalable algorithm
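Optimal coloring of a chordal graph can be illustrated with maximum cardinality search (MCS) followed by greedy coloring. This is a minimal sketch using a simple O(|V|²) MCS rather than the linear-time bucket implementation, and it is not claimed to be the exact procedure used in the cited papers. The adjacency-dict input format is an assumption.

```python
def mcs_order(adj):
    """Maximum cardinality search (simple O(|V|^2) version).

    Repeatedly visits the unvisited vertex with the most visited neighbours.
    On a chordal graph, the visit order is the reverse of a perfect
    elimination ordering."""
    weight = {v: 0 for v in adj}
    unvisited = set(adj)
    order = []
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])
        order.append(v)
        unvisited.remove(v)
        for u in adj[v]:
            if u in unvisited:
                weight[u] += 1
    return order


def color_chordal(adj):
    """Greedy coloring in MCS order. This is optimal when adj is chordal, because
    each vertex's already-colored neighbours then form a clique."""
    colors = {}
    for v in mcs_order(adj):
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


# Usage: two triangles sharing the edge b-c (chordal, largest clique has 3 vertices).
adj = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"}, "d": {"b", "c"}}
colors = color_chordal(adj)
print(len(set(colors.values())))  # 3
```

Because each vertex's colored neighbours form a clique, no vertex ever needs more colors than the largest clique has vertices. For chordal graphs, that clique size equals the chromatic number.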

  7. Recursive Calls
    • How are variables live across calls in a recursive chain handled?
      • They are pushed onto the stack
      • They cannot use registers
    • Collapse each strongly connected component (SCC) of the call graph into a single node
      • SCCs can be found in O(|V| + |E|) time
      • The call graph becomes a DAG
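Collapsing SCCs can be sketched with Tarjan's algorithm, which finds all SCCs in O(|V| + |E|) time. This is a minimal recursive version, assuming the call graph is a dict from each procedure to the procedures it calls. Very deep call graphs would exceed Python's default recursion limit.

```python
def condense(call_graph):
    """Collapse strongly connected components of a call graph (Tarjan's algorithm).

    call_graph maps each procedure to the procedures it calls.
    Returns (comp, dag): comp maps each procedure to a component id, and dag
    maps each component id to the set of component ids it calls."""
    index, low, comp = {}, {}, {}
    stack, on_stack = [], set()
    counter = 0
    n_comp = 0

    def strongconnect(v):
        nonlocal counter, n_comp
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in call_graph.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop the whole component
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = n_comp
                if w == v:
                    break
            n_comp += 1

    nodes = set(call_graph) | {w for ws in call_graph.values() for w in ws}
    for v in sorted(nodes):
        if v not in index:
            strongconnect(v)

    dag = {c: set() for c in range(n_comp)}
    for v, ws in call_graph.items():
        for w in ws:
            if comp[v] != comp[w]:  # drop edges inside an SCC (the recursion)
                dag[comp[v]].add(comp[w])
    return comp, dag


# Usage: f and g are mutually recursive, so they collapse into one node.
comp, dag = condense({"main": ["f"], "f": ["g"], "g": ["f", "h"]})
print(comp["f"] == comp["g"], len(dag))  # True 3
```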

  8. Optimal Coloring Algorithm
    1. Allocate “global registers”
      • They hold variables live across procedure calls (and local variables, when free)
      • Their number is given by the max-weighted path in the call graph, with each call weighted by the number of variables live across it: O(|V| + |E|) time for DAGs
    2. Represent each procedure in pruned SSA form
      • The local interference graph of each procedure is chordal
      • Copy local variables live across function calls into global registers
    3. Color assignment
      • Top-down color palette propagation [Beidas and Zhu, ASP-DAC ’05]
      • Color each SSA-form procedure without building an interference graph [Brisk et al., TCAD ’06; Hack and Goos, IPL ’06]
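Coloring without an interference graph can be illustrated in its simplest case: a single straight-line SSA block with no branches and no φ-functions. This is a minimal sketch, not the dominance-order algorithm from the cited papers, which handles full control flow. Walking the instructions in order and releasing each register at its variable's last use never needs more registers than the number of variables live at once. The `(dest, uses)` instruction format and the `reserved` parameter are hypothetical. Reserving registers for globals is a simplification of "when free" reuse.

```python
import heapq


def color_straight_line(instrs, reserved=0):
    """Assign registers to one SSA basic block without an interference graph.

    Hypothetical input: instrs is a list of (dest, uses) pairs in program
    order; dest is None for instructions that define nothing. Every variable
    is defined once, before its uses. Registers 0..reserved-1 are treated as
    occupied (e.g. by global registers) and are never handed out.
    Returns (assignment, number_of_registers_used)."""
    last_use = {}
    for i, (_, uses) in enumerate(instrs):
        for u in uses:
            last_use[u] = i

    free = []            # min-heap of registers released by dead variables
    next_reg = reserved  # next never-used register
    assign = {}
    for i, (dest, uses) in enumerate(instrs):
        # Operands whose live range ends here release their registers, so the
        # result may reuse one of them.
        for u in set(uses):
            if last_use[u] == i:
                heapq.heappush(free, assign[u])
        if dest is not None:
            if free:
                assign[dest] = heapq.heappop(free)
            else:
                assign[dest] = next_reg
                next_reg += 1
            if dest not in last_use:  # never used: release immediately
                heapq.heappush(free, assign[dest])
    return assign, next_reg - reserved


# Usage: procedure A's body (calls omitted), as in the "Reducing the Chromatic Number" slide.
body = [("V", []), ("W", []), (None, ["V"]), ("X", []),
        (None, ["W"]), ("Y", []), (None, ["X"]), (None, ["Y"])]
print(color_straight_line(body))  # ({'V': 0, 'W': 1, 'X': 0, 'Y': 1}, 2)
```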

  9. Launch and Landing Pads
    • When Pi is called
      • The maximum number of variables live across calls along any call chain reaching Pi (the maximum stack size) is m = δi
      • Global registers T1…Tm store the variables live across calls in the chain
    • Pi calls Pj at call point ck
      • L(ck) is the set of variables live across the call
      • Let n = |L(ck)| be the number of such variables
    • Launch and landing pads
      • The parallel copy $(T_{m+1}, \ldots, T_{m+n}) \leftarrow \psi(L(c_k))$ is inserted before the call (launch pad)
      • The parallel copy $L(c_k) \leftarrow \psi^{-1}(T_{m+1}, \ldots, T_{m+n})$ is inserted after the call (landing pad)
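Inserting the pads around a call point can be sketched on a toy textual IR. This is a minimal sketch: the instruction-string format, the names `psi`/`psi_inv`, and the function `insert_pads` are hypothetical. ψ is modeled as mapping the j-th variable of L(ck) to Tm+j.

```python
def insert_pads(block, call_index, live_across, m):
    """Wrap the call at block[call_index] with a launch pad and a landing pad.

    Hypothetical textual IR: block is a list of instruction strings.
    live_across is the ordered list L(c_k); m = delta_i is the number of
    global registers already in use on the call chain. psi maps the j-th
    variable of L(c_k) to global register T_{m+j}."""
    if not live_across:
        return list(block)  # nothing live across the call: no pads needed
    regs = [f"T{m + j}" for j in range(1, len(live_across) + 1)]
    # Parallel copy before the call: (T_{m+1}, ..., T_{m+n}) <- psi(L(c_k))
    launch = f"({', '.join(regs)}) <- psi({', '.join(live_across)})"
    # Parallel copy after the call: L(c_k) <- psi^-1(T_{m+1}, ..., T_{m+n})
    landing = f"({', '.join(live_across)}) <- psi_inv({', '.join(regs)})"
    return (block[:call_index] + [launch, block[call_index], landing]
            + block[call_index + 1:])


# Usage: V is live across "call B" in a procedure with m = 0.
for line in insert_pads(["V <- ...", "call B", "W <- ..."], 1, ["V"], 0):
    print(line)
```

Each call point reuses the registers just above Tm, so a callee at stack depth m + n finds exactly T1…Tm+n occupied.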

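  10. Theorem: IIG is chordal
    [Figure: the global registers T1…T6 form a clique (N = 6); G1…G6 are the local interference graphs of procedures P1…P6, with δ1 = 0, δ2 = 2, δ3 = 3, δ4 = 2, δ5 = 6, δ6 = 5]
    • Tj interferes with each local variable in Gi (for j ≤ δi)
    • Each Gi is chordal
    • Why the union is chordal: eliminate the variables of each Gi in a perfect elimination ordering of Gi, then eliminate T1…TN. When a local variable of Gi is eliminated, its remaining neighbors are its later neighbors in Gi plus T1…Tδi, and these together form a clique.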
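The theorem can be checked on a small hypothetical IIG with the same shape: a clique T1…TN, plus local graphs Gi whose variables are joined to T1…Tδi. This is a minimal sketch. The input format and example procedures are assumptions, and chordality is tested by checking whether the reverse of a maximum cardinality search order is a perfect elimination ordering (the Tarjan–Yannakakis criterion), in a simple quadratic form.

```python
from itertools import combinations


def build_iig(n_global, procs):
    """Build an IIG shaped as in the theorem (hypothetical input format).

    n_global -- N, the number of global registers T1..TN (pairwise interfering)
    procs    -- dict: procedure -> (local_vars, local_edges, delta), delta <= N
    Each local variable of a procedure with stack size delta interferes with
    T1..T_delta. Variable names are assumed unique across procedures."""
    adj = {f"T{j}": set() for j in range(1, n_global + 1)}
    for a, b in combinations(list(adj), 2):
        adj[a].add(b)
        adj[b].add(a)
    for local_vars, local_edges, delta in procs.values():
        for v in local_vars:
            adj.setdefault(v, set())
            for j in range(1, delta + 1):
                adj[v].add(f"T{j}")
                adj[f"T{j}"].add(v)
        for u, v in local_edges:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def is_chordal(adj):
    """True iff the reverse maximum-cardinality-search order is a perfect
    elimination ordering (each vertex's later neighbours form a clique)."""
    weight = {v: 0 for v in adj}
    unvisited, order = set(adj), []
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])
        order.append(v)
        unvisited.remove(v)
        for u in adj[v]:
            if u in unvisited:
                weight[u] += 1
    peo = order[::-1]
    pos = {v: i for i, v in enumerate(peo)}
    for v in peo:
        later = [u for u in adj[v] if pos[u] > pos[v]]
        if any(b not in adj[a] for a, b in combinations(later, 2)):
            return False
    return True


procs = {
    "P1": ({"a", "b", "c"}, [("a", "b"), ("b", "c")], 2),              # path, delta = 2
    "P2": ({"x", "y", "z"}, [("x", "y"), ("y", "z"), ("x", "z")], 3),  # triangle, delta = 3
}
print(is_chordal(build_iig(3, procs)))  # True
```

If some Gi is not chordal (for example, a chordless 4-cycle of locals), the check fails. So the theorem depends on SSA form making every local graph chordal.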

  11. Experiments
    • Applications taken from MediaBench and MiBench
    • Compiled using Machine SUIF
    • Comparison
      • Optimal color assignment
      • Color palette propagation
        • Heuristic: cannot guarantee optimality
        • Top-down and bottom-up [Beidas and Zhu, ASP-DAC ’05]
      • Smallest-last ordering heuristic for coloring [Matula and Beck, JACM ’83]
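The smallest-last baseline can be sketched as follows: repeatedly delete a vertex of minimum remaining degree, then color greedily in the reverse of the deletion order. This is a minimal quadratic sketch, not the linear-time bucket version from Matula and Beck. The adjacency-dict format is an assumption.

```python
def smallest_last_order(adj):
    """Smallest-last ordering: repeatedly delete a minimum-degree vertex of the
    remaining graph; return the reverse of the deletion order."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    removed = []
    while remaining:
        v = min(remaining, key=lambda u: degree[u])
        removed.append(v)
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return removed[::-1]


def greedy_color(adj, order):
    """Give each vertex the lowest color not used by its already-colored neighbours."""
    colors = {}
    for v in order:
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


# Usage: a 5-cycle (odd cycle, so 3 colors are necessary).
adj = {"V": {"W", "Z"}, "W": {"V", "X"}, "X": {"W", "Y"},
       "Y": {"X", "Z"}, "Z": {"Y", "V"}}
colors = greedy_color(adj, smallest_last_order(adj))
print(len(set(colors.values())))  # 3
```

The heuristic uses at most one more color than the graph's degeneracy, but unlike coloring in MCS order on a chordal graph, it does not guarantee an optimal coloring in general.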

  12. Registers Allocated
    [Chart: registers allocated per benchmark; not included in the transcript]

  13. Runtime (Normalized to Optimal)
    [Chart: runtime of each approach normalized to the optimal algorithm; not included in the transcript]

  14. Conclusion
    • Interprocedural register allocation
      • Optimal, polynomial-time algorithm
      • Runs efficiently in practice
    • SSA form with launch and landing pads
      • The IIG is chordal
      • Color the IIG with a scalable algorithm
    • Experiments
      • The optimal algorithm is faster than suboptimal heuristics
      • The optimal algorithm never builds an interference graph

  15. Example: Computing δi (# vars live across calls)
    [Figure: call graph of procedures P1…P6 with call points c7…c14]

    Call point ci | Caller → callee | |L(ci)|
    c7            | P1 → P2         | 1
    c8            | P1 → P2         | 2
    c9            | P1 → P3         | 3
    c10           | P1 → P4         | 2
    c11           | P1 → P6         | 5
    c12           | P2 → P5         | 3
    c13           | P3 → P5         | 3
    c14           | P4 → P6         | 2

    A call point's δ is |L(ci)| plus the caller's δ; a procedure's δ is the MAX over the call points that call it:

$$
\begin{aligned}
\delta_1 &= 0\\
\delta_7 &= |L(c_7)| + \delta_1 = 1 + 0 = 1\\
\delta_8 &= |L(c_8)| + \delta_1 = 2 + 0 = 2\\
\delta_9 &= |L(c_9)| + \delta_1 = 3 + 0 = 3\\
\delta_{10} &= |L(c_{10})| + \delta_1 = 2 + 0 = 2\\
\delta_{11} &= |L(c_{11})| + \delta_1 = 5 + 0 = 5\\
\delta_2 &= \max\{\delta_7, \delta_8\} = \max\{1, 2\} = 2\\
\delta_3 &= \max\{\delta_9\} = \max\{3\} = 3\\
\delta_4 &= \max\{\delta_{10}\} = \max\{2\} = 2\\
\delta_{12} &= |L(c_{12})| + \delta_2 = 3 + 2 = 5\\
\delta_{13} &= |L(c_{13})| + \delta_3 = 3 + 3 = 6\\
\delta_{14} &= |L(c_{14})| + \delta_4 = 2 + 2 = 4\\
\delta_5 &= \max\{\delta_{12}, \delta_{13}\} = \max\{5, 6\} = 6\\
\delta_6 &= \max\{\delta_{11}, \delta_{14}\} = \max\{5, 4\} = 5
\end{aligned}
$$
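The recurrence above is a max-weighted path computation over the acyclic call graph, so it can be evaluated in one pass in topological order. This is a minimal sketch that encodes this example's call points. The function name and data layout are hypothetical, and the inner scan over all calls makes it quadratic rather than linear. It requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Call points of the example: name -> (caller, callee, |L(c)|).
CALLS = {
    "c7": ("P1", "P2", 1),
    "c8": ("P1", "P2", 2),
    "c9": ("P1", "P3", 3),
    "c10": ("P1", "P4", 2),
    "c11": ("P1", "P6", 5),
    "c12": ("P2", "P5", 3),
    "c13": ("P3", "P5", 3),
    "c14": ("P4", "P6", 2),
}


def compute_delta(calls):
    """delta[P] = max over call points c calling P of |L(c)| + delta[caller];
    procedures with no incoming calls get 0. Assumes an acyclic call graph."""
    preds = {}
    for caller, callee, _ in calls.values():
        preds.setdefault(caller, set())
        preds.setdefault(callee, set()).add(caller)
    delta = {}
    for proc in TopologicalSorter(preds).static_order():  # callers come first
        incoming = [n_live + delta[caller]
                    for caller, callee, n_live in calls.values() if callee == proc]
        delta[proc] = max(incoming, default=0)
    return delta


delta = compute_delta(CALLS)
print(delta["P5"], max(delta.values()))  # 6 6
```

The largest δi (here δ5 = 6) is the number of global registers N, which matches the six-register clique on the theorem slide.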

  16. Procedure: A V  … Call B W  … …  V X  … …  W Y  … …  X Call B …  Y Procedure: A V  … T1Ψ(V) Call B V  Ψ-1(T1) W  … …  V X  … …  W Y  … …  X T1Ψ(Y) Call B Y Ψ-1(T1) …  Y Procedure: B Z  … …  Z Procedure: B Z  … …  Z V V T1 W V V W W X W X V Y Y X Z X Y T1 Z Y T1 Chromatic Number = 3 Chromatic Number = 2 Reducing the Chromatic Number 11/16
