
Yilin Zhang and David Z. Pan ECE, Univ. of Texas at Austin

ISPD 2014. Timing-Driven, Over-the-Block Rectilinear Steiner Tree Construction with Pre-Buffering and Slew Constraints. Yilin Zhang and David Z. Pan, ECE, Univ. of Texas at Austin. Outline: Background & Motivation; TOB-RSMT Problem Formulation; TOB-RSMT Algorithms


Presentation Transcript


  1. ISPD 2014 • Timing-Driven, Over-the-Block Rectilinear Steiner Tree Construction with Pre-Buffering and Slew Constraints • Yilin Zhang and David Z. Pan, ECE, Univ. of Texas at Austin

  2. Outline • Background & Motivation • TOB-RSMT • Problem Formulation • TOB-RSMT Algorithms • Experimental Results • Conclusion

  3. History of VLSI RSMTs • Wirelength driven: BOI, BI1S, RV-based RST, FLUTE and GeoSteiner • Obstacle-avoiding RSMT (OA-RSMT): [Chow+, VLSI14] [Liu+, DAC12] [Li+, ICCAD08] • Over-the-block RSMT (OB-RSMT), proposed since 2012: [Huang+, ICCAD12] [Zhang+, ICCAD12] • Minimum delay routing tree (MDRT): BA-Tree, etc. • RAT-driven RSMT: C-Tree, etc.

  4. Limitations of Previous Timing-driven RSTs • Nodes are clustered during bottom-up construction, as in BA-Tree and C-Tree • Clustering distance metric: spatial distance and slack • Accurate slack is hard to find: some segments are not fixed yet, and no segments are buffered yet

  5. Limitations in Dealing with Blocks • Completely neglecting blocks causes slew problems • No over-the-block buffers are allowed • Obstacle avoiding • More congestion outside the blocks • Detours mean more WL and worse timing

  6. Post-buffering Topology Tuning is Necessary • Buffering plays a big role in delay reduction • Shielding effect; linear delay on long wires • But buffers are always placed after wiring • Changing the topology after buffering is fruitful! • [Figure annotations: DSA decreased; Db2; DSB unchanged]

  7. Our Contributions • Use pre-buffering to find practical slack for each node in the graph • Use over-the-block routing resource to improve WL, buffering cost and timing • Apply post-buffering tuning to improve timing on critical paths with little extra cost

  8. Outline • Background & Motivation • TOB-RSMT • Problem Formulation • TOB-RSMT Algorithms • Experimental Results • Conclusion

  9. Problem Formulation • N = {s0, s1, s2, ..., sn}: source s0 and n sinks • B = {b1, b2, ..., bm}: non-overlapping rectilinear blocks in two-dimensional space R • Buffered T(V, E) connects all the pins in N to optimize WNS with the lowest buffering cost • V is the set of nodes • E is the set of horizontal and vertical edges • Slew rate at every point in T must stay within constraints • Slew mode buffering [Hu+, TCAD07] • No buffers are allowed over the blocks
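
The objective on this slide is stated in terms of WNS, and slide 19 refers to TNS. As a minimal sketch, assuming the common convention that a sink's slack is its required arrival time minus its actual arrival time, WNS is the smallest slack, and TNS is the sum of the negative slacks, the two metrics can be computed as below. The function names and the `required`/`arrival` dictionaries are illustrative assumptions, not taken from the slides.

```python
def sink_slacks(required, arrival):
    """Slack per sink: required arrival time minus actual arrival time (assumed convention)."""
    return {sink: required[sink] - arrival[sink] for sink in required}


def wns(slack):
    """Worst negative slack, taken here as the minimum slack over all sinks."""
    return min(slack.values())


def tns(slack):
    """Total negative slack: sum of the slacks that are below zero."""
    return sum(value for value in slack.values() if value < 0)


# Toy numbers in ps, invented for illustration only.
required = {"s1": 100, "s2": 100, "s3": 100}
arrival = {"s1": 130, "s2": 90, "s3": 105}
slack = sink_slacks(required, arrival)
print(wns(slack), tns(slack))
```

Some tools clamp WNS at zero when every sink meets timing; the sketch does not.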

  10. Timing Models • Elmore delay • Slew: PERI model + Bakoglu’s metric (4% error) [Kashyap+, ISPD03] [Bakoglu+, 90]
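
To make the two timing models concrete, here is a small sketch of Elmore delay on an unbuffered RC tree, followed by a wire slew estimate. It assumes each wire is a lumped pi-segment (half of the wire capacitance at each end) and ignores the driver resistance. The slew helper uses the forms in which these metrics are commonly stated: Bakoglu's step slew as ln 9 times the Elmore delay, and the PERI extension to ramp inputs as a root-sum-of-squares with the input slew. The data layout and function names are assumptions made for this illustration, not the authors' implementation.

```python
import math


def elmore_delays(edges, load, root):
    """Elmore delay from root to every node of an RC tree.

    edges: child -> (parent, R, C) for the wire from parent to child.
    load:  node -> pin capacitance (missing nodes count as 0).
    """
    children = {}
    for child, (parent, _r, _c) in edges.items():
        children.setdefault(parent, []).append(child)

    # Pre-order traversal: every parent appears before its children.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    # Downstream capacitance, accumulated from the leaves upward.
    cap = {}
    for node in reversed(order):
        cap[node] = load.get(node, 0.0) + sum(
            cap[ch] + edges[ch][2] for ch in children.get(node, []))

    # Delay accumulates downward: R * (C_wire / 2 + downstream cap at the far end).
    delay = {root: 0.0}
    for node in order:
        for ch in children.get(node, []):
            _parent, r, c = edges[ch]
            delay[ch] = delay[node] + r * (c / 2.0 + cap[ch])
    return delay


def ramp_slew(input_slew, elmore):
    """Output slew for a ramp input: PERI-style combination with Bakoglu's step slew."""
    step_slew = math.log(9) * elmore
    return math.hypot(input_slew, step_slew)


# Two-segment chain 0 -> 1 -> 2 in arbitrary consistent units.
chain = {1: (0, 1.0, 2.0), 2: (1, 1.0, 2.0)}
print(elmore_delays(chain, {2: 1.0}, root=0))
```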

  11. Overall Algorithm • Input: N & B • Initial timing-driven RST with pre-buffering • Find all over-the-block slew violations and fix them • Buffering • Tune the topology according to buffering information • Buffering • Return buffered T

  12. Initial Tree Generation with Pre-Buffering • Iterative method • Runs until it converges or oscillates between several states • Feeds back the real delay to each node to find its slack (criticality) • Critical sinks identified before topology construction are the truly critical ones • Practical slack on each node
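
The stopping rule on this slide (iterate until the result converges or oscillates between several states) can be sketched as a generic fixed-point loop with cycle detection. The slides do not say how oscillation is detected or which state is kept when it occurs, so the names below are hypothetical. `step` stands for one round of "build tree, buffer it, feed delays back", and `state` is any hashable summary of the result, such as a tuple of sinks ordered by criticality.

```python
def iterate_until_stable(step, state, max_iters=50):
    """Apply `step` repeatedly until the state repeats.

    Returns (final_state, status, states), where status is "converged",
    "oscillating" or "max_iters". For "oscillating", `states` holds the
    states that make up the detected cycle.
    """
    seen = {state: 0}          # state -> index of first appearance in history
    history = [state]
    for i in range(1, max_iters + 1):
        nxt = step(state)
        if nxt == state:
            return state, "converged", history
        if nxt in seen:
            return nxt, "oscillating", history[seen[nxt]:]
        seen[nxt] = i
        history.append(nxt)
        state = nxt
    return state, "max_iters", history


# Toy stand-ins for the real build-and-buffer round.
print(iterate_until_stable(lambda s: min(s + 1, 3), 0))
print(iterate_until_stable(lambda s: 1 - s, 0))
```

When a cycle is detected, a real flow would still need a policy for picking one state from it, for example the one with the best WNS. That choice is left open here.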

  13. Initial Tree with Pre-Buffering Flow [Lin+, TCAD11]

  14. Initial Tree with Pre-Buffering Example • Now, D is inserted far from the source with less WL • A simple model without buffering suggests D is critical • However, with buffering, D is not critical

  15. Buffering-Aware Over-the-Block TD-RST • TD-RST needs over-the-block routes • Better WL, buffer resources and timing • Replace obstacle-avoiding detours with shorter over-the-block connections • [Figure: example with delay labels 150ps, 100ps, 110ps, 120ps]

  16. Difference from WL-driven BOB-RSMT • Move non-critical paths to save slew • Protect critical paths for timing • [Figure panels: Original; WL driven; WL+slack]

  17. Slew Constraints in Buffering-Aware TD-RST • The hard problem with over-the-block routing is slew • Each topology confines a set of inside trees • Use hypothetical buffers to check whether buffering is feasible

  18. Optimization Primitives • Three optimization primitives [Zhang, ICCAD12] • Parallel sliding • Perpendicular sliding • EP merging

  19. Formulation of Buffering-Aware TD-RST • The formulation considers slack and WL together • Cost terms: increase of TNS and increase of WL • WijCdEPit: delay increase for every sink downstream of EPit

  20. Buffer-location-based Tuning Benefits • Tuning the topology after buffering pays off! • Buffering resources are costly • Improving timing without adding buffers is tempting • With only a small WL increase • We propose a way to post-tune the topology based on buffer location information

  21. Saturated/Un-saturated Buffers • Some buffers are “Saturated” and some are “Un-saturated” • Saturated: the slew reaches the maximum • Un-saturated: the slew does not reach the maximum
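
A minimal sketch of this classification follows. It assumes that each buffer is characterised by the worst slew seen on the net it drives, and it takes the 70 ps constraint from the experimental setup as the default limit. The headroom it returns is the Δslew = slewmax – slewcur quantity defined on slide 23. The function names and the tolerance are illustrative assumptions.

```python
def slew_headroom(slew_cur_ps, slew_max_ps=70.0):
    """Delta-slew: how far the current slew is below the allowed maximum."""
    return slew_max_ps - slew_cur_ps


def is_saturated(slew_cur_ps, slew_max_ps=70.0, tol_ps=1e-6):
    """A buffer counts as saturated when no usable slew headroom is left."""
    return slew_headroom(slew_cur_ps, slew_max_ps) <= tol_ps


# Invented slews in ps for two buffers.
for name, slew in {"b1": 70.0, "b2": 55.0}.items():
    print(name, "saturated" if is_saturated(slew) else "un-saturated", slew_headroom(slew))
```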

  22. Buffer-location-based Tuning Study • Un-saturated buffer == opportunity • [Figure annotations: WL increase; delay to A improves]

  23. Buffer-location-based Tuning Condition • Δslew = slewmax – slewcur • Lmax is the maximum allowed relocation distance • If the buffer input cap is neglected, Lmax = • If the buffer input cap is considered, Lmax =

  24. Buffer-location-based Tuning Flow • Input: buffered T • Sort all sinks according to slack • For each negative-slack sink n: • n at source? Y: continue with the next sink; N: n = n.parent • Buffering satisfies the Lmax constraint? If so: tuning • Return buffered T
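
The flow on this slide survives only as flattened flowchart text, so the following is one plausible reading rather than the authors' procedure. Sinks are visited from worst slack upward, each negative-slack sink is walked toward the source, and a tuning step is applied at every buffer on the path that passes the Lmax check. The `Node` class and the `satisfies_lmax` and `tune` callbacks are hypothetical placeholders, and the Lmax formula itself is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(eq=False)
class Node:
    name: str
    parent: Optional["Node"] = None   # None marks the source
    slack: float = 0.0                # meaningful on sinks only
    is_buffer: bool = False


def tune_after_buffering(
    sinks: Iterable[Node],
    satisfies_lmax: Callable[[Node], bool],
    tune: Callable[[Node], None],
) -> List[str]:
    """Walk from each negative-slack sink to the source, tuning eligible buffers."""
    tuned: List[str] = []
    visited = set()
    for sink in sorted(sinks, key=lambda s: s.slack):   # worst slack first
        if sink.slack >= 0:
            break                                       # remaining sinks meet timing
        n = sink
        while n.parent is not None:                     # stop once n is the source
            n = n.parent
            if id(n) in visited:
                break                                   # rest of the path was already handled
            visited.add(id(n))
            if n.is_buffer and satisfies_lmax(n):
                tune(n)
                tuned.append(n.name)
    return tuned


# Tiny example: source s0 drives buffer b1, which drives sinks a and c.
s0 = Node("s0")
b1 = Node("b1", parent=s0, is_buffer=True)
a = Node("a", parent=b1, slack=-10.0)
c = Node("c", parent=b1, slack=-3.0)
d = Node("d", parent=s0, slack=5.0)
print(tune_after_buffering([d, c, a], lambda n: True, lambda n: None))
```

Whether a buffer shared by several critical sinks should be tuned once or revisited for each sink is not clear from the transcript. The sketch tunes it once.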

  25. Outline • Background & Motivation • TOB-RSMT • Problem Formulation • TOB-RSMT Algorithms • Experimental Results • Conclusion

  26. Experimental Setups • Implemented in C++ • Intel Core 3.0GHz Linux machine with 32GB memory • Gurobi Optimizer 5.10 for mathematical optimization • Benchmarks: RC01-RC12 [Feng+, ISPD06] • Two sizes of buffers: 450 ohms and 850 ohms, 3.8 fF and 1.9 fF • Interconnect RC from ITRS; slew constraint of 70ps

  27. Experimental Setups • SD-OARST is the baseline [Lin+, TCAD11] • TOB-RST-1 is OA-RST with pre-buffering • TOB-RST-2 is over-the-block with pre-buffering • TOB-RST is over-the-block with pre-buffering and post-buffering tuning

  28. Experimental Results • TOB-RST-1 vs. SD-OARST: similar WL (and buffering cost); pre-buffering benefits the slack • TOB-RST-2 vs. TOB-RST-1: 179ps WNS improvement on average; buffering cost and WL reduced by 6% and 5% • TOB-RST vs. TOB-RST-2: 70ps WNS improvement on average, with less than 1% more WL

  29. Experimental Results

  30. Outline • Background & Motivation • TOB-RSMT • Problem Formulation • TOB-RSMT Algorithms • Experimental Results • Conclusion

  31. Conclusion • Timing-driven over-the-block rectilinear Steiner minimum tree • Uses pre-buffering to find practical slack for each node • Uses over-the-block routing resources to improve WL, buffering cost and timing • Applies post-buffering tuning to improve timing on critical paths with little extra cost • Significantly improves WNS for all benchmarks, with 2% less WL and 4% less buffering cost than SD-OARST

  32. Acknowledgment • This work is supported in part by Oracle • Thanks to Dr. Salim Chowdhury, Dr. Rajendran Panda and Dr. Akshay Sharma from Oracle • Thank you! Questions?
