Taming the Complexity of Coordinated Place and Route

EECS 527: Layout Synthesis and Optimization. By Jin Hu, Myung-Chul Kim and Igor Markov. Presented by: Alvin Li.

  1. Taming the Complexity of Coordinated Place and Route • EECS 527: Layout Synthesis and Optimization • By Jin Hu, Myung-Chul Kim and Igor Markov • Presented by: Alvin Li

  2. Taming the Complexity of Coordinated Place and Route • Introduction • Background • LIRE: Routing Estimation • Congestion Relief • Coordinated Place and Route • Empirical Validation • Comparisons to Prior Art • Conclusions

  3. 1. Introduction • Interconnects • 3 layers, uniform pitch vs. more than 3 layers, non-uniform pitch

  4. 1. Introduction • Interconnect complexity has increased since the 1980s • From 3 layers to 9-12 layers (non-uniform pitch) • Longer routing times • Lower quality of ICs • Interconnects figure (from Fig. 6.17, Chapter 6, VLSI Physical Design of Integrated Circuits)

  5. 1. Introduction • Interconnects Dominate • IC Performance • Power Dissipation • Size • Signal Integrity

  6. 1. Introduction: Significance of the Paper • Global Placement & Global Routing • Standalone vs. integrated: a set of individual optimizations or one simultaneous optimization? • Signal integrity and coupling capacitances in interconnect • Streamlined system: Coordinated Place-and-Route (CoPR) • Routing estimation during placement • Placement technique that addresses three types of routing congestion • Interface to congestion elimination

  7. 2. Background – Dijkstra’s Algorithm • Known as maze routing when used for routing • Finds the shortest path from a source node to a target node • Requires a graph with non-negative edge weights
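A minimal Python sketch of Dijkstra's algorithm on a small routing grid may make the slide above concrete. The grid, the "cost of stepping into a cell" model, and the name `dijkstra_grid` are illustrative assumptions and are not taken from the paper or the slides.

```python
import heapq

def dijkstra_grid(cost, source, target):
    """Return the cheapest path cost from source to target on a 2D grid.

    cost[r][c] is the non-negative cost of stepping into cell (r, c)
    (an assumed cost model for illustration).
    """
    rows, cols = len(cost), len(cost[0])
    best = {source: 0}
    heap = [(0, source)]              # (cost so far, node)
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue                  # stale queue entry, already finalized
        done.add(node)
        if node == target:
            return d
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return None                       # target unreachable

grid = [[1, 1, 1],
        [1, 9, 1],
        [1, 1, 1]]
print(dijkstra_grid(grid, (0, 0), (2, 2)))
```

The priority queue is what makes the greedy "closest unprocessed node first" selection work; it is also the component the later slides try to avoid.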

  8-12. 2. Background – Dijkstra’s Algorithm (figure-only slides)

  13. 2. Background – A* Search Algorithm • Extension of Dijkstra’s algorithm, but faster • Estimates the distance to the target • Node priority = Group 2 label in Dijkstra’s algorithm + distance estimate, including vias, to the target node • Example: 31 nodes visited by Dijkstra’s algorithm vs. 6 nodes visited by A*
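The priority rule above can be sketched in Python as follows. This is an illustration under stated assumptions: a single-layer grid (so no via term), every cell cost at least 1 so that the Manhattan distance never overestimates, and a hypothetical function name `grid_search`. Setting `use_estimate=False` reduces the search to Dijkstra's algorithm, which allows a direct comparison of visited-node counts.

```python
import heapq

def grid_search(cost, source, target, use_estimate=True):
    """Return (path cost, nodes visited). Assumes every cost[r][c] >= 1."""
    rows, cols = len(cost), len(cost[0])

    def estimate(node):
        # Manhattan distance to the target; 0 turns A* into Dijkstra.
        if not use_estimate:
            return 0
        return abs(node[0] - target[0]) + abs(node[1] - target[1])

    best = {source: 0}
    heap = [(estimate(source), 0, source)]   # (priority, cost so far, node)
    visited = set()
    while heap:
        _, g, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            return g, len(visited)
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                ng = g + cost[nxt[0]][nxt[1]]
                if ng < best.get(nxt, float("inf")):
                    best[nxt] = ng
                    heapq.heappush(heap, (ng + estimate(nxt), ng, nxt))
    return None, len(visited)

uniform = [[1] * 5 for _ in range(5)]
a_cost, a_visited = grid_search(uniform, (0, 0), (0, 4), use_estimate=True)
d_cost, d_visited = grid_search(uniform, (0, 0), (0, 4), use_estimate=False)
print(a_cost, a_visited, d_cost, d_visited)
```

Both runs return the same path cost; the version with the distance estimate visits fewer nodes on this grid, which is the effect the slide's node counts illustrate.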

  14. 2. Background – Key Characteristics of A* Search Algorithm

  15. 2. Coordinated Place-and-Route • Proposed improvement to A* search: a streamlined system, Coordinated Place-and-Route (CoPR) • Cache-friendly routing primitives to estimate routing congestion • Leverages incrementality in routing and congestion updates • New categorization of congestion • New congestion-relief techniques

  16. 3. LIRE: Routing Estimation • Lightweight Incremental Routing Estimator • Produces congestion maps like a global router • Routes 75K nets per second (can trade off quality against runtime)

  17. 3. LIRE: Routing Estimation

  18. 3.1 Faster Routing • Traditional global routing: maze routing • Priority queue → complex and slow • Large history-based cost • Lacks incrementality • Linear-time cache-friendly routing • Avoid priority-queue-based approaches • Avoid pointers to improve cache hit rate → Bellman-Ford algorithm

  19. 3.1 Faster Routing – Bellman Ford Algorithm • Bellman-Ford algorithm (1958) • Slower than Dijkstra’s algorithm • Each pass performs E relaxation steps of O(1) each (E = number of edges) • Goes through all nodes • Relaxes all edges instead of greedily selecting the minimum-weight node not yet processed • Repeats the relaxation of all edges up to (N-1) times (N = number of vertices) • Visits nodes in arbitrary order
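For reference, the textbook form of the algorithm described above can be sketched in a few lines of Python. The edge-list representation and the early exit when a pass changes nothing are common conventions assumed here, not details from the slides.

```python
def bellman_ford(num_nodes, edges, source):
    """Single-source shortest paths; edges is a list of (u, v, weight)."""
    INF = float("inf")
    dist = [INF] * num_nodes
    dist[source] = 0
    for _ in range(num_nodes - 1):        # at most N-1 passes
        changed = False
        for u, v, w in edges:             # relax every edge, in any order
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break                         # converged early
    return dist

edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
print(bellman_ford(4, edges, 0))
```

No priority queue is involved: each pass is a plain loop over the edges, which is the property the following slides exploit.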

  20. 3.1 Faster Routing – Bellman Ford Algorithm Bellman – Ford Algorithm(1958)

  21. 3.1 Faster Routing – Bellman Ford Algorithm Monotonic Routing with One Linear-Time BF Pass • Consider only forward edges • Only consider the space bounded by S and T • Visit in order, going through each node once → runtime complexity is O(N) (N = number of nodes in the space bounded by S and T)
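One way to picture the single linear-time pass is the Python sketch below. It assumes a 2D grid where `cost[r][c]` is the cost of stepping into a cell, visits the bounding box of S and T row by row in the direction of T, and lets each node take the cheaper of its two predecessors. The function name and the cost model are illustrative assumptions; the paper's actual data layout is not shown in the slides.

```python
def monotonic_route_cost(cost, source, target):
    """Cheapest monotonic path cost inside the bounding box of source and target."""
    (sr, sc), (tr, tc) = source, target
    dr = 1 if tr >= sr else -1            # row direction toward the target
    dc = 1 if tc >= sc else -1            # column direction toward the target
    INF = float("inf")
    dist = {}
    for r in range(sr, tr + dr, dr):      # each node is visited exactly once
        for c in range(sc, tc + dc, dc):
            if (r, c) == source:
                dist[(r, c)] = 0
                continue
            best = INF
            if r != sr:                   # forward edge from the previous row
                best = min(best, dist[(r - dr, c)])
            if c != sc:                   # forward edge from the previous column
                best = min(best, dist[(r, c - dc)])
            dist[(r, c)] = best + cost[r][c]
    return dist[target]

grid = [[1, 1, 1],
        [1, 9, 1],
        [1, 1, 1]]
print(monotonic_route_cost(grid, (0, 0), (2, 2)))
```

Because only forward edges are used, the result can be worse than the true shortest path when the best route needs a detour, which is what the non-monotonic variants on the next slides address.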

  22. 3.1 Faster Routing – Bellman Ford Algorithm Non-monotonic Routing with One Linear-Time BF Pass • Duplex-edge relaxation: relaxation in both directions • Echo-relaxation: propagate the smaller cost through all recently relaxed edges incident to the point • Effective in detouring short nets (the majority of nets are short)

  23-33. 3.1 Faster Routing – Bellman Ford Algorithm Non-monotonic Routing with One Linear-Time BF Pass (figure-only slides)

  34. 3.1 Faster Routing • Bellman-Ford with Yen’s improvement (BFY, 1970) • J.Y. Yen suggested reversing the node ordering between BF passes • Reduces the number of passes required to find the optimal path • BFY finds optimal paths faster than A* search for most nets in the experiment (Theorem 1)

  35. 3.1 Faster Routing – Bellman Ford Algorithm Bellman-Ford with Yen’s improvement • First forward pass finds optimal monotonic path

  36. 3.1 Faster Routing – Bellman Ford Algorithm Bellman-Ford with Yen’s improvement • Backward pass finds a detour

  37. 3.1 Faster Routing – Bellman Ford Algorithm Bellman-Ford with Yen’s improvement • Second forward pass finds optimal path

  38. 3.1 Faster Routing • Bellman-Ford with Yen’s improvement (1970) • With m passes, runtime complexity is O(mN) (N = number of nodes in the space bounded by S and T) • Limit m to reduce runtime • Small loss of optimality • Focus on incremental calls to BFY • Incremental Routing with BFY • Records partial costs along an existing route to reduce runtime (for rip-up-and-reroute and repeated invocations of LIRE during placement) • Faster!
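The alternating passes and the cap on m can be sketched as follows. This is a simplified Python illustration, not the paper's implementation: it sweeps the whole small grid rather than only the space bounded by S and T, relaxes edges in both directions from each visited node, and uses the hypothetical names `bfy_grid` and `max_passes`.

```python
def bfy_grid(cost, source, target, max_passes=3):
    """Path cost after at most max_passes sweeps with alternating node order."""
    rows, cols = len(cost), len(cost[0])
    INF = float("inf")
    dist = [[INF] * cols for _ in range(rows)]   # flat arrays, no priority queue
    dist[source[0]][source[1]] = 0
    order = [(r, c) for r in range(rows) for c in range(cols)]
    for p in range(max_passes):
        changed = False
        # Even passes sweep forward, odd passes sweep in reverse order.
        sweep = order if p % 2 == 0 else reversed(order)
        for r, c in sweep:
            d = dist[r][c]
            if d == INF:
                continue                         # node not reached yet
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < rows and 0 <= nc < cols:
                    if d + cost[nr][nc] < dist[nr][nc]:
                        dist[nr][nc] = d + cost[nr][nc]
                        changed = True
        if not changed:
            break                                # converged before the cap
    return dist[target[0]][target[1]]

grid = [[1, 1, 1, 1],
        [9, 9, 9, 1],
        [1, 1, 1, 1]]
print(bfy_grid(grid, (0, 0), (2, 0), max_passes=1))
print(bfy_grid(grid, (0, 0), (2, 0), max_passes=2))
```

On this grid one forward pass settles for the direct route through the expensive cell, while adding the reverse pass finds the cheaper detour around it. Capping `max_passes` therefore trades a possible loss of optimality for a bounded O(mN) runtime.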

  39. 3.1 Faster Routing – Bellman Ford Algorithm Incremental Routing with BFY • Initial route with BFY

  40. 3.1 Faster Routing – Bellman Ford Algorithm Incremental Routing with BFY • Through relaxation, BFY preserves part of the route and finds a better partial segment

  41. 4. Congestion Relief • Main goal: increase the porosity of placement regions with high routing congestion • How? • After global placement, shift cell locations and use congestion-driven detailed placement • During global placement, inflate cells based on early congestion estimates and pin density

  42. 4. Congestion Relief • Traditional approaches are insufficient: • After global placement, shift cell locations and use congestion-driven detailed placement • Must preserve the structure of the resulting placement or risk unbearable deterioration of interconnect length • During global placement, inflate cells based on early congestion estimates and pin density • When inflated cells move outside the congested region, new cells must be inflated, which may consume all whitespace without solving the root cause

  43. 4. Congestion Relief – Further Analysis • 3 types of routing congestion: • Cell-based congestion caused by cell-to-cell proximity • Local layout-based congestion caused by static design properties, such as blockages and reduced routing capacities • Remotely-induced layout-based congestion attributed to non-local factors such as long nets

  44. 4. Congestion Relief – Further Analysis • Cell-based congestion caused by cell-to-cell proximity • Mitigated by cell inflation (only in the top 5% most congested GCells, to avoid exhausting whitespace) • Local layout-based congestion caused by static design properties, such as blockages and reduced routing capacities • Locally inject whitespace (move cells out of the congested region) • Remotely-induced layout-based congestion attributed to non-local factors such as long nets • Enforce non-uniform target density by: • i) Creating a packing peanut (fixed cell) at the center of every GCell • ii) Modifying its size based on congestion
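The first technique, inflating cells only in the most congested GCells, can be sketched in Python as below. Everything beyond the 5% threshold is an assumption made for illustration: the dictionary-based data layout, the function name `inflate_congested_cells`, and the inflation factor of 1.5 do not come from the slides or the paper.

```python
def inflate_congested_cells(gcell_congestion, cells, top_fraction=0.05, factor=1.5):
    """Widen cells that sit in the most congested GCells.

    gcell_congestion maps a GCell id to a congestion value; each cell is a
    dict with "gcell" and "width" keys (hypothetical layout).
    """
    # Pick at least one GCell, otherwise the top fraction by congestion.
    count = max(1, int(len(gcell_congestion) * top_fraction))
    ranked = sorted(gcell_congestion, key=gcell_congestion.get, reverse=True)
    hot = set(ranked[:count])
    inflated = []
    for cell in cells:
        width = cell["width"] * factor if cell["gcell"] in hot else cell["width"]
        inflated.append({**cell, "width": width})   # inputs are left untouched
    return inflated

congestion = {g: g / 20 for g in range(20)}
cells = [{"name": "a", "gcell": 19, "width": 2.0},
         {"name": "b", "gcell": 3, "width": 2.0}]
print(inflate_congested_cells(congestion, cells))
```

Restricting inflation to a small fraction of GCells is what keeps the technique from consuming all available whitespace.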

  45. 5. Coordinated Place and Route Integration of Routing and Placement • Incremental placement updates • After its first invocation, LIRE maintains the overall congestion map and keeps track of the GCells traversed by each point-to-point connection • In the next invocation, if a connection's endpoints remain the same, its route is left unchanged • Has a pronounced effect in later iterations and during detailed placement, when locations have stabilized
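The bookkeeping described above can be sketched as a small route cache in Python. The class name `RouteCache`, the per-GCell usage counter, and the L-shaped placeholder router are all hypothetical; LIRE's actual router and data structures are not shown in the slides.

```python
def l_shaped_route(src, dst):
    """Placeholder router: horizontal run first, then vertical."""
    (r0, c0), (r1, c1) = src, dst
    step_c = 1 if c1 >= c0 else -1
    step_r = 1 if r1 >= r0 else -1
    path = [(r0, c) for c in range(c0, c1 + step_c, step_c)]
    path += [(r, c1) for r in range(r0 + step_r, r1 + step_r, step_r)]
    return path

class RouteCache:
    """Keeps each connection's route and a usage count per GCell."""

    def __init__(self, router):
        self.router = router
        self.routes = {}    # connection id -> (src, dst, path)
        self.usage = {}     # GCell -> number of routes crossing it

    def update(self, conn_id, src, dst):
        """Return True if the connection was rerouted, False if reused."""
        old = self.routes.get(conn_id)
        if old is not None and old[0] == src and old[1] == dst:
            return False                      # endpoints unchanged: keep route
        if old is not None:
            for gcell in old[2]:              # remove the stale route's demand
                self.usage[gcell] -= 1
        path = self.router(src, dst)
        for gcell in path:
            self.usage[gcell] = self.usage.get(gcell, 0) + 1
        self.routes[conn_id] = (src, dst, path)
        return True

cache = RouteCache(l_shaped_route)
print(cache.update("n1", (0, 0), (1, 2)))   # first call routes from scratch
print(cache.update("n1", (0, 0), (1, 2)))   # same endpoints, route is reused
```

Once cell locations stabilize, most calls take the early-return branch, which is why the slide notes the effect is most pronounced in later iterations.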

  46. 5. Coordinated Place and Route Integration of Routing and Placement • Incremental routing updates • When invoked for the first time, LIRE generates routes from scratch • After that, it reuses existing routes where possible • Nets whose terminals relocated to different GCells are rerouted using the original net ordering • The remaining nets are checked for congested routes, and congestion is mitigated by single incremental BFY passes • Replicates the accuracy of a maze router, but with better runtime

  47. 6. Empirical Validation Verifying Result • CoPR is implemented in C++ using the OpenMP library, compiled with g++ 4.7.0 • Global placer derived from SimPL • SimPL was used by three of the top four teams at the ICCAD 2012 Contest • Results reported on the ICCAD 2012 benchmarks released by IBM researchers

  48. 6. Empirical Validation • At the same runtime, CoPR outperforms the finalists of the ICCAD 2012 Contest by 7% and 2% in quality metrics. It is 5.7× faster than another contestant at the same quality. • With respect to the scoring formulas used at the ICCAD 2012 Contest, CoPR outperforms the winner.

  49. 7. Comparisons to Prior Art • Fast Routing: “A Fast Maze-free Routing Congestion Estimator With Hybrid Unilateral Monotonic Routing” by W.-H. Liu, Y.-L. Li and C.-K. Koh • Replaces A* search with fast linear-time routing algorithms that exploit a different notion of monotonic routes • Uses multiple passes to find non-monotonic routes and does not claim optimality • Does not consider CPU cache effects or the connection with BFY • Not used to drive a competitive global placer, in contrast to the successful results for coordinated place-and-route by CoPR • CoPR’s authors completed their work before this paper was published or made available

  50. 7. Comparisons to Prior Art • Fast Routing: “BonnTools: Mathematical Innovation for Layout and Timing Closure of Systems on a Chip” by B. Korte, D. Rautenbach and J. Vygen • Speeds up Dijkstra’s algorithm with sophisticated data structures and algorithms • Uses more memory for advanced data structures and requires significant up-front set-up • The single-threaded version of LIRE takes <15% of runtime in the entire place-and-route flow • CoPR’s authors avoided sophisticated routing algorithms and data structures