1 / 31

Optimality, Scalability and Stability study of Partitioning and Placement Algorithms

This study examines the optimality, scalability, and stability of partitioning and placement algorithms in VLSI design. The authors present construction examples with known upper bounds and explore the room for further improvement. The study also compares existing heuristics and algorithms to understand their suboptimality.

freddiep
Download Presentation

Optimality, Scalability and Stability study of Partitioning and Placement Algorithms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Optimality, Scalability and Stability study of Partitioning and Placement Algorithms Jason Cong, Michail Romesis, Min Xie UCLA Computer Science Department This work is partially supported by Semiconductor Research Corporation and National Science Foundation

  2. Overview • Motivation and related work • Our contribution • Construction of Partitioning Examples with Known Upper bound • Construction of Placement Examples with Known Upper bound • Optimality, Scalability and Stability study • Conclusions and future work

  3. Overview • Motivation and related work • Our contribution • Construction of Partitioning Examples with Known Upper bound • Construction of Placement Examples with Known Upper bound • Optimality, Scalability and Stability study • Conclusions and future work

  4. Motivation Partitioning • Significant progress in partitioning during the mid-to-late 90’s • No significant improvement in the last 5 years • Have we reached a plateau?

  5. Motivation Placement • Lack of significant progress in wirelength reduction • Rate of reduction is about 5-10% every 2-3 years • Latest developments in placement differ mainly in runtime • Capo [A. Caldwell et al, 2000] • Dragon [M. Wang et al, 2000] • Mongrel [S. Hur et al, 2000] • mPL [T. Chan et al, 2000] • mPG [C. Chang et al, 2002] • How much is the room for further improvement?

  6. Motivation • Most work compare only with known heuristics • Use real design based benchmarks • ISPD98 [C. Alpert 1998] • WSI [D. Ghosh et al, 1997] • Use synthetic benchmarks • circ and gen [M. D. Hutton et al, 1998] • gnl [D. Stroobandt et al, 2000] • Little understanding about the divergence from the optimal

  7. x x x x x x x x x x Related Work • Quantified Suboptimality of VLSI Layout Heuristics [L. Hagen et al, 1995] • Construct scaled instance with known upperbound from an initial problem ? • Over 10% area suboptimality in TimberWolf • Notable wirelength suboptimality in GORDIAN-L • Significant improvement was possible for placement and partitioning • But test cases are small, the largest netlist is less than 40K

  8. Related Work • Optimality and Scalability of Existing Placement Algorithms [C. Chang et al, 2003] • Construct instances with known optimal using the characteristic of the original problem ? • Existing placement algorithms can be 70% to 150% away from the optimal • Average solution quality deteriorates by an additional 4% to 25% when the problem size increases by a factor of 10 • All the connections are local, no global connections

  9. Overview • Motivation and related work • Our contribution • Construction of Partitioning Examples with Known Upper bound • Construction of Placement Examples with Known Upper bound • Optimality, Scalability and Stability study • Conclusions and future work

  10. P1 P2 • Create two partitions of size 8 • Generate 9 2-pin nets that do not cross the partition line A B • Generate 3 2-pin nets that cross the partition line • Generate 6 3-pin nets that do not cross the partition line • Generate 2 3-pin nets that cross the partition line C D BEKU Construction Example Input: t = 16, D={12,8} B = 5 • Cutsize improved to 4 after FM • Cutsize = 5

  11. Construction of Multiway Partitioning Examples with Known Upper Bounds (MEKU) • Divide the nodes into m partitions of equal size • Create B nets that cross at least two partitions. The remaining nets stay in one partition • Improve by multiway FM

  12. BEKU and MEKU Suite • 2-way partitions occupy 45-55% of the total area • 8-way partitions occupy 11.8-13.3% of the total area URL : http://cadlab.cs.ucla.edu/~pubbench/partitioning/

  13. Tested three State-of-the-Art Partitioning Tools • hMetis [G. Karypis et al, 1997] • Based on multilevel framework • MHEC and FC clustering algorithms • Variations of FM for refinement at each level • MLPart [A. Caldwell et al, 2000] • Based on multilevel framework • Different algorithms for coarsening (PinEC) and refinement (VRW) • Flare [J. Cong et al, 2000] • Two-level hierarchy created by the ESC clustering algorithm • Based on the LR bipartitioning engine and the PM multiway partitioning framework

  14. Experimental Results on BEKU • MLPart produces the best results (very close to our estimated upper bound), and Flare the worst • The value of the bound (as a percentage of nets) influences the quality of hMetis and Flare

  15. Experimental Results on BEKU • The runtime scale well (almost linearly) • Flare runs out of memory when problem size exceeds 1M nodes

  16. Experimental Results on MEKU • hMetis is worse by only 2% when the initial bound is 30%, but the gap increases to 18% for a bound of 35% • MLPart does not support multiway partitioning

  17. Placement Examples with Global Connections • Produced by Dragon on ISPD98 • The wirelength contribution from global connections can be significant! • Need to consider the impact of global connections

  18. Placement Examples with Global Connections only • Each net connects either a row or column • Obvious upper bound • Sum the length of each row and column • Similar to datapath examples

  19. Placement Examples with Non-local Connections • Extend PEKO [ C.Chang 2003] by introducing non-local nets to mimic global connections • All the modules are of equal size, and there is no space between rows and adjacent modules • For nets of degree i, *diof them are generated by randomly conneting i modules, the rest are generated optimally as in PEKO

  20. Generate 28 2-pin optimally Generate 6 2-pin randomly Generate 16 3-pin optimally Generate 4 3-pin randomly Generate 6 4-pin randomly Generate 1 4-pin randomly Generate 4 5-pin optimally Generate 2 6-pin optimally Generate 1 7-pin optimally Placement Examples with Non-local Connections Input : t = 64, D = {d2=34,d3=20,d4=7,d5=4,d6=2, d7=1} =0.2 Total WL = 160

  21. G-PEKU Suite • Module number extracted from ISPD98 URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm

  22. PEKU Suite • Module number t and NDVs extracted from ISPD98 • Remove connections with pads • Vary  from 0 to 10% • 15% white space by expanding one dimension of the chip

  23. PEKU Suite URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm

  24. Tested four State-of-the-Art Placers • Capo [A. Caldwell et al, 2000] • Based on multilevel partitioner • Aims to enhance the routability • Dragon [M. Wang et al, 2000] • Uses hMetis for initial partition • SA with bin-based swapping • mPL [T. Chan et al, 2000] • Nonlinear programming on the coarsest level • Goto based relaxation • mPG [C. Chang et al, 2002] • Uses FC clustering and hierarchical density control • Incremental A-tree for routability

  25. Experimental Results on G-PEKU • The gap between their solutions and the upper bound varies between 79% and 102% in the worst case • Another validation that there is significant room for improvement for the placement problem

  26. Experimental Results on PEKU • mPL’s QR increases when  is increased from 0 to 0.75%, while for the other three placers, QRs are steadily decreasing • Absolute value of the QRs may not be meaningful, but it helps to identify the technique that works best under each scenario

  27. Overview • Motivation and related work • Our contribution • Partitioning Examples with Known Upper bound • Placement Examples with Known Upper bound • Optimality, Scalability and Stability study • Conclusions and future work

  28. Conclusions • Bipartitioning techniques seem fairly mature • The best available algorithms perform and scale very well on examples by our construction • The best available multiway partitioning algorithms do not perform equally well • The worst divergence from upperbound is 18% by hMetis • There is still significant room for improvement in circuit placement • Existing placement algorithms may produce solutions far away from the optimal (or upper bound) • Their effectiveness depends much on the characteristic of circuits

  29. Future Work • Construction of more synthetic examples • Measure routability optimality • Measure timing optimality • Understand the deficiencies of existing algorithms using these examples • Guide the development of new VLSI CAD algorithms

  30. Acknowledgement • Prof. I. Markov for providing Capo’s latest version • Prof. S. Lim for providing Flare’s latest version • X. Yuan for providing the data of mPG • J. Shinnerl and K. Sze for providing the experimental data of mPL

  31. THE END THANK YOU

More Related