Optimality, Scalability and Stability study of Partitioning and Placement Algorithms

Optimality, Scalability and Stability study of Partitioning and Placement Algorithms Jason Cong, Michail Romesis, Min Xie UCLA Computer Science Department This work is partially supported by Semiconductor Research Corporation and National Science Foundation

Overview • Motivation and related work • Our contribution • Construction of Partitioning Examples with Known Upper bound • Construction of Placement Examples with Known Upper bound • Optimality, Scalability and Stability study • Conclusions and future work

Motivation Partitioning • Significant progress in partitioning during the mid-to-late 90’s • No significant improvement in the last 5 years • Have we reached a plateau?

Motivation Placement • Lack of significant progress in wirelength reduction • Rate of reduction is about 5-10% every 2-3 years • Latest developments in placement differ mainly in runtime • Capo [A. Caldwell et al, 2000] • Dragon [M. Wang et al, 2000] • Mongrel [S. Hur et al, 2000] • mPL [T. Chan et al, 2000] • mPG [C. Chang et al, 2002] • How much is the room for further improvement?

Motivation • Most work compare only with known heuristics • Use real design based benchmarks • ISPD98 [C. Alpert 1998] • WSI [D. Ghosh et al, 1997] • Use synthetic benchmarks • circ and gen [M. D. Hutton et al, 1998] • gnl [D. Stroobandt et al, 2000] • Little understanding about the divergence from the optimal

x x x x x x x x x x Related Work • Quantified Suboptimality of VLSI Layout Heuristics [L. Hagen et al, 1995] • Construct scaled instance with known upperbound from an initial problem ? • Over 10% area suboptimality in TimberWolf • Notable wirelength suboptimality in GORDIAN-L • Significant improvement was possible for placement and partitioning • But test cases are small, the largest netlist is less than 40K

Related Work • Optimality and Scalability of Existing Placement Algorithms [C. Chang et al, 2003] • Construct instances with known optimal using the characteristic of the original problem ? • Existing placement algorithms can be 70% to 150% away from the optimal • Average solution quality deteriorates by an additional 4% to 25% when the problem size increases by a factor of 10 • All the connections are local, no global connections

Overview • Motivation and related work • Our contribution • Construction of Partitioning Examples with Known Upper bound • Construction of Placement Examples with Known Upper bound • Optimality, Scalability and Stability study • Conclusions and future work

P1 P2 • Create two partitions of size 8 • Generate 9 2-pin nets that do not cross the partition line A B • Generate 3 2-pin nets that cross the partition line • Generate 6 3-pin nets that do not cross the partition line • Generate 2 3-pin nets that cross the partition line C D BEKU Construction Example Input: t = 16, D={12,8} B = 5 • Cutsize improved to 4 after FM • Cutsize = 5

Construction of Multiway Partitioning Examples with Known Upper Bounds (MEKU) • Divide the nodes into m partitions of equal size • Create B nets that cross at least two partitions. The remaining nets stay in one partition • Improve by multiway FM

BEKU and MEKU Suite • 2-way partitions occupy 45-55% of the total area • 8-way partitions occupy 11.8-13.3% of the total area URL : http://cadlab.cs.ucla.edu/~pubbench/partitioning/

Tested three State-of-the-Art Partitioning Tools • hMetis [G. Karypis et al, 1997] • Based on multilevel framework • MHEC and FC clustering algorithms • Variations of FM for refinement at each level • MLPart [A. Caldwell et al, 2000] • Based on multilevel framework • Different algorithms for coarsening (PinEC) and refinement (VRW) • Flare [J. Cong et al, 2000] • Two-level hierarchy created by the ESC clustering algorithm • Based on the LR bipartitioning engine and the PM multiway partitioning framework

Experimental Results on BEKU • MLPart produces the best results (very close to our estimated upper bound), and Flare the worst • The value of the bound (as a percentage of nets) influences the quality of hMetis and Flare

Experimental Results on BEKU • The runtime scale well (almost linearly) • Flare runs out of memory when problem size exceeds 1M nodes

Experimental Results on MEKU • hMetis is worse by only 2% when the initial bound is 30%, but the gap increases to 18% for a bound of 35% • MLPart does not support multiway partitioning

Placement Examples with Global Connections • Produced by Dragon on ISPD98 • The wirelength contribution from global connections can be significant! • Need to consider the impact of global connections

Placement Examples with Global Connections only • Each net connects either a row or column • Obvious upper bound • Sum the length of each row and column • Similar to datapath examples

Placement Examples with Non-local Connections • Extend PEKO [ C.Chang 2003] by introducing non-local nets to mimic global connections • All the modules are of equal size, and there is no space between rows and adjacent modules • For nets of degree i, *diof them are generated by randomly conneting i modules, the rest are generated optimally as in PEKO

Generate 28 2-pin optimally Generate 6 2-pin randomly Generate 16 3-pin optimally Generate 4 3-pin randomly Generate 6 4-pin randomly Generate 1 4-pin randomly Generate 4 5-pin optimally Generate 2 6-pin optimally Generate 1 7-pin optimally Placement Examples with Non-local Connections Input : t = 64, D = {d2=34,d3=20,d4=7,d5=4,d6=2, d7=1} =0.2 Total WL = 160

G-PEKU Suite • Module number extracted from ISPD98 URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm

PEKU Suite • Module number t and NDVs extracted from ISPD98 • Remove connections with pads • Vary  from 0 to 10% • 15% white space by expanding one dimension of the chip

PEKU Suite URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm

Tested four State-of-the-Art Placers • Capo [A. Caldwell et al, 2000] • Based on multilevel partitioner • Aims to enhance the routability • Dragon [M. Wang et al, 2000] • Uses hMetis for initial partition • SA with bin-based swapping • mPL [T. Chan et al, 2000] • Nonlinear programming on the coarsest level • Goto based relaxation • mPG [C. Chang et al, 2002] • Uses FC clustering and hierarchical density control • Incremental A-tree for routability

Experimental Results on G-PEKU • The gap between their solutions and the upper bound varies between 79% and 102% in the worst case • Another validation that there is significant room for improvement for the placement problem

Experimental Results on PEKU • mPL’s QR increases when  is increased from 0 to 0.75%, while for the other three placers, QRs are steadily decreasing • Absolute value of the QRs may not be meaningful, but it helps to identify the technique that works best under each scenario

Overview • Motivation and related work • Our contribution • Partitioning Examples with Known Upper bound • Placement Examples with Known Upper bound • Optimality, Scalability and Stability study • Conclusions and future work

Conclusions • Bipartitioning techniques seem fairly mature • The best available algorithms perform and scale very well on examples by our construction • The best available multiway partitioning algorithms do not perform equally well • The worst divergence from upperbound is 18% by hMetis • There is still significant room for improvement in circuit placement • Existing placement algorithms may produce solutions far away from the optimal (or upper bound) • Their effectiveness depends much on the characteristic of circuits

Future Work • Construction of more synthetic examples • Measure routability optimality • Measure timing optimality • Understand the deficiencies of existing algorithms using these examples • Guide the development of new VLSI CAD algorithms

Acknowledgement • Prof. I. Markov for providing Capo’s latest version • Prof. S. Lim for providing Flare’s latest version • X. Yuan for providing the data of mPG • J. Shinnerl and K. Sze for providing the experimental data of mPL

THE END THANK YOU

Optimality, Scalability and Stability study of Partitioning and Placement Algorithms