An Analytic Placer for Mixed-Size Placement and Timing-Driven Placement

An Analytic Placer for Mixed-Size Placement and Timing-Driven Placement • Andrew B. Kahng and Qinke Wang • UCSD CSE Department • {abk, qiwang}@cs.ucsd.edu • Work partially supported by the MARCO Gigascale Systems Research Center, NSF MIP-9987678 and the Semiconductor Research Corporation.

Motivation • Mixed-size placement • design productivity increasingly requires IP reuse • processing / interface cores, embedded memories, etc. • “boulders and dust” challenge: sizes of placeable objects can vary by factors of 10,000 or more • placement is particularly complex in fixed-die context • Timing-driven placement • more critical with device and interconnect scaling

Our Work • APlace [Kahng/Wang ISPD04]: an analytic placer for wirelength-driven standard-cell placement • [Naylor et al., US Patent 6301693, 2001] • superior wirelength quality compared to Cadence QPlace, Dragon and Capo • strong extensibility: congestion-directed placement, I/O-core co-placement, constraint handling for mixed-signal, etc. • poor scalability: 13.2x slower than Capo on average • This work: extend APlace to address mixed-size placement and timing-driven placement

Outline • APlace Background • Extension to Mixed-Size Placement • Extension to Timing-Driven Placement • Conclusions and Ongoing Work

Outline • APlace Background • Formulations • wirelength minimization • cell spreading = density control • Implementation • Extension to Mixed-Size Placement • Extension to Timing-Driven Placement • Conclusion and Ongoing Work

Wirelength Formulation • Placement objective: HPWL • Smooth approximation [Naylor et al., US Patent 6301693, 2001] • log-sum-exp formula: pick the most dominant terms among pin coordinates • α : smoothing parameter • closer to HPWL when α → 0 • precise • strictly convex • continuously differentiable
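
A minimal sketch of the log-sum-exp approximation, assuming the usual form in which each net contributes α·(ln Σ exp(x_i/α) + ln Σ exp(-x_i/α)) per axis. The function names `lse_max`, `smooth_hpwl` and `hpwl` are illustrative and not taken from APlace; the maximum is subtracted inside the exponent only for numerical stability, which leaves the value unchanged.

```python
import math

def lse_max(values, alpha):
    """Smooth upper bound on max(values): alpha * ln(sum(exp(v / alpha)))."""
    m = max(values)  # shift by the max so exp() cannot overflow
    return m + alpha * math.log(sum(math.exp((v - m) / alpha) for v in values))

def smooth_hpwl(pins, alpha):
    """Log-sum-exp approximation of one net's half-perimeter wirelength."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    # max(x) - min(x) is approximated by lse_max(x) + lse_max(-x), same for y.
    return (lse_max(xs, alpha) + lse_max([-x for x in xs], alpha)
            + lse_max(ys, alpha) + lse_max([-y for y in ys], alpha))

def hpwl(pins):
    """Exact half-perimeter wirelength of one net, for comparison."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))
```

For a net with n pins the approximation overestimates HPWL by at most 4·α·ln(n), which is why a smaller α tracks HPWL more closely.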

Density Control • Common strategy • divide the placement area into grids • equalize the total cell area in each grid • Penalty of an uneven cell distribution • not smooth or differentiable • difficult to optimize

Cell Potential Function • Bell-shaped cell potential function [Naylor et al., US Patent 6301693, 2001] • Cell c has potential(c, g) with respect to grid g • Cell c at (x, y) has area A • Grid point g = (x', y') • p(d) : bell-shaped function, $p(d) = 1 - 2d^2/r^2$ for $0 \le d \le r/2$ and $p(d) = 2(r-d)^2/r^2$ for $r/2 \le d \le r$ • r : the radius of cells' potential • C : a proportionality factor, s.t.
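
The bell-shaped function can be sketched as follows. The two pieces of `bell` follow the plot labels on the slide. The product form used in `cell_potential`, potential(c, g) = C · p(|x - x'|) · p(|y - y'|), is an assumption about how the per-axis values are combined, and the `scale` argument stands in for the proportionality factor C.

```python
def bell(d, r):
    """Bell-shaped p(d): 1 at d = 0, falling smoothly to 0 at d = r."""
    d = abs(d)
    if d <= r / 2:
        return 1.0 - 2.0 * d * d / (r * r)   # inner piece
    if d <= r:
        return 2.0 * (r - d) ** 2 / (r * r)  # outer piece
    return 0.0                               # no influence beyond the radius

def cell_potential(cx, cy, gx, gy, r, scale=1.0):
    """Assumed product form: scale * p(|x - x'|) * p(|y - y'|)."""
    return scale * bell(cx - gx, r) * bell(cy - gy, r)
```

Both pieces give 0.5 at d = r/2 and share the slope -2/r there, so the function and its first derivative are continuous.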

Implementation • Cells are spread by minimizing the smooth density penalty function • APlace combines the above two objectives and optimizes the following function using a Conjugate Gradient optimizer: • Density term drives cell spreading • Wirelength term draws connected components back toward each other

Wirelength vs. Density Objectives • Objective: weighted sum of the wirelength term and the density penalty • Density weight: fixed • a larger weight spreads cells out hastily, without good wirelength • Wirelength weight: variable • a larger weight contracts cells together and prevents them from spreading out • initially set to be large • repeat until all cells are spread out evenly: • execute conjugate-gradient solver until convergence • reduce the weight by half
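
The weight schedule above can be sketched as a small driver loop. Everything here is illustrative: `halve_until_spread` takes the inner solver and the spreading test as callables, and the toy instance replaces the real objective with two unit-width cells on a line (quadratic wirelength d², overlap penalty max(0, 1 - d)²) and replaces conjugate gradient with plain gradient descent to keep the example short.

```python
def halve_until_spread(minimize, is_spread, x0, wl_weight=8.0, max_rounds=60):
    """Solve to convergence, then halve the wirelength weight, until spread."""
    x = x0
    for _ in range(max_rounds):
        x = minimize(x, wl_weight)   # inner solve at the current weight
        if is_spread(x):
            break
        wl_weight *= 0.5             # relax wirelength so density wins more
    return x, wl_weight

def toy_minimize(d, wl_weight, step=0.05, tol=1e-10, max_iters=100000):
    """Gradient descent on wl_weight * d**2 + max(0, 1 - d)**2 (toy objective)."""
    for _ in range(max_iters):
        grad = 2.0 * wl_weight * d - 2.0 * max(0.0, 1.0 - d)
        if abs(grad) < tol:
            break
        d -= step * grad
    return d

# d is the distance between the two cell centers; overlap is 1 - d.
distance, final_weight = halve_until_spread(
    toy_minimize, lambda d: 1.0 - d < 0.05, 0.0)
```

In the toy instance the minimizer for a given weight is d = 1 / (1 + wl_weight), so the cells separate gradually as the weight is halved instead of jumping apart in one step.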

Outline • APlace Background • Extension to Mixed-Size Placement • Density control for macros • Legalization • Experimental results • Extension to Timing-Driven Placement • Conclusion and Ongoing Work

Previous Works • Capo flow: a three-stage placement-floorplanning-placement flow that uses Capo [Adya et al., ISPD02, ICCAD03] • mPG-MS: a simulated-annealing-based multi-level placer [Chang et al., ASPDAC03] • Feng Shui: a recursive-bisection-based placement tool using fractional cuts [Khatkhate et al., ISPD04]

Potential Function for Macros (I) • Each module has a potential or influence with respect to nearby grids • APlace seeks to equalize the total module potential at each grid • rm is the radius of module’s potential • Standard-cell placement: rm is a constant r • Mixed-size placement: rm changes according to the module's dimension • A larger block will have potential with respect to more nearby grids

Potential Function for Macros (II) • p(d) : potential function, $p(d) = 1 - a\,d^2$ on the inner segment and $p(d) = b\,(r-d)^2$ on the outer segment (plot ticks at w/2 + r/2 and w/2 + r) • d : distance from module to grid • Radius rm = w/2 + r for a block with width w • Convex curve: d < w/2 + r/2 • Concave curve: w/2 + r/2 < d < w/2 + r • p(d) is smooth at d = w/2 + r/2
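
A sketch of the macro potential under two assumptions that the slide does not state explicitly: the outer piece is measured from the module radius rm = w/2 + r (so that p vanishes at the radius), and the coefficients a and b are fixed by requiring p and its slope to match at d = w/2 + r/2. Those two conditions give a = 1 / (knee · rm) and b = 2 / (r · rm). The name `macro_potential` is illustrative.

```python
def macro_potential(d, w, r):
    """Width-dependent bell function for a block of width w (sketch)."""
    d = abs(d)
    knee = w / 2 + r / 2        # where the two pieces meet
    radius = w / 2 + r          # rm: no influence beyond this distance
    a = 1.0 / (knee * radius)   # from value and slope continuity at the knee
    b = 2.0 / (r * radius)
    if d <= knee:
        return 1.0 - a * d * d
    if d <= radius:
        return b * (radius - d) ** 2
    return 0.0
```

With w = 0 the coefficients reduce to a = b = 2/r², which is the standard-cell bell function, so the same routine covers both cases.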

Legalization • Simplified Tetris algorithm [Hill, US Patent 6370673, 2002] • sort modules based on a linear combination of vertical coordinate and width • search the current nearest available position for each module • Pros and cons • ✓ fast • ✓ larger blocks are fixed at a position ahead of nearby small cells • ✗ best applied when modules are distributed evenly • ✗ may fail if the global placement has many overlaps among macros
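
The two steps above can be sketched as a greedy skyline packer. This is a simplified illustration, not the patented algorithm or APlace's implementation: it assumes integer module widths no larger than the die, ignores row alignment and die height, and the `width_bias` parameter (the weight of width in the sort key) is hypothetical.

```python
def tetris_legalize(modules, die_width, width_bias=0.5):
    """modules: list of (name, x, y, w, h); returns {name: (x, y)} without overlaps."""
    skyline = [0] * die_width  # lowest free y in each unit-wide column
    # Sort by vertical coordinate, with wider modules pulled earlier.
    order = sorted(modules, key=lambda m: m[2] - width_bias * m[3])
    placed = {}
    for name, x, y, w, h in order:
        best = None
        for cx in range(die_width - w + 1):
            # Lowest legal y at this column span, but never below the target y.
            cy = max(max(skyline[cx:cx + w]), y)
            cost = abs(cx - x) + abs(cy - y)  # Manhattan displacement
            if best is None or cost < best[0]:
                best = (cost, cx, cy)
        _, bx, by = best
        for col in range(bx, bx + w):
            skyline[col] = by + h             # columns are now filled up to here
        placed[name] = (bx, by)
    return placed
```

Because every module lands at or above the skyline of the columns it covers, it cannot overlap anything placed earlier. Modules that start heavily overlapped get pushed far from their targets, which mirrors the failure mode listed on the slide.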

APlace-MS Results • Ten ISPD02 Mixed-Size Benchmarks (10K-70K cells) • Average wirelength increase after legalization: 6.5% • Detailed placement by Feng Shui: 3.5% avg. WL improvement

          |           APlace-MS           |   detailed placement
circuit   | WL     WL_l   inc. (%)   CPU  | WL_dp   impr. (%)   CPU
ibm01     | 0.20   0.24   18.5       15   | 0.23    5.7         1
ibm02     | 0.51   0.52    0.7       45   | 0.50    2.5         3
ibm03     | 0.70   0.74    6.2       56   | 0.72    3.5         3
ibm04     | 0.81   0.85    4.8       48   | 0.83    2.8         4
ibm05     | 1.01   1.00   -0.5       15   | 0.98    2.0         5
ibm06     | 0.65   0.71    9.6       76   | 0.68    4.4         5
ibm07     | 1.03   1.09    5.8       98   | 1.05    3.7         8
ibm08     | 1.49   1.50    0.6      128   | 1.46    2.7         8
ibm09     | 1.25   1.45   15.7      113   | 1.38    5.2         9
ibm10     | 2.97   3.07    3.3      206   | 3.00    2.2        11

HPWL Comparison • Average HPWL improvement of APlace over each flow (range in parentheses) • Capo flow [ICCAD03]: 26.0% (11.5% ~ 34.0%) • mPG-MS [ASPDAC03]: 24.7% (9.9% ~ 40.1%) • Feng Shui [ISPD04]: 4.0% (-7.3% ~ 20.0%) • Runtime • Xeon server (2.4GHz CPU, double-threaded) • much slower than Feng Shui

Placements Before and After Legalization

Outline • APlace Background • Extension to Mixed-Size Placement • Extension to Timing-Driven Placement • Slack-derived edge weights • Timing-driven placement flow • Experimental results • Conclusion and Ongoing Work

Timing-Driven Approaches • Path based methods • consider all or a subset of paths directly • maintain an accurate timing view during optimization • complexity is relatively high • Net based methods • transform timing constraints or requirements into either net weight or net length (or delay) constraints

Net Based Methods • Delay budgeting • distribute slacks from the end-points to constituent nets along the path • may severely over-constrain the problem without consideration of physical feasibility • Net weighting • assign weights to nets based on timing criticality • low complexity, strong flexibility and easy implementation • more attractive as circuit sizes increase and timing constraints become more complex

Slack-Derived Edge Weights • Net weighting in TD-APlace • β : timing criticality exponent • slack(π) : the slack of path π • T : longest path delay • Heavy net weights are assigned to: • timing-critical nets → exponential function [Marquardt et al. 2000] • nets included in many critical paths [Kong ICCAD02]
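
One plausible form consistent with the symbols on this slide, offered as an assumption and not as the exact TD-APlace formula: each path π contributes a criticality (1 - slack(π)/T)^β to every net it traverses, so a net's weight grows both with how critical its paths are and with how many critical paths pass through it. The function name `net_weights` and the input layout are illustrative.

```python
def net_weights(paths, longest_delay, beta):
    """paths: iterable of (slack, nets_on_path); returns {net: weight}.

    Assumed form: weight(net) = sum over paths through the net of
    (1 - slack / longest_delay) ** beta.
    """
    weights = {}
    for slack, nets in paths:
        # Criticality is 1 for zero-slack paths and shrinks as slack grows.
        criticality = max(0.0, 1.0 - slack / longest_delay) ** beta
        for net in nets:
            weights[net] = weights.get(net, 0.0) + criticality
    return weights

# Two paths sharing net "n2": one with zero slack, one with slack 5 out of T = 10.
example = net_weights([(0.0, ["n1", "n2"]), (5.0, ["n2", "n3"])], 10.0, 2)
```

A larger β concentrates weight on the most critical paths, since the criticality of paths with ample slack shrinks toward zero faster.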

Timing-Driven Placement Flow • Final placement stage • TrialRoute (SoC Encounter v3.2): fast global and detailed routing • Extract RC • Pearl (SE v5.4): static timing analysis (STA) • Import critical path delays to decide net weights • Minimize weighted WL objective

Timing Results: Indust1 Testcase • Indust1: ~ 7k cells • Xeon 2.4GHz CPU, double-threaded • Minimum cycle time • measures quality of TD placements • initially decreases with criticality exponent • gradually deteriorates as criticality exponent continues to increase Results with varying criticality exponents (β)

Comparison vs. Industry Placers (I) • Two industry placers • QPlace (SE v5.4) • amoebaPlace (SoC Encounter v3.2) • Six industry circuits • 7k ~ 40k cells • two from the ISPD 2001 Circuit Benchmarks • Experimental flow • TD or non-TD placements • WarpRoute (SoC Encounter v3.2) : timing-driven routing • Extract RC • Pearl (SE v5.4): static timing analysis (STA)

Comparison vs. Industry Placers (II) • Comparison to TD-QPlace and TD-amoebaPlace • Final HPWL • TD-QPlace: 7.2% (-1.2% ~ 7.1%) • TD-amoebaPlace: 6.5% (-11.1% ~ 23.2%) • Min Cycle • TD-QPlace: 9.6% (-1.2% ~ 14.8%) • TD-amoebaPlace: 8.5% (-0.8% ~ 28.5%) • APlace: 2% (0.1% ~ 3.8%)

Conclusions • APlace analytic placement framework extended to address mixed-size and timing-driven placement • Mixed-size placement • HPWL outperforms mPG-MS, Feng Shui and the Capo flow respectively by 24.7%, 4.0% and 26.0% on average • Timing-driven placement • Minimum cycle time outperforms that of TD-QPlace and TD-amoebaPlace respectively by 9.6% and 8.5% • Routed WL outperforms that of TD-QPlace and TD-amoebaPlace respectively by 7.2% and 6.5%

Ongoing Work • Scalability issue • APlace currently does not scale to large instances • control scheme for larger circuits • Augmented Lagrangian method for constrained nonlinear optimization • multigrid algorithm • Extension to low power or IR drop directed placement • Extension to 3D or thermal-aware placement

Acknowledgments • We thank Brent Gregory, Will Naylor and Synopsys, Inc. for a research and educational license pertaining to U.S. Patents 6282693, 6662348, 6301693, 6671859 and 6665851.

Thank You !

HPWL Results Comparison • Comparison (HPWL) • the Capo flow [ICCAD03]: 26.0% (11.5% ~ 34.0%) • mPG-MS [ASPDAC03]: 24.7% (9.9% ~ 40.1%) • Feng Shui [ISPD04]: 4.0% (-7.3% ~ 20.0%) • Comparison (Running Time) • Xeon server (2.4GHz CPU, double-threaded) • much slower than Feng Shui • Comparison of our results with the Capo flow, mPG-MS and Feng Shui
