ECO Timing Optimization Using Spare Cells and Technology Remapping
陳彥賓, Graduate Institute of Electrical Engineering, National Taiwan University
Advisor: Prof. 張耀文
July 6, 2006

Outline
• Introduction & problem formulation
• Previous work and preliminaries
• Algorithm
• Experimental results
• Conclusions

Introduction
• ECO (engineering change order) is usually performed during the chip implementation cycle.
• It changes the design incrementally.
• When ECO is applied to a placed design, a small portion of the netlist is changed to
  • optimize the chip timing, with the functionality unchanged, or
  • change the chip functions, e.g., to fix logic bugs or to produce new versions.

Netlist Change Using Spare Cells
• Spare cells are reserved for design changes after placement and are distributed evenly over the chip layout.
• Using spare cells is an efficient way to change the netlist:
  • it saves the time and effort of re-placing the netlist;
  • it saves the production cost of masks.
• Such changes are becoming increasingly difficult in nanometer technologies:
  • circuit sizes are increasing substantially;
  • timing issues are hard to account for when the netlist is changed locally.

Problem Formulation
• Given a placed chip layout, rewire the circuit using spare cells to shorten the path delays and minimize the total negative slack of all ECO timing paths.
• Applicable techniques include
  • gate sizing
  • buffer insertion
  • technology mapping
[Figure: two ECO paths with slacks of -0.7 and -0.5 before optimization and 0.0 and 0.0 after]
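To make the objective concrete, the total negative slack (TNS) adds up the slacks of the violating paths, and paths that already meet timing contribute nothing. A minimal sketch using the slack values from the figure; the function name is hypothetical:

```python
def total_negative_slack(slacks):
    """Sum of the negative path slacks; paths that meet timing contribute 0."""
    return sum(min(s, 0.0) for s in slacks)

before = [-0.7, -0.5]  # ECO path slacks before optimization (from the figure)
after = [0.0, 0.0]     # ECO path slacks after optimization

print(round(total_negative_slack(before), 6))  # -1.2
print(total_negative_slack(after))             # 0.0
```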

Dynamic Programming
• Setting: buffer insertion on a single net.
• van Ginneken proposed a dynamic programming framework for slack-optimal buffer insertion on a net.
[Figure: routing tree from source gS to sinks gT1, gT2, gT3, each with a load and a required arrival time (RAT), with candidate buffers b1 to b4]
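The sketch below illustrates the dynamic-programming idea on the simplest case, a two-pin line. It walks from the sink toward the driver, carries (load, required time) candidates, optionally inserts a buffer at each internal node, and prunes dominated candidates. It assumes the Elmore delay model, a single buffer type, and equal-length segments; all RC values and names are hypothetical, and real nets are trees timed with library tables.

```python
from typing import List, Tuple

# Hypothetical parameters (not from the slides): Elmore delay, one buffer type.
R_WIRE, C_WIRE = 0.1, 0.2            # wire resistance / capacitance per unit length
R_BUF, C_BUF, D_BUF = 1.0, 0.5, 2.0  # buffer output resistance, input cap, intrinsic delay
R_DRV = 2.0                          # driver output resistance

# A candidate is (downstream load cap, required time at this node, buffer node indices).
Cand = Tuple[float, float, Tuple[int, ...]]

def prune(cands: List[Cand]) -> List[Cand]:
    """Drop dominated candidates (larger load without a later required time)."""
    kept: List[Cand] = []
    for c in sorted(cands, key=lambda x: (x[0], -x[1])):
        if not kept or c[1] > kept[-1][1]:
            kept.append(c)
    return kept

def buffer_line(seg_len: float, n_seg: int, sink_cap: float, sink_rat: float):
    """Slack-optimal buffering of a line with nodes 0 (driver) .. n_seg (sink)."""
    cands: List[Cand] = [(sink_cap, sink_rat, ())]
    for i in range(n_seg, 0, -1):
        # Move every candidate upstream across segment i (node i -> node i-1).
        cands = [(c + C_WIRE * seg_len,
                  q - R_WIRE * seg_len * (C_WIRE * seg_len / 2 + c),
                  bufs) for c, q, bufs in cands]
        node = i - 1
        if node > 0:  # internal node: optionally insert a buffer here
            # Every buffered option presents the same load C_BUF, so only the best matters.
            c, q, bufs = max(cands, key=lambda x: x[1] - R_BUF * x[0])
            cands.append((C_BUF, q - D_BUF - R_BUF * c, (node,) + bufs))
        cands = prune(cands)
    # At the driver, slack = required time - driver delay into the remaining load.
    c, q, bufs = max(cands, key=lambda x: x[1] - R_DRV * x[0])
    return q - R_DRV * c, bufs

slack, buffers = buffer_line(seg_len=10.0, n_seg=10, sink_cap=1.0, sink_rat=0.0)
print(slack, buffers)
```

Pruning keeps only the staircase of non-dominated candidates, which is what keeps the framework polynomial while still returning the slack-optimal choice for this model.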

Path-Based Buffer Insertion
• Shi et al. proposed a dynamic programming method that performs buffer insertion and gate sizing on a path:
  • cut the timing-violating paths into distinct paths;
  • view the gates on each path as special-type “buffers” and merge the whole path into a “big routing tree”;
  • perform gate sizing and buffer insertion simultaneously on the routing tree.
[Figure: a path from start point to end point whose OR, NAND, and AND gates are treated as OR-type, NAND-type, and AND-type buffers]

Logic-Physical Co-synthesis
• Layout-driven technology mapping (proposed by Stok et al.)
  • Place the base gates as an initial placement.
  • Map the base gates using their coordinates as the cost.
• Local netlist transformation (proposed by Lou et al.)
  • Identify the parts of the placed netlist that violate some target cost.
  • Extract those critical parts from the chip placement.
  • Re-synthesize and re-place the extracted netlist according to the target cost.

Timing Model
• Synopsys’ Liberty library format
• Gate delays are calculated with lookup tables.
• The gate delay and the output transition time are functions of the output loading and the input transition time.
[Figure: lookup table indexed by input transition time and output capacitive loading]
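A common way to evaluate such a two-dimensional table is bilinear interpolation between the four surrounding entries. The sketch assumes the rows index input transition and the columns index output load; the table values and function names are made up for illustration and are not taken from any real library.

```python
from bisect import bisect_right

def _cell(axis, x):
    """Index i of the table interval [axis[i], axis[i+1]] used for x (clamped at the ends)."""
    return max(0, min(bisect_right(axis, x) - 1, len(axis) - 2))

def table_lookup(table, slews, loads, slew, load):
    """Bilinear interpolation in a table indexed by (input transition, output load).
    Values outside the axes are linearly extrapolated from the edge intervals."""
    i, j = _cell(slews, slew), _cell(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    return ((1 - ts) * (1 - tl) * table[i][j] + (1 - ts) * tl * table[i][j + 1]
            + ts * (1 - tl) * table[i + 1][j] + ts * tl * table[i + 1][j + 1])

# Made-up 2x3 delay table (ns): rows = input transition (ns), columns = load (pF).
SLEWS = [0.1, 0.5]
LOADS = [0.01, 0.05, 0.10]
DELAY = [[0.10, 0.20, 0.35],
         [0.15, 0.26, 0.42]]

print(table_lookup(DELAY, SLEWS, LOADS, 0.1, 0.05))  # 0.2 (a grid point)
print(table_lookup(DELAY, SLEWS, LOADS, 0.3, 0.03))  # interpolated, about 0.1775
```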

Timing Model (cont’d)
• Output loading consists of
  • input pin capacitance
  • output pin capacitance
  • wire loading, Φ × wirelength, where Φ is the capacitance per unit wirelength.
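The three components add up directly. A minimal sketch, assuming "input pin capacitance" means the input pins of the gates being driven and that the wirelength is already known; the numbers are hypothetical:

```python
def output_load(fanout_pin_caps, output_pin_cap, wirelength, phi):
    """Output loading of a gate: the input pin capacitances it drives, its own
    output pin capacitance, and the wire loading phi * wirelength."""
    return sum(fanout_pin_caps) + output_pin_cap + phi * wirelength

# Hypothetical numbers: two fanout pins and 100 length units of wire.
load = output_load([0.002, 0.003], 0.001, wirelength=100.0, phi=0.0002)
print(round(load, 6))  # 0.026
```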

Properties of the Timing Model
• Loading dominance
  • Output loading has a larger effect on gate delay and output transition time than input transition time does (6.74x vs. 1.48x).
• Shielding
  • A netlist change affects the delays of neighboring gates only.
[Figure: shielding example with gates gi, gj, gk]

Properties of the Timing Model (cont’d)
• A buffer chain of the same type (BUFX1)
[Figure: input slope, output slope, and delay along the BUFX1 buffer chain]

Outline
• Introduction & problem formulation
• Previous work and preliminaries
• Algorithm
  • Overview
  • Tracing ECO paths
  • Dynamic cost programming
  • Example
  • Time complexity analysis
  • Technology remapping
• Experimental results
• Conclusions

Optimization Flow
• Iterate the optimization loop until the total negative slack reaches zero or no path can be improved.
[Figure: optimization flow chart, with technology remapping as an extension]

Tracing ECO Paths
• During STA (static timing analysis), store at each gate a pointer to its fanin with the largest arrival time.
• To obtain an ECO path, follow these pointers from the end point of the path back to its start point.
[Figure: ECO path traced from the end point back to the start point]
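The pointer-based tracing can be sketched as a forward STA pass that remembers the latest-arriving fanin, followed by a backward walk. This is a simplified sketch: each gate has a fixed delay and wire delays are ignored, the function names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def sta_with_pointers(fanins, gate_delay, start_arrival):
    """Forward STA over a DAG (fanins: gate -> list of fanin gates).

    Besides arrival times, store for each gate a pointer to its fanin with the
    largest arrival time. Wire delays are ignored in this sketch.
    """
    arrival, worst_fanin = {}, {}
    for g in TopologicalSorter(fanins).static_order():
        preds = fanins.get(g, [])
        if not preds:  # start point
            arrival[g], worst_fanin[g] = start_arrival.get(g, 0.0), None
        else:
            p = max(preds, key=arrival.__getitem__)
            arrival[g], worst_fanin[g] = arrival[p] + gate_delay.get(g, 0.0), p
    return arrival, worst_fanin

def trace_eco_path(end_point, worst_fanin):
    """Follow the stored pointers from the end point back to the start point."""
    path = [end_point]
    while worst_fanin[path[-1]] is not None:
        path.append(worst_fanin[path[-1]])
    return path[::-1]

fanins = {"g1": ["a", "b"], "g2": ["g1", "c"], "out": ["g2"]}
arrival, ptr = sta_with_pointers(fanins, {"g1": 1.0, "g2": 2.0},
                                 {"a": 0.0, "b": 0.5, "c": 1.0})
print(trace_eco_path("out", ptr), arrival["out"])  # ['b', 'g1', 'g2', 'out'] 3.5
```

Because the pointers are filled in during the STA pass that is run anyway, extracting a path costs only its length.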

Dynamic Cost Programming (DCP)
• A dynamic programming framework with dynamic cost, in three steps:
  • View each gate as a special-type “buffer” and merge the whole ECO path into a “big routing tree”.
  • Perform gate sizing and buffer insertion simultaneously, from the end point to the start point.
  • Perform one buffer insertion operation for each net and one gate sizing operation for each gate.
[Figure: an ECO path from start point to end point whose OR, NAND, and AND gates are treated as OR-type, NAND-type, and AND-type buffers]

Dynamic Cost
• Unlike in the traditional buffer insertion problem, the buffering/sizing cost is dynamic because
  • all spare cells are candidates for buffering/sizing;
  • the number of available spare cells changes during the optimization process.
• Optimum solutions of subproblems do not necessarily lead to the optimum solution of the overall problem.
• Hence, a set of solutions must be stored for each gate/net.
[Figure: ECO paths 1 and 2 competing for spare buffers b1 and b2; solutions plotted as # inserted buffers vs. path delay: S1 = no buffer insertion, S2 = insert buffer b1, S3 = insert buffer b2]

Solution Propagation during DCP
• Store each solution that shortens the ECO path delay as a point on a plane whose two coordinates are
  • the number of inserted buffers (sized gates are not counted), and
  • the approximate sub-path delay from the current gate to the end point of the path.
• Estimate the effect of operations without actually applying them.
• Generate solutions from the solutions of the driven gate/net.
[Figure: solution sets S1 to S6 at gates g1 and g2 and buffers b1 and b2, plotted as # inserted buffers vs. path delay]

Judgment of Operations
• The timing effect of a sizing/buffering operation can be estimated by its effect on the fanins (delay’ denotes the delay after the operation).
• Buffer insertion on net ni: if delay’(source of ni) + delay(buffer) < delay(source of ni), store the solutions corresponding to the operation.
• Gate sizing of gate gi: if delay(spare cell) < delay(gi) and delay’(fanin of gi) < delay(fanin of gi), store the solutions corresponding to the operation.
• The timing of non-ECO paths is preserved after optimization.
[Figure: buffer insertion on net ni and gate sizing of gate gi]
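The two acceptance tests can be written as predicates. To keep the sketch self-contained it replaces the Liberty tables with a hypothetical linear delay model (intrinsic delay plus drive resistance times load), and the caller supplies the loads before and after each operation; the class and function names are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    intrinsic: float  # intrinsic delay (hypothetical linear model)
    drive_res: float  # output (drive) resistance

def delay(cell: Cell, load: float) -> float:
    # Hypothetical linear stand-in for the Liberty lookup tables.
    return cell.intrinsic + cell.drive_res * load

def accept_buffering(source: Cell, buf: Cell, load_before: float,
                     source_load_after: float, buf_load: float) -> bool:
    """Buffer insertion on net ni: delay'(source) + delay(buffer) < delay(source)."""
    return delay(source, source_load_after) + delay(buf, buf_load) < delay(source, load_before)

def accept_sizing(gate: Cell, spare: Cell, gate_load: float, spare_load: float,
                  fanin: Cell, fanin_load_before: float, fanin_load_after: float) -> bool:
    """Gate sizing of gi: the spare cell is faster AND the fanin of gi gets faster."""
    return (delay(spare, spare_load) < delay(gate, gate_load)
            and delay(fanin, fanin_load_after) < delay(fanin, fanin_load_before))

src, buf = Cell(0.1, 2.0), Cell(0.05, 0.5)
print(accept_buffering(src, buf, load_before=1.0, source_load_after=0.05, buf_load=1.0))
```

Requiring the fanin's delay not to grow is what keeps the change local: by the shielding property, only neighboring gates are affected.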

Bounding Box Theorem
• The following theorem greatly reduces the number of buffering/sizing candidates.
• Assumptions:
  • Gate delays are independent of the input transition time.
  • The sized gate and the spare cell that replaces it have the same driving capability.

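Bounding box for net nE1 (gates gE1, gE2, gE3): $\mathrm{width} = \mathrm{dis}(g_{E1}, g_{E2}) + \mathrm{dis}(g_{E1}, g_{E3}) + (C_{Ei1} + C_{Ei2})/\Phi$, centered at $g_{E1}$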

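Bounding Box Theorem (cont’d)

Bounding polygon (gates gE1, gE2, gE3, gE4), defined by three boxes:
• $\mathrm{width} = \mathrm{dis}(g_{E1}, g_{E2}) + \mathrm{dis}(g_{E1}, g_{E3}) + C_{Eo1}/\Phi$, centered at $g_{E2}$
• $\mathrm{width} = \mathrm{dis}(g_{E1}, g_{E2}) + \mathrm{dis}(g_{E1}, g_{E3}) + C_{Eo1}/\Phi$, centered at $g_{E3}$
• $\mathrm{width} = \mathrm{dis}(g_{E1}, g_{E4}) + C_{Ei1}/\Phi$, centered at $g_{E4}$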

Solution Pruning during DCP
• For each set of solutions, keep at most k solutions (k is a user-defined parameter):
  • discard dominated solutions;
  • classify the remaining solutions by the number of used buffers;
  • keep the best solutions in each class.
[Figure: solutions plotted as # inserted buffers (0 to 3) vs. path delay]
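One way to read these pruning rules: keep the smallest-delay solution in each buffer-count class, drop a class if it uses more buffers without a strictly smaller delay, and cap the result at k. The sketch assumes one survivor per class and, when more than k survive, prefers the classes with fewer buffers; both are illustrative choices, not necessarily the authors'.

```python
from typing import List, NamedTuple

class Solution(NamedTuple):
    n_buf: int    # number of inserted buffers (sized gates are not counted)
    delay: float  # estimated sub-path delay from the current gate to the end point

def prune_solutions(sols: List[Solution], k: int) -> List[Solution]:
    """Keep at most k non-dominated solutions, at most one per buffer-count class."""
    best = {}
    for s in sols:  # best (smallest-delay) solution in each class
        if s.n_buf not in best or s.delay < best[s.n_buf].delay:
            best[s.n_buf] = s
    kept, best_delay = [], float("inf")
    for n in sorted(best):  # more buffers must buy a strictly smaller delay
        if best[n].delay < best_delay:
            kept.append(best[n])
            best_delay = best[n].delay
    return kept[:k]  # assumption: prefer classes with fewer buffers

sols = [Solution(0, 10.0), Solution(1, 8.0), Solution(1, 9.0),
        Solution(2, 8.5), Solution(3, 6.0)]
print(prune_solutions(sols, k=3))
# [Solution(n_buf=0, delay=10.0), Solution(n_buf=1, delay=8.0), Solution(n_buf=3, delay=6.0)]
```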

End of DCP
• At the start point of the ECO path, choose the solution that
  • meets the timing constraint, and
  • uses the fewest buffers.
• Change the netlist according to the chosen solution.
• Run STA to update the timing information.
[Figure: solutions at the start point plotted as # inserted buffers vs. path delay, with the clock cycle as the timing constraint]
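The final selection is a filter followed by a minimum. In this sketch the timing constraint is assumed to be "path delay ≤ clock cycle", ties in buffer count are broken by smaller delay, and `None` is returned when nothing is feasible. The tie-break and the `None` fallback are hypothetical, because the slides do not cover those cases.

```python
from typing import List, NamedTuple, Optional

class Solution(NamedTuple):
    n_buf: int     # number of inserted buffers
    delay: float   # estimated path delay from the start point to the end point

def choose_final(solutions: List[Solution], clock_cycle: float) -> Optional[Solution]:
    """Pick the solution that meets the timing constraint with the fewest buffers."""
    feasible = [s for s in solutions if s.delay <= clock_cycle]
    if not feasible:
        return None  # hypothetical: behaviour for this case is not specified
    return min(feasible, key=lambda s: (s.n_buf, s.delay))

sols = [Solution(0, 12.0), Solution(1, 9.5), Solution(2, 8.0)]
print(choose_final(sols, clock_cycle=10.0))  # Solution(n_buf=1, delay=9.5)
```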

An Example for Complex ECO Paths
[Figure: worked example on ECO paths P1, P2, P3 with buffer-type and gate-type spare cells, showing slack values and a FINISH LIST of paths with slack ≥ 0]

Time Complexity Analysis of Phase 1
• Parameters
  • Gate count: V
  • Number of spare cells: N
  • Number of DCP iterations: L
  • Maximum number of gates on an ECO path: M
  • At most k solutions are kept per operation
• Complexity of DCP: $O(kMN)$
• Complexity of STA: $O(V)$
• Complexity of phase 1: $O((kMN + V)L)$

Extension: Technology Remapping
• After DCP, circuit timing can be further improved by the following steps:
  • Identify the timing-critical parts of the netlist.
  • Extract those parts from the netlist.
  • Re-synthesize and map the extracted netlist:
    • decomposition by MVSIS
    • ideal mapping locations
    • technology mapping
  • Run STA to update the timing information.

Optimal Buffering of a Line
• The optimal way to buffer a line is to insert buffers at equal distances.
• No gate then drives an excessively large load.
[Figure: optimal (equally spaced) vs. non-optimal buffering of a line]
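Under the Elmore delay model (an assumption here, since the slides time gates with Liberty tables), with a hypothetical wire resistance r per unit length and Φ the capacitance per unit wirelength, the wire delay grows with the square of the length. For a fixed total length, the sum of squares is smallest when the pieces are equal (Cauchy-Schwarz). This sketch ignores the delay of the inserted buffers themselves:

```latex
\begin{aligned}
t_{\mathrm{wire}}(\ell) &= \tfrac{1}{2}\, r\,\Phi\,\ell^{2},\\
\sum_{i=1}^{m} t_{\mathrm{wire}}(\ell_i)
  &\ge \tfrac{1}{2}\, r\,\Phi\,\frac{\ell^{2}}{m}
  \quad\text{whenever } \sum_{i=1}^{m}\ell_i = \ell,
  \text{ with equality iff } \ell_i = \ell/m \text{ for all } i.
\end{aligned}
```

For example, splitting a line into two equal halves (m = 2) halves the total wire term, and any unequal split does worse.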

Ideal Mapping Locations
• Given the locations of the input and output pins, map the base gates evenly between the input and output pins.
• No gate drives an excessively large load, and the path delay is smaller (wire delay is proportional to the square of the wirelength).
• This also makes buffer insertion easier.
[Figure: mappings between inputs A, B and the output, compared by # inserted buffers and delay]

Calculating Ideal Mapping Locations
• For each path from an input pin to an output pin, calculate the ideal location of every base gate on the path by spacing the gates at equal distances.
• If a base gate has more than one ideal location, average them to obtain its final ideal location.
[Figure: ideal locations of base gates between inputs A, B and the output]
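The two rules translate into a short routine: enumerate input-to-output paths, place the gates of each path at equal steps on the straight segment between its two pins, and average a gate's samples across paths. Spacing along a straight segment, the exhaustive path enumeration (exponential in general, fine only for small extracted netlists), and all names are assumptions of this sketch.

```python
from collections import defaultdict

def all_paths(fanouts, node, outputs):
    """Every path from node to an output pin (exponential in general; fine for small cones)."""
    if node in outputs:
        return [[node]]
    return [[node] + rest
            for nxt in fanouts.get(node, [])
            for rest in all_paths(fanouts, nxt, outputs)]

def ideal_locations(fanouts, pin_loc, inputs, outputs):
    samples = defaultdict(list)
    for pin in inputs:
        for path in all_paths(fanouts, pin, outputs):
            (x0, y0), (x1, y1) = pin_loc[path[0]], pin_loc[path[-1]]
            gates = path[1:-1]
            for k, g in enumerate(gates, start=1):
                t = k / (len(gates) + 1)  # equal spacing between the two pins
                samples[g].append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    # A base gate lying on several paths gets the average of its per-path locations.
    return {g: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
            for g, pts in samples.items()}

fanouts = {"A": ["g1"], "B": ["g2"], "g1": ["g3"], "g2": ["g3"], "g3": ["Out"]}
pins = {"A": (0.0, 0.0), "B": (0.0, 6.0), "Out": (6.0, 0.0)}
print(ideal_locations(fanouts, pins, ["A", "B"], ["Out"]))
```

Here g3 lies on both paths, so its two samples, (4, 0) and (4, 2), are averaged to (4, 1).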

Technology Mapping
• Use the actual locations of spare cells as costs.
• Cut the network into trees.
• Apply a dynamic programming method to map each tree.
  • Locations of mapped base gates are the locations of the corresponding spare cells.
  • Locations of unmapped base gates are their ideal locations.
• Insert buffers into the mapped circuit to further improve timing.
[Figure: mapped circuit between inputs A, B and the output]

Maximum Independent Set
• To choose a globally optimal solution for technology remapping, we store a set of match solutions for each tree and use MIS (maximum independent set) to find the best assignment.
[Figure: trees T1, T2, T3 over gates g1 to g6 with candidate matches M1_1, M1_2, M2_1, M2_2, M2_3, M3_1, M3_2]
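To show the selection step on a tiny instance, the sketch treats each candidate match as a vertex weighted by a timing gain, joins two matches with an edge when they cannot coexist, and searches exhaustively for the maximum-weight independent set. The gains, the conflict rules (alternatives of the same tree conflict; M1_2 and M2_1 are assumed to compete for the same spare cell), and the use of a weighted search are all hypothetical. Exhaustive search only suits a handful of matches.

```python
from itertools import combinations

def best_assignment(gains, conflicts):
    """Exhaustive maximum-weight independent set over candidate matches.

    gains: match -> timing gain (hypothetical weight); conflicts: pairs of matches
    that cannot both be chosen. Exponential, so only for a handful of matches.
    """
    names = list(gains)
    bad = {frozenset(pair) for pair in conflicts}
    best, best_gain = (), 0
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            if any(frozenset(p) in bad for p in combinations(subset, 2)):
                continue
            gain = sum(gains[m] for m in subset)
            if gain > best_gain:
                best, best_gain = subset, gain
    return set(best), best_gain

gains = {"M1_1": 3, "M1_2": 5, "M2_1": 5, "M2_2": 2, "M2_3": 1, "M3_1": 2, "M3_2": 3}
conflicts = [("M1_1", "M1_2"), ("M2_1", "M2_2"), ("M2_1", "M2_3"), ("M2_2", "M2_3"),
             ("M3_1", "M3_2"), ("M1_2", "M2_1")]  # last pair: shared spare cell (hypothetical)
chosen, total = best_assignment(gains, conflicts)
print(sorted(chosen), total)
```

Choosing the locally best match per tree (M1_2 and M2_1) would violate the shared-spare-cell conflict, which is why a global selection over all trees is needed.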

Experimental Results
• The five benchmarks are industrial designs.
• Our tool runs on a Linux workstation with a 3.2 GHz CPU and 3 GB of memory.

Experimental Results (cont’d)
• Our tool outperformed all competitors on the same problem in the CAD Contest ’05.
• We compare the results of our algorithm with
  • the case without the aid of the bounding box theorem, and
  • a greedy wire-cost heuristic.

Experimental Results (cont’d)
• Layout of Case 2 before and after optimization

Conclusions
• We proposed a dynamic programming method that considers dynamic cost to solve the ECO timing optimization problem.
• Functional change considering timing is a tougher task, and we will extend our work in this direction.
