1 / 42

A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment

A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment. Shiyan Hu *, Zhuo Li**, and Charles J. Alpert** *Dept of ECE, Michigan Technological University **IBM Austin Research Lab. Outline. 4 X. 2 X. 1 X. Layer Assignment.

damienp
Download Presentation

A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Faster Approximation Scheme for Timing Driven Minimum Cost Layer Assignment Shiyan Hu*, Zhuo Li**, and Charles J. Alpert** *Dept of ECE, Michigan Technological University **IBM Austin Research Lab

  2. Outline

  3. 4X 2X 1X Layer Assignment • In 45nm technology, layer assignment is critical for timing and buffer area optimization

  4. Wire RC and Delay Wire in higher layer has much smaller delay

  5. Impact to Buffering • A buffer can drive longer distance in higher layer • Timing is improved • Fewer buffers are needed

  6. IP IP Impact to Routing/Buffering

  7. Can be different layers Same Layer Problem Formulation • A layer refers to a pair of horizontal and vertical layers with similar RC characteristics • Between any buffers, one layer is used • In early design stage, when buffering effect is considered, wire shaping is not important [Alpert TCAD’01] • In post-routing stage, wire shaping could improve timing, reduce vias and reduce coupling and so forth • Given • A buffered Steiner tree with n wire segments • Timing constraint • m wire layers with RC parameters and cost • Find a minimal cost layer assignment such that the timing constraint is satisfied.

  8. 4X 2X 1X Fully Polynomial Time Approximation Scheme (FPTAS) • A Fully Polynomial Time Approximation Scheme • Provably good • Within (1+ɛ) optimal cost for any ɛ>0 • Runs in time polynomial in n (segments), m (layers) and 1/ɛ • Ultimate solution for an NP-hard problem in theory • Highly practical

  9. Ratio between upper and lower bounds of the cost of optimal layer assignment Previous Work in ICCAD’08 • It depends on M and uses a DP of O(mn3/ɛ2) time • An iterative DP with incremental W Our DP needs one run for all W Bound independent oracle query • New FPTAS runs in O(mn2/ɛ) time

  10. The Rough Picture W*: the cost of optimal solution Key 1: Efficient checking Key 2: Smart guess Make guess on W* Not Good Check it Good (close to W*) Return the solution 10

  11. Key 1: Efficient Checking Benefit of guess • Only maintain the solutions with cost no greater than the guessed cost • Accelerate DP

  12. The Oracle Setup upper and lower bounds of cost W* Guess x within the bounds Update the bounds Oracle (x) • Oracle (x): the checker, able to decide whether x>W* or not • Without knowing W* • Answer efficiently 12

  13. Construction of Oracle(x) Scale and round each wire cost Dynamic Programming Only interested in whether there is a solution with cost up to x satisfying timing constraint Perform DP to scaled problem with cost bound n/ɛ. Time polynomial in n/ɛ 13

  14. Scaling and Rounding • Rounding error at each wire xɛ/n, total rounding error xɛ. • Larger x: larger error, fewer distinct costs and faster • Smaller x: smaller error, more distinct costs and slower • Rounding is the reason of acceleration Wire cost is integer after scaling and rounding with upper bound n/ɛ. Total # solutions is bounded in DP Wire cost xɛ/n 2xɛ/n 3xɛ/n 4xɛ/n 0 14

  15. Dynamic Programming Results DP result w/ all w are integers  n/ɛ • Yes, there is a solution satisfying timing constraint • No, no such solution • With cost rounding back, the solution has cost at most n/ɛ • xɛ/n + xɛ= (1+ɛ)x > W* • With cost rounding back, the solution has cost at least n/ɛ • xɛ/n = x  W*

  16. Solution Characterization • To model effect to upstream, a candidate solution is associated with • v: a node • Q: required arrival time • W: cumulative wire cost

  17. Candidate solutions are propagated toward the source Cost (W)-Bounded Dynamic Programming (DP) • Start from sinks • Candidate solutions are generated • Two operations • Subtree processing • Solution update at buffer • Solution Pruning

  18. Subtree Processing • Three paths • pa: a -> u • Pb: b -> u • Pc: c -> u • Qu(l)=min{Qa-d(pa,l),Qb-d(pb,l),Qc-d(pc,l)} • Wu(l)=Wa+Wb+Wc+w(T,l) • Wires are in the same layer l (Qu,Wu) (Qc,Wc) (Qb,Wb) (Qa,Wa)

  19. Exponential # of Solutions • For two solutions at a node with the same W, the one with smaller Q is dominated • Try to only generate non-dominated solutions since most of O(Wk) solutions are dominated solutions • W (=n/ɛ) solutions at each downstream buffer • Naïve merging takes O(Wk) time with k branches (Qu,Wa) (Qc,1,Wc,1) (Qc,2,Wc,2) (Qc,3,Wc,3) (Qc,4,Wc,4) k (Qa,1,Wa,1) (Qa,2,Wa,2) (Qa,3,Wa,3) (Qa,4,Wa,4) (Qb,1,Wb,1) (Qb,2,Wb,2) (Qb,3,Wb,3) (Qb,4,Wb,4)

  20. Multi-Way Merging • If best Q for cost w is obtained by merging Q(a1i1), Q(a2i2),..., Q(akik), where i1+i2+…ik=w, best Q for cost w+1 is obtained by • max 1  r  kmin {Q(a1i1),Q(a2i2),..., Q(arir+1), ...,Q(akik)}

  21. Four-Branch Example Solution(w=8, Q=9) is shown. To compute Solution (w=9, Q)

  22. Four-Branch Example – Case 1 Candidate Solution (w=9, Q=8)

  23. Four-Branch Example – Case 2 Candidate Solution (w=9, Q=4)

  24. Four-Branch Example – Case 3 Candidate Solution (w=9, Q=5)

  25. Four-Branch Example – Case 4 Candidate Solution (w=9, Q=7)

  26. Linear Time Multi-Way Merging • Lemma: given a subtree with m layers, k branches and W non-dominated solutions at each downstream buffer, one can merge them in O(mkW) time.

  27. Solution Update at Buffer • After merging, one non-dominated solution per layer per cost, totally O(mW) solutions • For each cost, find largest Q for all layers after buffer and propagate it (Qu,Wu) (Qc,Wc) (Qb,Wb) (Qa,Wa)

  28. Cost-Bounded DP • Lemma: given a tree with n wire segments and m layers, the optimal layer assignment subject to cost budget W=n/ɛ can be computed in O(mnW)=O(mn2/ɛ) time.

  29. Key 2: Bound Independent Guess • U (L): upper (lower) bound on W* • Naive binary search style approach • Runtime depends on the initial bounds U and L Set U and L on W* x=(U+L)/2 and W=n/ɛ Oracle (x) W*<(1+ɛ)x W*  x U= (1+ɛ)x L= x

  30. Adapt ɛ • Rounding factor xɛ/n for cost • Larger ɛ: faster with rough estimation • Smaller ɛ: slower with accurate estimation • Adapt ɛ and relate it with U and L

  31. U/L Related Scale & Round Wire cost 0 U/L xɛ/n xɛ/n 31

  32. Conceptually • Begin with large ɛ’ and progressively reduce it according to U/L as x approaches W* • Set ɛ’as a geometric sequence of …, 8, 4, 2, 1, 1/2, …, ɛ • One run of DP takes about O(n/ɛ) time. Total runtime is O(… + n/8 + n/4 + n/2 + … + n/ɛ) = O(n/ɛ). Independent of # of iterations

  33. Oracle Query Till U/L<2

  34. When U/L<2 Scale and round each cost by Lɛ/n • At least one feasible solution, otherwise no solution w/ cost 2n/ɛ •Lɛ/n = 2L  U • Runs in O(mn2/ɛ) time W=2n/ɛ Run DP Pick min cost solution satisfying timing at driver

  35. FPTAS for Layer Assignment • Theorem: a (1+ ɛ) approximation to the timing constrained minimum cost layer assignment problem can be computed in O(mn2/ɛ) time for any ɛ>0.

  36. The Algorithmic Flow Set U and L of W* Adapting ɛ=[U/L-1]1/2 Update U or L Set x=[UL/(1+ ɛ)]1/2 Oracle (x) U/L<2 Compute final solution

  37. Experiments • Experimental Setup • 1000 industrial nets • Compared to Dynamic Programming and the previous FPTAS [ICCAD’08]

  38. Cost Ratio Compared to DP Wire Cost Ratio 38 Approximation Ratio ɛ

  39. Speedup Compared to DP Speedup 39 Approximation Ratio ɛ

  40. Observations • FPTAS always achieves the theoretical guarantee • Larger ɛ leads to more speedup • 3.9x faster with 2.2% additional wire area compared to DP • Up to 6.5x faster than DP • On average about 2x faster than previous FPTAS

  41. Conclusion • Propose a (1+ ɛ) approximation for timing constrained layer assignment for any ɛ > 0 running in O(mn2/ɛ) time • Linear time DP running in O(mnW) time • Bound independent oracle query • Up to 6.5x faster than DP and 2x faster than previous FPTAS • Few percent additional wire area compared to DP as guaranteed theoretically

  42. Thanks

More Related