
Chapter 4 Retiming




  1. Chapter 4 Retiming

  2. Definitions
     • Retiming: a mapping from a given DFG, G, to a retimed DFG, Gr, such that the transfer functions of G and Gr differ only by a pure delay $z^{-L}$.
     • Purposes:
       • To facilitate pipelining and thereby reduce the clock cycle time
       • To reduce the number of registers needed
     (C) 2004-2006 by Yu Hen Hu

  3. Cut-set Retiming
     [Figure: a feed-forward cut-set and a feed-back cut-set]
     • Delay transfer theorem:
       • Adding an arbitrary non-negative number of delays to each edge of a feed-forward cut-set of a DFG does not alter its output, except that the output is delayed.
       • Transferring the same number of delays from every edge that crosses a feed-back cut-set in one direction to every edge that crosses the same cut-set in the opposite direction does not alter the output, only its timing.

  4. Feed-forward Cut-Set Retiming
     • Consider the FIR digital filter $y(n) = b_0 x(n) + b_1 x(n-1)$ and its DFG. Critical path length $= T_M + T_A$.
     • Select a cut-set (here, the one separating the two multipliers from the adder) and insert one delay on each edge in the cut-set.
     • Retimed filter: $y_{new}(n) = b_0 x(n-1) + b_1 x(n-2)$, so $y_{new}(n) = y(n-1)$. Critical path $= \max(T_M, T_A)$.
     [Figure: DFG of the FIR filter before and after inserting one delay D on each cut-set edge]
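A minimal Python sketch of this slide, assuming zero initial conditions and illustrative coefficients b0 = 0.5, b1 = -0.25 (not from the slides). It simulates the original filter and a version with one extra register on each multiplier-to-adder edge, so that each clock cycle contains only a multiply or only an add, and shows that the retimed output is the original output delayed by one sample.

```python
def fir(x, b0, b1):
    """Original filter: y(n) = b0*x(n) + b1*x(n-1), zero initial state."""
    y = []
    x_prev = 0.0                          # tap delay D holds x(n-1)
    for xn in x:
        y.append(b0 * xn + b1 * x_prev)   # multiply and add in one clock cycle
        x_prev = xn
    return y

def fir_retimed(x, b0, b1):
    """One extra register on each multiplier-to-adder edge of the cut-set."""
    y = []
    x_prev = 0.0
    p0 = p1 = 0.0                         # registers inserted on the cut-set edges
    for xn in x:
        y.append(p0 + p1)                 # adder only: uses last cycle's products
        p0, p1 = b0 * xn, b1 * x_prev     # multipliers only: fill the registers
        x_prev = xn
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = fir(x, 0.5, -0.25)
y_new = fir_retimed(x, 0.5, -0.25)
print(y)       # [0.5, 0.75, 1.0, 1.25, 1.5]
print(y_new)   # [0.0, 0.5, 0.75, 1.0, 1.25], i.e. y_new(n) = y(n-1)
```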

  5. Feed-back Cut Set Retiming
     • Consider the IIR digital filter $y(n) = a \cdot y(n-2) + x(n)$: loop bound $= (T_M + T_A)/2$; clock cycle $= T_M + T_A$.
     • Shift one delay to the other edge across a feed-back cut-set. The filter remains unchanged: loop bound $= (T_M + T_A)/2$; clock cycle $= \max(T_M, T_A)$.
     [Figure: DFG with 2D on the feedback edge before retiming, and with one D before and one D after the multiplier a after retiming]
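A register-level sketch in Python, assuming zero initial conditions and an illustrative coefficient a = 0.5 (not from the slides). In the retimed loop the adder reads only a register and the multiplier reads only a register, which is why the clock cycle drops to max(TM, TA); the two simulations produce identical outputs.

```python
def iir_original(x, a):
    """Both loop delays (2D) sit together: multiply and add share one cycle."""
    y = []
    d1 = d2 = 0.0                     # d1 = y(n-1), d2 = y(n-2)
    for xn in x:
        yn = xn + a * d2
        y.append(yn)
        d1, d2 = yn, d1
    return y

def iir_retimed(x, a):
    """One delay before and one after the multiplier (feed-back cut-set retiming)."""
    y = []
    reg1 = 0.0                        # register after the adder: y(n-1)
    reg2 = 0.0                        # register after the multiplier: a*y(n-2)
    for xn in x:
        yn = xn + reg2                # adder reads only a register
        y.append(yn)
        reg1, reg2 = yn, a * reg1     # multiplier reads only a register
    return y

x = [1.0, 0.0, 0.0, 0.0, 1.0, 2.0]
print(iir_original(x, 0.5))   # [1.0, 0.0, 0.5, 0.0, 1.25, 2.0]
print(iir_retimed(x, 0.5))    # same sequence
```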

  6. Timing Diagram
     Assume $t_M = t_A = 1$ t.u.
     [Figure: timing diagrams before retiming (single MAC unit, inputs x(1) to x(4), outputs y(1) to y(4)) and after retiming (separate Mul and Add units, time slots 0 to 8, inputs x(1) to x(7), outputs y(1) to y(7))]

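  7. Feed-back Cut Set Retiming
     • Consider the IIR digital filter $y(n) = a \cdot y(n-1) + x(n)$: loop bound $= T_M + T_A$; throughput $= 1/(T_M + T_A)$.
     • 2-slow version $y(m) = a \cdot y(m-2) + x(m)$, fed with $x(m) = x(k)$ for $m = 2k-1$ and $x(m) = 0$ for $m = 2k$: clock period $= T_M + T_A$; throughput $= 1/[2(T_M + T_A)]$.
     [Figure: DFG of the original filter (D in the loop) and of the 2-slow filter (2D in the loop)]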
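A small Python check of the 2-slow transformation, assuming zero initial conditions and illustrative values a = 0.5 and x = [1, 2, 3]. Every D becomes 2D and a zero is inserted after each input sample; the odd time slots then reproduce the original outputs and the even slots stay zero, which is why the throughput halves.

```python
def iir(x, a):
    """Original first-order filter y(n) = a*y(n-1) + x(n)."""
    y, prev = [], 0.0
    for xn in x:
        prev = a * prev + xn
        y.append(prev)
    return y

def iir_2slow(x, a):
    """2-slow filter y(m) = a*y(m-2) + x(m), fed with x(1), 0, x(2), 0, ..."""
    xm = []
    for xk in x:
        xm.extend([xk, 0.0])          # x(k) goes to slot m = 2k-1, a zero to m = 2k
    y = []
    d1 = d2 = 0.0                     # the two loop registers: y(m-1), y(m-2)
    for v in xm:
        ym = a * d2 + v
        y.append(ym)
        d1, d2 = ym, d1
    return y

y = iir([1.0, 2.0, 3.0], 0.5)         # [1.0, 2.5, 4.25]
y2 = iir_2slow([1.0, 2.0, 3.0], 0.5)  # [1.0, 0.0, 2.5, 0.0, 4.25, 0.0]
```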

  8. Slowdown + Retiming
     • Start with $y(n) = a \cdot y(n-1) + x(n)$, slowed down by 2 and retimed: clock cycle $= \max(T_M, T_A)$; throughput $= 1/[2\max(T_M, T_A)]$.
     • Start with $y(n) = a \cdot y(n-2) + x(n)$, retimed: loop bound $= (T_M + T_A)/2$; clock cycle $= \max(T_M, T_A)$; throughput $= 1/\max(T_M, T_A)$.
     [Figure: the two retimed DFGs, each with one D before and one D after the multiplier a]

  9. Example 3.2.1 (node delay = 1 t.u.)
     • Before retiming: critical path a3 → a4 → a5 → a6; clock cycle time = 4; 2 delay units.
     • After cut-set retiming: critical paths a3 → a5 and a4 → a6; clock cycle time = 2; 6 delay units.
     • After additional retiming: no critical path longer than a single node; clock cycle time = 1; 11 delay units.
     [Figure: DFG with nodes a1 to a6 before retiming, after cut-set retiming, and after additional retiming]

  10. Slow Down for Cut-Set Retiming

  11. Node Retiming
     Transfer delays through a node in the DFG:
     • r(v) = # of delays transferred from the outgoing edges to the incoming edges of node v
     • w(e) = # of delays on edge e
     • wr(e) = # of delays on edge e after retiming
     Retiming equation: for an edge e from node u to node v, $w_r(e) = w(e) + r(v) - r(u)$, subject to $w_r(e) \ge 0$.
     Let p be a path from v0 to vk; then …
     [Figure: a path p through nodes v0, v1, …, vk with edges e0, e1, …, ek; and an example of retiming node v with r(v) = 2]
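Summing the retiming equation along a path telescopes, because the r terms of the intermediate nodes cancel. The derivation below assumes (our indexing, not necessarily the slide's) that p visits v0, v1, …, vk and that edge e_i runs from v_i to v_{i+1}.

```latex
w_r(p) = \sum_{i=0}^{k-1} w_r(e_i)
       = \sum_{i=0}^{k-1} \left[ w(e_i) + r(v_{i+1}) - r(v_i) \right]
       = w(p) + r(v_k) - r(v_0)
```

For a cycle ($v_k = v_0$) the last two terms cancel, so retiming leaves the number of delays in every loop unchanged.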

  12. Invariant Properties
     • Retiming does NOT change the total number of delays in each cycle.
     • Retiming does not change the loop bound or the iteration bound of the DFG.
     • If the same constant integer j is added to the retiming value of every node v in a DFG G, the retimed graph Gr is not affected; that is, the weights (# of delays) of the retimed graph remain the same.
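A minimal Python sketch that applies the retiming equation and checks two of these invariants. The graph is the 4-node example used in the later slides (edge delays w(e21) = 1, w(e13) = 1, w(e14) = 2, w(e32) = w(e42) = 0); the retiming vector and the constant j = 5 are illustrative choices.

```python
def retime(w, r):
    """Apply w_r(e) = w(e) + r(v) - r(u) to every edge (u, v) in w."""
    wr = {(u, v): d + r[v] - r[u] for (u, v), d in w.items()}
    if any(d < 0 for d in wr.values()):
        raise ValueError("infeasible retiming: negative edge weight")
    return wr

def cycle_delays(w, cycle):
    """Total delays around a cycle given as a node list [v0, v1, ..., v0]."""
    return sum(w[(cycle[i], cycle[i + 1])] for i in range(len(cycle) - 1))

w = {(2, 1): 1, (1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0}
r = {1: 0, 2: 1, 3: 0, 4: 0}
wr = retime(w, r)

for cyc in ([1, 3, 2, 1], [1, 4, 2, 1]):
    assert cycle_delays(w, cyc) == cycle_delays(wr, cyc)   # loop delays preserved

r_shifted = {v: rv + 5 for v, rv in r.items()}              # add constant j = 5
assert retime(w, r_shifted) == wr                            # same retimed graph
```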

  13. Node Retiming Examples
     [Figure: node retiming example with r(2) = 1]

  14. DFG Illustration of the Example
     • Retimed DFG (r(2) = 1): $T = \max\{(1+2+1)/2, (1+2+1)/3\} = 2$; critical path delay $= \max\{2, 2, 1+1\} = 2$ t.u.
     • Original DFG: $T = \max\{(1+2+1)/2, (1+2+1)/3\} = 2$; critical path delay $= 2+1 = 3$ t.u.
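A Python sketch that recomputes both numbers on this slide. The node times t(1) = t(2) = 1 and t(3) = t(4) = 2 are inferred from the sums on the "Retiming Example Revisited" slide, and the edge delays come from its equations; the cycle enumeration is a simple DFS adequate only for tiny graphs.

```python
from fractions import Fraction

t = {1: 1, 2: 1, 3: 2, 4: 2}                                  # node computing times
w = {(2, 1): 1, (1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0}   # original delays
wr = {(2, 1): 0, (1, 3): 1, (1, 4): 2, (3, 2): 1, (4, 2): 1}  # after r(2) = 1

def simple_cycles(edges, nodes):
    """Enumerate simple cycles by DFS, each reported once from its smallest node."""
    succ = {v: [b for (a, b) in edges if a == v] for v in nodes}
    cycles = []
    def dfs(start, v, path):
        for nxt in succ[v]:
            if nxt == start:
                cycles.append(list(path))
            elif nxt > start and nxt not in path:
                dfs(start, nxt, path + [nxt])
    for s in sorted(nodes):
        dfs(s, s, [s])
    return cycles

def iteration_bound(t, w):
    """T = max over loops of (loop computing time) / (loop delays)."""
    best = Fraction(0)
    for cyc in simple_cycles(w, t):
        time = sum(t[v] for v in cyc)
        delays = sum(w[(cyc[i], cyc[(i + 1) % len(cyc)])] for i in range(len(cyc)))
        best = max(best, Fraction(time, delays))   # a delay-free loop would raise here
    return best

def critical_path(t, w):
    """Longest computing time along paths that use only zero-delay edges."""
    memo = {}
    def longest_from(v):
        if v not in memo:
            nxt = [b for (a, b), d in w.items() if a == v and d == 0]
            memo[v] = t[v] + max((longest_from(b) for b in nxt), default=0)
        return memo[v]
    return max(longest_from(v) for v in t)

print(iteration_bound(t, w), critical_path(t, w))    # 2 3
print(iteration_bound(t, wr), critical_path(t, wr))  # 2 2
```

The iteration bound is identical for both graphs, while the critical path drops from 3 to 2 t.u.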

  15. Retiming for Minimizing Clock Period
     Note that retiming will NOT alter the iteration bound T. The iteration bound is the theoretical minimum clock period for executing the algorithm.
     Let edge e connect node u to node v. If the node computing times satisfy t(u) + t(v) > T and e carries no delay, then the clock period must exceed T. For such an edge, we require that $w_r(e) \ge 1$.
     To generalize, for any path p from v0 to vk whose total computing time exceeds T, we require $w_r(p) = w(p) + r(v_k) - r(v_0) \ge 1$. In other words, every possible critical path in the DFG that is longer than T must contain at least one delay after retiming.

  16. Retiming Example Revisited
     • $w_r(e_{21}) \ge 0$, since $t(2) + t(1) = 2 = T$.
     • $w_r(e_{13}) \ge 1$, since $t(1) + t(3) = 3 > T$.
     • $w_r(e_{14}) \ge 1$, since $t(1) + t(4) = 3 > T$.
     • $w_r(e_{32}) \ge 1$, since $t(3) + t(2) = 3 > T$.
     • $w_r(e_{42}) \ge 1$, since $t(4) + t(2) = 3 > T$.
     Using $w_r(e_{uv}) = w(e) + r(v) - r(u)$:
     $w(e_{21}) + r(1) - r(2) = 1 + r(1) - r(2) \ge 0$
     $w(e_{13}) + r(3) - r(1) = 1 + r(3) - r(1) \ge 1$
     $w(e_{14}) + r(4) - r(1) = 2 + r(4) - r(1) \ge 1$
     $w(e_{32}) + r(2) - r(3) = 0 + r(2) - r(3) \ge 1$
     $w(e_{42}) + r(2) - r(4) = 0 + r(2) - r(4) \ge 1$
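A Python sketch of the edge-level rule above, assuming only single edges are checked (longer paths are not enumerated). Each constraint $w(e) + r(v) - r(u) \ge$ lower is rewritten as $r(u) - r(v) \le w(e) -$ lower, the form used by the systematic solution; the output matches the five inequalities listed for the retiming example.

```python
t = {1: 1, 2: 1, 3: 2, 4: 2}
w = {(2, 1): 1, (1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0}
T = 2

def clock_constraints(t, w, T):
    """Return (i, j, k) triples meaning r(i) - r(j) <= k."""
    cons = []
    for (u, v), d in w.items():
        lower = 1 if t[u] + t[v] > T else 0   # edge needs a delay if t(u)+t(v) > T
        cons.append((u, v, d - lower))
    return cons

for i, j, k in clock_constraints(t, w, T):
    print(f"r({i}) - r({j}) <= {k}")
```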

  17. Solution continues
     Since the retimed graph Gr remains the same if the same constant is added to all node retiming values, we can set r(1) = 0. The inequalities become:
     • $1 - r(2) \ge 0$, or $r(2) \le 1$
     • $1 + r(3) \ge 1$, or $r(3) \ge 0$
     • $2 + r(4) \ge 1$, or $r(4) \ge -1$
     • $r(2) - r(3) \ge 1$, or $r(3) \le r(2) - 1$
     • $r(2) - r(4) \ge 1$, or $r(2) \ge r(4) + 1$
     Since $r(3) \ge 0$ and $r(3) \le r(2) - 1$, we need $r(2) \ge 1$; together with $r(2) \le 1$, one must have $r(2) = +1$. This implies $r(3) \le 0$. But we also have $r(3) \ge 0$. Hence $r(3) = 0$. The remaining constraints leave $-1 \le r(4) \le 0$. Hence the two sets of solutions are: r(1) = r(3) = 0, r(2) = +1, and r(4) = 0 or -1.
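A brute-force cross-check of the hand derivation in Python, with r(1) fixed at 0 and the other retiming values searched over an illustrative range of -3 to 3. The lower bounds per edge are taken from the previous slide.

```python
from itertools import product

w = {(2, 1): 1, (1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0}
lower = {(2, 1): 0, (1, 3): 1, (1, 4): 1, (3, 2): 1, (4, 2): 1}

solutions = []
for r2, r3, r4 in product(range(-3, 4), repeat=3):
    r = {1: 0, 2: r2, 3: r3, 4: r4}
    if all(d + r[v] - r[u] >= lower[(u, v)] for (u, v), d in w.items()):
        solutions.append((r2, r3, r4))

print(solutions)  # [(1, 0, -1), (1, 0, 0)], each tuple is (r(2), r(3), r(4))
```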

  18. Systematic Solutions
     Given a system of inequalities $r(i) - r(j) \le k$, $1 \le i, j \le N$:
     • Construct a constraint graph: map each r(i) to node i, and add a node N+1.
     • For each inequality $r(i) - r(j) \le k$, draw an edge $e_{ji}$ (from node j to node i) such that $w(e_{ji}) = k$.
     • Draw N edges $e_{N+1,i}$ with weight 0.
     • The system of inequalities has a solution if and only if the constraint graph contains no negative cycles.
     • If a solution exists, one solution is $r(i)$ = the minimum path length from node N+1 to node i.
     Shortest path algorithms (Appendix A): Bellman-Ford algorithm, Floyd-Warshall algorithm.
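A minimal Python sketch of the constraint-graph method, using plain Bellman-Ford edge relaxation from the extra node N+1. Inequalities are (i, j, k) triples meaning r(i) - r(j) <= k, with nodes numbered 1..N as on the slide; the function name is ours. It is demonstrated on the five inequalities of the retiming example.

```python
def solve_difference_constraints(n, inequalities):
    """Return {i: r(i)} or None if the constraint graph has a negative cycle."""
    source = n + 1
    # Edge j -> i with weight k for each inequality, plus N+1 -> i with weight 0.
    edges = [(j, i, k) for (i, j, k) in inequalities]
    edges += [(source, i, 0) for i in range(1, n + 1)]

    dist = {v: float("inf") for v in range(1, n + 2)}
    dist[source] = 0
    for _ in range(n):                      # N+1 nodes -> N rounds of relaxation
        for a, b, k in edges:
            if dist[a] + k < dist[b]:
                dist[b] = dist[a] + k
    if any(dist[a] + k < dist[b] for a, b, k in edges):
        return None                         # still improvable: negative cycle
    return {i: dist[i] for i in range(1, n + 1)}

# Inequalities of the retiming example (slide 21).
ineq = [(2, 1, 1), (1, 3, 0), (1, 4, 1), (3, 2, -1), (4, 2, -1)]
r = solve_difference_constraints(4, ineq)
print(r)  # {1: -1, 2: 0, 3: -1, 4: -1}
```

Adding 1 to every value gives r(1) = 0, r(2) = 1, r(3) = 0, r(4) = 0, one of the two solutions found by hand on slide 17.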

  19. Bellman-Ford Algorithm
     Find the shortest path from an arbitrarily chosen origin node U to each node in a directed graph, if no negative cycle exists.
     Given a directed graph:
     • w(m,n): weight on the edge from node m to node n; $w(m,n) = \infty$ if there is no edge from m to n.
     • r(i,j): the shortest path length from node U to node i within j steps.
     • $r(i,1) = w(U,i)$, $r(i,j+1) = \min_k \{r(k,j) + w(k,i)\}$, $j = 1, 2, \ldots, N-1$.
     • If $\max(r(:,N-1) - r(:,N)) > 0$, then there is a negative cycle. Else, r(i,N-1) gives the shortest path length from U to i.
     In the example graph, this maximum difference is 1 > 0; hence there is at least one negative cycle.
     [Figure: example directed graph with nodes 1 to 4 and edge weights including -3; MATLAB code: spbf.m]
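A Python sketch of the tabular recurrence on this slide, with one assumption not stated there: w(i,i) is taken as 0 (self-loops are ignored), so that r(i,j) means "shortest path using at most j edges". Indices are 0-based, and the two 4-node example graphs are hypothetical, not the slide's figure; graph B differs from A only in the weight of edge 3 → 1, which closes a negative cycle.

```python
INF = float("inf")

def bellman_ford_table(W, U):
    """Return (r(:,N-1), has_negative_cycle) for origin U (0-based indices)."""
    N = len(W)
    # Assumed convention: staying put costs 0 (self-loops are ignored).
    W = [[0 if m == n else W[m][n] for n in range(N)] for m in range(N)]
    cols = [[W[U][i] for i in range(N)]]                      # r(:,1)
    for _ in range(N - 1):                                    # r(:,2) ... r(:,N)
        prev = cols[-1]
        cols.append([min(prev[k] + W[k][i] for k in range(N)) for i in range(N)])
    r_prev, r_last = cols[-2], cols[-1]                       # r(:,N-1), r(:,N)
    # Any further improvement after N-1 steps means a negative cycle.
    has_neg = any(a - b > 0 for a, b in zip(r_prev, r_last) if a < INF)
    return r_prev, has_neg

A = [[INF, 2, 4, INF],
     [INF, INF, 1, INF],
     [INF, INF, INF, -3],
     [INF, 3, INF, INF]]
B = [row[:] for row in A]
B[3][1] = 1                     # cycle 1 -> 2 -> 3 -> 1 now weighs 1 - 3 + 1 = -1

print(bellman_ford_table(A, 0))  # ([0, 2, 3, 0], False)
print(bellman_ford_table(B, 0))  # negative cycle flagged: second item is True
```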

  20. Floyd-Warshall Algorithm
     Find the shortest paths between all pairs of nodes in the graph, provided no negative cycle exists.
     Algorithm:
     • Initialization: $R^{(1)} = W$
     • For k = 1 to N: $R^{(k+1)}(u,v) = \min\{R^{(k)}(u,:) + R^{(k)}(:,v)\}$
     • If $R^{(k)}(u,u) < 0$ for any k, u, then a negative cycle exists. Else, $R^{(N+1)}(u,v)$ is the shortest path length from u to v.
     [Figure: example directed graph with nodes 1 to 4 and edge weights including -3]
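A Python sketch of Floyd-Warshall in its standard textbook form, where step k admits node k as the only new intermediate node (this may differ in form from the recurrence written on the slide), together with the negative-cycle test on the diagonal. It is applied to the constraint graph of the retiming example: indices 0 to 3 stand for r(1) to r(4) and index 4 is the extra node N+1.

```python
INF = float("inf")

def floyd_warshall(W):
    """W: N x N matrix, INF = no edge. Returns (R, has_negative_cycle)."""
    N = len(W)
    R = [row[:] for row in W]
    for k in range(N):
        for u in range(N):
            for v in range(N):
                if R[u][k] + R[k][v] < R[u][v]:
                    R[u][v] = R[u][k] + R[k][v]
    has_neg = any(R[u][u] < 0 for u in range(N))
    return R, has_neg

# Edge j -> i of weight k for each inequality r(i) - r(j) <= k (0-based indices).
W = [[INF] * 5 for _ in range(5)]
for i, j, k in [(2, 1, 1), (1, 3, 0), (1, 4, 1), (3, 2, -1), (4, 2, -1)]:
    W[j - 1][i - 1] = k
for i in range(4):
    W[4][i] = 0                      # edges from node N+1 with weight 0

R, neg = floyd_warshall(W)
print(neg, R[4][:4])                 # False [-1, 0, -1, -1]
```

Row N+1 of the result gives the same retiming values as the Bellman-Ford approach on the constraint graph.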

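  21. Bellman-Ford Algorithm for Shortest Path: Retiming Example
     For the retiming example:
     $r(2) - r(1) \le 1$
     $r(1) - r(3) \le 0$
     $r(1) - r(4) \le 1$
     $r(3) - r(2) \le -1$
     $r(4) - r(2) \le -1$
     [Figure: constraint graph with nodes 1 to 5 for the retiming example, annotated with the shortest path lengths found by the Bellman-Ford algorithm]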

  22. Floyd-Warshall Algorithm: Retiming Example

  23. Retiming to Reduce Registers
     • Register sharing: when a node has multiple fan-out edges with different numbers of delays, the registers can be shared so that only the branch with the maximum # of delays is needed.
     • Register reduction through node delay transfer from multiple input edges to the output edges (with the node retiming sign convention, this corresponds to r(v) < 0).
     • This should be done only when the clock cycle constraint (if any) is not violated.
     [Figure: delay reduction by register sharing and by node delay transfer]
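A Python sketch of register counting with fan-out sharing, assuming that all delays on the outgoing edges of one node are shared, so the node needs the maximum rather than the sum of its outgoing edge delays (other register optimizations are not modeled). It compares the 4-node example before and after the clock-period retiming r(2) = 1.

```python
def registers(w, shared=True):
    """w: {(u, v): delays}. Returns the total register count."""
    per_node = {}
    for (u, _v), d in w.items():
        per_node.setdefault(u, []).append(d)   # group edge delays by source node
    combine = max if shared else sum
    return sum(combine(ds) for ds in per_node.values())

w  = {(2, 1): 1, (1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0}   # original
wr = {(2, 1): 0, (1, 3): 1, (1, 4): 2, (3, 2): 1, (4, 2): 1}   # after r(2) = 1
print(registers(w, shared=False), registers(w))    # 4 3
print(registers(wr, shared=False), registers(wr))  # 5 4
```

Here the retiming that shortens the clock period raises the shared register count from 3 to 4, which illustrates why register reduction and the clock cycle constraint must be considered together.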

  24. Time Scaling (Slow Down)
     • Transforming each delay element (register) D into ND and reducing the sample frequency by a factor of N slows the computation down N times.
     • During slow-down, the processor clock cycle time remains unchanged; only the sampling period increases.
     • This provides opportunities for retiming and interleaving.
     [Figure: a first-order loop with one delay D, input …x(3) x(2) x(1) and output …y(3) y(2) y(1); and its 2-slow version with 2D, input …x(3) -- x(2) -- x(1) and output …y(3) -- y(2) -- y(1), where -- marks an idle slot]
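A Python sketch of the interleaving opportunity, assuming the filter y(n) = a*y(n-1) + x(n), N = 3, a = 0.5, and three illustrative input streams. The loop delay D becomes N*D, and the N time slots are filled round-robin with N independent streams; each stream comes out exactly as if it had been filtered alone.

```python
def iir(x, a):
    """Reference first-order filter y(n) = a*y(n-1) + x(n)."""
    y, prev = [], 0.0
    for xn in x:
        prev = a * prev + xn
        y.append(prev)
    return y

def iir_nslow_interleaved(streams, a):
    """Run len(streams) equal-length streams through one N-slow loop."""
    N = len(streams)
    loop = [0.0] * N                  # the N registers of the N*D loop delay
    out = [[] for _ in range(N)]
    for n in range(len(streams[0])):
        for s in range(N):            # time slot m = n*N + s carries stream s
            y = a * loop[0] + streams[s][n]   # loop[0] holds y(m - N)
            loop = loop[1:] + [y]             # shift the delay line by one slot
            out[s].append(y)
    return out

streams = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0], [4.0, 4.0, 4.0]]
outs = iir_nslow_interleaved(streams, 0.5)
assert outs == [iir(x, 0.5) for x in streams]   # each stream filtered independently
```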
