Chapter 3 Parallel and Pipelined Processing

Presentation Transcript


  1. Chapter 3 Parallel and Pipelined Processing

  2. Basic Ideas
     Parallel processing: less inter-processor communication, more complicated processor hardware.
     Pipelined processing: more inter-processor communication, simpler processor hardware.
     [Figure: two time-versus-processor charts for P1 to P4, one for each scheme, scheduling the tasks a1 through d4. Colors: different types of operations performed. a, b, c, d: different data streams processed.]
     (C) 1997-2006 by Yu Hen Hu

  3. Data Dependence
     Parallel processing requires NO data dependence between processors.
     Pipelined processing will involve inter-processor communication.
     [Figure: two time-versus-processor charts for P1 to P4, one for each scheme.]

  4. Usage of Pipelined Processing
     By inserting latches or registers between combinational logic circuits, the critical path can be shortened. Consequence: the clock cycle time is reduced and the clock frequency is increased. Suitable for DSP applications that have (infinitely) long data streams.
     Method to incorporate pipelining: cut-set retiming.
     Cut set: a set of edges of a graph such that, if these edges are removed from the original graph, the remaining graph becomes two separate graphs.
     Retiming: the timing of an algorithm is re-adjusted while keeping the partial ordering of execution unchanged, so that the results remain correct.
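The effect of placing registers on a cut set can be sketched in a few lines of Python. This is only an illustration under assumptions that are not on the slide: a 3-tap FIR filter is used as the example circuit, the function names are hypothetical, and the cut is taken in front of the last multiplier and adder. Each stage then needs one multiplication and one addition per clock cycle, and the output is the original output delayed by one cycle.

```python
def fir_direct(x, h):
    """Reference 3-tap FIR: y[n] = h0*x[n] + h1*x[n-1] + h2*x[n-2]."""
    h0, h1, h2 = h
    x1 = x2 = 0  # delay line contents x[n-1], x[n-2]
    y = []
    for xn in x:
        y.append(h0 * xn + h1 * x1 + h2 * x2)
        x2, x1 = x1, xn
    return y


def fir_pipelined(x, h):
    """Same filter with one register on every edge of a feed-forward cut set.

    Stage 1 computes h0*x[n] + h1*x[n-1] and forwards x[n-2].
    Stage 2 (one cycle later) multiplies the forwarded sample by h2
    and adds it to the registered partial sum.
    """
    h0, h1, h2 = h
    x1 = x2 = 0             # delay line contents
    reg_sum = reg_x2 = 0    # pipeline registers placed on the cut
    y = []
    for xn in x:
        # Stage 2 works on the values latched in the previous cycle.
        y.append(reg_sum + h2 * reg_x2)
        # Stage 1 computes new values and latches them for the next cycle.
        reg_sum = h0 * xn + h1 * x1
        reg_x2 = x2
        x2, x1 = x1, xn
    return y
```

Comparing `fir_pipelined(x, h)[1:]` with `fir_direct(x, h)[:-1]` shows the one-cycle latency: the values agree, only their timing shifts, which is why pipelining suits long data streams.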

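  5. Graph Transpose Theorem
     The transfer function of a signal flow graph remains unchanged if both of the following are done:
     • The direction of each arc is reversed.
     • The input and output labels are switched.
     [Figure: a 3-tap FIR signal flow graph with input x[n], two z^-1 delays, coefficients h[0], h[1], h[2] and output y[n], set equal ("?=") to its transposed graph, which is labeled with u[n], y[n], two z^-1 delays, h[2], h[0], h[1] and x[n].]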

  6. Data broadcast structure
     An algorithm transformation may lead to a pipelined structure without adding any delays.
     Given an FIR filter SFG: critical path T_M + 2T_A.
     Use the graph transposition theorem: reverse all arcs, and swap input and output.
     We obtain: critical path T_M + T_A. No additional delay is added!
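The equivalence of the two structures can be checked numerically. The sketch below assumes a 3-tap filter with arbitrary integer coefficients; the function names and test values are illustrative and do not come from the slides. In the transposed form the input sample is broadcast to all multipliers, and the delay elements sit between the adders.

```python
def fir_direct_form(x, h):
    """Direct form: delays on the input side, then a chain of two adders."""
    h0, h1, h2 = h
    x1 = x2 = 0  # x[n-1], x[n-2]
    y = []
    for xn in x:
        y.append(h0 * xn + h1 * x1 + h2 * x2)
        x2, x1 = x1, xn
    return y


def fir_transposed_form(x, h):
    """Transposed (data-broadcast) form: x[n] feeds every multiplier at once."""
    h0, h1, h2 = h
    s1 = s2 = 0  # contents of the two delay elements between the adders
    y = []
    for xn in x:
        y.append(h0 * xn + s1)   # only one multiply and one add on this path
        s1 = h1 * xn + s2        # uses the old s2, i.e. h2 * x[n-1]
        s2 = h2 * xn
    return y


if __name__ == "__main__":
    x = [3, 1, 4, 1, 5, 9, 2, 6]
    h = (2, -1, 3)
    print(fir_direct_form(x, h) == fir_transposed_form(x, h))
```

Both functions return the same sequence with no extra latency, which matches the claim that the transposed structure shortens the critical path without adding delay.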

  7. Fine-grain pipelining
     To further reduce the critical path below T_M, the multiplier itself is pipelined into two stages with delays T_M1 and T_M2.
     Critical path = max{T_M1, T_M2, T_A}
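To see what the three critical-path expressions (T_M + 2T_A, T_M + T_A, and max{T_M1, T_M2, T_A}) mean for the clock rate, here is a small calculation. The delay values are hypothetical and chosen only for illustration; the slides give no numbers.

```python
# Hypothetical delays in nanoseconds (assumed, not taken from the slides).
T_M = 10.0              # one multiplication
T_A = 2.0               # one addition
T_M1, T_M2 = 6.0, 4.0   # multiplier split into two stages, T_M1 + T_M2 = T_M

critical_paths = {
    "direct form": T_M + 2 * T_A,
    "transposed (data broadcast)": T_M + T_A,
    "fine-grain pipelined": max(T_M1, T_M2, T_A),
}


def max_clock_mhz(path_ns):
    """Highest clock frequency in MHz whose period still covers the path."""
    return 1000.0 / path_ns


if __name__ == "__main__":
    for name, path in critical_paths.items():
        print(f"{name}: {path} ns, up to {max_clock_mhz(path):.1f} MHz")
```

With these assumed numbers the multiplier dominates until it is split, so the fine-grain step gives the largest gain; with a different ratio of T_M to T_A the ranking of gains could change.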

  8. Block Processing
     One form of vectorized parallel processing of DSP algorithms (not parallel processing in the most general sense).
     Block vector: [x(3k) x(3k+1) x(3k+2)]
     Clock cycle: can be 3 times longer.
     Steps: original (FIR filter); rewrite 3 equations at a time; define block vector; block formulation. [Equations not preserved in the transcript.]
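Since the slide's equations did not survive, the following sketch uses an assumed 3-tap filter y(n) = a·x(n) + b·x(n-1) + c·x(n-2) to show what "rewrite 3 equations at a time" can look like. The filter, coefficient names and function names are assumptions made for illustration; the code indexes samples from 0.

```python
def fir_sample_by_sample(x, coeffs):
    """One output per iteration: y(n) = a*x(n) + b*x(n-1) + c*x(n-2)."""
    a, b, c = coeffs
    y = []
    for n in range(len(x)):
        xm1 = x[n - 1] if n >= 1 else 0
        xm2 = x[n - 2] if n >= 2 else 0
        y.append(a * x[n] + b * xm1 + c * xm2)
    return y


def fir_block3(x, coeffs):
    """Three outputs per clock cycle k from the block [x(3k) x(3k+1) x(3k+2)]."""
    if len(x) % 3 != 0:
        raise ValueError("input length must be a multiple of the block size 3")
    a, b, c = coeffs
    xm1 = xm2 = 0  # x(3k-1) and x(3k-2), carried over from the previous block
    y = []
    for k in range(len(x) // 3):
        x0, x1, x2 = x[3 * k: 3 * k + 3]
        # The three equations are independent and could run on three units.
        y0 = a * x0 + b * xm1 + c * xm2   # y(3k)
        y1 = a * x1 + b * x0 + c * xm1    # y(3k+1)
        y2 = a * x2 + b * x1 + c * x0     # y(3k+2)
        y.extend([y0, y1, y2])
        xm1, xm2 = x2, x1
    return y
```

Each pass of the loop in `fir_block3` stands for one processor clock cycle covering three sampling periods, which is why the clock cycle can be three times longer at the same throughput.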

  9. Block Processing

  10. General approach for block processing

  11. Block Processing for IIR Digital Filter
      Steps: original formulation; rewrite; define block vectors; then the block formulation. [Equations not preserved in the transcript.]
      Time indices: n counts sampling periods; k counts clock periods (processor). One clock period spans two sampling periods.
      Note: Pipelining: clock period = sampling period. Block (parallel): clock period is not equal to the sampling period.

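  12. Block IIR Filter
      [Figure: block diagram. A serial-to-parallel converter (S/P) splits x(n) into x(2k) and x(2k+1). In the upper branch, y(2(k-1)) passes through ⊗ and a delay D and is added (+) to x(2k) to give y(2k). In the lower branch, y(2(k-1)+1) passes through ⊗ and a delay D and is added (+) to x(2k+1) to give y(2k+1). A parallel-to-serial converter (P/S) merges y(2k) and y(2k+1) into y(n).]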
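The block diagram can be mimicked in software. The sketch below is one possible reading of it: it assumes the recursion y(n) = a·y(n-2) + x(n), where the coefficient a is the value applied at the ⊗ nodes. The slide's own equations are not preserved, so both the recursion and the function names are assumptions; samples are indexed from 0.

```python
def iir_sequential(x, a):
    """Sample-rate recursion assumed here: y(n) = a*y(n-2) + x(n)."""
    y = []
    for n, xn in enumerate(x):
        prev = y[n - 2] if n >= 2 else 0
        y.append(a * prev + xn)
    return y


def iir_block2(x, a):
    """Block size 2: each clock cycle k produces y(2k) and y(2k+1)."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be a multiple of the block size 2")
    d_even = d_odd = 0  # delay elements D holding y(2(k-1)) and y(2(k-1)+1)
    y = []
    for k in range(len(x) // 2):
        x_even, x_odd = x[2 * k], x[2 * k + 1]   # serial-to-parallel (S/P)
        y_even = a * d_even + x_even             # upper branch
        y_odd = a * d_odd + x_odd                # lower branch
        y.extend([y_even, y_odd])                # parallel-to-serial (P/S)
        d_even, d_odd = y_even, y_odd
    return y
```

Under this assumption the two branches never exchange data, so each one has a full clock period (two sampling periods) to finish its multiply and add.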

  13. Timing Comparison
      Single MAC unit:
        input:   x(1) x(2) x(3) x(4)
        MAC:     1    2    3    4
        output:  y(1) y(2) y(3) y(4)
      • Pipelining:
        input:   x(1) x(2) x(3) x(4) x(5) x(6) x(7) x(8)
        Add:     1    2    3    4    5    6    7    8
        output:  y(1) y(2) y(3) y(4) y(5) y(6) y(7) y(8)
        Mul:     1    2    3    4    5    6    7    8    (first product: a y(1))
      • Block processing:
        input:   x(2) x(4) x(6) x(8)
        unit 1:  2 2 4 4 6 6 8 8
        input:   x(1) x(3) x(5) x(7)
        unit 2:  1 1 3 3 5 5 7 7