CSL718 : Pipelined Processors

Presentation Transcript


  1. CSL718: Pipelined Processors. Pipeline Timings. 12th Jan, 2006. Anshul Kumar, CSE IITD

  2. Pipelined Processors. Parallel architectures: function-parallel, data-parallel. Function-parallel: instruction level (ILP), thread level, process level. ILP: pipelined processors, VLIWs, superscalar processors. Intel’s terminology: intra ILP, inter ILP.

  3. Processor Performance • MIPS and MFLOPS may not truly represent performance • Execution time of a program is the true measure of performance • SPEC rating is acceptable

  4. Execution Time and Clock Period. Instruction execution time: Tinst = CPI * t. Program execution time: Tprog = N * Tinst = N * CPI * t, where N = number of instructions, CPI = average cycles per instruction, t = clock cycle time. (Figure: stages IF, D, RF, EX/AG, M, WB, with clock period t.)
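
As a small numeric illustration of Tprog = N * CPI * t, the sketch below evaluates the formula for one workload. The workload figures (one million instructions, CPI of 1.5, 2 ns clock) and the function name are made up for the example and do not come from the slides.

```python
def program_time_ns(n_instructions: int, cpi: float, clock_ns: float) -> float:
    """Tprog = N * CPI * t, returned in nanoseconds."""
    return n_instructions * cpi * clock_ns


# Hypothetical workload, chosen only to exercise the formula.
t_prog = program_time_ns(1_000_000, 1.5, 2.0)
print(f"Tprog = {t_prog / 1e6:.1f} ms")
```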

  5. What influences clock period? Tprog = N * CPI * t. Technology: t. Software: N. Architecture: N * CPI * t. Instruction set architecture (ISA): trade-off N vs CPI * t. Micro-architecture (µA): trade-off CPI vs t.

  6. Determining Clock Period. Clock period = t = Pmax, where Pmax = max propagation delay. (Figure: combinational logic between two clocked registers, with delay Pmax.)

  7. Ideal Pipelining. Tinst is divided into S stages: t = Tinst / S, CPI = 1. Effective time per instruction: Teff = 1 * Tinst / S.

  8. Pipelining with hazards. Tinst is divided into S stages; frequency of interruptions = b. t = Tinst / S, CPI = 1 + (S - 1) * b, Teff = (1 + (S - 1) * b) * Tinst / S.
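
The effect of hazards on Teff can be checked numerically. This is a minimal sketch of the two formulas on this slide; the function names are illustrative, and the inputs (Tinst = 90 ns, b = 0.2, S = 5) are borrowed from the worked example later in the deck.

```python
def cpi_with_hazards(stages: int, b: float) -> float:
    """CPI = 1 + (S - 1) * b, where b is the frequency of interruptions."""
    return 1 + (stages - 1) * b


def effective_time_ns(t_inst_ns: float, stages: int, b: float) -> float:
    """Teff = CPI * Tinst / S, with no clocking overhead."""
    return cpi_with_hazards(stages, b) * t_inst_ns / stages


ideal = effective_time_ns(90, 5, 0.0)       # b = 0 gives the ideal pipeline
with_hazards = effective_time_ns(90, 5, 0.2)
print(f"ideal Teff = {ideal:.1f} ns, with hazards Teff = {with_hazards:.1f} ns")
```

With b = 0 the expression collapses to the ideal case Tinst / S.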

  10. A more realistic view. t = Pmax + C, where Pmax = max propagation delay and C = clocking overhead. (Figure: combinational logic between two clocked registers; the clock period is split into Pmax and C.)

  11. Clocking Overhead • Fixed overhead c: setup time, output delay • Variable overhead (stretching factor) k: clock skew • t = Tinst / S + k * Tinst / S + c = (1 + k) * Tinst / S + c

  12. Pipelining with Clocking Overhead. Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c]. Sopt = √[(1 - b) * (1 + k) * Tinst / (b * c)].
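
The optimum stage count on this slide follows from treating S as a continuous variable and minimizing Teff. The derivation below is a sketch that uses only the symbols defined on the slides (b, k, c, Tinst, S).

```latex
\begin{align*}
T_{\mathrm{eff}}(S)
  &= \bigl[1 + (S-1)\,b\bigr]\left[\frac{(1+k)\,T_{\mathrm{inst}}}{S} + c\right] \\
  &= \frac{(1-b)(1+k)\,T_{\mathrm{inst}}}{S} + b\,(1+k)\,T_{\mathrm{inst}} + (1-b)\,c + b\,c\,S \\[4pt]
\frac{dT_{\mathrm{eff}}}{dS}
  &= -\frac{(1-b)(1+k)\,T_{\mathrm{inst}}}{S^{2}} + b\,c = 0 \\[4pt]
S_{\mathrm{opt}}
  &= \sqrt{\frac{(1-b)(1+k)\,T_{\mathrm{inst}}}{b\,c}}
\end{align*}
```

The second derivative is positive for S > 0, so the stationary point is a minimum.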

  14. Partitioning instruction into cycles with non-uniform stage times. Actions: IF, D, RF, AG, T, DF, EX, PA. One action per pipeline stage => large quantization overhead. Multiple actions per stage? Multiple stages per action?

  15. Example (action delays, in execution order): PC - MAR 4 ns; Cache Dir 6 ns; Cache Data 10 ns; Data - IR 3 ns; Decode 6+6 ns; Gen Addr 9 ns; Addr - MAR 3 ns; Cache Dir 6 ns; Cache Data 10 ns; Data - ALU 3 ns; Execute 7+7+8 ns; Put Away 2 ns

  16. Optimal Pipelining. Tinst = 4+6+10+3+12+9+3+6+10+3+22+2 = 90 ns, b = 0.2, c = 4 ns, k = 5%. Sopt = √[(1 - b) * (1 + k) * Tinst / (b * c)] = 9.7 → 9, so Tseg = Tinst / 9 = 10 ns.
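
The figure of 9.7 stages can be reproduced directly from the slide's parameters. The sketch below evaluates the Sopt expression and, as an extra check that is not on the slide, prints Teff for the neighbouring integer stage counts; the function names are illustrative.

```python
import math


def optimal_stages(t_inst_ns: float, b: float, c_ns: float, k: float) -> float:
    """Sopt = sqrt((1 - b) * (1 + k) * Tinst / (b * c))."""
    return math.sqrt((1 - b) * (1 + k) * t_inst_ns / (b * c_ns))


def effective_time_ns(t_inst_ns: float, stages: int, b: float, c_ns: float, k: float) -> float:
    """Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c]."""
    clock_ns = (1 + k) * t_inst_ns / stages + c_ns
    return (1 + (stages - 1) * b) * clock_ns


# Parameters from the slide: Tinst = 90 ns, b = 0.2, c = 4 ns, k = 5%.
s_opt = optimal_stages(90, 0.2, 4, 0.05)
print(f"Sopt = {s_opt:.2f}")
for s in (9, 10):
    print(f"S = {s}: Teff = {effective_time_ns(90, s, 0.2, 4, 0.05):.2f} ns")
```

This assumes perfectly even stages (t = (1 + k) * Tinst / S + c); the example slides that follow show what happens once real action delays have to be packed into stages.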

  17. Example with Tseg = 10 ns: S = 10, t = 14.5 ns, S * t = 145 ns. (Figure: the action delays of slide 15 grouped into stages.)

  18. Example with Tseg = 13 ns: S = 9, t = 17.65 ns, S * t ≈ 159 ns. (Figure: the action delays of slide 15 grouped into stages.)

  19. Example with Tseg = 20 ns: S = 5, t = 25 ns, S * t = 125 ns. (Figure: the action delays of slide 15 grouped into stages.)
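
The stage counts quoted for Tseg = 10, 13 and 20 ns can be reproduced with a simple left-to-right packing of the action delays. This is only a sketch: the greedy rule is an assumption (the slides do not say how the partitions were drawn), as is the assumption that Decode (6+6) and Execute (7+7+8) may be split at the "+" boundaries. The clock period uses t = (1 + k) * Tseg + c with k = 5% and c = 4 ns.

```python
# Action delays in ns, in execution order, from the example slide.
# Assumption: Decode and Execute are split into their listed sub-steps.
ACTIONS_NS = [4, 6, 10, 3, 6, 6, 9, 3, 6, 10, 3, 7, 7, 8, 2]

K = 0.05    # variable clocking overhead (stretching factor)
C_NS = 4.0  # fixed clocking overhead


def greedy_partition(delays, t_seg):
    """Pack consecutive delays into stages whose total never exceeds t_seg."""
    stages, current = [], []
    for d in delays:
        if d > t_seg:
            raise ValueError(f"action of {d} ns does not fit in a {t_seg} ns segment")
        if current and sum(current) + d > t_seg:
            stages.append(current)  # close the stage before it overflows
            current = []
        current.append(d)
    if current:
        stages.append(current)
    return stages


for t_seg in (10, 13, 20):
    s = len(greedy_partition(ACTIONS_NS, t_seg))
    t = (1 + K) * t_seg + C_NS
    print(f"Tseg = {t_seg} ns: S = {s}, t = {t:.2f} ns, S * t = {s * t:.2f} ns")
```

Under these assumptions the packing gives S = 10, 9 and 5, matching the three example slides.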

  20. Comparison
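
The contents of this comparison slide are not in the transcript. As a hedged stand-in rather than a reconstruction, the sketch below applies the earlier formula Teff = [1 + (S - 1) * b] * t to the (S, t) pairs stated on the three example slides, with b = 0.2 from the optimal-pipelining slide. The dictionary and function names are illustrative.

```python
B = 0.2  # frequency of interruptions, from the optimal-pipelining slide

# (S, clock period t in ns) as stated on the three example slides.
DESIGNS = {
    "Tseg = 10 ns": (10, 14.5),
    "Tseg = 13 ns": (9, 17.65),
    "Tseg = 20 ns": (5, 25.0),
}


def effective_time_ns(stages: int, clock_ns: float, b: float = B) -> float:
    """Teff = [1 + (S - 1) * b] * t."""
    return (1 + (stages - 1) * b) * clock_ns


for label, (stages, clock_ns) in DESIGNS.items():
    print(f"{label}: S = {stages}, t = {clock_ns} ns, "
          f"S * t = {stages * clock_ns:.2f} ns, "
          f"Teff = {effective_time_ns(stages, clock_ns):.2f} ns")
```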

  21. Cycle Quantization. Delays are not integral multiples of the clock period. Total overhead = clocking overhead + quantization overhead. S * t ≥ Tinst + S * C (ignoring k). Quantization overhead = S * (t - C) - Tinst; it reduces as the clock period becomes small.
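
Applied to the three example partitions, the quantization-overhead expression gives concrete numbers. The sketch ignores k, as the slide does, so t - C is taken to be the segment time Tseg; the function name is illustrative.

```python
T_INST_NS = 90  # sum of the action delays in the example


def quantization_overhead_ns(stages: int, t_seg_ns: float,
                             t_inst_ns: float = T_INST_NS) -> float:
    """S * (t - C) - Tinst, with k ignored so that t - C equals Tseg."""
    return stages * t_seg_ns - t_inst_ns


# (S, Tseg in ns) pairs from the three example slides.
for stages, t_seg in ((10, 10), (9, 13), (5, 20)):
    overhead = quantization_overhead_ns(stages, t_seg)
    print(f"S = {stages}, Tseg = {t_seg} ns: quantization overhead = {overhead} ns")
```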

  22. Other Timing Approaches • Self-timed circuits: no centralized free-running clock; an operation begins as soon as its inputs are available, that is, when all its predecessors have completed; higher speed, lower power consumption • Wave pipelining: omit inter-stage registers; reduced clocking overhead

  23. Conventional vs Wave Pipelining. Conventional pipeline: registers separate adjoining stages; clock period > max prop delay; inter-stage data stored in registers. Wave pipeline: no registers between adjoining stages; clock period less than max prop delay; waves of data propagate through the combinational network (effectively, data is stored in the combinational circuit delay!).

  24. No pipelining. (Figure: registers and signals X, X’, Y; timing diagram of Clock, X, X’, Y.)

  25. Conventional pipelining. (Figure: registers and signals X, X’, Y, Y’, Z, Z’, W; timing diagram of Clock, X, X’, Y, Y’, Z, Z’, W.)

  26. Wave pipelining. (Figure: registers and signals X, Z’, W; timing diagram of Clock, X, Z’, W.)

  27. Timing. T ≥ p + s, where T = clock period, p = propagation delay, s = set-up time. (Figure: combinational circuit between registers X and Y; timing diagram of Clock, X, Y.)

  28. Timing with clock skew. Clock skew = δ. T ≥ p + s + 2δ. (Figure: combinational circuit between registers X and Y; timing diagram of Clock, X, Y showing p, s and δ.)

  29. Variation in propagation delay • Different delays in different paths • Delay variation due to process / temperature / power variations • Data-dependent delay variations

  30. Timing for wave pipelining. T ≥ Δp + s + 4δ, where Δp = pmax - pmin. (Figure: combinational circuit between registers X and Y; timing diagram of Clock, X, Y showing δ, pmin, pmax and Δp.)

  31. Timing for wave pipelining (expanded view). pmin ≥ (n - 1) T + 2δ and n T ≥ pmax + s + 2δ, hence T ≥ Δp + s + 4δ. (Figure: timing diagram of X and Y over n clock periods, marking (n - 1) T, n T, pmin, pmax and Δp.)
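
The bound on T follows from subtracting the two constraints on this slide. The derivation below is a sketch that assumes the symbols dropped from the transcript are δ for clock skew and Δp = pmax - pmin for the spread in propagation delay; n is the number of waves in flight.

```latex
\begin{align*}
p_{\min} &\ge (n-1)\,T + 2\delta
  &&\text{(a new wave must not overrun the previous one)} \\
n\,T &\ge p_{\max} + s + 2\delta
  &&\text{(the slowest path must settle before capture)} \\[4pt]
T = n\,T - (n-1)\,T
  &\ge \bigl(p_{\max} + s + 2\delta\bigr) - \bigl(p_{\min} - 2\delta\bigr) \\
  &= \Delta p + s + 4\delta,
  \qquad \Delta p = p_{\max} - p_{\min}
\end{align*}
```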

  32. Comparison. Conventional pipeline: T ≥ pmax/n + s + 2δ (plus cycle quantization overhead); nT ≥ pmax + ns + 2nδ. Wave pipeline: T ≥ Δp + s + 4δ; nT ≥ pmax + s + 2δ.
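
To get a feel for the two bounds, the sketch below evaluates both for one set of delays. All numbers (pmax = 40 ns, pmin = 32 ns, s = 1 ns, skew of 0.5 ns, n = 4) are hypothetical, the function names are illustrative, and the skew and delay-spread symbols are assumed to be δ and Δp = pmax - pmin. Cycle quantization overhead is left out of the conventional bound.

```python
def conventional_min_period(p_max: float, n: int, s: float, skew: float) -> float:
    """Conventional pipeline bound: T >= pmax / n + s + 2 * skew."""
    return p_max / n + s + 2 * skew


def wave_min_period(p_max: float, p_min: float, s: float, skew: float) -> float:
    """Wave pipeline bound: T >= (pmax - pmin) + s + 4 * skew."""
    return (p_max - p_min) + s + 4 * skew


# Hypothetical delays in ns, chosen only for illustration.
P_MAX, P_MIN, SETUP, SKEW, N = 40.0, 32.0, 1.0, 0.5, 4

print(f"conventional: T >= {conventional_min_period(P_MAX, N, SETUP, SKEW)} ns")
print(f"wave:         T >= {wave_min_period(P_MAX, P_MIN, SETUP, SKEW)} ns")
```

The wave bound depends on the spread pmax - pmin rather than on pmax itself, so it improves only when path delays are well balanced.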

  33. Problems with wave pipelining • Need to balance delays • Narrow range of clock frequencies • Control difficult • Not very suitable for non-linear pipelines

  34. Additional Reading. Wayne P. Burleson, Maciej Ciesielski, Fabian Klass, and Wentai Liu, “Wave-Pipelining: A Tutorial and Research Survey”, IEEE Trans. on VLSI Systems, vol. 6, no. 3, September 1998, pp. 464-474.
