
CS5365


Presentation Transcript


  1. CS5365 Pipelining

  2. Pipelining • Divide a task into a sequence of subtasks. Each subtask is executed by a stage (segment) of the pipe.

  3. Linear Pipeline Structure • All stages execute different subtasks simultaneously. • A stage is specialized hardware: combinational circuits, arithmetic/logic (A/L) operations, processors, etc.

  4. Pipelining • Ideally, all stages take the same time to execute their subtasks. Otherwise, the pipe operates at the speed of its slowest stage.

  5. Pipelining

  6. Clock Period

  7. Clock Period

  8. Note: • Once the pipeline is full, it yields one result every clock period. • A linear pipeline with k stages can process n tasks in $T_k = k + (n - 1)$ clock periods, where k cycles are used to fill up the pipe and complete the first task, and n - 1 additional cycles are needed to complete the remaining n - 1 tasks.
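The count $k + (n - 1)$ can be checked by clocking a toy pipeline. The sketch below makes two assumptions: every stage takes exactly one clock period, and a new task may enter stage 1 on every clock. `simulate_pipeline` is a hypothetical helper, not something from the slides.

```python
def simulate_pipeline(k: int, n: int) -> int:
    """Clock a k-stage linear pipeline until n tasks have left the last stage.

    Assumes one clock period per stage and that a new task may enter
    stage 1 on every clock. Returns the number of clock periods used.
    """
    stages = [None] * k   # stages[i] holds the task in stage i + 1 (None = empty)
    next_task = 0         # index of the next task waiting to enter the pipe
    completed = 0
    clock = 0
    while completed < n:
        clock += 1
        # Every task advances one stage; a new task (if any) enters stage 1.
        incoming = next_task if next_task < n else None
        stages = [incoming] + stages[:-1]
        if incoming is not None:
            next_task += 1
        # The task now in the last stage finishes at the end of this period.
        if stages[-1] is not None:
            completed += 1
    return clock


for k, n in [(4, 1), (4, 10), (6, 100)]:
    print(k, n, simulate_pipeline(k, n), k + (n - 1))
```

In each printed row, the simulated count matches $k + (n - 1)$.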

  9. Speedup • Denote by $T_{seq}$ the time required by a non-pipelined uniprocessor to execute n tasks; then $T_{seq} = nk$ clock periods, • assuming each of the k operations needs the same length of time to execute.

  10. Speedup • Otherwise, if operation i takes time $\tau_i$: $T_{seq} = n \sum_{i=1}^{k} \tau_i$

  11. Speedup • The speedup S(k) obtained by a k-stage pipelined processor is given as follows: $S(k) = \frac{T_{seq}}{T_k} = \frac{nk}{k + (n - 1)}$

  12. Ideal Pipeline • Speedup S(k) • Note that when $n \gg k$, then $S(k) = \frac{nk}{k + (n - 1)} \to k$.
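The limit $S(k) \to k$ is easy to see numerically. The Python sketch below assumes equal one-period stages, so that $T_{seq} = nk$. It uses `fractions.Fraction` to keep the ratio exact. `speedup` is an illustrative name.

```python
from fractions import Fraction


def speedup(k: int, n: int) -> Fraction:
    """S(k) = T_seq / T_k = n*k / (k + (n - 1)), assuming equal one-period stages."""
    return Fraction(n * k, k + (n - 1))


# For a fixed k = 4, S(k) climbs toward k as n grows but never reaches it.
for n in (1, 4, 16, 64, 1024):
    print(f"n = {n:4d}   S(4) = {float(speedup(4, n)):.3f}")
```

With a single task there is no overlap, so S(4) = 1. The ratio approaches 4 only once n is much larger than k.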

  13. Ideal Pipeline • However, the maximum (ideal) speedup is not achievable in practice because of overhead due to: • Data dependencies between tasks. • Interrupts. • Program branches.

  14. Space-time diagram

  15. Space-time diagram
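A space-time diagram follows from one rule: task j occupies stage s during clock period j + s - 1, with tasks and stages numbered from 1. The Python sketch below assumes a k-stage linear pipeline with one clock period per stage. `space_time` is an illustrative name.

```python
def space_time(k: int, n: int) -> list[str]:
    """Text space-time diagram: one row per stage, one column per clock period.

    Task j (1-based) occupies stage s during clock period j + s - 1;
    '.' marks an idle stage.
    """
    rows = []
    for s in range(k):                      # 0-based stage index -> stage S(s+1)
        cells = []
        for t in range(k + n - 1):          # 0-based clock index -> period t+1
            task = t - s                    # 0-based task in this stage, if any
            cells.append(f"{task + 1:>3}" if 0 <= task < n else "  .")
        rows.append(f"S{s + 1} |" + "".join(cells))
    return rows


for row in space_time(4, 5):
    print(row)
```

For k = 4 and n = 5, the diagram spans k + (n - 1) = 8 clock periods:

```
S1 |  1  2  3  4  5  .  .  .
S2 |  .  1  2  3  4  5  .  .
S3 |  .  .  1  2  3  4  5  .
S4 |  .  .  .  1  2  3  4  5
```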

  16. Efficiency • The efficiency is obtained by dividing the speedup by the number of stages k: $E = \frac{S(k)}{k} = \frac{n}{k + (n - 1)}$

  17. Efficiency

  18. Efficiency • Note also that:

  19. Throughput • It is defined as the number of tasks completed per unit of time: $\omega = \frac{n}{(k + n - 1)\tau} = \frac{E}{\tau}$, where τ is the pipeline clock period.
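Efficiency and throughput can be computed together, which also confirms that $\omega = E/\tau$. This is a Python sketch with assumptions: the clock period `tau` below is an arbitrary illustrative value, and `efficiency` and `throughput` are hypothetical helper names.

```python
from fractions import Fraction


def efficiency(k: int, n: int) -> Fraction:
    """E = S(k) / k = n / (k + (n - 1))."""
    return Fraction(n, k + (n - 1))


def throughput(k: int, n: int, tau: Fraction) -> Fraction:
    """omega = n / ((k + n - 1) * tau): tasks completed per unit of time."""
    return Fraction(n) / ((k + n - 1) * tau)


k, n = 4, 16
tau = Fraction(1, 10)          # illustrative clock period (any time unit)
print(efficiency(k, n))        # 16/19
print(throughput(k, n, tau))   # 160/19 tasks per time unit
```

As n grows, E approaches 1 and ω approaches 1/τ, that is, one result per clock period.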

  20. Throughput • Techniques to increase throughput. • Consider a three-stage pipeline S1, S2, S3 with stage delays T1 = T3 = T and T2 = 3T. Clearly, the bottleneck is S2, with a 3T delay.

  21. Throughput • Recall that the throughput ω is inversely proportional to the pipe clock period τ, and $\tau = \max(T_1, T_2, T_3) = 3T$, giving a total delay of 3 × 3T = 9T. For a large number of tasks (steady-state operation), $\omega \approx \frac{1}{3T}$.

  22. Throughput • How might the throughput be increased?

  23. Throughput • How might the throughput be increased? • Subdivisions?

  24. Throughput • Subdivisions • What is the total delay now?

  25. Throughput • Subdivisions • What is the total delay now? • S2 is split into three stages of delay T each, giving five stages clocked at T and a total delay of 5T.
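The two configurations can be compared with the rule used on these slides: the clock period is set by the slowest stage. The Python sketch below assumes that latch delay is ignored, as in the 3 × 3T = 9T figure. Delays are in units of T. `pipeline_metrics` is a hypothetical name.

```python
from fractions import Fraction


def pipeline_metrics(delays: list[int]) -> tuple[int, int, Fraction]:
    """Clock period, total delay, and steady-state throughput of a linear pipe.

    The clock period is set by the slowest stage; latch delay is ignored.
    Delays are in units of T.
    """
    tau = max(delays)                    # every stage is clocked at the bottleneck
    total_delay = len(delays) * tau      # one clock period per stage
    omega = Fraction(1, tau)             # one result per clock period (n >> k)
    return tau, total_delay, omega


print(pipeline_metrics([1, 3, 1]))        # original: S2 is the 3T bottleneck
print(pipeline_metrics([1, 1, 1, 1, 1]))  # S2 subdivided into three T stages
```

In the first configuration, τ = 3T, the total delay is 9T, and ω = 1/(3T). After subdivision, τ = T, the total delay is 5T, and ω = 1/T.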

  26. Throughput • Subdivisions

  27. Throughput • What are the disadvantages of this solution?

  28. Throughput • What are the disadvantages of this solution? • increased hardware, additional latches.

  29. Throughput • Replication: • Stage S2 is replicated into three copies, which are then interleaved: consecutive tasks are sent to different copies.

  30. Space-time diagram • replication
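Replication can be checked by scheduling tasks over the replicated units. This Python sketch is event-driven and rests on several assumptions:

- Task j uses copy j mod 3 of S2.
- A task may wait between stages without blocking the stage behind it.
- Stages are not forced onto a common clock.

`finish_times` is a hypothetical name.

```python
def finish_times(n: int, delays=(1, 3, 1), copies=(1, 3, 1)) -> list[int]:
    """Completion times of n tasks in a linear pipe whose stage i takes
    delays[i] (in units of T) and has copies[i] identical units.

    Task j uses unit j % copies[i] of stage i (round-robin interleaving).
    Assumes a task can wait between stages without blocking the one before.
    """
    free_at = [[0] * c for c in copies]   # free_at[i][u]: when unit u of stage i is free
    done = []
    for j in range(n):
        t = 0                              # all tasks are ready at time 0
        for i, d in enumerate(delays):
            u = j % copies[i]              # interleaved choice of replicated unit
            start = max(t, free_at[i][u])  # wait for the task and for the unit
            t = start + d
            free_at[i][u] = t
        done.append(t)
    return done


print(finish_times(6))                     # S2 replicated three times
print(finish_times(6, copies=(1, 1, 1)))   # no replication
```

With replication, the tasks finish at 5T, 6T, 7T, and so on: one result every T after a 5T latency. Without replication, results are spaced 3T apart. Because this model does not clock every stage at 3T, it shows the spacing of completions rather than the clocked 9T latency.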

  31. Control strategies and configurations • Unifunctional vs. multi-functional pipelines • A unifunctional pipeline executes a fixed, dedicated function. • A multi-functional pipeline may perform several functions, either at the same time or at different times. Multiple functions are made possible by interconnecting (reconfiguring) several stages in different ways at different times.

  32. Control strategies and configurations • Static vs. dynamic pipelines • A static pipeline may assume only one functional configuration (unifunctional or multi-functional) at a time. • A dynamic pipeline allows several functional configurations at the same time (multi-functional), which requires more complex control mechanisms than a static pipeline does.

  33. Control strategies and configurations • Scalar vs. vector pipelines • Scalar pipelines process a sequence of scalar operands under the control of a DO loop. Instructions are prefetched and stored in an instruction buffer; as instructions are executed, operands are fetched from a data cache. • Vector pipelines (vector processors) handle vector instructions over vector operands under firmware and hardware control.

  34. Levels of processing • Arithmetic Pipelines – ALUs are partitioned for pipelined operations. Ex: the Star-100 uses 4-stage pipes, the Cray-1 uses 14 pipeline stages, the Cyber 205 uses 26 stages, etc. • Instruction Pipelines (instruction lookahead) – overlap the execution of the current instruction with the fetch, decode, and operand fetch of subsequent instructions. • Processor Pipelines – a cascade of processors, each executing a different task (a job is divided into different tasks).

  35. Floating-Point Arithmetic Pipeline

  36. Processor pipelining

  37. Instruction Pipeline

  38. Instruction Pipelining

  39. Instruction Pipelining • Consider the execution of a single instruction in a uniprocessor system. • A sequence of steps can be identified and implemented using a pipeline design:

  40. Problems?

  41. Problems • Instruction dependency • Pipeline Stalling • Branching • Conflicts • Interrupts

  42. Instruction dependency • An instruction I + 1 being fetched may need the results of a previous instruction I that is still in the pipe, so I + 1 must be delayed until those results are known. • Stalling: an instruction I + 1 must not destroy data that may still be needed by a previous instruction I in the pipe.
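Both conditions can be detected by comparing the registers that the two instructions read and write. The Python sketch below uses the common labels read-after-write (RAW) for the first condition and write-after-read (WAR) for the second. These labels are not from the slides. The `Instr` record and the assembly-style instruction texts are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical register-to-register instruction: dest <- op(srcs)."""
    text: str
    dest: Optional[str]          # register written, or None
    srcs: tuple[str, ...]        # registers read


def hazards(i: Instr, nxt: Instr) -> list[str]:
    """Dependencies between instruction I and the following instruction I + 1."""
    found = []
    if i.dest is not None and i.dest in nxt.srcs:
        found.append("RAW")      # I + 1 needs a result I has not produced yet
    if nxt.dest is not None and nxt.dest in i.srcs:
        found.append("WAR")      # I + 1 would overwrite an operand I still needs
    return found


add = Instr("ADD R1, R2, R3", "R1", ("R2", "R3"))
sub = Instr("SUB R4, R1, R5", "R4", ("R1", "R5"))
mul = Instr("MUL R2, R6, R7", "R2", ("R6", "R7"))
print(hazards(add, sub))   # ['RAW']
print(hazards(add, mul))   # ['WAR']
```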

  43. Dependency

  44. Stalling

  45. Stalling

  46. Stalling • [Figure: several pipeline stages labeled "Memory Access"]

  47. Stalling

  48. Stalling • Assume a data cache access. Stall condition? • Assume an instruction cache access.
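The stall question can be explored under one plausible reading, which is an assumption since the figures are missing. If instruction fetch and data access share a single memory port, no fetch can happen in a clock period in which a load or store is using memory. With separate instruction and data caches, the two accesses do not collide.

The Python sketch below assumes a hypothetical 5-stage instruction pipeline with the data access in stage 4. `cycles_with_memory_port` and its parameters are illustrative names.

```python
def cycles_with_memory_port(mem_ops: list[bool], k: int = 5, mem_stage: int = 4,
                            split_caches: bool = False) -> int:
    """Clock periods to run a program through a k-stage instruction pipeline.

    mem_ops[i] is True if instruction i accesses data memory in stage
    mem_stage (1-based). Stage 1 fetches instructions. With a single shared
    memory port (split_caches=False), no fetch can happen in a period in
    which the port is busy with a data access, so a bubble enters the pipe.
    """
    n = len(mem_ops)
    stages = [None] * k
    next_instr = completed = clock = 0
    while completed < n:
        clock += 1
        stages = [None] + stages[:-1]                  # everything advances one stage
        in_mem = stages[mem_stage - 1]
        port_busy = (not split_caches and in_mem is not None and mem_ops[in_mem])
        if next_instr < n and not port_busy:
            stages[0] = next_instr                     # fetch uses the port this period
            next_instr += 1
        if stages[-1] is not None:
            completed += 1
    return clock


program = [True, False, False, False]     # a load followed by three ALU instructions
print(cycles_with_memory_port(program))                     # shared port: one stall
print(cycles_with_memory_port(program, split_caches=True))  # split caches: no stall
```

The shared port costs one extra clock period for this program: 9 periods instead of the ideal k + (n - 1) = 8.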

  49. Branching • This problem is caused by conditional and unconditional branches.
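A simple way to quantify the branching problem is to count the fetch slots lost behind each taken branch. The Python sketch below makes two assumptions:

- The pipe keeps fetching sequentially.
- A branch outcome is known only at the end of stage `resolve_stage`, so the instructions fetched behind a taken branch are discarded.

`cycles_with_branches` and `resolve_stage` are hypothetical names, and the numbers in the usage line are illustrative.

```python
def cycles_with_branches(k: int, n: int, taken_branches: int, resolve_stage: int) -> int:
    """Clock periods for n instructions on a k-stage pipe with taken branches.

    Assumes sequential fetch and that a branch outcome is known only at the
    end of stage resolve_stage (1-based); the resolve_stage - 1 instructions
    fetched behind each taken branch are discarded.
    """
    penalty = resolve_stage - 1          # wasted fetch slots per taken branch
    return k + (n - 1) + taken_branches * penalty


# 100 instructions, 10 taken branches, branches resolved in stage 3 of 5
print(cycles_with_branches(5, 100, 10, 3))
```

Resolving branches in an earlier stage shrinks the per-branch penalty. This is the motivation for handling branches as early in the pipe as possible.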
