
Computer Architecture Principles Dr. Mike Frank

CDA 5155, Summer 2003. Module #26: Software Pipelining.


Presentation Transcript


  1. Computer Architecture Principles. Dr. Mike Frank. CDA 5155, Summer 2003. Module #26: Software Pipelining

  2. Software Pipelining

  3. Software Pipelining
     • In hardware pipelining, we overlap the execution of multiple instructions.
     • In software pipelining, we overlap the issuing of instructions for multiple loop iterations.
     • Like loop unrolling, this allows us to separate the issuing of data-dependent instructions without stalling.
     • Unlike loop unrolling, it does not (by itself):
        • eliminate loop overheads (index variable updating & branches), or
        • increase loop code size.

  4. Software Pipelining Illustration
     [Figure: processing of different array elements (e.g. A[0] through A[4]). Arrows mark the data dependence between two instructions in one iteration of the original loop, i.e. the processing of one array element.]
     • In 1 iteration of the software-pipelined loop, we execute:
        • instruction 3 of element 4,
        • instruction 4 of element 3,
        • instruction 5 of element 2,
        • instruction 6 of element 1,
        • instruction 7 of element 0.
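The pairing listed on this slide follows a simple pattern: the instruction number and the element number always add up to 7. A minimal Python sketch of that pattern is below. The function name `elements_in_flight`, the iteration counter `t`, and the rule "instruction j handles element t - j" are illustrative assumptions generalized from the one iteration the slide describes.

```python
def elements_in_flight(t, first_instr=3, last_instr=7):
    """Map each instruction number to the array element it handles.

    Assumption (generalized from the slide): in pipelined iteration t,
    instruction j works on element t - j. Elements with a negative index
    do not exist yet, so they are skipped.
    """
    return {
        j: t - j
        for j in range(first_instr, last_instr + 1)
        if t - j >= 0
    }
```

Calling `elements_in_flight(7)` reproduces the slide's list: instruction 3 handles element 4, and so on down to instruction 7 handling element 0.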

  5. Our old friend, “Mr. Loop Example”
     • Same old code:
         Loop: LD   F0,0(R1)
               ADDD F4,F0,F2
               SD   0(R1),F4
               SUBI R1,R1,#8
               BNEZ R1,Loop
     • As-is, there would be stalls between LD, ADDD, and SD, due to the data value dependences between them and the resulting RAW hazards (even with forwarding).
     • What would a software-pipelined version of this loop look like?
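To make the stalls in this loop concrete, here is a small Python sketch that counts them for an in-order, single-issue machine. The stall values in `STALLS_IF_ADJACENT` are assumptions chosen for illustration (1 cycle between a load and a dependent FP add, 2 cycles between an FP add and a dependent store); the slides do not state latencies. The sketch also ignores the SUBI-to-BNEZ dependence and any branch delay.

```python
# Assumed stall cycles needed when the consumer issues immediately after the
# producer. Illustrative values only; they are not taken from the slides.
STALLS_IF_ADJACENT = {("LD", "ADDD"): 1, ("ADDD", "SD"): 2}


def count_stalls(schedule, deps):
    """Count stall cycles for an in-order, single-issue pipeline.

    schedule: opcode names in issue order.
    deps: (producer_index, consumer_index) RAW dependences in the schedule.
    """
    issue_cycle = []
    total_stalls = 0
    for i, op in enumerate(schedule):
        # Without hazards, each instruction issues one cycle after the last.
        cycle = issue_cycle[-1] + 1 if issue_cycle else 0
        for producer, consumer in deps:
            if consumer == i:
                gap = STALLS_IF_ADJACENT.get((schedule[producer], op), 0)
                earliest = issue_cycle[producer] + 1 + gap
                if earliest > cycle:
                    total_stalls += earliest - cycle
                    cycle = earliest
        issue_cycle.append(cycle)
    return total_stalls


ORIGINAL = ["LD", "ADDD", "SD", "SUBI", "BNEZ"]
RAW_DEPS = [(0, 1), (1, 2)]  # LD -> ADDD, ADDD -> SD
```

Under these assumed latencies, `count_stalls(ORIGINAL, RAW_DEPS)` gives 3 stall cycles per iteration: one before ADDD and two before SD.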

  6. Software-Pipelined Version
     • Here is the new code:
         Loop: SD   16(R1),F4   ; M[i]=tmp2
               ADDD F4,F0,F2    ; tmp2=tmp1+F2
               LD   F0,0(R1)    ; tmp1=M[i-2]
               SUBI R1,R1,#8    ; i=i-1
               BNEZ R1,Loop
     • Note:
        • All the value dependences now cross loop-iteration boundaries.
        • The whole dependence path from LD through ADDD to SD now crosses 2 loop-iteration boundaries, so it is spread over 3 consecutive iterations.
        • So, we load from M[i-2] (i.e. 0(R1)) and, two iterations later, store back to the same element, which by then is M[i] (i.e. 16(R1)).
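The loop body on this slide is only the steady state: something has to fill the pipeline before the first trip and drain it after the last. The Python sketch below models both versions of the loop and can be used to check that they produce the same memory contents. The startup and cleanup code is an assumption (the slides do not show it), as are the helper names, the layout of the array at byte addresses 8, 16, ..., 8n, and the requirement of at least 3 elements.

```python
def original_loop(mem, r1, f2):
    """Model of the original loop; mem maps byte address -> value."""
    while True:
        f0 = mem[r1]          # LD   F0,0(R1)
        f4 = f0 + f2          # ADDD F4,F0,F2
        mem[r1] = f4          # SD   0(R1),F4
        r1 -= 8               # SUBI R1,R1,#8
        if r1 == 0:           # BNEZ R1,Loop
            return mem


def pipelined_loop(mem, r1, f2):
    """Model of the software-pipelined loop (needs at least 3 elements)."""
    # Assumed startup code: start the first two elements.
    f0 = mem[r1]              # load the first element
    f4 = f0 + f2              # add for the first element
    f0 = mem[r1 - 8]          # load the second element
    r1 -= 16
    while True:
        mem[r1 + 16] = f4     # SD   16(R1),F4
        f4 = f0 + f2          # ADDD F4,F0,F2
        f0 = mem[r1]          # LD   F0,0(R1)
        r1 -= 8               # SUBI R1,R1,#8
        if r1 == 0:           # BNEZ R1,Loop
            break
    # Assumed cleanup code: finish the last two elements.
    mem[r1 + 16] = f4
    f4 = f0 + f2
    mem[r1 + 8] = f4
    return mem


def make_memory(n):
    """Hypothetical test array: element k lives at byte address 8*k."""
    return {8 * k: float(k) for k in range(1, n + 1)}
```

For example, `original_loop(make_memory(5), 40, 10.0)` and `pipelined_loop(make_memory(5), 40, 10.0)` return equal dictionaries. The pipelined loop body runs n - 2 times here, because the startup and cleanup code handle part of the work for the first and last two elements.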

  7. Timing across 3 iterations
     • Note greater separation between LD, ADDD, and SD for a single array element: no stalls needed.
     • Note we decrement R1 by 8 twice between the LD and the corresponding SD, thus the need for the 16 offset in the SD.
     • Some antidependences (including the one through memory) are noted by green arrows in the figure.
     • The figure shows the loop body three times in a row, once per iteration:
         Loop: SD   16(R1),F4   ; M[i]=tmp2
               ADDD F4,F0,F2    ; tmp2=tmp1+F2
               LD   F0,0(R1)    ; tmp1=M[i-2]
               SUBI R1,R1,#8    ; i=i-1
               BNEZ R1,Loop
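Both observations on this slide can be checked with a little arithmetic. The Python sketch below computes the issue slots of the three instructions that handle one array element, and the addresses used by its LD and its matching SD. The function names and the one-instruction-per-slot, stall-free model are assumptions made for illustration.

```python
BODY = ["SD", "ADDD", "LD", "SUBI", "BNEZ"]  # software-pipelined loop body


def issue_slots(load_iter):
    """Issue slots (0-based, one instruction per slot, no stalls) of the
    LD, ADDD, and SD that process the element loaded in iteration load_iter."""
    n = len(BODY)
    return {
        "LD": load_iter * n + BODY.index("LD"),
        "ADDD": (load_iter + 1) * n + BODY.index("ADDD"),
        "SD": (load_iter + 2) * n + BODY.index("SD"),
    }


def addresses(r1_at_load, step=8, sd_offset=16):
    """Effective addresses of an element's LD and of its SD two iterations later."""
    load_addr = r1_at_load + 0               # LD F0,0(R1)
    r1_at_store = r1_at_load - 2 * step      # R1 was decremented by 8 twice
    store_addr = r1_at_store + sd_offset     # SD 16(R1),F4
    return load_addr, store_addr
```

For the element loaded in iteration 0, `issue_slots(0)` places LD in slot 2, ADDD in slot 6, and SD in slot 10, so three other instructions sit between each dependent pair. `addresses(800)` returns the same address twice, which is why the SD needs the 16 offset.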

  8. SW Pipelining vs. Unrolling
     • Both of them:
        • Improve scheduling among high-latency instructions in the inner loop.
     • Loop unrolling also:
        • Reduces loop overhead (index variable updating & end-of-loop testing).
        • Confines most stalls to once per n iterations.
     • Software pipelining also:
        • Confines most stalls to the 1st & last iterations only.
        • Keeps code size small.
     • Both techniques can be used in combination.
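The code-size and loop-overhead trade-off can be put into numbers for the example loop. The Python sketch below is a rough steady-state count only. It assumes 3 work instructions (LD, ADDD, SD) per element and 2 overhead instructions (SUBI, BNEZ) per trip around the loop, and it ignores any startup or cleanup code. The function names and the unroll factor `k` are illustrative.

```python
from fractions import Fraction

BODY_WORK = 3      # LD, ADDD, SD for each array element
LOOP_OVERHEAD = 2  # SUBI and BNEZ for each trip around the loop


def unrolled(k):
    """Loop-body size and overhead instructions per element when unrolling by k."""
    body_size = BODY_WORK * k + LOOP_OVERHEAD
    overhead_per_element = Fraction(LOOP_OVERHEAD, k)
    return body_size, overhead_per_element


def software_pipelined():
    """The same two figures for the software-pipelined loop body."""
    body_size = BODY_WORK + LOOP_OVERHEAD
    overhead_per_element = Fraction(LOOP_OVERHEAD, 1)
    return body_size, overhead_per_element
```

With these assumptions, unrolling by 4 gives a 14-instruction body with half an overhead instruction per element, while the software-pipelined body stays at 5 instructions with 2 overhead instructions per element.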
