Trying to avoid pipeline delays
Download
1 / 23

Trying to avoid pipeline delays - PowerPoint PPT Presentation


  • 75 Views
  • Uploaded on

Trying to avoid pipeline delays. Inter-leafing two sets of operations XY Compute block. Tackled today. Review of coding a hardware circular buffer Roughly understanding where pipeline delays may occur

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Trying to avoid pipeline delays ' - jaime-houston


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Trying to avoid pipeline delays

Trying to avoid pipeline delays

Inter-leafing two sets of operationsXY Compute block


Tackled today
Tackled today

  • Review of coding a hardware circular buffer

  • Roughly understanding where pipeline delays may occur

    • “Refactor” the working code to improve the speed without spending any time on examining whether delays really there – works at the moment principle

  • “Refactoring” working code to perform operations using both X and Y ALU’s – in principle twice the speed

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Dcremoval
DCRemoval( )

MemoryintensiveAdditionintensive

Loops formain code

FIFO implementedas circularbuffer

  • Not as complex as FIR, but many of the same requirements

  • Easier to handle

  • You use same ideas in optimizing FIR over Labs 2 and 3

  • Two issues – speed and accuracy. Develop suitable tests for CPP code and check that various assembly language versions satisfy the same tests

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Alternative approach
Alternative approach

  • Move pointers rather than memory values

  • In principle – 1 memory read, 1 memory write, pointer addition, conditional equate

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Note software circular buffer is not necessarily more efficient than data moves
Note: Software circular buffer is NOT necessarily more efficient than data moves

  • Now spending more time on moving / checking the software circular buffer pointers than moving the data?

SLOWERFASTER

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Next step hardware circular buffer
Next step – Hardware circular buffer efficient than data moves

  • Do exactly the same pointer calculations as with software circular buffers, but now the calculations are done behind the scenes – high speed – using specialized pointer features

  • Only available with J0, J1, J2 and J3 registers (On older ADSP-21061 – all pointer registers)

  • Jx -- The pointer register

  • JBx – The BASE register – set to start of the FIFO array

  • JLx – The length register – set to length of the FIFO array

  • VERY BIG WARNING? – Reset to zero. On older ADSP-21061 it was very important that the length register be reset to zero, otherwise all the other functions using this register would suddenly start using circular buffer by mistake.

    • Still advisable – but need special syntax for causing circular buffer operations to occur

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Store values into hardware fifo
Store values into hardware FIFO efficient than data moves

  • CB instruction ONLY works on POST-MODIFY operations

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Next stage in improving code speed hardware circular buffers

Set up pointers to buffers efficient than data moves

Insert values into buffers

SUM LOOP

SHIFT LOOP

Update outgoing parameters

Update FIFO

Function return

2

8 Was 4

3 + N * 4 Was 4 + N * 5

1 Was 1 + 2 * log2N

6

14 Was 3 + 6 * N

2

---------------------------

37 + 4 N Was 23 + 5 N

N = 128 – instructions = 549 cycles

549 + 300 delay cycle = 879 cyclesDelays are now >50% of useful time

Was

677 + 360 delay cycles = 1011 cycle

Next stage in improving code speedHardware circular buffers

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


On tigersharc pipeline issue
On TigerSHARC Pipeline Issue efficient than data moves

  • After you issue the command to read from memory, then must wait for value to come

  • Problem – may be trading memory wait delays for I-ALU delays

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Now perform math operation using circular buffer operation
Now perform Math operation using circular buffer operation efficient than data moves

  • Note the possible memory delays

  • Memory cache helps?

Wait for read ofR2, use it, thenwait for read of R3and then use it

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Simple interleaving of code possible saving of memory delays
Simple interleaving of code efficient than data movesPossible saving of memory delays

Original order

1

2

3

4

New order

1

3

2

4

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Interleaving of code same instructions different order

Set up pointers to buffers efficient than data moves

Insert values into buffers

SUM LOOP

SHIFT LOOP

Update outgoing parameters

Update FIFO

Function return

2

8 Was 4

3 + N * 4 Was 4 + N * 5

1 Was 1 + 2 * log2N

6

14 Was 3 + 6 * N

2

---------------------------

37 + 4 N Was 23 + 5 N

N = 128 – instructions = 549 cycles

549 + 50 delay cycle = 594 cyclesDelays were 10% of useful time

Was

549 + 300 delay cycle = 879 cyclesDelays were >50% of useful time

Interleaving of codeSame instructions – different order

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


The code is too slow because we are not taking advantage of the available resources
The code is too slow because we are not taking advantage of the available resources

  • Bring in up to 128 bits (4 instructions) per cycle

  • Ability to bring in 4 32-bit values along J data bus (data1) and 4 along K bus (data2)

  • Perform address calculations in J and K ALU – single cycle hardware circular buffers

  • Perform math operations on both X and Y compute blocks

  • Background DMA activity

  • Off-load some of the processing to the second processor

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Understanding how to use mimd mode process left filter in x compute right in y
Understanding how to use MIMD mode the available resourcesProcess left filter in X-Compute, right in Y

  • XR6 = 0;; Puts 0 into XR6 register

  • YR6 = 0;; Puts 0 into YR6 register

  • XYR6 = 0;; Puts 0 into XR6 and YR6 at same time

  • 1 instruction saved

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Understanding how to use mimd mode process left filter in x compute right in y1
Understanding how to use MIMD mode the available resourcesProcess left filter in X-Compute, right in Y

  • XR6 = R6 + R2;; Adds XR6 + XR2 registers

  • YR6 = R6 + R2;; Adds YR6 + YR2 registers

  • XYR6 = R6 + R2;; Adds XR6 + XR2, AND YR6 + YR2 at same time

  • N instructions saved

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Understanding how to use mimd mode process left filter in x compute right in y2
Understanding how to use MIMD mode the available resourcesProcess left filter in X-Compute, right in Y

  • XR6 = ASHIFT R6 BY -7;; XR6 = XR6 >> 7

  • YR6 = ASHIFT R6 BY -7;; YR6 = YR6 >> 7

  • XYR6 = ASHIFT R6 BY -7;; XR6 = XR6 >> 7 and YR6 = YR6 >> 7 at same time

  • 1 instruction saved

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Final operation dual subtraction
Final operation – dual subtraction the available resources

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Mimd mode

Set up pointers to buffers the available resources

Insert values into buffers

SUM LOOP

SHIFT LOOP

Update outgoing parameters

Update FIFO

Function return

2

8 Was 4

3 + N * 3 Was 4 + N * 5

1 Was 1 + 2 * log2N

6

14 Was 3 + 6 * N

2

---------------------------

37 + 3 N Was 37 + 4 N

N = 128 – instructions = 421 cycles

421 + 180 delay cycles = 590

Now delays are 50% of useful time

Was

549 + 50 delay cycle = 594 cyclesDelays were 10% of useful time

MIMD mode

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Why no improvement extra delays from where
Why no improvement? the available resourcesExtra delays from where?

Back to having towait for R2 to come

in from memory beforethe sum can occur

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


The code is too slow because we are not taking advantage of the available resources1
The code is too slow because we are not taking advantage of the available resources

  • Bring in up to 128 bits (4 instructions) per cycle

  • Ability to bring in 4 32-bit values along J data bus (data1) and 4 along K bus (data2)

  • Perform address calculations in J and K ALU – single cycle hardware circular buffers

  • Perform math operations on both X and Y compute blocks

  • Background DMA activity

  • Off-load some of the processing to the second processor

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Multiple data busses
Multiple data busses the available resources

  • Many issues to solve before we can bring in 8 data values per cycle

    • Are the data values aligned so can access 4 values at once?

    • If they are not aligned – what can you do?

  • One step at a time – Next lecture

    • Lets us bring 1 value in along the J-Data bus and another in along the K-data bus

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Exercise on handling interleaving of instructions and x y compute operations
Exercise on handling interleaving of instructions and X-Y compute operations

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


Tackled today1
Tackled today compute operations

  • Review of coding a hardware circular buffer

  • Roughly understanding where pipeline delays may occur

    • “Refactor” the working code to improve the speed without spending any time on examining whether delays really there – works at the moment principle

  • “Refactoring” working code to perform operations using both X and Y ALU’s – in principle twice the speed

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada


ad