Detailed look at the TigerSHARC pipeline

Cycle counting for the IALU version of the DC_Removal algorithm. To be tackled today: the expected and actual cycle counts for the J-IALU version of the DC_Removal algorithm, and understanding why the stalls occur and how to fix them.

brays
Presentation Transcript


  1. Detailed look at the TigerSHARC pipeline: cycle counting for the IALU version of the DC_Removal algorithm

  2. To be tackled today • Expected and actual cycle count for J-IALU version of DC_Removal algorithm • Understanding why the stalls occur and how to fix. • Differences between first time into a function (cache empty) and second time into the function DC_Removal algorithm performance

  3. Set up time • In principle, 1 cycle / instruction • 2 + 4 instructions

  4. First key element – Sum Loop – Order (N): 4 + N * 5 instructions • Second key element – Shift Loop – Order (log2 N): 1 + 2 * log2 N instructions

  5. Third key element – FIFO circular buffer – Order (N) • Instruction counts for the code sections shown: 6, then 3 + 6 * N, then 2

  6. TigerSHARC pipeline

  7. Using the “Pipeline Viewer” • Available with the TigerSHARC simulator ONLY • VIEW | Debug Windows | Pipeline viewer • F1 to F4 – instruction fetch unit pipeline • PD, D, I – Integer ALU pipeline • A, EX1, EX2 – Compute Block pipeline

  8. Pipeline symbols (Control-click) • A – Abort • B – Bubble • H – BTB Hit (Jumps) • S – Stall • W – Wait • X – Illegal fetch (F1 – F4) • X – Illegal instruction (PD – E2)

  9. Time in theory
     • Set up pointers to buffers: 2
     • Insert values into buffers: 4
     • SUM LOOP: 4 + N * 5
     • SHIFT LOOP: 1 + 2 * log2 N
     • Update outgoing parameters: 6
     • Update FIFO: 3 + 6 * N
     • Function return: 2
     • Total: 22 + 11 N + 2 log2 N
     • N = 128 – instructions = 1444
     • 1444 cycles + 1100 delay cycles
     • C++ debug mode – 9500 cycles???????
     • Note: other tests were executed before this test, which means “cache filled”
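The instruction total on this slide can be checked with a few lines of arithmetic. The sketch below simply adds up the per-stage counts listed on the slide; the function name `instruction_count` and the dictionary keys are illustrative, and it assumes N is a power of two so that log2 N is an integer.

```python
def instruction_count(n: int) -> int:
    """Sum the per-stage instruction counts from the slide for a buffer of n samples."""
    # Assumption: n is a power of two, so log2(n) is exact.
    log2n = n.bit_length() - 1
    stages = {
        "set up pointers to buffers": 2,
        "insert values into buffers": 4,
        "sum loop": 4 + 5 * n,
        "shift loop": 1 + 2 * log2n,
        "update outgoing parameters": 6,
        "update FIFO": 3 + 6 * n,
        "function return": 2,
    }
    return sum(stages.values())


print(instruction_count(128))  # 1444, matching 22 + 11 N + 2 log2 N at N = 128
```

This counts instructions only; the stall (delay) cycles reported on the slide are measured, not derived from this formula.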

  10. Test environment • Examine the pipeline the 2nd time around the loop • “Caches filled”?

  11. Set up time • Expected: 2 + 4 instructions • Actual: 2 + 4 instructions + 2 stalls • Why not 4 stalls?

  12. First time round sum loop • Expected: 9 instructions • LC0 load – 3 stalls • Each memory fetch – 4 stalls • Actual: 9 + 11 stalls

  13. Other times around the loop • Expected: 5 instructions • Each memory fetch – 4 stalls • Actual: 5 + 8 stalls
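The stall totals for the sum loop follow from the per-event stalls quoted on the two slides above. The sketch below reproduces that bookkeeping. The figure of two memory fetches per pass is an inference from the numbers (8 stalls at 4 stalls per fetch), not something the slides state, and all names are illustrative.

```python
LC0_LOAD_STALLS = 3   # from the slide: loading loop counter LC0 costs 3 stalls
FETCH_STALLS = 4      # from the slide: each memory fetch costs 4 stalls
FETCHES_PER_PASS = 2  # assumption: inferred from 8 stalls / 4 stalls per fetch


def sum_loop_pass_cycles(instructions: int, first_pass: bool) -> int:
    """Cycles for one pass of the sum loop: instructions plus stalls."""
    stalls = FETCHES_PER_PASS * FETCH_STALLS
    if first_pass:
        # The first pass also includes the loop set-up, hence the LC0 load stalls.
        stalls += LC0_LOAD_STALLS
    return instructions + stalls


print(sum_loop_pass_cycles(9, first_pass=True))   # 20 = 9 instructions + 11 stalls
print(sum_loop_pass_cycles(5, first_pass=False))  # 13 = 5 instructions + 8 stalls
```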

  14. Shift Loop – 1st time around • Expected: 3 instructions • No stalls on LC0 load? • 4 stalls on ASHIFTR • BTB hit followed by 5 aborts

  15. Shift loop – 2nd and later times around • Expect 2 • Get 2

  16. Store back of &left, &right • Expect 6 • Actual 6 + 3 stalls

  17. Exercise 1 • Based on knowledge to this point, determine the expected stalls during the last piece of code: the FIFO buffer operation

  18. Third key element – FIFO circular buffer – Order (N) • Instruction counts for the code sections shown: 6, then 3 + 6 * N, then 2

  19. Answer

  23. Second time into function

  24. What happens if cache not full? – first time function called? • Was 2 + 2 stalls in loop • Now 11 + 12 stalls in loop

  25. First time function called • 2nd time around the loop • Ditto 3rd, 4th, 5th, 6th, 7th, 8th times

  26. 9th time around the loop • Ditto 17th, 25th, 33rd, 41st, 49th

  27. What is happening? • With cache filled – memory read accesses require 4 cycles • Unfilled – the first one requires “12 cycles” • Then the next 7 require 4 cycles • Total guess – is the extra time associated with doing extra reads to fill the cache?
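The pattern described on this slide (one slow read followed by seven fast ones) can be written down as a toy model. This is only a sketch of the slide's guess, not a description of how the TigerSHARC memory system actually works; the group size of 8 reads and the names `read_cycles` and `READS_PER_FILL` are assumptions taken from the observed counts.

```python
READS_PER_FILL = 8   # assumption: 1 slow read, then 7 fast reads, as observed
SLOW_READ_CYCLES = 12
FAST_READ_CYCLES = 4


def read_cycles(read_index: int, cache_filled: bool) -> int:
    """Cycles for the read_index-th read (0-based) under the toy model."""
    if cache_filled:
        return FAST_READ_CYCLES
    # With the cache unfilled, every 8th read pays the longer delay.
    if read_index % READS_PER_FILL == 0:
        return SLOW_READ_CYCLES
    return FAST_READ_CYCLES


# 1-based loop passes that see the slow read during the first 50 passes.
slow_passes = [i + 1 for i in range(50) if read_cycles(i, cache_filled=False) == 12]
print(slow_passes)  # [1, 9, 17, 25, 33, 41, 49]
```

The slow passes line up with the 9th, 17th, 25th, 33rd, 41st and 49th times around the loop noted on the previous slide, which is what motivates the guess.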

  28. Tackled today • Expected and actual cycle count for J-IALU version of DC_Removal algorithm • Understanding why the stalls occur and how to fix them • Differences between first time into a function (cache empty) and second time into the function • Further unknowns – how memory operations really work