Shobana Padmanabhan - PowerPoint PPT Presentation

Presentation Transcript

  1. Shobana Padmanabhan, Phillip Jones, David Schuehler, Praveen Krishnamurthy, Scott Friedman, Huakai Zhang, Ron Cytron, John Lockwood, Roger Chamberlain, Jason Fritts. Washington University in St. Louis. http://liquid.arl.wustl.edu Funded by NSF under grant 03-13203. Sep 22. Liquid Architecture: Extracting & Improving Micro-architecture Performance on Reconfigurable Architectures

  2. Application Performance Architecture Compiler Algorithm

  3. Customization cost/performance tradeoff • Generic processor - cheap but application-agnostic; compilers exist; compiler optimization is the key • Reconfigurable logic - subject of our study; architecture and compiler research are the key • Customized logic - ideal for an application but expensive; logic/architecture research is the key Generic FPGA Custom

  4. Liquid architecture combines the best of all options • Standard Architecture • Standardized ISA, existing compilers • Custom Architecture on Integrated Circuit • One-of-a-kind, nonstandard • Liquid Architecture on FPGA • ISA + extras, can use modified open-source tools

  5. Liquid architecture combines the best of all options • Standard Architecture • Standardized ISA, existing compilers • Not optimized for any specific application • Custom Architecture on Integrated Circuit • One-of-a-kind, nonstandard • Optimized for specific application • Liquid Architecture on FPGA • ISA + extras, can use modified open-source tools • Hardware can be optimized for specific application

  6. Liquid architecture combines the best of all options • Standard Architecture • Standardized ISA, existing compilers • Not optimized for any specific application • Fixed instructions and hardware • Custom Architecture on Integrated Circuit • One-of-a-kind, nonstandard • Optimized for specific application • Fixed instructions and hardware • Liquid Architecture on FPGA • ISA + extras, can use modified open-source tools • Hardware can be optimized for specific application • Reconfigurable ISA; ~100us – 100ms; person hours and not $millions

  7. Liquid architecture combines the best of all options • Standard Architecture • Standardized ISA, existing compilers • Not optimized for any specific application • Fixed instructions and hardware • ~ $200 - $500 • Custom Architecture on Integrated Circuit • One-of-a-kind, nonstandard • Optimized for specific application • Fixed instructions and hardware • ~ $500,000 - $1,000,000+ • Liquid Architecture on FPGA • ISA + extras, can use modified open-source tools • Hardware can be optimized for specific application • Reconfigurable ISA; ~100us – 100ms; person hours and not $millions • ~ $200 - $2000

  8. Hardware platform overview Development Workstation FPX FPGA Internet Instrumentation and variations Interface support modules (VHDL) Memory, Network interface chip, … Standard ISA: SPARC V8. FPX research was supported by NSF: ANI-0096052 and Xilinx Corp.

  9. Hardware platform details FPX FPGA

  10. Hardware platform details FPGA Core Cache Controller I-CACHE D-CACHE FPX LEON • LEON: SPARC V8 compatible • Open soft core

  11. Hardware platform details FPGA SRAM / SDRAM Memory Controller Core Cache Controller Address/ Data bus AHB I-CACHE D-CACHE FPX LEON • LEON: SPARC V8 compatible • Open soft core

  12. Application execution Workstation program FPGA gcc SRAM / SDRAM Memory Controller 001010 110110 001110 Core Cache Controller Address/ Data bus AHB I-CACHE D-CACHE Command Controller Control S/W Interface BLASTN DNA Sequence Comparison FPX LEON 001010 110110 001110

  13. Application runtime Workstation FPGA SRAM / SDRAM Memory Controller Results & Timing 001010 110110 001110 Core Cache Controller Address/ Data bus AHB I-CACHE D-CACHE Command Controller Control S/W Interface Slow! Where is time spent? FPX LEON

  14. Software approach to profiling “time”: start with the program, introduce timers, run the instrumented program, collect execution timings • Timers must account for their own overhead • Instrumented program will run slower • Instrumentation skews runtime because it affects system behavior such as cache, …
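
The software approach on this slide can be sketched as follows. This is only a minimal illustration in Python, not the instrumentation used in the talk: `measure_timer_overhead`, `timed`, and the stand-in `compute_key` body are hypothetical names introduced here. It shows the first bullet (timers accounting for their own overhead). The wrapper itself still slows the program and disturbs the cache, which is the point of the remaining bullets.

```python
import time


def measure_timer_overhead(samples=10_000):
    """Estimate the cost of two back-to-back timer reads, in nanoseconds."""
    best = float("inf")
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        best = min(best, t1 - t0)
    return best


def timed(func, overhead_ns):
    """Wrap func so each call adds its overhead-corrected time to a running total."""
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter_ns()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter_ns() - t0
        # Subtract the calibrated timer cost; clamp so noise never goes negative.
        wrapper.total_ns += max(0, elapsed - overhead_ns)
        wrapper.calls += 1
        return result

    wrapper.total_ns = 0
    wrapper.calls = 0
    return wrapper


def compute_key(word):
    # Hypothetical stand-in body; the real BLASTN routine is not in the slides.
    return sum(ord(c) for c in word) % 1024


overhead = measure_timer_overhead()
compute_key_timed = timed(compute_key, overhead)
for w in ["ACGT", "TTGA", "CCAG"]:
    compute_key_timed(w)
print(compute_key_timed.calls, "calls,", compute_key_timed.total_ns, "ns total")
```

Even with the correction, the reported total is only an estimate, because the timer calls change what the processor and cache see.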

  15. Profiling is free with liquid architecture!

  16. Cycle-accurate profiling for free Workstation FPGA SRAM / SDRAM Memory Controller 001010 110110 001110 Core Cache Controller Statistics Module Event monitor bus Address/ Data bus AHB I-CACHE D-CACHE Command Controller Control S/W Interface Request Timings FPX LEON pc

  17. Choose methods to profile from the user interface Method Time / Cycles Liquid architecture: cycle-accurate profiling for free .text main findMatch addQuery computeKey computeBase coreLoop fillQuery Rnd

  18. Method Address Range Liquid architecture: cycle-accurate profiling for free .text main Lo findMatch 0x4000027C 0x400003EF Hi addQuery computeKey computeBase coreLoop fillQuery Rnd

  19. Liquid architecture: cycle-accurate profiling for free Method Event Monitor Bus PC CLK .text Stats Module main 0x4000035A Lo findMatch 0x4000027C 0x400003EF Hi addQuery computeKey computeBase coreLoop fillQuery Rnd

  20. Liquid architecture: cycle-accurate profiling for free Function Event Monitor Bus PC CLK .text Stats Module Lo main 0x4000027C 0x4000035A 0x400003EF ≤ ≤ Hi findMatch Counter addQuery INCR computeKey computeBase coreLoop fillQuery Rnd

  21. Liquid architecture: cycle-accurate profiling for free Function Event Monitor Bus PC CLK .text Stats Module Lo main 0x4000027C 0x4000035A 0x400003EF ≤ ≤ Hi addQuery Counter findMatch INCR computeKey computeBase Lo 0x400005D8 0x4000035A 0x4000061F ≤ ≤ Hi coreLoop fillQuery Counter INCR Rnd

  22. Liquid architecture: cycle-accurate profiling for free Event Monitor Bus PC CLK Stats Module Lo 0x4000027C 0x4000035A 0x400003EF ≤ ≤ Hi Counter To Command Controller INCR Lo 0x400005D8 0x4000035A 0x4000061F ≤ ≤ Hi Counter INCR
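
The range-compare-and-count behavior built up on slides 18 to 22 can be modeled in software as below. This is a behavioral sketch in Python, not the VHDL of the actual Statistics Module. The class names are hypothetical, the PC trace is made up, and pairing the second address range with `coreLoop` is an assumption (the transcript does not label it unambiguously). On every clock, each counter whose range satisfies Lo ≤ PC ≤ Hi is incremented.

```python
from dataclasses import dataclass


@dataclass
class RangeCounter:
    """One comparator pair plus counter: counts clocks where lo <= pc <= hi."""
    name: str
    lo: int
    hi: int
    cycles: int = 0

    def clock(self, pc):
        if self.lo <= pc <= self.hi:
            self.cycles += 1


class StatsModule:
    """Software model of a bank of range counters fed by the PC each cycle."""

    def __init__(self, ranges):
        self.counters = [RangeCounter(name, lo, hi) for name, lo, hi in ranges]

    def clock(self, pc):
        # In hardware all comparators work in parallel; here we loop over them.
        for counter in self.counters:
            counter.clock(pc)

    def report(self):
        return {c.name: c.cycles for c in self.counters}


stats = StatsModule([
    ("findMatch", 0x4000027C, 0x400003EF),  # range shown on slide 18
    ("coreLoop", 0x400005D8, 0x4000061F),   # assumed owner of the second range
])

# Hypothetical PC values, one per clock cycle.
for pc in [0x4000035A, 0x4000035E, 0x400005D8, 0x40000100]:
    stats.clock(pc)

print(stats.report())
```

Because the counting happens beside the processor rather than inside the program, the application itself needs no timers and runs unmodified.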

  23. Cycle-accurate profiling for free Workstation FPGA SRAM / SDRAM Memory Controller 001010 110110 001110 Core Cache Controller Statistics Module Event monitor bus Address/ Data bus AHB I-CACHE D-CACHE Command Controller Control S/W Interface Request Timings FPX findMatch 500ms coreLoop 300ms LEON pc

  24. “Where time was spent” for BLASTN…

  25. “Where time was spent” for BLASTN… • Cycle-accurate profiling • No application overhead • Hence, at full speed

  26. Cycle-accurate profiling for free Workstation FPGA pc SRAM / SDRAM Memory Controller 001010 110110 001110 Core Cache Controller Statistics Module Event monitor bus Address/ Data bus AHB I-CACHE D-CACHE Command Controller Control S/W Interface FPX Is cache the problem? LEON

  27. Software approach to profiling cache Simulate cache behavior CacheSimulator Timings Not possible to profile by coding!! Slow!!

  28. Software approach to profiling “cache” Not possible to profile by coding!! Simulate cache behavior Scale down the program CacheSimulator Timings • Cannot afford to simulate the entire program
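
To make "simulate cache behavior" concrete, here is a minimal cache simulator sketch in Python. It is not the simulator referred to in the talk. The direct-mapped organization, 16-byte line size, and address trace are assumptions chosen for illustration; only the 1 KB size echoes a figure that appears elsewhere in the slides. Reads and writes are treated alike.

```python
class DirectMappedCache:
    """Toy direct-mapped cache model that only tracks tags, hits, and misses."""

    def __init__(self, size_bytes=1024, line_bytes=16):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Simulate one memory access; return True on a hit."""
        block = addr // self.line_bytes
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: fill the line, evicting whatever was there.
        self.tags[index] = tag
        self.misses += 1
        return False


cache = DirectMappedCache()
# Hypothetical trace: 0x000 and 0x400 map to the same line and evict each other.
for addr in [0x000, 0x004, 0x008, 0x400, 0x000]:
    cache.access(addr)

print(cache.hits, "hits,", cache.misses, "misses")
```

Every simulated access costs many host instructions, which is why the slide notes that simulating the entire program is not affordable.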

  29. How do we detect and report cache behavior using Liquid Architecture?

  30. Function Time / Cycles Liquid architecture: cache behavior for free .text main • Interface extends to include cache behavior options… findMatch addQuery computeKey computeBase coreLoop fillQuery Rnd

  31. Cache Hits / Misses Read Write Function Time / Cycles .text main findMatch addQuery computeKey computeBase coreLoop fillQuery Rnd

  32. Cache profiling Workstation FPGA pc SRAM / SDRAM Memory Controller 001010 110110 001110 Core Cache Controller Statistics Module Event monitor bus Address/ Data bus AHB I-CACHE D-CACHE Command Controller Control S/W Interface FPX LEON

  33. Cache behavior Hits and misses in LEON

  34. Cache behavior These signals are fed into the Event Monitoring Bus

  35. Cache behavior Statistics Module

  36. Cache behavior Statistics Module Statistics Module counts events

  37. Cache profiling Workstation FPGA pc SRAM / SDRAM Memory Controller 001010 110110 001110 Core Cache Controller Statistics Module Event monitor bus Address/ Data bus AHB I-CACHE D-CACHE Command Controller Control S/W Interface Reads hits misses Writes hits misses FPX LEON

  38. % Cache hit rate for D-cache: 1KB Function-wise cache profiling, in reasonable time
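
A per-function hit rate like the one charted here can be derived from the read/write hit and miss counters as sketched below. This is an illustrative Python sketch. The counter values are made-up placeholders, not measurements from the talk, and the dictionary layout and function names are hypothetical.

```python
def hit_rate(hits, misses):
    """Return the hit rate as a percentage, or None if there were no accesses."""
    total = hits + misses
    return 100.0 * hits / total if total else None


def summarize(counters):
    """Compute read, write, and overall hit rates for each profiled function."""
    summary = {}
    for name, c in counters.items():
        summary[name] = {
            "read": hit_rate(c["read_hits"], c["read_misses"]),
            "write": hit_rate(c["write_hits"], c["write_misses"]),
            "overall": hit_rate(
                c["read_hits"] + c["write_hits"],
                c["read_misses"] + c["write_misses"],
            ),
        }
    return summary


# Placeholder counter values for illustration only.
counters = {
    "findMatch": {"read_hits": 900, "read_misses": 100,
                  "write_hits": 45, "write_misses": 5},
    "coreLoop": {"read_hits": 750, "read_misses": 250,
                 "write_hits": 0, "write_misses": 0},
}

summary = summarize(counters)
for name, rates in summary.items():
    print(name, rates)
```

Returning `None` for a function with no accesses of a given kind avoids reporting a misleading 0% or 100%.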

  39. Liquid architecture enables fast, accurate results Seconds: fast, but no cache performance data available

  40. Liquid architecture enables fast, accurate results Days: so slow you wouldn’t do this on the whole program

  41. Liquid architecture enables fast, accurate results ½ hour: Practical, reasonably fast, totally accurate

  42. Pipeline Stalls Branch Predict Function Time / Cycles Cache Hits / Misses Read Write .text main findMatch Can profile all other aspects of micro-architecture too… addQuery computeKey computeBase coreLoop fillQuery Rnd

  43. How do we use the profiling info to improve application performance?

  44. Reconfigure micro-architecture

  45. Reconfigure micro-architecture

  46. Reconfigure micro-architecture

  47. Reconfiguration Workstation FPGA SRAM / SDRAM Memory Controller 001010 110110 001110 Statistics Module Event monitor bus Address/ Data bus AHB Command Controller Control S/W Interface Cache Controller I-CACHE D-CACHE program FPX gcc Core Cache Controller I-CACHE D-CACHE

  48. Cache hits after D-cache reconfiguration

  49. Cache hits after D-cache reconfiguration

  50. Cache hits after D-cache reconfiguration. Conclusion for “large” run: reconfiguring the D-cache doesn’t make much difference, because the hit rate is already very high.