
Profiling-Based Hardware/Software Co-Exploration for the Design of Video Coding Architectures

Profiling-Based Hardware/Software Co-Exploration for the Design of Video Coding Architectures. Heiko Hübert and Benno Stabernack. Contents. 1. Background. 2. MEMTRACE profiler. 3. Software/Hardware Optimization. 4. Conclusion. Background -- profiling.


Presentation Transcript


  1. Profiling-Based Hardware/Software Co-Exploration for the Design of Video Coding Architectures Heiko Hübert and Benno Stabernack

  2. Contents 1. Background 2. MEMTRACE profiler 3. Software/Hardware Optimization 4. Conclusion

  3. Background -- profiling • Profiling is used to understand the run-time behavior of applications

  4. Efficient profiling approaches • Software profiling • Sampling, instrumentation • Flexible, but incurs high overhead • Hardware profiling • Performance counters • Inexpensive, but more rigid and may not be universally available • Hybrid: combinations of the above • Hold great potential, since they combine the advantages of both without the drawbacks
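
To make the "instrumentation" bullet concrete, here is a minimal sketch of software profiling by instrumentation. It is not taken from the presentation: the `instrument` decorator, the `stats` table, and the `clip` function are hypothetical names chosen for illustration. Each profiled function is wrapped so that every call updates a call counter and an accumulated wall-clock time; the extra work in the wrapper is the overhead the slide mentions.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical per-function statistics table (illustrative only).
stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def instrument(func):
    """Wrap func so that every call updates its entry in stats."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # This bookkeeping runs on every call: it is the profiling overhead.
            entry = stats[func.__name__]
            entry["calls"] += 1
            entry["seconds"] += time.perf_counter() - start
    return wrapper


@instrument
def clip(value, low=0, high=255):
    """Clamp a sample to the 8-bit pixel range."""
    return max(low, min(high, value))


for sample in (-5, 100, 300):
    clip(sample)
```

A sampling profiler would instead interrupt the program at intervals and record where it currently is, trading precision for lower overhead.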

  5. An example of hardware profiling • PC – Performance Counter

  6. Background – system analysis • Why do we need profiling? • To find an efficient solution, it is very important to adapt the system to the application. • Video coding

  7. Contents 1. Background 2. MEMTRACE profiler 3. Software/Hardware Optimization 4. Conclusion

  8. MEMTRACE profiler • MEMTRACE delivers cycle-accurate profiling results on a C function level. • The results include clock cycles, various memory access statistics, and optionally energy consumption estimation for reduced instruction set computer (RISC)-based processors. • A focus is placed on memory access analysis, as for data-intensive applications this aspect has a high potential for increasing system efficiency.

  9. MEMTRACE profiling toolflow

  10. MEMTRACE -- Initialization

  11. MEMTRACE – Performance Analysis

  12. MEMTRACE – Post Processing

  13. MEMTRACE backend

  14. MEMTRACE -- Profiling data acquisition

  15. MEMTRACE -- Profiling data acquisition • init() • Initializes the profiler • Creates a list of all functions and global variables • nextInstruction() • Checks whether program execution has moved from one function to another • If so, the cycle count of the previous function is recalculated and the call count of the new function is incremented • memoryAccess() • Determines whether a load or a store access was performed and which bit width (8, 16, or 32 bit) was used

  16. MEMTRACE -- Profiling data acquisition • busActivity() • Identifies the bus status (idle cycle, core access, or DMA access) and increments the appropriate counter of the current function • cacheMiss() • Is called each time a cache miss occurs • finish() • Is called when the instruction set simulator (ISS) terminates the simulation
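
The callbacks listed on slides 15 and 16 can be sketched as a small event-driven profiler. This is an illustrative reconstruction, not the actual MEMTRACE backend: the class name, method signatures, the address-range symbol table, and the hand-written event sequence standing in for the simulator are all assumptions. Only the callback roles come from the slides.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FunctionStats:
    calls: int = 0
    cycles: int = 0
    accesses: Counter = field(default_factory=Counter)  # ("load", 32) -> count
    bus: Counter = field(default_factory=Counter)       # "idle"/"core"/"dma" -> count
    cache_misses: int = 0


class Profiler:
    def init(self, symbols):
        """symbols: {name: (start, end)} address ranges, end exclusive (assumed)."""
        self.symbols = symbols
        self.stats = {name: FunctionStats() for name in symbols}
        self.current = None
        self.entry_cycle = 0

    def _lookup(self, pc):
        for name, (start, end) in self.symbols.items():
            if start <= pc < end:
                return name
        return None

    def next_instruction(self, pc, cycle):
        name = self._lookup(pc)
        if name != self.current:
            # Function change: charge elapsed cycles to the previous function
            # and count an entry into the new one (returns count as entries too).
            if self.current is not None:
                self.stats[self.current].cycles += cycle - self.entry_cycle
            if name is not None:
                self.stats[name].calls += 1
            self.current, self.entry_cycle = name, cycle

    def memory_access(self, is_store, bits):
        if self.current is not None:
            kind = "store" if is_store else "load"
            self.stats[self.current].accesses[(kind, bits)] += 1

    def bus_activity(self, status):
        if self.current is not None:
            self.stats[self.current].bus[status] += 1

    def cache_miss(self):
        if self.current is not None:
            self.stats[self.current].cache_misses += 1

    def finish(self, cycle):
        if self.current is not None:
            self.stats[self.current].cycles += cycle - self.entry_cycle
            self.current = None
        return self.stats


# Hand-written event sequence standing in for the simulator (hypothetical).
p = Profiler()
p.init({"main": (0x000, 0x100), "idct": (0x100, 0x200)})
p.next_instruction(0x000, cycle=0)    # enter main
p.next_instruction(0x100, cycle=10)   # main -> idct
p.memory_access(is_store=False, bits=32)
p.cache_miss()
p.bus_activity("core")
p.next_instruction(0x104, cycle=12)   # still inside idct
p.next_instruction(0x004, cycle=30)   # idct -> back in main
result = p.finish(cycle=35)
```

In this simplified version a return into `main` is counted like a call, because only the change of function is detected; separating calls from returns would require extra information such as a call stack.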

  17. Processor model generator

  18. Interconnection

  19. What can we do by using the result of MEMTRACE profiler?

  20. Contents 1. Background 2. MEMTRACE profiler 3. Software/Hardware Optimization 4. Conclusion

  21. System partitioning • Computationally intensive functions are well suited for hardware acceleration in a coprocessor • Control-intensive functions are better suited for software implementation on ASIPs (application-specific instruction-set processors)

  22. Software Optimization • Loop unrolling • For computationally intensive parts, arithmetic optimizations or SIMD instructions can be applied if the processor provides such instructions • Video applications
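
As an illustration of the loop-unrolling bullet, the sketch below unrolls a sum-of-absolute-differences (SAD) loop by a factor of four. SAD is chosen here as a typical video coding kernel; it is an assumption, not something named on the slide, and the function names are hypothetical. Python is used only to show the transformation; the performance benefit applies to compiled code on the target processor, where fewer loop-control instructions are executed per element.

```python
def sad_rolled(a, b):
    """Reference version: one element per loop iteration."""
    total = 0
    for i in range(len(a)):
        total += abs(a[i] - b[i])
    return total


def sad_unrolled4(a, b):
    """Same result, but four elements per iteration plus a remainder loop."""
    n = len(a)
    limit = n - n % 4          # largest multiple of 4 not exceeding n
    total = 0
    i = 0
    while i < limit:
        total += (abs(a[i] - b[i]) + abs(a[i + 1] - b[i + 1])
                  + abs(a[i + 2] - b[i + 2]) + abs(a[i + 3] - b[i + 3]))
        i += 4
    while i < n:               # handle the 0 to 3 leftover elements
        total += abs(a[i] - b[i])
        i += 1
    return total


current_block = [10, 20, 30, 40, 50, 60]
reference_block = [12, 18, 30, 45, 50, 59]
assert sad_rolled(current_block, reference_block) == sad_unrolled4(current_block, reference_block)
```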

  23. Hardware Optimization • Memory Subsystem Optimizations • External memory • Cache (cache misses) • The data areas with the most cache misses and the smallest size should be stored in on-chip memory • SRAM • Instruction Set Architecture Optimizations • Frequently used instructions should be considered as targets for optimization during the processor architecture development.
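
The on-chip memory rule above (most cache misses, smallest size) can be read as a ranking by cache misses per byte. The following is a minimal sketch under that assumption: the greedy selection, the function name, and the example data areas with their sizes and miss counts are all invented for illustration and are not results from the presentation.

```python
def select_for_on_chip(areas, capacity):
    """Greedily pick data areas for on-chip SRAM.

    areas: list of (name, size_bytes, cache_misses), with size_bytes > 0.
    capacity: on-chip memory size in bytes.
    Areas are ranked by cache misses per byte, highest first.
    """
    ranked = sorted(areas, key=lambda area: area[2] / area[1], reverse=True)
    chosen, free = [], capacity
    for name, size, _misses in ranked:
        if size <= free:
            chosen.append(name)
            free -= size
    return chosen


# Made-up profiling results, for illustration only.
areas = [
    ("frame_buffer", 65536, 9000),
    ("coeff_table", 512, 4000),
    ("mv_cache", 2048, 6000),
    ("bitstream", 8192, 500),
]
on_chip = select_for_on_chip(areas, capacity=4096)
```

With these numbers the small coefficient table and motion vector cache are selected, while the frame buffer stays in external memory despite having the most misses in absolute terms, because it does not fit.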

  24. Conclusion • Profiling and system analysis • MEMTRACE architecture • Initialization • Performance analysis • Post processing • Hardware/Software optimization • Software • Hardware

  25. Thank you! Questions?

  26. References • [1] H. Hübert and B. Stabernack, “Profiling-based hardware/software co-exploration for the design of video coding architectures,” IEEE Transactions on Circuits and Systems for Video Technology, 2009, pp. 1680-1691. • [2] STMicroelectronics: Nomadik STn8820 Mobile Multimedia Application Processor (2008, Feb.). Data brief. [Online]. Available: www.st.com • [3] Broadcom: BCM2820 Low Power, High Performance Application Processor (2006, Sep.). Product brief. [Online]. Available: www.broadcom.com • [4] G. de Micheli and L. Benini, Networks on Chips. San Francisco, CA: Morgan Kaufmann, 2006. • [5] H. Hübert, “MEMTRACE: A memory, performance and energy profiler targeting RISC-based embedded systems for data-intensive applications,” Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci., Tech. Univ. Berlin, Germany, 2009. [Online]. Available: http://opus.kobv.de/tuberlin/volltexte/2009/2261
