Non-Intrusive Dynamic Application Profiling for Detailed Loop Execution Characterization

Ajay Nair, Roman Lysecky. Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, USA. {ajaynair, rlysecky}@ece.arizona.edu


Presentation Transcript


  1. Non-Intrusive Dynamic Application Profiling for Detailed Loop Execution Characterization. Ajay Nair, Roman Lysecky. Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, USA. {ajaynair, rlysecky}@ece.arizona.edu

  2. Introduction: Application Profiling
     • Application profiling is useful for many purposes
       • Often used to identify frequently executed code regions, allowing a designer to focus on optimizing those regions
       • Map frequently executed code and data regions to non-interfering cache regions
       • Used within binary translation approaches to store translation results (x86, Transmeta Crusoe)
       • Can be used to create optimized SW or HW implementations selected at runtime
       • And many others

  3. Introduction: Application Profiling – HW/SW Partitioning
     • Hardware/software partitioning
       • Profiling is a critical step within hardware/software partitioning
       • Often utilized to determine critical software regions: frequently executed loops or functions
       • Critical kernels can be re-implemented in hardware
         • Speedup of 2X to 10X; speedup of 1000X possible
         • Energy reduction of 25% to 95%
     [Figure: partitioning flow: Software Application (C/C++), Application Profiling, Critical Kernels, Partitioning into HW and SW; target platform: µP with I$ and D$ plus HW coprocessor (ASIC/FPGA)]

  4. Introduction: Application Profiling – Warp Processing Overview
     • Step 1: Application initially executes on the microprocessor
     • Step 2: Profiler dynamically detects the application's kernels
     • Step 3: On-chip CAD maps kernels onto the FPGA
     • Step 4: Configure FPGA and update application binary
     • Step 5: Warped execution is 2-100X faster, or consumes 75% less power
     [Figure: warp processor: µP with I$ and D$, Profiler, W-FPGA, On-chip CAD]

  5. Introduction: Application Profiling – Warp Processing
     • Warp processing: dynamic hardware/software partitioning
       • Dynamically re-implements critical kernels as HW within the W-FPGA
       • Requires non-intrusive profiling to determine critical kernels at runtime
     • Incorporated the Frequent Loop Detection Profiler [Gordon-Ross, Vahid – TC 2005]
       • Monitors short backwards branches
       • Maintains a small list of branch execution frequencies
       • May lead to sub-optimal partitioning, as it does not provide detailed loop execution statistics
     [Figure: warp processor: µP with I$ and D$, Profiler, W-FPGA, On-chip CAD]

  6. Introduction: Application Profiling – HW/SW Partitioning
     • Loop iteration count alone may not provide sufficient information for accurate performance estimation
     • Example: assume we want to partition only one of two loops, kernel A and kernel B, to HW
       [Table: profile data for the two candidate loops; not preserved in the transcript]
       • With profile data from the Frequent Loop Detection Profiler, kernel B appears to be the better candidate
     [Figure: HW/SW partitioning flow, as on slide 3]

  7. Introduction: Application Profiling – Warp Processing
     • However, communication requirements can significantly impact overall performance
     • Kernel A may in fact be the better choice
     [Figure: HW/SW partitioning flow, as on slide 3]
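
The point of slides 6 and 7 can be illustrated with a small worked example. The sketch below is not taken from the presentation: the kernel statistics, per-iteration costs, and the per-execution communication cost are all hypothetical numbers chosen only to show how a kernel with more total iterations can still be the worse hardware candidate once data transfer per execution is counted.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    executions: int          # times the loop is entered
    avg_iterations: int      # iterations per execution
    sw_cycles_per_iter: int  # hypothetical software cost per iteration
    hw_cycles_per_iter: int  # hypothetical hardware cost per iteration


# Hypothetical cost of moving data to and from the coprocessor, paid once
# per loop execution (an assumption, not a figure from the slides).
COMM_CYCLES_PER_EXECUTION = 50


def total_iterations(k: Kernel) -> int:
    """What an iteration-count-only profiler effectively ranks by."""
    return k.executions * k.avg_iterations


def cycles_saved(k: Kernel) -> int:
    """Software cycles minus hardware cycles including communication."""
    sw = total_iterations(k) * k.sw_cycles_per_iter
    hw = (total_iterations(k) * k.hw_cycles_per_iter
          + k.executions * COMM_CYCLES_PER_EXECUTION)
    return sw - hw


# Kernel A: few executions, many iterations each.
kernel_a = Kernel("A", executions=10, avg_iterations=1000,
                  sw_cycles_per_iter=10, hw_cycles_per_iter=2)
# Kernel B: many executions, few iterations each.
kernel_b = Kernel("B", executions=4000, avg_iterations=3,
                  sw_cycles_per_iter=10, hw_cycles_per_iter=2)

for k in (kernel_a, kernel_b):
    print(k.name, total_iterations(k), cycles_saved(k))
```

With these made-up numbers, kernel B has more total iterations, yet moving it to hardware loses cycles because the communication cost is paid on each of its many executions. Separating executions from iterations is what makes this visible.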

  8. Introduction: Application Profiling – Goal: Non-Intrusive Profiling
     • Non-intrusive application profiling
       • Goal: profile the application at runtime to determine detailed loop execution statistics with no impact on application execution
       • Many applications cannot tolerate runtime overhead
         • E.g., real-time and embedded systems
         • Overhead may lead to missed deadlines and potentially system failure
     [Figure: HW/SW partitioning flow, as on slide 3]

  9. Introduction: Application Profiling – Existing Profiling Methods
     • Software-based profiling
       • Instrumenting: insert code directly within the software
         • E.g., monitor branches, basic blocks, functions, etc.
         • Intrusive: increases code size and introduces runtime overhead
       • Statistical sampling
         • Periodically interrupt the processor (or execute an additional software task) to monitor the program counter
         • Statistically determine the application profile
         • Very good accuracy with reduced overhead compared to instrumentation
         • Intrusive: introduces runtime overhead
     [Figure: HW/SW partitioning flow, as on slide 3]

  10. Introduction: Application Profiling – Existing Profiling Methods
      • Hardware-based profiling
        • Processor support: event counters
          • Many processors include event counters that can be used to profile an application
          • Intrusive: requires additional software support to process the event counters
        • JTAG (Joint Test Action Group)
          • Standard interface for reading registers within hardware devices
          • Intrusive: requires the processor to be halted to read the values
      [Figure: HW/SW partitioning flow, as on slide 3]

  11. Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling
      • Dynamic Application Profiler (DAProf)
        • Non-intrusively monitors both loop executions and iterations
        • Monitors the processor's instruction bus and branch execution behavior to build the application profile
        • Requires a short backwards branch (sbb) signal from the microprocessor
      [Figure: DAProf block diagram. The µP (with I$ and D$) supplies sbb and iAddr to DAProf (FPGA/ASIC), which consists of the Profiler FIFO, Profiler Controller, and Profile Cache. Signals: sbb, iAddr, iOffset, found, foundIndex, replaceIndex. Profile cache entry fields (bits): Tag (30), Offset (8), CurrIter (10), AvgIter (13), Execs (16), InLoop (1), Freshness (3).]

  12. Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling
      • Profiler FIFO
        • Small FIFO that stores the instruction address (iAddr) and instruction offset (iOffset) of all executed sbb's
        • Synchronizes between the processor execution frequency and the slower internal profiler frequency
      [Figure: DAProf block diagram, as on slide 11]

  13. Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling
      • Profile Cache
        • Tag: address of the short backwards branch
        • Offset: negative branch offset
          • Corresponds to the size of the loop
          • Currently supports loops with fewer than 256 instructions
      [Figure: DAProf block diagram, as on slide 11]

  14. Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling
      • Profile Cache
        • CurrIter: number of iterations for the current loop execution
        • AvgIter: average iterations per execution of the loop
          • 13-bit fixed-point representation with 10 integer bits and 3 fractional bits
      [Figure: DAProf block diagram, as on slide 11]
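
To make the AvgIter format concrete, here is a minimal sketch of a 13-bit unsigned fixed-point value with 10 integer and 3 fractional bits. The helper names `to_fixed` and `from_fixed` are hypothetical, and the round-to-nearest and saturate-on-overflow behavior is an assumption; the slides do not say how the hardware rounds or handles overflow.

```python
FRAC_BITS = 3                      # 3 fractional bits: resolution of 1/8
TOTAL_BITS = 13                    # 10 integer bits + 3 fractional bits
MAX_RAW = (1 << TOTAL_BITS) - 1    # largest raw value that fits in 13 bits


def to_fixed(value: float) -> int:
    """Encode a non-negative average as a raw 13-bit fixed-point integer.

    Assumption: round to nearest and saturate at the maximum raw value.
    """
    raw = int(round(value * (1 << FRAC_BITS)))
    return max(0, min(raw, MAX_RAW))


def from_fixed(raw: int) -> float:
    """Decode a raw 13-bit fixed-point integer back to a float."""
    return raw / (1 << FRAC_BITS)


print(to_fixed(12.375))        # raw encoding of 12.375
print(from_fixed(MAX_RAW))     # largest representable average
```

Under this encoding the smallest step is 0.125 iterations and the largest representable average is 1023.875, which matches the 10-bit range of CurrIter.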

  15. Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling
      • Profile Cache
        • InLoop: flag indicating the loop is currently executing
          • Utilized to distinguish between loop iterations and loop executions
        • Freshness: indicates how recently a loop has been executed
          • Utilized to ensure newly identified loops are not immediately replaced from the profile cache
      [Figure: DAProf block diagram, as on slide 11]

  16. Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling
      • Profile Cache outputs
        • found: indicates whether the current loop (identified by iAddr) is found within the profile cache
        • foundIndex: location of the loop within the profile cache, if found
        • replaceIndex: loop that will be replaced upon a new loop execution
          • The loop not identified as fresh that has the least total iterations
      [Figure: DAProf block diagram, as on slide 11]
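
The profile cache entry and its three outputs can be sketched in software as follows. This is an illustrative model, not the authors' hardware: the names `ProfileEntry`, `lookup`, and `replace_index` are hypothetical, "fresh" is assumed to mean a nonzero Freshness value, total iterations are assumed to be estimated as Execs × AvgIter, and the fallback used when every entry is fresh is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProfileEntry:
    tag: int = 0           # 30 bits: address of the short backwards branch
    offset: int = 0        # 8 bits: negative branch offset (loop size)
    curr_iter: int = 0     # 10 bits: iterations in the current execution
    avg_iter: float = 0.0  # 13-bit fixed point in hardware; a float here
    execs: int = 0         # 16 bits: number of loop executions
    in_loop: bool = False  # 1 bit
    freshness: int = 0     # 3 bits; nonzero is treated as fresh (assumption)


# Field widths as listed in the slide 11 figure.
ENTRY_BITS = 30 + 8 + 10 + 13 + 16 + 1 + 3


def lookup(cache: List[ProfileEntry], i_addr: int) -> Tuple[bool, Optional[int]]:
    """Model of the found / foundIndex outputs."""
    for index, entry in enumerate(cache):
        if entry.tag == i_addr:
            return True, index
    return False, None


def total_iterations(entry: ProfileEntry) -> float:
    # Assumption: total iterations estimated as executions * average.
    return entry.execs * entry.avg_iter


def replace_index(cache: List[ProfileEntry]) -> int:
    """Model of replaceIndex: least total iterations among non-fresh loops."""
    stale = [i for i, e in enumerate(cache) if e.freshness == 0]
    # Assumption: if every entry is fresh, fall back to all entries.
    candidates = stale if stale else list(range(len(cache)))
    return min(candidates, key=lambda i: total_iterations(cache[i]))


cache = [
    ProfileEntry(tag=0x100, offset=12, avg_iter=10.0, execs=10),
    ProfileEntry(tag=0x200, offset=6, avg_iter=4.0, execs=5),
    ProfileEntry(tag=0x300, offset=20, avg_iter=1.0, execs=1, freshness=3),
]
print(ENTRY_BITS, lookup(cache, 0x300), replace_index(cache))
```

In the example, the loop at 0x300 has the fewest total iterations but is still fresh, so the entry selected for replacement is the stale loop at 0x200.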

  17. Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling
      • Profiler Controller: if the loop is found within the cache
        • If the InLoop flag is set: new iteration; increment current iterations
        • Otherwise: new execution; increment executions; set current iterations to 1; set InLoop flag; update Freshness

  18. Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling
      • Profiler Controller: if the loop is not found within the cache
        • Replace a profile cache entry
        • Initialize executions and current iterations to 1
        • Set InLoop flag
        • Update Freshness

  19. Dynamic Application Profiler (DAProf): Non-intrusive Dynamic Application Profiling
      • Profiler Controller: if the current sbb (iAddr) is detected outside a loop within the profile cache, and that loop's InLoop flag is set
        • Reset InLoop flag
        • Update average iterations
      • Ratio-based average iteration calculation
        • Simple hardware requirements
        • Good accuracy for the applications considered
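
The controller rules on slides 17 to 19 can be sketched as a small software model. Several details here are assumptions rather than facts from the slides: a loop body is taken to span addresses `tag - offset` through `tag`, the "ratio based" average is modeled as a 3/4 old, 1/4 new blend (the first execution sets the average directly), Freshness is left out, and the eviction rule is simplified to least total iterations. The names `Entry`, `update_average`, and `on_sbb` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    tag: int               # address of the short backwards branch
    offset: int            # loop size; body assumed to span tag-offset..tag
    curr_iter: int = 1     # new entries start with current iterations = 1
    avg_iter: float = 0.0
    execs: int = 1         # new entries start with executions = 1
    in_loop: bool = True


def update_average(entry: Entry) -> None:
    # ASSUMPTION: the slides only say "ratio based". This blend is a
    # stand-in that needs only shifts and adds in hardware.
    if entry.execs == 1:
        entry.avg_iter = float(entry.curr_iter)
    else:
        entry.avg_iter += (entry.curr_iter - entry.avg_iter) / 4


def on_sbb(cache: List[Entry], capacity: int, i_addr: int, i_offset: int) -> None:
    # Slide 19: an sbb outside a loop whose InLoop flag is set ends that
    # execution: reset InLoop and update the average.
    for e in cache:
        outside = not (e.tag - e.offset <= i_addr <= e.tag)
        if e.in_loop and outside:
            e.in_loop = False
            update_average(e)

    hit = next((e for e in cache if e.tag == i_addr), None)
    if hit is not None:
        if hit.in_loop:
            hit.curr_iter += 1          # slide 17: new iteration
        else:
            hit.execs += 1              # slide 17: new execution
            hit.curr_iter = 1
            hit.in_loop = True
        return

    # Slide 18: loop not found, so replace (or fill) an entry.
    if len(cache) >= capacity:
        victim = min(cache, key=lambda e: e.execs * e.avg_iter)
        cache.remove(victim)
    cache.append(Entry(tag=i_addr, offset=i_offset))


# Hypothetical sbb trace: loop at 100 runs 4 then 8 iterations, with a
# second loop at 200 executing in between and afterwards.
cache: List[Entry] = []
offsets = {100: 10, 200: 5}
for addr in [100] * 4 + [200] + [100] * 8 + [200]:
    on_sbb(cache, capacity=8, i_addr=addr, i_offset=offsets[addr])

loop_100 = next(e for e in cache if e.tag == 100)
loop_200 = next(e for e in cache if e.tag == 200)
print(loop_100, loop_200)
```

The InLoop flag is what lets the model tell the fifth sbb at address 100 (a new execution) apart from the second, third, and fourth (further iterations of the same execution).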

  20. Dynamic Application Profiler (DAProf): Hardware Implementation
      • DAProf hardware
        • Implemented fully associative, 16-way associative, and 8-way associative profiler designs in Verilog
        • Synthesized using Synopsys Design Compiler, targeting UMC 0.18 µm
      [Figure: DAProf block diagram, as on slide 11]

  21. Dynamic Application Profiler (DAProf): Profiling Accuracy
      • DAProf profiling accuracy
        • Compared the profiling accuracy of the top ten loops for several MiBench applications against detailed simulation-based profiling
        • Results presented for the 8-way DAProf design
        • All three associativities performed similarly well
      • Results: 90% accuracy for average iterations, 97% accuracy for executions, 95% accuracy for % execution time

  22. Dynamic Application Profiler (DAProf): Profiling Accuracy – Function Call Interference
      • DAProf profiling accuracy
        • Some applications are affected by function call interference
        • Loop execution within functions called from within a loop may lead to the InLoop flag being incorrectly reset for the calling loop
        • Average iterations will then be incorrectly updated
      [Figure: function call interference]

  23. Current Work – Dynamic Application Profiler: Function Call Support
      • Extended DAProf profiler with function call support
        • Monitors function calls and returns to avoid function call interference
        • InFunc: flag within the profile cache indicating whether a loop has called a function
        • Will not update average iterations until the function call returns
      [Figure: original and extended DAProf block diagrams. The extended design adds func and ret input signals and an InFunc (1) field to each profile cache entry.]
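
A minimal sketch of how an InFunc flag could suppress function call interference is shown below. It is a simplified model under stated assumptions: the names `LoopState`, `on_call`, `on_return`, and `on_foreign_sbb` are hypothetical, a loop body is assumed to span `tag - offset` through `tag`, and nested calls are not tracked (a single return clears the flag), which the real design may handle differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoopState:
    tag: int               # address of the loop's short backwards branch
    offset: int            # loop size; body assumed to span tag-offset..tag
    curr_iter: int = 1
    in_loop: bool = True
    in_func: bool = False  # set while the loop is waiting on a called function


def on_call(loops: List[LoopState]) -> None:
    """A function call seen while a loop is executing marks it InFunc."""
    for loop in loops:
        if loop.in_loop:
            loop.in_func = True


def on_return(loops: List[LoopState]) -> None:
    """Assumption: one return clears InFunc (no nesting depth is tracked)."""
    for loop in loops:
        loop.in_func = False


def on_foreign_sbb(loops: List[LoopState], i_addr: int) -> List[int]:
    """Return tags of loops whose execution ends because of this sbb.

    A loop marked InFunc is left alone, so sbb's from loops inside the
    callee do not reset the caller's InLoop flag.
    """
    closed = []
    for loop in loops:
        outside = not (loop.tag - loop.offset <= i_addr <= loop.tag)
        if loop.in_loop and outside and not loop.in_func:
            loop.in_loop = False
            closed.append(loop.tag)
    return closed


caller = LoopState(tag=100, offset=10, curr_iter=3)
loops = [caller]

on_call(loops)                                # caller loop calls a function
during_call = on_foreign_sbb(loops, 500)      # sbb from a loop in the callee
on_return(loops)
after_return = on_foreign_sbb(loops, 500)     # sbb genuinely after the loop
print(during_call, after_return)
```

In this model the sbb at address 500 is ignored while the call is outstanding and only ends the caller's execution once the function has returned.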

  24. Current Work – Dynamic Application Profiler: Profiling Accuracy with Function Call Support
      • DAProf profiling accuracy with function call support
        • Compared the profiling accuracy of the top ten loops for several MiBench applications against detailed simulation-based profiling
        • Results presented for the 8-way DAProf design
        • All three associativities performed similarly well
      • Results: 95% accuracy for average iterations, executions, and % execution time

  25. Conclusions
      • Developed a non-intrusive dynamic application profiler (DAProf)
        • Profiles an application at runtime, providing detailed loop execution characteristics
        • Developed efficient methods for distinguishing loop executions from loop iterations
        • Developed a Freshness-based replacement policy to ensure newly executed loops are not immediately replaced
        • Developed an efficient method for monitoring function call executions
      • Achieves excellent profiling accuracy
        • On average, better than 95% accurate for average iterations per execution, loop executions, and estimated percentage of total application execution time
      • Efficient hardware implementation
        • Area requirement as little as 11% of an ARM9 processor
        • Maximum operating frequency of 495 MHz

  26. Current/Future Work
      • Current DAProf performs very well when profiling single-threaded software applications
      • However, multitasked/multithreaded applications may lead to context switch interference
        • Similar implications to those of function call interference
      • Need for task/thread-aware profiling
