Profiling tools

By Vitaly Kroivets, for the Software Design Seminar.


Presentation Transcript


  1. Profiling tools By Vitaly Kroivets for Software Design Seminar Profiling Tools

  2. Contents • Introduction • Software optimization process, optimization traps and pitfalls • Benchmark • Performance tools overview • Optimizing compilers • System performance monitors • Profiling tools: GNU gprof, Intel VTune, Valgrind • What does it mean to use the system efficiently

  3. The Problem • PC speed has increased about 500 times since 1981, but today's software is more complex and still hungry for more resources • How can software run faster on the same hardware and OS architecture? • Highly optimized applications run tens of times faster than poorly written ones • Efficient algorithms and well-designed implementations lead to high-performance applications

  4. The Software Optimization Process • Create benchmark, find hotspots, investigate causes, modify application, retest using benchmark, then repeat • Hotspots are areas in your code that take a long time to execute

  5. Extreme Optimization Pitfalls • A large application's performance cannot be improved before it runs • Build the application, then see what machine it runs on • "Runs great on my computer…" • Debug versus release builds • Performance requires assembly language programming • Code features first, then optimize if there is time left over

  6. Key Point: Software optimization doesn't begin where coding ends. It is an ongoing process that starts at the design stage and continues all the way through development

  7. The Benchmark • A benchmark is a program that is used to objectively evaluate the performance of an application and to provide repeatable application behavior for use with performance analysis tools • Industry-standard benchmarks: TPC-C, 3D WinBench • http://www.specbench.com/ categories: Enterprise Services, Graphics/Applications, HPC/OMP, Java Client/Server, Mail Servers, Network File System, Web Servers

  8. Attributes of a good benchmark • Repeatable (consistent measurements): remember system tasks and caching issues; for the "incoming fax" problem, use the minimum performance number • Representative: executes a typical code path and mimics how the customer uses the application • Poor benchmark: using QA tests
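
One reading of "use the minimum performance number" is to time the benchmark several times and keep the fastest run, on the assumption that the fastest run is the one least disturbed by unrelated system activity (the "incoming fax"). The sketch below illustrates that reading only; best_of and sample_workload are hypothetical names, not part of the presentation.

```python
import time


def best_of(workload, repeats=5, timer=time.perf_counter):
    """Run workload() `repeats` times; return the smallest elapsed time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = timer()
        workload()
        timings.append(timer() - start)
    # Assumption: the fastest run is the one least disturbed by other tasks.
    return min(timings)


def sample_workload():
    # Hypothetical stand-in for the application's typical code path.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(f"best of 5 runs: {best_of(sample_workload):.6f} s")
```

The timer parameter exists only so that the function can be checked with a fake clock; in normal use the default is enough.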

  9. Benchmark attributes (cont.) • Easy to run • Verifiable: the benchmark itself needs QA • Measure elapsed time vs. other number • Use the benchmark to test functionality: algorithmic tricks to gain performance may break the application

  10. How to find performance bottlenecks • Determine how your system resources, such as memory and processor, are being utilized, to identify system-level bottlenecks • Measure the execution time of each module and function in your application • Determine how the various modules running on your system affect each other's performance • Identify the most time-consuming function calls and call sequences within your application • Determine how your application executes at the processor level, to identify microarchitecture-level performance problems

  11. Performance Tools Overview • Timing mechanisms: stopwatch, UNIX time tool • Optimizing compiler (the easy way) • System load monitors: vmstat, iostat, perfmon.exe, VTune Counter Monitor • Software profilers: gprof, VTune, Visual C++ Profiler, IBM Quantify • Memory debuggers/profilers: Valgrind, IBM Purify, Parasoft Insure++

  12. Using Optimizing Compilers • Always use compiler optimization settings to build an application for use with performance tools • Understanding and using all the features of an optimizing compiler is required for maximum performance with the least effort

  13. Optimizing Compiler: choosing a combination of optimization flags

  14. Optimizing Compiler's effect

  15. Optimizing Compilers: Conclusions • Some processor-specific options still do not appear to be a major factor in producing fast code • More optimizations do not guarantee faster code • Different algorithms are most effective with different optimizations • Idea: use statistics gathered by the profiler as input for the compiler/linker

  16. Windows Performance Monitor • Sampling "profiler" • Uses the OS timer interrupt to wake up and record the values of software counters (disk reads, free memory) • Maximum resolution: 1 sec • Cannot identify the piece of code that caused an event to occur • Good for finding system issues • Unix tools: vmstat, iostat, xos, top, oprofile, etc.

  17. Performance Monitor Counters

  18. Profilers • A profiler may show the time elapsed in each function and its descendants, the number of calls and, in some tools, a call graph • Profilers use either instrumentation or sampling to identify performance issues
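
To make the instrumentation side of that distinction concrete, here is a minimal sketch in which extra code is wrapped around a function so that every call is counted and timed. The names instrumented, call_stats and square_sum are hypothetical, and real profilers insert their hooks at compile or link time rather than with a decorator.

```python
import functools
import time

call_stats = {}  # function name -> [number of calls, total elapsed seconds]


def instrumented(func):
    """Wrap func so that each call is counted and timed (instrumentation)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = call_stats.setdefault(func.__name__, [0, 0.0])
            entry[0] += 1                              # exact call count
            entry[1] += time.perf_counter() - start    # includes time in callees
    return wrapper


@instrumented
def square_sum(n):
    # Hypothetical workload used only to exercise the wrapper.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for _ in range(3):
        square_sum(10_000)
    calls, seconds = call_stats["square_sum"]
    print(f"square_sum: {calls} calls, {seconds:.6f} s in total")
```

The call count is exact, but the wrapper itself adds overhead to every call, which is the usual cost of instrumentation compared with sampling.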

  19. Sampling vs. Instrumentation

  20. Profiling Tools • gprof: old, buggy and inaccurate • Intel VTune: $700, unstable • Valgrind: not really a profiler…

  21. GNU gprof • Instrumenting profiler for every UNIX-like system

  22. Using gprof, the GNU profiler • Compile and link your program with profiling enabled: cc -g -c myprog.c utils.c -pg, then cc -o myprog myprog.o utils.o -pg • Execute your program to generate a profile data file • The program will run normally (but slower) and will write the profile data into a file called gmon.out just before exiting • The program should exit using the exit() function • Run gprof to analyze the profile data: gprof myprog gmon.out (gprof looks for an executable named a.out if none is given)

  23. Example Program

  24. Understanding Flat Profile • The flat profile shows the total amount of time your program spent executing each function. • If a function was not compiled for profiling, and didn't run long enough to show up on the program counter histogram, it will be indistinguishable from a function that was never called

  25. Flat profile: % time • Percentage of the total execution time your program spent in this function. These should all add up to 100%.

  26. Flat profile: Cumulative seconds • The cumulative total number of seconds spent in this function, plus the time spent in all the functions above this one in the listing

  27. Flat profile: Self seconds • Number of seconds accounted for by this function alone

  28. Flat profile: Calls • Number of times the function was invoked

  29. Flat profile: Self seconds per call • Average number of seconds per call spent in this function alone

  30. Flat profile: Total seconds per call • Average number of seconds per call spent in this function and its descendants
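
The flat-profile columns described on the last few slides can be derived from two measurements per function: self seconds and number of calls. The sketch below shows that arithmetic under the assumption that rows are listed by decreasing self time. flat_profile is a hypothetical helper and any numbers passed to it are invented for illustration, not taken from a real gprof run. Total seconds per call is left out because it also needs the time spent in descendants.

```python
def flat_profile(functions):
    """Build flat-profile rows from (name, self_seconds, calls) tuples.

    Each row is (name, percent_time, cumulative_seconds, self_seconds,
    calls, self_seconds_per_call), listed by decreasing self time.
    """
    total = sum(self_s for _, self_s, _ in functions)
    rows = []
    cumulative = 0.0
    for name, self_s, calls in sorted(functions, key=lambda f: f[1], reverse=True):
        cumulative += self_s                      # this row plus all rows above it
        percent = 100.0 * self_s / total if total else 0.0
        per_call = self_s / calls if calls else 0.0
        rows.append((name, percent, cumulative, self_s, calls, per_call))
    return rows


if __name__ == "__main__":
    # Invented measurements for three functions.
    measured = [("main", 0.0, 1), ("doit", 3.0, 6), ("g", 1.0, 2)]
    for row in flat_profile(measured):
        print("%-6s %6.2f%% %6.2f %6.2f %4d %6.3f" % row)
```

The percent column sums to 100 and the last cumulative value equals the total self time, which matches the slide descriptions.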

  31. Call Graph: call tree of the program • Current function: g( ) • Called by: main( ) • Descendants: doit( )

  32. Call Graph: understanding each line • Current function: g( ) • index: unique index of this function • % time: percentage of the total time spent in this function and its children • self: total amount of time spent in this function • children: total time propagated into this function by its children • called: number of times the function was called

  33. Call Graph: "parents" numbers • Current function: g( ) • self: time that was propagated directly from the function into this parent • children: time that was propagated from the function's children into this parent • called: number of times this parent called the function / total number of times the function was called

  34. Call Graph: "children" numbers • Current function: g( ) • self: amount of time that was propagated directly from the child into the function • children: amount of time that was propagated from the child's children to the function • called: number of times this function called the child / total number of times this child was called
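
The "propagated" times in the call graph are estimates. gprof does not time individual calls; it assumes every call to a function costs the same on average and charges each caller a share of the callee's time proportional to its share of the calls. A minimal sketch of that apportioning follows; propagate_to_parent is a hypothetical name and the example numbers are invented.

```python
def propagate_to_parent(child_self, child_children, calls_from_parent, total_calls):
    """Share of a child's time charged to one parent.

    Returns (self_share, children_share), each scaled by the fraction of the
    child's calls that came from this parent.
    """
    if total_calls == 0:
        return 0.0, 0.0
    fraction = calls_from_parent / total_calls
    return child_self * fraction, child_children * fraction


if __name__ == "__main__":
    # Invented example: a child with 2.0 s self time and 1.0 s in its own
    # children was called 4 times in total, once by this parent.
    print(propagate_to_parent(2.0, 1.0, calls_from_parent=1, total_calls=4))
```

Because of the equal-cost assumption, a parent that makes a few very expensive calls can be undercharged relative to one that makes many cheap calls.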

  35. How gprof works • Instruments the program to count calls • Watches the running program and samples the PC every 0.01 sec • Statistical inaccuracy: a fast function may get 0 or 1 samples • The run should be long compared with the sampling period • Several gmon.out files can be combined into a single report • The output from gprof gives no indication of parts of your program that are limited by I/O or swapping bandwidth, because samples of the program counter are taken at fixed intervals of run time • Number-of-calls figures are derived by counting, not sampling; they are completely accurate and will not vary from run to run if your program is deterministic • Profiling with inlining and other optimizations needs care
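
The statistical inaccuracy can be put into rough numbers. The gprof manual gives a rule of thumb: a time built from n samples has an expected error of about the square root of n sampling periods. The sketch below applies that rule with the 0.01 sec period from the slide; the function names are hypothetical.

```python
import math

SAMPLING_PERIOD = 0.01  # seconds between program-counter samples (from the slide)


def estimated_seconds(samples, period=SAMPLING_PERIOD):
    """Time attributed to a function that received `samples` PC samples."""
    return samples * period


def expected_error_seconds(samples, period=SAMPLING_PERIOD):
    """Rule-of-thumb error: the square root of n sampling periods."""
    return math.sqrt(samples) * period


def relative_error(samples):
    """Expected error as a fraction of the estimate (infinite for zero samples)."""
    return math.inf if samples == 0 else 1.0 / math.sqrt(samples)


if __name__ == "__main__":
    for n in (1, 100, 10_000):
        print(f"{n:6d} samples: {estimated_seconds(n):7.2f} s "
              f"+/- {expected_error_seconds(n):.2f} s "
              f"({100 * relative_error(n):.0f}%)")
```

With a single sample the error is as large as the measurement itself, while 10,000 samples (100 seconds of run time) bring it down to about one percent, which is why the run should be long compared with the sampling period.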

  36. VTune performance analyzer • To squeeze every bit of power out of Intel architecture!

  37. VTune Modes/Features • Time- and Event-Based, System-Wide Sampling provides developers with the most accurate representation of their software's actual performance with negligible overhead • Call Graph Profiling provides developers with a pictorial view of program flow to quickly identify critical functions and call sequences • Counter Monitor allows developers to readily track system activity during runtime, which helps them identify system-level performance issues

  38. Sampling mode • Monitors all active software on your system, including your application, the OS, JIT-compiled Java class files, Microsoft .NET files, 16-bit applications, 32-bit applications and device drivers • Application performance is not impacted during data collection

  39. Sampling Mode Benefits • Low-overhead, system-wide profiling helps you identify which modules and functions are consuming the most time, giving you a detailed look at your operating system and application • Profiling to find hotspots: find the modules, functions, lines of source code and assembly instructions that are consuming the most time • Low overhead: overhead incurred by sampling is typically about one percent • No need to instrument code: you do not need to make any changes to code to profile with sampling

  40. How does sampling work? • Sampling interrupts the processor after a certain number of events and records the execution information in a buffer area. When the buffer is full, the information is copied to a file. After saving the information, the program resumes operation. In this way, the VTune™ analyzer maintains very low overhead (about one percent) while sampling • Time-based sampling: collects samples of active instruction addresses at regular time-based intervals (1 ms by default) • Event-based sampling: collects samples of active instruction addresses after a specified number of processor events • After the program finishes, the samples are mapped to modules and stored in a database within the analyzer program
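
Event-based sampling can be illustrated with a toy simulation: walk through an execution trace, count events, and every time a fixed number of events has accumulated, record which function was running. This is only a model of the idea, not of VTune's implementation; sample_every_n_events and the trace values are hypothetical.

```python
from collections import Counter


def sample_every_n_events(trace, sample_after):
    """Simulate event-based sampling over an execution trace.

    trace: iterable of (function_name, event_count) pairs in execution order.
    One sample is recorded each time `sample_after` events have accumulated,
    and it is attributed to the function executing at that moment.
    """
    if sample_after < 1:
        raise ValueError("sample_after must be at least 1")
    samples = Counter()
    pending = 0
    for function_name, events in trace:
        pending += events
        while pending >= sample_after:
            samples[function_name] += 1
            pending -= sample_after
    return samples


if __name__ == "__main__":
    # Invented trace: events generated while each function was running.
    trace = [("main", 50), ("doit", 250), ("g", 100)]
    print(sample_every_n_events(trace, sample_after=100))
```

In this invented trace main generates too few events to be sampled at all, the same effect as the "0 or 1 samples" inaccuracy noted for gprof.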

  41. Starting the Sampling Wizard

  42. Starting the Sampling Wizard • Hardware prevents sampling many counters simultaneously

  43. Starting the Sampling Wizard

  44. Starting the Sampling Wizard • Unsupported CPU? Ha-ha-ha…

  45. EBS: choosing events

  46. Events counted by VTune • Basic Events: clock cycles, retired instructions • Instruction Execution: instruction decode, issue and execution, data and control speculation, and memory operations • Cycle Accounting Events: stall cycle breakdowns • Branch Events: branch prediction • Memory Hierarchy: instruction prefetch, instruction and data caches • System Events: operating system monitors, instruction and data TLBs • About 130 different events in the Pentium 4 architecture!

  47. Sampling…

  48. Viewing Sampling Results • Process view: all the processes that ran on the system during data collection • Thread view: the threads that ran within the processes you select in Process view • Module view: the modules that ran within the selected processes and threads • Hotspot view: the functions within the modules you select in Module view

  49. Different events collected: modules view • System-wide look at software running on the system • Our program • CPI: a good average indication
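
CPI (cycles per instruction) is the ratio of the two basic events listed earlier, clock cycles and retired instructions. A lower value means the processor retires more instructions per cycle. The sketch below computes it per module; the function names, module names and counts are invented for illustration and are not taken from the slide's screenshot.

```python
def cycles_per_instruction(clock_cycles, instructions_retired):
    """CPI = clock cycles / retired instructions."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return clock_cycles / instructions_retired


def rank_by_cpi(modules):
    """modules: dict of name -> (clock_cycles, instructions_retired).

    Returns (name, cpi) pairs, worst (highest) CPI first.
    """
    ranked = [(name, cycles_per_instruction(c, i)) for name, (c, i) in modules.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented event counts for two hypothetical modules.
    counts = {"myprog": (3_000_000, 2_000_000), "libutil": (1_000_000, 2_000_000)}
    for name, cpi in rank_by_cpi(counts):
        print(f"{name}: CPI = {cpi:.2f}")
```

CPI is an average over everything a module did, so a high value is a hint to look more closely rather than a diagnosis.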

  50. Hotspot Graph • Each bar represents one of the functions of our program • Click on a hotspot bar and VTune displays the source code view
