Introduction to Open Source Performance Tool --Linux Tool Perf

Introduction to Open Source Performance Tool --Linux Tool Perf. Yiqi Ju (Fred) Sep. 13, 2012. Task. 07/09~09/14 Verizon Box Embedded System Software Environment Open Source Performance Tools Kernel Profiling. Kernel Profiling?.

Presentation Transcript


  1. Introduction to Open Source Performance Tool --Linux Tool Perf
     Yiqi Ju (Fred)
     Sep. 13, 2012

  2. Task
     • 07/09~09/14
     • Verizon Box
     • Embedded System
     • Software Environment
     • Open Source Performance Tools
     • Kernel Profiling

  3. Kernel Profiling?
     • Collect and analyze system-wide resource statistics in kernel space
     • HW trend: increasing core counts
     • SW performance: find the bottleneck
     • Solution: make full use of the available tools

  4. Available Tools
     • top (on board) / htop: real-time monitoring
     • Sysstat utilities: sar, iostat (on board), vmstat…
     • ss: socket statistics
     • LTTng: kernel tracing
     • perf: counting and sampling
     • …

  5. Perf Tool
     • Built on the perf_event kernel interface
     • A Linux kernel subsystem, merged in v2.6.31 and present in later kernels

  6. Perf_event Kernel Interface
     • Performance counters: hardware counters provided by the CPU's PMU (Performance Monitoring Unit); users do not have to program the registers themselves
     • Event-oriented API: callers name events instead of hardware registers, but hardware events still rely on a PMU-capable CPU
     • Supports event grouping, so the events in a group are measured simultaneously
     Source: Perf File Format, Urs Fassler. CERN openlab

  7. Sampling
     • perf record sets up sampling through the perf_event interface
     • It creates blank mmap pages that are shared with kernel space
     • The kernel writes sample records into these pages; perf record reads them and saves them to a *.data file in the current directory

  8. Sampling cont.
     [Figure, two panels: "Blank mmap pages generated through perf_events" and "Written mmap page"]
     Source: Perf File Format, Urs Fassler. CERN openlab
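
The hand-off described on the two sampling slides can be pictured with a toy model. The sketch below is not the real perf_event mmap layout: the class name `ToyRingBuffer`, its fields, and the record contents are hypothetical, chosen only to show a producer (standing in for the kernel) advancing a head position while a consumer (standing in for perf record) advances a tail position over a fixed-size buffer.

```python
class ToyRingBuffer:
    """Toy model of a shared ring buffer; not the actual perf_event layout."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0  # next write position, advanced by the producer
        self.tail = 0  # next read position, advanced by the consumer

    def write(self, record):
        """Producer side: store a record, or report that the buffer is full."""
        if self.head - self.tail == self.capacity:
            return False  # consumer has not caught up; this record is dropped
        self.slots[self.head % self.capacity] = record
        self.head += 1
        return True

    def drain(self):
        """Consumer side: return all unread records and free their slots."""
        out = []
        while self.tail < self.head:
            out.append(self.slots[self.tail % self.capacity])
            self.tail += 1
        return out


buf = ToyRingBuffer(capacity=4)
# Five writes into four slots: in this toy model the last one is rejected.
accepted = [buf.write({"ip": hex(0x1000 + i)}) for i in range(5)]
# Draining stands in for perf record appending samples to its data file.
saved = buf.drain()
```

In this model a slow consumer loses records, which is one reason the buffer size and the sampling rate have to be chosen together.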

  9. Advantages
     • Low overhead compared with instrumentation-based profiling
     • Fast: the counting itself adds almost no delay that the running load would notice
     • Many subcommands, providing a wide range of information

  10. Perf usage
      metro-root-perf_record> perf

      usage: perf [--version] [--help] COMMAND [ARGS]

      The most commonly used perf commands are:
        annotate    Read perf.data (created by perf record) and display annotated code
        diff        Read two perf.data files and display the differential profile
        list        List all symbolic event types
        lock        Analyze lock events
        probe       Define new dynamic tracepoints
        record      Run a command and record its profile into perf.data
        report      Read perf.data (created by perf record) and display the profile
        sched       Tool to trace/measure scheduler properties (latencies)
        stat        Run a command and gather performance counter statistics
        timechart   Tool to visualize total system behavior during a workload
        top         System profiling tool.
        trace       Read perf.data (created by perf record) and display trace output
        …

  11. List of Events
      List of pre-defined events (to be used in -e):
        cpu-cycles OR cycles                 [Hardware event]
        instructions                         [Hardware event]
        cache-references                     [Hardware event]
        cache-misses                         [Hardware event]
        branch-instructions OR branches      [Hardware event]
        branch-misses                        [Hardware event]
        bus-cycles                           [Hardware event]
        cpu-clock                            [Software event]
        task-clock                           [Software event]
        page-faults OR faults                [Software event]
        minor-faults                         [Software event]
        major-faults                         [Software event]
        context-switches OR cs               [Software event]
        cpu-migrations OR migrations         [Software event]
        alignment-faults                     [Software event]
        emulation-faults                     [Software event]
        L1-dcache-loads                      [Hardware cache event]
        L1-dcache-load-misses                [Hardware cache event]
        L1-dcache-stores                     [Hardware cache event]
        L1-dcache-store-misses               [Hardware cache event]
        L1-dcache-prefetches                 [Hardware cache event]
        L1-dcache-prefetch-misses            [Hardware cache event]
        L1-icache-loads                      [Hardware cache event]
        L1-icache-load-misses                [Hardware cache event]
        L1-icache-prefetches                 [Hardware cache event]
        L1-icache-prefetch-misses            [Hardware cache event]
        LLC-loads                            [Hardware cache event]
        LLC-load-misses                      [Hardware cache event]
        LLC-stores                           [Hardware cache event]
        LLC-store-misses                     [Hardware cache event]
        LLC-prefetches                       [Hardware cache event]
        LLC-prefetch-misses                  [Hardware cache event]
        dTLB-loads                           [Hardware cache event]
        dTLB-load-misses                     [Hardware cache event]
        dTLB-stores                          [Hardware cache event]
        dTLB-store-misses                    [Hardware cache event]
        dTLB-prefetches                      [Hardware cache event]
        dTLB-prefetch-misses                 [Hardware cache event]
        iTLB-loads                           [Hardware cache event]
        iTLB-load-misses                     [Hardware cache event]
        branch-loads                         [Hardware cache event]
        branch-load-misses                   [Hardware cache event]
        …

  12. Perf stat
      metro-root-perf_record> perf stat -e L1-dcache-loads -e L1-dcache-load-misses -e dTLB-loads -e dTLB-load-misses -e L1-icache-loads -e L1-icache-misses start_appli
      Start_appli…

      Performance counter stats for 'start_appli':

          354543239  L1-dcache-loads         (scaled from 80.54%)
      <not counted>  L1-dcache-load-misses
          507073444  dTLB-loads              (scaled from 83.87%)
             305313  dTLB-load-misses        (scaled from 83.89%)    missrate: 0.0602%
         2303127335  L1-icache-loads         (scaled from 83.80%)
            7994049  L1-icache-load-misses   (scaled from 84.33%)    missrate: 0.347%

       74.850334944  seconds time elapsed

      ----- (Data from mt2179, P1.0 board, 12:25AM, 9/12/2012)
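
The missrate annotations on this slide are simply misses divided by loads. A minimal sketch of that arithmetic follows, using the counts copied from the perf stat output above; the helper names `parse_counts` and `miss_rate_percent` are hypothetical, and the parser assumes the simple "count, then event name" line layout shown on the slide.

```python
import re

# Counts copied from the perf stat output on the slide above.
STAT_OUTPUT = """
354543239 L1-dcache-loads (scaled from 80.54%)
<not counted> L1-dcache-load-misses
507073444 dTLB-loads (scaled from 83.87%)
305313 dTLB-load-misses (scaled from 83.89%)
2303127335 L1-icache-loads (scaled from 83.80%)
7994049 L1-icache-load-misses (scaled from 84.33%)
"""


def parse_counts(text):
    """Map event name -> count; lines without a leading integer are skipped."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(\S+)", line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def miss_rate_percent(counts, prefix):
    """Return misses/loads in percent, or None if either count is missing."""
    loads = counts.get(prefix + "-loads")
    misses = counts.get(prefix + "-load-misses")
    if not loads or misses is None:
        return None
    return 100.0 * misses / loads


counts = parse_counts(STAT_OUTPUT)
for prefix in ("L1-dcache", "dTLB", "L1-icache"):
    rate = miss_rate_percent(counts, prefix)
    print(prefix, "n/a" if rate is None else f"{rate:.3g}%")
```

The L1-dcache rate comes out as unavailable because its miss counter was reported as `<not counted>` in this run.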

  13. Perf stat cont.
      metro-root-perf_record> perf stat -e dTLB-loads -e dTLB-load-misses -e L1-icache-loads -e L1-icache-misses start_appli
      …

      Performance counter stats for 'start_appli':

          534611783  dTLB-loads
             308219  dTLB-load-misses        missrate: 0.0577%
         2375996954  L1-icache-loads
            7810360  L1-icache-load-misses   missrate: 0.329%

       55.029461151  seconds time elapsed

      ----- (Data collected from mt2179, P1.0 board, 12:35PM, 9/12/2012)
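
The six-event run carries "(scaled from N%)" annotations while this four-event run does not. That is consistent with perf time-sharing the hardware counters when more events are requested than the PMU can count at once, then extrapolating each count from the fraction of time the event was actually scheduled. The sketch below illustrates that extrapolation; the function names are hypothetical, and it assumes the annotated percentage is the running-time fraction.

```python
def scale_count(raw_count, time_enabled, time_running):
    """Extrapolate a multiplexed counter to the full enabled time.

    Returns None when the event never got onto a hardware counter,
    the case assumed here to correspond to "<not counted>".
    """
    if time_running == 0:
        return None
    return round(raw_count * time_enabled / time_running)


def approx_raw_count(scaled_count, running_percent):
    """Invert the scaling: roughly how many events were really counted."""
    return round(scaled_count * running_percent / 100.0)


# dTLB-loads from the six-event run: 507073444 (scaled from 83.87%).
raw_estimate = approx_raw_count(507073444, 83.87)
print(raw_estimate)
```

Scaled values are estimates, so for tight comparisons it may be preferable to request few enough events that no scaling is reported, as in the four-event run.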

  14. Perf record/report
      metro-root-perf_record> perf record -F 3000 -o startapp.data start_appli
      …
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.560 MB startapp.data (~24470 samples) ]
      …
      metro-root-perf_record> perf report -i startapp.data > startapp.txt
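
The record-then-report sequence on this slide can be scripted. The following is a minimal sketch, assuming `perf` is installed and on the PATH; the function names are hypothetical, and the defaults simply mirror the slide's sampling frequency and file names.

```python
import shutil
import subprocess


def record_cmd(workload, freq_hz, data_file):
    """Build the argv for: perf record -F <freq> -o <file> -- <workload...>"""
    return ["perf", "record", "-F", str(freq_hz), "-o", data_file, "--", *workload]


def report_cmd(data_file):
    """Build the argv for: perf report -i <file>"""
    return ["perf", "report", "-i", data_file]


def profile(workload, freq_hz=3000, data_file="startapp.data",
            text_file="startapp.txt"):
    """Record a profile of the workload, then save the text report to a file."""
    if shutil.which("perf") is None:
        raise RuntimeError("perf not found on PATH")
    subprocess.run(record_cmd(workload, freq_hz, data_file), check=True)
    # Redirecting stdout to a file mirrors the "> startapp.txt" on the slide.
    with open(text_file, "w") as out:
        subprocess.run(report_cmd(data_file), stdout=out, check=True)
```

Calling `profile(["start_appli"])` would reproduce the two commands shown on the slide, provided the workload binary exists on the target.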

  15. (Data collected from mt2179, P1.0 board, 12:35PM, 9/12/2012)

  16. Perf diff
      metro-root-perf_record> perf diff lsactive.data lslactive.data
      (Data collected from mt2179, P1.0 board, 12:35PM, 9/12/2012)

  17. More for the Future
      • perf timechart: visualize total system behavior over time
      • perf trace: enables script-based tracing; Perl support since 2.6.33-rc, Python support patches available
      • perf annotate: source code annotation
      • Perf event converter and web-based GUI to enable remote profiling

  18. Source: Scripting support for perf. Jake Edge, Feb 10, 2010

  19. References
      • Perf_event project: http://web.eecs.utk.edu/~vweaver1/projects/perf-events/index.html
      • Perf File Format by CERN openlab: http://openlab.web.cern.ch/sites/openlab.web.cern.ch/files/technical_documents/Urs_Fassler_report.pdf
      • Perf wiki: https://perf.wiki.kernel.org/index.php
      • perf_events status update by Stephane Eranian, Google, Inc.
      • Kernel mailing list: http://lwn.net/Articles/373842/
