PAPI Evaluation

Patricia J. Teller, Maria G. Aguilera, Thientam Pham, and Roberto Araiza

(Leonardo Salayandia, Alonso Bayona, Manuel Nieto, and Michael Maxwell)

University of Texas-El Paso

Supported by the Department of Defense PET Program

SC 2003, Phoenix, AZ – November 17-20, 2003

Main Objectives
  • Provide DoD users with documentation that enables them to easily collect, analyze, and interpret hardware performance data that is highly relevant to analyzing and improving the performance of applications on HPC platforms.

Evaluation: Objectives
  • Understand and explain counts obtained for various PAPI metrics
  • Determine reasons why counts may be different from what is expected
  • Calibrate counts, excluding PAPI overhead
  • Work with vendors and/or the PAPI team to fix errors
  • Provide DoD users with information that will allow them to effectively use collected performance data

Evaluation: Methodology - 1
  • Micro-benchmark: design and implement a micro-benchmark that facilitates event count prediction
  • Prediction: predict event counts using tools and/or mathematical models
  • Data collection-1: collect hardware-reported event counts using PAPI
  • Data collection-2: collect predicted event counts using a simulator (not always necessary or possible)

Evaluation: Methodology - 2
  • Comparison: compare predicted and hardware-reported event counts
  • Analysis: analyze results to identify and possibly quantify differences
  • Alternate approach: when analysis indicates that prediction is not possible, use an alternate means to either verify reported event count accuracy or demonstrate that the reported event count seems reasonable

Example Findings - 1
  • Some hardware-reported event counts mirror expected behavior, e.g., the number of floating-point instructions on the MIPS R10K and R12K.
  • Other hardware-reported event counts can be calibrated to mirror expected behavior by subtracting the part of the count associated with the interface (overhead or bias error), e.g., the number of load instructions on the MIPS and POWER processors and instructions completed on the POWER3.
  • In some cases, compiler optimizations affect event counts, e.g., the number of floating-point instructions on the IBM POWER platforms.

Example Findings - 2
  • Very long instruction words (VLIW) can affect event counts, e.g., on the Itanium architecture the numbers of instruction cache misses and instructions retired are dilated by the no-ops used to compose very long instruction words.
  • The definition of an event may be non-standard and, thus, the associated performance data may be misleading, e.g., instruction cache hits on the POWER3.
  • The complexity of hardware features and a lack of documentation can make it difficult to understand how to tune performance based on information gleaned from event counts, e.g., data prefetching and the page walker.

Example Findings - 3
  • Although we have not been able to determine the algorithms used for prefetching, the ingenuity and performance of these mechanisms are striking.
  • In some cases, more instructions are completed than issued on the R10K.
  • The DTLB miss count on the POWER3 varies depending upon the method used to allocate memory (i.e., static, calloc, or malloc).
  • A hardware square root (SQRT) on the POWER3 is not counted in the total floating-point operations unless it is combined with another floating-point operation.

Publications
  • Papers
    • DoD Users Group Conference (with Shirley Moore), June 2003.
    • LACSI 2002 (with Shirley Moore), October 2002.
    • DoD Users Group Conference (with members of PAPI team), June 2002.
    • “Hardware Performance Metrics and Compiler Switches: What you see is not always what you get,” with Luiz Derose, submitted for publication.
  • Posters
    • “Hardware Performance Counters: Is what you see, what you get?,” poster, SC2003.
  • Presentations
    • PTools Workshop, September 2002.
    • Conference Presentations for Papers above.

Calibration Example - 1
  • Instructions completed
    • PAPI overhead: 139 on POWER3-II

Calibration Example - 2
  • Instructions completed
    • PAPI overhead: 141 for small micro-benchmarks

RIB/OKC for Evaluation Resources
  • Object-oriented data model to store benchmarks, results and analyses
  • Information organized for ease of use by colleagues external to PCAT
  • To be web-accessible to members
  • Objects linked to one another as appropriate

  • Benchmark: general description of a benchmark
  • Case: specific implementation and results
  • Machine: description of the platform
  • Organization: contact information

PCAT RIB/OKC Data Repository Example
  • Benchmark name: DTLB misses
  • Development date: 12/2002
  • Benchmark type: Array
  • Abstract: The code traverses through an array of integers once at regular strides of PAGESIZE, with the intention of incurring a compulsory DTLB miss on each array access. Input parameters are page size (bytes) and array size (bytes). The number of misses normally expected is: Array Size / Page Size.
  • Files included: dtlbmiss.c, dtlbmiss.pl
  • About the included files:
    • dtlbmiss.c: benchmark source code in C; requires pagesize and arraysize parameters as input and outputs the PAPI event count.
    • dtlbmiss.pl: Perl script that executes the benchmark 100 times with increasing arraysize parameters and saves the benchmark output to a text file. The script should be customized for the pagesize parameter and the arraysize range.

Links to files

PCAT RIB/OKC Example Case Object

Name: DTLB misses on Itanium

Date: 12/2002

Compiler and options: gcc ver 2.96 20000731 (Red Hat Linux 7.1 2.96-101) –O0

PAPI Event: PAPI_TLB_DM, Data TLB misses

Native Event: DTLB_MISSES

Experimental methodology: Ran the benchmark 100 times via a Perl script; averages and standard deviations reported

Input parameters used: Page size = 16K, Array size = 16K – 160M (increments by multiples of 10)

Platform used: HP01.cs.utk.edu (Itanium)

Developed by: PCAT

Benchmark used: DTLB misses

Links to other objects

PCAT RIB/OKC Example Case Object
  • Results summary: Reported counts closely match the predicted counts, showing differences close to 0% even in the cases with a small number of data references, which may be more susceptible to external perturbation. The counts indicate that prefetching is not performed at the DTLB level.
  • Included files and descriptions:
    • dtlbmiss.itanium.c: source code of the benchmark, instrumented with PAPI to count PAPI_TLB_DM
    • dtlbmiss.itanium.pl: Perl script used to run the benchmark
    • dtlbmiss.itanium.txt: raw data; each column contains the results for a particular array size, and each case is run 100 times (i.e., 100 rows)
    • dtlbmiss.itanium.xls: raw data, averages of runs, standard deviations, and a graph of the % difference between reported and predicted counts
    • dtlbmiss.itanium.pdf: same content as dtlbmiss.itanium.xls

Contributions
  • Infrastructure that facilitates user access to hardware performance data that is highly relevant for analyzing and improving the performance of their applications on HPC platforms.
  • Information that allows users to effectively use the data with confidence.

QUESTIONS?
