PAPI on Blue Gene L

Using network performance counters to lay out tasks for improved performance

Presentation overview
  • Project objectives
  • PAPI explanation
  • Blue Gene L explanation
  • Current state of research
Project objectives
  • Upgrade PAPI on BG/L
    • Provide interface for network counters
    • Allow Lawrence Livermore National Lab users to also have access to PAPI
  • Using network counters to place tasks optimally on BG/L
PAPI – Intro
  • PAPI is useful for profiling your own programs.
  • Many tools based on PAPI
    • PapiEx – Command line measurement tool
    • PerfSuite – Aggregate measurement and statistical profiling package and API
    • HPCToolkit – Statistical profiling package
    • Many more!
PAPI – Supported platforms
  • IBM – POWER3, 604, 604e, POWER4
  • Cray T3E, Cray X1
  • AMD – Athlon, Opteron
  • Intel – P1 to P4, Itanium I and II
  • UltraSparc I, II & III
  • MIPS R10K, R12K, R14K
  • Alpha
PAPI – Generic Interface
  • Call sequence for generic interface
    • PAPI_library_init – Initialize memory for PAPI’s data structures
    • PAPI_create_eventset – Create an empty list of events
    • PAPI_add_event – Add events to be counted
    • PAPI_start – Begin counting all events within the specified eventset
    • PAPI_stop – Stop all counters and read their current values
PAPI – Events: Presets
  • Presets – list of predefined events implemented on all systems where they can be supported
    • Not all presets available on every architecture (e.g. BG/L has no cache lower than L3 – thus L1 cache hit preset not applicable)
    • Native events form the basic building blocks for PAPI presets
PAPI – Events: Native
  • In addition to the predefined PAPI preset events, the PAPI library also exposes a majority of the events native to each platform
  • Can be added to eventsets in the same manner as presets
PAPI – Internals
  • The main internal data structure is an array of eventsets
PAPI – Other features
  • Multiplexing – time-shares the hardware counters when there are more events than counters
  • Thread safe – Profiling is thread safe
  • Overflow detection – Hardware counters have limited space
PAPI – PAPI2 vs PAPI3
  • PAPI 3 significantly reduced overheads for starting, stopping and reading the counters
PAPI – PAPI2 vs PAPI3
  • Better native event support in PAPI3
  • Better thread support in PAPI3
  • Overflow and Profiling enhancements in PAPI3
  • Myriad bug fixes and code cleanup in PAPI3
PAPI – PAPI2 vs PAPI3
  • Overlapping eventsets supported in PAPI2
  • Minor changes in the API – mostly dereferencing variables
Blue Gene L – Intro
  • 65,536 nodes connected in 64 x 32 x 32 3D torus
  • Nodes made up of PowerPC 440 embedded processors
  • Smaller than most supercomputers
  • Consumes less power
Blue Gene L - Networks
  • 3D torus network (node to node)
  • Tree network
Blue Gene L – HW counters
  • 48 universal performance counters
  • 4 floating point unit counters
  • Counters are 32-bit – virtual counters must be used to prevent overflow
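
The idea of a virtual counter can be illustrated with a small sketch. It is not taken from the slides and is not how PAPI or the BG/L system software implements it: the class name `VirtualCounter` and its methods are hypothetical, and the sketch assumes the hardware counter is read often enough that it wraps at most once between reads.

```python
MASK32 = (1 << 32) - 1  # largest value a 32-bit hardware counter can hold


class VirtualCounter:
    """Hypothetical helper: extends a wrapping 32-bit counter to an unbounded total."""

    def __init__(self, initial_raw=0):
        self.last_raw = initial_raw & MASK32  # last raw hardware reading
        self.total = 0                        # accumulated virtual count

    def update(self, raw):
        """Fold in a new raw reading (assumes at most one wrap since the last read)."""
        raw &= MASK32
        # The modular difference is correct even if the counter wrapped past zero.
        delta = (raw - self.last_raw) & MASK32
        self.total += delta
        self.last_raw = raw
        return self.total


vc = VirtualCounter(initial_raw=0xFFFFFFF0)
vc.update(0xFFFFFFFF)          # counter advanced without wrapping
print(vc.update(0x00000010))   # counter wrapped past zero between reads
```

The accumulated total lives in software, so it is not limited to 32 bits. The only requirement is that reads happen more often than the counter can wrap.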
Research – Overall goals
  • Network hardware counters are new
  • Use network counters to determine traffic between tasks
  • Try to optimize placement of tasks to minimize communication latency
  • Given traffic counts and distances between nodes: cost = counts * distance. Minimize the total over all nodes
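
The cost function above can be sketched as follows. The slides do not define "distance"; this sketch assumes it is the hop count on the 3D torus, with wraparound in each dimension. The names `torus_distance` and `layout_cost` and the dictionary formats for traffic and placement are hypothetical.

```python
DIMS = (64, 32, 32)  # torus shape from the Blue Gene L intro slide


def torus_distance(a, b, dims=DIMS):
    """Hop count between coordinates a and b, taking the shorter way around each ring."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def layout_cost(traffic, placement, dims=DIMS):
    """Sum of count * distance over all communicating task pairs.

    traffic:   {(task_i, task_j): message count}   (assumed format)
    placement: {task: (x, y, z) node coordinate}   (assumed format)
    """
    return sum(
        count * torus_distance(placement[i], placement[j], dims)
        for (i, j), count in traffic.items()
    )


traffic = {(0, 1): 100, (1, 2): 10}
placement = {0: (0, 0, 0), 1: (63, 0, 0), 2: (5, 0, 0)}
print(layout_cost(traffic, placement))  # tasks 0 and 1 are neighbours via wraparound
```

Because of the wraparound links, nodes at x = 0 and x = 63 are one hop apart, so a placement that looks distant by coordinates can still be cheap.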
Research – Counting
  • First goal: determine what is being counted
Research – Networks
  • For each MPI call – determine which network counters are being used
    • Tree is supposed to be for broadcasts
    • Torus is supposed to be for point to point communication
  • Ambiguities in the specification
Research – Future decisions
  • How to profile a target application
    • Manually insert PAPI instrumentation: a lot of work
    • Instrument binaries with counting code
  • What information to store
    • All counts on each node: a lot of data
    • Sample of all nodes: not as accurate (what if the tasks behave / communicate differently?)
Research – Future decisions
  • How to use collected information
    • Profile an application to obtain counter feedback to determine optimized static task layout
    • Dynamically migrate tasks in response to counters
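
As an illustration of the static-layout option, the sketch below takes profiled traffic counts and improves a placement by greedy pairwise swaps. The slides do not name an optimization method, so this local search is a swapped-in technique chosen for brevity. It can stop in a local minimum. The function names, the data formats, and the hop-count distance are all assumptions.

```python
def torus_distance(a, b, dims):
    """Hop count on a torus, taking the shorter way around each dimension."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def layout_cost(traffic, placement, dims):
    """Total of count * distance over all communicating task pairs."""
    return sum(
        count * torus_distance(placement[i], placement[j], dims)
        for (i, j), count in traffic.items()
    )


def improve_layout(traffic, placement, dims):
    """Greedy local search: keep any pairwise swap of tasks that lowers the cost."""
    placement = dict(placement)  # work on a copy
    best = layout_cost(traffic, placement, dims)
    tasks = list(placement)
    improved = True
    while improved:
        improved = False
        for i in range(len(tasks)):
            for j in range(i + 1, len(tasks)):
                a, b = tasks[i], tasks[j]
                placement[a], placement[b] = placement[b], placement[a]
                cost = layout_cost(traffic, placement, dims)
                if cost < best:
                    best, improved = cost, True
                else:
                    # Swap did not help: undo it.
                    placement[a], placement[b] = placement[b], placement[a]
    return placement, best


dims = (8, 1, 1)  # tiny ring-shaped torus, for illustration only
traffic = {(0, 1): 10, (2, 3): 10}
start = {0: (0, 0, 0), 1: (4, 0, 0), 2: (1, 0, 0), 3: (5, 0, 0)}
layout, cost = improve_layout(traffic, start, dims)
print(layout_cost(traffic, start, dims), "->", cost)
```

Each accepted swap strictly lowers an integer cost, so the search always terminates. On the full 64 x 32 x 32 machine, an exhaustive pairwise sweep like this would be far too slow and would need a cheaper incremental cost update.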