Using Multiple Energy Gears in MPI Programs on a Power-Scalable Cluster

Vincent W. Freeh, David K. Lowenthal, Feng Pan, and Nandani Kappiah

Presented by: Huaxia Xia

CSAG, CSE of UCSD


Introduction

  • Power-aware Computing

    • HPC Uses Large-scale Systems with High Power Consumption

    • Two extremes:

      • Performance-at-all-costs

      • Low-performance but more energy efficient

  • This paper aims to save energy with little performance penalty


Related Work

  • Server/Desktop Systems

    • Minimize the number of servers needed to handle the load, and put the other servers into a low-energy state (standby or powered off)

    • Set node voltage independently

    • Disk:

      • Modulate the speed of disks dynamically

      • Improve cache policy

      • Aggregate disk accesses into bursts of requests

  • Mobile Systems

    • Energy-aware OS

    • Voltage-changeable CPU

    • Disk spindown

    • Memory

    • Network


Assumptions

  • HPC Applications

    • Performance is the Primary Concern

    • Highly Regular and Predictable

  • CPU has Multiple “Gears”

    • Variable Frequency

    • Variable Voltage

  • CPU is a Major Power Consumer

    • Energy consumption of disks/memory/network is not considered


Methodology: Profile-Directed

  • Get Program Trace

  • Divide the Program into Blocks

  • Merge the Blocks into Phases

  • Search Heuristically for the Best Gear for Each Phase


Divide Code into “Blocks”

  • Rule 1: Any MPI operation demarcates a block boundary.

  • Rule 2: If the memory pressure changes abruptly, a block boundary occurs at this change.

    • Use operations per miss (OPM) as a measure of the memory pressure
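
The two rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Sample` record format, the `split_into_blocks` name, and the `jump_ratio` test for an "abrupt" change are all hypothetical, since the slides do not say how abruptness is quantified.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One trace record (hypothetical format) covering a short interval."""
    ops: int              # operations executed in the interval
    misses: int           # L2 cache misses in the interval
    is_mpi: bool = False  # True if this record is an MPI operation


def opm(sample):
    """Operations per miss; treat zero misses as one to avoid dividing by zero."""
    return sample.ops / max(sample.misses, 1)


def split_into_blocks(trace, jump_ratio=2.0):
    """Split a trace into blocks (lists of non-MPI samples).

    Rule 1: an MPI operation closes the current block.
    Rule 2: if OPM changes by more than a factor of jump_ratio between
    consecutive samples (assumed meaning of "abruptly"), start a new block.
    """
    blocks, current, prev = [], [], None
    for s in trace:
        if s.is_mpi:
            if current:
                blocks.append(current)
            current, prev = [], None
            continue
        cur = opm(s)
        if prev is not None and max(cur, prev) > jump_ratio * min(cur, prev):
            blocks.append(current)
            current = []
        current.append(s)
        prev = cur
    if current:
        blocks.append(current)
    return blocks
```

With a factor-of-two threshold, a drop from roughly 100 operations per miss to 2 would open a new block, while a drift from 100 to 110 would not.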


Merge “Blocks” into “Phases”

  • Two adjacent blocks are merged into a phase if their memory pressures (OPM) fall within the same threshold

  • [Figure: OPM in a trace of LU (Class C)]
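
A minimal sketch of the merging step, assuming each block has already been summarized by a single OPM value. The relative-difference test and the default `threshold` of 0.25 are assumptions made for illustration; the slides do not give the actual threshold.

```python
def merge_into_phases(block_opms, threshold=0.25):
    """Group adjacent blocks with similar memory pressure into phases.

    block_opms: one OPM value per block, in program order.
    Two neighbouring blocks are merged when their OPM values differ by at
    most `threshold` relative to the larger of the two (assumed criterion).
    Returns a list of phases, each a list of block indices.
    """
    phases = []
    for i, value in enumerate(block_opms):
        if phases:
            last = block_opms[phases[-1][-1]]  # OPM of the previous block
            if abs(value - last) <= threshold * max(value, last):
                phases[-1].append(i)
                continue
        phases.append([i])
    return phases
```

For example, blocks with OPM values 100, 110, 5, 6, 90 would be grouped as three phases: the first two blocks, the next two, and the last one on its own.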


Data Collection

  • Use MPI-jack

    • Intercept any MPI call transparently

    • Can execute arbitrary code before/after an intercepted call

    • Insert pseudo MPI calls at non-MPI phase boundaries

    • Collect time, operation counts, and L2 misses

  • Question: Mutual Dependence?

    • Trace data ↔ Block boundaries
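
The interception idea can be illustrated with a plain Python wrapper. This is not MPI-jack's interface, which the slides do not show; `intercept`, `fake_send`, and `phase_marker` are hypothetical stand-ins, and only wall-clock time is recorded here (operation and L2-miss counts would need hardware counters).

```python
import time
from functools import wraps

trace = []  # collected records: (call name, elapsed seconds)


def intercept(pre=None, post=None):
    """Wrap a function so optional hooks run before and after every call."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if pre:
                pre(fn.__name__)
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Record the call even if it raised, then run the post hook.
                trace.append((fn.__name__, time.perf_counter() - start))
                if post:
                    post(fn.__name__)
        return wrapper
    return decorator


@intercept()
def fake_send(buf):
    """Stand-in for an MPI call (hypothetical)."""
    return len(buf)


@intercept()
def phase_marker():
    """Pseudo call inserted at a phase boundary that has no MPI call."""
    pass
```

Calling `fake_send(b"abc")` followed by `phase_marker()` leaves two records in `trace`, one per intercepted call, in call order.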


Solution Search (1)

  • Metrics: Energy-Time Tradeoff

    • Normalized energy and time

    • Total system energy

    • Slope of the energy-time curve: a large negative value means a near-vertical slope, i.e. a significant energy saving for little extra time

    • Question: How to measure energy consumption accurately?
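
The metric above can be made concrete with a small sketch. It assumes one (time, energy) measurement per gear, listed from the fastest gear to the slowest, and normalizes against the fastest gear; the function names are hypothetical, and any numbers fed in for experimentation are illustrative rather than taken from the paper.

```python
def normalize(measurements):
    """Scale (time, energy) pairs relative to the first (fastest) gear."""
    t0, e0 = measurements[0]
    return [(t / t0, e / e0) for t, e in measurements]


def slopes(points):
    """Slope (change in energy over change in time) between successive gears.

    A large negative slope means the slower gear saves a lot of energy
    for a small increase in time.
    """
    result = []
    for (t1, e1), (t2, e2) in zip(points, points[1:]):
        dt, de = t2 - t1, e2 - e1
        if dt == 0:
            # Vertical segment: energy changed with no change in time.
            result.append(0.0 if de == 0 else float("-inf") if de < 0 else float("inf"))
        else:
            result.append(de / dt)
    return result
```

For instance, going from normalized (1.00, 1.00) to (1.02, 0.90) gives a slope of about -5, while going on to (1.10, 0.88) gives only about -0.25, so the first gear change is the better trade.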


Solution Search (2)

  • Phase Prioritization

    • Sort the phases in order of OPM (low → high)

    • Question: why is sorting necessary?

  • “Novel” Heuristic Search

    • Find the locally optimal gear for each phase, one phase at a time

    • Running time is at most n×g, for n phases and g gears
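
One way to read the search is as a greedy loop over phases sorted by OPM. The sketch below is an assumption-laden illustration, not the authors' exact algorithm: `cost` is a hypothetical callback (for example, running the program with a given per-phase gear assignment and scoring its energy and time), and the rule "stop at the first gear that does not improve the cost" is an assumed definition of the local optimum.

```python
def greedy_gear_search(phase_opms, num_gears, cost):
    """Pick a gear per phase, one phase at a time.

    phase_opms: OPM value of each phase.
    num_gears:  gears are numbered 0 (fastest) to num_gears - 1 (slowest).
    cost:       callable(assignment) -> float, lower is better (hypothetical).
    Returns (assignment, number of cost evaluations).
    """
    n = len(phase_opms)
    assignment = [0] * n          # start with every phase in the fastest gear
    best = cost(assignment)
    evaluations = 1
    # Visit phases from low OPM (memory-bound) to high OPM (CPU-bound).
    for p in sorted(range(n), key=lambda i: phase_opms[i]):
        for gear in range(1, num_gears):
            trial = assignment.copy()
            trial[p] = gear
            c = cost(trial)
            evaluations += 1
            if c >= best:
                break             # a slower gear no longer helps this phase
            assignment, best = trial, c
    return assignment, evaluations
```

The loop makes at most 1 + n×(g−1) cost evaluations, which stays within the n×g bound on the slide whenever there is at least one phase.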


Solution Search (3)


Experiments

  • 10 AMD Athlon-64 CPUs

    • Frequency-scalable: 800-2000MHz

    • Voltage-scalable: 0.9-1.5V

    • 1GB main memory

    • 128KB L1 cache, 512KB L2 cache

  • 100Mb/s network

  • CPU Consumes 45-55% of Overall System Energy

  • Benchmarks: NAS Parallel Benchmarks (NPB)


Results: Multiple Gear Benefit

  • IS: 16% energy saving with 1% extra time

  • BT: 10% energy saving with 5% extra time

  • MG: 11% energy saving with 4% extra time


Results: Single Gear Benefit

  • CG: 8% energy saving with 3% extra time

  • SP: 15% energy saving with 7% extra time

The order of phases matters!


Results: No Benefit


Conclusions and Future Work

  • Use a Profile-directed Method to Achieve a Good Energy-Time Tradeoff for HPC Applications

  • Future work:

    • Enhance profile-directed techniques

    • Consider Inter-node bottlenecks

    • Automate the entire process


Discussion

  • How important is power consumption to HPC?

    • Is a 10% energy saving worth 5% extra time?

  • Is the profile-directed method practical?

    • Effective for applications that run repeatedly

    • How much of the process can be automated?

  • Is OPM (Operations Per Miss) a good metric to find phases?

    • Key Purpose: to identify CPU utilization

    • Other options: Instructions Per Second, CPU Usage

  • Is OPM a good metric to sort phases?