Using Multiple Energy Gears in MPI Programs on a Power-Scalable Cluster

Vincent W. Freeh, David K. Lowenthal, Feng Pan, and Nandani Kappiah

Presented by: Huaxia Xia

CSAG, CSE of UCSD

Introduction
  • Power-aware Computing
    • HPC Uses Large-scale Systems, Has High Power Consumption
    • Two extremes:
      • Performance-at-all-costs
      • Low-performance but more energy efficient
  • This paper aims to save energy with little performance penalty
Related Work
  • Server/Desktop Systems
    • Minimize the number of servers needed to handle the load, and set other servers into low-energy state (standby or power-off)
    • Set node voltage independently
    • Disk:
      • Modulate the speed of disks dynamically
      • Improve cache policy
      • Aggregate disk accesses to have burst requests
  • Mobile Systems
    • Energy-aware OS
    • Voltage-changeable CPU
    • Disk spindown
    • Memory
    • Network
Assumptions
  • HPC Applications
    • Performance is the Primary Concern
    • Highly Regular and Predictable
  • CPU has Multiple “Gears”
    • Variable Frequency
    • Variable Voltage
  • CPU is a Major Power Consumer
    • Energy consumption of disks/memory/network is not considered
Methodology: Profile-Directed
  • Get Program Trace
  • Divide the Program into Blocks
  • Merge the Blocks into Phases
  • Search the Best Gear for Each Phase Heuristically
Divide Codes into “Blocks”
  • Rule 1: Any MPI operation demarcates a block boundary.
  • Rule 2: If the memory pressure changes abruptly, a block boundary occurs at this change.
    • Use operations per miss (OPM) as a measure of the memory pressure
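Rule 2 can be sketched as follows. The sampling format, the 50% relative-change threshold, and the sample counts are illustrative assumptions, not the paper's values:

```python
# Hypothetical sketch: mark a block boundary wherever operations-per-miss
# (OPM) changes abruptly between consecutive trace intervals (Rule 2).

def opm(ops, l2_misses):
    """Operations per L2 miss; high OPM = CPU-bound, low OPM = memory-bound."""
    return ops / max(l2_misses, 1)

def block_boundaries(samples, rel_change=0.5):
    """samples: list of (ops, l2_misses) per trace interval.
    Returns the indices where a new block starts."""
    boundaries = [0]
    prev = opm(*samples[0])
    for i, s in enumerate(samples[1:], start=1):
        cur = opm(*s)
        # abrupt change in memory pressure -> block boundary
        if abs(cur - prev) / prev > rel_change:
            boundaries.append(i)
        prev = cur
    return boundaries

# OPM drops roughly 10x between intervals 1 and 2 -> boundary at index 2
samples = [(1000, 10), (980, 11), (400, 40), (410, 39)]
print(block_boundaries(samples))  # -> [0, 2]
```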
Merge “Blocks” into “Phases”
  • Two adjacent blocks are merged into a phase if their memory pressures (OPM values) differ by less than a threshold
  • Example: OPM trace of LU (Class C) [figure omitted]
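The merging rule might look like this in outline; the relative threshold and the per-block OPM values are illustrative assumptions:

```python
# Hypothetical sketch: adjacent blocks whose OPM values are within a
# relative threshold of each other are merged into one phase.

def merge_into_phases(block_opms, rel_threshold=0.2):
    """block_opms: OPM value per block, in execution order.
    Returns phases as lists of block indices."""
    phases = [[0]]
    for i in range(1, len(block_opms)):
        prev, cur = block_opms[i - 1], block_opms[i]
        # similar memory pressure -> extend the current phase
        if abs(cur - prev) / prev <= rel_threshold:
            phases[-1].append(i)
        else:
            phases.append([i])
    return phases

print(merge_into_phases([100, 95, 12, 11, 90]))  # -> [[0, 1], [2, 3], [4]]
```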
Data Collection
  • Use MPI-jack
    • Intercept any MPI call transparently
    • Can execute arbitrary codes before/after an intercepted call
    • Insert pseudo MPI calls at non-MPI phase boundaries
    • Collect time, operation counts, and L2 cache misses
  • Question: mutual dependence?
    • Trace data ↔ block boundaries (boundaries are derived from the trace, yet tracing is instrumented at boundaries)
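MPI-jack does its interception in C; as a rough analogue of "execute arbitrary code before/after an intercepted call," a wrapper that times a call and logs a record can be sketched in Python (all names below are hypothetical, not MPI-jack's API):

```python
import functools
import time

# Minimal sketch of interception with pre/post hooks: wrap a call so that
# timing data is recorded around it, transparently to the caller.

trace = []  # collected (call_name, elapsed_seconds) records

def intercept(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()   # pre-hook: could also read HW counters here
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        trace.append((fn.__name__, elapsed))  # post-hook: log the record
        return result
    return wrapper

@intercept
def mpi_allreduce(values):
    # stand-in for a real MPI_Allreduce
    return sum(values)

print(mpi_allreduce([1, 2, 3]))  # -> 6; trace now holds one timed record
```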
Solution Search (1)
  • Metrics: Energy-Time Tradeoff
    • Normalized energy and time
    • Total system energy
    • A large negative slope (normalized energy change over normalized time change) indicates a near-vertical tradeoff: a significant energy saving for little extra time
    • Question: How to measure energy consumption accurately?
Solution Search (2)
  • Phase Prioritization
    • Sort the phases in the order of OPM (low → high)
    • Question: why is sorting necessary?
  • “Novel” Heuristic Search
    • Find the local optimal gear for each phase one by one
    • Running time is at most n × g trials (n phases, g gears)
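A sketch of such a greedy search, visiting phases in sorted order and probing gears from fastest to slowest. The acceptance rule (stop once slowdown exceeds a bound or energy stops improving) and the toy time/energy tables are assumptions for illustration, not the paper's exact procedure or measurements:

```python
# Hypothetical sketch of a greedy per-phase gear search: for each phase,
# keep lowering the gear while the slowdown stays acceptable and energy
# keeps dropping; stop at the first gear that fails, so the total number
# of timed trials is at most n phases x g gears.

def pick_gears(phases, gears, time_of, energy_of, max_slowdown=1.05):
    """phases: ids sorted by OPM (low -> high); gears: fastest -> slowest.
    time_of/energy_of: (phase, gear) -> measured value."""
    choice = {}
    for p in phases:
        best = gears[0]                  # start at the fastest gear
        base_t = time_of(p, gears[0])
        for g in gears[1:]:
            slowdown = time_of(p, g) / base_t
            saves = energy_of(p, g) < energy_of(p, best)
            if slowdown <= max_slowdown and saves:
                best = g                 # slower gear still acceptable
            else:
                break                    # local optimum found for this phase
        choice[p] = best
    return choice

# toy model: phase 0 is memory-bound (barely slows at a lower gear),
# phase 1 is CPU-bound (slows sharply). Gears given in GHz.
T = {(0, 2.0): 1.00, (0, 1.4): 1.02, (0, 0.8): 1.20,
     (1, 2.0): 1.00, (1, 1.4): 1.15, (1, 0.8): 1.60}
E = {(0, 2.0): 1.00, (0, 1.4): 0.80, (0, 0.8): 0.70,
     (1, 2.0): 1.00, (1, 1.4): 0.90, (1, 0.8): 0.85}
print(pick_gears([0, 1], [2.0, 1.4, 0.8],
                 lambda p, g: T[(p, g)],
                 lambda p, g: E[(p, g)]))  # -> {0: 1.4, 1: 2.0}
```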
Experiments
  • 10 AMD Athlon-64 CPUs
    • Frequency-scalable: 800-2000MHz
    • Voltage-scalable: 0.9-1.5V
    • 1GB main memory
    • 128KB L1 cache, 512KB L2 cache
  • 100Mb/s network
  • CPU Consumes 45-55% of Overall System Energy
  • Benchmarks: NAS Parallel Benchmarks (NPB)
Results: Multiple Gear Benefit
  • IS: 16% energy saving with 1% extra time
  • BT: 10% energy saving with 5% extra time
  • MG: 11% energy saving with 4% extra time
Results: Single Gear Benefit
  • CG: 8% energy saving with 3% extra time
  • SP: 15% energy saving with 7% extra time

The order of phases matters!

Conclusions and Future Work
  • Use Profile-directed Method to Achieve Good Energy-Time Tradeoff for HPC Applications
  • Future work:
    • Enhance profile-directed techniques
    • Consider Inter-node bottlenecks
    • Automate the entire process
Discussion
  • How important is power consumption to HPC?
    • Is a 10% energy saving worth 5% extra time?
  • Is Profile-directed method practical?
    • Effective for applications that run repeatedly
    • How much of the process can be automated?
  • Is OPM (Operations Per Miss) a good metric to find phases?
    • Key Purpose: to identify CPU utilization
    • Other options: Instructions Per Second, CPU Usage
  • Is OPM a good metric to sort phases?