Using Multiple Energy Gears in MPI Programs on a Power-Scalable Cluster
Presentation Transcript



Using Multiple Energy Gears in MPI Programs on a Power-Scalable Cluster

Vincent W. Freeh, David K. Lowenthal, Feng Pan, and Nandani Kappiah

Presented by: Huaxia Xia

CSAG, CSE of UCSD



Introduction

  • Power-aware Computing

    • HPC Uses Large-Scale Systems with High Power Consumption

    • Two extremes:

      • Performance-at-all-costs

      • Lower performance but more energy-efficient

  • This paper aims to save energy with little performance penalty



Related Work

  • Server/Desktop Systems

    • Minimize the number of servers needed to handle the load, and set other servers into low-energy state (standby or power-off)

    • Set node voltage independently

    • Disk:

      • Modulate the speed of disks dynamically

      • Improve cache policy

      • Aggregate disk accesses to have burst requests

  • Mobile Systems

    • Energy-aware OS

    • Voltage-changeable CPU

    • Disk spindown

    • Memory

    • Network



Assumptions

  • HPC Applications

    • Performance is the Primary Concern

    • Highly Regular and Predictable

  • CPU has Multiple “Gears”

    • Variable Frequency

    • Variable Voltage

  • CPU is a Major Power Consumer

    • Energy consumption of disks/memory/network is not considered



Methodology: Profile-Directed

  • Get Program Trace

  • Divide the Program into Blocks

  • Merge the Blocks into Phases

  • Search the Best Gear for Each Phase Heuristically



Divide Codes into “Blocks”

  • Rule 1: Any MPI operation demarcates a block boundary.

  • Rule 2: If the memory pressure changes abruptly, a block boundary occurs at this change.

    • Use operations per miss (OPM) as a measure of the memory pressure
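The two rules above can be sketched as follows. This is a minimal illustration only, assuming per-interval counts of retired operations and L2 cache misses are available (e.g. from hardware counters); the change-detection ratio is an invented value, not one taken from the paper.

```python
def opm(operations, l2_misses):
    """Operations per miss (OPM): the paper's proxy for memory pressure."""
    return operations / max(l2_misses, 1)

def block_boundaries(samples, ratio=2.0):
    """Mark a boundary wherever OPM changes abruptly between adjacent
    intervals (Rule 2). `samples` is a list of (operations, l2_misses)
    tuples; `ratio` is an illustrative threshold, not the paper's value.
    MPI operations (Rule 1) would add further boundaries on top of these."""
    boundaries = []
    prev = opm(*samples[0])
    for i, (ops, misses) in enumerate(samples[1:], start=1):
        cur = opm(ops, misses)
        if cur > prev * ratio or cur < prev / ratio:
            boundaries.append(i)  # abrupt change in memory pressure
        prev = cur
    return boundaries
```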



Merge “Blocks” into “Phases”

  • Two adjacent blocks are merged into a phase if their memory pressure (OPM) differs by less than a threshold

  • OPM in Trace of LU (Class C) (chart not preserved in the transcript)
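The merge step can be sketched as below, assuming each block has already been summarized by its OPM value; the 25% relative threshold is illustrative, not the paper's number.

```python
def merge_into_phases(block_opms, threshold=0.25):
    """Merge adjacent blocks whose OPM differs by less than `threshold`
    (relative to the previous block). Returns a list of phases, each a
    list of block indices. The threshold value is an assumption."""
    phases = [[0]]
    for i in range(1, len(block_opms)):
        prev, cur = block_opms[i - 1], block_opms[i]
        if abs(cur - prev) <= threshold * prev:
            phases[-1].append(i)   # similar memory pressure: same phase
        else:
            phases.append([i])     # pressure shifted: start a new phase
    return phases
```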



Data Collection

  • Use MPI-jack

    • Intercept any MPI call transparently

    • Can execute arbitrary code before/after an intercepted call

    • Insert pseudo MPI calls at non-MPI phase boundaries

    • Collect timing, operation counts, and L2 miss counts

  • Question: Mutual Dependence?

    • Trace data ↔ Block boundaries
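MPI-jack itself intercepts MPI calls transparently at the library level; as a conceptual analogue only, the before/after hook pattern can be shown with a Python wrapper. All names here are illustrative, not part of the actual tool.

```python
import time

def jack(fn, log):
    """Conceptual analogue of MPI-jack: wrap a call so arbitrary code runs
    before and after it, here recording the call name and elapsed time."""
    def wrapped(*args, **kwargs):
        start = time.perf_counter()      # "before" hook: start timing
        result = fn(*args, **kwargs)     # the intercepted call itself
        log.append((fn.__name__, time.perf_counter() - start))  # "after" hook
        return result
    return wrapped
```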



Solution Search (1)

  • Metrics: Energy-Time Tradeoff

    • Normalized energy and time

    • Total system energy

    • A large negative slope (near-vertical) indicates significant energy savings for little added time

    • Question: How to measure energy consumption accurately?
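The tradeoff metric can be sketched as the slope of the normalized energy-time curve relative to the full-speed run. The function below only illustrates that definition; it is not code from the paper.

```python
def tradeoff_slope(e_base, t_base, e_gear, t_gear):
    """Slope between the full-speed run (e_base, t_base) and a slower gear
    (e_gear, t_gear), both normalized to the full-speed run. A large
    negative value means a near-vertical drop: much energy saved per unit
    of added time."""
    d_energy = e_gear / e_base - 1.0   # normalized change in total energy
    d_time = t_gear / t_base - 1.0     # normalized change in running time
    return d_energy / d_time if d_time else float("-inf")
```

For example, 16% less energy at 1% more time (the IS result later in the talk) gives a slope of −16.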



Solution Search (2)

  • Phase Prioritization

    • Sort the phases in the order of OPM (low → high)

    • Question: why is sorting necessary?

  • “Novel” Heuristic Search

    • Find the local optimal gear for each phase one by one

    • Search cost is at most n × g trial runs (n phases, g gears)
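A sketch of the heuristic: phases (pre-sorted by OPM) are fixed one at a time, trying successively lower gears until total energy stops improving, so the search makes at most n × g trial runs. The `measure` callback is a stand-in for a full benchmark execution under a given gear assignment; it is an assumption of this sketch, not an interface from the paper.

```python
def greedy_gear_search(num_phases, gears, measure):
    """Find a locally optimal gear per phase, one phase at a time.
    `gears` is ordered fastest-first; `measure(assignment)` returns the
    total energy of running with that per-phase gear assignment."""
    assignment = [gears[0]] * num_phases    # start every phase at top gear
    best = measure(assignment)
    for phase in range(num_phases):         # phases pre-sorted by OPM
        for gear in gears[1:]:              # try slower gears in order
            trial = list(assignment)
            trial[phase] = gear
            energy = measure(trial)
            if energy < best:
                best, assignment = energy, trial
            else:
                break                       # local optimum for this phase
    return assignment, best
```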



Solution Search (3)



Experiments

  • Cluster of 10 AMD Athlon-64 nodes

    • Frequency-scalable: 800-2000MHz

    • Voltage-scalable: 0.9-1.5V

    • 1GB main memory

    • 128KB L1 cache, 512KB L2 cache

  • 100Mb/s network

  • CPU Consumes 45-55% of Overall System Energy

  • Benchmarks: NAS Parallel Benchmarks (NPB)



Results: Multiple Gear Benefit

  • IS: 16% energy saving with 1% extra time

  • BT: 10% energy saving with 5% extra time

  • MG: 11% energy saving with 4% extra time



Results: Single Gear Benefit

  • CG: 8% energy saving with 3% extra time

  • SP: 15% energy saving with 7% extra time

The order of phases matters!



Results: No Benefit



Conclusions and Future Work

  • Use Profile-directed Method to Achieve Good Energy-Time Tradeoff for HPC Applications

  • Future work:

    • Enhance profile-directed techniques

    • Consider Inter-node bottlenecks

    • Automate the entire process



Discussion

  • How important is power consumption to HPC?

    • Is saving 10% energy worth 5% more time?

  • Is Profile-directed method practical?

    • Effective for applications that run repeatedly

    • How much of the process can be automated?

  • Is OPM (Operations Per Miss) a good metric to find phases?

    • Key Purpose: to identify CPU utilization

    • Other options: Instructions Per Second, CPU Usage

  • Is OPM a good metric to sort phases?

