Accurate Power and Energy Measurement on Kepler-based Tesla GPUs

Martin Burtscher

Department of Computer Science

- GPU-based accelerators
- Quickly spreading in PCs and even handheld devices
- Widely used in high-performance computing

- Power and energy efficiency
- Heat dissipation is a problem
- Electric bill and battery life are of growing concern
- Exascale requires 50x boost in performance per watt

- Important research area
- Need to develop techniques to reduce power and energy
- Have to be able to measure power/energy of programs

- Hardware
- High-end compute GPUs include power sensors
- For example, K20/K40 Tesla cards have built-in sensor
- These cards are the target of this talk

- Software
- Can query sensor with NVIDIA Management Library
- http://developer.nvidia.com/nvidia-management-library-nvml

- Power sensor data behaves strangely
- Running the same kernel twice yields different energy
- First launch: 114 J, second launch: 147 J (29% more energy)

- Running a kernel 2x as long more than doubles energy
- 1x input: 732 J, 2x input: 1579 J (8% above doubling)

- Power sensor sampling rate varies greatly
- Sampling interval ranges from 0.266 ms to 130 ms (3760 Hz down to 7.7 Hz)

- Hardware
- Two K20c, two K20m, two K20X, and two K40m GPUs

- Measurement
- Query power and time in loop on “idle” CPU core
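A minimal sketch of such a measurement loop, assuming the sensor read is wrapped in a callable. The names `sample_power`, `read_power_watts`, and `fake_sensor` are hypothetical and not from the talk; on a real system the callable would wrap an NVML power query such as `nvmlDeviceGetPowerUsage`, which reports milliwatts.

```python
import time
from typing import Callable, List, Tuple


def sample_power(read_power_watts: Callable[[], float],
                 duration_s: float,
                 clock: Callable[[], float] = time.perf_counter
                 ) -> List[Tuple[float, float]]:
    """Poll the sensor in a tight loop, recording (elapsed_s, watts) pairs."""
    samples = []
    start = clock()
    while True:
        now = clock()
        if now - start >= duration_s:
            break
        # Store the time stamp with every reading so that uneven
        # sampling intervals can be accounted for later.
        samples.append((now - start, read_power_watts()))
    return samples


def fake_sensor() -> float:
    """Hypothetical stand-in for a real sensor read (constant 50 W)."""
    return 50.0


trace = sample_power(fake_sensor, duration_s=0.01)
```

Recording a time stamp with each reading, rather than assuming a fixed polling period, is what makes the later interval-length correction possible.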

- Test code
- Compute-intensive regular n-body kernel
- Constant computation rate of over 2 TFlops on a K20c
- No data dependences; vary n to adjust kernel runtime

Kernel starts executing

Kernel stops executing

GPU idle power

Measurement loop runtime

Macroscopic phenomena

3s

5s

4s

Switch to step shape

Power ramps up slowly

Power ramps down slowly

Idle power reached

Unclear how big energy is

Missing energy?

Delayed energy?

Integrate to where?

Ramp down doesn’t follow

2nd run starts higher but also follows curve

Short run same as longer run

Driver lowers power level

Shape depends on power at t2

Shape always the same

Steps down every second

Power increases after kernel done

Driver activity can prevent sampling

Very long interval

Wide range of intervals

Short intervals

Sampled power only ever changes after long interval

Identical values

Very long interval

Many short intervals

Correcting the Measurements

- Eliminate redundant samples
- Only sample once every 15 ms (66.7 Hz)
- Cannot accurately measure kernels under ~150 ms
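One way to illustrate the redundant-sample point is to thin an already recorded trace so that consecutive kept samples are at least 15 ms apart. This is only a sketch: `thin_samples` is a hypothetical helper, the trace values are made up, and the talk's recommendation is to sample at that rate in the first place rather than filter afterwards.

```python
def thin_samples(samples, min_interval_s=0.015):
    """Keep a sample only if it is at least min_interval_s after the last kept one."""
    kept = []
    for t, p in samples:
        if not kept or t - kept[-1][0] >= min_interval_s:
            kept.append((t, p))
    return kept


# Made-up trace: several readings arrive within one 15 ms sensor update.
raw = [(0.000, 50.0), (0.001, 50.0), (0.002, 50.0),
       (0.016, 60.0), (0.020, 60.0), (0.032, 70.0)]
thinned = thin_samples(raw)
```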

- Account for the variation in interval length
- Use high-resolution time stamps

- Example: energy from t1 to t4
- Dotted (fixed intervals): 1205 J
- Solid (variable intervals): 1066 J
- 13% discrepancy
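The gap between the two integration methods can be reproduced on a toy trace. The sketch below uses the trapezoidal rule, once with an assumed constant interval and once with the recorded time stamps; the function names and the sample values are illustrative assumptions, not the data behind the 1205 J and 1066 J figures.

```python
def energy_fixed(samples, nominal_dt_s):
    """Trapezoidal rule that assumes every interval has the same length."""
    total = 0.0
    for (_, p0), (_, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * nominal_dt_s
    return total


def energy_variable(samples):
    """Trapezoidal rule that uses the measured time stamps."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Made-up trace in (seconds, watts) with one long gap between t=2 and t=6.
trace = [(0.0, 100.0), (1.0, 100.0), (2.0, 100.0), (6.0, 50.0), (7.0, 50.0)]
fixed_j = energy_fixed(trace, nominal_dt_s=1.0)
variable_j = energy_variable(trace)
```

In this toy trace the fixed-interval version underestimates the energy because it ignores the long gap; on real data the error can go in either direction, depending on where the long intervals fall.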

- Sensor hardware
- Seems to asymptotically approach true power
- Reminiscent of capacitor charging

- True instant power
- P_true is a function of the power measured by the sensor, P_sensor, and the slope of the measured power profile, dP_sensor/dt:
$P_{\text{true}} = P_{\text{sensor}} + C \cdot \frac{dP_{\text{sensor}}}{dt}$

- “Capacitance” of sensor
- C ≈ 0.84 s on all tested K20 GPUs
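The correction can be sketched as follows, assuming the slope is approximated from neighboring samples. `correct_power` is a hypothetical helper, and the synthetic sensor trace (an exponential approach from 50 W to 150 W) is an assumption chosen to mimic the capacitor-like lag, not measured data.

```python
import math

C_SECONDS = 0.84  # sensor "capacitance" reported in the talk for the tested K20 GPUs


def correct_power(samples, c=C_SECONDS):
    """Apply P_true = P_sensor + C * dP_sensor/dt to (time_s, watts) samples.

    The slope at each sample is estimated from its neighbors (a central
    difference in the interior, a one-sided difference at the two ends).
    """
    corrected = []
    n = len(samples)
    for i, (t, p) in enumerate(samples):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        dt = samples[hi][0] - samples[lo][0]
        slope = (samples[hi][1] - samples[lo][1]) / dt if dt > 0 else 0.0
        corrected.append((t, p + c * slope))
    return corrected


# Synthetic lagging sensor: true power steps from 50 W to 150 W at t = 0,
# while the reading approaches 150 W exponentially with time constant C.
P_IDLE, P_ACTIVE = 50.0, 150.0
sensor = [(i * 0.015,
           P_IDLE + (P_ACTIVE - P_IDLE) * (1.0 - math.exp(-(i * 0.015) / C_SECONDS)))
          for i in range(200)]
corrected = correct_power(sensor)
```

On this synthetic trace the raw reading is still near 52 W after the first 15 ms, while the corrected interior samples sit very close to the 150 W step, which is the rectangular shape the slides describe.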

Minimized absolute errors to determine C

‘Capacitor’ function matches measured values perfectly

Wobbles due to sampling errors

‘Active idle’ power level

Corrected profile matches expected rectangular profile

Corrected power profile matches expected profile

Identical to original K20c

Similar profile but higher power level

Profile is good, no correction needed!

Huge 600 ms gap

K40m again requires correction

- Implementation of the Barnes-Hut n-body algorithm
- Taken from LonestarGPU benchmark suite
- Contains multiple regular and irregular kernels
- Highly optimized, but still suffers from load imbalance, divergence, and uncoalesced accesses
- Main kernel is ‘regularized’ (warp-based)

NASA/JPL-Caltech/SSC

Slow then fast drop-off

“Wave” in profile

Original profile is hard to interpret

Corrected profile reveals important info

Regularized main kernel

Two similar irreg. kernels

Decrease due to load imbal.

One more irreg. kernel

Very short regular kernel

- Output
- Corrected profile and corresponding ‘active’ energy

- Features
- Computes instant power using ‘capacitor’ formula
- Employs high-resolution time steps
- Samples at true frequency of 66.7 Hz

- Dissemination
- Open source, research license
- http://cs.txstate.edu/~burtscher/research/K20power/

- Tool will be part of Marcher system at Texas State
- NSF-funded green computing infrastructure

- Marcher is a power-measurable cluster system
- 832 general-purpose cores
- 12,000 GPU and MIC cores
- 1.2 TB of DDR3 with power throttling and scaling
- 50 TB of hybrid storage with hard drives and SSDs
- Component-level power measurement tools (e.g., CPU, DRAM, Disk, GPU, Xeon Phi)

- Correctly measuring K20/K40 power and energy
- Sample at 66.7 Hz and include time stamps
- Compute true power with presented formula
- Use neighboring power samples to approximate slope

- Compute true energy by integrating true power
- Over intervals where power is above ‘active idle’
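A minimal sketch of the integration step, under stated assumptions: `active_energy` is a hypothetical helper, the 30 W threshold is made up, and an interval is counted only when both of its end points exceed the threshold. The slides do not say how partially active intervals are treated or whether the idle level is subtracted, so this version integrates total corrected power over fully active intervals.

```python
def active_energy(samples, active_idle_w):
    """Integrate power (trapezoidal rule) over intervals above active idle.

    samples: (time_s, corrected_watts) pairs in time order.
    An interval contributes only if both end points exceed active_idle_w.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if p0 > active_idle_w and p1 > active_idle_w:
            total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Made-up corrected profile: idle, three active samples, idle again.
profile = [(0.0, 25.0), (1.0, 100.0), (2.0, 100.0), (3.0, 100.0), (4.0, 25.0)]
energy_j = active_energy(profile, active_idle_w=30.0)
```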

- K20Power tool
- Software tool that implements this methodology

- Paper at http://cs.txstate.edu/~burtscher/papers/gpgpu14.pdf

- Collaborators
- Ivan Zecena and Ziliang Zong

- U.S. National Science Foundation
- DUE-1141022, CNS-1217231, and CNS-1305359

- NVIDIA Corporation
- Grants and equipment donations

- Texas State University
- Research Enhancement Program
