Why do we need micro-benchmarks?

Systematic Energy Characterization of CMP/SMT Processor Systems via Automated Micro-Benchmarks
R. Bertran*+, A. Buyuktosunoglu*, M. Gupta*, M. Gonzalez+, P. Bose*
*IBM T.J. Watson Research Center
+Barcelona Supercomputing Center

Why do we need micro-benchmarks?
• What is the maximum power consumption?
• Any performance bug?
• Any reliability issues?
• …
Micro-benchmarks!
• Time consuming and tedious
• Error prone task
• Trial and error process
• Several micro-benchmarks are required
• Deep expertise limited to few designers
• Detailed knowledge of the underlying architecture is required
AUTOMATED SOLUTION NEEDED!

MicroProbe: a micro-benchmark generation framework

MicroProbe Workflow
[Figure: MicroProbe workflow. Labels recovered from the diagram:]
• Inputs: micro-benchmark generation policy (from the user), architecture definition files
• Example policies: endless loop for each instruction of the ISA; endless loop with 50% INT and 50% FP; max-power stressmark
• MicroProbe Framework
• Outputs: micro-benchmarks
• Other labels: external tools; real platforms, simulators, models

MicroProbe: Distinguishing Features

MicroProbe Usage and Design Overview
[Figure: MicroProbe usage and design overview. Labels recovered from the diagram:]
• Research idea, micro-benchmark generation policies (user-defined scripts), micro-benchmarks
• Example generation policies:
  • Loop stressing the floating point unit
  • Sequence of loads hitting 50% L1 and 50% L2
  • Generate a stressmark for each functional unit of the architecture
  • Search for the sequence of 2 loads and 2 integer operations with maximum IPC
• MicroProbe Framework (Python API):
  • Architecture module
  • Code generation module
  • Design space exploration module
• Framework components: ISA definitions, micro-architecture definitions, micro-architecture analytical models, properties, automatic bootstrap process, micro-benchmark synthesizer, passes, search drivers
• External tools

Max-power Stressmark Generation
Use MicroProbe to generate a max-power stressmark:
• Characterize energy per instruction (EPI) and IPC (Architecture Module)
• Select N instructions with max (IPC * EPI): mulldo, xvnmsubmdp, lxvw4x
• Form a basic endless loop (e.g. 4K) using the selected instructions (Code Generation Module)
  Loop: … mulldo mulldo lxvw4x lxvw4x xvnmsubmdp xvnmsubmdp …
• Generate micro-benchmarks with different orders of the selected N instructions
  Loop: … mulldo lxvw4x mulldo xvnmsubmdp lxvw4x xvnmsubmdp …
• Evaluate using the Design Space Exploration Module
• Pick the highest-power micro-benchmark
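The selection and ordering steps on this slide can be sketched in plain Python. This is only an illustration under stated assumptions, not the MicroProbe API: the function names, the characterization values, and the `toy_power` scoring function are all hypothetical stand-ins. On a real system the score would come from a power measurement of each generated loop.

```python
from itertools import permutations

# Hypothetical characterization data (illustrative values only, not
# measurements from the slides): instruction -> (IPC, EPI in nanojoules).
CHARACTERIZATION = {
    "mulldo": (1.0, 3.0),
    "xvnmsubmdp": (2.0, 2.5),
    "lxvw4x": (2.0, 2.0),
    "add": (4.0, 0.4),
    "nop": (4.0, 0.1),
}


def select_instructions(data, n):
    """Return the n instruction names with the largest IPC * EPI product."""
    ranked = sorted(data, key=lambda name: data[name][0] * data[name][1],
                    reverse=True)
    return ranked[:n]


def candidate_loops(selected, copies=2):
    """Yield each distinct ordering of `copies` instances per instruction."""
    pool = [name for name in selected for _ in range(copies)]
    # permutations() repeats orderings when the pool has duplicates,
    # so deduplicate through a set before yielding.
    yield from sorted(set(permutations(pool)))


def pick_highest_power(loops, measure_power):
    """Return the loop body for which measure_power reports the largest value."""
    return max(loops, key=measure_power)


def toy_power(loop):
    """Hypothetical stand-in for a measurement: counts adjacent pairs that differ."""
    return sum(a != b for a, b in zip(loop, loop[1:]))


if __name__ == "__main__":
    chosen = select_instructions(CHARACTERIZATION, 3)
    best = pick_highest_power(candidate_loops(chosen), toy_power)
    print(chosen)
    print(" ".join(best))
```

Exhaustive enumeration is only practical for small N. The number of orderings grows factorially, which is presumably why the slide delegates the evaluation to a design space exploration module with search drivers.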

Case Studies
MicroProbe: A Micro-benchmark Generation Framework

Experimental Methodology
• Platform:
  • Processor: POWER7 @ 3GHz
    • 8-core 4-way SMT
    • 32KB L1, 256KB L2 and 4MB L3 per core
  • Memory: 32 GB DDR3 SDRAM @ 800MHz
  • OS: RHEL 5.7 + Linux 3.0.1
• EnergyScale architecture
  • Power measurements in milliwatts
  • Sampling interval down to 1ms
  • In-house software collects power and performance counter traces [C. Lefurgy et al, IBM]

Case Study 1: EPI Characterization
• Large differences in EPI across instructions stressing different micro-architecture components
• Large differences in EPI across instructions stressing the same micro-architecture components at the same rate (IPC)
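The slides do not state how EPI is derived from the measurements. One common way to estimate it, shown here as an assumption rather than the authors' method, is to divide the dynamic power of an endless loop of a single instruction by the instruction throughput (IPC times clock frequency). The function name, the baseline-subtraction step, and the example values are hypothetical.

```python
def energy_per_instruction_nj(loop_power_mw, baseline_power_mw, ipc,
                              freq_hz=3.0e9):
    """Estimate energy per instruction in nanojoules.

    Assumed formula (not taken from the slides):
        EPI = (P_loop - P_baseline) / (IPC * f)
    Power is given in milliwatts, matching the measurement unit on the
    methodology slide; the default frequency matches the 3GHz platform.
    """
    if ipc <= 0 or freq_hz <= 0:
        raise ValueError("ipc and freq_hz must be positive")
    dynamic_power_w = (loop_power_mw - baseline_power_mw) / 1000.0
    instructions_per_second = ipc * freq_hz
    return dynamic_power_w / instructions_per_second * 1e9


if __name__ == "__main__":
    # Hypothetical readings: 16 W while running the loop, 10 W baseline, IPC of 2.
    print(energy_per_instruction_nj(16000.0, 10000.0, ipc=2.0))
```

Because the estimate divides by IPC, two instructions that draw the same power at different throughput end up with different EPI values, so both quantities need to be characterized together.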

Case Study 2: Max-power Stressmark Generation
Stressmark generation approaches compared:
• DAXPY: use a computationally intensive kernel
• Expert manual: use complex instructions accessing different functional units with high IPC
  • Selected instructions: mullw, xvmaddadp, lxvd2x
• Expert DSE: generate all possible combinations of complex instructions stressing different units
  • Loop: … mullw lxvd2x mullw xvmaddadp lxvd2x xvmaddadp …
  • Loop: … mullw lxvd2x mullw xvmaddadp xvmaddadp lxvd2x …
  • Loop: … mullw mullw xvmaddadp xvmaddadp lxvd2x lxvd2x …
• MicroProbe: use MicroProbe with the heuristic Max(EPI * IPC)
  • Selected instructions: mulldo, xvnmsubmdp, lxvw4x

Max-power Stressmark Generation

Case Study 3: Counter-based Processor Power Model
Bottom-up power modeling method:
1 Dynamic power
  • Dynamic power f(PMCs): functional-unit micro-benchmarks, CMP1–SMT1
  • Intercept SMT1: random micro-benchmarks, CMP1–SMT1
2 SMT and CMP effects
  • SMT effect (intercept SMT2-4): random micro-benchmarks, CMP1–SMT2/4
  • CMP effect: random micro-benchmarks, CMP1/8–SMT2/4, linear regression f(CMP)
3 Uncore power
Model components:
  • Dynamic power: f(PMCs)
  • Uncore power
  • SMT effect: SMT enabled
  • CMP effect: # cores
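The CMP-effect step above fits a linear regression against the number of active cores. A minimal sketch of that one step, using the closed-form least-squares solution for a single predictor, is shown below. The data points are synthetic and the function name is hypothetical; the slides do not give the fitted coefficients or describe the tooling used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Synthetic example: power (in watts) left unexplained by the per-core
    # dynamic model, observed with 1, 2, 4 and 8 active cores.
    active_cores = [1, 2, 4, 8]
    residual_power_w = [2.0, 3.5, 6.5, 12.5]
    slope, intercept = fit_line(active_cores, residual_power_w)
    print(f"CMP effect ~ {slope:.2f} W per core + {intercept:.2f} W")
```

The same least-squares idea extends to the counter-based dynamic power term, where each performance counter rate becomes one predictor; that multi-variable fit would normally be delegated to a numerical library.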

Counter-based Processor Power Model Validation
• Within acceptable error margins: < 4% on average

Counter-based Processor Power Model Validation on Corner Cases
• Models trained using non-micro-architecture aware training sets show high errors and variability
• Models trained using the micro-architecture aware training set show acceptable error margins: < 5% on average

Conclusions
• MicroProbe is a productive micro-benchmark generation framework
  • Adaptive and flexible
  • Includes micro-architecture semantics
  • Integrates design space exploration
• Presented three case studies:
  • Instruction-based EPI characterization
  • Automated max-power stressmark generation
  • CMP/SMT-aware bottom-up counter-based processor power model

QUESTIONS?
