Automated microprocessor stressmark generation
Automated Microprocessor Stressmark Generation. Ajay M. Joshi* Lieven Eeckhout** Lizy K. John* Ciji Isen* *The University of Texas at Austin **Ghent University, Belgium HPCA 2008, Feb 19, Salt Lake City, UT. Energy, power, power density, temperature, voltage variation, ….

Automated Microprocessor Stressmark Generation

Ajay M. Joshi*

Lieven Eeckhout**

Lizy K. John*

Ciji Isen*

*The University of Texas at Austin

**Ghent University, Belgium

HPCA 2008, Feb 19, Salt Lake City, UT


Energy, power, power density, temperature, voltage variation, …

  • First-class design constraints

    • Embedded processors

    • High-performance processors

  • Understanding and analysis of primary importance

    • Average: typical

    • Maximum: worst-case


Why care about worst-case?

  • Processor must operate properly under extreme conditions

  • Examples

    • Max power → power supply, DPM

    • Max temperature → thermal package, DTM

    • Max dI/dt → power delivery

    • Localized max power → hot spots → circuit failure, timing errors, etc.

    • Max temperature differentials → sensor placement


How to characterize worst-case?

  • Stressmarks

    • Hand-coded synthetic stress codes

  • Examples

    • Max power: Alpha’s Toast

    • Max dI/dt: Alpha’s Thumper

  • Limitations

    • Time-consuming to develop

    • Requires intimate understanding of system

    • Tied to a specific processor

      • Difficult to do in early design stages


A possible solution

  • Automatic stressmark generation

  • In two steps

    • BenchMaker

      • Generate synthetic benchmark from abstract workload model

    • StressMaker

      • Explore workload space by ‘turning knobs’ using BenchMaker and search for stressmarks


Outline

  • BenchMaker

    • Description

    • Evaluation

  • StressMaker

    • Description

    • Evaluation through case studies

      • Max-power, max single-cycle power, dI/dt

  • Related work

  • Conclusion and future work


BenchMaker

[Figure: BenchMaker flow. An abstract workload model (instruction mix, ILP, I & D footprint, D stream strides, branch transition, BB size) feeds a benchmark synthesizer, which produces a synthetic benchmark for hardware or a simulator.]


Instruction mix

  • Abstract workload model component: instruction mix

    • Fraction short int

    • Fraction long int

    • Fraction short fp

    • Fraction long fp

    • Fraction int loads

    • Fraction int stores

    • Fraction fp loads

    • Fraction fp stores
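As a hedged illustration of how the instruction-mix fractions could be measured, the Python sketch below counts dynamic instructions per class in a trace. The class identifiers and the function `instruction_mix` are hypothetical, and the sketch assumes the trace has already been filtered to the eight classes on the slide (branches are covered separately by the BB-size characteristics).

```python
from collections import Counter

# The eight classes on the slide; the identifiers are hypothetical.
CLASSES = ("short_int", "long_int", "short_fp", "long_fp",
           "int_load", "int_store", "fp_load", "fp_store")

def instruction_mix(trace):
    """Fraction of dynamic instructions in each class (trace: list of class names)."""
    if not trace:
        raise ValueError("empty trace")
    counts = Counter(trace)
    unknown = set(counts) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown instruction classes: {sorted(unknown)}")
    return {c: counts[c] / len(trace) for c in CLASSES}

mix = instruction_mix(["short_int", "short_int", "int_load", "fp_store"])
print(mix["short_int"], mix["int_load"])  # 0.5 0.25
```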


ILP

  • Abstract workload model component: ILP

    • Probability distribution of the inter-operation dependency distance, over the bins 1, 2, 3-4, 5-6, 7-8, 9-16, 17-32, and > 32
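A minimal sketch of how such a dependency-distance distribution could be collected from a register-level trace, using the slide's eight bins. The trace format, the function names, and the choice to count every source operand against its most recent writer are our assumptions, not BenchMaker's documented definition.

```python
# Inclusive upper bounds of the slide's dependency-distance bins; anything larger is "> 32".
BIN_UPPER = (1, 2, 4, 6, 8, 16, 32)
BIN_LABELS = ("1", "2", "3-4", "5-6", "7-8", "9-16", "17-32", ">32")

def bin_index(distance):
    for i, upper in enumerate(BIN_UPPER):
        if distance <= upper:
            return i
    return len(BIN_UPPER)

def dependency_distance_distribution(trace):
    """trace: (dest_reg or None, [src_regs]) tuples in program order.

    Distance (our assumption) = number of dynamic instructions back to the most
    recent writer of a source register; every source operand is counted.
    """
    last_writer = {}
    counts = [0] * len(BIN_LABELS)
    for pos, (dest, srcs) in enumerate(trace):
        for reg in srcs:
            if reg in last_writer:
                counts[bin_index(pos - last_writer[reg])] += 1
        if dest is not None:
            last_writer[dest] = pos
    total = sum(counts)
    return {label: (n / total if total else 0.0) for label, n in zip(BIN_LABELS, counts)}

trace = [("r1", []), ("r2", ["r1"]), ("r3", ["r1", "r2"]), (None, ["r3"])]
dist = dependency_distance_distribution(trace)
print(dist)
```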


I & D stream behavior

  • Abstract workload model components: I & D footprint, D stream strides

    • Number of unique I & D addresses

    • Fraction of memory operations with a local stride (at 32-byte block level) of 0, 1, 2, …, 8, or greater than 8
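One possible way to measure these two characteristics from a memory trace, as a hedged Python sketch. We read "local stride" as the block distance between consecutive accesses by the same static memory instruction and ignore its sign; we count the footprint at byte-address granularity. Both readings, and the helper names, are assumptions.

```python
BLOCK_BYTES = 32  # strides are measured at 32-byte block level, as on the slide

def local_stride_distribution(mem_trace):
    """mem_trace: (static_pc, byte_address) pairs for loads/stores in program order.

    Local stride (our reading): block distance between consecutive accesses made
    by the same static memory instruction; the sign is ignored (an assumption).
    Returns fractions for strides 0, 1, ..., 8 and a final "> 8" bucket.
    """
    last_block = {}
    counts = [0] * 10
    for pc, addr in mem_trace:
        block = addr // BLOCK_BYTES
        if pc in last_block:
            counts[min(abs(block - last_block[pc]), 9)] += 1
        last_block[pc] = block
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def unique_addresses(mem_trace):
    # Footprint: number of distinct byte addresses touched (granularity assumed).
    return len({addr for _, addr in mem_trace})

trace = [(0x10, 0), (0x20, 0), (0x10, 32), (0x10, 64), (0x20, 1000), (0x10, 64)]
strides = local_stride_distribution(trace)
print(strides)
print(unique_addresses(trace))
```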


Branch behavior

  • Abstract workload model components: branch transition, BB size

    • Probability for a transition rate of 0%-10%, 10%-20%, etc.

    • Avg and stdev of the basic block size distribution
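A hedged Python sketch of the two branch characteristics. It assumes the transition rate of a static branch is its number of outcome changes divided by its number of executions, that each static branch counts once (no weighting by execution count), that the bins are ten 10%-wide intervals, and that "stdev" means the population standard deviation. None of these details is confirmed by the slide.

```python
import statistics
from collections import defaultdict

def transition_rate_distribution(branch_trace):
    """branch_trace: (static_pc, taken) pairs in program order.

    Assumptions (ours): transition rate = outcome changes / executions for each
    static branch, and every static branch counts once (no dynamic weighting).
    Returns the fraction of static branches in each 10%-wide bin.
    """
    outcomes = defaultdict(list)
    for pc, taken in branch_trace:
        outcomes[pc].append(taken)
    bins = [0] * 10
    for seq in outcomes.values():
        changes = sum(a != b for a, b in zip(seq, seq[1:]))
        bins[min(changes * 10 // len(seq), 9)] += 1  # integer math avoids float edge cases
    n = len(outcomes)
    return [b / n for b in bins] if n else [0.0] * 10

def bb_size_stats(sizes):
    # Population standard deviation is an assumption; the slide just says "stdev".
    return statistics.mean(sizes), statistics.pstdev(sizes)

trace = [("A", True), ("B", True), ("A", False), ("B", True),
         ("A", True), ("B", True), ("A", False), ("B", True)]
rates = transition_rate_distribution(trace)
print(rates)
print(bb_size_stats([4, 6]))
```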


Abstract workload model

  • Only 40 characteristics

    • A compact model is an explicit goal

    • In contrast to prior work

  • Microarchitecture-independent

  • Components: instruction mix, ILP, I & D footprint, D stream strides, branch transition, BB size
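The 40 characteristics can be tallied from the preceding slides under two assumptions of ours: the I & D footprint counts as two numbers, and the transition-rate distribution has ten 10%-wide bins. The Python dataclass below is a hypothetical container, not BenchMaker's actual format; it simply checks that 8 + 8 + 2 + 10 + 10 + 2 = 40.

```python
from dataclasses import dataclass, fields

@dataclass
class AbstractWorkloadModel:
    # Hypothetical field names; grouping follows the workload-model slides.
    instruction_mix: tuple      # 8 fractions (short/long int, short/long fp, int/fp loads/stores)
    dependency_distance: tuple  # 8 bin probabilities (1, 2, 3-4, ..., > 32)
    unique_i_addresses: int
    unique_d_addresses: int
    local_stride: tuple         # 10 fractions (0, 1, ..., 8, > 8)
    transition_rate: tuple      # 10 bin probabilities (0%-10%, ..., 90%-100%), assumed
    bb_size_mean: float
    bb_size_stdev: float

    def num_characteristics(self):
        # Tuples contribute one characteristic per entry, scalars contribute one.
        total = 0
        for f in fields(self):
            value = getattr(self, f.name)
            total += len(value) if isinstance(value, tuple) else 1
        return total

model = AbstractWorkloadModel(
    instruction_mix=(0.125,) * 8, dependency_distance=(0.125,) * 8,
    unique_i_addresses=1000, unique_d_addresses=4000,
    local_stride=(0.1,) * 10, transition_rate=(0.1,) * 10,
    bb_size_mean=5.0, bb_size_stdev=2.0)
print(model.num_characteristics())  # 40
```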


Synthetic benchmark generator

  • Program spine

  • Instruction types

  • Inter-operation dependencies

  • Stride assignment

  • Branch transition

  • Register assignment

  • Code generation

[Figure: example program spine of basic blocks, each ending in a branch: add sub br | add ld mul br | add ld sub ld st br]
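The first generator steps listed on this slide can be sketched, under heavy assumptions, in Python. The hypothetical `generate_spine` covers only the program spine, instruction types, and dependency distances. It draws basic-block sizes from a normal distribution with the model's BB-size mean and stdev, which is our assumption rather than a documented detail of BenchMaker, and it omits stride assignment, branch-transition control, register assignment, and code generation.

```python
import random

def generate_spine(mix, dep_dist, bb_mean, bb_stdev, num_blocks, seed=0):
    """Build a program spine: a list of basic blocks, each a list of
    (opcode, dependency_distance) pairs ending in a branch.

    Assumptions (ours): block sizes ~ Normal(bb_mean, bb_stdev), clamped to >= 2;
    opcodes drawn independently from `mix`; dependency distances drawn from
    `dep_dist`; register assignment and code generation are left out.
    """
    rng = random.Random(seed)
    ops, op_weights = zip(*mix.items())
    dists, dist_weights = zip(*dep_dist.items())
    spine = []
    for _ in range(num_blocks):
        size = max(2, round(rng.gauss(bb_mean, bb_stdev)))
        block = [(rng.choices(ops, op_weights)[0], rng.choices(dists, dist_weights)[0])
                 for _ in range(size - 1)]
        block.append(("br", None))  # every basic block ends in a branch
        spine.append(block)
    return spine

spine = generate_spine({"add": 0.5, "ld": 0.3, "st": 0.2},
                       {1: 0.5, 2: 0.3, 4: 0.2},
                       bb_mean=5, bb_stdev=1.5, num_blocks=3)
for block in spine:
    print(" ".join(op for op, _ in block))
```

Fixing `seed` makes the spine reproducible, which is convenient when a search loop regenerates benchmarks from the same model.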


Synthetic benchmark generator

  • Input: abstract workload model

  • Output: synthetic benchmark

    • C program with embedded assembly code

  • Benefit: the synthetic benchmark's behavior converges after only 10 million dynamic instructions


Experimental setup

  • sim-alpha: validated Alpha 21264 simulator

  • Wattch for power modeling

  • HotSpot for thermal modeling

  • SPEC CPU2000

    • 100M-instruction simulation points

  • Commercial workloads

    • SPECjbb2005, DBT2, DBMS


Synthetic clone benchmark preserves characteristics

[Figure: original benchmark vs. synthetic clone benchmark for vpr, gcc, mcf, gzip, dbt2, twolf, bzip2, crafty, dbms, vortex, perlbmk, and jbb2005; top panel: IPC (axis 0 to 2.0); bottom panel: EPI (axis 0 to 35)]


Outline

  • BenchMaker

    • Description

    • Evaluation

  • StressMaker

    • Description

    • Evaluation using case studies

      • Max-power, max single-cycle power, dI/dt

  • Related work

  • Conclusion and future work


StressMaker

BenchMaker

synthetic

benchmark

abstract workload

configuration

microprocessor

model

abstract workload

space exploration

stressmark

objective function: e.g., max power


Workload space exploration

  • Huge space

  • Heuristic search using genetic algorithm

    • Bio-inspired algorithm

    • Reduces the likelihood of getting stuck in local optima

    • Iterative algorithm

      • Start from randomly generated solutions

      • Probabilistically retain solutions with highest objective function value

      • Generate new solutions using crossover & mutation

    • End result: stressmark
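One way this search loop could look, as a hedged Python sketch. The slides name a genetic algorithm with crossover and mutation but not its operators, so the roulette-wheel selection, one-point crossover, per-knob re-draw mutation, and the [0, 1] knob encoding below are our assumptions. `objective` is a hypothetical stand-in for "generate a benchmark with BenchMaker, simulate it, and return, e.g., its power".

```python
import random

def genetic_search(objective, num_knobs, pop_size=20, generations=50,
                   mutation_rate=0.1, seed=0):
    """Maximize `objective` over knob vectors in [0, 1]^num_knobs.

    Assumptions (not from the slides): each knob is a float in [0, 1],
    pop_size is even, and num_knobs >= 2 so one-point crossover has a cut.
    """
    rng = random.Random(seed)
    # Start from randomly generated solutions.
    pop = [[rng.random() for _ in range(num_knobs)] for _ in range(pop_size)]
    best = max(pop, key=objective)
    for _ in range(generations):
        scores = [objective(ind) for ind in pop]
        # Roulette-wheel selection: higher-scoring solutions are more likely,
        # but not certain, to be kept as parents.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            cut = rng.randrange(1, num_knobs)  # one-point crossover
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                # Mutation: re-draw each knob with probability mutation_rate.
                children.append([rng.random() if rng.random() < mutation_rate else k
                                 for k in child])
        pop = children
        # Keep the best solution seen so far (the eventual stressmark).
        best = max(pop + [best], key=objective)
    return best

# Toy objective: a hypothetical "power" that grows with every knob.
best = genetic_search(lambda knobs: sum(knobs), num_knobs=6)
print(round(sum(best), 2))
```

Because `best` is carried across generations, the returned solution is never worse than the best random starting point.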


Max-power stressmark

[Figure: per-unit power (Watts, axis 0 to 30) of the StressMaker stressmark vs. SPEC CPU / commercial benchmarks for lsq, alu, fetch, clock, icache, issue, bpred, regfile, dcache, window, rename, dispatch, dcache2, and resultbus; benchmark labels on the bars include art, mesa, SPECjbb2005, perlbmk, gzip, dbt2, eon, and mcf]

  • 8-wide OOO processor; 81.5 W in total

  • Assuming Wattch (0.18 µm, 1.2 GHz, aggressive clock gating)


Max-power stressmark characteristics

  • Keep functional units busy

    • Uniform mix of instruction types

  • Keep issue logic busy

    • High ILP

  • No pipeline flushes

    • High branch predictability

  • Keep caches busy

    • Good locality

  → Similar to hand-crafted stressmarks

      [Gowan et al., DAC’98] [Vishwanath, Intel Tech Journal, 2000]


Evaluation of genetic algorithm

  • Speed

    • Three orders of magnitude faster than exhaustive search

  • Effectiveness

    • The max-power stressmark found by StressMaker achieves 99% of the power of the max-power stressmark found by exhaustive search: 48 W for a 4-wide OOO processor


Max single-cycle power

  • Estimate max instantaneous (single-cycle) current drawn from the power supply

  • StressMaker’s stressmark: 72W

    • Its average power consumption: 32W

    • [4-wide OOO processor]

  • Maximum power assuming all units are 100% active: 85W

    • StressMaker gets 85% of theoretical maximum
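The 85% figure is the ratio of the stressmark's peak single-cycle power to the all-units-active maximum, 72 W / 85 W ≈ 0.847. The hypothetical helper below computes peak, average, and that ratio from a per-cycle power trace; the four-cycle trace is made up so that its peak and mean match the slide's 72 W and 32 W, and is not simulation data.

```python
def single_cycle_summary(per_cycle_power_w, theoretical_max_w):
    """Return (peak single-cycle power, average power, peak / theoretical max)."""
    peak = max(per_cycle_power_w)
    average = sum(per_cycle_power_w) / len(per_cycle_power_w)
    return peak, average, peak / theoretical_max_w

# Hypothetical 4-cycle trace, made up so that its peak and mean match the slide's 72 W and 32 W.
peak, average, fraction = single_cycle_summary([30.0, 72.0, 20.0, 6.0], theoretical_max_w=85.0)
print(peak, average, round(fraction, 3))  # 72.0 32.0 0.847
```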


dI/dt stressmark

  • Current swings cause ripples in supply voltage

  • dI/dt stressmark alternates between high and low power consumption

    [Joseph et al., HPCA’03] [Alpha’s Thumper]

  • StressMaker

    • Generate N-insn max-power stressmark: 72W

    • Generate N-insn min-power stressmark: 16W

    • Concatenate both

    • Cyclic behavior with period 2N
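A toy Python sketch of the concatenation step: two hypothetical N-instruction blocks are glued together and repeated, so the resulting per-instruction power profile repeats every 2N instructions. Treating each instruction as a single power sample (using the slide's 72 W and 16 W) is a simplification of ours, and `largest_step` is only a crude proxy for dI/dt.

```python
def didt_loop(max_power_block, min_power_block, repetitions):
    """Concatenate an N-instruction max-power block and an N-instruction
    min-power block, then repeat, giving a cycle with period 2N."""
    if len(max_power_block) != len(min_power_block):
        raise ValueError("both blocks must have N instructions")
    return (max_power_block + min_power_block) * repetitions

def largest_step(profile):
    # Largest change between consecutive samples: a crude proxy for dI/dt.
    return max(abs(b - a) for a, b in zip(profile, profile[1:]))

N = 3
# Each element stands for one instruction's power (W), a simplification.
profile = didt_loop([72.0] * N, [16.0] * N, repetitions=2)
print(profile[:2 * N], largest_step(profile))  # [72.0, 72.0, 72.0, 16.0, 16.0, 16.0] 56.0
```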


Thermal stressmarks

  • Thermal hotspots

    • Max component power

  • Thermal differentials

    • Thermal sensor placement

      [Lee et al., ICCD’05]

    • Examples

      • L2 vs. I-fetch: 44.6 °C difference

        • No stress on L2, high ILP, high branch predictability

      • L2 vs. register remap: 48.4 °C difference

        • Lots of L2 accesses: stress L2 and minimal stress on register remap


Why automate the process?

[Figure: power (Watts, axis 0 to 100) of the 2-wide, 4-wide, and 8-wide OOO max-power stressmarks, each run on 2-wide, 4-wide, and 8-wide OOO processors]

  • Stressmark is processor-specific


Outline

  • BenchMaker

    • Description

    • Evaluation

  • StressMaker

    • Description

    • Evaluation using case studies

      • Max-power, max single-cycle power, dI/dt

  • Related work

  • Conclusion and future work


Related work

  • VLSI test vectors

    • At circuit level, not at (micro)architectural level

  • Hand-crafted stressmarks

    • Current practice

    • Max-power, dI/dt, thermal hotspots, temp differentials

  • Performance model validation

    • Microbenchmarks

  • Benchmark synthesis

    • Statistical simulation


Conclusion: two contributions

  • BenchMaker

    • Abstract workload model

    • Generates proxies for real-life benchmarks

    • High accuracy

  • StressMaker

    • Automated stressmark generation

    • Case studies: max-power, max single-cycle power, dI/dt, thermal hotspots, etc.


Future work

  • Compare StressMaker against hand-crafted stressmarks

  • Fine-tune abstract workload model

    • Bit toggling data values and instruction opcodes

    • Interactions between threads and programs

      • Multi-threaded and multi-core processors


Thank you. Questions?
