Automated microprocessor stressmark generation

Automated Microprocessor Stressmark Generation. Ajay M. Joshi* Lieven Eeckhout** Lizy K. John* Ciji Isen* *The University of Texas at Austin **Ghent University, Belgium HPCA 2008, Feb 19, Salt Lake City, UT. Energy, power, power density, temperature, voltage variation, ….

Automated Microprocessor Stressmark Generation

Ajay M. Joshi*

Lieven Eeckhout**

Lizy K. John*

Ciji Isen*

*The University of Texas at Austin

**Ghent University, Belgium

HPCA 2008, Feb 19, Salt Lake City, UT


Energy, power, power density, temperature, voltage variation, …

  • First-class design constraints

    • Embedded processors

    • High-performance processors

  • Understanding and analysis of primary importance

    • Average: typical

    • Maximum: worst-case


Why care about worst-case?

  • Processor must operate properly under extreme conditions

  • Examples

    • Max power → power supply, DPM

    • Max temperature → thermal package, DTM

    • Max dI/dt → power delivery

    • Localized max power → hot spots → circuit failure, timing errors, etc.

    • Max temperature differentials → sensor placement


How to characterize worst-case?

  • Stressmarks

    • Hand-coded synthetic stress codes

  • Examples

    • Max power: Alpha’s Toast

    • Max dI/dt: Alpha’s Thumper

  • Limitations

    • Time-consuming to develop

    • Requires intimate understanding of system

    • Tied to a specific processor

      • Difficult to do in early design stages


A possible solution

  • Automatic stressmark generation

  • In two steps

    • BenchMaker

      • Generate synthetic benchmark from abstract workload model

    • StressMaker

      • Explore workload space by ‘turning knobs’ using BenchMaker and search for stressmarks


Outline

  • BenchMaker

    • Description

    • Evaluation

  • StressMaker

    • Description

    • Evaluation through case studies

      • Max-power, max single-cycle power, dI/dt

  • Related work

  • Conclusion and future work


BenchMaker

[Figure: BenchMaker flow. Labels: abstract workload model (instruction mix, ILP, I & D footprint, D stream strides, branch transition, BB size); benchmark synthesizer; synthetic benchmark; hardware; simulator.]


Instruction mix

Part of the abstract workload model (instruction mix, ILP, I & D footprint, D stream strides, branch transition, BB size)

  • Fraction short int

  • Fraction long int

  • Fraction short fp

  • Fraction long fp

  • Fraction int loads

  • Fraction int stores

  • Fraction fp loads

  • Fraction fp stores


ILP

Part of the abstract workload model

  • Probability for an inter-operation dependency distance of:

    • 1

    • 2

    • 3 to 4

    • 5 to 6

    • 7 to 8

    • 9 to 16

    • 17 to 32

    • greater than 32
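
The eight dependency-distance buckets listed on this slide can be expressed as a small lookup. This is only a minimal sketch: the function name `distance_bucket` and the label strings are illustrative assumptions and do not come from the slides.

```python
from bisect import bisect_left

# Upper bounds of the first seven buckets named on the slide; any distance
# above 32 falls into the final "> 32" bucket.
UPPER_BOUNDS = [1, 2, 4, 6, 8, 16, 32]
LABELS = ["1", "2", "3-4", "5-6", "7-8", "9-16", "17-32", ">32"]


def distance_bucket(distance: int) -> str:
    """Map a dependency distance (in instructions) to its bucket label."""
    if distance < 1:
        raise ValueError("dependency distance must be at least 1")
    # bisect_left returns the index of the first bound >= distance.
    return LABELS[bisect_left(UPPER_BOUNDS, distance)]
```

Counting how often each label occurs over a dynamic instruction stream, and normalizing, would give the eight probabilities the model stores.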


I & D stream behavior

Part of the abstract workload model

  • No. unique I & D addresses

  • Fraction memory operations with a local stride (at 32-byte block level) of 0, 1, 2, …, 8, or greater than 8
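
One way to measure the stride fractions is sketched below. The slides do not define "local stride", so this sketch assumes it means the distance, in 32-byte blocks, between consecutive accesses made by the same static instruction, and it assumes the absolute value is used. The function name and the `(pc, address)` trace format are hypothetical.

```python
from collections import Counter

BLOCK_BYTES = 32  # block granularity stated on the slide


def stride_histogram(accesses):
    """Return the fraction of memory operations per local-stride bucket.

    accesses: iterable of (pc, address) pairs in program order.
    Assumption: the stride is taken against the previous access issued
    by the same pc, and only its magnitude matters.
    """
    last_block = {}
    counts = Counter()
    for pc, address in accesses:
        block = address // BLOCK_BYTES
        if pc in last_block:
            stride = abs(block - last_block[pc])
            counts[stride if stride <= 8 else ">8"] += 1
        last_block[pc] = block
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()} if total else {}


# Tiny made-up trace: one instruction walking block by block, one staying put.
hist = stride_histogram([(0x10, 0), (0x10, 32), (0x10, 64), (0x20, 0), (0x20, 8)])
```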


Branch behavior

Part of the abstract workload model

  • Probability for a transition rate of 0%-10%, 10%-20%, etc.

  • Avg and stdev of the basic block size distribution
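
The slides do not define the transition rate. The sketch below assumes the common meaning: the fraction of a branch's executions in which its direction (taken or not taken) differs from its previous execution. Both function names, and the choice to put a rate of exactly 100% in the last bucket, are assumptions.

```python
def transition_rate(outcomes):
    """Fraction of consecutive executions where the branch direction flips.

    outcomes: sequence of booleans (True = taken) for one static branch.
    """
    if len(outcomes) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(outcomes, outcomes[1:]))
    return switches / (len(outcomes) - 1)


def rate_bucket(rate):
    """Index 0..9 of the 10%-wide bucket; exactly 100% joins the last bucket."""
    return min(int(rate * 10), 9)
```

Under this definition a branch that is always taken has rate 0 and a strictly alternating branch has rate 1; both extremes are typically easy to predict, while rates in the middle are harder.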


Abstract workload model

  • Only 40 characteristics

    • Explicit goal

    • In contrast to prior work

  • Microarchitecture-independent

  • Components: instruction mix, ILP, I & D footprint, D stream strides, branch transition, BB size
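
As a rough illustration, the model can be held in one small record. The class and field names are hypothetical, and the per-field counts are one reading of the preceding slides (8 instruction-mix fractions, 8 dependency-distance buckets, 2 footprint counts, 10 stride buckets, 10 transition-rate buckets, average and stdev of basic block size). The slides do not spell out this breakdown, but under this reading the counts add up to the 40 characteristics mentioned. The example values are arbitrary.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class AbstractWorkloadModel:
    instruction_mix: List[float]      # 8: short/long int, short/long fp, int/fp loads, int/fp stores
    dependency_distance: List[float]  # 8: 1, 2, 3-4, 5-6, 7-8, 9-16, 17-32, >32
    footprint: List[int]              # 2: unique instruction and data addresses
    stride: List[float]               # 10: local stride 0..8 and >8
    branch_transition: List[float]    # 10: 0-10%, 10-20%, ..., 90-100%
    basic_block_size: List[float]     # 2: average and standard deviation

    def num_characteristics(self) -> int:
        """Total number of scalar knobs across all fields."""
        return sum(len(getattr(self, f.name)) for f in fields(self))


# Arbitrary example values, chosen only to populate every field.
uniform = AbstractWorkloadModel(
    instruction_mix=[1 / 8] * 8,
    dependency_distance=[1 / 8] * 8,
    footprint=[1024, 4096],
    stride=[1 / 10] * 10,
    branch_transition=[1 / 10] * 10,
    basic_block_size=[6.0, 2.0],
)
print(uniform.num_characteristics())  # 40
```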


Synthetic benchmark generator

  • Program spine

  • Instruction types

  • Inter-operation dependencies

  • Stride assignment

  • Branch transition

  • Register assignment

  • Code generation

[Figure: example generated instruction sequence: add, sub, br, add, ld, mul, br, add, ld, sub, ld, st, br]
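
The first two steps (program spine and instruction types) might look roughly like the sketch below. It is not the authors' generator: drawing block sizes from a normal distribution, the function name `generate_spine`, and the opcode names and mix values are all assumptions made for illustration. Dependencies, strides, branch behavior, register assignment, and code generation are left out.

```python
import random


def generate_spine(mix, bb_avg, bb_stdev, num_blocks, seed=0):
    """Return a list of basic blocks, each a list of opcode names ending in 'br'.

    mix: dict mapping opcode name -> fraction of non-branch instructions.
    """
    rng = random.Random(seed)
    classes = list(mix)
    weights = [mix[c] for c in classes]
    blocks = []
    for _ in range(num_blocks):
        # Block size from the model's avg/stdev (assumed normal), with room
        # for at least one instruction plus the terminating branch.
        size = max(2, round(rng.gauss(bb_avg, bb_stdev)))
        body = rng.choices(classes, weights=weights, k=size - 1)
        blocks.append(body + ["br"])
    return blocks


# Made-up instruction mix for a three-block example.
blocks = generate_spine(
    {"add": 0.4, "sub": 0.2, "mul": 0.1, "ld": 0.2, "st": 0.1},
    bb_avg=4, bb_stdev=1.5, num_blocks=3,
)
```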


Synthetic benchmark generator

  • Input: abstract workload model

  • Output: synthetic benchmark

    • C program with embedded assembly code

  • Benefit: synthetic benchmark converges after 10 million dynamic instructions


Experimental setup

  • sim-alpha: validated Alpha 21264 simulator

  • Wattch for power modeling

  • HotSpot for thermal modeling

  • SPEC CPU2000

    • 100M simulation points

  • Commercial workloads

    • SPECjbb2005, DBT2, DBMS


Synthetic clone benchmark preserves characteristics

[Figure: two charts comparing each original benchmark with its synthetic clone, one for IPC (axis 0.0 to 2.0) and one for EPI (axis 0 to 35). Benchmarks: vpr, gcc, mcf, gzip, dbt2, twolf, bzip2, crafty, dbms, vortex, perlbmk, jbb2005.]


StressMaker

[Figure: StressMaker flow. Labels: abstract workload configuration; BenchMaker; synthetic benchmark; microprocessor model; abstract workload space exploration; stressmark.]

  • Objective function: e.g., max power


Workload space exploration

  • Huge space

  • Heuristic search using genetic algorithm

    • Bio-inspired algorithm

    • Reduces the likelihood of getting stuck in local optima

    • Iterative algorithm

      • Start from randomly generated solutions

      • Probabilistically retain solutions with highest objective function value

      • Generate new solutions using crossover & mutation

    • End result: stressmark
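
The iterative loop can be sketched as a generic genetic algorithm. This is a minimal sketch, not StressMaker itself: the knob encoding (values in [0, 1]), fitness-proportional selection, single-point crossover, all parameter defaults, and the `toy_power` objective are assumptions. In the real flow the objective would come from running the generated benchmark on the processor model.

```python
import random


def genetic_search(objective, num_knobs, pop_size=20, generations=50,
                   mutation_rate=0.1, seed=0):
    """Search for the knob vector that maximizes a non-negative objective.

    Requires num_knobs >= 2 so that a crossover point exists.
    """
    rng = random.Random(seed)
    # Start from randomly generated solutions.
    population = [[rng.random() for _ in range(num_knobs)] for _ in range(pop_size)]
    best = max(population, key=objective)
    for _ in range(generations):
        # Probabilistic retention: fitter solutions are more likely,
        # but not guaranteed, to be picked as parents.
        weights = [objective(ind) + 1e-9 for ind in population]
        parents = rng.choices(population, weights=weights, k=pop_size)
        children = []
        for i in range(0, pop_size, 2):
            a, b = parents[i], parents[(i + 1) % pop_size]
            cut = rng.randrange(1, num_knobs)  # single-point crossover
            children.append(a[:cut] + b[cut:])
            children.append(b[:cut] + a[cut:])
        # Mutation: occasionally replace a knob with a fresh random value.
        population = [
            [rng.random() if rng.random() < mutation_rate else g for g in child]
            for child in children[:pop_size]
        ]
        best = max(population + [best], key=objective)
    return best


def toy_power(knobs):
    """Stand-in objective: pretend every knob adds power when turned up."""
    return sum(knobs)


best = genetic_search(toy_power, num_knobs=5)
```

Keeping the best solution seen so far across generations means the returned result never gets worse as the generation count grows.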


Max-power stressmark

[Figure: per-component power in Watts (axis 0 to 30) for the StressMaker stressmark versus SPEC CPU / commercial benchmarks. Components: lsq, alu, fetch, clock, icache, issue, bpred, regfile, dcache, window, rename, dispatch, dcache2, resultbus. Benchmark labels shown in the chart: art, mesa, SPECjbb2005, perlbmk, gzip, dbt2, eon, mcf.]

  • 8-wide OOO processor; 81.5 Watts in total

  • assuming Wattch (0.18 µm, 1.2 GHz, aggressive clock gating)


Max-power stressmark characteristics

  • Keep functional units busy

    • Uniform mix of instruction types

  • Keep issue logic busy

    • High ILP

  • No pipeline flushes

    • High branch predictability

  • Keep caches busy

    • Good locality

  → Similar to hand-crafted stressmarks

      [Gowan et al., DAC’98] [Vishwanath, Intel Tech Journal, 2000]


Evaluation of genetic algorithm

  • Speed

    • Three orders of magnitude faster than exhaustive search

  • Effectiveness

    • Max-power stressmark found by StressMaker reaches 99% of the max-power stressmark found by exhaustive search: 48 Watts for a 4-wide OOO processor


Max single-cycle power

  • Estimate max instantaneous (single-cycle) current drawn from the power supply

  • StressMaker’s stressmark: 72W

    • Its average power consumption: 32W

    • [4-wide OOO processor]

  • Maximum power assuming all units are 100% active: 85W

    • StressMaker gets 85% of theoretical maximum
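
The percentage follows directly from the figures on this slide. The short check below uses only those three numbers; the variable names are illustrative.

```python
stressmark_peak_w = 72.0    # StressMaker's max single-cycle power (slide)
stressmark_avg_w = 32.0     # average power of the same stressmark (slide)
all_units_active_w = 85.0   # every unit 100% active (slide)

fraction_of_max = stressmark_peak_w / all_units_active_w
peak_to_average = stressmark_peak_w / stressmark_avg_w

print(f"{fraction_of_max:.1%} of theoretical maximum")   # 84.7%, i.e. about 85%
print(f"peak-to-average ratio: {peak_to_average:.2f}")   # 2.25
```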


dI/dt stressmark

  • Current swings cause ripples in supply voltage

  • dI/dt stressmark alternates between high and low power consumption

    [Joseph et al., HPCA’03] [Alpha’s Thumper]

  • StressMaker

    • Generate N-insn max-power stressmark: 72W

    • Generate N-insn min-power stressmark: 16W

    • Concatenate both

    • Cyclic behavior with period 2N
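
The concatenation can be pictured as an idealized square wave. The sketch below assumes one power sample per instruction slot and an instant switch between the two levels, which a real pipeline would not show. The function name is hypothetical; the 72 W and 16 W levels are the ones quoted on this slide.

```python
def didt_power_trace(n, periods, high_w=72.0, low_w=16.0):
    """Idealized power levels: n slots high, then n slots low, repeated."""
    one_period = [high_w] * n + [low_w] * n  # length 2n, the stated period
    return one_period * periods


trace = didt_power_trace(n=4, periods=2)
swing = max(trace) - min(trace)  # 56 W between the two phases
```

Choosing N sets the frequency of the current swing, so in practice it would be tuned against the power delivery network of the target design.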


Thermal stressmarks

  • Thermal hotspots

    • Max component power

  • Thermal differentials

    • Thermal sensor placement

      [Lee et al., ICCD’05]

    • Examples

      • L2 vs. I-fetch: 44.6°C difference

        • No stress on L2, high ILP, high branch predictability

      • L2 vs. register remap: 48.4°C difference

        • Lots of L2 accesses: stress L2 and minimal stress on register remap


Why automate the process?

[Figure: power in Watts (axis 0 to 100) of the 2-wide, 4-wide, and 8-wide OOO max-power stressmarks, each shown for the 2-wide, 4-wide, and 8-wide OOO processors.]

  • Stressmark is processor-specific


Related work

  • VLSI test vectors

    • at circuit level, not at (micro)architectural level

  • Hand-crafted stressmarks

    • Current practice

    • Max-power, dI/dt, thermal hotspots, temp differentials

  • Performance model validation

    • Microbenchmarks

  • Benchmark synthesis

    • Statistical simulation


Conclusion: two contributions

  • BenchMaker

    • Abstract workload model

    • Generates proxies for real-life benchmarks

    • High accuracy

  • StressMaker

    • Automated stressmark generation

    • Case studies: max-power, max single-cycle power, dI/dt, thermal hotspots, etc.


Future work

  • Compare StressMaker against hand-crafted stressmarks

  • Fine-tune abstract workload model

    • Bit toggling in data values and instruction opcodes

    • Interactions between threads and programs

      • Multi-threaded and multi-core processors


Thank you. Questions?
