Reliability-Aware Power Management

Klaus Waldschmidt, J. W. Goethe-University, Technische Informatik, Frankfurt am Main, Germany



Presentation Transcript



Reliability-Aware Power Management of Multi-Core Systems (MPSOCs)

Klaus Waldschmidt

J. W. Goethe-University

Technische Informatik

Frankfurt am Main, Germany


Agenda

  • Multi-Core embedded systems and future Multi-Core platforms

[Figure: platform components: digital hardware (reconfigurable), analog hardware, software]

  • Problems:

    • Performance: Algorithms, programming model

    • Power Management: Energy reduction

    • Reliability: Increase of lifespan and robustness

[Figure: reliability, performance, and power management, with reliability and power management highlighted]



Power Management

  • Static and dynamic power management

  • Dynamic power management:

    • Reacts dynamically to workload variation

    • Scales the power consumption of the system and/or system parts with

      • Frequency scaling

      • Dynamic voltage scaling (dynamic power)

      • Adaptive Body Biasing (leakage power)

      • Clock-gating

      • Supply shutdown
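As a rough illustration of why frequency and voltage scaling reduce dynamic power, the sketch below uses the textbook CMOS approximation P_dyn ≈ α · C · V² · f. This relation is not stated on the slide. The function name and all operating points (capacitance, voltages, frequencies) are assumptions chosen only for illustration.

```python
def dynamic_power(c_eff: float, v_dd: float, freq_hz: float, activity: float = 1.0) -> float:
    """Textbook CMOS switching-power estimate in watts: activity * C * V^2 * f."""
    return activity * c_eff * v_dd ** 2 * freq_hz


# Hypothetical operating points, not taken from the presentation.
C_EFF = 1e-9  # effective switched capacitance in farads (assumed)

high = dynamic_power(C_EFF, v_dd=1.2, freq_hz=2.0e9)       # full speed
freq_only = dynamic_power(C_EFF, v_dd=1.2, freq_hz=1.0e9)  # frequency scaling only
dvs = dynamic_power(C_EFF, v_dd=0.9, freq_hz=1.0e9)        # frequency and voltage scaled

print(f"full speed: {high:.2f} W")
print(f"half frequency: {freq_only:.2f} W")
print(f"half frequency, lower voltage: {dvs:.2f} W")
```

Under this model, frequency scaling alone lowers dynamic power linearly, while lowering the supply voltage as well contributes a quadratic factor. Leakage power, which adaptive body biasing targets, is not modeled here.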



From: K. Mihic, T. Simunic, G. De Micheli: "Reliability and Power Management of Integrated Systems", DSD '04

Power Management and Reliability

  • The reliability of a digital system is affected by power management in two ways:

    • It tends to lower the system’s temperature → reliability increases

    • It introduces thermal cycling → reliability decreases

  • de Micheli et al. investigated the effects of power management on the long-term reliability of microprocessors

  • Simulations of power-managed and non-power-managed cores with small feature size show a decline in reliability for power-managed systems

Reliability Aware Power Management



Power Management and Dynamic Parallelism

  • Power management for multicore systems is more sophisticated than for single cores:

  • The required performance depends on the parallelizability of the task(s) running on the system

  • To reduce power consumption of the system, cores can be

    • put to lower frequency modes or

    • put to sleep mode or

    • switched off

  • To increase the performance of the system, cores can be

    • put to higher frequency modes or

    • woken up from sleep mode or

    • switched on

  • A system which is able to control its performance and power consumption according to the parallelizability of tasks has to support

    • dynamic workload distribution

    • dynamic adding and removing of cores
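One common way to quantify parallelizability is Amdahl's law. The slides do not name it, so it is used here only as an assumed stand-in. The sketch picks the smallest number of active cores that reaches a target speedup, so that the remaining cores could be put to sleep or switched off. `amdahl_speedup` and `cores_needed` are hypothetical helper names.

```python
from typing import Optional


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup when only `parallel_fraction` of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def cores_needed(parallel_fraction: float, target_speedup: float,
                 max_cores: int) -> Optional[int]:
    """Smallest core count that reaches the target speedup, or None if unreachable."""
    for n in range(1, max_cores + 1):
        if amdahl_speedup(parallel_fraction, n) >= target_speedup:
            return n
    return None


# Example with assumed numbers: a task that is 90 % parallel on a four-core system.
print(cores_needed(parallel_fraction=0.9, target_speedup=2.0, max_cores=4))
```

A poorly parallelizable task gains little from additional cores, so a power manager could leave them in a low-power state without a noticeable performance loss.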



The Self Distributing Virtual Machine (SDVM)

[Figure labels: communication, distribution, adaptivity, virtual machine, number of cores, heterogeneity]

Application to be run on heterogeneous hardware (MPSOCs and reconfigurable HW)

The SDVM as a middleware between application and hardware

Application runs transparently distributed on several sites

[Figure: an application distributed over two sites, each running an SDVM daemon on its own core and hardware (Core A / Hardware A, Core B / Hardware B), connected by a network]



The SDVM as a middleware for MPSOCs

Besides computer clusters and grid computing, the SDVM will also target multicore chips and SOCs in future projects.

Multicore chip:

  • middleware for several processors

  • increase number of sites if needed

FPGA:

  • use available space on the FPGA

  • implement special functionality on the FPGA

  • reconfigure at runtime

[Figure: processors and HW functions on a multicore chip and on an FPGA, each labeled with its current power state]

Legend:

  • HFM: high frequency mode

  • LFM: low frequency mode

  • SLP: sleep mode

  • OFF: off



Modeling of Reliability Aware Power Management for Multicores

  • We investigated different power management strategies for multicore systems with dynamic workload distribution

  • The cores are assumed to offer four different PM-states:

    • HFM (high frequency mode)

    • LFM (low frequency mode)

    • SLEEP

    • OFF

  • Three different power management policies were considered:

    • fast-upgrade – tries to optimize performance (represents usual power management)

    • low temperature – tries to minimize temperature

    • smooth temperature – tries to minimize thermal cycling

  • The simulations were performed using the SDVM with four cores



[Flowchart: node labels only]

Decision nodes:

  • average workload > MAX ?

  • average workload < MIN ?

  • cores in SLEEP-mode or OFF-mode present ?

  • cores in HF-mode present ?

  • cores in LF-mode present ?

  • cores in LF-mode present which haven’t executed applications for more than T sec. ?

Actions:

  • Switch all cores in LF-mode to HF-mode.

  • Among those, choose core with highest temperature for transition to LF-mode.

  • Among those, choose core with lowest temperature for transition to HF-mode.

  • Among those, choose most unengaged core for transition to SLEEP-mode.

The fast upgrade policy
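The branch structure of the flowchart cannot be read from the transcript, so the following sketch covers only the legible pieces: the MIN/MAX workload thresholds and the core-selection rules ("highest temperature", "lowest temperature", "most unengaged"). All names (`Mode`, `Core`, `workload_action`, `hottest`, `coolest`, `most_unengaged`) are hypothetical, and "most unengaged" is interpreted as "longest idle time", which is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    HF = "high frequency mode"
    LF = "low frequency mode"
    SLEEP = "sleep mode"
    OFF = "off"


@dataclass
class Core:
    name: str
    mode: Mode
    temperature: float   # degrees Celsius
    idle_seconds: float  # time since the core last executed an application


def workload_action(avg_workload: float, w_min: float, w_max: float) -> str:
    """Classify the average workload against the MIN and MAX thresholds."""
    if avg_workload > w_max:
        return "upgrade"
    if avg_workload < w_min:
        return "downgrade"
    return "hold"


def hottest(cores: List[Core], mode: Mode) -> Optional[Core]:
    """Core with the highest temperature among those in `mode`, if any."""
    group = [c for c in cores if c.mode is mode]
    return max(group, key=lambda c: c.temperature) if group else None


def coolest(cores: List[Core], modes: List[Mode]) -> Optional[Core]:
    """Core with the lowest temperature among those in one of `modes`, if any."""
    group = [c for c in cores if c.mode in modes]
    return min(group, key=lambda c: c.temperature) if group else None


def most_unengaged(cores: List[Core], idle_limit: float) -> Optional[Core]:
    """LF core idle for more than `idle_limit` seconds with the longest idle time (assumed reading)."""
    group = [c for c in cores if c.mode is Mode.LF and c.idle_seconds > idle_limit]
    return max(group, key=lambda c: c.idle_seconds) if group else None
```

A policy step would combine these helpers, for example moving `hottest(cores, Mode.HF)` to LF-mode when `workload_action` returns "downgrade". Which check comes first in each branch is left open here because the transcript does not preserve it.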



Example run - fast upgrade policy

  • One core always in HF-mode → high temperature of core 1

  • Maximum temperature 86°C

  • The temperature TJ of a core is determined from its power consumption by the formula [formula not included in the transcript]
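The formula itself is not reproduced in the transcript. A widely used steady-state approximation, assumed here rather than taken from the slide, is TJ = TA + P · θJA, where TA is the ambient temperature, P the power consumption, and θJA the junction-to-ambient thermal resistance. The numeric values below are illustrative only.

```python
def junction_temperature(power_w: float, ambient_c: float, theta_ja: float) -> float:
    """Steady-state junction temperature in °C: T_J = T_A + P * theta_JA."""
    return ambient_c + power_w * theta_ja


# Hypothetical values: 45 °C ambient, 2 K/W thermal resistance, assumed power per mode.
for label, power in [("HF", 20.0), ("LF", 10.0), ("SLEEP", 1.0)]:
    temp = junction_temperature(power, ambient_c=45.0, theta_ja=2.0)
    print(f"{label}: {temp:.1f} °C")
```

This steady-state form ignores thermal capacitance, so it cannot by itself reproduce the heating and cooling transients that make up thermal cycling.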



[Flowchart: node labels only]

Decision nodes:

  • average workload > MAX ?

  • cores in HF-mode with temperature > TEMPMAX present ?

  • average workload < MIN ?

  • cores in SLEEP-mode or OFF-mode present ?

  • cores in HF-mode present ?

  • average workload > MAX2 for more than T sec. and cores in LF-mode with temperature < TEMPMAX present ?

  • more than one core in LF-mode present ?

  • cores in SLEEP-mode present ?

Actions:

  • Put this core to LF-mode.

  • Among those, choose core with lowest temperature for transition to LF-mode.

  • Among those, choose core with highest temperature for transition to LF-mode, SLEEP-mode, or OFF-mode, respectively.

  • Among those, choose core with lowest temperature for transition to HF-mode.

The low temperature policy
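One rule that appears only in the low temperature policy is the TEMPMAX check: a core in HF-mode whose temperature exceeds TEMPMAX is put to LF-mode. A minimal sketch of that single rule follows. The dictionary layout and the function name `throttle_hot_cores` are hypothetical, and applying the rule to every overheated core in one pass (rather than one core per step) is an assumption.

```python
from typing import Dict, List


def throttle_hot_cores(cores: List[Dict], temp_max: float) -> List[str]:
    """Move every HF core hotter than temp_max to LF; return the names of moved cores."""
    throttled = []
    for core in cores:
        if core["mode"] == "HF" and core["temperature"] > temp_max:
            core["mode"] = "LF"
            throttled.append(core["name"])
    return throttled


# Hypothetical snapshot of a four-core system.
cores = [
    {"name": "core1", "mode": "HF", "temperature": 84.0},
    {"name": "core2", "mode": "HF", "temperature": 71.0},
    {"name": "core3", "mode": "LF", "temperature": 82.0},
    {"name": "core4", "mode": "SLEEP", "temperature": 50.0},
]
print(throttle_hot_cores(cores, temp_max=80.0))
```

A rule of this kind caps the peak temperature, but repeated throttling and re-upgrading is one plausible source of the frequent small temperature swings reported for this policy.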



Example run - low temperature policy

  • thermal cycling with low magnitude but high frequency



[Flowchart: node labels only]

Decision nodes:

  • average workload > MAX ?

  • average workload < MIN ?

  • cores in SLEEP-mode present ?

  • cores in HF-mode present ?

  • cores in LF-mode present ?

  • cores in LF-mode present which haven’t executed applications for more than T sec. ?

Actions:

  • Among those, choose core with highest temperature for transition to LF-mode.

  • Among those, choose core with lowest temperature for transition to LF-mode.

  • Among those, choose core with highest temperature for transition to HF-mode.

  • Among those, choose most unengaged core for transition to SLEEP-mode.

The smooth temperature policy



Example run - smooth temperature policy

  • Maximum temperature 86°C

  • thermal cycling with higher magnitude but lower frequency



Reliability and Temperature

  • The correlation of reliability and temperature is based on the Arrhenius equation, which is given in terms of mean time to failure (MTTF) [formula not included in the transcript]

  • The models of the major electrical failure mechanisms are based on this equation, e.g. for electromigration [formula not included in the transcript]

  • The effect of thermal cycling on reliability can be modeled by the Coffin-Manson relation, which gives the number Nf of cycles to failure [formula not included in the transcript]

  • These formulas were used to determine the acceleration factor (AF) with respect to MTTF and Nf, respectively, in order to compare the three PM policies with the non-power-managed case.
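The transcript does not preserve the formulas, so the sketch below falls back on the standard textbook forms, which may differ in detail from those on the slide: Arrhenius, MTTF = A · exp(Ea / (k · T)), and Coffin-Manson, Nf = C0 · (ΔT)^(−q). The acceleration factor is taken here as the ratio of the reference value to the value under test, so AF > 1 means earlier failure than the reference. That direction, the activation energy of 0.7 eV, and the exponent q = 2 are all assumptions for illustration.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def af_temperature(t_ref_c: float, t_c: float, ea_ev: float = 0.7) -> float:
    """Arrhenius acceleration factor MTTF(t_ref) / MTTF(t), temperatures in °C."""
    t_ref_k = t_ref_c + 273.15
    t_k = t_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t_ref_k - 1.0 / t_k))


def af_thermal_cycling(delta_t_ref: float, delta_t: float, q: float = 2.0) -> float:
    """Coffin-Manson acceleration factor Nf(ref) / Nf = (delta_t / delta_t_ref) ** q."""
    return (delta_t / delta_t_ref) ** q


# Hypothetical comparison: reference at 60 °C with 10 K swings,
# policy under test at 70 °C with 20 K swings.
print(f"AF temperature:     {af_temperature(60.0, 70.0):.2f}")
print(f"AF thermal cycling: {af_thermal_cycling(10.0, 20.0):.2f}")
```

The prefactors A and C0 cancel in both ratios, which is why only temperatures and cycle magnitudes are needed. Cycle frequency is not part of Nf itself and would have to be considered separately to convert cycles to failure into a lifetime.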



Results

[Results table (mean over all cores) not included in the transcript]

AFT: Acceleration Factor of Failure due to Temperature

AFTc: Acceleration Factor of Failure due to Thermal cycling



Conclusion

  • We tried to assess the impact of different DPM strategies for multi-core systems on long-term reliability

  • No detailed assumptions (structure, feature size, …) were made regarding the cores

  • Failure acceleration due to temperature is more or less similar for the three PM policies

  • The smooth-temperature policy performs better by a factor of 2.7 regarding acceleration due to thermal cycling, with almost no performance loss compared to fast-upgrade, but less power saving

  • This exhibits a clear trade-off between reliability, performance, and power consumption

Parallelism can be used to optimize this trade-off



Thank you for your attention!

