Lifetime Reliability-Aware Task Allocation and Scheduling for MPSoC Platforms. Lin Huang, Feng Yuan and Qiang Xu Reliable Computing Laboratory Department of Computer Science & Engineering The Chinese University of Hong Kong DATE’09. Lifetime Reliability of Embedded Multiprocessor Platform.

Lifetime Reliability-Aware Task Allocation and Scheduling for MPSoC Platforms

Lin Huang, Feng Yuan and Qiang Xu

Reliable Computing Laboratory

Department of Computer Science & Engineering

The Chinese University of Hong Kong

DATE’09

Lifetime Reliability of Embedded Multiprocessor Platform
  • Multiprocessor system-on-a-chip (MPSoC)
    • Platform-based design
    • Hardware / software co-synthesis
  • Reliability issue
    • IC product wear-out → lifetime reliability threats
      • Time dependent dielectric breakdown (TDDB), electromigration (EM), stress migration (SM), negative bias temperature instability (NBTI)
    • Soft errors
Prior Work
  • Prior work in reliability-driven task allocation and scheduling
    • Constant failure rate
  • Limitation of thermal-aware task scheduling
    • Might improve the system’s lifetime reliability implicitly
    • Not readily applicable, especially for heterogeneous MPSoC
Problem Motivation Example
  • Electromigration
  • Suppose [...], and all other parameters are the same
  • P1 ages much faster than P2, dominating the MPSoC lifetime

[Figure: MPSoC platform with two processors, P1 and P2]
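To make the motivation concrete, here is a minimal sketch based on Black's equation for electromigration, MTTF ∝ J^(-n) · exp(Ea / (k·T)). The condition stated on the slide is not captured in the transcript, so the temperature gap, activation energy, exponent, and scale below are illustrative assumptions rather than values from the talk.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def em_mttf(current_density, temp_kelvin, n=2.0, activation_ev=0.9, scale=1.0):
    """Black's equation: MTTF = scale * J^(-n) * exp(Ea / (k * T)).

    n, activation_ev and scale are illustrative assumptions.
    """
    return scale * current_density ** (-n) * math.exp(
        activation_ev / (BOLTZMANN_EV * temp_kelvin)
    )

# Hypothetical operating points: P1 runs 20 K hotter than P2, same current density.
mttf_p1 = em_mttf(current_density=1.0, temp_kelvin=370.0)
mttf_p2 = em_mttf(current_density=1.0, temp_kelvin=350.0)

# Without redundancy, the faster-aging processor limits the platform lifetime.
print(f"P2 outlives P1 by a factor of {mttf_p2 / mttf_p1:.1f}")
```

Because the temperature sits inside an exponential, even a modest gap between processors produces a large lifetime gap, which is why balancing temperature alone does not necessarily balance aging.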

Problem Formulation
  • Task allocation and scheduling
  • Output
  • Aim: to maximize the expected service life (mean time to failure, MTTF) of the MPSoC system under the performance constraint

[Figure: a task graph (T0 to T4) and an MPSoC platform (P1, P2) pass through binding and scheduling to give a periodical schedule, with T2 and T4 on P1 and T0, T1, T3 on P2]

Lifetime Reliability Estimation
  • Electromigration
  • Denote by R(t) the reliability of a single processor at time t
  • Expected service life
  • Weibull distribution
    • Computed by existing hard error models
    • Reflect some important factors (e.g., architecture properties)

[Figure: temperature variation example]
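The formulas on this slide did not survive extraction. As a hedged reminder of the standard definitions involved: a Weibull reliability function has the form R(t) = exp(-(t/α)^β), and the expected service life is the integral of R(t) from 0 to infinity, which evaluates to α·Γ(1 + 1/β). The sketch below checks the closed form against a numerical integral; the names `scale` (α) and `slope` (β) and the sample values are assumptions for illustration.

```python
import math

def weibull_reliability(t, scale, slope):
    """R(t) = exp(-(t / scale) ** slope)."""
    return math.exp(-((t / scale) ** slope))

def weibull_mttf(scale, slope):
    """MTTF = integral of R(t) over [0, inf) = scale * Gamma(1 + 1/slope)."""
    return scale * math.gamma(1.0 + 1.0 / slope)

def numeric_mttf(scale, slope, steps=20000, horizon_in_scales=20.0):
    """Trapezoidal check of the same integral, truncated at a far horizon."""
    h = horizon_in_scales * scale / steps
    total = 0.0
    for i in range(steps):
        left = weibull_reliability(i * h, scale, slope)
        right = weibull_reliability((i + 1) * h, scale, slope)
        total += 0.5 * (left + right) * h
    return total

# Hypothetical parameters: scale of 10 years, slope of 2.
print(weibull_mttf(10.0, 2.0), numeric_mttf(10.0, 2.0))
```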

Main Approach – Simulated Annealing
  • Solution representation
    • (schedule order sequence; resource assignment sequence)
      • For example, (0, 1, 3, 2, 4; P2, P2, P2, P1, P1)
    • Schedule order sequence: must respect the partial order defined by the task graph
    • Every solution corresponds to a feasible schedule
  • Schedule reconstruction

[Figure: periodical schedule with T2 and T4 on P1 and T0, T1, T3 on P2]
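A minimal sketch of how a schedule could be rebuilt from the pair (schedule order sequence; resource assignment sequence): walk the order, and start each task once its predecessors have finished and its processor is free. The task graph, the execution times, and the decision to ignore communication cost are all assumptions made for illustration; the talk's own reconstruction procedure is not in the transcript.

```python
def reconstruct_schedule(order, assignment, exec_time, preds):
    """Build (processor, start, finish) per task from an order and an assignment.

    order:      task ids in schedule order (must respect precedence)
    assignment: processor for each position of `order`
    exec_time:  exec_time[task][processor], hypothetical numbers
    preds:      preds[task] = list of predecessor tasks
    """
    proc_free = {}   # time at which each processor becomes idle
    finish = {}      # finish time of each task scheduled so far
    schedule = {}
    for task, proc in zip(order, assignment):
        ready = max((finish[p] for p in preds[task]), default=0.0)
        start = max(ready, proc_free.get(proc, 0.0))
        finish[task] = start + exec_time[task][proc]
        proc_free[proc] = finish[task]
        schedule[task] = (proc, start, finish[task])
    return schedule

# Hypothetical task graph: 0 -> 1, 0 -> 3, 2 -> 4, 3 -> 4.
PREDS = {0: [], 1: [0], 2: [], 3: [0], 4: [2, 3]}
EXEC_TIME = {t: {"P1": 2.0, "P2": 3.0} for t in range(5)}

schedule = reconstruct_schedule(
    order=(0, 1, 3, 2, 4),
    assignment=("P2", "P2", "P2", "P1", "P1"),
    exec_time=EXEC_TIME,
    preds=PREDS,
)
for task in sorted(schedule):
    print(task, schedule[task])
```

Since every task is placed after its predecessors in the order, the lookup of predecessor finish times never fails, which is what makes every encoded solution decode to a feasible schedule.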
Main Approach – Simulated Annealing
  • Transforms of directed acyclic graph
    • Expanded task graph
    • Undirected complement graph
  • Lemma: Given a valid schedule order, swapping two adjacent nodes leads to another valid schedule order, provided there is an edge between these two nodes in the complement graph

[Figure: task graph, expanded task graph, and complement graph for tasks T0 to T4]
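The lemma can be sketched in code under two assumptions that the transcript does not spell out: the expanded task graph is taken to be the transitive closure of the task graph, and the complement graph joins exactly those pairs of tasks with no precedence relation in either direction. The five-task graph below is hypothetical.

```python
from itertools import combinations

def expanded_graph(n, edges):
    """Transitive closure: reach[u] holds every task that must run after u."""
    reach = {u: set() for u in range(n)}
    for u, v in edges:
        reach[u].add(v)
    for k in range(n):                 # Warshall-style closure
        for u in range(n):
            if k in reach[u]:
                reach[u] |= reach[k]
    return reach

def complement_graph(n, reach):
    """Undirected edges between tasks that have no precedence relation."""
    return {frozenset((u, v)) for u, v in combinations(range(n), 2)
            if v not in reach[u] and u not in reach[v]}

def is_valid_order(order, reach):
    """True if every task appears before all tasks that depend on it."""
    pos = {t: i for i, t in enumerate(order)}
    return all(pos[u] < pos[v] for u in reach for v in reach[u])

def swap_adjacent(order, i, complement):
    """Swap positions i and i+1 if the lemma's condition holds, else return None."""
    if frozenset((order[i], order[i + 1])) not in complement:
        return None
    new = list(order)
    new[i], new[i + 1] = new[i + 1], new[i]
    return tuple(new)

EDGES = [(0, 1), (0, 3), (2, 4), (3, 4)]   # hypothetical task graph
reach = expanded_graph(5, EDGES)
comp = complement_graph(5, reach)

swapped = swap_adjacent((0, 1, 3, 2, 4), 2, comp)   # tasks 3 and 2 are unordered
print(swapped, is_valid_order(swapped, reach))
print(swap_adjacent((0, 1, 3, 2, 4), 0, comp))      # 0 must precede 1: no swap
```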

Main Approach – Simulated Annealing
  • Theorem: Starting from a valid schedule order, we are able to reach any other valid schedule order after a finite number of adjacent swaps

[Figure: example chain of adjacent swaps transforming one schedule order of tasks 0 to 4 into another, shown with the task graph, expanded task graph, and complement graph]
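The theorem can be checked by brute force on a small instance. The sketch below uses a hypothetical five-task graph, enumerates every valid schedule order, and runs a breadth-first search in which each step is one validity-preserving adjacent swap. For two tasks that sit next to each other in a valid order, "the swap keeps the order valid" is equivalent to "the two tasks have no precedence relation", so this matches the complement-graph condition of the lemma.

```python
from collections import deque
from itertools import permutations

EDGES = [(0, 1), (0, 3), (2, 4), (3, 4)]   # hypothetical task graph
N = 5

def is_valid(order):
    """True if the order respects every precedence edge."""
    pos = {t: i for i, t in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in EDGES)

def neighbours(order):
    """Orders reachable by one adjacent swap that keeps the order valid."""
    for i in range(len(order) - 1):
        new = list(order)
        new[i], new[i + 1] = new[i + 1], new[i]
        if is_valid(new):
            yield tuple(new)

def reachable(start):
    """All orders reachable from `start` through repeated adjacent swaps."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbours(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

all_valid = {p for p in permutations(range(N)) if is_valid(p)}
print(reachable((0, 1, 3, 2, 4)) == all_valid)
```

This connectivity is what lets simulated annealing explore the whole space of schedule orders using only local moves.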

Main Approach – Simulated Annealing
  • Moves
    • M1: Swap two adjacent nodes in both the schedule order sequence and the resource assignment sequence, if there is an edge between these two nodes in the complement graph
    • M2: Swap two adjacent nodes in the resource assignment sequence
    • M3: Change the resource assignment of a task

[Figure: task graph, expanded task graph, and complement graph for tasks T0 to T4]
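The three moves can be sketched as small functions on the (schedule order; resource assignment) pair. The function names, the complement-graph edges of a hypothetical five-task graph, and the use of Python's `random` module are assumptions for illustration. `move_m3` assumes the platform has at least two processors.

```python
import random

def move_m1(order, assign, i, complement):
    """M1: swap positions i, i+1 in both sequences if the two tasks are unordered."""
    if frozenset((order[i], order[i + 1])) not in complement:
        return None
    order, assign = list(order), list(assign)
    order[i], order[i + 1] = order[i + 1], order[i]
    assign[i], assign[i + 1] = assign[i + 1], assign[i]
    return tuple(order), tuple(assign)

def move_m2(order, assign, i):
    """M2: swap positions i, i+1 in the resource assignment sequence only."""
    assign = list(assign)
    assign[i], assign[i + 1] = assign[i + 1], assign[i]
    return tuple(order), tuple(assign)

def move_m3(order, assign, i, processors, rng=random):
    """M3: give the task at position i a different processor."""
    choices = [p for p in processors if p != assign[i]]
    assign = list(assign)
    assign[i] = rng.choice(choices)
    return tuple(order), tuple(assign)

# Pairs of tasks with no precedence relation in a hypothetical task graph.
COMPLEMENT = {frozenset(p) for p in [(0, 2), (1, 2), (1, 3), (1, 4), (2, 3)]}
order = (0, 1, 3, 2, 4)
assign = ("P2", "P2", "P2", "P1", "P1")

print(move_m1(order, assign, 2, COMPLEMENT))   # tasks keep their processors
print(move_m2(order, assign, 2))               # tasks 3 and 2 trade processors
print(move_m3(order, assign, 0, ("P1", "P2")))
```

M1 changes only the execution order, while M2 and M3 change only which processor runs a task, so the schedule order stays valid after every move.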

Main Approach – Simulated Annealing
  • Three moves are defined, so that
    • Starting from a valid schedule order A, we are able to reach any other valid schedule order B after a finite number of adjacent swaps
  • Cost function
    • First term guarantees that a schedule meets all tasks’ deadlines (slide annotation: significantly large)
    • Second term indicates the system lifetime
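The cost expression itself is missing from the transcript. The sketch below shows one plausible two-term form for a minimizing annealer: a very large penalty on deadline overshoot plus a term that rewards lifetime, together with the standard Metropolis acceptance rule. The penalty weight, the function names, and the exact shape of both terms are assumptions, not the formula from the talk.

```python
import math
import random

def cost(finish_time, deadline, system_mttf, penalty=1e9):
    """Illustrative two-term cost for minimization.

    First term:  penalty * total deadline overshoot (zero for feasible schedules).
    Second term: -system_mttf, so a longer lifetime gives a lower cost.
    """
    overshoot = sum(max(0.0, finish_time[t] - deadline[t]) for t in deadline)
    return penalty * overshoot - system_mttf

def accept(old_cost, new_cost, temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if new_cost <= old_cost:
        return True
    return rng.random() < math.exp((old_cost - new_cost) / temperature)

# Hypothetical numbers: a feasible schedule beats an infeasible one with a longer MTTF.
feasible = cost({0: 5.0}, {0: 6.0}, system_mttf=10.0)
infeasible = cost({0: 7.0}, {0: 6.0}, system_mttf=1000.0)
print(feasible < infeasible)
```

Making the penalty much larger than any achievable lifetime value means the annealer never trades a missed deadline for extra lifetime.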

Main Approach – Simulated Annealing
  • Key problem: Computation time
  • Source of time overhead
    • Run the temperature simulator EVERY TIME we reach a new solution
      • Simulator is called 3×10^5 times (SA parameters)
    • Every time, trace the temperature variation for the entire service life
      • In the range of years
    • Accurate calculation requires a fine-grained variation trace file
      • Significant temperature variation within a very short time
  • An efficient cost computation strategy is essential!

Revisit System Lifetime Reliability Estimation – Speedup I
  • It would be better if we could compute the MTTF by tracing the temperature variation of only one period
Revisit System Lifetime Reliability Estimation – Speedup I

Given [...]

Aging effect in one period [...]

Property: the aging effect in one period does not vary from period to period

This property enables us to trace the temperature variation of only ONE period

Revisit System Lifetime Reliability Estimation – Speedup I

The expected service life of one processor is [...]

Provided there are no redundant processors in the system, the expected service life of the entire system is [...]
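The expressions on the Speedup I slides were lost in extraction. The derivation below is a hedged reconstruction of the kind of result they describe, not the talk's own formulas. It assumes a Weibull model whose scale depends on temperature, a temperature that is piecewise constant over short intervals, independent processor failures, and notation introduced here for illustration: α(T) for the scale at temperature T, β for the slope, t_p for the period length, and A for the aging effect in one period.

```latex
% Assumed notation (not from the slides): \alpha(T) is the Weibull scale at
% temperature T, \beta the Weibull slope, t_p the period length, and
% \Delta t_j the length of interval j, during which the temperature is T_j.
\begin{align}
R(t) &= \exp\!\left[-\left(\sum_{j} \frac{\Delta t_j}{\alpha(T_j)}\right)^{\beta}\right]
  && \text{piecewise-constant temperature} \\
A &= \sum_{j \,\in\, \text{one period}} \frac{\Delta t_j}{\alpha(T_j)}
  && \text{aging effect in one period} \\
R(m\,t_p) &= \exp\!\left[-(m A)^{\beta}\right]
  && \text{after } m \text{ periods} \\
\mathrm{MTTF} &= \int_0^{\infty} R(t)\,dt \;\approx\; \frac{t_p}{A}\,
  \Gamma\!\left(1 + \frac{1}{\beta}\right)
  && \text{one processor} \\
\mathrm{MTTF}_{\mathrm{sys}} &= \int_0^{\infty} \prod_{i} R_i(t)\,dt
  && \text{no redundancy (series system)}
\end{align}
```

Under these assumptions, R(t) is approximately a Weibull function with scale t_p / A, so a single period's temperature trace is enough to obtain A and hence the MTTF.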

Revisit System Lifetime Reliability Estimation – Speedup II
  • Given [...]
  • Instead of computing the aging effect in every period, we propose to compute the aging effect of several periods at one time
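One way to read this speedup is as numerical integration of the system reliability with a step that spans many periods rather than one. The sketch below assumes a series system of Weibull processors that share a slope, with per-period aging values, period length, and step size chosen purely for illustration. It compares the coarse-step integral with the closed form available in that special case.

```python
import math

def system_reliability(t, aging, period, slope):
    """Series system: product of per-processor Weibull reliabilities at time t."""
    return math.prod(math.exp(-((a * t / period) ** slope)) for a in aging)

def system_mttf(aging, period, slope, periods_per_step, tol=1e-12):
    """Trapezoidal integral of R_sys(t), advancing many periods per step."""
    step = periods_per_step * period
    t, total = 0.0, 0.0
    r_left = 1.0
    while r_left > tol:
        r_right = system_reliability(t + step, aging, period, slope)
        total += 0.5 * (r_left + r_right) * step
        t, r_left = t + step, r_right
    return total

AGING = [2e-6, 1e-6]   # hypothetical per-period aging of two processors
PERIOD = 1.0           # hypothetical period length in seconds
SLOPE = 2.0            # hypothetical Weibull slope shared by both processors

coarse = system_mttf(AGING, PERIOD, SLOPE, periods_per_step=1000)
# Closed form, valid only when every processor has the same slope.
closed = PERIOD * math.gamma(1 + 1 / SLOPE) / sum(a ** SLOPE for a in AGING) ** (1 / SLOPE)
print(coarse, closed)
```

Because reliability changes very little over a single period, stepping over a thousand periods at a time cuts the number of evaluations by the same factor with negligible loss of accuracy in this example.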

Revisit System Lifetime Reliability Estimation – Speedup III
  • Accurate calculation requires setting the length of the time intervals to a very small value
  • Use the steady temperature rather than the accurate temporal temperature

[Figure: temperature variation example plotted against the task schedule]

Revisit System Lifetime Reliability Estimation – Speedup IV

Need to run the temperature simulator every time we reach a new solution

There can be at most [...] kinds of processor usage combinations in task schedules

Given [...] = 3, [...] = 4, we need only 255 pre-computations, each for a steady temperature

Estimate processors’ temperatures for various processor usage combinations in the pre-calculation phase only
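The symbols on this slide are missing, but the figure of 255 is consistent with one reading: 4 processors, each either idle or running a task of one of 3 power-consumption types, minus the all-idle case, giving (3 + 1)^4 - 1 = 255. That mapping of the numbers 3 and 4 is an assumption. The sketch below enumerates the combinations and fills a lookup table using a placeholder thermal function, which stands in for a real temperature simulator and is not a physical model.

```python
from itertools import product

def usage_combinations(num_processors, num_power_types):
    """Per-processor state vectors (0 = idle, 1..K = power type), minus all-idle."""
    states = range(num_power_types + 1)
    return [c for c in product(states, repeat=num_processors) if any(c)]

def placeholder_steady_temperature(combo, ambient=318.0, self_heat=8.0, coupling=1.5):
    """Stand-in for a thermal simulator: hypothetical numbers, NOT a real model."""
    total = sum(combo)
    return tuple(ambient + self_heat * s + coupling * (total - s) for s in combo)

combos = usage_combinations(num_processors=4, num_power_types=3)
print(len(combos))  # 255 = (3 + 1) ** 4 - 1

# Pre-calculation phase: one simulator call per combination, reused during annealing.
steady_temp = {c: placeholder_steady_temperature(c) for c in combos}
```

The table is built once before annealing starts, so evaluating a new candidate schedule needs only dictionary lookups instead of a simulator run.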

Revisit System Lifetime Reliability Estimation – Speedup IV

Time slot

The set of processors in use

The power consumption of the tasks running on these processors

Categorize the tasks into types according to power consumption

E.g., [...]

Processor index under usage

Task power consumption type

Revisit System Lifetime Reliability Estimation – Speedup IV

Pre-calculate the steady temperature of each processor in each time slot

The aging effect in unit time in this case is therefore [...]

The aging effect of P1 in this schedule in a period is [...]
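A minimal sketch of how the per-period aging of one processor could be assembled from pre-calculated steady temperatures: split the period into time slots with a fixed usage combination, look up the processor's steady temperature in each slot, and sum slot length divided by the Weibull scale at that temperature. The Arrhenius form of the scale, its constants, the temperature table, and the slot lengths are all hypothetical.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def weibull_scale(temp_kelvin, activation_ev=0.9, prefactor=1e-3):
    """Hypothetical alpha(T) with an Arrhenius temperature dependence."""
    return prefactor * math.exp(activation_ev / (BOLTZMANN_EV * temp_kelvin))

def aging_per_period(slots, steady_temp, processor):
    """Sum over time slots of slot_length / alpha(T) at the looked-up temperature.

    slots:       list of (length, usage_combo) pairs covering one period
    steady_temp: steady_temp[usage_combo][processor] -> temperature in kelvin
    """
    return sum(length / weibull_scale(steady_temp[combo][processor])
               for length, combo in slots)

# Hypothetical two-processor period; combo = (state of P1, state of P2), 0 = idle.
steady_temp = {
    (1, 0): (350.0, 325.0),
    (1, 1): (355.0, 352.0),
    (0, 1): (328.0, 348.0),
}
slots = [(2.0, (1, 0)), (3.0, (1, 1)), (4.0, (0, 1))]

aging_p1 = aging_per_period(slots, steady_temp, processor=0)
aging_p2 = aging_per_period(slots, steady_temp, processor=1)
print(aging_p1, aging_p2)
```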

Revisit System Lifetime Reliability Estimation – Summary

A summary of speedup techniques
  • Rewrite the MTTF expression in terms of the aging effect in one period
  • Compute the aging effect of several periods at one time
  • Approximate the aging effect in one period based on the task changes and using steady temperature
  • Call the temperature estimation simulator in the pre-calculation phase only
    • The time spent on pre-calculation can be reduced even further

Experimental Setup
  • Random task graphs generated by TGFF
    • Task numbers range from 20 to 260
  • Hypothetical MPSoC platforms
    • Processor core numbers range from 2 to 8
    • Homogeneous / Heterogeneous
  • Take the electromigration model in [Goel-IEEEPress07] as an example
    • Note that our model also applies to other failure mechanisms
  • Compare our method with a thermal-aware task scheduling algorithm proposed in [Xie-JVLSISP06]
Accuracy
  • Comparison between approximated MTTF and accurate value
Lifetime Reliability of Various Platforms with Various Task Graphs

Δ: Difference ratio between MTTF of simulated annealing and that of thermal aware

DR: Deadline Relaxation

Efficiency
  • The simulated annealing process requires 50 to 200 s of CPU time on an Intel(R) Core(TM) 2 CPU at 2.13 GHz for each case
    • 4 processors, 49 tasks: 84 s
    • 8 processors, 101 tasks: 158 s
  • The CPU time spent on pre-calculation ranges from 3 s to 160 s
Conclusion
  • Technology advancement has brought an adverse impact on the lifetime reliability of MPSoC embedded systems
  • Prior work on task allocation and scheduling does not explicitly take wear-out failure into account
  • We propose an analytical model to estimate the lifetime reliability of multiprocessor platforms under periodical tasks
  • We present a novel lifetime reliability-aware algorithm based on the simulated annealing technique
  • We propose several speedup techniques to simplify the design space exploration process with satisfactory solution quality
  • Experimental results demonstrate the effectiveness of the proposed approach

Thank you for your attention!