Time-Predictable Execution of Embedded Software on Multi-core Platforms. Sudipta Chattopadhyay under the guidance of A/P Abhik Roychoudhury. Embedded Systems. Real-time Constraints. Hard real-time. Embedded system. Soft real-time. Timing Analysis.

Time-Predictable Execution of Embedded Software on Multi-core Platforms

Sudipta Chattopadhyay

under the guidance of

A/P Abhik Roychoudhury


Embedded Systems


Real-time Constraints

[Diagram labels: Embedded system; Hard real-time; Soft real-time]


Timing Analysis

  • Hard real-time systems require absolute timing guarantees

    • System-level analysis

    • Single-task analysis

  • Worst-case execution time (WCET) analysis

    • An upper bound on execution time for all possible inputs

    • A sound over-approximation is obtained by static analysis


WCET Analysis

[Diagram labels: Program; Micro-architectural modeling; WCET of basic blocks; Path analysis; Control flow graph; Loop bound; Infeasible path constraints; constraints; WCET bound]
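
The path-analysis step named on this slide can be illustrated with a minimal sketch. It assumes an acyclic control flow graph (loops already unrolled up to their bounds) and made-up per-block WCETs. The deck itself mentions an ILP-based formulation (IPET), so the longest-path search below is a simplified stand-in, and every name and number in it is hypothetical.

```python
from functools import lru_cache

# Hypothetical acyclic CFG: basic block -> successor blocks.
CFG = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}

# Hypothetical WCET of each basic block in cycles, as micro-architectural
# modeling would supply it.
BLOCK_WCET = {"entry": 5, "then": 20, "else": 8, "exit": 3}


def wcet_bound(cfg, cost, start="entry"):
    """Return the cost of the most expensive path from `start` to a sink."""

    @lru_cache(maxsize=None)
    def longest(block):
        tail = max((longest(succ) for succ in cfg[block]), default=0)
        return cost[block] + tail

    return longest(start)


print(wcet_bound(CFG, BLOCK_WCET))  # 28 (entry -> then -> exit)
```

Infeasible-path constraints would remove some paths from this search; the sketch considers every structural path.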


Architecture

[Diagram: Core 1 ... Core n, each with an L1 cache, connected by a shared bus to a shared L2 cache and memory; the shared bus and shared L2 cache are labeled "Resource sharing".]


Overview

[Diagram: "Dissertation work (Time-predictable execution in multi-core)" with its parts: Shared cache; Shared cache + shared bus; A multi-core WCET tool; Unified cache (instr. accesses and data accesses; conflicts with different instruction and data memory blocks); Cache related preemption delay analysis; Shared scratchpad allocation; Coherence miss modeling. Architecture sketches: a processor with L1 instruction cache, L1 data cache, L2 unified cache, bus and main memory; Core 1 ... Core n with L1 caches, shared bus, shared L2 cache and memory (resource sharing).]


Micro-architectural Modeling

[Diagram labels: Single Core (branch predictor, cache, pipeline); Multi Core (shared cache, shared bus)]


Comparison


Imprecision in Abstract Interpretation

[Diagram: paths p1 and p2 end in cache states C1 and C2, shown as abstract cache sets ordered from the youngest entry: one holds a, b and the other holds b, x. The joined cache state C3 holds only b.]

Path p1 or path p2?

The joined cache state loses information about paths p1 and p2
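
The information loss at the join can be reproduced with a small sketch. It assumes the join is the usual one for a must-cache analysis of an LRU set (keep only blocks present in both states, with the larger age bound); the slide does not name the exact join operator, so this is an assumption, and the age values are illustrative.

```python
def must_join(state_a, state_b):
    """Join two abstract must-cache sets.

    Each state maps a memory block to an upper bound on its LRU age
    (0 = youngest). Only blocks guaranteed in both states survive, and
    each keeps the older (larger) of its two age bounds.
    """
    common = state_a.keys() & state_b.keys()
    return {block: max(state_a[block], state_b[block]) for block in common}


c1 = {"a": 0, "b": 1}  # one path: a is youngest, then b
c2 = {"b": 0, "x": 1}  # other path: b is youngest, then x
c3 = must_join(c1, c2)
print(c3)  # {'b': 1}
```

After the join, neither a nor x is known to be cached, and b is only known to be at the older position, whichever path was actually taken.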


Model Checking Alone?

  • A path-sensitive search

    • Path-sensitive search is expensive: path explosion

    • Worse when combined with the possible cache states

[Diagram: paths p1 and p2 with cache states C1 and C2]


Model Checking Alone?

[Diagram: paths p1 and p2 each carry their own abstract LRU cache sets (entries a, b, x, ordered from the youngest); the annotation reads "State Explosion".]


Cache Analysis

[Diagram labels: Program; Micro-architectural modeling (cache analysis by abstract interpretation, pipeline analysis, branch predictor modeling); Analysis outcome; Refine by model checker; All checked; Timeout; WCET of basic blocks; Path analysis; Loop bound; Infeasible path constraints; constraints; IPET]

Refinement by the model checker can be terminated at any point

Model checker refinement steps are inherently parallel

Each model checker refinement step checks a light assertion property


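Refinement (Inter-core)

[Diagram: a task accesses memory block m between start and exit. A conflicting task accesses m1 under the branch x < y and m2 under the branch x == y. The path through both m1 and m2 is marked infeasible, so the cache miss predicted for m is marked spurious; the other outcome shown is a cache hit.]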


Refinement (Inter-core)

[Diagram: the conflicting task is instrumented with a conflict counter: C_m++ ("Increment conflict") at the accesses to m1 (branch x < y) and m2 (branch x == y), followed by assert (C_m <= 1). The path through both accesses is infeasible, the assertion is verified, and the access to m is a cache hit.]
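
The instrumentation on this slide can be mimicked with a short sketch. It assumes the conflicting task has exactly the two branches shown, and it replaces the model checker with an exhaustive check over a small input range, which is only an illustration of the property being checked and not how CBMC works internally. The function names and the input range are hypothetical.

```python
from itertools import product


def conflicting_task(x, y):
    """Instrumented stand-in for the conflicting task; returns C_m."""
    c_m = 0
    if x < y:
        c_m += 1  # access to m1: increment conflict
    if x == y:
        c_m += 1  # access to m2: increment conflict
    return c_m


def assertion_holds(bound, domain=range(-4, 5)):
    """Check assert (C_m <= bound) for every input pair in a small domain."""
    return all(conflicting_task(x, y) <= bound
               for x, y in product(domain, repeat=2))


print(assertion_holds(1))  # True: x < y and x == y never hold together
print(assertion_holds(0))  # False: a single conflict is possible
```

Because the two branch conditions are mutually exclusive, at most one conflicting access happens on any feasible path, which is what the assertion C_m <= 1 expresses.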


Refinement (Why Does It Work?)

[Diagram: under x < y, an access to m’ is a conflict to m and increments the counter (C_m++). On path 2 (x == y), an access to m does not affect the value of C_m. Property: assert (C_m <= 0). Outcome shown for m: cache miss.]


Experimental Setup (Chronos Toolkit)

[Diagram labels: C source; GCC; SimpleScalar; Binary code; CFG; Micro-architectural modeling (cache, pipeline, branch prediction); Flow constraints; Micro-architectural constraints; ILP; WCET; CBMC (C bounded model checking)]


Experimental Result


Experimental Result

[Figure: WCET results. Labels: L1 caches, direct-mapped, 256 bytes; shared L2 cache, 4-way associative, 8 KB; average time = 70 secs]


Extension Using Symbolic Execution

[Diagram: symbolic execution of the conflicting task with unknown inputs. The path condition splits at x < y (x < y or x ≥ y) and again at x == y. C_m++ ("Increment conflict") is executed at m1 and m2. For the path through both, the constraint solver is asked whether x < y ∧ x = y is satisfied; the answer is NO and the path is aborted. The remaining paths reach assert (C_m <= 1).]
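
A toy version of this exploration is sketched below. It assumes the two branches from the diagram and replaces the constraint solver with a search over a small integer range. That search is only a stand-in: it cannot prove unsatisfiability in general, which a real solver behind a tool such as KLEE can. All names are hypothetical.

```python
from itertools import product

# Branch conditions of the conflicting task; taking a branch performs a
# conflicting access (C_m++).
BRANCHES = [
    ("x < y", lambda x, y: x < y),    # taken side accesses m1
    ("x == y", lambda x, y: x == y),  # taken side accesses m2
]


def satisfiable(path, domain=range(-3, 4)):
    """Bounded stand-in for a constraint solver on a path condition."""
    return any(
        all(pred(x, y) == taken for (_, pred), taken in zip(BRANCHES, path))
        for x, y in product(domain, repeat=2)
    )


def explore():
    """Map each feasible path (tuple of branch outcomes) to its C_m value."""
    feasible = {}
    for path in product([True, False], repeat=len(BRANCHES)):
        if satisfiable(path):  # infeasible paths are aborted
            feasible[path] = sum(path)  # number of conflicting accesses
    return feasible


paths = explore()
print(max(paths.values()))    # 1
print((True, True) in paths)  # False: x < y and x == y is unsatisfiable
```

Only three of the four structural paths survive, and none of them increments the counter twice.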


Extension Using KLEE

[Diagram labels: C source; GCC; SimpleScalar; Binary code; CFG; Micro-architectural modeling (cache, pipeline, branch prediction); Flow constraints; Micro-architectural constraints; ILP; WCET; CBMC/KLEE]


A Generic Framework

  • Three different architectural/application settings

    • Intra task (WCET in single core): cache conflict within a task in the L1 cache

    • Inter task (cache-related preemption delay analysis): cache conflict between a high-priority and a low-priority task

    • Inter core (WCET in multi-core): cache conflict in the shared L2 cache between a task in Core 1 and a task in Core 2


Micro-architectural Modeling

[Diagram labels: Single Core (branch predictor, cache, pipeline); Multi Core (shared cache, shared bus)]


Task-level Interference

[Diagram: tasks T1, T2 and T3 on Core 1 ... Core n (L1 caches, shared bus, shared L2 cache), their timeline, and the task interference graph over T1, T2 and T3.]


Shared Cache + TDMA Shared Bus

[Diagram: task graphs with tasks T1, T2, T3 and T4 on Core 1 and Core 2 (L1 caches, shared bus, shared L2 cache). The Time Division Multiple Access (TDMA) bus schedule alternates Core 1 slots and Core 2 slots. Annotations: L2 miss due to T2; bus access; WAIT; disjoint lifetime.]


Overview of the Framework

[Flowchart labels: L1 cache analysis (twice); Filter (twice); L2 cache analysis (twice); L2 conflict analysis; Initial interference; Bus aware analysis; WCRT computation; Interference changes? (Yes / No); Estimated WCRT]

Task interference monotonically decreases
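
The loop in this flowchart (recompute until the interference stops changing) can be sketched as a fixed-point iteration. The sketch makes strong simplifying assumptions that are not taken from the slides: WCRT is read as worst-case response time, each interfering task adds a fixed L2 penalty, task lifetimes are plain intervals, and the cache and bus analyses are reduced to that penalty. Task parameters and names are hypothetical.

```python
# name: (core, best-case time, worst-case time without L2 conflicts, predecessors)
# Listed in topological order; all values are made up for illustration.
TASKS = {
    "T1": (1, 10, 12, []),
    "T2": (1, 10, 12, ["T1"]),
    "T3": (2, 2, 3, []),
}
L2_PENALTY = 3  # assumed extra cycles per interfering task


def lifetimes(wcrt):
    """Return name -> (earliest start, latest finish)."""
    life = {}
    for name, (_, _, _, preds) in TASKS.items():
        earliest = max((life[p][0] + TASKS[p][1] for p in preds), default=0)
        latest_start = max((life[p][1] for p in preds), default=0)
        life[name] = (earliest, latest_start + wcrt[name])
    return life


def analyse():
    names = list(TASKS)
    # Initial interference: every task on another core may conflict.
    interf = {t: {u for u in names if TASKS[u][0] != TASKS[t][0]} for t in names}
    while True:
        wcrt = {t: TASKS[t][2] + L2_PENALTY * len(interf[t]) for t in names}
        life = lifetimes(wcrt)
        # Keep only interfering tasks whose lifetimes still overlap.
        refined = {
            t: {u for u in interf[t]
                if life[t][0] < life[u][1] and life[u][0] < life[t][1]}
            for t in names
        }
        if refined == interf:  # interference did not change: stop
            return wcrt, life
        interf = refined


wcrt, life = analyse()
print(wcrt)           # {'T1': 15, 'T2': 12, 'T3': 6}
print(life["T2"][1])  # 27
```

Each round can only remove interfering tasks, never add them, so the interference sets shrink monotonically and the loop terminates.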


Evaluation (2-core)

One core runs statemate; the other core runs the program under evaluation


Evaluation (4-core)

The four cores run either (edn, adpcm, compress, statemate) or (matmult, fir, jfdctint, statemate), one program per core


Micro-architectural Modeling

[Diagram labels: Single Core (branch predictor, cache, pipeline); Multi Core (shared cache, shared bus); Interactions]


Timing Anomaly (Shared Cache)

[Diagram: alternative sequences of cache hit and miss outcomes]

May not be the worst-case path


Baseline Abstraction – Timing Interval

  • Representing each pipeline stage as a timing interval

End = Start + cache miss latency interval

[Diagram: pipeline stages IF, ID, EX, WB and CM of the instructions R1 := R2 + 5, R5 := R1 * R7 and R3 := R5 * 5, with edges labeled "Structural dependency" and "Contention". Example intervals: start [1,3], latency [3,7], finish [4,10].]

A fixed-point analysis derives the timing of each stage as an interval
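
The interval arithmetic behind "End = Start + cache miss latency interval" can be written out as a small sketch. It reads the numbers on the slide as start [1,3], latency [3,7] and finish [4,10], which is an interpretation of the flattened labels that happens to be consistent with the formula. The `Interval` type and `ready_after` helper are hypothetical names.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    lo: int  # earliest possible time
    hi: int  # latest possible time

    def plus(self, other):
        """Add a latency interval to a start interval."""
        return Interval(self.lo + other.lo, self.hi + other.hi)


def ready_after(*finishes):
    """Start interval of a stage that must wait for all given stages."""
    return Interval(max(f.lo for f in finishes), max(f.hi for f in finishes))


start = Interval(1, 3)
latency = Interval(3, 7)
finish = start.plus(latency)
print(finish)  # Interval(lo=4, hi=10)
```

A fixed-point analysis would apply `plus` and `ready_after` along the dependency and contention edges until no interval changes.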


TDMA Shared Bus Analysis

  • Time Division Multiple Access (TDMA)

  • Offset abstraction

[Diagram: TDMA rounds of alternating Core 0 and Core 1 slots, with tasks T’ (core 0) and T (core 1). Annotations: round; offset; delay; delay = 0.]
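
How an offset within the TDMA round translates into a bus delay can be sketched as follows. The sketch assumes equal slot lengths, slots assigned in core order, and that an access must finish inside the issuing core's slot; none of these details are stated on the slide, and the function and its parameters are hypothetical.

```python
def tdma_delay(offset, core, slot_len, n_cores, access_len):
    """Cycles a bus request from `core` waits when issued at `offset`.

    `offset` is the request time relative to the start of a TDMA round.
    Assumes access_len <= slot_len.
    """
    round_len = slot_len * n_cores
    offset %= round_len
    slot_start = core * slot_len
    slot_end = slot_start + slot_len
    if slot_start <= offset and offset + access_len <= slot_end:
        return 0                            # fits in the core's own slot
    if offset < slot_start:
        return slot_start - offset          # slot comes later in this round
    return round_len - offset + slot_start  # wait for the next round


# Two cores, 10-cycle slots, 4-cycle bus accesses.
print(tdma_delay(3, core=0, slot_len=10, n_cores=2, access_len=4))  # 0
print(tdma_delay(3, core=1, slot_len=10, n_cores=2, access_len=4))  # 7
print(tdma_delay(8, core=0, slot_len=10, n_cores=2, access_len=4))  # 12
```

Since the delay depends only on the offset modulo the round length, an analysis can track offsets instead of absolute times.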


Loop Construct

[Diagram: pipeline stages (IF, ID, EX, WB, CM) of instructions in the previous iteration and the current iteration]

How do we define bus context?

Property: If the bus offsets of the cross-iteration edges do not change, the WCET of the loop iteration cannot change


Loop Construct

Ci = bus context of the loop body at the i-th iteration

[Diagram: bus context flow graph over C1, C2, C3, C4 and C5, with C5 = C3]

Property: If Ci = Cj, then Ci+k = Cj+k for any k > 0


Loop Construct

[Diagram labels: Program; Micro-architectural modeling; WCET of basic blocks; Path analysis; Control flow graph; Loop bound; Infeasible path constraints; constraints; Bus context flow graph (C1, C2, C3, C4); loop bound; ILP solver]

Compute WCET for each bus context

E(C1) = number of times context C1 is executed

Generate linear constraints:

E(C1) + E(C2) + E(C3) + E(C4) ≤ loop bound

E(C1) ≥ E(C2)

ILP = Integer Linear Programming
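
The two constraints on this slide can be tried out on a toy instance. The sketch below maximises the loop's total cost by plain enumeration instead of calling an ILP solver, uses only the two constraints shown (a real bus context flow graph would contribute more), and takes made-up per-context WCETs and loop bound.

```python
from itertools import product


def max_loop_wcet(wcet, loop_bound):
    """Maximise sum(wcet[i] * E[i]) under the slide's two constraints."""
    best_cost, best_counts = -1, None
    for counts in product(range(loop_bound + 1), repeat=len(wcet)):
        if sum(counts) > loop_bound:
            continue  # E(C1) + E(C2) + E(C3) + E(C4) <= loop bound
        if counts[0] < counts[1]:
            continue  # E(C1) >= E(C2)
        cost = sum(w * e for w, e in zip(wcet, counts))
        if cost > best_cost:
            best_cost, best_counts = cost, counts
    return best_cost, best_counts


WCET_PER_CONTEXT = [10, 16, 12, 11]  # hypothetical cycles for C1..C4
print(max_loop_wcet(WCET_PER_CONTEXT, loop_bound=6))  # (78, (3, 3, 0, 0))
```

Without the second constraint the optimum would put all six iterations in the most expensive context C2; the constraint E(C1) ≥ E(C2) rules that out and tightens the bound.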


Branch Prediction + Cache

[Diagram labels: Branch location; Maximum number of speculated instructions; Cache content (m); Cache conflict (m’); JOIN; Unclear cache access]


Experimental Setup (Chronos Toolkit)

[Diagram labels: C source; GCC; SimpleScalar; Binary code; CFG; Micro-architectural modeling (private cache, pipeline, branch prediction, shared cache, shared bus); Flow constraints; Micro-architectural constraints; ILP; WCET]


Evaluation (Cache + Pipeline)

[Figure labels: Core 1; Core 2; Horizontally partition; Vertically partition; jfdctint; statemate. Annotation: imprecision of shared cache analysis]


Evaluation (Cache + Pipeline + Speculation)

[Figure annotation: imprecision of modeling speculation]


Evaluation (Bus + Pipeline)

[Figure annotations: imprecision of shared bus analysis; imprecision of path analysis]


Recap

[Diagram: "Dissertation work (Time-predictable execution in multi-core)" with its parts: Shared cache; Shared cache + shared bus; A multi-core WCET tool; Unified cache; Cache related preemption delay analysis (cache conflict between a low priority task and a high priority task); Shared scratchpad allocation (PE-0, PE-1 ... PE-N with SPM-0, SPM-1 ... SPM-N, fast on-chip communication media, external memory interface, shared off-chip data bus, off-chip memory); Coherence miss modeling (Core 1 ... Core n with L1 data caches, coherence miss traffic, stale data items). Architecture sketch: Core 1 ... Core n with L1 caches, shared bus, shared L2 cache and memory.]


Perspective

[Diagram: from time-predictable execution in single-core to time-predictable execution in multi-core: resource sharing (cache and bus) and data sharing (cache coherence). Approaches: Testing; Static analysis; Customized hardware. Topics: Shared cache; Shared bus; Cache coherence; Shared scratchpad. Examples named: ARM Cortex A9 MPCore; Samsung Exynos; Nvidia Tegra II (smart phones); Time Division Multiple Access; Aethereal Network-on-chip; Sony PSP; IBM Cell.]


Perspective

[Diagram: Functionality Verification by abstraction refinement (SLAM (Microsoft), BLAST (UC Berkeley), MAGIC (CMU)): Concrete domain; Abstraction; Property; Verifier; Spurious counterexample; Refinement. Quantitative Verification: Concrete domain; Abstract domain in abstract interpretation (AI); AI result may be spurious; Generate quantitative property; Path-sensitive verification; Verified; Refinement. Caption: Anytime verification of quantitative properties.]


Future Work

[Diagram: Static performance analysis + testing. Symbolic execution (path conditions x < y, x ≥ y, x = y, x < y ∧ x ≠ y over the accesses m1 and m2; abort) produces an input for performance testing of a quantitative property (e.g. cache conflict, assert (C_m <= 1)). Mobile devices: energy analysis of software; battery life; energy-aware software testing.]


Thank You

My sincere thanks to all the Examiners and especially the anonymous Examiner 1 for his comment on symbolic execution

