METERG: Measurement-Based End-to-End Performance Estimation Technique in QoS-Capable Multiprocessors

Jae W. Lee and Krste Asanovic

{leejw, krste}@mit.edu

MIT Computer Science and Artificial Intelligence Lab

April 5, 2006


Multiprocessors Are Ubiquitous

Our focus: running soft real-time apps (e.g. media codecs, web services) on general-purpose MPs

Assuming multiprogrammed workloads: real-time + best-effort apps

Problem: Contention for shared resources causes wider variation of a task's execution time.

[Figure: multiprocessors in embedded CMPs (e.g. ARM MPCore), handheld/embedded devices, desktops, and servers; processors P1..Pn share a bus to memory (MEM) and I/O]


Execution Time Variation in MPs

For a fixed input to a fixed task, execution time varies depending on the activities of other processors because of the shared memory system.

[Figure: execution times of successive instances i..i+4 of Task X on a shared-bus, shared-memory MP, ranging from no contention to severe contention for the shared resource]


Impact of Resource Contention

Execution time increases by 85% when the number of background processes increases from 0 to 7.

How can we limit the impact of resource contention?

→ QoS support

[Figure: normalized execution time of memread on P1 (a fixed input) vs. a variable number of background processes on an 8-processor shared-bus system; bars rise from 1.00 through 1.02 and 1.05 to 1.85 with 7 background processes]


Memory QoS Reduces MP Execution Time Variation

Memory QoS guarantees per-processor memory bandwidth and latency.

It guarantees minimum performance and distributes unclaimed bandwidth to sharers in order to maximize throughput.

Challenge: How can we find the minimal QoS parameters that meet an RT task's given performance goal?

Minimal reservation improves total system throughput.

[Figure: range of execution time for instances i..i+4 of Task X, much narrower with QoS than without QoS]


Translating Performance Goal in User Metric into QoS Parameters

The user's performance goal (user metric) should be translated into QoS parameters (system metric).

User metrics: execution time, transactions per second (TPS), frames per second (FPS)

System metrics: memory bandwidth, latency

Example: MP server virtualization

Question: What guarantees can you make to users?

Users prefer a user metric (e.g. TPS) to a system metric (e.g. memory bandwidth).

  • Minimal QoS parameters are important to maximize system throughput.

[Figure: MP server virtualization with partitions 1..n running on a multiprocessor system substrate]


Finding Minimal Resource Reservation (1): Analysis

Static timing analysis: tools for WCET analysis

Consists of hardware modeling + program analysis

[+] Can find the minimal resource reservation for all possible inputs

[-] Analysis cost is prohibitive.

Becomes harder as hardware/software complexity in MPs increases

Possibly overkill for soft RT apps:

Our primary concern is performance degradation from resource contention (not from input data).

We can tolerate a small number of deadline violations to maximize overall system throughput.


Finding Minimal Resource Reservation (2): Measurement

Measurement-based approach

Measures the task's execution time for a certain input set

Motto: "The best model for a system is the system itself." [Colin-RTSS '03]

Goal: find the execution time under worst-case contention, TMAX(x, i), for a given QoS parameter (x) and instruction sequence (i)

[Figure: probability distribution of execution time for a fixed instruction sequence i under QoS parameter x; the range of variation widens with the degree of contention (low to high), up to TMAX(x, i)]


Finding Minimal Resource Reservation (2): Measurement (cont.)

Measurement-based approach (cont.)

[+] Easy

[-] Not absolutely safe

Can be practically useful for soft real-time apps

→ Can be useful if we are given a "near-worst" input set

[-] Measured time varies with the degree of contention at runtime

→ We will address this problem in the rest of this talk!



Our Approach: METERG

METERG QoS system: improving the measurement-based approach

Estimates TMAX(x, i) easily by measurement

Provides hardware support for safe and tight estimation of TMAX(x, i)

Introduces two QoS modes

Deployment mode (operation mode)

Enforcement mode (measurement mode for estimation)

[Figure: execution-time distributions for a fixed instruction sequence i; the enforcement-mode range of variation (1) always lies above TMAX(x, i) (safe) and (2) is very narrow and close to TMAX(x, i) (tight), while the deployment-mode range lies below TMAX(x, i)]


Modifying Shared Blocks to Support Two QoS Modes

Now each QoS block supports two operation modes:

1) Deployment mode: the conventional QoS mode (operation)

QoS parameters are treated as a lower bound on received resources at runtime.

2) Enforcement mode: a new QoS mode (estimation)

QoS parameters are treated as an upper bound on received resources at runtime.

[Figure: for the same QoS parameter (50 in the figure), deployment mode grants at least that much of the resource, while enforcement mode grants at most that much]


Assumptions

Single-threaded apps; no shared objects.

No contention within a node (e.g. as would arise with multithreaded processors).

No preemption of a scheduled task.

Negligible impact of initial state on execution time.


Baseline MP System

  • Simple fixed frame-based scheduling for memory access (see the sketch after the figure below)

    • Example: (x1 = 0.2, xn = 0.3)

    • xi determines

      • Minimum bandwidth

      • Worst-case latency

[Figure: processors 1..n reach shared memory over a shared bus through a QoS-capable shared block; each frame has 10 time slots, xi is the fraction of slots reserved for processor i, and slots not taken by their owner are given to requesters in round-robin fashion]
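To make the frame-based scheme concrete, here is a minimal Python sketch (my illustration, not the actual hardware) of such an arbiter: the reserved fraction xi fixes processor i's minimum bandwidth, unclaimed slots are granted round-robin, and the worst-case latency bound assumes the reserved slots are spread evenly across the 10-slot frame.

```python
# A minimal sketch (not the actual hardware) of the fixed frame-based TDM bus
# arbiter described above: xi is the fraction of the 10 slots in a frame
# reserved for processor i, and slots not taken by their owner are granted
# to other requesters in round-robin fashion.

FRAME_SLOTS = 10  # frame size used in the slides' example


class FrameArbiter:
    def __init__(self, reservations):
        # reservations: {proc_id: x_i}, e.g. {"P1": 0.2, "Pn": 0.3}
        self.frame = []
        for proc, x in reservations.items():
            self.frame += [proc] * int(x * FRAME_SLOTS)
        self.frame += [None] * (FRAME_SLOTS - len(self.frame))  # unreserved slots
        self.rr = 0  # round-robin pointer for unclaimed slots

    def grant(self, slot, requesters):
        """Return which requester gets bus time slot `slot` (None if idle)."""
        owner = self.frame[slot % FRAME_SLOTS]
        if owner in requesters:
            return owner                  # reserved slot taken by its owner
        if not requesters:
            return None
        others = sorted(requesters)       # unclaimed slot: round-robin
        grantee = others[self.rr % len(others)]
        self.rr += 1
        return grantee


def min_bandwidth(x_i):
    return x_i                            # guaranteed fraction of bus slots


def worst_case_latency_slots(x_i):
    """Worst-case slots until processor i's next reserved slot, assuming its
    reserved slots are spread evenly across the frame (an assumption here)."""
    reserved = int(x_i * FRAME_SLOTS)
    return FRAME_SLOTS // reserved if reserved else FRAME_SLOTS
```

With x1 = 0.2, for example, P1 is guaranteed 2 of every 10 bus slots and, under the even-spacing assumption, waits at most 5 slots for one of its reserved slots.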


Runtime Usage Example in METERG System

Example (x1 = 0.2)

[Figure: P1 reserves 2 of the 10 time slots in a frame. In deployment mode, P1 uses its reserved slots and may also use slots it did not reserve. In enforcement mode, P1 may use only its reserved slots; some reserved slots go unused, and it is not allowed to use any others.]
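As an illustration of the slot usage above (again a sketch, not the paper's hardware), the per-slot check below treats P1's reservation as a floor in deployment mode, where it may also pick up unclaimed slots, and as a ceiling in enforcement mode, where only its own reserved slots may be used.

```python
# Sketch of per-slot usage for the processor under test (P1) in the two modes.
# `frame` maps each slot to its reserved owner (None = unreserved), as in the
# FrameArbiter sketch earlier; this is an illustration, not the actual RTL.

def p1_may_use(slot_owner, mode, others_requesting):
    """Can P1 use this slot?

    Deployment mode : reserved slots are guaranteed, and P1 may also pick up
                      slots that nobody else claims (lower bound on resources).
    Enforcement mode: P1 may use only its own reserved slots, even if other
                      slots are idle (upper bound on resources).
    """
    if slot_owner == "P1":
        return True
    if mode == "deployment":
        return not others_requesting      # unclaimed slot goes to P1
    return False                          # enforcement: not allowed to use


# Example for the x1 = 0.2 frame: P1 owns 2 of the 10 slots.
frame = ["P1", "P1"] + [None] * 8
used_dep = sum(p1_may_use(o, "deployment", others_requesting=False) for o in frame)
used_enf = sum(p1_may_use(o, "enforcement", others_requesting=False) for o in frame)
print(used_dep, used_enf)   # with no competing requests: 10 slots usable in
                            # deployment mode, only 2 in enforcement mode
```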


Obtain Minimal Resource Reservation Using METERG

How to find the minimal resource reservation with METERG:

In the measurement phase:

Step 1: Measure performance in enforcement mode for a given QoS parameter.

Step 2: Iterate Step 1 to find the minimal QoS parameter that still meets the performance goal.

Step 3: Store the minimal QoS parameter.

In the operation phase:

Step 4: Execute the program in deployment mode with the stored QoS parameter for guaranteed performance.
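The measurement phase could be automated along the following lines. This is a sketch that assumes enforcement-mode execution time decreases monotonically as the reservation grows and uses a simple linear sweep; run_in_enforcement_mode is a hypothetical hook that configures the QoS parameter, runs the task in enforcement mode, and returns the measured time.

```python
# A sketch of the measurement phase (Steps 1-3) above. The helper
# run_in_enforcement_mode(x) is hypothetical: it would configure the METERG
# QoS block with parameter x, run the task in enforcement mode, and return
# the measured execution time, which upper-bounds the deployment-mode time
# for the same parameter.

def find_minimal_qos_parameter(run_in_enforcement_mode, time_goal,
                               step=0.05, x_max=1.0):
    x = step
    while x <= x_max + 1e-9:
        measured = run_in_enforcement_mode(x)  # Step 1: measure at parameter x
        if measured <= time_goal:              # conservative bound meets the goal
            return x                           # Step 3: store this parameter
        x += step                              # Step 2: iterate with a larger reservation
    raise ValueError("performance goal not met even with a full reservation")

# Step 4 (operation phase): run the task in deployment mode with the stored
# parameter; its execution time should not exceed the measured bound.
```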


Bandwidth Guarantees Are Not Enough: An Example

The shared-bus example again (x1 = 0.2)

The bandwidth inequality is guaranteed, but the latency inequality is not:

Bandwidth_DEP(xi) ≥ Bandwidth_ENF(xi)  (holds)

MAX[Latency_DEP(xi)] ≤ MIN[Latency_ENF(xi)]  (may not hold)

→ May cause end-to-end performance inversion.

[Figure: with x1 = 0.2, a memory request happens to take 4 timeslots in deployment mode (the intervening slots are used by other processors) but only 2 timeslots in enforcement mode (where the other slots are simply not allowed to be used), so a deployment-mode latency can exceed an enforcement-mode latency]


Two Enforcement Modes: Safety-Tightness Tradeoff

Relaxed enforcement (R-ENF) mode: bandwidth-only guarantees

Bandwidth_DEP(xi) ≥ Bandwidth_R-ENF(xi)

There is a (small) chance that the execution time observed in enforcement mode is smaller than TMAX(x, i). (Not safe)

Strict enforcement (S-ENF) mode: bandwidth and latency guarantees

Bandwidth_DEP(xi) ≥ Bandwidth_S-ENF(xi) &&
MAX[Latency_DEP(xi)] ≤ MIN[Latency_S-ENF(xi)]

The observed execution time is always greater than TMAX(x, i) in enforcement mode as long as there is no timing anomaly in the processor. (Safe)

The estimate of TMAX(x, i) is looser.
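One way to picture the tradeoff (an illustration, not the paper's implementation): the two enforcement modes differ only in when a serviced reply may be delivered to the processor.

```python
# Sketch of how a reply's delivery time could differ in the two enforcement
# modes. t_req = cycle the request was issued, t_srv = cycle it was actually
# serviced, worst_dep = worst-case deployment-mode latency for the parameter.

def reply_delivery_time(t_req, t_srv, mode, worst_dep):
    if mode == "S-ENF":
        # Strict: never deliver earlier than the deployment-mode worst case,
        # so measured times safely upper-bound TMAX(x, i) (but are looser).
        return max(t_srv, t_req + worst_dep)
    # DEP and R-ENF: deliver as soon as serviced; R-ENF only caps bandwidth,
    # so a measured time may occasionally undershoot TMAX(x, i) (not safe).
    return t_srv
```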


Memory System with Strict Enforcement Mode: An Implementation

Delay queue

In deployment mode: bypassed

In enforcement mode: defers delivery to the processor for "lucky" messages (those serviced faster than the worst-case deployment-mode latency)

[Figure: the METERG QoS memory block sits between processors 1..n and memory, handling Mem_Req/Mem_Reply according to each processor's OpMode. Its delay queue (bypassed in deployment mode) holds each reply with a valid bit, the data, and a timer initialized to TMAX(DEP)(x1) - Tactual, where TMAX(DEP)(x1) is the worst-case deployment-mode latency for parameter x1 and Tactual is the actual time taken to service the request.]
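A behavioral sketch of the delay queue follows (my reading of the block diagram, not the actual hardware). In strict enforcement mode, a reply serviced in Tactual cycles is held for the remaining TMAX(DEP)(x1) - Tactual cycles so the processor never observes a latency below the deployment-mode worst case; in deployment mode the queue is bypassed.

```python
# Behavioral sketch of the delay queue in the METERG QoS memory block.
# t_max_dep is the worst-case deployment-mode latency for the configured
# parameter x1; entries mirror the figure's (valid, data, timer) fields.

from collections import deque

class DelayQueue:
    def __init__(self, t_max_dep):
        self.t_max_dep = t_max_dep
        self.entries = deque()          # each entry: [data, remaining_cycles]

    def push_reply(self, data, t_actual, mode):
        """Accept a memory reply that took t_actual cycles to service."""
        if mode == "deployment":
            return data                 # bypassed: deliver immediately
        # Enforcement: hold "lucky" replies until the worst case has elapsed.
        self.entries.append([data, max(0, self.t_max_dep - t_actual)])
        return None

    def tick(self):
        """Advance one cycle; return replies whose timers have expired.

        Delivery is FIFO: a reply queued behind a slower one simply waits
        longer, which only makes the bound more conservative.
        """
        delivered = []
        for entry in self.entries:
            entry[1] -= 1
        while self.entries and self.entries[0][1] <= 0:
            delivered.append(self.entries.popleft()[0])
        return delivered
```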


Evaluation: Simulation Setup

8-way SMP running Linux

In-order cores running with a 5x faster clock than the system bus

Detailed memory contention model with a simple fixed-frame TDM bus

32 KB unified blocking L1 cache / no L2 cache

Synthetic benchmark: memread

An infinite loop accessing a large chunk of memory sequentially

Performance is bottlenecked by the memory system

[Figure: 8 processors, each connected to the shared bus through a network interface (NI) containing a METERG QoS block with an OpMode (E/D) setting for its requests and replies; the bus connects to shared memory]


Evaluation: Varying Number of Processes

[Figure: normalized execution time of memread on P1 (x1 = 0.25) vs. a variable number of best-effort background processes (0, 1, 3, 7): 1.85 without QoS (BE), 1.45 in enforcement mode (ENF), 1.10 in deployment mode (DEP)]

  • Execution time degradation as the number of background processes increases from 0 to 7:

    without QoS (best-effort): 85%

    with QoS (deployment mode): 10%

  • Estimated execution-time upper bound for the given QoS parameter (x1 = 0.25): 45%


Evaluation: Varying QoS Parameters

[Figure: normalized execution time of memread on P1 with 7 best-effort background processes, as the QoS parameter x1 is varied; the gap narrows from 2.35 (ENF) vs. 1.51 (DEP) at a small x1 to roughly 1.09 vs. 1.01 at a large x1]

  • Performance estimation becomes tighter as the QoS parameter (x1) increases,

    • because the worst-case latency of accessing memory decreases accordingly.


Evaluation: Varying Number of QoS Processes

The performance impact on a QoS process from other QoS processes is negligible (<2%) as long as the system is not oversubscribed.

[Figure: normalized IPC (higher is better) for configurations of 1 QoS + 7 BE through 4 QoS + 4 BE processes, with QoS parameters (0.25, 0, ...), (0.25, 0.25, 0, ...), (0.25, 0.25, 0.25, 0, ...), and (0.25, 0.25, 0.25, 0.20, ...), compared against the all-best-effort (8-BE) case and the estimated lower bound for xi = 0.25; QoS-process IPC stays within 0.89-0.91 across configurations while best-effort IPC drops]


Conclusion

In general-purpose MPs, shared-resource contention causes large variation in a task's execution time

(a ~2x increase was observed in the 8-processor case).

The METERG QoS system provides an easy way (by measurement) to estimate execution time under worst-case resource contention for a given QoS parameter,

introducing a new QoS mode (enforcement mode) for performance estimation.

Using the METERG QoS system, the minimal QoS parameter can easily be found for a given instruction sequence and performance goal.

