Dynamic Feedback: An Effective Technique for Adaptive Computing

Pedro Diniz and Martin Rinard

Department of Computer Science

University of California, Santa Barbara

http://www.cs.ucsb.edu/~{pedro,martin}

Basic Issue: Efficient Implementation of Atomic Operations in Object-Based Languages

Approach: Reduce Lock Overhead by Coarsening Lock Granularity

Problem: Coarsening Lock Granularity May Reduce Available Concurrency

Solution: Dynamic Feedback
  • Multiple Lock Coarsening Policies
  • Dynamic Feedback
    • Generate Multiple Versions of Code
    • Measure Dynamic Overhead of Each Policy
    • Dynamically Select Best Version
  • Context
    • Parallelizing Compiler
      • Irregular Object-Based Programs
      • Pointer-Based Data Structures
    • Commutativity Analysis
Talk Outline
  • Lock Coarsening
  • Dynamic Feedback
  • Experimental Results
  • Related Work
  • Conclusions
Model of Computation
  • Parallel Programs
    • Serial Phases
    • Parallel Phases
  • Atomic Operations on Shared Objects
    • Mutual Exclusion Locks
    • Acquire Constructs
    • Release Constructs

[Figure: a Serial Phase, a Parallel Phase of Atomic Operations, and a Serial Phase; each Mutual Exclusion Region is delimited by L.acquire() and L.release()]

Problem: Lock Overhead

[Figure: consecutive L.acquire() / L.release() pairs]

Solution: Lock Coarsening

[Figure: L.acquire() / L.release() pairs in the Original code and After Lock Coarsening]

Reference: Diniz and Rinard, “Synchronization Transformations for Parallel Computing”, POPL97
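The transformation can be illustrated with a minimal Python sketch. This is not the compiler's generated code: `CountingLock`, `Counter`, `update_original`, and `update_coarsened` are hypothetical names introduced here, and the lock only counts how many acquire constructs execute in each version.

```python
import threading


class CountingLock:
    """Mutual exclusion lock that counts executed acquire constructs."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquires = 0

    def acquire(self):
        self._lock.acquire()
        self.acquires += 1  # updated while holding the lock

    def release(self):
        self._lock.release()


class Counter:
    """Hypothetical shared object whose updates must be atomic."""

    def __init__(self):
        self.L = CountingLock()
        self.value = 0


def update_original(obj, n):
    # Original: every atomic operation has its own acquire/release pair.
    for _ in range(n):
        obj.L.acquire()
        obj.value += 1
        obj.L.release()


def update_coarsened(obj, n):
    # After lock coarsening: one acquire/release pair covers the sequence.
    obj.L.acquire()
    for _ in range(n):
        obj.value += 1
    obj.L.release()


a, b = Counter(), Counter()
update_original(a, 1000)
update_coarsened(b, 1000)
print(a.value, a.L.acquires)  # 1000 1000
print(b.value, b.L.acquires)  # 1000 1
```

Both versions compute the same result, but the coarsened version executes a single acquire/release pair. The cost is that it holds the lock for the whole sequence, so no other thread can update the object in the meantime.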

Lock Coarsening Trade-Off
  • Advantage:
    • Reduces Number of Executed Acquires and Releases
    • Reduces Acquire and Release Overhead
  • Disadvantage: May Introduce False Exclusion
    • Multiple Processors Attempt to Acquire Same Lock
    • Processor Holding the Lock Is Executing Code That Was Not Originally in Any Mutual Exclusion Region
False Exclusion

[Figure: L.acquire() / L.release() regions in the Original code and After Lock Coarsening, with the False Exclusion region marked]

Lock Coarsening Policy
  • Goal: Limit Potential Severity of False Exclusion
  • Mechanism: Multiple Lock Coarsening Policies
    • Original: Never Coarsen Granularity
    • Bounded: Coarsen Granularity Only Within Cycle-Free Subgraphs of ICFG (Interprocedural Control Flow Graph)
    • Aggressive: Always Coarsen Granularity
Choosing Best Policy
  • Best Lock Coarsening Policy May Depend On
    • Topology of Data Structures
    • Dynamic Schedule Of Computation
  • Information Required to Choose Best Policy Unavailable at Compile Time
  • Complications
    • Different Phases May Have Different Best Policy
    • In Same Phase, Best Policy May Change Over Time
Solution: Dynamic Feedback
  • Generated Code Executes
    • Sampling Phases: Measure Performance of Different Policies
    • Production Phases: Use Best Policy From Sampling Phase
  • Periodically Resample to Discover Best Policy Changes

[Figure: Overhead over Time, with the Code Version sequence Original, Bounded, Aggressive, Aggressive, Original across a Sampling Phase, a Production Phase, and a Sampling Phase]
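The alternation of sampling and production phases can be sketched as a small simulation. This is an illustrative sketch only: `dynamic_feedback` and `hypothetical_overhead` are names invented here, the overhead values are made up, and a real implementation would measure overhead at run time rather than look it up from a function.

```python
def hypothetical_overhead(policy, t):
    """Made-up overhead model: Aggressive is best until t = 20, then worst."""
    if policy == "Aggressive":
        return 0.1 if t < 20 else 0.8
    return {"Original": 0.3, "Bounded": 0.2}[policy]


def dynamic_feedback(overhead, policies, sampling, production, total):
    """Return a list of (start_time, phase, policy) records."""
    t, schedule = 0.0, []
    while t < total:
        sampled = {}
        for p in policies:
            # Sampling phase: run each policy for one sampling interval.
            schedule.append((t, "sampling", p))
            sampled[p] = overhead(p, t)
            t += sampling
        # Production phase: use the policy with the lowest sampled overhead.
        best = min(sampled, key=sampled.get)
        schedule.append((t, "production", best))
        t += production
    return schedule


policies = ["Original", "Bounded", "Aggressive"]
schedule = dynamic_feedback(hypothetical_overhead, policies,
                            sampling=1.0, production=10.0, total=40.0)
chosen = [p for (_, phase, p) in schedule if phase == "production"]
print(chosen)  # ['Aggressive', 'Aggressive', 'Bounded', 'Bounded']
```

In this run the overhead of Aggressive changes during the second production phase, and the change is only discovered at the next sampling phase. That delay is the reason for resampling periodically.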
Guaranteed Performance Bounds
  • Assumptions:
    • Overhead Changes Bounded by Exponential Decay Functions
  • Worst Case Scenario:
    • No Useful Work During Sampling Phase
    • Sampled Overheads Are Same For All Versions
    • Overhead of Selected Version Increases at Maximum Rate
    • Overhead of Other Versions Decreases at Maximum Rate

[Figure: Overhead of V0 over Time, across intervals labeled S, S, S, and P]

Guaranteed Performance Bound

Definition 1. Policy $p_i$ is at Most $\epsilon$ Worse Than Policy $p_j$ over a Time Interval $T$ if

$$\mathrm{Work}_j^{T} - \mathrm{Work}_i^{T} \le \epsilon T$$

where

$$\mathrm{Work}_i^{T} = \int_0^{T} (1 - o_i(t))\,dt$$

Definition 2. Dynamic Feedback is at Most $\epsilon$ Worse Than the Optimal if

$$\mathrm{Work}_{opt}^{P+SN} - \mathrm{Work}_0^{P} \le \epsilon\,(P+SN)$$

where

$$\mathrm{Work}_{opt}^{P+SN} = \int_0^{P+SN} (1 - o_1(t))\,dt$$

Result 1. To Guarantee this Bound

$$(1 - \epsilon)\,P + \frac{1}{\lambda}\,e^{-\lambda P} \le (\epsilon - 1)\,SN + \frac{1}{\lambda}$$
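One way to see where a constraint of this shape can come from is the following sketch. The specific bounding functions are an assumption made here, not something stated on the slide: all versions have the same sampled overhead $v$, the overhead $o_0$ of the selected version rises as fast as an exponential decay with rate $\lambda$ allows, the overhead $o_1$ of a competing version falls as fast as it allows, $S$ is the sampling interval, $N$ is the number of versions sampled, and $t$ is measured from the start of the production interval $P$.

```latex
\begin{align*}
o_0(t) &\le 1 - (1 - v)\,e^{-\lambda t}, \qquad o_1(t) \ge v\,e^{-\lambda t} \\
\mathrm{Work}_{opt}^{P+SN} &\le SN + \int_0^{P} \bigl(1 - v\,e^{-\lambda t}\bigr)\,dt \\
\mathrm{Work}_0^{P} &\ge \int_0^{P} (1 - v)\,e^{-\lambda t}\,dt \\
\mathrm{Work}_{opt}^{P+SN} - \mathrm{Work}_0^{P}
  &\le SN + \int_0^{P} \bigl(1 - e^{-\lambda t}\bigr)\,dt
   = SN + P - \frac{1}{\lambda}\bigl(1 - e^{-\lambda P}\bigr) \\
SN + P - \frac{1}{\lambda} + \frac{1}{\lambda}\,e^{-\lambda P} \le \epsilon\,(P + SN)
  &\iff (1 - \epsilon)\,P + \frac{1}{\lambda}\,e^{-\lambda P}
   \le (\epsilon - 1)\,SN + \frac{1}{\lambda}
\end{align*}
```

Under these assumptions the sampled overhead $v$ cancels, which is consistent with the constraint depending only on $\epsilon$, $\lambda$, $P$, $S$, and $N$.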

Guaranteed Performance Bounds

[Figure: Constraint Values plotted against Production Interval P, showing $(1 - \epsilon)\,P + (1/\lambda)\,e^{-\lambda P}$, $(\epsilon - 1)\,SN + (1/\lambda)$, and the Feasible Region]

  • Production Interval Too Short: Unable to Amortize Sampling Overhead
  • Production Interval Too Long: May Execute Suboptimal Policy for Long Time
  • Basic Constraint: Decay Rate ($\lambda$) Must be Small Enough
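The feasible region can be explored numerically. The sketch below assumes the constraint has the form $(1 - \epsilon)P + (1/\lambda)e^{-\lambda P} \le (\epsilon - 1)SN + 1/\lambda$; the function names and the parameter values in the example are hypothetical and are not taken from the talk.

```python
import math


def constraint_lhs(P, eps, lam):
    """Left-hand side: depends on the production interval P."""
    return (1 - eps) * P + math.exp(-lam * P) / lam


def constraint_rhs(eps, lam, S, N):
    """Right-hand side: constant in P."""
    return (eps - 1) * S * N + 1 / lam


def feasible_interval(eps, lam, S, N, tol=1e-9):
    """Return (P_min, P_max) satisfying the constraint, or None if none exists."""
    if not (0 < eps < 1 and lam > 0 and S > 0 and N > 0):
        raise ValueError("expects 0 < eps < 1 and positive lam, S, N")
    rhs = constraint_rhs(eps, lam, S, N)

    def gap(P):
        return constraint_lhs(P, eps, lam) - rhs

    # The left-hand side is convex in P and minimal where e^(-lam*P) = 1 - eps.
    p_star = -math.log(1 - eps) / lam
    if gap(p_star) > 0:
        return None  # decay rate too large: no production interval works

    def bisect(lo, hi):
        # gap() has opposite signs at lo and hi; narrow down the crossing.
        lo_positive = gap(lo) > 0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if (gap(mid) > 0) == lo_positive:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    hi = 2 * p_star
    while gap(hi) <= 0:  # grow until the constraint is violated again
        hi *= 2
    return bisect(0.0, p_star), bisect(p_star, hi)


# Hypothetical values: eps = 0.1, S = 0.01 s, N = 3 versions.
slow_decay = feasible_interval(eps=0.1, lam=0.01, S=0.01, N=3)
fast_decay = feasible_interval(eps=0.1, lam=10.0, S=0.01, N=3)
print(slow_decay, fast_decay)
```

With the slow decay rate there is a bounded range of feasible production intervals; with the fast decay rate the function returns `None`, which matches the basic constraint that the decay rate must be small enough.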

Dynamic Feedback: Implementation
  • Code Generation
  • Measuring Policy Overhead
  • Interval Selection
  • Interval Expiration
  • Policy Switch
Code Generation
  • Statically Generate Different Code Versions for Each Policy
    • Alternative: Dynamic Code Generation
  • Advantages of Static Code Generation:
    • Simplicity of Implementation
    • Fast Policy Switching
  • Potential Drawback of Static Code Generation
    • Code Size (In Practice Not a Problem)
Measuring Policy Overhead
  • Sources of Overhead
    • Locking Overhead
    • Waiting Overhead
  • Compute Locking Overhead
    • Count Number of Executed Acquire/Release Constructs
  • Estimate Waiting Overhead
    • Count Number of Spins on Locks Waiting to be Released

$$\text{Sampled Overhead} = \frac{(\text{Number of Acquire/Release} \times \text{Acquire/Release Execution Time}) + (\text{Number of Spins} \times \text{Spin Time})}{\text{Sampling Time}}$$
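The overhead formula translates directly into code. In this minimal sketch the function name, parameter names, and example numbers are assumptions chosen for illustration; the talk does not give concrete cost values.

```python
def sampled_overhead(num_acquire_release, acquire_release_time,
                     num_spins, spin_time, sampling_time):
    """Fraction of the sampling interval spent on locking and waiting."""
    locking = num_acquire_release * acquire_release_time  # computed exactly
    waiting = num_spins * spin_time                       # estimated from spins
    return (locking + waiting) / sampling_time


# Hypothetical counts and per-event costs over a 10 ms sampling interval.
overhead = sampled_overhead(num_acquire_release=1000, acquire_release_time=5e-6,
                            num_spins=200, spin_time=1e-6,
                            sampling_time=0.01)
print(round(overhead, 4))  # 0.52
```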

Interval Selection and Expiration
  • Fixed Interval Values
    • Sampling Interval: 10 milliseconds
    • Production Interval: 10 seconds
    • Good Results for Wide Range of Interval Values
  • Polling Code for Expiration Detection
    • Location: Back Edges of Parallel Loop
    • Advantage: Low Overhead
    • Disadvantage: Potential Interaction with Iteration Size

[Figure: Atomic Operations with Polling Points]
Policy Switch
  • Synchronous
    • Processors Poll Timer to Detect Interval Expiration
    • Barrier At End of Each Interval
  • Advantages:
    • Consistent Transitions
    • Clean Overhead Measurements
  • Disadvantages:
    • Need to Synchronize All Processors
    • Potential Idle Time At Barrier
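A synchronous switch can be sketched with a barrier whose action changes the policy once every worker has arrived. This is an illustration using Python threads, not the implementation from the talk; `NUM_WORKERS`, `INTERVALS`, `state`, and `worker` are hypothetical, and interval expiration is reduced to one loop step per interval.

```python
import threading

NUM_WORKERS = 4
INTERVALS = ["Original", "Bounded", "Aggressive"]  # one policy per interval

state = {"index": 0}
log = []  # (worker id, policy) pairs, one per worker per interval
log_lock = threading.Lock()


def switch_policy():
    # Runs in exactly one thread, after all workers reach the barrier
    # and before any of them is released.
    state["index"] += 1


barrier = threading.Barrier(NUM_WORKERS, action=switch_policy)


def worker(wid):
    for _ in INTERVALS:
        policy = INTERVALS[state["index"]]
        with log_lock:
            log.append((wid, policy))
        barrier.wait()  # interval expired: wait for all other workers


threads = [threading.Thread(target=worker, args=(i,))
           for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for name in INTERVALS:
    print(name, sum(1 for _, p in log if p == name))  # each policy: 4 workers
```

Every worker executes every interval under the same policy, which gives the consistent transitions listed above. Any worker that arrives early sits idle at the barrier until the slowest one arrives.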
Experimental Results
  • Parallelizing Compiler Based on Commutativity Analysis [PLDI’96]
  • Set of Complete Scientific Applications
    • Barnes-Hut N-Body Solver (1500 lines of C++)
    • Liquid Water Simulation Code (1850 lines of C++)
    • Seismic Modeling String Code (2050 lines of C++)
  • Different Lock Coarsening Policies
  • Dynamic Feedback
  • Performance on Stanford DASH Multiprocessor
Code Sizes

[Figure: Size of Text Segment (Kbytes), axis 0 to 60, for the Serial, Original, and Dynamic versions of Barnes-Hut, Water, and String]
Lock Overhead

Percentage of Time that the Single Processor Execution Spends Acquiring and Releasing Mutual Exclusion Locks

[Figure: Percentage Lock Overhead, axis 0 to 60. Barnes-Hut (16K Particles): Original, Bounded, Aggressive. Water (512 Molecules): Original, Bounded, Aggressive. String (Big Well Model): Original, Aggressive.]

Contention Overhead

Percentage of Time that Processors Spend Waiting to Acquire Locks Held by Other Processors

[Figure: Contention Percentage (0 to 100) vs. Processors (0 to 16) for the Aggressive, Bounded, and Original policies. Panels: Barnes-Hut (16K Particles), Water (512 Molecules), String (Big Well Model).]

Performance Results: Barnes-Hut

[Figure: Speedup (0 to 16) vs. Number of Processors (0 to 16) for Ideal, Aggressive, Dynamic Feedback, Bounded, and Original. Barnes-Hut on DASH (16K Particles).]

Performance Results: Water

[Figure: Speedup (0 to 16) vs. Number of Processors (0 to 16) for Ideal, Bounded, Dynamic Feedback, Original, and Aggressive. Water on DASH (512 Molecules).]

Performance Results: String

[Figure: Speedup (0 to 16) vs. Number of Processors (0 to 16) for Ideal, Original, Dynamic Feedback, and Aggressive. String on DASH (Big Well Model).]

Summary
  • Code Size Is Not An Issue
  • Lock Coarsening Has Significant Performance Impact
  • Best Lock Coarsening Policy Varies With Application
  • Dynamic Feedback Delivers Code With Performance Comparable to The Best Static Lock Coarsening Policy
Related Work
  • Adaptive Execution Techniques (Saavedra Park:PACT96)
  • Dynamic Dispatch Optimizations (Hölzle Ungar:PLDI94)
  • Dynamic Code Generation (Engler:PLDI96)
  • Profiling (Brewer:PPoPP95)
  • Synchronization Optimizations (Plevyak et al:POPL95)
Conclusions
  • Dynamic Feedback
    • Generated Code Adapts to Different Execution Environments
  • Integration with Parallelizing Compiler
    • Irregular Object-Based Programs
    • Pointer-Based Linked Data Structures
    • Commutativity Analysis
  • Evaluation with Three Complete Applications
    • Performance Comparable to Best Hand-Tuned Optimization
Performance Results: Barnes-Hut

[Figure: Speedup (0 to 16) vs. Number of Processors (0 to 16) for Ideal, Aggressive, Bounded, and Original. Barnes-Hut (16K Particles).]

Performance Results: Water

[Figure: Speedup (0 to 16) vs. Number of Processors (0 to 16) for Ideal, Bounded, Original, and Aggressive. Water (512 Molecules).]

Performance Results: String

[Figure: Speedup (0 to 16) vs. Number of Processors (0 to 16) for Ideal, Original, and Aggressive. String (Big Well Model).]

Policy Switch

[Figure: timeline labeled Timer Expires, Policy 1, Timer Expires, Policy 2]

Motivation

Challenges:

  • Match Best Implementation to Environment
  • Heterogeneous and Mobile Systems

Goal:

  • Develop Mechanisms to Support Code that Adapts to Environment Characteristics

Technique:

  • Dynamic Feedback
Overhead for Barnes-Hut

[Figure: Sampled Overhead (0 to 0.5) vs. Execution Time (0 to 25 Seconds) for the Original, Bounded, and Aggressive policies. Barnes-Hut on DASH (8 Processors), FORCES Loop, Data Set - 16K Particles.]

Overhead for Water

[Figure: Sampled Overhead (0 to 0.5) vs. Execution Time (0 to 60 Seconds) for the Original and Bounded policies. Water on DASH (8 Processors), INTERF Loop, Data Set - 512 Molecules.]

Overhead for Water

[Figure: Sampled Overhead (0 to 1) vs. Execution Time (0 to 60 Seconds) for the Aggressive and Original policies. Water on DASH (8 Processors), POTENG Loop, Data Set - 512 Molecules.]

Overhead for String

[Figure: Sampled Overhead (0 to 1) vs. Execution Time (0 to 500 Seconds) for the Aggressive and Original policies. String on DASH (8 Processors), PROJFWD Loop, Data Set - Big Well.]

Dynamic Feedback

[Figure: Overhead over Time, with the Code Version sequence Aggressive, Bounded, Original, Aggressive across a Sampling Phase, a Production Phase, and a Sampling Phase]