Assisting technologies for program parallelization

Chikayama/Taura Lab.

Masakazu HAYATSU

[email protected]


Agenda

  • Introduction

  • Difficulty of Program Parallelization

  • Assistant Tools for Program Parallelization

    • SUIF Explorer

    • S-Check

    • Ursa Minor

  • Conclusion


Introduction

  • Parallel computers are becoming widespread

    • Commercial machines with a very large number of processors

    • Low-end PCs with 2-4 processors

  • Performance

    • Uniprocessor speedup is slowing down

⇒ The importance of parallel programs keeps increasing


Difficulty of Program Parallelization

  • Dependencies

    • Deadlock

    • Data race

  • These problems must be avoided

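[Figure: tasks A and B access a shared variable X, shown with the values 100 and 1; the result of the unsynchronized access is uncertain (100? 1?).]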


Automatic Parallelization

  • Low performance

    • Parallelization techniques are fragile

    • Knowledge that is not in the code is often required

:
for (i = 0; i < N; i++) {
    a[f(i)] = 0;   // A
    a[g(i)] = 1;   // B
}
:
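
Whether the iterations of the loop above may run in parallel depends on what f and g return, which is exactly the kind of knowledge a compiler often cannot derive. The following is a minimal Python sketch, assuming hypothetical index functions f and g, that checks for a given N whether two different iterations write the same element of a.

```python
def conflicting_iterations(f, g, n):
    """Return pairs (i, j), i < j, of iterations that write the same element.

    Iteration i writes a[f(i)] (statement A) and a[g(i)] (statement B).
    If two different iterations touch the same index, reordering them may
    change the final contents of a, so the loop is not safely parallel.
    """
    writers = {}  # array index -> iterations that write it, in loop order
    for i in range(n):
        for idx in {f(i), g(i)}:
            writers.setdefault(idx, []).append(i)
    conflicts = set()
    for its in writers.values():
        for x in range(len(its)):
            for y in range(x + 1, len(its)):
                conflicts.add((its[x], its[y]))
    return sorted(conflicts)

# Hypothetical index functions, chosen only for illustration.
independent = conflicting_iterations(lambda i: 2 * i, lambda i: 2 * i + 1, 4)
dependent = conflicting_iterations(lambda i: i, lambda i: i + 1, 4)
print(independent)
print(dependent)
```

With f(i) = 2i and g(i) = 2i + 1 every iteration owns its own two elements, so no pair is reported; with f(i) = i and g(i) = i + 1 neighbouring iterations collide. A check like this only covers the inputs it is run on, which is why the programmer's knowledge of f and g matters.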


Development Process

[Figure: development cycle. Design & improve model → manually optimize program → run → validity check (data race, deadlock, …) and speedup evaluation → done; if a check fails (×), find the problems and repeat.]


Problem of Manual Parallelization

(define (RayTracing ViewPoint Vscan nref energy rgb)
  (if (<= nref 4)
      (let ((crashed? (tracer ViewPoint Vscan))) ; crashed?
        (if (and (not crashed?) (!= nref 0))
            (let* ((hl0
                    (fcsyn (f+ (f* (vector-ref Vscan 0) (vector-ref Light 0))
                               (f* (vector-ref Vscan 1) (vector-ref Light 1))
                               (f* (vector-ref Vscan 2) (vector-ref Light 2)))))
                   (hl (if (f< hl0 0.0) 0.0 hl0))
                   (ihl (f* hl hl hl energy (car beam))))
              (begin
                (vector-set! rgb 0 (f+ (vector-ref rgb 0) ihl))
                (vector-set! rgb 1 (f+ (vector-ref rgb 1) ihl))
                (vector-set! rgb 2 (f+ (vector-ref rgb 2) ihl)))))
        (if crashed?
            (let* ((P (cdr crashed?))  ; intersection point
                   (m (car crashed?))  ; crashed object
                   (NV (Get-NVector m Vscan P)))
              (let* ((br (fcsyn (f+ (f* (vector-ref NV 0) (vector-ref Light 0))
                                    (f* (vector-ref NV 1) (vector-ref Light 1))
                                    (f* (vector-ref NV 2) (vector-ref Light 2)))))
                     (br1 (if (f< br 0.0) 0.0 br))
                     (bright (if (and (car sh) (Shadow-Check-One-Or-Matrix (car or-Net) P))
                                 0.0 (f* (f+ br1 0.2) energy (vector-ref m 11)))))
                (begin (utexture m P)
                       (vector-set! rgb 0 (f+ (vector-ref rgb 0) (f* bright (vector-ref m 13))))
                       (vector-set! rgb 1 (f+ (vector-ref rgb 1) (f* bright (vector-ref m 14))))
                       (vector-set! rgb 2 (f+ (vector-ref rgb 2) (f* bright (vector-ref m 15))))

  • The user must fully understand many lines of code

It is prone to cause errors


Important Factors for an Assistant Tool

  • Assist program parallelization

    • Combine the benefits of automatic and manual parallelization

      • Automatic: can extract information systematically

      • Manual: can use high-level information

    • Extract information, and highlight what is important


Extraction of Parallelism

Candidates for parallelization:

( 0R-05-01, 0R-05-02, 0R-05-03 )

( 0R-0e-01, 0R-0e-02 )

( 0R-0t-02, 0R-0t-03 )

( 0R-0w-01, 0R-0w-02 )

;; quick : v = array to be sorted; left, right = range to sort
;; swap  : helper not shown on the slide, assumed to exchange v[i] and v[j]
(define (swap v i j)
  (let ((tmp (vector-ref v i)))
    (vector-set! v i (vector-ref v j))
    (vector-set! v j tmp)))

(define (quick v left right)
  (if (>= left right) v
      (let ((new-left left)
            (new-right right)
            (pivot (vector-ref v (floor (/ (+ left right) 2)))))
        (do () ((> new-left new-right))
          (do () ((>= (vector-ref v new-left) pivot))
            (set! new-left (+ new-left 1)))
          (do () ((<= (vector-ref v new-right) pivot))
            (set! new-right (- new-right 1)))
          (if (<= new-left new-right)
              (begin
                (swap v new-left new-right)
                (set! new-left (+ new-left 1))
                (set! new-right (- new-right 1)))))
        (begin (quick v left new-right)
               (quick v new-left right)))))

(quick (vector 4 5 3 1 4 0 5 6) 0 7)



Notice

  • Different approach

    • Our work: based on dependency analysis

    • Today’s survey: based on profile data

  • Profile data?

    • Isn't it enough if execution time is known?


Difficulty in Tuning a Parallel Program (1/2)

[Figure: execution time bar (100) with the parallel region marked (10%).]

  • Coverage

    • Percentage of total execution time spent in the parallel regions

    • Amdahl’s law

  • Granularity

    • Average length of computation between synchronizations

    • Overhead of communication, synchronization
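
Amdahl's law makes the effect of coverage concrete. The sketch below is a small Python illustration; the function name and the sample numbers are chosen here for illustration only, and the formula ignores the communication and synchronization overhead that granularity is about.

```python
def amdahl_speedup(coverage, processors):
    """Upper bound on speedup when `coverage` (0..1) of the run time is parallel."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial = 1.0 - coverage
    # The serial part is unchanged; the parallel part is divided among processors.
    return 1.0 / (serial + coverage / processors)

for p in (2, 4, 16):
    print(p, round(amdahl_speedup(0.9, p), 2))
```

Even with 90% coverage the bound stays below 10 however many processors are added, which is why raising coverage is the first tuning goal.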


Difficulty in Tuning a Parallel Program (2/2)

  • Critical Path

    • Simple consumption of resources does not mean that there is a corresponding potential for improvement

[Figure label: top resource-using code segment]


Assistant Tools for Program Parallelization

  • SUIF Explorer

    • Coverage and granularity

  • S-Check

    • Effect of a change on overall performance

  • Ursa Minor

    • Experienced programmers' knowledge


SUIF Explorer [Liao, et al. 1999]

  • Objective

    • Identify the important loops

  • Rules of thumb

    • Most of a program’s execution time is spent on a small percentage of the code

    • Most of a program’s execution time is spent on loops


The SUIF Explorer System

[Figure: system overview. The user works on a sequential program in three steps.]

  1. Automatic parallelization (Parallelizing Compiler)

  2. Collecting profile & dynamic dependences (Execution Analyzers)

  3. Guidance to improving program performance (Parallelization Guru, Rivet Visualizer)


The Parallelization Guru (1/2)

  • Parallelization guidance

    • The coverage and granularity

      • Updates the information as new loops are parallelized

    • A list of loops to parallelize

      • Sorted in order of execution time

      • Contain no I/O and are not nested inside a parallel loop

    • Dependence information on each loop
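
The loop list described above can be sketched in a few lines of Python. This is only an illustration of the filtering and sorting rule; the `Loop` record, its field names and the sample timings are hypothetical and are not taken from SUIF Explorer.

```python
from dataclasses import dataclass

@dataclass
class Loop:
    name: str
    exec_time: float                    # profiled execution time
    has_io: bool = False                # loop body performs I/O
    inside_parallel_loop: bool = False  # nested under a parallel loop
    parallel: bool = False              # already parallelized

def candidate_loops(loops):
    """Sequential loops worth showing to the user, most expensive first."""
    eligible = [l for l in loops
                if not l.parallel and not l.has_io and not l.inside_parallel_loop]
    return sorted(eligible, key=lambda l: l.exec_time, reverse=True)

def coverage(loops, total_time):
    """Fraction of total execution time spent in outermost parallel loops."""
    parallel_time = sum(l.exec_time for l in loops
                        if l.parallel and not l.inside_parallel_loop)
    return parallel_time / total_time

# Hypothetical profile of a program that runs for 40 time units.
loops = [
    Loop("init", 1.0),
    Loop("solve", 12.0),
    Loop("dump", 5.0, has_io=True),
    Loop("inner", 8.0, inside_parallel_loop=True),
    Loop("outer", 9.0, parallel=True),
]
ranked = [l.name for l in candidate_loops(loops)]
print(ranked, coverage(loops, 40.0))
```

Recomputing `coverage` after a loop is marked parallel mirrors the point that the Guru updates its figures as new loops are parallelized.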


The Parallelization Guru (2/2)

  • User interaction

    • The user starts with the loop at the top of the list

    • If the loop has many dependences, the user may choose not to attempt it

    • Otherwise, the user determines, using a program slice,

      • whether a static dependence can be ignored

      • whether an array can be privatized, etc.


Program Slice

[Figure: a program slice, the statements that contribute to the value of a selected variable.]
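
To make the idea of a slice concrete, here is a minimal Python sketch of a backward slice. It assumes straight-line code only (no branches, loops or procedure calls), so it is far simpler than the slicer in SUIF Explorer; the statement labels and the tuple layout are invented for the example.

```python
def backward_slice(statements, target):
    """Labels of the statements that contribute to the final value of `target`.

    statements: list of (label, defined_variable, used_variables) in program
    order, describing straight-line code.
    """
    relevant = {target}   # variables whose defining statement is still needed
    in_slice = []
    for label, defined, used in reversed(statements):
        if defined in relevant:
            in_slice.append(label)
            relevant.discard(defined)   # this definition supplies the value
            relevant.update(used)       # now its operands are needed instead
    return list(reversed(in_slice))

program = [
    ("S1", "n", []),                    # n = 10
    ("S2", "total", []),                # total = 0
    ("S3", "count", []),                # count = 0
    ("S4", "total", ["total", "n"]),    # total = total + n
    ("S5", "count", ["count"]),         # count = count + 1
]
print(backward_slice(program, "total"))
print(backward_slice(program, "count"))
```

The slice for `total` leaves out S3 and S5, so a user checking a dependence on `total` has fewer statements to read.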


The Parallelization Guru

  • Comment

    • Performance data and dependence information are closely linked ⇒ this cuts down development cost

    • It is applicable only to loops


S-Check [Snelick 1997]

  • Objective

    • Identify the parts of the program where changes will significantly improve overall performance

  • Effect prediction

    • Determine the effect of changes in the code without actually making the changes


Sensitive Checker

  • Insert a “delay” into segments of a parallel program and calculate the sensitivity to the perturbation

  • Assumption

    • If a code segment is highly sensitive to slight perturbations ⇒ comparable improvements to that segment will boost performance correspondingly


Program Model

  • Code = Transfer Function

    • Taylor expansion

      • βj := how sensitive execution is to code segment j

      • βi,j := interaction between code segments i and j
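
One way to write such an expansion is given below. It is an assumption rather than the slide's exact formula: T stands for the execution time and x_j for the delay inserted into code segment j, and only the symbols βj and βi,j come from the slide.

```latex
T(x_1,\dots,x_n) \approx \beta_0
  + \sum_{j} \beta_j\, x_j
  + \sum_{i<j} \beta_{i,j}\, x_i x_j
  + \dots
```

Under this reading, β_0 is the unperturbed execution time, a large β_j marks a segment whose delay strongly affects the total, and β_{i,j} captures how two segments influence each other.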


[Figure: the S-Check procedure]

  1. Mark possible bottlenecks in the original parallel program

    while(x>y){             // A
    }
    send(…);                // B
    ・・・・・・
    do_computation{…};      // C

  2. Insert delays (1: ON / 0: OFF)

    while(x>y){             // A
        delay(a);
    }
    delay(b); send(…);      // B
    ・・・・・・
    do_computation{delay(c); …};    // C

  3. Generate & run numerous versions of the program, each with a different ON/OFF pattern, e.g. delay(0), delay(0), delay(1)

  4. Analyze results and solve for effects

    Effects  Source
    0.44     A
    4.54     B
    0.07     AB
    1.21     C
    0.02     BC
    0.34     AC
    0.00     ABC
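
The "solve for effects" step can be illustrated with a standard two-level full factorial analysis. The Python sketch below is written under that assumption; whether S-Check computes its effects in exactly this way is not stated on the slides, and `fake_run` with its numbers is a synthetic stand-in for running the instrumented program.

```python
from itertools import combinations, product

def factorial_effects(factors, run_time):
    """Estimate main and interaction effects from all 2^k ON/OFF delay patterns.

    factors:  names of the marked code segments, e.g. ["A", "B", "C"]
    run_time: function mapping {name: 0 or 1} (delay OFF/ON) to a measured time
    """
    k = len(factors)
    runs = []
    for setting in product((0, 1), repeat=k):
        config = dict(zip(factors, setting))
        runs.append((config, run_time(config)))
    effects = {}
    for size in range(1, k + 1):
        for term in combinations(factors, size):
            total = 0.0
            for config, t in runs:
                sign = 1
                for name in term:
                    sign *= 1 if config[name] else -1   # +1 for ON, -1 for OFF
                total += sign * t
            effects["".join(term)] = total / 2 ** (k - 1)
    return effects

def fake_run(config):
    # Hypothetical timing model: B dominates, and A and B interact.
    return (10.0 + 0.5 * config["A"] + 4.0 * config["B"] + 1.0 * config["C"]
            + 2.0 * config["A"] * config["B"])

effects = factorial_effects(["A", "B", "C"], fake_run)
for source, value in sorted(effects.items(), key=lambda kv: -kv[1]):
    print(f"{value:5.2f} {source}")
```

The number of runs doubles with every marked segment, which is the trade-off between information and number of runs that the user has to manage.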


User Interaction (1/3)

  • Test code locations are selected manually or automatically

  • Information provided by the profiler

    • Programming constructs (e.g. while, for)

    • Certain library function calls (e.g. barrier(), send())


User Interaction (2/3)

  • Set the parameters

    • Delay perturbation patterns

    • Delay value

  • Trade-off: amount of information vs. number of runs


User Interaction (3/3)

  • Code with a higher effect is more likely to be a bottleneck

  • Dependences are not dealt with


S-Check

  • Comment

    • Identifies the program segments that are directly linked to performance

    • Knowledge about the program is required in order to mark possible bottlenecks

    • As the code size grows, the sensitivity test takes longer

    • Dependence information is not available


Ursa Minor [Kim, et al. 2000]

  • Objective

    • × Stop at pointing to problematic code

    • 〇 Present possible causes and solutions

    • Transfer knowledge from experienced programmers to novice programmers


Ursa Minor System

[Figure: system overview]

  • Input: a parallel program with its static data and dynamic data; import/export of data files from Polaris or other tools

  • Database Manager: stores analyzed data, Map file, etc. in the database

  • Merlin Performance Advisor: analyzes problems and suggests solutions

  • GUI Manager: Table View and Structure View for the user


Merlin Performance Advisor

  • Knowledge database

    • knowledge on diagnosis and solutions

    • Transfer programming experience from experts to new users (with “MAP” file)

      • Performance model

      • Architecture … etc.


Merlin

[Figure: Symptom ⇒ Diagnostic, Suggestions]


Advisor Map (1/2)

  • Advisor Map

    • Problem Domain

      • General performance problems from the viewpoint of programmers

    • Diagnostics Domain

      • Possible causes of these problems

    • Solution Group

      • Possible remedies


Advisor Map (2/2)


Expression Evaluator

  • Basic Spreadsheet Operations

    • Numeric Functions: NEG, ADD, SPDUP, PERCO, ARVG, etc.

    • Relational Functions: EQ, NE, etc.

    • Query Functions: PARALLEL, HASIO, HASCALL, HASDEP, etc.

    • Logical Functions: AND, OR, etc.
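
The way an advisor map and the expression evaluator could work together is sketched below in Python. Everything here is hypothetical: the record fields, the rule layout and the semantics given to PARALLEL, HASIO, HASDEP and SPDUP are guesses made for illustration and do not reproduce Merlin's actual MAP file format.

```python
# Hypothetical per-loop record; the field names are assumptions.
loop = {"name": "LOOP_A", "parallel": False, "has_io": False, "has_dep": True,
        "serial_time": 8.0, "parallel_time": 8.0}

# Query and numeric functions named after the slide; semantics are guessed.
def PARALLEL(l): return l["parallel"]
def HASIO(l):    return l["has_io"]
def HASDEP(l):   return l["has_dep"]
def SPDUP(l):    return l["serial_time"] / l["parallel_time"]

# Problem domain -> diagnostics, each as (condition, diagnosis, suggestion).
ADVISOR_MAP = {
    "loop is not parallel": [
        (lambda l: not PARALLEL(l) and HASIO(l),
         "loop contains I/O", "move the I/O out of the loop"),
        (lambda l: not PARALLEL(l) and HASDEP(l),
         "loop carries a dependence", "check whether the dependence is real"),
    ],
    "poor speedup": [
        (lambda l: PARALLEL(l) and SPDUP(l) < 1.5,
         "parallel loop gains little", "look at granularity and overhead"),
    ],
}

def advise(problem, l):
    """Return (diagnosis, suggestion) pairs whose condition holds for loop l."""
    return [(diag, fix) for cond, diag, fix in ADVISOR_MAP[problem] if cond(l)]

advice = advise("loop is not parallel", loop)
print(advice)
```

The conditions play the role of the spreadsheet-style expressions, while the map carries the expert's knowledge, so the quality of the advice is bounded by the quality of the rules.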


Merlin

  • Comment

    • The idea goes a step further than merely indicating a bottleneck

    • Who writes the “MAP”?

    • The effectiveness of this technology depends on the quality of the MAP


Comparison

  • SUIF Explorer vs. S-Check

    • No configuration, dependence information

    • Efficiency?

  • The two above vs. Ursa Minor

    • Practical

    • Not kind to beginners


Conclusion

  • Several approaches guide the user with smart information

  • Future work

    • Integration

      • Profiler and Dependence Analyzer

    • Portability

      • Different architectures, OSs, performance

