Automatic Optimization in Parallel Dynamic Programming Schemes

Juan-Pedro Martínez

Departamento de Estadística y Matemática Aplicada

Universidad Miguel Hernández de Elche, Spain

jp.martinez@uhm.es

Domingo Giménez

Departamento de Informática y Sistemas

Universidad de Murcia, Spain

domingo@dif.um.es

dis.um.es/~domingo

VECPAR 2004

Our Goal

General Goal: to obtain parallel routines with autotuning capacity

  • Previous work: Linear Algebra Routines
  • This work: Parallel Dynamic Programming Schemes
  • In the future: apply the techniques to hybrid, heterogeneous and distributed systems

Outline
  • Modelling Parallel Routines for Autotuning
  • Parallel Dynamic Programming Schemes
  • Autotuning in Parallel Dynamic Programming Schemes
  • Experimental Results

Modelling Parallel Routines for Autotuning

It is necessary to predict the execution time accurately, and to select:

  • The number of processes
  • The number of processors
  • Which processors
  • The number of rows and columns of processes (the topology)
  • The assignment of processes to processors
  • The computational block size (in linear algebra algorithms)
  • The communication block size
  • The algorithm (polyalgorithms)
  • The routine or library (polylibraries)

Modelling Parallel Routines for Autotuning

Cost of a parallel program:

t(n, p) = t_arit + t_com + t_over − t_overlap

  • t_arit: arithmetic time
  • t_com: communication time
  • t_over: overhead, for synchronization, imbalance, processes creation, ...
  • t_overlap: overlapping of communication and computation
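In Python, this additive model is simply the following (the symbol names mirror the terms above; the numeric values in the example are illustrative):

```python
def parallel_cost(t_arit, t_com, t_over, t_overlap):
    """Modeled time of a parallel program: arithmetic time plus
    communication time plus overhead, minus the part of the
    communication that overlaps with computation."""
    return t_arit + t_com + t_over - t_overlap

# Example: 10 s of arithmetic, 3 s of communication, 1 s of
# overhead, of which 2 s overlap with computation.
print(parallel_cost(10.0, 3.0, 1.0, 2.0))  # → 12.0
```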

Modelling Parallel Routines for Autotuning

Estimation of the time:

The computation and the communication are considered divided into a number of steps, and the estimated time is the sum over the steps, taking, for each part of the formula, the value of the process which gives the highest value.
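A sketch of this step-wise estimation, with illustrative per-process times: the total is the sum over steps, and within each step each part of the formula takes the value of the most loaded process:

```python
def estimate_time(arith, comm):
    """arith[s][p], comm[s][p]: arithmetic / communication time of
    process p in step s.  For each step, each part of the formula
    contributes the value of the process with the highest cost."""
    total = 0.0
    for step_arith, step_comm in zip(arith, comm):
        total += max(step_arith) + max(step_comm)
    return total

# Two steps, three processes:
arith = [[1.0, 2.0, 1.5], [2.0, 1.0, 1.0]]
comm = [[0.5, 0.2, 0.4], [0.3, 0.6, 0.1]]
print(estimate_time(arith, comm))  # ≈ 5.1 (2.5 + 2.6)
```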

Modelling Parallel Routines for Autotuning

The time depends on the problem size (n) and the system size (p),

but also on some ALGORITHMIC PARAMETERS, like the block size (b) and the number of processors (q) used out of the total available.

Modelling Parallel Routines for Autotuning

And on some SYSTEM PARAMETERS, which reflect the computation and communication characteristics of the system:

typically the cost of an arithmetic operation (tc), and the start-up (ts) and word-sending (tw) times.

Modelling Parallel Routines for Autotuning

The values of the System Parameters could be obtained

  • With installation routines associated to the routine we are installing
  • From information stored when the library was installed in the system
  • At execution time by testing the system conditions prior to the call to the routine

Modelling Parallel Routines for Autotuning

These values can be obtained as simple values (the traditional method) or as functions of the Algorithmic Parameters.

In the latter case, a multidimensional table of values, indexed by the problem size and the Algorithmic Parameters, is stored.

When a problem of a particular size is solved, the execution time is estimated with the values of the stored size closest to the real size,

and the problem is solved with the values of the Algorithmic Parameters which predict the lowest execution time.
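This table-based selection can be sketched as follows (the stored values and the toy cost model are assumptions for illustration):

```python
def select_ap(stored, n, model):
    """stored: {problem_size: {ap_value: system_param}}.
    Use the stored size closest to n, then pick the Algorithmic
    Parameter value whose predicted execution time is lowest."""
    closest = min(stored, key=lambda size: abs(size - n))
    candidates = stored[closest]
    return min(candidates, key=lambda ap: model(n, ap, candidates[ap]))

# Illustrative table: a system parameter (e.g. tc) measured for
# two problem sizes and three block sizes b.
stored = {1000: {32: 1.2e-8, 64: 1.0e-8, 128: 1.1e-8},
          10000: {32: 1.1e-8, 64: 0.9e-8, 128: 1.3e-8}}
model = lambda n, b, tc: tc * n ** 2  # toy cost model
print(select_ap(stored, 8000, model))  # → 64
```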

Parallel Dynamic Programming Schemes
  • There are different Parallel Dynamic Programming Schemes.
  • The simple scheme of the “coins problem” is used:
    • Given a quantity C and n types of coins of values v=(v1,v2,…,vn), with quantities q=(q1,q2,…,qn) available of each type, minimize the number of coins used to give C.
    • The granularity of the computation is varied, to study the scheme and not the particular problem.

[Figure: the dynamic programming table, with rows 1, 2, ..., i, ..., n (decisions) and columns 1, 2, ..., j, ..., N (problem sizes)]

Parallel Dynamic Programming Schemes
  • Sequential scheme:

    for i = 1 to number_of_decisions
      for j = 1 to problem_size
        obtain the optimum solution with i decisions and problem size j
      endfor
      complete the table with the formula
    endfor
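Instantiated for the coins problem, the sequential scheme can be sketched as a runnable Python version; the table update below is the standard bounded-coins recurrence, an assumption, since the slide's own formula is an image:

```python
INF = float("inf")

def min_coins(C, v, q):
    """Minimum number of coins needed to give quantity C, with
    q[i] coins of value v[i] available (bounded coins problem)."""
    n = len(v)
    table = [0] + [INF] * C              # row for zero coin types
    for i in range(n):                   # one decision per coin type
        row = [INF] * (C + 1)
        for j in range(C + 1):           # one column per problem size
            best = INF
            for k in range(q[i] + 1):    # use k coins of type i
                if k * v[i] > j:
                    break
                if table[j - k * v[i]] + k < best:
                    best = table[j - k * v[i]] + k
            row[j] = best
        table = row                      # keep only the previous row
    return table[C]

print(min_coins(11, [1, 2, 5], [5, 5, 5]))  # → 3  (5 + 5 + 1)
```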

[Figure: the dynamic programming table, with columns 1, ..., j, ..., N distributed among processes P0, P1, P2, ..., PS, ..., PK-1, PK]

Parallel Dynamic Programming Schemes
  • Parallel scheme:

    for i = 1 to number_of_decisions
      In Parallel:
        for j = 1 to problem_size
          obtain the optimum solution with i decisions and problem size j
        endfor
      endInParallel
    endfor
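The "In Parallel" construct can be sketched with a shared-memory thread pool; the use of ThreadPoolExecutor and the interleaved chunking of columns among workers are assumptions, not the authors' implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_dp(n_decisions, problem_size, solve_cell, workers=4):
    """For each decision i, compute all columns j of the current row
    in parallel; each row depends only on the previous row."""
    prev = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(n_decisions):
            # each worker fills one interleaved chunk of columns
            def chunk(js, i=i, prev=prev):
                return [(j, solve_cell(i, j, prev)) for j in js]
            cols = range(problem_size + 1)
            chunks = [cols[k::workers] for k in range(workers)]
            row = [None] * (problem_size + 1)
            for part in pool.map(chunk, chunks):
                for j, val in part:
                    row[j] = val
            prev = row
    return prev

# Toy cell function: cumulative sums, row by row.
result = parallel_dp(3, 4, lambda i, j, prev: j if prev is None else prev[j] + j)
print(result)  # → [0, 3, 6, 9, 12]
```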


Parallel Dynamic Programming Schemes
  • Message-passing scheme:

    In each processor Pj:
      for i = 1 to number_of_decisions
        communication step
        obtain the optimum solution with i decisions and the problem sizes Pj has assigned
      endfor
    endInEachProcessor
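The data distribution behind this scheme can be sketched as follows: each process gets a contiguous, nearly equal block of problem sizes, as in the P0 ... PK figure. A real implementation would exchange the needed table entries with MPI in the communication step; only the partition itself is sketched here:

```python
def block_partition(problem_size, K):
    """Assign problem sizes 1..problem_size to K processes in
    contiguous, nearly equal blocks."""
    sizes = list(range(1, problem_size + 1))
    base, extra = divmod(len(sizes), K)
    blocks, start = [], 0
    for p in range(K):
        # the first `extra` processes get one element more
        width = base + (1 if p < extra else 0)
        blocks.append(sizes[start:start + width])
        start += width
    return blocks

print(block_partition(10, 3))  # → [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
```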

[Figure: columns 1..N distributed in blocks among processes P0, P1, P2, ..., PK-1, PK]


Autotuning in Parallel Dynamic Programming Schemes
  • Theoretical model (cost of one step in each process Pp):
    • Sequential cost
    • Computational parallel cost (for large qi)
    • Communication cost
  • The only AP is p
  • The SPs are tc, ts and tw
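With p as the only Algorithmic Parameter, autotuning reduces to evaluating the modeled time for each candidate p and keeping the best. A sketch under an assumed model shape (the slide's exact formulas are images, so the shape below, computation dividing by p plus a per-step start-up and per-word cost, is an assumption):

```python
def select_p(C, n, tc, ts, tw, max_p):
    """Pick the number of processes p that minimizes the modeled
    time: the computation divides by p, but each of the n steps
    pays a start-up ts plus tw per word sent (assumed model)."""
    def model(p):
        comp = tc * n * C / p
        comm = 0.0 if p == 1 else n * (ts + tw * C / p)
        return comp + comm
    return min(range(1, max_p + 1), key=model)

# Cheap communication favours many processes:
print(select_p(C=100000, n=10, tc=1e-7, ts=1e-5, tw=1e-9, max_p=8))  # → 8
# Expensive start-ups favour running sequentially:
print(select_p(C=1000, n=10, tc=1e-7, ts=1e-1, tw=1e-6, max_p=8))  # → 1
```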

Autotuning in Parallel Dynamic Programming Schemes
  • How to estimate arithmetic SPs:

Solving a small problem

  • How to estimate communication SPs:
    • Using a ping-pong (CP1)
    • Solving a small problem varying the number of processors (CP2)
    • Solving problems of selected sizes in systems of selected sizes (CP3)
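Estimating the arithmetic SP tc by solving a small problem can be sketched as follows; the benchmark loop and its operation count are assumptions, not the installation routine used in the paper:

```python
import time

def estimate_tc(small_work=200000):
    """Time a small arithmetic benchmark and divide by its
    operation count to approximate tc, the cost per operation."""
    t0 = time.perf_counter()
    acc = 0.0
    for i in range(small_work):
        acc += i * 0.5  # one multiply-add per iteration
    elapsed = time.perf_counter() - t0
    return elapsed / small_work

tc = estimate_tc()
print(f"estimated tc = {tc:.2e} s per operation")
```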

Experimental Results
  • Systems:
    • SUNEt: five SUN Ultra 1 and one SUN Ultra 5 (2.5 times faster) + Ethernet
    • PenFE: seven Pentium III + FastEthernet
  • Varying:
    • The problem size C = 10000, 50000, 100000, 500000
    • Large value of qi
    • The granularity of the computation (the cost of a computational step)

Experimental Results
  • CP1:
    • ping-pong (point-to-point communication).
    • Does not reflect the characteristics of the system
  • CP2:
    • Executions with the smallest problem (C =10000) and varying the number of processors
    • Reflects the characteristics of the system, but the time also changes with C
    • Larger installation time (6 and 9 seconds)
  • CP3:
    • Executions with selected problem (C =10000, 100000) and system (p =2, 4, 6) sizes, and linear interpolation for other sizes
    • Larger installation time (76 and 35 seconds)
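The linear interpolation CP3 uses for sizes between the selected ones can be sketched as follows (the measured times in the example are illustrative, not the paper's data):

```python
def interp(x, points):
    """Linear interpolation of a measured value (e.g. execution
    time) between the selected sizes; clamps outside the range."""
    points = sorted(points)
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Illustrative times measured at the selected sizes C = 10000, 100000:
measured = [(10000, 0.5), (100000, 4.1)]
print(interp(55000, measured))  # → 2.3
```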


Experimental Results

[Table: parameter selection with each method, on SUNEt and PenFE]

Experimental Results
  • Quotient between the execution time with the parameter selected by each of the selection methods and the lowest execution time, in SUNEt:

Experimental Results
  • Quotient between the execution time with the parameter selected by each of the selection methods and the lowest execution time, in PenFE:

Experimental Results
  • Three types of users are considered:
    • GU (greedy user): uses all the available processors.
    • CU (conservative user): uses half of the available processors.
    • EU (expert user): uses a number of processors depending on the granularity:
      • 1 for low granularity
      • half of the available processors for middle granularity
      • all the processors for high granularity
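The three policies can be written down directly (a sketch; the function name is an assumption, the granularity labels follow the slide):

```python
def user_choice(user, granularity, available):
    """Number of processors each user type would use, following
    the GU / CU / EU policies described above."""
    if user == "GU":                 # greedy: everything
        return available
    if user == "CU":                 # conservative: half
        return available // 2
    # EU adapts the choice to the granularity of the computation
    return {"low": 1,
            "middle": available // 2,
            "high": available}[granularity]

print(user_choice("EU", "middle", 8))  # → 4
```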

Experimental Results
  • Quotient between the execution time with the parameter selected by each type of user and the lowest execution time, in SUNEt:

Experimental Results
  • Quotient between the execution time with the parameter selected by each type of user and the lowest execution time, in PenFE:

Conclusions and future work
  • The inclusion of autotuning capabilities in a Parallel Dynamic Programming Scheme has been considered.
  • Different ways of modelling the scheme, and of selecting the parameters, have been studied.
  • Experimentally, the selection proves satisfactory, and useful in providing users with routines capable of reduced execution times.
  • In the future we plan to apply this technique
    • to other algorithmic schemes,
    • in hybrid, heterogeneous and distributed systems.
