Friday, October 06, 2006

Measure twice, cut once.

  • Carpenter’s Motto
Sources of overhead
  • Inter-process communication
  • Idling
  • Replicated computation
  • Ts: The original single-processor serial time.
  • Tis: The additional serial time spent on average for
    • Inter-processor communications
    • Setup
    • Depends on N.
  • Tp: The original single-processor parallelizable time.
  • Tip: The additional time spent on average by each processor for
    • Setup
    • Idle time
Simplified expression
  • S(N) = (Ts + Tp) / (Ts + N*Tis + Tp/N + Tip)


Straight line reference for linear speedup

Ts=10, Tip=1, Tis=0

Communication time is negligible compared to computation; this is what you would expect from Amdahl's law alone.


Ts=10, Tip=1, Tis=10

Adding a small serial time per processor: adding more processors now results in lower speedup.


Ts=10, Tip=1, Tis=1

Quadratic N dependence, e.g. every processor speaks to all others.
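The model is easy to explore numerically. Below is a minimal C sketch (not from the slides) that evaluates S(N) for the three parameter sets shown on the last three slides. Tp is not given on the slides, so the value used here (Tp = 1000) is only an illustrative assumption, and the "quadratic" case models all-to-all communication by replacing the N*Tis term with N*N*Tis, which is one reading of the slide above.

#include <stdio.h>

/* Speedup model from the slides:
 *   S(N) = (Ts + Tp) / (Ts + N*Tis + Tp/N + Tip)
 * quadratic != 0 replaces the N*Tis term with N*N*Tis
 * (one way to model "every processor speaks to all others"). */
static double speedup(double Ts, double Tp, double Tis, double Tip,
                      double N, int quadratic)
{
    double comm = quadratic ? N * N * Tis : N * Tis;
    return (Ts + Tp) / (Ts + comm + Tp / N + Tip);
}

int main(void)
{
    double Tp = 1000.0;   /* assumed value; the slides do not give Tp */
    int N;

    printf("   N  Tis=0   Tis=10  Tis=1 (quadratic)\n");
    for (N = 1; N <= 64; N *= 2) {
        printf("%4d  %6.2f  %6.2f  %6.2f\n",
               N,
               speedup(10.0, Tp, 0.0, 1.0, N, 0),    /* slide with Tis=0  */
               speedup(10.0, Tp, 10.0, 1.0, N, 0),   /* slide with Tis=10 */
               speedup(10.0, Tp, 1.0, 1.0, N, 1));   /* quadratic, Tis=1  */
    }
    return 0;
}

With these numbers, the Tis=0 curve climbs toward the Amdahl limit of (Ts+Tp)/(Ts+Tip), while the curves with non-zero Tis peak at a moderate N and then fall, matching the behaviour described on the slides.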

Adding processors won’t provide additional speedup unless the problem is scaled up as well.
  • Calculations with a small Tp/Tis ratio should not be distributed over a large number of processors.
Scaling a problem
  • Does number of tasks scale with the problem size?
  • Increase in problem size should increase the number of tasks rather than the size of individual tasks.
    • Should be able to solve larger problems when more processors are available.
What can we tell from our observations?
  • We implemented an algorithm on parallel computer X and achieved a speedup of 10.8 on 12 processors with problem size N=100.
What can we tell from our observations?
  • We implemented an algorithm on parallel computer X and achieved a speedup of 10.8 on 12 processors with problem size N=100.
  • Region of observation is too narrow.
  • What if N=10 or N=1000?
What can we tell from our observations?
  • T is the execution time, P is the number of processors, and N is the problem size
  • T = N + N²/P
  • T = (N + N²)/P + 100
  • T = (N + N²)/P + 0.6 P²

All three algorithms achieve a speedup of about 10.8 when P=12 and N=100.
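A quick calculation makes the point. The C sketch below (not from the slides) computes the speedup of each model at N=100 for several processor counts, taking the serial reference time to be N + N², i.e. the P=1 cost of the first model; this reference is an assumption, since the slide does not state it.

#include <stdio.h>

/* The three execution-time models from the slide. */
static double t1(double N, double P) { return N + N * N / P; }
static double t2(double N, double P) { return (N + N * N) / P + 100.0; }
static double t3(double N, double P) { return (N + N * N) / P + 0.6 * P * P; }

int main(void)
{
    double N = 100.0;
    double Tserial = N + N * N;   /* assumed serial reference: P = 1 cost of model 1 */
    double Ps[] = { 12.0, 100.0, 1000.0 };
    int i;

    printf("    P   model1  model2  model3  (speedup, N = 100)\n");
    for (i = 0; i < 3; i++) {
        double P = Ps[i];
        printf("%5.0f  %7.2f %7.2f %7.2f\n",
               P, Tserial / t1(N, P), Tserial / t2(N, P), Tserial / t3(N, P));
    }
    return 0;
}

At P=12 all three give roughly 10.8, but at P=1000 the first two still give a speedup near 92 while the third drops below 1, so the single observation at P=12 says little about behaviour elsewhere.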

Addition example

Speedup:

  • Ratio of the time taken to solve a problem on a single processor to the time required to solve it on a parallel computer with p identical processing elements
  • Speedup for the addition example? (See the worked sketch after the cost-optimality slides below.)
Speedup:
  • Comparison should be made with the best known serial algorithm.
Efficiency:

The fraction of time for which a processor does useful work.

E = S/p

Cost:

The product of the parallel runtime and the number of processors.

Cost = p * Tp

(Note: Tp here refers to the parallel runtime, i.e. the time from the moment the parallel computation starts until the moment the last processing element finishes execution.)

Cost optimal:

A parallel system is cost-optimal if the cost of solving a problem on the parallel computer has the same asymptotic growth, as a function of input size, as the fastest known sequential algorithm on a single processor.

Cost for the addition example: O(n log n)

Not cost optimal, since the best serial algorithm runs in O(n) time.
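To make the speedup, efficiency, and cost definitions concrete, here is a small C sketch (not from the slides) that works through the addition example under the usual assumptions: n numbers are summed on p = n processing elements, the serial time grows as n, and the parallel time grows as log2(n). These assumptions are consistent with the O(n log n) cost quoted above, but the constants are illustrative.

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Addition example: n numbers on p = n processing elements.
     * Assumed model: serial time ~ n, parallel time ~ log2(n). */
    double n;

    printf("        n   speedup  efficiency   cost (~ n log n)\n");
    for (n = 16.0; n <= 16384.0; n *= 4.0) {
        double Tserial = n;            /* ~n serial additions      */
        double Tparallel = log2(n);    /* ~log2(n) parallel steps  */
        double S = Tserial / Tparallel;
        double E = S / n;              /* p = n processing elements */
        double cost = n * Tparallel;   /* p * Tp                    */
        printf("%9.0f  %8.1f  %10.3f  %16.0f\n", n, S, E, cost);
    }
    return 0;
}

The speedup grows as n / log n, the efficiency falls as 1 / log n, and the cost grows as n log n, which is why this formulation is not cost-optimal against the O(n) serial algorithm.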

If the overhead increases sub-linearly with respect to the problem size, efficiency can be kept fixed by increasing both the problem size and the number of processors.

Scalable parallel systems

The ability to utilize an increasing number of processing elements effectively.

Scalability and cost-optimality are related

A scalable system can always be made cost-optimal if the number of processing elements and the size of the problem are chosen carefully.
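As an illustration (not worked out on the slides), consider the addition example again with p processing elements and n > p numbers, and assume the common textbook model Tparallel ≈ n/p + 2 log2(p). The C sketch below shows that growing the problem as n proportional to p log2(p) holds the efficiency constant, which is the kind of careful co-scaling of problem size and processor count referred to above.

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Assumed model for adding n numbers on p processors (n > p):
     *   Tserial   ~ n
     *   Tparallel ~ n/p + 2*log2(p)
     * Efficiency E = Tserial / (p * Tparallel).
     * Growing n in proportion to p*log2(p) keeps E fixed. */
    double p;

    printf("     p          n   efficiency\n");
    for (p = 4.0; p <= 4096.0; p *= 4.0) {
        double n = 8.0 * p * log2(p);            /* scaled problem size */
        double Tparallel = n / p + 2.0 * log2(p);
        double E = n / (p * Tparallel);
        printf("%6.0f  %9.0f  %10.3f\n", p, n, E);
    }
    return 0;
}

With n = 8 p log2(p) the efficiency stays at 0.8 for every p; with n held fixed it would fall steadily as p grows.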

Speedup Anomalies
  • Speedup that is greater than linear: Super-linear
Speedup Anomalies

Cache effects.

  • Each processor has a small amount of cache
  • When a problem is executed on a greater number of processors, more of its data can be placed in cache and as a result, total computation time will tend to decrease.
  • If the reduction in computation time due to this cache effect offsets the increase in communication and idle time from the use of additional processors, then super-linearity results.
  • Similarly, the increased physical memory available in a multiprocessor may reduce the cost of memory accesses by avoiding the need for virtual memory paging.
Speedup Anomalies

Search anomalies.

If a search tree contains solutions at varying depths, then multiple depth-first searches will, on average, explore fewer tree nodes before finding a solution than will a sequential depth-first search.

Message Passing
  • Partitioned address space
  • Data explicitly decomposed and placed by programmer
  • Locality of access.
  • Cooperation needed for send and receive operations.
  • Structured and static requirements
Message Passing
  • Most message-passing programs are written using the SPMD (single program, multiple data) model.
Message Passing
  • The need for a standard.
The Message Passing Interface (MPI) standard is the de facto industry standard for parallel applications.
    • Designed by leading industry and academic researchers
  • MPI is a library that is widely used to parallelize scientific and compute-intensive programs.
LAM (Indiana University) and MPICH (Argonne National Laboratory, near Chicago) are popular open-source implementations of the MPI library.
Implementations of MPI (such as LAM, MPICH) provide an API of library calls that allow users to pass messages between nodes of a parallel application.
  • They run on a wide variety of systems, from desktop workstations and clusters to large supercomputers (and everything in between).

MPI: the Message Passing Interface

The minimal set of MPI routines.
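The slide does not enumerate the routines; the set usually described as minimal in standard MPI references consists of the following six calls, of which the first four appear in the hello-world program later in this deck:

  • int MPI_Init(int *argc, char ***argv)
  • int MPI_Finalize()
  • int MPI_Comm_size(MPI_Comm comm, int *size)
  • int MPI_Comm_rank(MPI_Comm comm, int *rank)
  • int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
  • int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)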


Starting and Terminating the MPI Library

  • MPI_Init is called prior to any calls to other MPI routines. Its purpose is to initialize the MPI environment.
  • MPI_Finalize is called at the end of the computation, and it performs various clean-up tasks to terminate the MPI environment.
  • The prototypes of these two functions are:

int MPI_Init(int *argc, char ***argv)

int MPI_Finalize()

  • MPI_Init also strips off any MPI-related command-line arguments.
  • All MPI routines, data types, and constants are prefixed by “MPI_”. The return code for successful completion is MPI_SUCCESS. (These are declared in mpi.h.)
Hello World MPI Program

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                  /* initialize the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* rank of this process           */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes      */

    printf("Hello, world! I am %d of %d\n", rank, size);

    MPI_Finalize();                          /* clean up the MPI environment   */
    return 0;
}
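The hello-world program only queries each process's rank and the communicator size. Actual message passing uses explicit, cooperating send and receive calls; the following fragment is a minimal sketch of that pattern (not taken from the slides), in which process 0 sends one integer to process 1:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                          /* data owned by rank 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* explicit send        */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);  /* matching receive */
        printf("Rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}

Run it with at least two processes (for example, mpirun -np 2), since rank 0 sends to rank 1.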

LAM
  • Before any MPI programs can be executed, the LAM run-time environment must be launched. This is typically called “booting LAM.”
  • A text file is required that lists the hosts on which to launch the LAM run-time environment. This file is typically referred to as a “boot schema”, “hostfile”, or “machinefile.”
Sample machinefile

hpcc.lums.edu.pk
compute-0-0.local
compute-0-1.local
compute-0-2.local
compute-0-3.local
compute-0-4.local
compute-0-5.local
compute-0-6.local

LAM

Settings have been configured on your accounts, and the following files have been copied to your home directory.

  • ssh_script
  • machinefile
  • hellompi.c
First time commands

(Log out of all old sessions and log in again)

source ssh_script

First time commands

source ssh_script

Warning: Permanently added 'compute-0-0.local' (RSA) to the list of known hosts.

/bin/bash

Warning: Permanently added 'compute-0-1.local' (RSA) to the list of known hosts.

/bin/bash

Warning: Permanently added 'compute-0-2.local' (RSA) to the list of known hosts.

/bin/bash

Warning: Permanently added 'compute-0-3.local' (RSA) to the list of known hosts.

/bin/bash

Warning: Permanently added 'compute-0-4.local' (RSA) to the list of known hosts.

/bin/bash

Warning: Permanently added 'compute-0-5.local' (RSA) to the list of known hosts.

/bin/bash

Warning: Permanently added 'compute-0-6.local' (RSA) to the list of known hosts.

/bin/bash

First time commands

source ssh_script

/bin/bash

/bin/bash

/bin/bash

/bin/bash

/bin/bash

/bin/bash

/bin/bash

First time commands

lamboot -v machinefile

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

n-1<13857> ssi:boot:base:linear: booting n0 (hpcc.lums.edu.pk)

n-1<13857> ssi:boot:base:linear: booting n1 (compute-0-0.local)

n-1<13857> ssi:boot:base:linear: booting n2 (compute-0-1.local)

n-1<13857> ssi:boot:base:linear: booting n3 (compute-0-2.local)

n-1<13857> ssi:boot:base:linear: booting n4 (compute-0-3.local)

n-1<13857> ssi:boot:base:linear: booting n5 (compute-0-4.local)

n-1<13857> ssi:boot:base:linear: booting n6 (compute-0-5.local)

n-1<13857> ssi:boot:base:linear: booting n7 (compute-0-6.local)

n-1<13857> ssi:boot:base:linear: finished

First time commands

lamnodes

n0 hpcc.lums.edu.pk:1:origin,this_node

n1 compute-0-0.local:1:

n2 compute-0-1.local:1:

n3 compute-0-2.local:1:

n4 compute-0-3.local:1:

n5 compute-0-4.local:1:

n6 compute-0-5.local:1:

n7 compute-0-6.local:1:

First time commands

mpicc hellompi.c -o hello

First time commands

mpirun -np 8 hello

Hello, world! I am 0 of 8

Hello, world! I am 4 of 8

Hello, world! I am 2 of 8

Hello, world! I am 6 of 8

Hello, world! I am 3 of 8

Hello, world! I am 5 of 8

Hello, world! I am 7 of 8

Hello, world! I am 1 of 8

First time commands

lamhalt

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

lamwipe machinefile

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

lamnodes

-----------------------------------------------------------------------------

It seems that there is no lamd running on the host hpcc.lums.edu.pk.

This indicates that the LAM/MPI runtime environment is not operating.

The LAM/MPI runtime environment is necessary for the "lamnodes" command.

Please run the "lamboot" command to start the LAM/MPI runtime

environment. See the LAM/MPI documentation for how to invoke

"lamboot" across multiple machines.

Sequence whenever you want to run an MPI program
  • Compile using mpicc
  • Start LAM runtime environment using lamboot
  • Run MPI program using mpirun
  • When you are done, shut down the LAM universe using lamhalt and lamwipe
  • lamclean can be useful for removing all running programs if a parallel job crashes