Parallel Programming in MPI
January 8, 2013

Presentation Transcript
slide2
Today's Lab Environment
  • Host name: sandy.cc.kyushu-u.ac.jp
  • User IDs and passwords will be announced later.
slide3
Example of a Process-Parallel Program from the Previous Lecture
  • Not only the processing but also the data is divided among the processes

[Figure: arrays A, B, C (indices 0–24) are distributed so that processes 0–3 each hold their own 25-element portions. Every process executes the same loop on its local arrays:]

double A[25], B[25], C[25];
...
for (i = 0; i < 25; i++)
    A[i] = B[i] + C[i];

What if process 0 wants to refer to A[10] on process 1?  ⇒ Inter-process communication

How to Describe Communications in a Program?
  • TCP, UDP?
    • Good: highly portable — available on many networks.
    • Bad: the protocols for connection setup and data transfer are complicated, and the overhead is high, since they are designed for wide-area (= unreliable) networks.

Possible, but not suitable for parallel processing.

MPI (Message Passing Interface)
  • A set of communication functions designed for parallel processing
    • Can be called from C/C++/Fortran programs.
  • "Message Passing" = Send + Receive
    • Actually, many functions other than Send and Receive are available.
  • Anyway, let's look at a sample program first.
slide6

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, procs, ierr, i;
    double myval, val;
    MPI_Status status;
    FILE *fp;
    char s[64];

    MPI_Init(&argc, &argv);                    /* set up the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);      /* get own process ID (= rank) */
    MPI_Comm_size(MPI_COMM_WORLD, &procs);     /* get total number of processes */

    if (myid == 0) {                           /* if my rank is 0: */
        fp = fopen("test.dat", "r");
        fscanf(fp, "%lf", &myval);             /* read the data for this process into myval */
        for (i = 1; i < procs; i++) {          /* for i = 1 .. procs-1: */
            fscanf(fp, "%lf", &val);           /* read the next value into val */
            MPI_Send(&val, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);   /* send val to process i */
        }
        fclose(fp);
    } else {                                   /* processes with a rank other than 0: */
        MPI_Recv(&myval, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);  /* receive data from rank 0 into myval */
    }
    printf("PROCS: %d, MYID: %d, MYVAL: %e\n", procs, myid, myval);       /* print own myval */

    MPI_Finalize();                            /* end of parallel computing */
    return 0;
}

Flow of the Sample Program

[Figure: rank 0 reads a value from the file into its own myval, then repeatedly reads the next value into val and sends it to rank 1, rank 2, and so on; each of the other ranks waits for the arrival of the data from rank 0 and stores it in myval; finally every rank prints its own myval.]

Multiple "processes" execute the program according to their number (= rank).


Sample of the Result of Execution
  • The order of the output can differ from run to run, since each process proceeds independently.

PROCS: 4 MYID: 1 MYVAL: 20.0000000000000000

PROCS: 4 MYID: 2 MYVAL: 30.0000000000000000

PROCS: 4 MYID: 0 MYVAL: 10.0000000000000000

PROCS: 4 MYID: 3 MYVAL: 40.0000000000000000


Characteristics of the MPI Interface
  • MPI programs are ordinary C-language programs
    • Not a new language
  • Every process executes the same program
  • Each process does its own work according to its rank (= process number)
  • A process cannot read or write variables on another process directly

[Figure: the same flow as above, shown per rank — rank 0 reads the file into myval, then reads val and sends it to each other rank; ranks 1 and 2 receive into their own myval; every rank prints its own myval.]

TCP, UDP vs. MPI
  • MPI: a simple communication interface dedicated to parallel computing
    • SPMD (Single Program Multiple Data-stream) model
    • All processes execute the same program
  • TCP, UDP: generic communication interfaces intended for a wide range of uses, such as internet servers
    • Server/client model
    • Each process executes its own program
slide11

MPI: the same sample program shown above — MPI_Init, MPI_Comm_rank and MPI_Comm_size for initialization, then MPI_Send / MPI_Recv for the data transfer.

TCP Client:

sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);      /* initialize */
memset(&echoServAddr, 0, sizeof(echoServAddr));
echoServAddr.sin_family = AF_INET;
echoServAddr.sin_addr.s_addr = inet_addr(servIP);
echoServAddr.sin_port = htons(echoServPort);
connect(sock, (struct sockaddr *) &echoServAddr, sizeof(echoServAddr));
echoStringLen = strlen(echoString);
send(sock, echoString, echoStringLen, 0);
totalBytesRcvd = 0;
printf("Received: ");
while (totalBytesRcvd < echoStringLen) {
    bytesRcvd = recv(sock, echoBuffer, RCVBUFSIZE - 1, 0);
    totalBytesRcvd += bytesRcvd;
    echoBuffer[bytesRcvd] = '\0';
    printf(echoBuffer);
}
printf("\n");
close(sock);

TCP Server:

servSock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);   /* initialize */
memset(&echoServAddr, 0, sizeof(echoServAddr));
echoServAddr.sin_family = AF_INET;
echoServAddr.sin_addr.s_addr = htonl(INADDR_ANY);
echoServAddr.sin_port = htons(echoServPort);
bind(servSock, (struct sockaddr *) &echoServAddr, sizeof(echoServAddr));
listen(servSock, MAXPENDING);
for (;;) {
    clntLen = sizeof(echoClntAddr);
    clntSock = accept(servSock, (struct sockaddr *) &echoClntAddr, &clntLen);
    recvMsgSize = recv(clntSock, echoBuffer, RCVBUFSIZE, 0);
    while (recvMsgSize > 0) {
        send(clntSock, echoBuffer, recvMsgSize, 0);
        recvMsgSize = recv(clntSock, echoBuffer, RCVBUFSIZE, 0);
    }
    close(clntSock);
}

Layer of MPI
  • MPI hides the differences between networks

[Figure: Applications sit on top of MPI; MPI sits on top of Sockets / XTI, which use TCP / UDP over IP over an Ethernet driver and Ethernet card, or runs directly on a high-speed interconnect (InfiniBand, etc.).]

How to Compile MPI Programs
  • Compile command: mpicc
    Example)  mpicc -O3 test.c -o test
    • -O3: optimization option (the letter O, not zero)
    • test.c: source file to compile
    • test: executable file to create

How to Execute MPI Programs (on 'sandy')
  • Prepare a script file (sample below)
  • Submit the script file:  qsub test.sh
  • Other commands:
    • qstat (= check status),  qdel job_number (= cancel job)

Sample script:

#!/bin/sh
#PBS -l nodes=2:ppn=4,walltime=00:10:00   # number of nodes (ex: 2), processes per node (ex: 4), maximum execution time (ex: 10 min.)
#PBS -j oe
#PBS -q middle                            # job queue
cd $PBS_O_WORKDIR                         # cd to the directory from which this job was submitted
mpirun -np 8 ./test-mpi                   # run the MPI program with the specified number (ex: 8) of processes

Ex 0) Execution of an MPI Program

Log in to sandy and try the following commands.

If you have time to spare, try changing the number of processes or modifying the source program.

$ cp /tmp/test-mpi.c .

$ cp /tmp/test.dat .

$ cp /tmp/test.sh .

$ cat test-mpi.c

$ cat test.dat

$ mpicc test-mpi.c -o test-mpi

$ qsub test.sh

wait for a while

$ ls (check the name of the result file (test.sh.o????))

$ less test.sh.o????

MPI Library
  • The bodies of the MPI functions are stored in the "MPI library".
    • mpicc automatically links the MPI library to the program.

[Figure: mpicc compiles the source program (main() calling MPI_Init, MPI_Comm_rank, MPI_Send, ...) and links it with the MPI library, which contains the bodies of MPI_Init, MPI_Comm_rank, ..., to create the executable file.]

Basic Structure of MPI Programs

#include <stdio.h>
#include "mpi.h"                /* crucial: header file "mpi.h" */

int main(int argc, char *argv[])
{
    ...
    MPI_Init(&argc, &argv);     /* crucial: function for start-up */

    /* you can call MPI functions in this area */
    ...
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    ...
    MPI_Send(&val, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
    ...
    MPI_Recv(&myval, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
    ...
    MPI_Finalize();             /* crucial: function for finishing */
    return 0;
}

Basic Functions of MPI
  • MPI_Init
    • Initialization
  • MPI_Finalize
    • Finalization
  • MPI_Comm_size
    • Get number of processes
  • MPI_Comm_rank
    • Get rank (= Process number) of this process
  • MPI_Send & MPI_Recv
    • Message Passing
MPI_Init

Usage:  int MPI_Init(int *argc, char **argv);

  • Starts parallel execution in MPI
    • Starts the processes and establishes connections among them.
    • Must be called once before calling any other MPI function.
  • Parameters:
    • Pass pointers to both arguments of the 'main' function.
      • They are used so that each process can share the name of the executable file and the options given to the mpirun command.

Example

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, procs, ierr;
    double myval, val;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    ...

MPI_Finalize

Usage:  int MPI_Finalize();

  • Finishes parallel execution
    • MPI functions cannot be called after this function.
    • Every process must call this function before exiting the program.

Example

main()
{
    ...
    MPI_Finalize();
}

MPI_Comm_rank

Usage:  int MPI_Comm_rank(MPI_Comm comm, int *rank);

  • Gets the rank (= process number) of this process
    • Returned in the second argument
  • 1st argument = "communicator"
    • An identifier for a group of processes
    • In most cases, just specify MPI_COMM_WORLD here.
      • MPI_COMM_WORLD: the group consisting of all processes in this execution
      • Processes can also be divided into multiple groups, each assigned a different job.

Example

...
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
...

MPI_Comm_size

Usage:  int MPI_Comm_size(MPI_Comm comm, int *size);

  • Gets the number of processes
    • Returned in the second argument

Example

...
MPI_Comm_size(MPI_COMM_WORLD, &procs);
...

Message Passing (Point-to-Point Communication)
  • Communication between a "sender" and a "receiver"
  • The send function and the receive function must be called in a matching manner:
    • The "from" rank and the "to" rank are correct
    • The specified size of the data to be transferred is the same on both sides
    • The same "tag" is specified on both sides

[Figure: rank 0 calls Send (to: rank 1, size: 10 integer data, tag: 100); rank 1 calls Receive (from: rank 0, size: 10 integer data, tag: 100) and waits for the message.]

MPI_Send

Usage:  int MPI_Send(void *b, int c, MPI_Datatype d, int dest, int t, MPI_Comm comm);

  • Information about the message to send:
    • start address of the data, number of elements, data type, rank of the destination, tag, communicator (= MPI_COMM_WORLD, in most cases)
    • data types: for example MPI_INT, MPI_DOUBLE, MPI_CHAR
    • tag: an integer number attached to each message
      • Used in programs that handle messages arriving from unspecified processes.
      • Usually you can just specify 0.

Example

...
MPI_Send(&val, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
...

Example of MPI_Send
  • Send the value of an integer variable 'd' (one integer)
  • Send the first 100 elements of array 'mat' (with MPI_DOUBLE type)
  • Send 50 elements of the integer array 'data', starting from data[10]

MPI_Send(&d, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

MPI_Send(mat, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);

MPI_Send(&(data[10]), 50, MPI_INT, 1, 0, MPI_COMM_WORLD);
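
For illustration, matching MPI_Recv calls for the three sends above — a minimal sketch, assuming the sends are issued by rank 0 and received on rank 1, with d, mat and data declared the same way on the receiving side:

MPI_Status status;
MPI_Recv(&d, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);            /* matches the first send: one integer */
MPI_Recv(mat, 100, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);      /* matches the second send: 100 doubles */
MPI_Recv(&(data[10]), 50, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);  /* matches the third send: 50 integers stored from data[10] */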

MPI_Recv

Usage:  int MPI_Recv(void *b, int c, MPI_Datatype d, int src, int t, MPI_Comm comm, MPI_Status *st);

  • Information about the message to receive:
    • start address for storing the received data, number of elements, data type, rank of the source, tag (= 0, in most cases), communicator (= MPI_COMM_WORLD, in most cases), status
  • status: a variable of type MPI_Status that stores information about the arrived message
    • It holds the source rank and the tag (not used in most cases).

Example

...
MPI_Recv(&myval, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
...

Ex 1) A Program That Displays Random Numbers in Rank Order

Write a program in which each process creates one random integer and the numbers are displayed in rank order.

Example of the output:

0: 1524394631

1: 999094501

2: 941763604

3: 526956378

4: 152374643

5: 1138154117

6: 1926814754

7: 156004811

Sample Program 1: Create and Display a Random Number

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(int argc, char *argv[])
{
    int r;
    struct timeval tv;

    gettimeofday(&tv, NULL);
    srand(tv.tv_usec);
    r = rand();
    printf("%d\n", r);
}

Sample Program 2: Display Random Numbers (with no sort)

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int r, myid, procs;
    struct timeval tv;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);

    gettimeofday(&tv, NULL);
    srand(tv.tv_usec);
    r = rand();
    printf("%d: %d\n", myid, r);

    MPI_Finalize();
}

slide30
Hint:
  • The order of the output can be controlled by letting only one process print.
Sample Answer

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int r, myid, procs;
    struct timeval tv;
    int i;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);

    gettimeofday(&tv, NULL);
    srand(tv.tv_usec);
    r = rand();

    if (myid == 0) {
        printf("%d: %d\n", myid, r);
        for (i = 1; i < procs; i++) {
            MPI_Recv(&r, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &status);
            printf("%d: %d\n", i, r);
        }
    } else {
        MPI_Send(&r, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
}

Advanced Functions of MPI

Collective communication (= group communication)

Non-blocking communication
  • Execute other instructions while waiting for the completion of a communication.

Collective Communications
  • Communications that involve all of the processes in the group
  • Examples:
    • MPI_Bcast
      • copy data on one process to all other processes
    • MPI_Gather
      • gather data from the processes into one array
    • MPI_Reduce
      • apply a 'reduction' operation to the distributed data to produce one array (a usage sketch follows the figure below)

[Figure: MPI_Bcast — the array {3, 1, 8, 2} on rank 0 is copied to ranks 1 and 2. MPI_Gather — the values 7, 5, 9 held by ranks 0, 1, 2 are collected into the array {7, 5, 9} on rank 0. MPI_Reduce — the arrays {1, 2, 3}, {4, 5, 6}, {7, 8, 9} on ranks 0, 1, 2 are summed element-wise into {12, 15, 18} on rank 0.]
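
MPI_Reduce is not detailed on a later slide, so here is a minimal sketch of the sum reduction pictured above; the buffer names a and b are illustrative assumptions:

int a[3] = {1, 2, 3};   /* each rank fills a[] with its own values */
int b[3];               /* result array; meaningful only on the root rank */
/* element-wise sum of a[] over all ranks, stored into b[] on rank 0 */
MPI_Reduce(a, b, 3, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);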

MPI_Bcast

Usage:  int MPI_Bcast(void *b, int c, MPI_Datatype d, int root, MPI_Comm comm);

  • Copies data on one process to all of the processes
  • Parameters:
    • start address, number of elements, data type, root rank, communicator
    • root rank: rank of the process that has the original data
  • Example:  MPI_Bcast(a, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);

[Figure: the array a on rank 0 is copied to a on ranks 1, 2 and 3.]

MPI_Gather

Usage:  int MPI_Gather(void *sb, int sc, MPI_Datatype st, void *rb, int rc, MPI_Datatype rt, int root, MPI_Comm comm);

  • Gathers data from the processes to construct one array
  • Parameters:
    • send data: start address, number of elements, data type
    • receive data: start address, number of elements, data type (meaningful only on the root rank)
    • root rank, communicator
    • root rank: rank of the process that stores the result array
  • Example:

MPI_Gather(a, 3, MPI_DOUBLE, b, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);

[Figure: the 3-element array a on each of ranks 0–3 is gathered into the array b on rank 0.]

Usage of Collective Communications
  • Write the program so that every process calls the same function.
    • For example, MPI_Bcast must be called not only by the root rank but by all of the other ranks as well.
  • For collective communications that take separate send and receive locations, the specified address ranges for the send data and the receive data must not overlap (see the sketch below).
    • MPI_Gather, MPI_Allgather, MPI_Gatherv, MPI_Allgatherv, MPI_Reduce, MPI_Allreduce, MPI_Alltoall, MPI_Alltoallv, etc.
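
When the root rank's own contribution already lives in the receive buffer, the MPI standard provides the special value MPI_IN_PLACE as the send buffer, which sidesteps the overlap restriction. A minimal sketch, assuming the buffers a and b and the rank variable myid from the MPI_Gather example above:

if (myid == 0)
    /* root: its own contribution is taken from (and left in) its slot of b */
    MPI_Gather(MPI_IN_PLACE, 3, MPI_DOUBLE, b, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);
else
    /* non-root ranks: send a as usual; the receive arguments are ignored here */
    MPI_Gather(a, 3, MPI_DOUBLE, NULL, 0, MPI_DOUBLE, 0, MPI_COMM_WORLD);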

Non-Blocking Communication Functions
  • Non-blocking = do not wait for the completion of an instruction; proceed to the next instruction
  • Example) MPI_Irecv & MPI_Wait

[Figure: Blocking — MPI_Recv waits for the arrival of the data before executing the next instructions. Non-blocking — MPI_Irecv proceeds to the next instructions without waiting for the data; MPI_Wait is called later, when the data is needed.]
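
To make the overlap concrete, a minimal sketch that performs unrelated computation while the receive is in flight; the buffer size, the source rank and the do_other_work() function are illustrative assumptions:

MPI_Request req;
MPI_Status status;
int buf[100];

MPI_Irecv(buf, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);  /* start the receive, do not wait */
do_other_work();                                           /* computation that does not touch buf */
MPI_Wait(&req, &status);                                   /* wait here; afterwards buf is valid */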

MPI_Irecv

Usage:  int MPI_Irecv(void *b, int c, MPI_Datatype d, int src, int t, MPI_Comm comm, MPI_Request *r);

  • Non-blocking receive
    • Parameters: start address for storing the received data, number of elements, data type, rank of the source, tag (= 0, in most cases), communicator (= MPI_COMM_WORLD, in most cases), request
  • request: communication request
    • Used later to wait for the completion of this communication
  • Example)
    MPI_Request req;
    ...
    MPI_Irecv(a, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
    ...
    MPI_Wait(&req, &status);

MPI_Isend

Usage:  int MPI_Isend(void *b, int c, MPI_Datatype d, int dest, int t, MPI_Comm comm, MPI_Request *r);

  • Non-blocking send
    • Parameters: start address of the data to send, number of elements, data type, rank of the destination, tag (= 0, in most cases), communicator (= MPI_COMM_WORLD, in most cases), request
  • Example)
    MPI_Request req;
    ...
    MPI_Isend(a, 100, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    ...
    MPI_Wait(&req, &status);

Non-Blocking Send?
  • Blocking send (MPI_Send): waits for the data to be copied somewhere else
    • i.e., until the data has been handed to the network, or until a temporary copy of the data has been made.
  • Non-blocking send (MPI_Isend): does not wait.
Notice: Data Is Not Guaranteed During Non-Blocking Communications
  • MPI_Irecv:
    • The value of the variable specified for receiving the data is not fixed until MPI_Wait.

[Figure: A initially holds 10; MPI_Irecv into A is posted and the value 50 arrives at some point; reading A before MPI_Wait can yield either 10 or 50; after MPI_Wait, A is guaranteed to be 50.]

Notice: Data Is Not Guaranteed During Non-Blocking Communications (cont.)
  • MPI_Isend:
    • If the variable holding the data to be sent is modified before MPI_Wait, the value actually sent is unpredictable.

[Figure: A holds 10 when MPI_Isend of A is posted; assigning A = 50 before MPI_Wait causes incorrect communication — the value sent may be 10 or 50; after MPI_Wait, A can be modified (e.g. A = 100) without any problem.]
MPI_Wait

Usage:  int MPI_Wait(MPI_Request *req, MPI_Status *stat);

  • Waits for the completion of a non-blocking communication (MPI_Isend or MPI_Irecv).
    • After it returns, the send data may be modified and the received data may be read.
    • Parameters: request, status
  • status: at the completion of MPI_Irecv, the status of the received message is stored here.
MPI_Waitall

Usage:  int MPI_Waitall(int c, MPI_Request *requests, MPI_Status *statuses);

  • Waits for the completion of the specified number of non-blocking communications (usage sketch below)
    • Parameters: count, requests, statuses
  • count: the number of non-blocking communications
  • requests, statuses: arrays of MPI_Request and MPI_Status with at least 'count' elements.
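
For illustration, a sketch that posts one non-blocking receive per peer rank and waits for all of them at once; the number of peers, the buffer sizes and the assumption that this runs on rank 0 are illustrative:

#define NPEERS 3
int bufs[NPEERS][100];
MPI_Request reqs[NPEERS];
MPI_Status stats[NPEERS];
int i;

/* post non-blocking receives from ranks 1 .. NPEERS */
for (i = 0; i < NPEERS; i++)
    MPI_Irecv(bufs[i], 100, MPI_INT, i + 1, 0, MPI_COMM_WORLD, &reqs[i]);

/* wait until all NPEERS receives have completed */
MPI_Waitall(NPEERS, reqs, stats);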
Inside the Functions of Collective Communications
  • Usually, collective communication functions are implemented with point-to-point communications such as MPI_Send, MPI_Recv, MPI_Isend and MPI_Irecv.
Inside of MPI_Bcast
  • One of the simplest possible implementations:

int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
{
    int i, myid, procs;
    MPI_Status st;

    MPI_Comm_rank(comm, &myid);
    MPI_Comm_size(comm, &procs);
    if (myid == root) {
        for (i = 0; i < procs; i++)
            if (i != root)
                MPI_Send(a, c, d, i, 0, comm);
    } else {
        MPI_Recv(a, c, d, root, 0, comm, &st);
    }
    return 0;
}
Another Implementation: With MPI_Isend

int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
{
    int i, myid, procs, cntr;
    MPI_Status st, *stats;
    MPI_Request *reqs;

    MPI_Comm_rank(comm, &myid);
    MPI_Comm_size(comm, &procs);
    if (myid == root) {
        stats = (MPI_Status *)malloc(sizeof(MPI_Status) * procs);
        reqs = (MPI_Request *)malloc(sizeof(MPI_Request) * procs);
        cntr = 0;
        for (i = 0; i < procs; i++)
            if (i != root)
                MPI_Isend(a, c, d, i, 0, comm, &(reqs[cntr++]));
        MPI_Waitall(procs - 1, reqs, stats);
        free(stats);
        free(reqs);
    } else {
        MPI_Recv(a, c, d, root, 0, comm, &st);
    }
    return 0;
}

Another Implementation: Binomial Tree

int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
{
    int i, myid, procs;
    MPI_Status st;
    int mask, relative_rank, src, dst;
    int tag = 1, success = 0;

    MPI_Comm_rank(comm, &myid);
    MPI_Comm_size(comm, &procs);
    relative_rank = myid - root;
    if (relative_rank < 0) relative_rank += procs;

    /* receive phase: find the ancestor to receive from */
    mask = 1;
    while (mask < procs) {
        if (relative_rank & mask) {
            src = myid - mask;
            if (src < 0) src += procs;
            MPI_Recv(a, c, d, src, 0, comm, &st);
            break;
        }
        mask <<= 1;
    }

    /* send phase: forward the data to the descendants */
    mask >>= 1;
    while (mask > 0) {
        if (relative_rank + mask < procs) {
            dst = myid + mask;
            if (dst >= procs) dst -= procs;
            MPI_Send(a, c, d, dst, 0, comm);
        }
        mask >>= 1;
    }
    return 0;
}

Flow of Binomial Tree
  • 'mask' determines when each rank receives and to whom it sends.

[Figure, 8 ranks: rank 0 first sends to rank 4 (mask = 4); then ranks 0 and 4 send to ranks 2 and 6 (mask = 2); finally ranks 0, 2, 4 and 6 send to ranks 1, 3, 5 and 7 (mask = 1). Each of the other ranks receives from 'its rank − mask', where mask is its lowest set bit.]

Deadlock

A state in which, for some reason, the program cannot make progress.

Places in MPI programs where you need to be careful about deadlocks:
1. MPI_Recv, MPI_Wait, MPI_Waitall
2. Collective communications
   • A program cannot proceed until all processes call the same collective communication function.

Wrong case (each rank blocks in MPI_Recv, so neither ever reaches its MPI_Send):

if (myid == 0){
    MPI_Recv from rank 1
    MPI_Send to rank 1
}
if (myid == 1){
    MPI_Recv from rank 0
    MPI_Send to rank 0
}

One solution: use MPI_Irecv

if (myid == 0){
    MPI_Irecv from rank 1
    MPI_Send to rank 1
    MPI_Wait
}
if (myid == 1){
    MPI_Irecv from rank 0
    MPI_Send to rank 0
    MPI_Wait
}
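
Another standard way to avoid this pattern — not shown on the original slide — is MPI_Sendrecv, which performs the matching send and receive in a single call and lets the MPI library order the transfers safely. A minimal sketch; the buffer names and the choice of peer rank are illustrative assumptions:

int peer = 1 - myid;                 /* rank 0 talks to rank 1 and vice versa */
double sendbuf = 1.0, recvbuf;
MPI_Status status;

MPI_Sendrecv(&sendbuf, 1, MPI_DOUBLE, peer, 0,
             &recvbuf, 1, MPI_DOUBLE, peer, 0,
             MPI_COMM_WORLD, &status);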

Ex 2) Make a Reduce Function by Yourself
  • Complete the program on the next slide by filling in the body of the 'my_reduce' function.
    • my_reduce: a simplified version of MPI_Reduce
      • It calculates only the total sum of integers. The root rank is always 0. The communicator is always MPI_COMM_WORLD.
  • Any algorithm is OK.
slide52

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#define N 20

int my_reduce(int *a, int *b, int c)
{
    /* complete here by yourself */

    return 0;
}

int main(int argc, char *argv[])
{
    int i, myid, procs;
    int a[N], b[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);

    for (i = 0; i < N; i++){
        a[i] = i;
        b[i] = 0;
    }

    my_reduce(a, b, N);

    if (myid == 0)
        for (i = 0; i < N; i++)
            printf("b[%d] = %d , correct answer = %d\n", i, b[i], i * procs);

    MPI_Finalize();
    return 0;
}

Answer
  • Sample answer programs for the exercises will be uploaded to the web page of this class tomorrow.
  • They will be used as part of the examination, so check them by yourself and, if you have any questions, send an e-mail to me: nanri@cc.kyushu-u.ac.jp
Summary
  • In MPI, multiple processes execute the same program.
  • Work is assigned to each process according to its rank (= process number).
  • Each process runs in its own memory space.
  • Data held by another process can be accessed only through explicit communication between processes.
  • Apart from communication, each process proceeds independently.
  • MPIfunctions
    • MPI_Init, MPI_Finalize, MPI_Comm_rank
    • MPI_Send, MPI_Recv
    • MPI_Bcast, MPI_Gather
    • MPI_Isend, MPI_Irecv, MPI_Wait
References
  • MPI Forum: http://www.mpi-forum.org/
    • specification of the MPI standard
  • MPI specification (Japanese translation): http://phase.hpcc.jp/phase/mpi-j/ml/
  • RIKEN training course material: http://accc.riken.jp/HPC/training/mpi/mpi_all_2007-02-07.pdf
