
Introduction to MPI, Class 5
Marcelo Rozenberg (acknowledgment: Ruben Weht, ruweht@cnea.gov.ar)

Derived data types
• Contiguous type (simple blocks): MPI_Type_contiguous(…)
• Vector type (equally spaced blocks): MPI_Type_vector(…)
• Indexed type (blocks of variable length and spacing): MPI_Type_indexed(count, blocklength-vector, displacement-vector, oldtype, newtype, ierr)
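The following plain-C sketch shows which elements an indexed layout selects. It makes no MPI calls, and `pack_indexed` and `vector_as_indexed` are hypothetical names. It gathers into a contiguous buffer the elements that MPI_Type_indexed(count, blocklengths, displacements, oldtype, ...) would describe, with displacements counted in units of oldtype. The sketch assumes non-negative offsets. A vector type is the special case where every block has the same length and block i starts at i * stride. A contiguous type is a single block.

```c
#include <assert.h>
#include <stddef.h>

/* Gather into dst the elements of src selected by an "indexed" layout:
 * block i holds blocklens[i] consecutive elements starting at element
 * offset displs[i] (offsets in units of the old type, as in
 * MPI_Type_indexed). Blocks are copied in the order they are listed.
 * Returns the number of elements copied. */
size_t pack_indexed(const double *src, int count,
                    const int *blocklens, const int *displs,
                    double *dst)
{
    size_t n = 0;
    for (int i = 0; i < count; i++) {
        assert(blocklens[i] >= 0 && displs[i] >= 0);
        for (int k = 0; k < blocklens[i]; k++)
            dst[n++] = src[displs[i] + k];
    }
    return n;
}

/* Describe a "vector" layout (count blocks of blocklen elements whose
 * starts are stride elements apart) as an equivalent indexed layout. */
void vector_as_indexed(int count, int blocklen, int stride,
                       int *blocklens, int *displs)
{
    for (int i = 0; i < count; i++) {
        blocklens[i] = blocklen;
        displs[i] = i * stride;
    }
}
```

For example, blocklens = {3, 1} with displs = {4, 0} packs elements 4, 5, 6 and then element 0. vector_as_indexed(3, 1, 3, ...) selects elements 0, 3, 6, which is one column of a 3 x 3 matrix stored row by row.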

Defining and manipulating communicators
• Get the group (the set of process IDs) of an existing communicator
• Create a new group as a subset of a given group
• Define a new communicator for that group
1. MPI_Comm_group(comm, group, ierr)
2.1 MPI_Group_incl(oldgrp, count, ranks, newgrp, ierr)
2.2 MPI_Group_excl(oldgrp, count, ranks, newgrp, ierr)
3. MPI_Comm_create(comm, newgrp, newcomm, ierr)
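The two group-building calls order their members differently. The MPI standard says that, for MPI_Group_incl, the process with rank i in the new group is the process with rank ranks[i] in the old group. For MPI_Group_excl, the remaining processes keep their original order. The plain-C sketch below models only that rank mapping on an array of world ranks. It makes no MPI calls, and `group_incl` and `group_excl` are hypothetical helpers. In real MPI, MPI_Comm_create then returns MPI_COMM_NULL to processes outside the new group.

```c
#include <assert.h>

/* old_members[r] is the world rank of the process with rank r in the old
 * group (n members). The new group's members are written to new_members,
 * and the function returns the size of the new group. */

/* incl: new rank i is the process with rank ranks[i] in the old group. */
int group_incl(const int *old_members, int n, const int *ranks, int count,
               int *new_members)
{
    for (int i = 0; i < count; i++) {
        assert(ranks[i] >= 0 && ranks[i] < n);
        new_members[i] = old_members[ranks[i]];
    }
    return count;
}

/* excl: keep every process not listed in ranks, preserving the old order. */
int group_excl(const int *old_members, int n, const int *ranks, int count,
               int *new_members)
{
    int m = 0;
    for (int r = 0; r < n; r++) {
        int excluded = 0;
        for (int i = 0; i < count; i++)
            if (ranks[i] == r)
                excluded = 1;
        if (!excluded)
            new_members[m++] = old_members[r];
    }
    return m;
}
```

For example, with six processes, including ranks {4, 0, 2} gives the new group [4, 0, 2] in that order. Excluding {1, 3} gives [0, 2, 4, 5].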

A useful routine: MPI_Comm_split
• Creates several communicators in a single call.
Example: one call creates q new communicators, all with the same name (mi_fila_comm). Each process ends up in the one for its row. mi_fila acts as the "color" (it selects the new communicator) and mi_rank as the "key" (it orders the processes inside it).
    MPI_Comm mi_fila_comm;
    int mi_fila;
    /* mi_rank is the rank in MPI_COMM_WORLD; q*q = p */
    mi_fila = mi_rank / q;
    MPI_Comm_split(MPI_COMM_WORLD, mi_fila, mi_rank, &mi_fila_comm);
With q = 3 and p = 9, the processes in P5's mi_fila_comm are {3, 4, 5}.
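To check the {3, 4, 5} result, here is a serial model of that split, written in plain C with no MPI and with the hypothetical name `split_by_row`. MPI_Comm_split puts processes with the same color in the same new communicator and orders them by key, breaking ties by their old rank. With color = rank / q and key = rank, this amounts to grouping consecutive world ranks.

```c
#include <assert.h>

/* Serial model of MPI_Comm_split(MPI_COMM_WORLD, rank / q, rank, &newcomm)
 * for the process `rank` out of p. Fills members[] with the world ranks
 * that share its color, in key order, and returns how many there are.
 * *new_rank receives the process's rank in the new communicator. */
int split_by_row(int p, int q, int rank, int *members, int *new_rank)
{
    assert(q > 0 && rank >= 0 && rank < p);
    int color = rank / q;
    int n = 0;
    /* key = world rank, so scanning ranks upward already gives key order */
    for (int r = 0; r < p; r++) {
        if (r / q == color) {
            if (r == rank)
                *new_rank = n;
            members[n++] = r;
        }
    }
    return n;
}
```

For P5 with p = 9 and q = 3, the members are 3, 4 and 5, and P5 gets rank 2 in mi_fila_comm.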

Virtual topologies
• MPI has routines for defining a grid of processes that matches the geometry of the particular computation.
• A virtual topology is the MPI feature for managing such a process grid.
• The goal is more concise and simpler code, while also allowing communication between nodes to be optimized.
• A virtual topology is attached to a communicator.

Typical examples
[Figure: 1-D grid of processes P0 P1 P2 P3]
[Figure: 2-D grid of processes, 4 rows x 3 columns: P0 P4 P8 / P1 P5 P9 / P2 P6 P10 / P3 P7 P11]
Define a matrix of processes, and define row communicators and column communicators!

Cart_create
There are two kinds of topologies: Cartesian and graph (general).
• Cartesian topologies are Cartesian grids. Here we use one- or two-dimensional ones, although MPI allows any number of dimensions.
• They are a special case of the general (graph) topologies, but because they are used so often, there are dedicated routines for them.
• To associate a point of the virtual grid with each process, you must specify:
• the number of dimensions of the grid (1 or 2 here)
• the length of each dimension (number of processes per row and/or column)
• the periodicity of each dimension (boundary conditions: rings, tori)
• an optimization option (whether the physical processes may be reordered)

Cart_create: Cartesian topology
MPI_CART_CREATE(old_comm, ndims, dims, periods, reorder, comm_cart, ierr)
• ndims: number of dimensions of the grid
• dims: array with the number of processes along each coordinate axis
• periods: logical array indicating whether each axis is periodic
• reorder: .false. if the data have already been distributed, so the ranks of the original communicator are kept. .true. if the data have not been distributed yet, so MPI may reorder the processes to optimize communication.

Example: create a two-dimensional q x q Cartesian topology with periodic boundary conditions (a torus), letting the system reorder the processes.
    MPI_Comm grid_comm;
    int dim_sizes[2], wrap_around[2], reorder;
    reorder = 1;
    dim_sizes[0] = q;
    dim_sizes[1] = q;
    wrap_around[0] = 1;
    wrap_around[1] = 1;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dim_sizes, wrap_around, reorder, &grid_comm);

Example (cont.): how do we obtain the coordinates?
    MPI_Comm grid_comm;
    int dim_sizes[2], wrap_around[2], reorder, coord[2], mi_grid_rank;
    /* reorder, dim_sizes and wrap_around set as in the previous example */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dim_sizes, wrap_around, reorder, &grid_comm);
    MPI_Comm_rank(grid_comm, &mi_grid_rank);
    MPI_Cart_coords(grid_comm, mi_grid_rank, 2, coord);
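The MPI standard numbers the processes of a Cartesian communicator in row-major order, with the last coordinate varying fastest. The plain-C sketch below reproduces the mapping that MPI_Cart_coords and MPI_Cart_rank compute between a rank in grid_comm and its coordinates. It makes no MPI calls, and `cart_coords` and `cart_rank` are hypothetical names. It only accepts in-range coordinates, whereas MPI_Cart_rank also wraps out-of-range coordinates along periodic dimensions.

```c
#include <assert.h>

/* rank -> coordinates on a grid with extents dims[0..ndims-1] (row-major) */
void cart_coords(int ndims, const int *dims, int rank, int *coords)
{
    for (int d = ndims - 1; d >= 0; d--) {   /* last coordinate varies fastest */
        coords[d] = rank % dims[d];
        rank /= dims[d];
    }
}

/* coordinates -> rank: the inverse mapping */
int cart_rank(int ndims, const int *dims, const int *coords)
{
    int rank = 0;
    for (int d = 0; d < ndims; d++) {
        assert(coords[d] >= 0 && coords[d] < dims[d]);
        rank = rank * dims[d] + coords[d];
    }
    return rank;
}
```

On a 3 x 3 grid, rank 5 has coordinates (1, 2) and rank 7 has (2, 1). On a 4 x 3 grid, rank 10 has (3, 1).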

Other routines of the MPI_Cart family:
MPI_Cart_rank(grid_comm, coord, &grid_rank)
It is the "inverse" of MPI_Cart_coords: it returns the rank of the process with the given coordinates.
MPI_Cart_sub(grid_comm, var_coords, &fila_comm)
It creates subgrids (here, new row communicators). Example:
    MPI_Comm fila_comm;
    int var_coords[2];
    var_coords[0] = 0;  /* 0 because the first coordinate ("x") does not vary within a row */
    var_coords[1] = 1;  /* 1 because the second coordinate ("y") varies along a row */
    MPI_Cart_sub(grid_comm, var_coords, &fila_comm);
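MPI_Cart_sub keeps the dimensions marked with 1 and groups together the processes that agree on every dropped coordinate. The plain-C sketch below models this for a 2-D grid with row-major ranks. It makes no MPI calls, and `cart_sub_members` is a hypothetical name. It lists which ranks share a sub-communicator with a given process.

```c
#include <assert.h>

/* Grid of dims[0] x dims[1] processes, row-major ranks. remain[d] = 1
 * keeps dimension d (like var_coords above). Fills members[] with the
 * grid ranks in the same sub-communicator as `rank`, in increasing order
 * (for row-major ranks this is also their order in the sub-communicator),
 * and returns how many there are. */
int cart_sub_members(const int dims[2], const int remain[2], int rank,
                     int *members)
{
    assert(dims[0] > 0 && dims[1] > 0 && rank >= 0 && rank < dims[0] * dims[1]);
    const int mine[2] = { rank / dims[1], rank % dims[1] };
    int n = 0;
    for (int r = 0; r < dims[0] * dims[1]; r++) {
        const int c[2] = { r / dims[1], r % dims[1] };
        int same = 1;
        for (int d = 0; d < 2; d++)
            if (!remain[d] && c[d] != mine[d])
                same = 0;      /* a dropped coordinate must match mine */
        if (same)
            members[n++] = r;
    }
    return n;
}
```

On a 3 x 3 grid with var_coords = {0, 1}, rank 4 shares its row communicator with {3, 4, 5}, the same set that the MPI_Comm_split example builds. With {1, 0}, it gets the column {1, 4, 7}.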

Other routines of the MPI_Cart family:
MPI_Cart_get(grid_comm, 2, dim_sizes, wrap_around, coord)
For a given Cartesian communicator, it returns the sizes of the dimensions, their periodicity and the coordinates of the calling process. Output arrays: dim_sizes[2], wrap_around[2], coord[2].
MPI_Cart_shift(grid_comm, dir, disp, &rank_source, &rank_dest)
It returns the ranks of the source and destination processes (for use in a send/recv) that are disp positions away along direction dir. At the edge of a non-periodic dimension, the missing neighbor is returned as MPI_PROC_NULL.
"It finds out who the neighbors are."
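Here is a serial model of MPI_Cart_shift, written in plain C under stated assumptions. Ranks are row-major, as in the MPI standard. `PROC_NULL` is a hypothetical stand-in for MPI_PROC_NULL, and `cart_shift` and `neighbor` are hypothetical names. The source is the process disp steps "behind" along dir, and the destination is the one disp steps "ahead". Periodic dimensions wrap around, and non-periodic ones yield PROC_NULL off the grid.

```c
#include <assert.h>

#define PROC_NULL (-1)   /* hypothetical stand-in for MPI_PROC_NULL */

/* Rank of the process `offset` steps from coords along dimension dir. */
static int neighbor(int ndims, const int *dims, const int *periods,
                    const int *coords, int dir, int offset)
{
    int rank = 0;
    for (int d = 0; d < ndims; d++) {
        int x = coords[d];
        if (d == dir) {
            x += offset;
            if (periods[d])
                x = ((x % dims[d]) + dims[d]) % dims[d];  /* wrap around */
            else if (x < 0 || x >= dims[d])
                return PROC_NULL;                          /* off the grid */
        }
        rank = rank * dims[d] + x;                         /* row-major */
    }
    return rank;
}

void cart_shift(int ndims, const int *dims, const int *periods,
                const int *coords, int dir, int disp,
                int *source, int *dest)
{
    assert(dir >= 0 && dir < ndims);
    *source = neighbor(ndims, dims, periods, coords, dir, -disp);
    *dest   = neighbor(ndims, dims, periods, coords, dir, +disp);
}
```

In a non-periodic 1-D grid of 4 processes, such as comm1d in the example that follows, rank 0 gets (PROC_NULL, 1) and rank 3 gets (2, PROC_NULL). On a 3 x 3 torus, the process at (0, 0) shifted along dimension 1 gets source 2 and destination 1.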

A typical example code:
• Jacobi's method for solving the Poisson problem (a partial differential equation, PDE)

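Discretization of the Poisson problem
$$\Delta u = f(x,y) \ \text{in the interior}, \qquad u(x,y) = g(x,y) \ \text{on the boundary}$$
Grid of (n+2) x (n+2) points with spacing h = 1/(n+1):
$$x_i = \frac{i}{n+1},\quad i = 0,1,\dots,n+1; \qquad y_j = \frac{j}{n+1},\quad j = 0,1,\dots,n+1$$
[Figure: five-point stencil around (i,j) with neighbors (i-1,j), (i+1,j), (i,j-1), (i,j+1); boundary and interior points marked]
$$\frac{u_{i-1,j} + u_{i,j+1} + u_{i,j-1} + u_{i+1,j} - 4u_{i,j}}{h^2} = f_{i,j}
\quad\Longrightarrow\quad
u_{i,j} = \tfrac14\left(u_{i-1,j} + u_{i,j+1} + u_{i,j-1} + u_{i+1,j} - h^2 f_{i,j}\right)$$
Jacobi iteration:
$$u^{k+1}_{i,j} = \tfrac14\left(u^{k}_{i-1,j} + u^{k}_{i,j+1} + u^{k}_{i,j-1} + u^{k}_{i+1,j} - h^2 f_{i,j}\right)$$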
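A quick numerical check of these formulas is possible. For u(x, y) = x² + y² the exact Laplacian is 4, and the five-point stencil reproduces it exactly (up to rounding), because it is exact for quadratics. The Jacobi update with f = 4 must then return u itself at an interior point. The plain-C sketch below checks both. The test function `u_quad` and the helper names are hypothetical choices for this illustration.

```c
#include <assert.h>

/* Test function with exact Laplacian 4: u(x, y) = x^2 + y^2. */
double u_quad(double x, double y) { return x * x + y * y; }

/* Five-point approximation of the Laplacian of u at grid point (i, j),
 * with x_i = i*h, y_j = j*h and h = 1/(n+1). */
double discrete_laplacian(double (*u)(double, double), int n, int i, int j)
{
    assert(i >= 1 && i <= n && j >= 1 && j <= n);   /* interior points only */
    double h = 1.0 / (n + 1);
    double x = i * h, y = j * h;
    return (u(x - h, y) + u(x, y + h) + u(x, y - h) + u(x + h, y)
            - 4.0 * u(x, y)) / (h * h);
}

/* One Jacobi update: (1/4) * (sum of the four neighbours - h^2 * f). */
double jacobi_point(double left, double up, double down, double right,
                    double f, double h)
{
    return 0.25 * (left + up + down + right - h * h * f);
}
```

With n = 9 (h = 0.1), discrete_laplacian(u_quad, 9, 3, 4) is 4 up to rounding. Feeding the four neighbors of (0.3, 0.4) and f = 4 into jacobi_point returns 0.25 = u(0.3, 0.4).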

1/3
c*******************************************************************
c oned.f - a solution to the Poisson problem using Jacobi
c iteration on a 1-d decomposition
c
c The size of the domain is read by processor 0 and broadcast to
c all other processors.  The Jacobi iteration is run until the
c change in successive elements is small or a maximum number of
c iterations is reached.  The difference is printed out at each
c step.
c*******************************************************************
      program main
c
      include "mpif.h"
      integer maxn
      parameter (maxn = 128)
      double precision a(maxn,maxn), b(maxn,maxn), f(maxn,maxn)
      integer nx, ny
      integer myid, numprocs, ierr
      integer comm1d, nbrbottom, nbrtop, s, e, it
      double precision diff, diffnorm, dwork
      double precision t1, t2
      double precision MPI_WTIME
      external MPI_WTIME
      external diff
      call MPI_INIT( ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )

2/3
c
      if (myid .eq. 0) then
c
c        Get the size of the problem
c
c        print *, 'Enter nx'
c        read *, nx
         nx = 110
      endif
      call MPI_BCAST(nx,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
      ny = nx
c
c Get a new communicator for a decomposition of the domain
c
      call MPI_CART_CREATE( MPI_COMM_WORLD, 1, numprocs, .false.,
     $                      .true., comm1d, ierr )
c
c Get my position in this communicator, and my neighbors
c
      call MPI_COMM_RANK( comm1d, myid, ierr )
      call MPI_Cart_shift( comm1d, 0, 1, nbrbottom, nbrtop, ierr )
c
c Compute the actual decomposition
c
      call MPE_DECOMP1D( ny, numprocs, myid, s, e )
c
c Initialize the right-hand-side (f) and the initial solution guess (a)
c
      call onedinit( a, b, f, nx, s, e )
Slide annotations:
• MPI_CART_CREATE creates the topology: one dimension, non-periodic, with reordering allowed
• each process gets its rank and finds out who its neighbors are
• MPE_DECOMP1D assigns blocks of (x, y) points to the processes
• onedinit initializes the arrays

1-D virtual topology for the Poisson problem
[Figure: the (n+2) x (n+2) grid (x_i = i/(n+1), y_j = j/(n+1), h = 1/(n+1)) cut into horizontal strips owned by rank=0, rank=1 and rank=2, with the five-point stencil around (i,j); boundary and interior points marked]
We must define blocks:
    double precision u(0:n+1, s:e)
where s:e gives the values of j in each block. How do we obtain s and e? And there is a problem at the block edges: the stencil at rows s and e needs values from the neighboring blocks!

1-D virtual topology for the Poisson problem (cont.)
[Figure: the same strips for rank=0, rank=1 and rank=2, each enlarged by one row of ghost points above and below]
We enlarge the blocks with ghost points:
    double precision u(0:n+1, s-1:e+1)
s:e still gives the values of j owned by each block, and rows s-1 and e+1 hold copies of the neighbors' edge rows. s and e are computed with the routine
    MPE_DECOMP1D( ny, numprocs, myid, s, e )
MPE is a free extension to MPI; it is not part of the MPI standard.

Details on MPE_DECOMP1D
• MPE_DECOMP1D( ny, numprocs, myid, s, e )
• ny is the size of the system along the vertical axis
• numprocs is the number of processes
• myid is the rank in the Cartesian communicator
• If ny is divisible by nprocs, it is easy:
• s = 1 + myid * (ny/nprocs)
• e = s + (ny/nprocs) - 1
• If not, the obvious choice is:
    s = 1 + myid * floor(ny/nprocs)
    if (myid .eq. nprocs - 1) then
       e = ny
    else
       e = s + floor(ny/nprocs) - 1
    endif
where floor(x) is the largest integer not greater than x.
• But if nprocs = 64 and ny = 127, this gives 63 processes with 1 row each and 1 process with 64 rows!
MPE_DECOMP1D is smarter!
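A better split spreads the leftover rows one at a time over the first processes, so that no two block sizes differ by more than one. The plain-C sketch below implements that balanced scheme in the spirit of MPE_DECOMP1D. It is an illustration, not MPE's source, and `decomp1d` is a hypothetical name. Its arguments follow the slide (n rows, size processes, rank), and it returns the 1-based inclusive range s..e.

```c
#include <assert.h>

/* Balanced 1-D block decomposition of rows 1..n over `size` processes:
 * every process gets floor(n/size) rows, and the first n % size
 * processes get one extra row. Rows s..e (inclusive) belong to `rank`. */
void decomp1d(int n, int size, int rank, int *s, int *e)
{
    assert(size > 0 && rank >= 0 && rank < size);
    int nlocal  = n / size;          /* floor(n / size) */
    int deficit = n % size;          /* rows left over  */
    /* each earlier rank below `deficit` took one extra row */
    *s = rank * nlocal + 1 + (rank < deficit ? rank : deficit);
    if (rank < deficit)
        nlocal++;
    *e = *s + nlocal - 1;
}
```

With ny = 127 and 64 processes, ranks 0 to 62 get two rows each (rank 62 owns rows 125 and 126), and rank 63 gets row 127. When n divides evenly, for example n = 12 on 4 processes, this reduces to the simple formula: rank 2 owns rows 7 to 9.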


Subroutine onedinit
      subroutine onedinit( a, b, f, nx, s, e )
      integer nx, s, e, i, j
      double precision a(0:nx+1, s-1:e+1), b(0:nx+1, s-1:e+1),
     &                 f(0:nx+1, s-1:e+1)
c
      do 10 j=s-1,e+1
         do 10 i=0,nx+1
            a(i,j) = 0.0d0
            b(i,j) = 0.0d0
            f(i,j) = 0.0d0
 10   continue
c
c     Handle boundary conditions
c
      do 20 j=s,e
         a(0,j) = 1.0d0
         b(0,j) = 1.0d0
         a(nx+1,j) = 0.0d0
         b(nx+1,j) = 0.0d0
 20   continue
c
      if (s .eq. 1) then
         do 30 i=1,nx
            a(i,0) = 1.0d0
            b(i,0) = 1.0d0
 30      continue
      endif
      return
      end
Slide annotations:
• the first loop clears the arrays
• the rest sets the boundary conditions: u(0,y) = 1, u(x,0) = 1, u(x,1) = 0, u(1,y) = 0

3/3
c
c Actually do the computation.  Note the use of a collective operation to
c check for convergence, and a do-loop to bound the number of iterations.
c
      call MPI_BARRIER( MPI_COMM_WORLD, ierr )
      t1 = MPI_WTIME()
      do 10 it=1, 100
c     do 10 it=1, 2
         call exchng1( a, nx, s, e, comm1d, nbrbottom, nbrtop )
         call sweep1d( a, f, nx, s, e, b )
         call exchng1( b, nx, s, e, comm1d, nbrbottom, nbrtop )
         call sweep1d( b, f, nx, s, e, a )
         dwork = diff( a, b, nx, s, e )
         call MPI_Allreduce( dwork, diffnorm, 1, MPI_DOUBLE_PRECISION,
     $                       MPI_SUM, comm1d, ierr )
         if (diffnorm .lt. 1.0d-5) goto 20
c        if (myid .eq. 0) print *, 2*it, ' Difference is ', diffnorm
 10   continue
      if (myid .eq. 0) print *, 'Failed to converge'
 20   continue
      t2 = MPI_WTIME()
      if (myid .eq. 0) then
         print *, 'Converged after ', 2*it, ' Iterations in ', t2 - t1,
     $        ' secs '
      endif
c
      call MPI_FINALIZE(ierr)
      end
Slide annotations:
• MPI_BARRIER and MPI_WTIME: synchronize and measure the time
• exchng1: passes the ghost points
• sweep1d: sweeps over the whole grid
• diff and MPI_Allreduce: convergence check

Subroutine exchng1
      subroutine exchng1( a, nx, s, e, comm1d, nbrbottom, nbrtop )
      include 'mpif.h'
      integer nx, s, e
      double precision a(0:nx+1,s-1:e+1)
      integer comm1d, nbrbottom, nbrtop
      integer status(MPI_STATUS_SIZE), ierr
c
      call MPI_SENDRECV(
     &            a(1,e), nx, MPI_DOUBLE_PRECISION, nbrtop, 0,
     &            a(1,s-1), nx, MPI_DOUBLE_PRECISION, nbrbottom, 0,
     &            comm1d, status, ierr )
      call MPI_SENDRECV(
     &            a(1,s), nx, MPI_DOUBLE_PRECISION, nbrbottom, 1,
     &            a(1,e+1), nx, MPI_DOUBLE_PRECISION, nbrtop, 1,
     &            comm1d, status, ierr )
      return
      end
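The net effect of the two MPI_SENDRECV calls between neighboring strips can be sketched serially in plain C with no MPI. `Strip`, `at` and `exchange_pair` are hypothetical names. The lower strip's last row e lands in the upper strip's ghost row s-1, and the upper strip's first row s lands in the lower strip's ghost row e+1. As in exchng1, only the points i = 1..nx are copied. At the ends of the non-periodic grid, MPI_Cart_shift returned MPI_PROC_NULL, so those transfers do nothing and the physical boundary rows stay untouched.

```c
#include <assert.h>
#include <string.h>

/* One process's strip, stored like a(0:nx+1, s-1:e+1) with i fastest. */
typedef struct {
    int nx, s, e;
    double *a;   /* (e - s + 3) * (nx + 2) values */
} Strip;

/* Address of a(i, j) inside the strip. */
static double *at(const Strip *p, int i, int j)
{
    assert(j >= p->s - 1 && j <= p->e + 1);
    return &p->a[(size_t)(j - (p->s - 1)) * (size_t)(p->nx + 2) + (size_t)i];
}

/* Serial model of exchng1 between `bottom` and the strip right above it. */
void exchange_pair(Strip *bottom, Strip *top)
{
    assert(bottom->nx == top->nx && bottom->e + 1 == top->s);
    size_t bytes = (size_t)bottom->nx * sizeof(double);
    /* 1st SENDRECV: row e of the lower strip -> ghost row s-1 of the upper */
    memcpy(at(top, 1, top->s - 1), at(bottom, 1, bottom->e), bytes);
    /* 2nd SENDRECV: row s of the upper strip -> ghost row e+1 of the lower */
    memcpy(at(bottom, 1, bottom->e + 1), at(top, 1, top->s), bytes);
}
```

For example, with nx = 3, a strip owning rows 1 to 2 and one owning rows 3 to 4 swap their edge rows. Afterwards, each ghost row holds exactly what the neighbor owns, so the next sweep sees the correct stencil values.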


Subroutine sweep1d
Jacobi iteration:
$$u^{k+1}_{i,j} = \tfrac14\left(u^{k}_{i-1,j} + u^{k}_{i,j+1} + u^{k}_{i,j-1} + u^{k}_{i+1,j} - h^2 f_{i,j}\right)$$
c
c Perform a Jacobi sweep for a 1-d decomposition.
c Sweep from a into b
c
      subroutine sweep1d( a, f, nx, s, e, b )
      integer nx, s, e
      double precision a(0:nx+1,s-1:e+1), f(0:nx+1,s-1:e+1),
     +                 b(0:nx+1,s-1:e+1)
c
      integer i, j
      double precision h
c
      h = 1.0d0 / dble(nx+1)
      do 10 j=s, e
         do 10 i=1, nx
            b(i,j) = 0.25d0 * (a(i-1,j)+a(i,j+1)+a(i,j-1)+a(i+1,j)
     +                         - h * h * f(i,j))
 10   continue
      return
      end
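The same sweep can be written in C, using the Fortran storage layout a(0:nx+1, s-1:e+1) with i varying fastest. `sweep_strip` and `idx` are hypothetical names. Like sweep1d, the sketch takes h = 1/(nx+1), which assumes a square grid. It keeps the h² f term inside the factor ¼, as the iteration formula requires. Two checks are cheap. A linear u has a zero discrete Laplacian, so with f = 0 the sweep must leave it unchanged. A constant f on a zero field shows directly where the h² f term enters.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in an array stored like a(0:nx+1, s-1:e+1). */
static size_t idx(int nx, int s, int i, int j)
{
    return (size_t)(j - (s - 1)) * (size_t)(nx + 2) + (size_t)i;
}

/* Jacobi sweep over the points owned by one strip (i = 1..nx, j = s..e):
 * reads a (including its ghost rows) and f, writes b. */
void sweep_strip(int nx, int s, int e, const double *a, const double *f,
                 double *b)
{
    assert(nx > 0 && e >= s);
    const double h = 1.0 / (nx + 1);
    for (int j = s; j <= e; j++)
        for (int i = 1; i <= nx; i++)
            b[idx(nx, s, i, j)] =
                0.25 * (a[idx(nx, s, i - 1, j)] + a[idx(nx, s, i, j + 1)]
                      + a[idx(nx, s, i, j - 1)] + a[idx(nx, s, i + 1, j)]
                      - h * h * f[idx(nx, s, i, j)]);
}
```

With nx = 3 and a strip owning rows 1 to 2, the field a(i,j) = i + j comes back unchanged at every interior point. With a = 0 and f = 16 (h = 1/4), every interior point becomes ¼(0 - 1) = -0.25.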


Final notes: what was left out?
• Libraries (two good, well-documented ones):
• ScaLAPACK (Scalable LAPACK), originally written in F77
• PBLAS (Parallel BLAS)
• PETSc (Portable, Extensible Toolkit for Scientific Computation), designed for C
• Profiling (performance measurement)
• There is no standard. The measurement itself affects the computation time!
• Buffering problems
• Load balancing
• Debugging (www.lam-mpi.org/software/xmpi/)
• Bibliography and links
• Using MPI (W. Gropp, E. Lusk and A. Skjellum)
• Parallel Programming with MPI
• www.mpi-forum.org/; www.mcs.anl.gov/mpi; www.erc.msstate.edu/mpi
