
Getting Started With KeLP 1.4 (Abstract KeLP)

Getting Started With KeLP 1.4 (Abstract KeLP). A Practical Introduction. Daniel Shalit, Ph.D., Dept of Computer Science and Engineering, UCSD. Outline: What is KeLP? Model Problem. Why Use KeLP? Further Information. What is KeLP? KeLP = Kernel Lattice Parallelism



  1. Getting Started With KeLP 1.4 (Abstract KeLP). A Practical Introduction. Daniel Shalit, Ph.D., Dept of Computer Science and Engineering, UCSD

  2. Outline • What is KeLP? • Model Problem • Why Use KeLP? • Further Information

  3. What is KeLP? • KeLP = Kernel Lattice Parallelism • A C++ framework for rapid development of high performance parallel applications • Structured blocked N-dimensional data • Run-time decomposition • Run-time communication optimization • Allows you to think (and write code) in terms of your problem geometry • Version 1.4 available for IBM SP*, Cray T3E, Sun HPC, single processor workstations, and clusters using g++

  4. What is KeLP ? • Every KeLP main program must be written in C++ ... • This follows from KeLP’s definition as a framework and its implementation as a C++ class library • ...But the lower levels of your application can be written in whatever language you like. • C and Fortran are easy • Most existing KeLP applications are wrappers around Fortran numerics. • You can get by with minimal knowledge of C++ • “Paint by Numbers”

  5. Scope of KeLP Applications • Computational domains comprising a collection of boxes in Cartesian space. • This information can be determined at run time, and may vary dynamically.

  6. MetaData and Real Data

  7. What KeLP Demands from the User • In a word, Data Structures • An N-Dimensional Array is supplied as a default. If that’s good enough for your application you can doze through the next couple of slides. • Some users want more flexibility • “Use Grid or go home” • B.VanStraalen, SIAM 2000 • In this release we introduce The Patch Abstraction • a template (specification) for users to follow in designing their own data holder classes.

  8. The Patch Abstraction (the Abstract in Abstract KeLP). A Patch Class Must Provide: • Constructor • May take an arbitrary number of arguments • Functions to • Copy data • Serialize (pack and unpack) data into and out of buffers • Query the size of data • Query the region, bounds, and number of components in a Patch • For details, see http://www-cse.ucsd.edu/groups/hpcl/scg/KeLP1.4/AbstractKeLP.html
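The requirements listed on this slide can be pictured with a toy data holder. This is only a sketch under stated assumptions: `ToyPatch` and every member name below are hypothetical, the class is one-dimensional for brevity, and it does not reproduce the interface KeLP actually expects (see the AbstractKeLP page linked above for that).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical 1-D data holder. Names are illustrative, not KeLP's interface.
class ToyPatch {
public:
    // Constructor: inclusive index bounds [lo, hi].
    ToyPatch(int lo, int hi) : lo_(lo), hi_(hi), data_(hi - lo + 1, 0.0) {
        assert(hi >= lo);
    }

    // Query the bounds of the patch.
    int lower() const { return lo_; }
    int upper() const { return hi_; }

    // Element access by global index.
    double& operator()(int i) { return data_[i - lo_]; }
    double operator()(int i) const { return data_[i - lo_]; }

    // Query the size in bytes of the cells [lo, hi].
    std::size_t packedBytes(int lo, int hi) const {
        return static_cast<std::size_t>(hi - lo + 1) * sizeof(double);
    }

    // Copy the cells [lo, hi] from another patch.
    void copyFrom(const ToyPatch& src, int lo, int hi) {
        for (int i = lo; i <= hi; ++i) (*this)(i) = src(i);
    }

    // Serialize the cells [lo, hi] into a byte buffer.
    std::vector<unsigned char> pack(int lo, int hi) const {
        std::vector<unsigned char> buf(packedBytes(lo, hi));
        std::memcpy(buf.data(), &data_[lo - lo_], buf.size());
        return buf;
    }

    // Deserialize a buffer produced by pack() into the cells starting at lo.
    void unpack(const std::vector<unsigned char>& buf, int lo) {
        assert(buf.size() % sizeof(double) == 0);
        std::memcpy(&data_[lo - lo_], buf.data(), buf.size());
    }

private:
    int lo_, hi_;
    std::vector<double> data_;
};
```

Packing on one patch and unpacking on another stands in for what a communication layer would do with a message buffer between processors.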

  9. Data Decomposition. KeLP provides structural abstractions to help manage irregular block-structured data decompositions at run time. Two fundamental classes: • Region: an abstract object representing an index space, a rectangular subset of Z^n (i.e., a bounding box) • FloorPlan: an array of Regions with processor assignments, representing a blocked data decomposition.

  10. KeLP’s communication model • Form logical groupings of distributed data, which may have many-to-one processor mappings: FloorPlan and XArray • Generate a data dependence descriptor: MotionPlan • Bind the MotionPlan to one or two XArrays: Mover • Use a geometric Region Calculus to build the groupings and the descriptors

  11. Region Abstraction • Region: bounding box in multidimensional index space • A geometric calculus to manipulate the regions • Note: * = Intersection operator
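A rough picture of the region calculus, assuming a plain struct in place of KeLP's Region class: `Box2` and its members are hypothetical names, and only intersection (written `*`, as on the slide) and `grow` are sketched.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical 2-D box with inclusive integer bounds (a stand-in, not KeLP's Region2).
struct Box2 {
    int lo0, lo1, hi0, hi1;
    // A box is empty when an upper bound falls below its lower bound.
    bool empty() const { return hi0 < lo0 || hi1 < lo1; }
};

// Intersection: componentwise max of lower bounds and min of upper bounds.
inline Box2 operator*(const Box2& a, const Box2& b) {
    return Box2{std::max(a.lo0, b.lo0), std::max(a.lo1, b.lo1),
                std::min(a.hi0, b.hi0), std::min(a.hi1, b.hi1)};
}

// Grow by g cells on every side; a negative g shrinks the box.
inline Box2 grow(const Box2& a, int g) {
    assert(!a.empty());
    return Box2{a.lo0 - g, a.lo1 - g, a.hi0 + g, a.hi1 + g};
}
```

Disjoint boxes intersect to an empty box, so a caller can test `(a * b).empty()` before scheduling any data motion.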

  12. Data Decomposition: Aggregate Abstractions • FloorPlan: a table of regions and their assignment to processors (owners) • XArray: a distributed collection of data holder objects instantiated over a FloorPlan (many-to-one assignments)

  13. Data Motion: Dependence Descriptors • Geometric description of data dependencies • FloorPlan captures the block structure of the problem • MotionPlan: encodes the dependencies among the blocks

  14. The KeLP Mover • A persistent communication object, but more powerful than MPI’s • update-moves, inspector-executor analysis, non-blocking collective communication • Uses the communication pattern encoded in the MotionPlan • The Mover instantiates a MotionPlan much as an XArray instantiates a FloorPlan • Currently implemented on top of MPI point to point message passing

  15. Stommel Model Using KeLP. Every KeLP Program Must Have:

        #include "kelp.h"

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);
            InitKeLP(argc, argv);
            // ... my program goes here ...
            MPI_Finalize();
            return 0;
        }

  16. Program Summary

        Region2 domain = [1:N, 1:N]
        Region1 fields = [1:2]
        FloorPlan T = block partition of domain
        For each subdomain T(i) in T
            T.setregion(i)
            T.setowner(i)
            grow(T(i), 1)          // space for ghost cells
        XArray2 grid(T, fields)
        XArray1 force(TF)
        MotionPlan M
        InitMotionPlan(T, M)
        Mover Mvr(grid, grid, M)
        Initialize

  17. Summary - Continued

        do
            Mvr.execute()                        // exchange ghost cells
            ComputeLocal(grid, fields, force)
            Swap Old with New
        end

  18. Processor Geometry

        // Set the processor array as close to square as the total number
        // of processors will allow.
        for (int j = NDIM; j > 1; j--) {
            int i = 1;
            while (pow(i, j) <= nProcs) {
                if ((nProcs % i) == 0) alloc = i;   // largest divisor found so far
                i++;                                // always advance, so a non-divisor cannot stall the loop
            }
            nrows = alloc;
            nProcs /= alloc;
            ncols = nProcs;
        }

  19. What Did That Do?

        Nprocs   Nrows   Ncols
        4        2       2
        6        2       3
        30       5       6       (not 2 x 15 or 3 x 10)
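The table can be checked with a small standalone function. This is a sketch for the two-dimensional case only, and `processorGrid` is a hypothetical name rather than part of KeLP: it picks the largest divisor of the processor count whose square does not exceed that count.

```cpp
#include <cassert>
#include <utility>

// Returns {nrows, ncols} with nrows * ncols == nprocs, where nrows is the
// largest divisor of nprocs satisfying nrows * nrows <= nprocs.
inline std::pair<int, int> processorGrid(int nprocs) {
    assert(nprocs > 0);
    int nrows = 1;
    for (int i = 1; i * i <= nprocs; ++i) {
        if (nprocs % i == 0) nrows = i;  // remember the latest divisor
    }
    return {nrows, nprocs / nrows};
}
```

A prime processor count has no divisor other than 1 below its square root, so it degenerates to a single row.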

  20. Data Decomposition: Building the FloorPlan

        const int NREGIONS = mpNodes();
        Region2 R;                    // computational grid
        Region1 RF;                   // force terms
        FloorPlan2 T(NREGIONS);
        FloorPlan1 TF(NREGIONS);
        // Processor geometry is Nrows x Ncols
        // Problem size is N x N

  21. Data Decomposition: Building the FloorPlan

        int r = 0;
        for (int px = 0; px < Nrows; px++) {
            for (int py = 0; py < Ncols; py++) {
                xLow = 1 + px*N/Nrows;
                xHi  = Min((px+1)*N/Nrows, N);
                yLow = 1 + py*N/Ncols;
                yHi  = Min((py+1)*N/Ncols, N);
                R.setregion(xLow, yLow, xHi, yHi);
                RF.setregion(yLow, yHi);
                T.setregion(r, R);
                TF.setregion(r, RF);
                r++;
            }
        }
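The bounds arithmetic in this loop can be tried without KeLP. The sketch below assumes a plain struct in place of Region2; `Block` and `blockPartition` are hypothetical names. It uses exactly the integer formulas from the slide, so the block order and the bounds in the second dimension need not match the FloorPlan printed for the 6-processor run, which Exercise 6 describes as identical only "mod some load balancing".

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a 2-D region with inclusive bounds.
struct Block {
    int xLow, yLow, xHi, yHi;
};

// Partition the index space [1..n] x [1..n] into nrows x ncols blocks
// using the integer formulas from the slide.
inline std::vector<Block> blockPartition(int n, int nrows, int ncols) {
    assert(n > 0 && nrows > 0 && ncols > 0);
    std::vector<Block> blocks;
    for (int px = 0; px < nrows; ++px) {
        for (int py = 0; py < ncols; ++py) {
            Block b;
            b.xLow = 1 + px * n / nrows;
            b.xHi = std::min((px + 1) * n / nrows, n);
            b.yLow = 1 + py * n / ncols;
            b.yHi = std::min((py + 1) * n / ncols, n);
            blocks.push_back(b);
        }
    }
    return blocks;
}
```

Because consecutive blocks start one cell past the previous upper bound, the blocks tile the domain with no gaps or overlaps even when `n` is not divisible by the grid dimensions.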

  22. Data Decomposition: Processor Assignments

        // Map the regions to processors until all regions are mapped.
        // Mapping is known on each processor.
        int p = 0, r = 0;
        while (r < NREGIONS) {
            T.setowner(r, p);
            TF.setowner(r, p);
            r++;
            p++;
            p %= nproc;
        }
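The loop above deals regions out to processors in round-robin order, which is what permits many-to-one assignments when there are more regions than processors. A minimal sketch, with `roundRobinOwners` as a hypothetical helper that returns the owner of each region instead of calling `setowner`:

```cpp
#include <cassert>
#include <vector>

// owner[r] is the processor assigned to region r, cycling through 0..nprocs-1.
inline std::vector<int> roundRobinOwners(int nregions, int nprocs) {
    assert(nregions >= 0 && nprocs > 0);
    std::vector<int> owner(static_cast<std::vector<int>::size_type>(nregions));
    for (int r = 0; r < nregions; ++r) {
        owner[static_cast<std::vector<int>::size_type>(r)] = r % nprocs;
    }
    return owner;
}
```

Every processor can evaluate this mapping independently and arrive at the same table, consistent with the slide's remark that the mapping is known on each processor.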

  23. What Did That Do? Result on 6 Processors

        [Figure: the domain divided into six blocks owned by P0 through P5, with the corner coordinates listed below]

        ! FloorPlan !
        Size: 6
        0: [(1,1),(100,67)]
        1: [(101,1),(200,67)]
        2: [(1,68),(100,134)]
        3: [(101,68),(200,134)]
        4: [(1,135),(100,200)]
        5: [(101,135),(200,200)]
        !------------!

  24. Grow the Ghost Cells

        int index;
        for (indexIterator1 ii(T); ii; ++ii) {
            index = ii(0);
            // Grow() is a built-in Region calculus operation
            T.setregion(index, grow(T(index), 1));
        }

  25. Communications Plan (start)

        void initMotionPlan(FloorPlan2 &X, MotionPlan2 &M)
        {
            for (indexIterator1 ii(X); ii; ++ii) {
                int i = ii(0);
                Region2 inside = grow(X(i), -1);
                for (indexIterator1 jj(X); jj; ++jj) {
                    int j = jj(0);
                    if (i != j) M.CopyOnIntersection(X, i, X, j, inside);
                }
            }
        }

  26. What did that do? [Figure: Grow and CopyOnIntersection applied to block U(i)]
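The geometry behind the two preceding slides can be worked through by hand. The sketch below assumes that `CopyOnIntersection(X, i, X, j, inside)` schedules a copy from block i to block j over the cells where block j overlaps `inside`; that reading, and the names `Box2`, `CopyOp`, and `planGhostCopies`, are assumptions made for illustration, not KeLP definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D box with inclusive bounds (a stand-in for Region2).
struct Box2 {
    int lo0, lo1, hi0, hi1;
    bool empty() const { return hi0 < lo0 || hi1 < lo1; }
};

// Intersection of two boxes.
inline Box2 operator*(const Box2& a, const Box2& b) {
    return Box2{std::max(a.lo0, b.lo0), std::max(a.lo1, b.lo1),
                std::min(a.hi0, b.hi0), std::min(a.hi1, b.hi1)};
}

// Grow by g cells on every side; a negative g shrinks the box.
inline Box2 grow(const Box2& a, int g) {
    assert(!a.empty());
    return Box2{a.lo0 - g, a.lo1 - g, a.hi0 + g, a.hi1 + g};
}

// One scheduled copy: cells `where` move from block `from` to block `to`.
struct CopyOp {
    std::size_t from, to;
    Box2 where;
};

// `grown` holds blocks that already include a one-cell ghost layer.
// Shrinking block i by one cell recovers its interior; wherever that
// interior overlaps another grown block, the other block needs the data.
inline std::vector<CopyOp> planGhostCopies(const std::vector<Box2>& grown) {
    std::vector<CopyOp> plan;
    for (std::size_t i = 0; i < grown.size(); ++i) {
        const Box2 inside = grow(grown[i], -1);
        for (std::size_t j = 0; j < grown.size(); ++j) {
            if (i == j) continue;
            const Box2 overlap = inside * grown[j];
            if (!overlap.empty()) plan.push_back({i, j, overlap});
        }
    }
    return plan;
}
```

For blocks 0 and 1 of the 6-processor FloorPlan, each grown by one cell, this yields one copy in each direction: a single column of interior cells from each block fills the other block's ghost layer.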

  27. Allocate Storage

        // Allocate the data over the processors
        // according to the FloorPlan
        XArray2<Grid2<double> > grid(T, fields);
        XArray1<Grid1<double> > force(TF);

  28. Executing Communications

        Mover2<Grid2<double> > pDM1(grid, grid, M);
        /* Exchange boundary data with neighboring processors */
        pDM1.execute();

      Note: • All the hard work has taken place “under the hood.”

  29. Jacobi Iteration

        void ComputeLocal(XArray2<Grid2<double> >& oldgrid,
                          const int cs, const int ce,
                          XArray1<Grid1<double> >& tfor)
        {
            int i;
            for (nodeIterator ni(oldgrid); ni; ++ni) {
                i = ni();
                Grid2<double>& OG = oldgrid(i);
                Grid1<double>& the_for = tfor(i);
                FortranRegion2 Foldgrid(OG.region());
                f_j5relax(OG.data(), FORTRAN_REGION2(Foldgrid),
                          &cs, &ce, &a1, &a2, &a3, &a4, &a5,
                          the_for.data());
            }
        }

  30. Numerical Kernel

        subroutine j5relax(u,ul0,ul1,uh0,uh1,v,a1,a2,a3,a4,a5,for,norm)
        double precision u(ul0:uh0,ul1:uh1), v(ul0:uh0,ul1:uh1)
        double precision for(ul1:uh1)
        do j = ul1+1, uh1-1
           force_term = a5*for(j)
           do i = ul0+1, uh0-1
              v(i,j) = a1*u(i+1,j) + a2*u(i-1,j) + a3*u(i,j+1) + a4*u(i,j-1) - force_term
           end do
        end do
        return
        end

  31. Exchange Old with New

        inline void SwapIndex(int *a, int *b)
        {
            int temp = *a;
            *a = *b;
            *b = temp;
        }

  32. Putting It All Together (Main Loop)

        for (int i = 1; i < STEPS + 1; i++) {
            pDM1.execute();                  // Exchange boundary data
            /* Perform the local Jacobi computation */
            ComputeLocal(*oldgrid, fields_start, fields_end, *p_force);
            SwapIndex(&fields_old, &fields_new);
        }
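The main loop keeps two copies of the field and swaps their indices each step instead of copying data. The following is a serial, one-dimensional stand-in for that pattern, not the Stommel code itself: `jacobi1d` is a hypothetical function, the stencil is a plain two-point average, and there is no forcing term or ghost-cell exchange.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Runs `steps` Jacobi sweeps on a 1-D field with fixed end values.
// Two copies of the field are kept; oldF and newF are indices into them
// and are swapped after every sweep, so no data is copied between sweeps.
inline std::vector<double> jacobi1d(const std::vector<double>& u0, int steps) {
    assert(u0.size() >= 3);
    std::array<std::vector<double>, 2> field = {{u0, u0}};
    int oldF = 0;
    int newF = 1;
    for (int s = 0; s < steps; ++s) {
        const std::vector<double>& u = field[static_cast<std::size_t>(oldF)];
        std::vector<double>& v = field[static_cast<std::size_t>(newF)];
        for (std::size_t i = 1; i + 1 < u.size(); ++i) {
            v[i] = 0.5 * (u[i - 1] + u[i + 1]);  // read old, write new
        }
        std::swap(oldF, newF);  // the freshly written copy becomes "old"
    }
    return field[static_cast<std::size_t>(oldF)];
}
```

In the KeLP version the ghost-cell exchange would sit at the top of each sweep, before the local computation reads its neighbors' boundary values.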

  33. These tutorials always show the trivial case! How does KeLP handle a realistically complicated geometry? (A multiblock decomposition of Lake Superior Courtesy of Yingxin Pang)

  34. FloorPlan

  35. So What Has KeLP Done For Me? • Nothing much . . . Yet • But • Still need to • Orchestrate the communications • Execute the communications • Here’s why you might want to use KeLP

  36. Communications Plan

        void initMotionPlan(FloorPlan2 &X, MotionPlan2 &M)
        {
            for (indexIterator1 ii(X); ii; ++ii) {
                int i = ii(0);
                Region2 inside = grow(X(i), -1);
                for (indexIterator1 jj(X); jj; ++jj) {
                    int j = jj(0);
                    if (i != j) M.CopyOnIntersection(X, i, X, j, inside);
                }
            }
        }

  37. Executing Communications

        /* Exchange boundary data with neighboring processors */
        Mover2<Grid2<double> > pDM1(grid, grid, M);
        pDM1.execute();

      Abli, abli, abli, that’s all folks!

  38. Access To KeLP • If KeLP is not installed on your system, download from http://www-cse.ucsd.edu/groups/hpcl/scg/kelp/ • Follow build and installation instructions in README file

  39. Additional Information • KeLP Software and Documentation Web Page http://www-cse.ucsd.edu/groups/hpcl/scg/kelp/software.html • Users Guide and Reference Manual • Users Guide contains two additional tutorials • Slides of this talk • Including hidden slides with more detail • Example code from this talk: • KeLP Technical Support kelp@cs.ucsd.edu

  40. Fortran Interface • KeLP provides several macros to ease the process of calling Fortran numeric code on KeLP data structures. • All KeLP arrays are currently laid out in Fortran-style column-major order. • See the KeLP documentation for an example of how to use these constructs.
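Column-major layout means the first index varies fastest in memory, which is why a Fortran routine can treat the raw data pointer as an array declared `u(lo0:hi0, lo1:hi1)`. A small sketch of the address arithmetic, with `columnMajorOffset` as a hypothetical helper that is not part of KeLP:

```cpp
#include <cassert>
#include <cstddef>

// Offset of element (i, j) within a column-major array that Fortran would
// declare as u(lo0:hi0, lo1:hi1). The first index is contiguous in memory.
inline std::size_t columnMajorOffset(int i, int j,
                                     int lo0, int hi0, int lo1, int hi1) {
    assert(lo0 <= i && i <= hi0 && lo1 <= j && j <= hi1);
    const std::size_t n0 = static_cast<std::size_t>(hi0 - lo0 + 1);  // column length
    return static_cast<std::size_t>(i - lo0) +
           n0 * static_cast<std::size_t>(j - lo1);
}
```

A C++ loop nest that walks such data efficiently therefore puts the first index in the innermost loop, the reverse of the usual habit for C-style row-major arrays.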

  41. Fortran Interface. The ComputeLocal() function performs local Jacobi relaxation on each Grid in an XArray. We now show how to implement this operation by calling a serial Fortran 77 routine from KeLP.

        #define f_j5relax FORTRAN_NAME(j5relax_, J5RELAX, j5relax)
        extern "C" {
            void f_j5relax(double *u, FORTRAN_REGION2 R, int &, …, double *f);
        }

  42. Exercises • Copy the /examples directory into your guest directory • Type setenv KELP_HOME /usr/local/apps/KeLP1.4 • There’s a lot more here than you can probably do in the allotted time. The rest are to do at home if you’re really interested in learning KeLP.

  43. Exercise 1. Compile and run the stommel example program. Use the makefile provided.

        make stommel
        poe stommel<st.in -nodes A -tasks_per_node B -rmpool 1

      Concatenate the individual processor output files:

        perl catfile.pl

      Plot the output:

        gnuplot plot_stommel

      Try it with various numbers of processors and compare the FloorPlans and output.

  44. Exercise 2 Compile and run the Superior example program. make Superior Note that since this program reads in a predetermined FloorPlan it must be run on exactly 8 processors

  45. Exercise 3 Fill in the blanks to complete a KeLP program to • Distribute a 2D array of integers over a rectangular set of subregions • Number of regions must be  Number of processors • Number of elements along one dimension must be proportional to number of subregions along that dimension • Perform a clockwise circular shift by regions and assign the result to a second 2D array.

  46. Exercise 3 [Figure: Original Array and Shifted Array, each drawn as six regions numbered 0 to 5]

  47. Exercise 3 • Use the file examples/Cshift/cshift_exercise.C as a template. • e.g. // a) Declare the FloorPlan2 and assign Regions to FloorPlan elements • Solutions to individual sections are in examples/Cshift/Solutions • Hint: some of these are hard. Others are “who’s buried in Grant’s Tomb?” Do the latter first. • Solutions are not unique. • Complete program is in examples/Cshift/runnable/cshift.C

  48. Exercise 4 • In the complete cshift program, write a nodeIterator loop to do something to the elements of one of your arrays and output the new values. • The output is cut and paste.

  49. Exercise 5 (Hard but Essential). The general form of MotionPlan.Copy(…) is Copy(From XArray, From index, From Region, To XArray, To index, To Region). Write your own MotionPlan in the cshift program to move some data around, and execute the plan by declaring and executing a Mover.

  50. Exercise 6 (probably at home). Using the following facts: 1. A KeLP PointX is a point (or vector) in X-dimensional space, e.g. (i,j,k) in 3-space. 2. A RegionX can be instantiated by either 2X integer index arguments or two PointX arguments.

        Point2 LowLeft(l0,l1);
        Point2 UpRight(u0,u1);
        Region2(LowLeft,UpRight);

      Show that the FloorPlan in stommel.C is (mod some load balancing) identical to that shown in slides 19 and 20. First convince yourself by cutting and pasting. Then do the “paper and pencil” analysis.
