
Assignment 2

Learn how to create parallel programs on the cluster at UNCW using Paraguin and submit jobs through the Sun Grid Engine (SGE) scheduler. Compile and run programs using job submission files. Manage job status and delete jobs as needed.

ktran

Presentation Transcript


  1. Assignment 2: Using Paraguin to Create Parallel Programs

  2. Cluster at UNCW (diagram labels) • User computers • Submit host: babbage • Dedicated cluster • Ethernet interface • Head node (master node): harpua • Switch • Compute nodes: compute-0-0, compute-0-1, compute-0-2, …

  3. Cluster at UNCW • We use the Sun Grid Engine (SGE) to schedule jobs on the cluster • The scheduler allocates compute nodes to jobs exclusively, so that one user's application does not interfere with the performance of another's • Compile as normal: $ mpicc hello.c -o hello

  4. SGE • Programs are run through a job submission file rather than started directly • Some SGE commands: • qsub <job submission file> - submits a job to the scheduler to run • qstat - shows the status of submitted jobs (waiting, queued, running, terminated, etc.) • qdel <#> - deletes a job (by number) from the system • qhost - shows a list of hosts

  5. SGE • Example job submission file (hello.sge):
       #!/bin/sh
       # Usage: qsub hello.sge
       #$ -S /bin/sh
       #$ -pe orte 16       # Specify how many processors we want
       # -- our name ---
       #$ -N Hello          # Name for the job
       #$ -l h_rt=00:01:00  # Request 1 minute to execute
       #$ -cwd              # Make sure that the .e and .o file arrive in the working directory
       #$ -j y              # Merge the standard out and standard error to one file
       mpirun -np $NSLOTS ./hello

  6. SGE • Example job submission file (hello.sge):
       #!/bin/sh
       # Usage: qsub hello.sge
       #$ -S /bin/sh
       #$ -pe orte 16       # Specify how many processors we want

  7. SGE • Example job submission file (hello.sge):
       # -- our name ---
       #$ -N Hello          # Name for the job
       #$ -l h_rt=00:01:00  # Request 1 minute to execute
     • -N Hello sets the name of the job and the prefix of the output files: Hello.o### and Hello.po###
     • -l h_rt=00:01:00 indicates that the job will need only a minute. This is important so that SGE will clean up if the program hangs or terminates incorrectly. Longer programs may need a larger time limit, or SGE will terminate them before they have completed.

  8. SGE • Example job submission file (hello.sge):
       #$ -cwd              # Make sure that the .e and .o file arrive in the working directory
       #$ -j y              # Merge the standard out and standard error to one file
     • -cwd: do the job in the current directory
     • SGE will create 3 files: Hello.o##, Hello.e##, and Hello.po##. The -j y option merges the Hello.o and Hello.e files (standard output and standard error).

  9. SGE • Example job submission file (hello.sge): mpirun -np $NSLOTS ./hello And finally the command to run the MPI program. $NSLOTS is the same number given with the #$ -pe orte 16 line.

  10. SGE Example
       $ qstat
       $ qsub hello.sge
       Your job 106 ("Hello") has been submitted
       $ qstat
       job-ID  prior    name   user     state  submit/start at      queue  slots  ja-task-ID
       -------------------------------------------------------------------------------------
          106  0.00000  Hello  cferner  qw     09/04/2012 09:08:38         16
       $
     The state of “qw” means queued and waiting.

  11. SGE Example
       $ qstat
       job-ID  prior    name   user     state  submit/start at      queue                    slots  ja-task-ID
       -------------------------------------------------------------------------------------------------------
          106  0.55500  Hello  cferner  r      09/04/2012 09:11:43  all.q@compute-0-0.local  16
       [cferner@babbage mpi_assign]$
     The state of “r” means running.

  12. SGE Example
       $ ls
       hello  hello.c  Hello.o106  Hello.po106  hello.sge  ring  ring.c  ring.sge  test  test.c  test.sge
       $ cat Hello.o106
       Hello world from master process 0 running on compute-0-2.local
       Message from process = 1 : Hello world from process 1 running on compute-0-2.local
       Message from process = 2 : Hello world from process 2 running on compute-0-2.local
       …
     Clean up the output files when you are done with them, or they will accumulate as clutter.

  13. Deleting a job
       $ qstat
       job-ID  prior    name   user     state  submit/start at      queue  slots  ja-task-ID
       -------------------------------------------------------------------------------------
          108  0.00000  Hello  cferner  qw     09/04/2012 09:18:20         16
       $ qdel 108
       cferner has registered the job 108 for deletion
       $ qstat
       $

  14. Assignment 2 Setup (Do this only once) • Put these lines in the file .bash_profile:
       export MACHINE=x86_64-redhat-linux
       export SUIFHOME=/share/apps/suifhome
       export COMPILER_NAME=gcc
       `perl $SUIFHOME/setup_suif -sh`
     • Run the command: $ . .bash_profile • Notice the 2 periods and the space between them

  15. Hello World Program • The program is given to you • You simply need to compile it and run it (using a job submission file) • Try running it on many processors • Produce documentation of compiling and running the program

  16. Matrix Multiplication • The matrix multiplication skeleton program is given to you in the Appendix • It includes: • Opening the input file • Reading the input • Taking a time stamp • Taking a 2nd time stamp • Computing the elapsed time between the time stamps • Printing the results
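The time-stamp steps listed on this slide can be sketched as follows. This is only an illustration: the skeleton in the Appendix may use a different timer, and the names elapsed_seconds and time_call are hypothetical, not taken from the assignment. It assumes a POSIX system where gettimeofday is available.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Elapsed wall-clock time in seconds between two time stamps. */
double elapsed_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}

/* Hypothetical helper: take a time stamp, run the work, take a second
   time stamp, and return the elapsed time between the two. */
double time_call(void (*work)(void))
{
    struct timeval t0, t1;

    assert(work != NULL);
    gettimeofday(&t0, NULL);   /* first time stamp */
    work();                    /* the section being measured */
    gettimeofday(&t1, NULL);   /* second time stamp */
    return elapsed_seconds(t0, t1);
}
```

In a parallel program only one process would normally take the time stamps and print the result, so that the measurement covers the scatter, the computation, and the gather together.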

  17. Matrix Multiplication • You need to: • Broadcast the error to the processors and exit if necessary • Scatter the input • Compute the partial results • Gather the partial results
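The scatter, compute, and gather steps can be pictured with a serial stand-in, sketched below. It is not Paraguin or MPI code and is not the assignment's solution: it only shows one common decomposition, in which each process receives a contiguous block of rows of the first matrix and computes the matching rows of the result. The function names, the row-major flat-array layout, and the assumption that the number of rows divides evenly among processes are all illustrative choices.

```c
#include <assert.h>

/* Compute rows [first, first + count) of c = a * b for n x n matrices
   stored row-major. In a parallel version, each process would run this
   on the block of rows it received from the scatter. */
void multiply_rows(int n, int first, int count,
                   const double *a, const double *b, double *c)
{
    for (int i = first; i < first + count; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Serial stand-in for scatter / compute / gather over nprocs processes:
   the loop over rank plays the role of the processes working on their
   own row blocks, and writing into c plays the role of the gather. */
void blocked_multiply(int n, int nprocs,
                      const double *a, const double *b, double *c)
{
    assert(nprocs > 0 && n % nprocs == 0);   /* assume rows divide evenly */
    int rows_per_proc = n / nprocs;

    for (int rank = 0; rank < nprocs; rank++)
        multiply_rows(n, rank * rows_per_proc, rows_per_proc, a, b, c);
}
```

Note that every process needs all of the second matrix, so in this decomposition only the first matrix is split into blocks.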

  18. Heat Distribution • Using the stencil pattern, model the distribution of heat in a room that has a fireplace along one wall

  19. Heat Distribution • Each newly computed value is the average of its 8 neighbors (diagonals included) and its own old value • So each value at location i,j should be the average of 9 values • This reduces oscillations
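A serial sketch of one update step of this 9-point average is shown below. It is an illustration only, not the Paraguin stencil code the assignment asks for. The function name stencil_step, the row-major layout, and the choice to copy boundary cells unchanged (treating the walls and fireplace as fixed temperatures) are assumptions.

```c
#include <assert.h>

/* One stencil step on a rows x cols grid stored row-major.
   Each interior cell of next becomes the mean of the 3 x 3 block of old
   centered on it (8 neighbors plus the cell's own old value).
   Boundary cells are copied unchanged (assumed fixed temperatures). */
void stencil_step(int rows, int cols, const double *old, double *next)
{
    assert(rows >= 3 && cols >= 3);

    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            if (i == 0 || j == 0 || i == rows - 1 || j == cols - 1) {
                next[i * cols + j] = old[i * cols + j];
                continue;
            }
            double sum = 0.0;
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++)
                    sum += old[(i + di) * cols + (j + dj)];
            next[i * cols + j] = sum / 9.0;
        }
    }
}
```

Reading from old and writing to next keeps the step from mixing updated and stale values; the two arrays are swapped between iterations.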

  20. Producing a Visual of the Output • [Figure: output produced with X11 Graphics] • [Figure: output produced with Excel]

  21. Producing a Visual of the Output • See the document http://coitweb.uncc.edu/~abw/ITCS4145F13/Assignments/X11GraphicsNotes.pdf for help with creating graphics using X11. • The Excel Graph is a surface plot

  22. Monte Carlo Estimation of π (required for Graduates/optional for Undergraduates) • Scatter/Gather pattern, but uses broadcast and reduce • This is not a workflow pattern • π can also be estimated by integrating the function , but you aren’t asked to do this.
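A serial sketch of the Monte Carlo estimate follows. It uses the usual quarter-circle method: points are drawn uniformly in the unit square, and the fraction landing inside the quarter circle of radius 1 approximates π/4. The loop over rank stands in for the parallel processes and the running sum stands in for the reduce. The function names, the use of rand(), and the per-rank seeding scheme are assumptions for illustration, not the assignment's required approach.

```c
#include <assert.h>
#include <stdlib.h>

/* Count how many of n random points in the unit square fall inside
   the quarter circle x*x + y*y <= 1. */
long count_hits(long n, unsigned int seed)
{
    assert(n > 0);
    long hits = 0;

    srand(seed);
    for (long i = 0; i < n; i++) {
        double x = (double)rand() / RAND_MAX;
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)
            hits++;
    }
    return hits;
}

/* Serial stand-in for the parallel version: each "process" counts hits
   for its own points with its own seed, and the counts are summed
   (the role a reduce would play). */
double estimate_pi_simulated(int nprocs, long points_per_proc,
                             unsigned int seed)
{
    assert(nprocs > 0);
    long total_hits = 0;

    for (int rank = 0; rank < nprocs; rank++)
        total_hits += count_hits(points_per_proc, seed + (unsigned int)rank);

    return 4.0 * (double)total_hits
         / ((double)nprocs * (double)points_per_proc);
}
```

In a parallel version the broadcast would deliver the number of points (and perhaps a base seed) to every process, and only the hit counts would travel back.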

  23. Questions?
