Task Farming on HPCx

Presentation Transcript


  1. Task Farming on HPCx
     David Henty, HPCx Applications Support
     d.henty@epcc.ed.ac.uk

  2. What is Task Farming?
     • Many independent programs (tasks) running at once
       • each task can be serial or parallel
       • “independent” means they don’t communicate directly
     • Common approach for using cycles in a loosely-connected cluster
       • how does it relate to HPCx and Capability Computing?
     • Often needed for pre- or post-processing
     • Tasks may contribute to a single, larger calculation
       • parameter searches or optimisation
       • enhanced statistical sampling
       • ensemble modelling

  3. Classical Task Farm
     [Diagram: a controller process reads the input, farms work out to Workers 1–4 and collects their output]
     • A single parallel code (eg written in MPI)
       • one process is designated as the controller
       • the rest are workers
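
     The slide only names the roles; purely as an illustration, here is a minimal Fortran/MPI sketch of the classical controller/worker pattern. The number of tasks and the do_task routine are hypothetical stand-ins, not part of any HPCx-supplied code.

       program classic_taskfarm
         use mpi
         implicit none
         integer, parameter :: ntasks = 100      ! illustrative total number of tasks
         integer :: rank, nproc, ierr, task, nexttask, ndone, stopflag
         integer :: status(MPI_STATUS_SIZE)

         call MPI_Init(ierr)
         call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
         call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

         stopflag = -1

         if (rank == 0) then
            ! controller: hand out task numbers on request, then tell each worker to stop
            nexttask = 0
            ndone    = 0
            do while (ndone < nproc - 1)
               call MPI_Recv(task, 1, MPI_INTEGER, MPI_ANY_SOURCE, 0, &
                             MPI_COMM_WORLD, status, ierr)
               if (nexttask < ntasks) then
                  call MPI_Send(nexttask, 1, MPI_INTEGER, status(MPI_SOURCE), 0, &
                                MPI_COMM_WORLD, ierr)
                  nexttask = nexttask + 1
               else
                  call MPI_Send(stopflag, 1, MPI_INTEGER, status(MPI_SOURCE), 0, &
                                MPI_COMM_WORLD, ierr)
                  ndone = ndone + 1
               end if
            end do
         else
            ! worker: repeatedly request work from the controller until told to stop
            do
               call MPI_Send(rank, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, ierr)
               call MPI_Recv(task, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, status, ierr)
               if (task < 0) exit
               call do_task(task)
            end do
         end if

         call MPI_Finalize(ierr)

       contains

         subroutine do_task(task)
           integer, intent(in) :: task
           ! stand-in for the real work: a real harness would run a serial job here
           write(*,*) 'worker ', rank, ' doing task ', task
         end subroutine do_task

       end program classic_taskfarm

     Because workers ask for work one task at a time, the load balances itself as long as there are many more tasks than workers.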

  4. Characteristics
     • Pros
       • load balanced for sufficiently many tasks
       • can use all of HPCx (using MPI)
     • Cons
       • must write a new parallel code
       • potential waste of a CPU if the controller is not busy
       • each task must be serial, ie use a single CPU
     • Approach
       • find an existing task farm harness on the WWW

  5. Shared Counter
     [Diagram: Workers 1–5 each draw the next task number (1, 2, 3, ...) from a shared counter and write their own output, eg output 1, output 3, output 6]
     • Tasks are numbered 1, 2, ..., maxTask
     • shared counter requires no CPU time

  6. Characteristics
     • Pros
       • load-balanced
       • don’t have to designate a special controller
     • Cons
       • very much a shared-memory model
       • easy to scale up to a frame (32 CPUs) with OpenMP (see the sketch below)
       • harder to scale to all of HPCx
       • need to write a new parallel program
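
     As a rough sketch of the shared-counter model on a single frame, the fragment below uses an OpenMP critical section as the counter; maxtask and run_task are illustrative names only, and a real harness would launch an external job rather than call a dummy routine.

       program openmp_counter
         use omp_lib
         implicit none
         integer, parameter :: maxtask = 150   ! illustrative number of tasks
         integer :: counter, task

         counter = 0

       !$omp parallel private(task)
         do
            ! atomically grab the next task number from the shared counter
       !$omp critical (get_task)
            task    = counter
            counter = counter + 1
       !$omp end critical (get_task)
            if (task >= maxtask) exit
            call run_task(task)
         end do
       !$omp end parallel

       contains

         subroutine run_task(task)
           integer, intent(in) :: task
           write(*,*) 'task ', task, ' done by thread ', omp_get_thread_num()
         end subroutine run_task

       end program openmp_counter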

  7. Task Farming Existing Code
     • Imagine you have a pre-compiled executable
       • and you simply want to run P copies on P processors
       • common in parameter searching or ensemble studies
       • can be done via poe but is non-portable
     • Possible to launch a simple MPI harness
       • each process does nothing but run the executable
       • easy to do via “system(commandstring)”
     • Have written a general-purpose harness
       • called taskfarm
       • see /usr/local/packages/bin/

  8. Controlling the Task Farm
     • Need to allow the tasks to do different things
       • each task is assigned a unique MPI rank: 0, 1, 2, ..., P-2, P-1
     • I have hijacked the C “%d” printf syntax
       • taskfarm “echo hello from task %d”
       • the command string is run as-is on each processor
       • except with %d replaced by the MPI rank
     • On 3 CPUs:

       hello from task 0
       hello from task 1
       hello from task 2
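
     Purely as a hedged illustration of how such a harness could be written, the sketch below substitutes the caller's MPI rank for %d and runs the result with execute_command_line (the modern Fortran equivalent of system()). It assumes, for simplicity, that the whole command string arrives as a single quoted argument; the installed taskfarm may well work differently.

       program taskfarm_sketch
         use mpi
         implicit none
         integer :: rank, ierr, pos
         character(len=256) :: template, command
         character(len=16)  :: rankstr

         call MPI_Init(ierr)
         call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

         ! eg taskfarm_sketch "echo hello from task %d"
         call get_command_argument(1, template)
         write(rankstr, '(I0)') rank

         ! replace the first %d in the command with this process's MPI rank
         pos = index(template, '%d')
         if (pos > 0) then
            command = template(1:pos-1) // trim(rankstr) // trim(template(pos+2:))
         else
            command = template
         end if

         ! run the external command; the shell handles any redirection
         call execute_command_line(trim(command))

         call MPI_Finalize(ierr)
       end program taskfarm_sketch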

  9. Verbose Mode

     taskfarm -v "echo hello from task %d"

     taskfarm: called with 5 arguments: echo hello from task %d
     taskfarm: process 0 executing "echo hello from task 0"
     taskfarm: process 1 executing "echo hello from task 1"
     taskfarm: process 2 executing "echo hello from task 2"
     hello from task 0
     hello from task 1
     hello from task 2
     taskfarm: return code on process 0 is 0
     taskfarm: return code on process 1 is 0
     taskfarm: return code on process 2 is 0

     • Could also report where each task is running
       • ie the name of the HPCx frame

  10. Use in Practice
     • Need tasks to use different input and output files
       taskfarm "cd rundir%d; serialjob < input > output.log"
       • or
       taskfarm "serialjob < input.%d > output.%d.log"
     • Pros
       • no new coding, and taskfarm is also relatively portable
     • Cons
       • no load balancing: exactly one job per processor per run
     • Extensions
       • do more tasks than CPUs, aiming for load balance?
       • a dedicated controller makes this potentially messy

  11. Implement Shared Counter in MPI-2
     • Could be accessed as a library function:

       do
          task = gettask()
          if (task .lt. 0) exit
          call serialjob(task)
       end do

     • or via an extended harness
       taskfarm -n 150 "serialjob < input.%d > output.%d.log"
     • Would run serial jobs on all available processors until all 150 had been completed
       • potential for load-balancing with more tasks than processors
       • work in progress! (a sketch of one possible gettask() follows this slide)
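
     The MPI-2 counter was still work in progress at the time; purely for illustration, here is a hedged sketch of what gettask() might look like using MPI_Fetch_and_op, the atomic fetch-and-add that arrived in MPI-3 (an MPI-2-only implementation would be more involved). It assumes a window win exposing a single integer counter on rank 0, initialised to zero, and passes win and the task count explicitly rather than hiding them inside the library.

       ! returns the next task number (0, 1, 2, ...), or -1 when all tasks are gone
       integer function gettask(win, maxtask)
         use mpi
         implicit none
         integer, intent(in) :: win, maxtask
         integer :: one, task, ierr
         integer(kind=MPI_ADDRESS_KIND) :: disp

         one  = 1
         disp = 0

         ! atomically read the counter held on rank 0 and add one to it
         call MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win, ierr)
         call MPI_Fetch_and_op(one, task, MPI_INTEGER, 0, disp, MPI_SUM, win, ierr)
         call MPI_Win_unlock(0, win, ierr)

         if (task < maxtask) then
            gettask = task
         else
            gettask = -1
         end if
       end function gettask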

  12. Multiple Parallel MPI Jobs
     • What is the issue on HPCx?
       • poe picks up the number of MPI processes directly from the LoadLeveler script
       • only a single global MPI job can be running at once
     • Cannot do:

       mpirun mpijob -nproc 32 &
       mpirun mpijob -nproc 32 &
       mpirun mpijob -nproc 32 &
       mpirun mpijob -nproc 32 &
       wait

       • unlike on many other systems such as Sun, T3E, Altix, ...

  13. Using taskfarm
     • taskfarm is a harness implemented in MPI
       • so it cannot be used to run MPI jobs
       • but it can run jobs parallelised with some other method, eg threads
     • To run 4 copies of a 32-way OpenMP job:

       export OMP_NUM_THREADS=32
       taskfarm "openmpjob < input.%d > output.%d.log"

     • Controlling the OpenMP parallelism
       • how to ensure that each OpenMP job runs on a separate frame?
       • need to select 4 MPI tasks but place only one on each node:

       #@ cpus=4
       #@ tasks_per_node=1

  14. Real Example: MOLPRO
     • An ab-initio quantum chemistry package
       • parallelised using the Global Array (GA) Tools library
       • on HPCx, the normal version of GA Tools uses LAPI
       • LAPI requires poe: same problems for taskfarm as with MPI
     • But ...
       • there is an alternative implementation of GA Tools
       • it uses the TCGMSG messaging library ...
       • which is implemented using Unix sockets, not MPI
     • Not efficient over the switch
       • but probably fine on a node, ie up to 32 processors

  15. Running MOLPRO as a Parallel Task Farm
     • TCGMSG parallelism is specified on the command line
     • To run 6 MOLPRO jobs, each using 16 CPUs
       • ie 2 jobs per frame on a total of 3 frames:

       #@ cpus=6
       #@ tasks_per_node=2

       taskfarm "molpro -n 16 < input.%d.com > output.%d.out"

     • Completely analogous to task farming OpenMP jobs
     • MOLPRO can now be used to solve many different problems simultaneously
       • which may not individually scale very well

  16. Multiple Parallel MPI Jobs
     • So far we have seen ways of running the following (where “simple” means no load balancing):
       • general serial task farm requiring new parallel code
       • simple serial task farm of existing program(s)
       • potential for general serial task farm of existing program(s)
       • simple parallel (non-MPI) task farms with existing program(s)
     • What about task farming parallel MPI jobs?
       • eg four 64-way MPI jobs in a 256-CPU partition
       • requires some changes to the source code
       • but potentially not very much

  17. Communicator Splitting
     • (Almost) every MPI routine takes a communicator
       • usually MPI_COMM_WORLD, but it can be a subset of the processes
     • Original code:

       call MPI_Init(ierr)
       comm = MPI_COMM_WORLD

       call MPI_Comm_size(comm, ...)
       call MPI_Comm_rank(comm, ...)

       if (rank .eq. 0) &
          write(*,*) 'Hello world'

       ! now do the work ...

       call MPI_Finalize(ierr)

     • Task-farmed code:

       call MPI_Init(ierr)
       bigcomm = MPI_COMM_WORLD
       comm = split(bigcomm, 4)

       call MPI_Comm_size(comm, ...)
       call MPI_Comm_rank(comm, ...)

       if (rank .eq. 0) &
          write(*,*) 'Hello world'

       ! now do the work ...

       call MPI_Finalize(ierr)

  18. Issues
     • Each group of 64 processors lives in its own world
       • each has ranks 0 – 63 and its own master, rank = 0
       • must never directly reference MPI_COMM_WORLD
     • Need to allow for different input and output files
       • use different directories for minimum code changes
       • can arrange for each parallel task to run in a different directory using clever scripts
     • How to split the communicator appropriately?
       • can be done by hand with MPI_Comm_split (see the sketch after this slide)
       • the MPH library gives users some help
     • If you’re interested, submit a query!
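
     For illustration only, here is a hedged, hand-written sketch of the split() helper used on slide 17, built with MPI_Comm_split as suggested above; it assumes the process count divides exactly by the number of sub-farms and omits error checking.

       ! divide bigcomm into ntask equal sub-communicators, eg split(bigcomm, 4)
       integer function split(bigcomm, ntask)
         use mpi
         implicit none
         integer, intent(in) :: bigcomm, ntask
         integer :: rank, nproc, colour, newcomm, ierr

         call MPI_Comm_rank(bigcomm, rank,  ierr)
         call MPI_Comm_size(bigcomm, nproc, ierr)

         ! eg 256 processes and ntask = 4 gives colours 0..3, 64 processes each
         colour = rank / (nproc / ntask)

         ! processes with the same colour join the same sub-communicator;
         ! using rank as the key keeps the original ordering within each group
         call MPI_Comm_split(bigcomm, colour, rank, newcomm, ierr)

         split = newcomm
       end function split

     Each resulting sub-communicator can then be passed to the existing MPI code in place of MPI_COMM_WORLD, exactly as on slide 17.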

  19. Summary
     • Like any parallel computer, HPCx can run parallel task-farm programs written by hand
     • However, the usual requests are:
       • multiple runs of an existing serial program
       • multiple runs of an existing parallel program
     • These can both be done with the taskfarm harness
       • limitations on the tasks’ parallelism (must be non-MPI)
       • currently no load-balancing
     • Task farming MPI code requires source changes
       • but can be quite straightforward in many cases
       • eg ensemble modelling with the Unified Model
