
Master-Worker Tutorial Condor Week 2006



  1. Master-Worker Tutorial, Condor Week 2006

  2. Agenda • What is M-W • When to use M-W • How to build a simple M-W application • Q & A

  3. Why M-W? • M-W addresses a weakness in Condor: • Short jobs • Also, for dynamic, parallel workflows

  4. A Condor Job…

  5. An easy solution: • Why not just wrap up smaller jobs into a bigger Condor job? • Partial failures? • Load balancing? • Dynamic creation of work?

  6. Solution: Lightweight Tasks Multiplexed on top of Jobs • Process : Thread :: Condor Job : MW Task • An MWTask dispatches in milliseconds; a Condor job can take minutes

  7. MW is… • A C++ framework • That reuses Condor worker jobs • So that each worker job runs many tasks • Resulting in a highly parallel application

  8. MW is not • MPI • A general parallel programming scheme

  9. MW in action [Figure; surviving labels: Submit machine, condor_submit, Master exe, several Workers, many tasks (T)]

  10. You Must Write 3 Classes • Subclasses of MWDriver, MWTask, and MWWorker [Figure; surviving labels: Master exe, Worker exe]

  11. Your_MWTask • Subclass MWTask • Data members for inputs • Data member for results • Serialization of inputs and results • Distinct instances on each side

  12. The Four Task Methods • void MyTask::pack_work(void); • void MyTask::unpack_work(void); • void MyTask::pack_results(void); • void MyTask::unpack_results(void); • Also ctor/dtor!

  13. RMComms • Abstraction for communication • (and some other stuff…) • RMC->pack(int *array, int length); • RMC->unpack(int *array, int length);
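The four task methods and the RMC pack/unpack calls fit together as a round trip: inputs are packed on the master, unpacked on the worker, and results travel back the same way. The following is a minimal sketch of that pattern, not the MW library itself. `FakeComm`, `SumTask`, and `round_trip` are hypothetical stand-ins written for illustration; only the method names `pack_work`, `unpack_work`, `pack_results`, `unpack_results` and the `pack(int *array, int length)` shape come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the RMComm layer: a FIFO buffer of ints.
// The real MW communication classes are NOT used here.
class FakeComm {
public:
    void pack(int *array, int length) {
        assert(array != nullptr && length >= 0);
        buffer_.insert(buffer_.end(), array, array + length);
    }
    void unpack(int *array, int length) {
        for (int i = 0; i < length; ++i) {
            array[i] = buffer_.at(read_pos_++);  // throws if nothing was packed
        }
    }
private:
    std::vector<int> buffer_;
    std::size_t read_pos_ = 0;
};

// Hypothetical task: sum the integers in [lo, hi].
class SumTask {
public:
    static FakeComm *RMC;  // mirrors the "RMC->" usage on the slides

    int lo = 0, hi = 0;    // data members for inputs
    int sum = 0;           // data member for results

    // Inputs and results must be unpacked in the order they were packed.
    void pack_work()      { RMC->pack(&lo, 1);   RMC->pack(&hi, 1); }
    void unpack_work()    { RMC->unpack(&lo, 1); RMC->unpack(&hi, 1); }
    void pack_results()   { RMC->pack(&sum, 1); }
    void unpack_results() { RMC->unpack(&sum, 1); }
};
FakeComm *SumTask::RMC = nullptr;

// Simulates one round trip between distinct master-side and worker-side
// instances of the same task class.
int round_trip(int lo, int hi) {
    FakeComm comm;
    SumTask::RMC = &comm;

    SumTask master_side, worker_side;
    master_side.lo = lo;
    master_side.hi = hi;

    master_side.pack_work();        // master serializes inputs
    worker_side.unpack_work();      // worker deserializes inputs
    for (int i = worker_side.lo; i <= worker_side.hi; ++i) {
        worker_side.sum += i;       // the actual work
    }
    worker_side.pack_results();     // worker serializes results
    master_side.unpack_results();   // master deserializes results

    SumTask::RMC = nullptr;
    return master_side.sum;
}
```

Calling `round_trip(1, 10)` exercises all four methods once. The point of the sketch is the symmetry: each `pack_*` method has an `unpack_*` counterpart that reads the same fields in the same order.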

  14. MWWorker • Just one method: • executeTask(MWTask *t) • Also ctor/dtor!
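A worker's single method could look roughly like the sketch below. `TaskBase` and `WorkerBase` are hypothetical stand-ins for the MW base classes, and `SumTask`, `SumWorker`, and `run_one` are illustrative names; only `executeTask` taking a base-class task pointer is taken from the slide.

```cpp
#include <cassert>

// Hypothetical stand-ins for the framework base classes.
struct TaskBase {
    virtual ~TaskBase() = default;
};
struct WorkerBase {
    virtual ~WorkerBase() = default;
    virtual void executeTask(TaskBase *t) = 0;
};

// Hypothetical task carrying inputs (lo, hi) and a result (sum).
struct SumTask : TaskBase {
    int lo = 0, hi = 0, sum = 0;
};

class SumWorker : public WorkerBase {
public:
    void executeTask(TaskBase *t) override {
        // The framework hands over a base pointer; recover the concrete type.
        SumTask *task = dynamic_cast<SumTask *>(t);
        assert(task != nullptr);
        task->sum = 0;
        for (int i = task->lo; i <= task->hi; ++i) {
            task->sum += i;  // fill in the result member in place
        }
    }
};

// Runs one task through the worker in-process and returns its result.
int run_one(int lo, int hi) {
    SumTask task;
    task.lo = lo;
    task.hi = hi;
    SumWorker worker;
    worker.executeTask(&task);
    return task.sum;
}
```

The worker writes its answer into the task object it was given, which is what lets the result-packing methods on the task send it back afterwards.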

  15. MWDriver • get_userinfo(int argc, char **argv) • RMC->add_executable(char *exe, char *requirements); • setup_initial_tasks(int num_tasks, MWTask ***init_tasks) • act_on_completed_task(MWTask *t) • RMC->add_task(MWTask *t) • Also ctor/dtor
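The driver's role can be sketched as: create the initial tasks, then react to each completed task, possibly adding more. The code below is a simplified, in-process illustration under stated assumptions. `SumDriver`, `SumTask`, `run`, and `sum_to` are hypothetical, the signatures are deliberately simpler than those listed on the slide, and the serial loop in `run` stands in for the framework's dispatch to remote workers.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>

// Hypothetical task: sum the integers in [lo, hi].
struct SumTask {
    int lo;
    int hi;
    long sum;
};

class SumDriver {
public:
    SumDriver(int n, int chunk) : n_(n), chunk_(chunk) {
        assert(chunk_ > 0);
    }

    // Analogue of setup_initial_tasks: split [1, n] into fixed-size chunks.
    void setup_initial_tasks() {
        for (int lo = 1; lo <= n_; lo += chunk_) {
            int hi = std::min(lo + chunk_ - 1, n_);
            add_task(SumTask{lo, hi, 0});
        }
    }

    // Analogue of act_on_completed_task: fold one result into the total.
    void act_on_completed_task(const SumTask &t) { total_ += t.sum; }

    // Analogue of RMC->add_task: queue more work at any time.
    void add_task(const SumTask &t) { todo_.push_back(t); }

    // Stand-in for the framework's main loop; here tasks run serially.
    long run() {
        setup_initial_tasks();
        while (!todo_.empty()) {
            SumTask t = todo_.front();
            todo_.pop_front();
            for (int i = t.lo; i <= t.hi; ++i) {
                t.sum += i;  // what a worker would compute
            }
            act_on_completed_task(t);
        }
        return total_;
    }

private:
    int n_;
    int chunk_;
    long total_ = 0;
    std::deque<SumTask> todo_;
};

// Convenience wrapper for trying the driver out.
long sum_to(int n, int chunk) { return SumDriver(n, chunk).run(); }
```

Because `add_task` can be called from `act_on_completed_task`, the same structure would support work that is created dynamically as results come in.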

  16. Putting it all together: new_skel • ./new_skel MY_PROJECT • Use configure --help for options • make

  17. Debugging with Independent Mode • Special RMComm for debugging • Single process, can run under gdb

  18. Running on the Grid… • Just launch the appropriate master • condor_q to see it in action

  19. Advice for Large Runs • Use a personal Condor • Flock, glide-in, schedd-on-side, hobblein • Use checkpointing! • Set_worker_increment high

  20. User-level Checkpointing • MWTask::write_chkpt_info(FILE *) • MWTask::read_chkpt_info(FILE *) • MWDriver::read_master_state(FILE *) • MWDriver::write_master_state(FILE *)
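User-level checkpointing comes down to writing each object's state to a `FILE *` and reading it back in the same format. Below is a minimal sketch of that idea for a task. `SumTask` and `checkpoint_round_trip` are hypothetical, the text format is an arbitrary choice, and returning `bool` from `read_chkpt_info` is an assumption made here so the example can report parse failures.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical task state to be checkpointed.
struct SumTask {
    int lo = 0, hi = 0, sum = 0;

    // Write the fields as one whitespace-separated text line.
    void write_chkpt_info(std::FILE *f) const {
        std::fprintf(f, "%d %d %d\n", lo, hi, sum);
    }

    // Read the fields back in the same order; true if all three parsed.
    bool read_chkpt_info(std::FILE *f) {
        return std::fscanf(f, "%d %d %d", &lo, &hi, &sum) == 3;
    }
};

// Writes `original` to a temporary file and restores it into `restored`.
bool checkpoint_round_trip(const SumTask &original, SumTask &restored) {
    std::FILE *f = std::tmpfile();
    if (f == nullptr) {
        return false;  // no temporary file available
    }
    original.write_chkpt_info(f);
    std::rewind(f);    // switch from writing to reading
    bool ok = restored.read_chkpt_info(f);
    std::fclose(f);
    return ok;
}
```

The same write-then-read symmetry would apply to the master-state methods on the driver: whatever is written must be enough to rebuild the state after a restart.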

  21. Example codes with MW • Matmul • Blackbox • knapsack

  22. MW Philosophy • Reuse either code or concept • Key idea: Late binding

  23. Other resources • http://www.cs.wisc.edu/condor/mw • Online manual • MW-users mailing list

  24. Thank You! Questions? MW Home page: http://www.cs.wisc.edu/condor/mw
