
Interactive MPI on Demand

An overview of the Unix tool philosophy, how small tools compose, and how the same approach supports interactive MPI on demand. Splitting a system into separate Unix processes brings restartability, better security, and scalability across multiple cores.

rwillbanks

Presentation Transcript


  1. Interactive MPI on Demand

  2. Unix Tool Philosophy • 1) Individual tools do one thing well • 2) Communicate via ASCII streams • 3) Are composable

  3. The Paradox • Universal assent that it’s good • No one uses it • (Except for shell one-liners) • grep ^abc | sort | uniq -c | sort -n
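The one-liner on this slide can be tried on a few lines of made-up input. A minimal sketch, assuming a hypothetical wrapper function `count_abc` and sample data invented for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper around the slide's pipeline: keep lines starting with
# "abc", count duplicates, and list them from least to most frequent.
count_abc() {
    grep '^abc' | sort | uniq -c | sort -n
}

# Made-up input: "abc1" appears twice, "abc2" once, "xyz" is filtered out.
printf 'abc1\nxyz\nabc2\nabc1\n' | count_abc
```

Each stage does one job and passes plain text to the next, which is the composition the slide describes.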

  4. More than just shell scripts • Division into Unix processes provides: • Restartability • Better security • Scalability across multiple cores

  5. For example… • Qmail: • Secure, stable • Implemented across ~dozen processes

  6. Getting back to Condor… • Condor uses this in some places • x-GAHPs • condor_master • Replaceable shadow/starter pairs • Multi-shadow vs. many shadows • But not everywhere • schedd

  7. Condor Daemons as Components • Very successful strategy: • Glide-in • Personal Condor • “Hoffman” and schedds as jobs • Condor-C

  8. Case Study: MPI on Demand • The problem: • Have a pool with lots of machines • Very long-running (weeks) vanilla jobs • Need to run big but short MPI jobs • Can’t reboot startds • MPI needs the dedicated scheduler • Which requires dedicated machines

  9. Possible Solutions • Add “suspension slot” • Requires Reboot • Submit MPI job normally • Preempts vanilla job

  10. COD refresher • COD: Computing On Demand • No scheduling • No file transfer • When a COD job runs, the vanilla job suspends • “Checkpoint to swap” • Needs security enabled to work • COD users must be explicitly allowed

  11. Startd as COD job • Overview: • Launch a personal Condor • Run startds as COD jobs on the base pool • They report to the personal Condor • Base jobs suspend • Submit the parallel job to the personal Condor • Remove the COD startds

  12. Startd under COD: Details • Two condor_config files: careful! • COD provides no file transfer • Can re-use the existing startd binary • Need to pre-stage the config file or put it on NFS • Don’t lose the claim ID!
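Keeping the second config file apart from the base pool's is easier when it is generated rather than hand-edited. A minimal sketch, assuming a hypothetical helper `write_startd_config` and illustrative argument values; the knobs written (CONDOR_HOST, LOCAL_DIR, LOG, SPOOL, EXECUTE, DAEMON_LIST) are standard configuration macros, but a working config will likely need more, such as security settings:

```shell
#!/bin/sh
# Hypothetical helper: write a minimal config for the COD-launched startd.
#   $1 = output path (for example, a file on NFS visible to every node)
#   $2 = host running the personal Condor collector
#   $3 = per-node directory that holds spool/, log/ and execute/
write_startd_config() {
    # \$(...) is escaped so Condor, not the shell, expands LOCAL_DIR.
    cat > "$1" <<EOF
CONDOR_HOST = $2
LOCAL_DIR   = $3
LOG         = \$(LOCAL_DIR)/log
SPOOL       = \$(LOCAL_DIR)/spool
EXECUTE     = \$(LOCAL_DIR)/execute
DAEMON_LIST = MASTER, STARTD
EOF
}
```

It might be called as `write_startd_config /nfs/new_config pc.example.org /tmp/p-condor`, where the host name is a placeholder. Restricting DAEMON_LIST to the master and startd keeps the COD-launched instance from starting its own collector or schedd.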

  13. Example code • HOSTS="a b c" • for h in $HOSTS; do • condor_cod request -name "$h" > "claimid.$h" • done • for n in claimid.*; do • condor_cod activate -id "$(cat "$n")" -jobad ja • done
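The example stops once the startds are activated. The last step of the overview, removing the COD startds, could be sketched as below. This assumes each claimid.* file holds only the claim ID, as the slide's code implies, and introduces a hypothetical `CONDOR_COD` variable so the loop can be dry-run with `echo`:

```shell
#!/bin/sh
# Release every COD claim recorded in claimid.* in the current directory.
# CONDOR_COD is a hypothetical override (not a Condor setting); it defaults
# to the real condor_cod command.
release_claims() {
    for f in claimid.*; do
        [ -f "$f" ] || continue   # glob matched nothing: no claims to release
        # Delete the claim file only if the release succeeded.
        "${CONDOR_COD:-condor_cod}" release -id "$(cat "$f")" && rm -f "$f"
    done
}
```

Setting `CONDOR_COD=echo` before calling `release_claims` prints the commands instead of running them. Once the claims are released, the suspended vanilla jobs on the base pool can resume.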

  14. COD JOB_AD • CMD = "/nfs/path/run-startd.sh" • IWD = "/tmp" • Out = "startd.out" • Err = "startd.err" • Universe = 5

  15. run-startd.sh • #!/bin/bash • mkdir -p p-condor/{spool,log,execute} • export CONDOR_CONFIG=/nfs/new_config • exec /usr/sbin/condor_master -f -t

  16. Summary • Use Condor daemons as components • Mix and match as needed

  17. Questions? • Thank You!
