Condor Tutorial NCSA Alliance ‘98

Presentation Transcript


  1. Condor Tutorial, NCSA Alliance '98. Presented by: The Condor Team, University of Wisconsin-Madison. Email: condor-admin@cs.wisc.edu URL: http://www.cs.wisc.edu/condor

  2. Welcome to the Condor Tutorial! • Introductions • What is Condor? • A system for High Throughput Computing Condor Tutorial, NCSA Alliance '98, April 27th 1998

  3. The “Religion” behind High Throughput Computing Key Concepts: • High Throughput Computing (HTC) • Distributively owned resources

  4. Performance vs. Throughput • High Performance - Very large amounts of processing capacity over short time periods (FLOPS - Floating Point Operations Per Second) • High Throughput - Large amounts of processing capacity sustained over very long time periods (FLOPY - Floating Point Operations Per Year) FLOPY 30758400*FLOPS
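
As a rough illustration of the two measures on this slide, the sketch below contrasts a machine's peak rate with what it delivers over a year. The peak rate and the availability fractions are made-up values for illustration, not figures from the tutorial. The only fact taken from the slide is the constant 30758400, which works out to the number of seconds in 356 days.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SLIDE_CONSTANT = 30_758_400  # constant quoted on the slide


def sustained_flopy(peak_flops, availability):
    """Operations delivered over a 365-day year.

    availability is the fraction of the year (0.0 to 1.0) during which
    the machine is actually computing for you (an assumed parameter).
    """
    return peak_flops * availability * 365 * SECONDS_PER_DAY


# The slide's constant expressed in days.
print(SLIDE_CONSTANT / SECONDS_PER_DAY)

peak = 20_000_000  # hypothetical peak rate in FLOPS
print(sustained_flopy(peak, 1.0))  # machine never idle or unavailable
print(sustained_flopy(peak, 0.6))  # machine usable 60% of the time
```

The point of the comparison is that a year's worth of work depends on how long capacity stays available, not only on the peak rate.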

  5. Distributed Ownership • Due to the dramatic decrease in the cost-performance ratio of hardware, powerful computing resources are owned today by individuals, groups, departments, … • Huge increase in the aggregate processing capacity owned by the organization • Much smaller increase in the capacity accessible by a single person

  6. The Challenge and Motivation behind Condor: Turn large collections of existing distributively owned (and perhaps non-dedicated) computing resources into effective High Throughput Computing Environments. Minimize Wait while Idle

  7. Road Block: Sociology. Make owners (& system administrators) happy. • Give owners full control over: when and by whom private resources are used for HTC; the impact of HTC on private Quality of Service; membership and information on HTC-related activities • No changes to existing software, and make it easy to install, configure, monitor, and maintain. Happy owners → more resources → higher throughput

  8. Road Block: Robustness. To be effective, an HTC environment must run as a 24-7-365 operation. • Customers count on it • Debugging and fault isolation may be very time-consuming processes • In a large distributed system, everything that might go wrong will go wrong. Robust system → less down time → higher throughput

  9. Road Block: Portability. To be effective, the HTC software must run on and support the latest and greatest hardware and software. • Owners select hardware and software according to their needs and tradeoffs • Customers expect it to be there. • Application developers expect only a few (if any) changes to their applications. Portability → more platforms → higher throughput

  10. Condor’s unique mechanisms for HTC • Matchmaking - enables requests for services and offers to provide services to find each other. • Checkpointing - enables preemptive-resume scheduling (go ahead and use it as long as it is available!). • Remote I/O - enables remote (from execution site) access to local (at submission site) data.

  11. Condor Viewpoints • Owner: creates resource offers • User: creates resource requests • Administrator: drinks coffee; manages the pool-wide configuration; could also be the Owner

  12. Condor Agents • Condor Resource Agent: the condor_startd daemon; allows a machine to execute Condor jobs; enforces owner policy • Condor User Agent: the condor_schedd daemon; allows a machine to submit jobs to a pool

  13. The Tutorial Installation [Diagram labels: Central Manager; Alliance '98 Pool; Your Workstation; schedd; startd]

  14. The Tutorial Installation [Diagram labels: Central Manager (two, one per pool); UW-Madison Pool; Alliance '98 Pool; Your Workstation; schedd (two); startd]

  15. Hands-on: Example #1: Joining the UW-Madison CS Condor Pool as a Submit-only node

  16. Overview of Submitting a Job to Condor • Create a Submit-Description File • Run condor_compile to relink your program with the Condor Libraries, if Condor’s Checkpointing or Remote I/O support is desired • Run condor_submit, which sends your request to the User Agent (condor_schedd)
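
The steps on this slide can be sketched as a small script that writes a submit-description file. This is a minimal sketch, not the tutorial's own example: the program name `my_program` and the file names are hypothetical, and only a handful of common submit commands (`universe`, `executable`, `output`, `error`, `log`, `queue`) are shown.

```python
def render_submit_description(executable, universe="vanilla", count=1):
    """Return the text of a minimal submit-description file.

    File names such as job.out are placeholders chosen for this sketch.
    """
    lines = [
        f"universe   = {universe}",
        f"executable = {executable}",
        "output     = job.out",
        "error      = job.err",
        "log        = job.log",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


# Step 1: create the submit-description file contents.
text = render_submit_description("my_program", universe="standard")
print(text)

# Step 2 (only if Checkpointing or Remote I/O is wanted): relink first,
# for example by prefixing the usual compile command with condor_compile:
#   condor_compile cc -o my_program my_program.c
# Step 3: save the text to a file and hand it to the User Agent:
#   condor_submit job.submit
```

Generating the file from a function is just a convenience for the sketch; in practice the file is usually written by hand.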

  17. Condor System Structure

  18. Hands-on: Example #2: Submit Jobs to Condor

  19. Condor Universes. A Universe specifies a Condor runtime environment: • STANDARD: supports Checkpointing; supports Remote System Calls; has some limitations… • VANILLA: any Unix executable (shell scripts, etc.); no Condor Checkpointing or Remote I/O

  20. Hands-on: Example #3: Tour of User Tools/Commands

  21. User Priorities in Condor • Each active user in the pool has a user priority • Viewed or changed with condor_userprio • Like golf: the lower, the better • A given user’s share of available machines is inversely related to the ratio between user priorities. • Example: Fred’s priority is 10, Joe’s is 20. Fred will be allocated twice as many machines as Joe.
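
The Fred and Joe example can be worked through numerically. The sketch below assumes each user's share is exactly proportional to the inverse of their priority value, which is what the example implies; the pool size of 30 machines is a made-up figure.

```python
def machine_shares(priorities, machines):
    """Split machines among users in inverse proportion to priority value.

    priorities maps user name -> priority (lower is better).
    """
    weights = {user: 1.0 / prio for user, prio in priorities.items()}
    total = sum(weights.values())
    return {user: machines * w / total for user, w in weights.items()}


shares = machine_shares({"fred": 10, "joe": 20}, machines=30)
for user, share in shares.items():
    print(user, round(share, 2))
```

With these inputs Fred's share comes out at twice Joe's, matching the slide.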

  22. User Priorities in Condor, cont. • Condor continuously adjusts user priorities over time: if machines allocated > priority, the priority worsens; if machines allocated < priority, the priority improves • Priority Preemption: higher priority users will grab machines away from lower priority users (thanks to Checkpointing…) • Starvation is prevented • Priority “thrashing” is prevented

  23. Parallel Jobs in Condor. Condor can run parallel applications (written to the popular PVM message-passing library)

  24. Master-Worker Paradigm. Condor-PVM is designed to run PVM applications which follow the master-worker paradigm. • Master: has a pool of work, sends pieces of work to the workers, manages the work and the workers • Worker: gets a piece of work, does the computation, sends the result back
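
The master-worker pattern itself can be sketched without PVM. The example below uses threads from Python's standard library in place of PVM tasks, purely to show the division of roles; the work pieces and the sum-of-squares computation are invented for illustration, and nothing here handles a worker disappearing mid-task.

```python
from concurrent.futures import ThreadPoolExecutor


def worker(piece):
    """Worker: take one piece of work, compute, send the result back."""
    start, stop = piece
    return piece, sum(i * i for i in range(start, stop))


def master(work_pool, num_workers=3):
    """Master: own the pool of work, hand pieces out, collect results."""
    results = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for piece, result in pool.map(worker, work_pool):
            results[piece] = result
    return results


work = [(0, 10), (10, 20), (20, 30)]
results = master(work)
print(results)
```

Because each piece is independent, the master can hand a lost piece to another worker, which is why this style suits machines that may leave the pool.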

  25. What does Condor-PVM do? Condor acts as the PVM resource manager. • All pvm_addhost requests get re-mapped to Condor. • Condor dynamically constructs PVM virtual machines out of non-dedicated desktop machines. • When a machine leaves the pool, the user gets notified via the normal PVM notification mechanisms.

  26. How to compile and submit Condor-PVM jobs • Binary Compatible: compile and link with the PVM library just as for normal PVM applications. No need to link with Condor. • Submit: in the submit file, set:
      universe = PVM
      machine_count = <min>..<max>

  27. Classified Advertisements • ClassAds • Language for expressing attributes • Semantics for evaluating them • Intuitively, a ClassAd is a set of named expressions • Each named expression is an attribute • Expressions are similar to C … • Constants, attribute references, operators

  28. Classified Advertisements: Example
      MyType = "Machine"
      TargetType = "Job"
      Name = "froth.cs.wisc.edu"
      StartdIpAddr = "<128.105.73.44:33846>"
      Arch = "INTEL"
      OpSys = "SOLARIS251"
      VirtualMemory = 225312
      Disk = 35957
      KFlops = 21058
      Mips = 103
      LoadAvg = 0.011719
      KeyboardIdle = 12
      Cpus = 1
      Memory = 128
      Requirements = LoadAvg <= 0.300000 && KeyboardIdle > 15 * 60
      Rank = 0

  29. Classified Advertisements: Matching • ClassAds are always considered in pairs: does ClassAd A match ClassAd B (and vice versa)?

  30. Classified Advertisements: Examples
      ClassAd A
        MyType = "Apartment"
        TargetType = "ApartmentRenter"
        SquareArea = 3500
        RentOffer = 1000
        HeatIncluded = False
        OnBusLine = True
        Rank = UnderGrad==False + TARGET.RentOffer
        Requirements = MY.RentOffer - TARGET.RentOffer < 150
      ClassAd B
        MyType = "ApartmentRenter"
        TargetType = "Apartment"
        UnderGrad = False
        RentOffer = 900
        Rank = 1/(TARGET.RentOffer + 100.0) + 50*HeatIncluded
        Requirements = OnBusLine && SquareArea > 2700
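
To see how the two apartment ads are checked against each other, here is a minimal Python sketch of the pairwise test. It is not the ClassAd evaluator: the Requirements expressions are hand-translated into Python functions, the `lookup` helper (an invented name) resolves an unqualified attribute in the ad's own scope first and then in the other ad, and the MyType/TargetType check is an assumption about how the types are used. Rank is left out.

```python
def lookup(name, my, target):
    """Resolve an unqualified attribute: own ad first, then the other ad."""
    return my[name] if name in my else target[name]


apartment = {
    "MyType": "Apartment", "TargetType": "ApartmentRenter",
    "SquareArea": 3500, "RentOffer": 1000,
    "HeatIncluded": False, "OnBusLine": True,
    # MY.RentOffer - TARGET.RentOffer < 150
    "Requirements": lambda my, target: my["RentOffer"] - target["RentOffer"] < 150,
}

renter = {
    "MyType": "ApartmentRenter", "TargetType": "Apartment",
    "UnderGrad": False, "RentOffer": 900,
    # OnBusLine && SquareArea > 2700
    "Requirements": lambda my, target: (
        lookup("OnBusLine", my, target) and lookup("SquareArea", my, target) > 2700
    ),
}


def matches(a, b):
    """A and B match only if each ad's Requirements accepts the other."""
    types_ok = a["TargetType"] == b["MyType"] and b["TargetType"] == a["MyType"]
    return bool(types_ok and a["Requirements"](a, b) and b["Requirements"](b, a))


print(matches(apartment, renter))

# A renter offering only 800 fails the apartment's 150-dollar tolerance.
low_offer = dict(renter, RentOffer=800)
print(matches(apartment, low_offer))
```

The check is symmetric by construction: swapping the arguments gives the same answer, which mirrors the "and vice versa" on the matching slide.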

  31. ClassAds in the Condor System • ClassAds allow Condor to be a general system • Constraints and ranks on matches expressed by entities themselves • Only priority logic integrated into Manager • All principal entities in the Condor system are represented by ClassAds • Machines, Jobs, Submitters

  32. ClassAds in Condor: Requirements and Rank (Example)
      Friend = Owner == "tannenba" || Owner == "wright"
      ResearchGroup = Owner == "jbasney" || Owner == "raman"
      Trusted = Owner != "rival" && Owner != "riffraff"
      Requirements = Trusted && ( ResearchGroup || LoadAvg < 0.3 && KeyboardIdle > 15*60 )
      Rank = Friend + ResearchGroup*10
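
The policy on this slide can be traced for a few job owners. The sketch below hand-translates the five expressions into Python; it assumes `Owner` is the owner of the incoming job and that `LoadAvg` and `KeyboardIdle` (in seconds) describe the machine. The sample load and idle values are invented.

```python
def evaluate_policy(owner, load_avg, keyboard_idle):
    """Return (Requirements, Rank) for a job owned by `owner`."""
    friend = owner in ("tannenba", "wright")
    research_group = owner in ("jbasney", "raman")
    trusted = owner not in ("rival", "riffraff")
    # && binds tighter than ||, so the idle test is grouped together.
    requirements = trusted and (
        research_group or (load_avg < 0.3 and keyboard_idle > 15 * 60)
    )
    # True/False count as 1/0 in the Rank arithmetic.
    rank = int(friend) + int(research_group) * 10
    return requirements, rank


# Research group member: accepted even on a busy machine, ranked highest.
print(evaluate_policy("raman", load_avg=2.0, keyboard_idle=0))
# Friend: accepted only when the machine is idle.
print(evaluate_policy("tannenba", load_avg=0.1, keyboard_idle=1200))
print(evaluate_policy("tannenba", load_avg=0.1, keyboard_idle=12))
# Untrusted owner: never accepted.
print(evaluate_policy("rival", load_avg=0.0, keyboard_idle=5000))
```

Requirements decides whether a job may run at all, while Rank orders the acceptable jobs, so a research group job (rank 10) is preferred over a friend's job (rank 1).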

  33. Hands-on: Example #4: Submit Jobs with ClassAd Constraints

  34. Resource Owner’s Viewpoint: Owner is King • In Condor, the owner of the resource (machine owner) can dictate the terms and conditions under which that resource can be used • How? Configure the Resource Agent’s Policy (condor_startd configuration)

  35. Resource Agent Configuration Expressions • START expression: when TRUE, Condor can start a job (True = Unclaimed State, False = Owner State) • SUSPEND expression: when TRUE, Condor suspends any job running on this machine • CONTINUE expression: when TRUE, Condor will continue a suspended job

  36. Resource Agent Configuration Expressions, cont. • VACATE expression: when TRUE, kick the job off of the machine (via a Checkpoint if possible) • KILL expression: when TRUE, kill the job immediately; no Checkpoint; on UNIX, a “kill -9”

  37. Resource Agent Configuration Expressions, cont. [Flowchart: START, WANT_SUSPEND, SUSPEND, WANT_VACATE, VACATE, KILL, with True/False branches]

  38. Resource Agent Configuration Expressions, cont. • Default Setup
      WANT_VACATE : True
      WANT_SUSPEND : True
      START : Keyboard_Idle && CPU_Idle
      SUSPEND : Keyboard_Busy || CPU_Busy
      CONTINUE : Keyboard and CPU idle again
      VACATE : If Suspended > 10 minutes
      KILL : If spent > 10 minutes in VACATE state
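
The default setup reads as a small state machine, sketched below. This is an interpretation of the slide, not condor_startd's actual logic: the state names ("idle", "running", "suspended", "vacating", "killed") are invented labels, WANT_SUSPEND and WANT_VACATE are both taken as True, and checking CONTINUE before VACATE is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    keyboard_idle: bool
    cpu_idle: bool


def step(state, minutes_in_state, machine):
    """Return the next state under the slide's default policy."""
    idle = machine.keyboard_idle and machine.cpu_idle
    if state == "idle":
        # START: Keyboard_Idle && CPU_Idle
        return "running" if idle else "idle"
    if state == "running":
        # SUSPEND: Keyboard_Busy || CPU_Busy
        return "suspended" if not idle else "running"
    if state == "suspended":
        if idle:
            return "running"  # CONTINUE: keyboard and CPU idle again
        if minutes_in_state > 10:
            return "vacating"  # VACATE: suspended for more than 10 minutes
        return "suspended"
    if state == "vacating":
        # KILL: more than 10 minutes spent in the VACATE state
        return "killed" if minutes_in_state > 10 else "vacating"
    return state


busy = Machine(keyboard_idle=False, cpu_idle=True)
quiet = Machine(keyboard_idle=True, cpu_idle=True)
print(step("running", 0, busy))
print(step("suspended", 3, quiet))
print(step("suspended", 11, busy))
print(step("vacating", 11, busy))
```

The escalation from suspend to vacate to kill gives the owner the machine back immediately while still giving the job a chance to checkpoint.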

  39. Hands-on: Example #5: UW-Madison CS Pool Startd Policy

  40. Condor Administrator Features • The condor_master is the administrator’s best friend • Watches/restarts other daemons • Sends email if it notices suspicious problems • Runs condor_preen • Provides administrator remote control

  41. Condor Administrator Commands • Administrator Commands • condor_off [ hostname … ] • Down entire pool: condor_off `cat machines-file` • condor_on • condor_restart • condor_reconfig (“on-the-fly” reconfiguration) • condor_vacate • These commands could be used by the Owner as well, if desired

  42. Condor Host-based Access Control • HOSTALLOW and HOSTDENY settings grant machines (subnets, domains) different access levels: • READ access • WRITE access • ADMINISTRATOR access • OWNER access

  43. Example: Simple Host-based Access Control
      HOSTDENY_READ = *.mil
      HOSTALLOW_WRITE = *.ncsa.uiuc.edu
      HOSTDENY_WRITE = ppp*.ncsa.uiuc.edu, 172.44.*
      HOSTALLOW_ADMINISTRATOR = bigcheese.ncsa.uiuc.edu
      HOSTALLOW_OWNER = $(FULL_HOSTNAME), $(HOSTALLOW_ADMINISTRATOR)
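
The effect of these settings can be approximated with shell-style pattern matching. The sketch below is not Condor's implementation. It assumes that a deny entry overrides an allow entry (which the WRITE lines imply, since the ppp hosts sit inside *.ncsa.uiuc.edu), and that a level with no allow list admits anyone who is not denied. The OWNER level is omitted because it depends on macro expansion, and the test hosts and most IP addresses are invented.

```python
from fnmatch import fnmatchcase

ALLOW = {
    "WRITE": ["*.ncsa.uiuc.edu"],
    "ADMINISTRATOR": ["bigcheese.ncsa.uiuc.edu"],
}
DENY = {
    "READ": ["*.mil"],
    "WRITE": ["ppp*.ncsa.uiuc.edu", "172.44.*"],
}


def permitted(level, hostname, ip):
    """Check a host (by name or IP address) against one access level."""
    names = (hostname.lower(), ip)

    def hit(patterns):
        return any(fnmatchcase(n, p) for n in names for p in patterns)

    if hit(DENY.get(level, [])):
        return False  # assumed: deny always wins
    allow = ALLOW.get(level)
    # assumed: no allow list means everyone not denied is allowed
    return True if allow is None else hit(allow)


print(permitted("WRITE", "node1.ncsa.uiuc.edu", "141.142.1.1"))
print(permitted("WRITE", "ppp12.ncsa.uiuc.edu", "141.142.1.2"))
print(permitted("READ", "host.army.mil", "10.1.2.3"))
print(permitted("READ", "froth.cs.wisc.edu", "128.105.73.44"))
```

Checking deny first keeps the rule easy to reason about: a broad allow pattern can be narrowed by listing exceptions.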

  44. Configuration File Hierarchy • condor_config: pool-wide default; Condor pool administrator’s requirements • condor_config.local: overrides for a specific machine; reflects Owner’s requirements • condor_config.root: System Administrator requirements
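
One way to picture the hierarchy is as layered dictionaries where a later file overrides an earlier one. The sketch below assumes the order condor_config, then condor_config.local, then condor_config.root; the slide states only that the local file overrides the pool-wide default, so the position of the root file is an assumption. The setting values are invented for illustration.

```python
def effective_config(*layers):
    """Merge config layers; settings in later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical contents of the three files.
condor_config = {"START": "Keyboard_Idle && CPU_Idle", "HOSTALLOW_WRITE": "*"}
condor_config_local = {"START": "False"}  # this owner opts the machine out
condor_config_root = {"HOSTALLOW_WRITE": "*.ncsa.uiuc.edu"}

config = effective_config(condor_config, condor_config_local, condor_config_root)
print(config)
```

Settings a layer does not mention fall through from the layer beneath it, so each file only needs to list what it changes.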

  45. Future Directions • Condor for Windows NT • SMP support • More parallel job support • Checkpoint parallel jobs • MPI, MPI-2 • Flocking …

  46. Obtaining Condor • Condor can be downloaded from the Condor web site at: http://www.cs.wisc.edu/condor • Complete Users and Administrators manual available at: http://www.cs.wisc.edu/condor/manual • Contracted Support is available • Questions? Email: condor-admin@cs.wisc.edu

  47. Thank You!! Thank you for your interest! The Condor Team: Miron Livny, Marvin Solomon, Todd Tannenbaum, Derek Wright, Bin Song, Rajesh Raman, Tom Stanis, Jim Basney, Adiel Yoaz
