Condor Overview Bill Hoagland
Condor • Workload management system for compute-intensive jobs • Harnesses collection of dedicated or non-dedicated hardware under distributed ownership
Condor History • Developed by University of Wisconsin-Madison Computer Science Department • First put into production use 15 years ago • Mature and stable
Condor Availability • Freely available under a BSD-style license • Not open source: the code is not distributed publicly
Supported Systems • Solaris 8, 9, & 10 (SPARC) • Red Hat & Fedora Core (x86) • MS Windows 2000, XP & 2003 Server (x86) • Mac OS X 10.3 & 10.4 (PPC) • Other Unixes (SuSE, AIX, HP-UX, Yellow Dog, Debian)
Condor Design • Originally developed for “cycle stealing” from idle machines • Retains robustness to failures and changing availability from this legacy
Condor Goal • “High throughput” vs “High performance” • High performance - fast machines (e.g. Cray) • High throughput - many machines, fault tolerant infrastructure (e.g. SETI@Home)
Condor Components • Job queueing • Scheduling policy • Priority mechanism • Resource monitoring • Resource management
Condor Highlights • Checkpointing • Saves the complete state of a running process, including its I/O state, to disk
Checkpointing • Allows recovery from failures • Roll back to the last saved state • Allows process migration • Move saved state and restart
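The recovery idea on this slide can be illustrated with a small sketch. This is not Condor's mechanism: Condor checkpoints the whole process transparently, while the sketch below saves state explicitly at the application level. The function name `run_with_checkpoints`, the `fail_at` parameter, and the state layout are all hypothetical, chosen only for illustration.

```python
import os
import pickle

def run_with_checkpoints(total_steps, path, fail_at=None):
    """Sum the integers 0..total_steps-1, saving state after every step."""
    # Roll back to the last saved state if a checkpoint file exists.
    if os.path.exists(path):
        with open(path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "total": 0}

    while state["step"] < total_steps:
        # Hypothetical hook used only to simulate a failure mid-run.
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated crash")
        state["total"] += state["step"]
        state["step"] += 1
        # Write to a temporary file and rename it into place, so a crash
        # during the write cannot leave a half-written checkpoint behind.
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, path)
    return state["total"]
```

Calling the function again with the same `path` after a failure resumes from the saved step instead of starting over. Copying the checkpoint file to another machine and calling the function there is the analogue of "move saved state and restart".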
Checkpointing continued • Can compress checkpoint images • Checkpoint mechanism can be used outside of Condor
Checkpointing continued • Some limitations • Single process space • Single kernel thread • Cannot save state of file open for both read and write • Not supported on all platforms
Checkpointing continued • Must have object files • Usually requires no source changes • Relink code to include the Condor library layer, e.g. $ condor_compile gcc -o foo foo.c
Condor Highlights • Remote system calls • Preserves user environment on remote machine • Users need not make files available or have access to remote machine
Condor Highlights • Pools of Machines can be Hooked Together • Jobs submitted to one pool can migrate to a second • Subject to the policies of each pool's owner
Condor Highlights • Jobs can be Ordered • Jobs that depend on one another can easily be run in order • Dependencies are described in a directed acyclic graph
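As a rough illustration of ordering jobs from a directed acyclic graph, the sketch below uses Python's standard `graphlib` module (Python 3.9+). The job names and the `execution_order` helper are hypothetical, and the dictionary is not Condor's input format; it only shows how a dependency graph determines a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each key maps to the set of jobs it depends on.
dependencies = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "collect": {"simulate_a", "simulate_b"},
}

def execution_order(deps):
    """Return one valid order in which every job runs after its dependencies."""
    # static_order() raises graphlib.CycleError if the graph has a cycle,
    # which is why the dependency graph has to be acyclic.
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
```

Here `prepare` always comes first and `collect` always comes last; the two `simulate_*` jobs have no dependency on each other, so a scheduler is free to run them in parallel.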
Condor Highlights • Condor Enables Grid Computing • Condor has been designed with grid support hooks • Globus controlled resources
Condor Highlights • Sensitive to the Desires of Machine Owners • Machine owners may set almost any usage policy
Condor Highlights • Powerful priority policy mechanism • Requirements and preferences are associated with jobs and machines • A negotiation process matches job requirements then ranks on preferences
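The two-step negotiation on this slide (match on requirements, then rank on preferences) might be sketched as follows. Everything here is an assumption made for illustration: the `match` function, the dictionary fields, and the use of Python callables in place of Condor's own policy expressions.

```python
def match(job, machines):
    """Return the best machine for a job, or None if nothing qualifies."""
    # Step 1: keep only machines that satisfy the job's requirements
    # and whose own requirements accept the job.
    candidates = [
        m for m in machines
        if job["requirements"](m) and m["requirements"](job)
    ]
    if not candidates:
        return None
    # Step 2: rank the surviving machines by the job's preference;
    # the highest rank wins.
    return max(candidates, key=job["rank"])

# Hypothetical machines; each may impose its own policy on incoming jobs.
machines = [
    {"name": "a", "memory_mb": 512, "mips": 900,
     "requirements": lambda j: True},
    {"name": "b", "memory_mb": 2048, "mips": 700,
     "requirements": lambda j: j["owner"] != "guest"},
    {"name": "c", "memory_mb": 4096, "mips": 800,
     "requirements": lambda j: True},
]

# Hypothetical job: needs at least 1 GB of memory, prefers faster machines.
job = {
    "owner": "alice",
    "requirements": lambda m: m["memory_mb"] >= 1024,
    "rank": lambda m: m["mips"],
}

best = match(job, machines)
```

Machine `a` is the fastest but fails the memory requirement, so it is never ranked. Of the remaining two, `c` has the higher rank and is chosen.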
Condor Security • Condor's purpose is to allow users to run arbitrary code on large numbers of machines • Assumes users are trustworthy
Condor Security continued • Cannot protect against users that can elevate their privileges • Does not run user jobs in sandboxes
Condor Security continued • Can prevent unauthorized access to Condor • Optional authentication e.g. Kerberos, Grid Security Infrastructure (GSI), others
Condor Security continued • Can ensure that user data has not been examined or tampered with • Optional encryption and integrity checking of all network traffic
Condor Backfill • When a machine is completely idle… • Configure a default job • Support for BOINC
Condor Configuration • Controlled by hierarchical config files • Well commented • Human readable • In some cases, clearer than the manual
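One common way to read "hierarchical" configuration is that more specific files override more general ones. The sketch below assumes that interpretation and a simple `KEY = value` syntax with `#` comments. The setting names and both helper functions are hypothetical, not taken from Condor.

```python
def parse_config(text):
    """Parse 'KEY = value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

def layered(*layers):
    """Merge config texts in order; later layers override earlier ones."""
    merged = {}
    for text in layers:
        merged.update(parse_config(text))
    return merged

# Hypothetical pool-wide defaults and a per-machine override.
global_cfg = """
# Defaults for every machine in the pool
POOL_NAME = demo
MAX_JOBS = 10
"""
local_cfg = """
# This machine is shared, so accept fewer jobs
MAX_JOBS = 2
"""

merged = layered(global_cfg, local_cfg)
```

The merged result keeps `POOL_NAME` from the pool-wide layer and takes `MAX_JOBS` from the machine-specific layer.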
Condor Administration • CondorView • Web based statistics • Machine and user data
Condor Website • http://www.cs.wisc.edu/condor