Further elements of Condor and CamGrid
Mark Calleja

What Condor daemons are running on my machine, and what do they do?

Condor Daemon Layout

[Diagram: on the Central Manager, the condor_master process spawns the collector, negotiator, schedd and startd daemons.]

Note: there can also be other, more specialist daemons

condor_master
  • Starts up all other Condor daemons
  • If there are any problems and a daemon exits, it restarts the daemon and sends email to the administrator
  • Checks the time stamps on the binaries of the other Condor daemons, and if new binaries appear, the master will gracefully shutdown the currently running version and start the new version
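
Which daemons the master spawns is set in the host's Condor configuration; a minimal sketch (the admin address is a made-up placeholder):

```
## Daemons for condor_master to spawn on an ordinary node;
## a central manager would also list COLLECTOR and NEGOTIATOR
DAEMON_LIST = MASTER, STARTD, SCHEDD

## Address the master emails when a daemon has problems
CONDOR_ADMIN = condor-admin@example.ac.uk
```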
condor_startd
  • Represents a machine to the Condor system
  • Responsible for starting, suspending, and stopping jobs
  • Enforces the wishes of the machine owner (the owner’s “policy”… more on this soon)
condor_schedd
  • Represents users to the Condor system
  • Maintains the persistent queue of jobs
  • Responsible for contacting available machines and sending them jobs
  • Services user commands which manipulate the job queue:
    • condor_submit, condor_rm, condor_q, condor_hold, condor_release, condor_prio, …
condor_collector
  • Collects information from all other Condor daemons in the pool
    • “Directory Service” / Database for a Condor pool
  • Each daemon sends a periodic update called a “ClassAd” to the collector
  • Services queries for information:
    • Queries from other Condor daemons
    • Queries from users (condor_status)
condor_negotiator
  • Performs “matchmaking” in Condor
  • Gets information from the collector about all available machines and all idle jobs
  • Tries to match jobs with machines that will serve them
  • Both the job and the machine must satisfy each other’s requirements
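
Matchmaking is two-sided: the job's requirements expression must be satisfied by the machine's ClassAd, and the machine's START expression by the job's ClassAd. A sketch of both sides (the particular expressions are invented for illustration):

```
## Job side, in the submit description file:
requirements = OpSys == "LINUX" && Arch == "X86_64" && Memory >= 1024

## Machine side, in the execute host's condor_config:
## only start jobs once the machine has been idle for 5 minutes
START = KeyboardIdle > 300 && LoadAvg < 0.3
```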
Typical Condor Pool

[Diagram: a Central Manager plus several regular nodes; each daemon sends its periodic ClassAd updates to the collector on the Central Manager.]

Each daemon maintains its own log; makes for interesting distributed debugging!

Job Startup

[Diagram: job startup across the Central Manager, the submit machine and the execute machine; the job on the execute machine links against the syscall library.]

The Parallel Universe

First, some caveats:

  • Using the PU requires extra set-up on the execute nodes by a sysadmin.
  • Execute hosts willing to run under the PU will only accept jobs from one dedicated scheduler.
  • No flocking!

At its simplest, the PU just launches N identical jobs on N machines.

The job will only run when all machines have been allocated.

For example, the following submit script will run the command "uname -a" on four nodes.


## Example PU submit description file
universe = parallel
executable = /bin/uname
arguments = -a
should_transfer_files = yes
when_to_transfer_output = on_exit
requirements = OpSys == "LINUX" && Arch == "X86_64"
log = logfile
output = outfile.$(NODE)
error = errfile.$(NODE)
machine_count = 4


The Parallel Universe and MPI
  • Condor leverages the PU to run MPI jobs.
  • Does this by launching a wrapper on N nodes. Then:
    • N-1 nodes exit (but don’t relinquish claim on node).
    • Wrapper on first node launches the MPI job, grabbing all N nodes and calling the relevant MPI launcher for that flavour (e.g. MPICH2, OpenMPI, etc.)
    • Condor bundles wrappers for various MPI flavours, which you can then modify to suit your needs.
    • But what does MPI on CamGrid mean?
      • Job spans many pools: not practical! Consider packet latency over routers and firewalls. Also, it’s a heterogeneous environment.
      • Job spans many machines in one pool: possible, and some people do this. Inter-node connectivity is usually 1Gb/s (at best).
      • Job sits on one machine, but spans all cores: we’re in the money! With multi-core machines this is becoming increasingly attractive, and comms take place over shared memory (avoiding n/w stack) = very fast.
MPI Example

# This is a wrapper for an OpenMPI job using 4 cores on the same
# physical host:
executable = openmpi.sh
transfer_input_files = castep, Al_00PBE.usp, O_00PBE.usp, \
    corundum.cell, corundum.param
WhenToTransferOutput = ON_EXIT
output = myoutput
error = myerror
log = mylog
# We want four processes
machine_count = 4
arguments = "castep corundum"
+WantParallelSchedulingGroups = True
requirements = OpSys == "LINUX" && Arch == "X86_64"

MPI Example (cont.)
  • openmpi.sh does necessary groundwork, e.g. set paths to MPI binaries and libraries, create machine file lists, etc., before invoking the MPI starter command for that flavour.
  • In parallel environments, machines can be divided into groups by suitable configuration on the execute hosts. For example, the following configuration entry puts all of a host's slots in the same group, i.e. all on that same machine:

ParallelSchedulingGroup = "$(HOSTNAME)"

  • This feature is then requested in the user’s submit script by having:

+WantParallelSchedulingGroups = TRUE

Accessing data across CamGrid
  • CamGrid spans many administrative domains, with each pool generally run by different sysadmins.
  • This makes running a file system that needs privileged installation and administration, e.g. NFS, impractical.
  • So far we’ve got round this by sending input files from the submit node with every job submission.
  • However, there are times when it would be really nice to be able to mount a remote file store, e.g.: maybe I don’t know exactly which files I need at submit time (identified at run time).
  • Fortunately, a tool to do just this in Linux has come out of the Condor project, called Parrot.
  • Parrot gives a transparent way of accessing these resources without the need of superuser intervention (unlike trying to export a directory via NFS, or setting up sshfs).
  • It supports many protocols (http, httpfs, ftp, anonftp, gsiftp, chirp, …) and authentication models (GSI, kerberos, IP-address,…).
  • Parrot can be used on its own (outside of Condor).
  • It also allows server clustering for load balancing.
Digression: Chirp
  • Chirp: a remote I/O protocol used by Condor.
  • I can start my own chirp_server and export a directory:

chirp_server -r /home/mcal00/data -I -p 9096

  • I set permissions per directory with a .__acl file in the exported directory:

hostname:*.grid.private.cam.ac.uk rl
hostname:*.escience.cam.ac.uk rl

Using interactive shells with Parrot
  • Not that useful in a grid job, but a nice feature:

parrot vi /chirp/woolly--escience.grid.private.cam.ac.uk:9096/readme

  • Default port is 9094, so can export different mount-points from same resource using different ports.
  • I can also mount a remote file system in a new shell:

parrot -M /dbase=/http/woolly--escience.grid.private.cam.ac.uk:80 bash

  • /dbase appears as a local directory in the new shell
Parrot and non-interactive grid jobs
  • Consider an executable called a.out, that needs to access the directories /Dir1 and /Dir2.
  • I start by constructing a mountpoint file (call it Mountfile):

/Dir1 /chirp/woolly--escience.grid.private.cam.ac.uk:9094
/Dir2 /chirp/woolly--escience.grid.private.cam.ac.uk:9096

  • Next I wrap it in an executable that will provide all the Parrot functionality, and is what I actually submit to the relevant grid scheduler, e.g. via condor_submit. Call this wrapper.sh:

#!/bin/sh
export PATH=.:/bin:/usr/bin
# Nominate file with the mount points
mountfile=Mountfile
# What's the "real" executable called?
my_executable=a.out
# Make sure both are executable
chmod +x $my_executable parrot
# Run the executable under parrot
parrot -k -Q -m $mountfile $my_executable

New in 7.4: File transfer by URL
  • It is now possible for vanilla jobs to specify a URL for the input files so that the execute host pulls the files over.
  • First, a sysadmin must have added an appropriate plugin and configured the execute host as such, e.g:
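
The configuration example did not survive the transcript; a plausible sketch, assuming HTCondor's bundled curl-based plugin (the exact path varies by installation):

```
## condor_config on the execute host: register a plugin
## that can fetch http/https URLs
FILETRANSFER_PLUGINS = $(LIBEXEC)/curl_plugin
```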


  • You can then submit a job to that machine and not send it any files, but instead direct it to pull files from an appropriate server, e.g. have in a submit script:

URL = https://www.escience.cam.ac.uk/

transfer_input_files = $(URL)/file1.txt, $(URL)/file2.txt

Checkpointing - DIY
  • Recap: Condor’s process checkpointing via the Standard Universe saves all the state of a process into a checkpoint file
    • Memory, CPU, I/O, etc.
  • Checkpoints are saved on submit host unless a dedicated checkpoint server is nominated.
  • The process can then be restarted from where it left off
  • Typically no changes to the job’s source code needed – however, the job must be relinked with Condor’s Standard Universe support library
  • Limitations: no forking, kernel threads, or some forms of IPC
  • Not all combinations of OS/compilers are supported (none for Windows), and support is getting harder.
  • VM universe is meant to be the successor, but users don’t seem too keen.
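
For reference, the relink against the Standard Universe support library is done with condor_compile (program and file names here are hypothetical):

```
# Relink an ordinary C program against Condor's checkpointing library
condor_compile gcc -o myjob myjob.c
```

The resulting binary is then submitted with universe = standard in the submit description file.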
Recursive shell scripts (cont.)
  • We can run a recursive shell script.
  • This script does a condor_submit on our required executable, and we ensure that the input files are such that the job only runs for a “short” duration.
  • The script then runs condor_wait on the job’s log file and waits for it to finish.
  • Once this happens, the script checks the output files to see if the completion criteria have been met, otherwise we move the output files to input files and resubmit the job.
  • Hence, there is a proviso that the output files can generate the next set of input files (not all applications can do this).
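
The loop described above can be sketched in shell; everything here is schematic (the submit file, log and data file names are invented, and the completion test is application-specific):

```shell
#!/bin/sh
# Schematic recursive-submission loop. Assumes job.submit reads in.dat,
# writes out.dat, and logs to job.log (all hypothetical names).

# Application-specific completion test: here, look for a marker line.
finished() {
    grep -q 'CONVERGED' out.dat
}

# Recycle this run's output as the next run's input.
recycle() {
    mv out.dat in.dat
}

# Submit, block until the job leaves the queue, test, repeat.
run_until_done() {
    while : ; do
        condor_submit job.submit
        condor_wait job.log
        if finished; then
            break
        fi
        recycle
    done
}
```

Only run_until_done touches Condor; finished and recycle are plain shell, and they are where the application-specific logic lives.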
Recursive shell scripts (cont.)
  • There are some drawbacks with this approach:
    • We need to write the logic for checking for job completion. This will probably vary between applications.
    • We need to take into account how our recursive script will behave if the job exits abnormally, e.g. if the execute host disappears.
  • We can mitigate some of these concerns by running a recursive DAG (so Condor worries about abnormalities), and an example is given in CamGrid’s online documentation. However, we still need to write some application-specific logic.
Checkpointing (linux) vanilla universe jobs
  • Many applications can’t link with Condor’s checkpointing libraries. And what about interpreted languages?
  • To perform this for arbitrary code we need:

1) An API that checkpoints running jobs.

2) A user-space FS to save the images

  • For 1) we use the BLCR kernel modules – unlike Condor’s user-space libraries these run with root privilege, so there are fewer limitations on the codes one can use.
  • For 2) we use Parrot, which came out of the Condor project. Used on CamGrid in its own right, but with BLCR allows for any code to be checkpointed.
  • I’ve provided a bash implementation, blcr_wrapper.sh, to accomplish this (uses chirp protocol with Parrot).
Checkpointing linux jobs using BLCR kernel modules and Parrot

1. Start a chirp server to receive checkpoint images.
2. Condor job starts: blcr_wrapper.sh uses 3 processes, with Parrot handling the I/O.
3. Start by checking for an image from a previous run.
4. Start the job.
5. Parent sleeps; wakes periodically to checkpoint and save images.
6. Job ends: tell parent to clean up.
Example of submit script
  • Application is “my_application”, which takes arguments “A” and “B”, and needs files “X” and “Y”.
  • There’s a chirp server at woolly--escience.grid.private.cam.ac.uk:9096:

Universe = vanilla
Executable = blcr_wrapper.sh
arguments = woolly--escience.grid.private.cam.ac.uk 9096 60 $$([GlobalJobId]) \
    my_application A B
transfer_input_files = parrot, my_application, X, Y
transfer_files = ALWAYS
Requirements = OpSys == "LINUX" && Arch == "X86_64" && HAS_BLCR == TRUE
Output = test.out
Log = test.log
Error = test.error