Further elements of Condor and CamGrid

Presentation Transcript



Further elements of Condor and CamGrid

Mark Calleja



What Condor Daemons are running on my machine, and what do they do?


Condor Daemon Layout

[Diagram: the Central Manager machine, on which the master daemon spawns the collector, negotiator, schedd and startd processes (arrows = process spawned).]

Note: there can also be other, more specialist daemons


condor_master

  • Starts up all other Condor daemons

  • If there are any problems and a daemon exits, it restarts the daemon and sends email to the administrator

  • Checks the time stamps on the binaries of the other Condor daemons, and if new binaries appear, the master will gracefully shut down the currently running version and start the new version
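
For example, which daemons the master starts, and where it sends its email, are set in the Condor configuration files. A minimal sketch, with illustrative values rather than CamGrid's actual settings:

    # Daemons the master should start on this host
    DAEMON_LIST  = MASTER, STARTD, SCHEDD
    # Address the master emails when a daemon has problems
    CONDOR_ADMIN = condor-admin@example.ac.uk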


condor_startd

  • Represents a machine to the Condor system

  • Responsible for starting, suspending, and stopping jobs

  • Enforces the wishes of the machine owner (the owner’s “policy”… more on this soon)


condor_schedd

  • Represents users to the Condor system

  • Maintains the persistent queue of jobs

  • Responsible for contacting available machines and sending them jobs

  • Services user commands which manipulate the job queue:

    • condor_submit, condor_rm, condor_q, condor_hold, condor_release, condor_prio, …
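
For illustration, here is how those commands might be used (the job id 123.0 and the file name job.sub are made up):

    condor_submit job.sub      # place a job in the schedd's queue
    condor_q                   # list my jobs in the queue
    condor_hold 123.0          # put job 123.0 on hold
    condor_release 123.0       # let it run again
    condor_prio -p 10 123.0    # raise its priority relative to my other jobs
    condor_rm 123.0            # remove it from the queue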


condor_collector

  • Collects information from all other Condor daemons in the pool

    • “Directory Service” / Database for a Condor pool

  • Each daemon sends a periodic update called a “ClassAd” to the collector

  • Services queries for information:

    • Queries from other Condor daemons

    • Queries from users (condor_status)
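
For example, condor_status is answered by the collector; a couple of illustrative invocations:

    condor_status                                  # one line per slot in the pool
    condor_status -constraint 'OpSys == "LINUX"'   # only the Linux slots
    condor_status -schedd                          # summarise the schedds instead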


condor_negotiator

  • Performs “matchmaking” in Condor

  • Gets information from the collector about all available machines and all idle jobs

  • Tries to match jobs with machines that will serve them

  • Both the job and the machine must satisfy each other’s requirements
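
As an illustration of this two-way matching (the values below are made up, not CamGrid policy): the job's requirements refer to machine attributes, while the machine's START expression decides which jobs it will accept and when:

    # In the job's submit file:
    requirements = OpSys == "LINUX" && Arch == "X86_64" && Memory >= 1024

    # In the execute machine's configuration:
    START = KeyboardIdle > 15 * 60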


Typical Condor Pool

[Diagram of a typical pool: the Central Manager runs master, collector, negotiator, schedd and startd; regular nodes run master, schedd and startd; execute-only nodes run master and startd; a submit-only node runs master and schedd. Arrows show spawned processes and ClassAd communication pathways.]

Each daemon maintains its own log, which makes for interesting distributed debugging!


Job Startup

[Diagram of job startup: the schedd on the submit machine advertises the job to the collector on the Central Manager; the negotiator matches it with a startd on an execute machine; the schedd then spawns a shadow, the startd spawns a starter, and the starter runs the job (linked against the Condor syscall library), which talks back to the shadow.]



The Parallel Universe

First, some caveats:

  • Using the PU requires extra set-up on the execute nodes by a sysadmin.

  • Execute hosts willing to run under the PU will only accept jobs from one dedicated scheduler.

  • No flocking!

    At its simplest, the PU just launches N identical jobs on N machines.

    The job will only run when all machines have been allocated.

    For example, the following submit script will run the command "uname -a" on four nodes.


######################################
## Example PU submit description file
######################################
universe = parallel
executable = /bin/uname
arguments = -a
should_transfer_files = yes
when_to_transfer_output = on_exit
requirements = OpSys == "LINUX" && Arch == "X86_64"
log = logfile
output = outfile.$(NODE)
error = errfile.$(NODE)
machine_count = 4
queue
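
This is submitted like any other job; a brief usage sketch, assuming the file above is saved as uname.sub:

    condor_submit uname.sub

Since $(NODE) expands to the node number, completion leaves one output/error pair per node, i.e. outfile.0 … outfile.3 and errfile.0 … errfile.3.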



The Parallel Universe and MPI

  • Condor leverages the PU to run MPI jobs.

  • Does this by launching a wrapper on N nodes. Then:

    • N-1 nodes exit (but don’t relinquish claim on node).

    • The wrapper on the first node launches the MPI job, grabbing all N nodes and calling the relevant MPI launch command (e.g. for MPICH2, OpenMPI, etc.)

    • Condor bundles wrappers for various MPI flavours, which you can then modify to suit your needs.

    • But what does MPI on CamGrid mean?

      • Job spans many pools: not practical! Consider packet latency over routers and firewalls. Also, it’s a heterogeneous environment.

      • Job spans many machines in one pool: possible, and some people do this. Inter-node connectivity is usually 1Gb/s (at best).

      • Job sits on one machine, but spans all cores: we’re in the money! With multi-core machines this is becoming increasingly attractive, and comms take place over shared memory (avoiding the network stack) = very fast.



MPI Example

# This is a wrapper for an OpenMPI job using 4 cores on the same
# physical host:
universe = parallel
executable = openmpi.sh
transfer_input_files = castep, Al_00PBE.usp, O_00PBE.usp, \
                       corundum.cell, corundum.param
WhenToTransferOutput = ON_EXIT
output = myoutput
error = myerror
log = mylog
# We want four processes
machine_count = 4
arguments = "castep corundum"
+WantParallelSchedulingGroups = True
requirements = OpSys == "LINUX" && Arch == "X86_64"
queue



MPI Example (cont.)

  • openmpi.sh does the necessary groundwork, e.g. setting paths to the MPI binaries and libraries, creating machine file lists, etc., before invoking the MPI starter command for that flavour (a minimal sketch of such a wrapper follows this list).

  • In parallel environments, machines can be divided into groups by suitable configuration on the execute hosts. For example, the following configuration entry would mean that all processes of a job reside in the same group, i.e. all on that same machine:

    ParallelSchedulingGroup = "$(HOSTNAME)"

  • This feature is then requested in the user’s submit script by having:

    +WantParallelSchedulingGroups = TRUE
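
CamGrid's actual openmpi.sh is more involved, but a minimal sketch of such a wrapper for the single-host case above might look like this (the OpenMPI install path and the details are assumptions, not the real script):

    #!/bin/bash
    # Hypothetical minimal openmpi.sh: with all processes on one host,
    # only node 0 launches mpirun; the other ranks exit straight away
    # (Condor keeps their claim on the slots).
    MPDIR=/usr/lib64/openmpi
    export PATH=$MPDIR/bin:.:$PATH
    export LD_LIBRARY_PATH=$MPDIR/lib:$LD_LIBRARY_PATH

    # _CONDOR_PROCNO and _CONDOR_NPROCS are set by the parallel universe
    if [ "$_CONDOR_PROCNO" -ne 0 ]; then
        exit 0
    fi

    chmod +x "$1"
    # No machine file needed: every process runs on this host
    mpirun -np "$_CONDOR_NPROCS" "$@"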



Case study: Ag3[Co(CN)6] energy surface from DFT



Accessing data across CamGrid

  • CamGrid spans many administrative domains, with each pool generally run by different sysadmins.

  • This makes running a file system that needs privileged installation and administration, e.g. NFS, impractical.

  • So far we’ve got round this by sending input files from the submit node with every job submission.

  • However, there are times when it would be really nice to be able to mount a remote file store, e.g.: maybe I don’t know exactly which files I need at submit time (identified at run time).



Parrot

  • Fortunately, a tool to do just this in Linux has come out of the Condor project, called Parrot.

  • Parrot gives a transparent way of accessing these resources without the need for superuser intervention (unlike trying to export a directory via NFS, or setting up sshfs).

  • It supports many protocols (http, httpfs, ftp, anonftp, gsiftp, chirp, …) and authentication models (GSI, kerberos, IP-address,…).

  • Parrot can be used on its own (outside of Condor).

  • It also allows server clustering for load balancing.



Digression: Chirp

  • Chirp: a remote I/O protocol used by Condor.

  • I can start my own chirp_server and export a directory:

    chirp_server -r /home/mcal00/data -I 172.24.116.7 -p 9096

  • I set permissions per directory with a .__acl file in the exported directory; for example, granting read and list (rl) rights:

    hostname:*.grid.private.cam.ac.uk rl
    hostname:*.escience.cam.ac.uk rl



Using interactive shells with Parrot

  • Not that useful in a grid job, but a nice feature:

    parrot vi /chirp/woolly--escience.grid.private.cam.ac.uk:9096/readme

  • The default port is 9094, so one can export different mount points from the same resource using different ports.

  • I can also mount a remote file system in a new shell:

    parrot -M /dbase=/http/woolly--escience.grid.private.cam.ac.uk:80 bash

  • /dbase then appears as a local directory in the new shell.



Parrot and non-interactive grid jobs

  • Consider an executable called a.out that needs to access the directories /Dir1 and /Dir2.

  • I start by constructing a mountpoint file (call it Mountfile):

    /Dir1 /chirp/woolly--escience.grid.private.cam.ac.uk:9094
    /Dir2 /chirp/woolly--escience.grid.private.cam.ac.uk:9096

  • Next I wrap everything in an executable that provides all the Parrot functionality; this wrapper is what I actually submit to the relevant grid scheduler, e.g. via condor_submit. Call it wrapper.sh:



wrapper.sh

#!/bin/bash
export PATH=.:/bin:/usr/bin
export LD_LIBRARY_PATH=.

# Nominate file with the mount points
mountfile=Mountfile

# What's the "real" executable called?
my_executable=a.out

chmod +x $my_executable parrot

# Run the executable under parrot
parrot -k -Q -m $mountfile $my_executable
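
The corresponding submit description might look roughly like this (the file names match the example above; everything else is illustrative):

    universe                = vanilla
    executable              = wrapper.sh
    transfer_input_files    = parrot, Mountfile, a.out
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    requirements            = OpSys == "LINUX" && Arch == "X86_64"
    output                  = out
    error                   = err
    log                     = log
    queue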



New in 7.4: File transfer by URL

  • It is now possible for vanilla jobs to specify a URL for the input files so that the execute host pulls the files over.

  • First, a sysadmin must have added an appropriate plugin and configured the execute host accordingly, e.g.:

    FILETRANSFER_PLUGINS = $(RELEASE_DIR)/plugins/curl-plugin

  • You can then submit a job to that machine and not send it any files, but instead direct it to pull files from an appropriate server, e.g. have in a submit script:

    URL = https://www.escience.cam.ac.uk/

    transfer_input_files = $(URL)/file1.txt, $(URL)/file2.txt
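
Put in context, a minimal submit description using this might be (the executable name and output settings are illustrative):

    universe                = vanilla
    executable              = my_prog
    URL                     = https://www.escience.cam.ac.uk/
    transfer_input_files    = $(URL)/file1.txt, $(URL)/file2.txt
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    output                  = out
    error                   = err
    log                     = log
    queue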



Checkpointing - DIY

  • Recap: Condor’s process checkpointing via the Standard Universe saves all the state of a process into a checkpoint file

    • Memory, CPU, I/O, etc.

  • Checkpoints are saved on submit host unless a dedicated checkpoint server is nominated.

  • The process can then be restarted from where it left off

  • Typically no changes to the job’s source code are needed – however, the job must be relinked with Condor’s Standard Universe support library (see the sketch after this list)

  • Limitations: no forking, kernel threads, or some forms of IPC

  • Not all combinations of OS/compilers are supported (none for Windows), and support is getting harder.

  • VM universe is meant to be the successor, but users don’t seem too keen.
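
For reference, relinking and submitting a Standard Universe job looks roughly like this (the program name is illustrative):

    # Relink the code against Condor's checkpointing library
    condor_compile gcc -o myprog myprog.c

    # Submit file
    universe   = standard
    executable = myprog
    output     = myprog.out
    error      = myprog.err
    log        = myprog.log
    queue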



Strategy 1 – Recursive shell scripts



Recursive shell scripts (cont.)

  • We can run a recursive shell script (a minimal sketch is given after this list).

  • This script does a condor_submit on our required executable, and we ensure that the input files are such that the job only runs for a “short” duration.

  • The script then runs condor_wait on the job’s log file and waits for it to finish.

  • Once this happens, the script checks the output files to see if the completion criteria have been met; otherwise we move the output files to input files and resubmit the job.

  • Hence there is a proviso that the output files can generate the next set of input files (not all applications can do this).
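
A minimal sketch of such a driver script (job.sub, the completion test check_done.sh and the output-to-input copy are hypothetical, application-specific pieces):

    #!/bin/bash
    while true; do
        rm -f job.log                # start each iteration with a fresh log
        condor_submit job.sub        # submit one "short" chunk of the real job
        condor_wait job.log          # block until that job leaves the queue
        if ./check_done.sh; then     # application-specific completion test
            break
        fi
        cp output.dat input.dat      # previous output becomes the next input
    done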



Recursive shell scripts (cont.)

  • There are some drawbacks with this approach:

    • We need to write the logic for checking for job completion. This will probably vary between applications.

    • We need to take into account how our recursive script will behave if the job exits abnormally, e.g. the execute host disappears, etc.

  • We can mitigate some of these concerns by running a recursive DAG (so Condor worries about abnormalities), and an example is given in CamGrid’s online documentation. However, we still need to write some application-specific logic.



Checkpointing (linux) vanilla universe jobs

  • Many applications can’t link with Condor’s checkpointing libraries. And what about interpreted languages?

  • To perform this for arbitrary code we need:

    1) An API that checkpoints running jobs.

    2) A user-space FS to save the images

  • For 1) we use the BLCR kernel modules – unlike Condor’s user-space libraries these run with root privilege, so there are fewer limitations on the codes one can use.

  • For 2) we use Parrot, which came out of the Condor project. It is used on CamGrid in its own right, but together with BLCR it allows any code to be checkpointed.

  • I’ve provided a bash implementation, blcr_wrapper.sh, to accomplish this (it uses the chirp protocol with Parrot).


Checkpointing linux jobs using BLCR kernel modules and Parrot

1. Start a chirp server to receive checkpoint images.

2. The Condor job starts: blcr_wrapper.sh uses 3 processes (the job itself, a parent process, and Parrot for I/O).

3. Start by checking for an image from a previous run.

4. Start the job.

5. The parent sleeps, waking periodically to checkpoint the job and save the images.

6. The job ends: tell the parent to clean up.



Example of submit script

  • Application is “my_application”, which takes arguments “A” and “B”, and needs files “X” and “Y”.

  • There’s a chirp server at:

    woolly--escience.grid.private.cam.ac.uk:9096

    Universe = vanilla
    Executable = blcr_wrapper.sh
    arguments = woolly--escience.grid.private.cam.ac.uk 9096 60 $$([GlobalJobId]) \
                my_application A B
    transfer_input_files = parrot, my_application, X, Y
    transfer_files = ALWAYS
    Requirements = OpSys == "LINUX" && Arch == "X86_64" && HAS_BLCR == TRUE
    Output = test.out
    Log = test.log
    Error = test.error
    Queue

