
The flight of the Condor - a decade of High Throughput Computing


Presentation Transcript


  1. The flight of the Condor - a decade of High Throughput Computing Miron Livny Computer Sciences Department University of Wisconsin-Madison miron@cs.wisc.edu

  2. Remember! • There are no silver bullets. • Response time = Queuing Time + Execution Time. • If you believe in parallel computing you need a very good reason for not using an idle resource. • Debugging complex parallel applications is not fun.

  3. Background and motivation …

  4. “ … Since the early days of mankind the primary motivation for the establishment of communities has been the idea that by being part of an organized group the capabilities of an individual are improved. The great progress in the area of inter-computer communication led to the development of means by which stand-alone processing sub-systems can be integrated into multi-computer ‘communities’. … “ M. Livny, “Study of Load Balancing Algorithms for Decentralized Distributed Processing Systems,” Ph.D. thesis, July 1983.

  5. The growing gap between what we own and what each of us can access

  6. Distributed Ownership Due to the dramatic decrease in the cost-performance ratio of hardware, powerful computing resources are owned today by individuals, groups, departments, universities… • Huge increase in the computing capacity owned by the scientific community • Moderate increase in the computing capacity accessible by a scientist

  7. What kind of Computing? • High Performance Computing • Other

  8. How about High Throughput Computing (HTC)? I introduced the term HTC in a seminar at the NASA Goddard Space Flight Center in July of ‘96 and a month later at the European Laboratory for Particle Physics (CERN). • HTC paper in HPCU News 1(2), June ‘97. • HTC interview in HPCWire, July ‘97. • HTC part of NCSA PACI proposal, Sept. ‘97. • HTC chapter in “The Grid” book, July ‘98.

  9. High Throughput Computing is a 24-7-365 activity. FLOPY = (60*60*24*7*52) * FLOPS
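(As a worked example of this formula: a workstation that sustains 10 MFLOPS around the clock delivers 31,449,600 seconds/year × 10^7 FLOPS ≈ 3.1 × 10^14 floating point operations per year.)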

  10. A simple scenario of a High Throughput Computing (HTC) user with a very simple application and one workstation on his/her desk

  11. The HTC Application Study the behavior of F(x,y,z) for 20 values of x, 10 values of y and 3 values of z (20*10*3 = 600) • F takes on the average 3 hours to compute on a “typical” workstation (total = 1800 hours) • F requires a “moderate” (128MB) amount of memory • F performs “little” I/O - (x,y,z) is 15 MB and F(x,y,z) is 40 MB
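(At this scale even the “little” I/O adds up: roughly 600 × 15 MB ≈ 9 GB of input and 600 × 40 MB ≈ 24 GB of output in total.)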

  12. What we have here is a Master-Worker Application!

  13. Master-Worker Paradigm Many scientific, engineering and commercial applications (Software builds and testing, sensitivity analysis, parameter space exploration, image and movie rendering, High Energy Physics event reconstruction, processing of optical DNA sequencing, training of neural-networks, stochastic optimization, Monte Carlo...) follow the Master-Worker (MW) paradigm where ...

  14. Master-Worker Paradigm … a heap or a Directed Acyclic Graph (DAG) of tasks is assigned to a master. The master looks for workers who can perform tasks that are “ready to go” and passes them a description (input) of the task. Upon the completion of a task, the worker passes the result (output) of the task back to the master. • Master may execute some of the tasks. • Master may be a worker of another master. • Worker may require initialization data.

  15. Master-Worker computing is Naturally Parallel. It is by no means Embarrassingly Parallel. As you will see, doing it right is by no means trivial. Here are a few challenges ...

  16. Dynamic or Static? This is the key question one faces when building an MW application. How this question is answered has an impact on • The algorithm • Target architecture • Resource availability • Quality of results • Complexity of implementation

  17. How do the Master and Worker Communicate? • Via a shared/distributed file/disk system using reads and writes, or • Via a message passing system (PVM, MPI) using sends and receives, or • Via shared memory using loads, stores and semaphores.

  18. How many workers? • One per task? • One per CPU allocated to the master? • N(t) depending on the dynamic properties of the “ready to go” set of tasks?

  19. Job Parallel MW • Master and workers communicate via the file system. • Workers are independent jobs that are submitted/started, suspended, resumed and cancelled by the master. • Master may monitor progress of jobs and availability of resources or just collect results at the end.

  20. Building a basic Job Parallel Application 1. Create n directories. 2. Write an input file in each directory. 3. Submit a cluster of n jobs. 4. Wait for the cluster to finish. 5. Read an output file from each directory.
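A minimal sketch of what steps 1, 2 and 5 might look like (Python is used here purely for illustration; the worker_dir.<i> layout and the file names in/out mirror the submit file shown on slide 39, and the input contents are placeholders):

    import os

    n = 600  # one directory per job

    # Steps 1-2: create the directories and write an input file in each one.
    for i in range(n):
        d = f"worker_dir.{i}"
        os.makedirs(d, exist_ok=True)
        with open(os.path.join(d, "in"), "w") as f:
            f.write(f"input parameters for job {i}\n")   # placeholder contents

    # Step 3 is submitting a description file like the one on slide 39
    # (queue 600); step 4 is simply waiting for the cluster to finish.

    # Step 5: once the jobs have finished, read the output file from each directory.
    results = []
    for i in range(n):
        path = os.path.join(f"worker_dir.{i}", "out")
        if os.path.exists(path):
            with open(path) as f:
                results.append(f.read())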

  21. Task Parallel MW • Master and workers exchange data via messages delivered by a message passing system like PVM or MPI. • Master monitors availability of resources and expands or shrinks the resource pool of the application accordingly. • Master monitors the “health” of workers and redistributes tasks accordingly.
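To make the task-parallel pattern concrete, here is a minimal sketch of a master-worker loop written with MPI via the mpi4py package; the slide's systems of choice are PVM or MPI, so this is only an illustrative stand-in, and the task list and the "work" itself are placeholders:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    TASK, RESULT, STOP = 1, 2, 3              # message tags

    if rank == 0:                             # master
        tasks = list(range(100))              # placeholder "ready to go" tasks
        results = []
        status = MPI.Status()
        active = 0
        # hand one task to every worker (or tell idle workers to stop)
        for w in range(1, size):
            if tasks:
                comm.send(tasks.pop(), dest=w, tag=TASK)
                active += 1
            else:
                comm.send(None, dest=w, tag=STOP)
        # each returned result frees a worker for the next task
        while active > 0:
            res = comm.recv(source=MPI.ANY_SOURCE, tag=RESULT, status=status)
            results.append(res)
            w = status.Get_source()
            if tasks:
                comm.send(tasks.pop(), dest=w, tag=TASK)
            else:
                comm.send(None, dest=w, tag=STOP)
                active -= 1
    else:                                     # worker
        status = MPI.Status()
        while True:
            task = comm.recv(source=0, status=status)
            if status.Get_tag() == STOP:
                break
            comm.send(task * task, dest=0, tag=RESULT)   # placeholder "work"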

  22. Condor - High Throughput Computing. Our Answer to High Throughput MW Computing

  23. “… Modern processing environments that consist of large collections of workstations interconnected by high capacity networks raise the following challenging question: can we satisfy the needs of users who need extra capacity without lowering the quality of service experienced by the owners of under utilized workstations? … The Condor scheduling system is our answer to this question. … “ M. Litzkow, M. Livny and M. Mutka, “Condor - A Hunter of Idle Workstations”, IEEE 8th ICDCS, June 1988.

  24. The Condor System A High Throughput Computing system that supports large dynamic MW applications on large collections of distributively owned resources, developed, maintained and supported by the Condor Team at the University of Wisconsin-Madison since ‘86. • Originally developed for UNIX workstations. • Fully integrated NT version in advanced testing. • Deployed world-wide by academia and industry. • A 600-CPU system at the U of Wisconsin. • Available at www.cs.wisc.edu/condor.

  25. Selected sites (18 Nov 1998 10:21:13)
      Name               Machine                      Running  IdleJobs  HostsTotal
      RNI                core.rni.helsinki.fi               9         9          17
      dali.physik.uni-l  dali.physik.uni-leipzig.de         1         0          23
      Purdue ECE         drum.ecn.purdue.edu                4         9           4
      ICG TU-Graz        fcggsg06.icg.tu-graz.ac.at         0         0          47
      TU-Graz Physikstu  fubphpc.tu-graz.ac.at              0         8           5
      PCs                lam.ap.polyu.edu.hk                7         5           8
      C.O.R.E. Digital   latke.coredp.com                   7        45          26
      legba              legba.unsl.edu.ar                  0         0           5
      ictp-test          mlab-42.ictp.trieste.it           18         0          26
      CGSB-NLS           nls7.nlm.nih.gov                   4         1           8
      UCB-NOW            now.cs.berkeley.edu                3         3           5
      INFN - Italy       venus.cnaf.infn.it                31        61          84
      NAS CONDOR POOL    win316.nas.nasa.gov                6         0          20

  26. “… Several principles have driven the design of Condor. First is that workstation owners should always have the resources of the workstation they own at their disposal. … The second principle is that access to remote capacity must be easy, and should approximate the local execution environment as closely as possible. Portability is the third principle behind the design of Condor. … “ M. Litzkow and M. Livny, “Experience With the Condor Distributed Batch System”, IEEE Workshop on Experimental Distributed Systems, Huntsville, AL, Oct. 1990.

  27. Key Condor Mechanisms • Matchmaking - enables requests for services and offers to provide services to find each other (ClassAds). • Checkpointing - enables preemptive-resume scheduling (go ahead and use it as long as it is available!). • Remote I/O - enables remote (from the execution site) access to local (at the submission site) data. • Asynchronous API - enables management of dynamic (opportunistic) resources.
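As a toy illustration of the matchmaking idea only (plain Python, not the ClassAd language or any Condor API; every attribute name and threshold below is made up): a request is matched with an offer when the requirements of both sides are satisfied.

    # Offers and requests are attribute dictionaries; each side states a
    # requirements predicate over the other side, and the matchmaker pairs a
    # request with the first mutually acceptable offer.
    offers = [
        {"Name": "ws-1", "OpSys": "LINUX",   "Memory": 128, "KeyboardIdle": 1800},
        {"Name": "ws-2", "OpSys": "SOLARIS", "Memory": 64,  "KeyboardIdle": 30},
    ]
    request = {"Owner": "miron", "ImageSize": 64}

    def request_requirements(offer, request):
        # the job wants a Linux machine with enough memory for its image
        return offer["OpSys"] == "LINUX" and offer["Memory"] >= request["ImageSize"]

    def offer_requirements(offer, request):
        # the owner only serves jobs after 15 minutes of keyboard idleness
        return offer["KeyboardIdle"] >= 15 * 60

    match = next((o for o in offers
                  if request_requirements(o, request) and offer_requirements(o, request)),
                 None)
    print(match["Name"] if match else "no match")   # -> ws-1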

  28. Condor Layers [diagram: layers labeled Application, Application Agent (tasks, jobs), Customer Agent, Environment Agent, Owner Agent, Local Resource Management, Resource]

  29. Condor MW services • Checkpointing of Job Parallel (JP) workers • Remote I/O for master-worker communication • Log files for JP workers • Management of large (10K) numbers of jobs • Process management for dynamic PVM applications • A DAGMan (Directed Acyclic Graph Manager) • Access to large amounts of computing power

  30. Condor System Structure [diagram: a Central Manager running the Negotiator (N) and Collector (C), a Submit Machine running a Customer Agent (CA), and an Execution Machine running a Resource Agent (RA), each holding ClassAds]

  31. Advertising Protocol [diagram: the Customer Agent and the Resource Agent send their ClassAds to the Central Manager]

  32. Advertising Protocol (continued) [diagram]

  33. Matching Protocol [diagram: the Negotiator matches a customer ClassAd with a resource ClassAd and notifies the Customer Agent]

  34. Claiming Protocol [diagram: the Customer Agent contacts the matched Resource Agent directly to claim the resource]

  35. Remote Execution [diagram: the executable, checkpoint, input files and output files travel over the network between the customer's file system (which may be distributed) and a remote workstation with its own CPU, memory and file system]

  36. Remote I/O & Ckpt [diagram: the submission side (Owner Agent, Customer Agent, request queue, object files, checkpoint files) and the execution side (Execution Agent, Application Agent, application processes, data & object files) exchanging files]

  37. Workstation Cluster Workshop December 1992

  38. We have users that ... • … have job parallel MW applications with more than 5000 jobs. • … have task parallel MW applications with more than 100 tasks. • … run their job parallel MW application for more than six months. • … run their task parallel MW application for more than four weeks.

  39. A Condor Job-Parallel Submit File
      executable = worker
      requirements = ((OS == "Linux2.2") && (Memory >= 64))
      initialdir = worker_dir.$(process)
      input = in
      output = out
      error = err
      log = log
      queue 1000
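Here queue 1000 submits 1000 jobs as a single cluster, and $(process) expands to each job's index (0 through 999), so every job runs against its own worker_dir.<index> directory with its own in, out, err and log files there.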

  40. Material Sciences MW Application
      potential = start
      FOR cycle = 1 to 36
          FOR location = 1 to 31
              totalEnergy += Energy(location, potential)    <- worker tasks
          END
          potential = F(totalEnergy)                        <- master task
      END
      Implemented as a PVM application with the Condor MW services. Two traces (execution and performance) visualized by DEVise.
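A hedged sketch of how this structure parallelizes, using a local Python process pool as a stand-in for the PVM/Condor-managed workers; Energy and F are placeholders since the real code is not shown, and totalEnergy is recomputed per cycle here:

    from concurrent.futures import ProcessPoolExecutor

    def Energy(location, potential):        # placeholder for the real worker computation
        return location * potential

    def F(total_energy):                    # placeholder for the master's update rule
        return total_energy / 31.0

    if __name__ == "__main__":
        potential = 1.0                     # "start"
        with ProcessPoolExecutor() as pool:
            for cycle in range(36):
                # worker tasks: the 31 Energy evaluations of a cycle are independent
                energies = pool.map(Energy, range(1, 32), [potential] * 31)
                total_energy = sum(energies)
                # master task: the potential update serializes consecutive cycles
                potential = F(total_energy)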

  41. [visualization of the 36*31 worker tasks: panels plot logical worker ID against time (6 hours total), the number of workers, node utilization, and task duration vs. location; one cycle (31 worker tasks) spans the first, second and third resource allocations, with a preemption visible]

  42. … back to the user with the 600 jobs and only one workstation to run them

  43. First step - get organized! • Turn your workstation into a single-node “Personal” Condor pool • Write a script that creates 600 input files, one for each of the (x,y,z) combinations • Submit a cluster of 600 jobs to your personal Condor pool • Write a script that monitors the logs and collects the data from the 600 output files • Go on a long vacation … (2.5 months)
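A sketch of the monitoring and collection script (Python for illustration; rather than parsing Condor's log format, it simply watches for the out files under the worker_dir.<i> layout assumed earlier and gathers them once everything has finished):

    import os, time

    n = 600
    dirs = [f"worker_dir.{i}" for i in range(n)]

    # Poll until all 600 jobs have written their output files.
    while True:
        done = [d for d in dirs if os.path.exists(os.path.join(d, "out"))]
        print(f"{len(done)}/{n} jobs finished")
        if len(done) == n:
            break
        time.sleep(600)                     # check again in ten minutes

    # Collect the 600 results into a single file.
    with open("all_results", "w") as collected:
        for d in dirs:
            with open(os.path.join(d, "out")) as f:
                collected.write(f.read())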

  44. Your Personal Condor will ... • ... keep an eye on your jobs and will keep you posted on their progress • ... implement your policy on when the jobs can run on your workstation • ... implement your policy on the execution order of the jobs • ... add fault tolerance to your jobs • … keep a log of your job activities

  45. [diagram: your workstation running a personal Condor pool, with the 600 Condor jobs queued to it]

  46. … and what about the underutilized workstation in the next office, or the one in the classroom downstairs, or the Linux cluster node in the other building, or the O2K node at the other side of town, or …

  47. Condor - High Throughput Computing

  48. Second step - become a scavenger • Install Condor on the machine next door. • Install Condor on the machines in the classroom. • Configure these machines to be part of your Condor pool. • Go on a shorter vacation ...

  49. [diagram: your personal Condor pool on your workstation now joined with a group Condor pool, still serving the 600 Condor jobs]

  50. Third step - Take advantage of your friends • Get permission from “friendly” Condor pools to access their resources. • Configure your personal Condor to “flock” to these pools. • Reconsider your vacation plans ...
