
Installation of a Condor Supercomputing pool


Presentation Transcript


  1. Installation of a Condor Supercomputing pool Brian Campbell Bryce Carmichael Unquiea Wade Mentor: Dr. Eric Akers

  2. Abstract The International Polar Year was designed to study and better understand the current state of climatic changes to the world’s ice sheets. For the last few decades, automated weather stations and satellites in geosynchronous orbit have produced large data sets. Today, much of this data remains unexplored due to insufficient funding and the scarcity of resources. For this reason, the Polar Grid concept was proposed to distribute the analysis of the existing data sets. The goal of Elizabeth City State University’s Polar Grid team was to construct a model network to serve as the base for a supercomputing pool. The supercomputing pool will be built on the university’s campus and linked to the overall Polar Grid system. Numerous software packages and protocols currently in use at other institutions around the nation were researched. From the possible options, Condor, created and developed at the University of Wisconsin, was chosen for its ease of use and its capacity for expansion. An eighteen-node computing pool was constructed and tested with Condor in Dixon Hall’s second-floor lab. The pool comprised seventeen desktops running on a Windows NT platform, with the pool’s master, a Linux-based server, housed in Lane Hall.

  3. Purpose The goals were to utilize all of our computers, gain knowledge about supercomputing, set up a pool of computers that can be accessed by Polar Grid, and familiarize team members with job submission and the overall operation of Condor.

  4. Introduction to Supercomputing • What is Supercomputing? Supercomputing is a term given to systems capable of processing at speeds much greater than commercially available CPUs. High-throughput computing is the term used to describe systems with intermediate processing abilities.

  5. Distributed vs. Parallel • Distributed computing utilizes a network of many computers, each accomplishing a portion of an overall task, to achieve a computational result much more quickly than with a single computer. • Distributed computing also allows many users to interact and connect openly. • Parallel processing is the simultaneous processing of the same task on two or more microprocessors in order to obtain faster results. • The computing resources can include a single computer with multiple processors.

  6. Size vs. Efficiency • Parallel processing allows more intimate communication between nodes, increasing efficiency. • As the size of the network grows, communication takes up a greater part of the CPUs’ time. • This can be limited by using more than one type of protocol in a system.

  7. Hardware/Software Options Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Beowulf is a design for high-performance parallel computing clusters built on inexpensive personal computer hardware. A Beowulf cluster is a group of usually identical PCs running a Free and Open Source Software (FOSS) Unix-like operating system, such as BSD, Linux, or Solaris. BOINC is a software platform for volunteer computing and desktop Grid computing. BOINC is designed to support applications that have large computation requirements, storage requirements, or both.

  8. History of Condor The Condor project was started in 1988. Condor was built from the results of the Remote Unix project and from the continuation of research in the area of Distributed Resource Management (DRM). Condor was created at the University of Wisconsin-Madison (UW-Madison), and it was first installed as a production system in the UW-Madison Department of Computer Science.

  9. Why choose Condor? • Versatility • Capability of switching between distributed and parallel computing • Supports multiple programming languages for simple execution of jobs • Operates on multiple platforms

  10. Resources Required • Availability – open-source software • Easy expansion – any number of nodes can be added to an existing pool • Cost efficiency – any CPU meeting the base requirements can be used efficiently

  11. System Requirements • Windows • Condor for Windows requires Windows 2000 (or better) or Windows XP. • 300 megabytes of free disk space is recommended; significantly more disk space may be needed to run jobs with large data files. • Condor for Windows will operate on either an NTFS or FAT file system; however, for security purposes, NTFS is preferred. • Unix • The size requirements for the downloads currently vary from about 20 Mbytes (statically linked HP-UX on PA-RISC) to more than 50 Mbytes (dynamically linked IRIX on an SGI). • In addition, a large amount of disk space is needed in the local directory of any machine that submits jobs to Condor.

  12. Installation http://parrot.cs.wisc.edu/ . The Condor software can be accessed through the project’s main website. Condor can be downloaded for various platforms such as Solaris, Linux/Unix, Windows, and Mac. Administrator and user manuals are also available on the website.

  13. Configuration Installation – overseen through the Windows installation wizard. Changes to the defaults: Pool master node – a Linux-based machine in Lane Hall (10.40.20.37); having a Linux-based master will allow the eventual use of the full array of Condor options. Read and write access – parameters changed to include 10.*.*.* to allow feedback and access from the different nodes. Because the CERSER labs are used during class hours, each node is required to be idle for 15 minutes before it is available to perform tasks. If a task is interrupted, it will be restarted on a different machine if the original node is not freed within ten minutes. A sketch of such configuration settings is shown below.
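
As a rough illustration, settings along the lines below could be placed in the pool’s condor_config (or condor_config.local) file; the master’s address comes from the slide above, while the remaining macro values are assumptions rather than the team’s actual configuration.

    ##  Hypothetical condor_config.local sketch (values are assumptions)
    ##  Central manager: the Linux-based master in Lane Hall
    CONDOR_HOST = 10.40.20.37
    ##  Allow read and write access from any machine on the campus 10.* network
    ##  (older Condor releases use HOSTALLOW_READ / HOSTALLOW_WRITE instead)
    ALLOW_READ  = 10.*.*.*
    ALLOW_WRITE = 10.*.*.*
    ##  Only start jobs once the console has been idle for 15 minutes (900 seconds)
    START = (KeyboardIdle > 900)
    ##  Suspend a running job as soon as a local user returns to the machine
    SUSPEND = (KeyboardIdle < 60)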

  14. Job Submission and Tracking Jobs can be submitted using any executable file format through the condor/bin directory. Jobs are submitted with the command condor_submit filename, and the status of the nodes within the system can be checked with the command condor_status. A sample submit description file is sketched below.
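
As an illustration, a minimal submit description file for a vanilla-universe job might look like the sketch below; the file and program names (hello.sub, hello.exe) are hypothetical and not taken from the presentation.

    ##  hello.sub -- minimal Condor submit description file (hypothetical names)
    universe   = vanilla
    executable = hello.exe
    output     = hello.out
    error      = hello.err
    log        = hello.log
    queue

The job would then be queued with condor_submit hello.sub, and condor_status would show which nodes are available to run it.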

  15. Condor Status Menu The condor_status command brings up a listing of the current platform and availability of each node. Availability is signified by the one-word qualifiers in the fourth (State) column. Owner: the node has a local user demanding its attention. Unclaimed: the node is open but has not been assigned a task. Matched: the node is open and has been matched to a specified task. Claimed: the node is currently running a specified task.
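
For illustration only, condor_status output has roughly the shape sketched below; the machine names and figures are invented, not readings from the ECSU pool.

    Name             OpSys    Arch   State      Activity  LoadAv  Mem   ActvtyTime
    dixon-lab-01     WINNT51  INTEL  Unclaimed  Idle      0.010    512  0+01:23:45
    dixon-lab-02     WINNT51  INTEL  Claimed    Busy      0.990    512  0+00:05:12
    lane-master      LINUX    INTEL  Owner      Idle      0.120   1024  0+02:10:33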

  16. Job Submission and Tracking After submission, a task can be traced through the pool using the condor_q command. The results of the task can be seen in the output files created by the executable, or through the .log file that is created automatically for each task.
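
Continuing the hypothetical hello.sub example above, a typical submit-and-track session might use commands along these lines (the job ID 42.0 is made up for illustration).

    condor_submit hello.sub     # queue the job with the local scheduler
    condor_q                    # list jobs currently in the queue
    condor_q -analyze 42.0      # explain why a particular job is not yet running
    condor_status               # check the state of every node in the pool
    type hello.log              # inspect the job's event log (Windows; use 'cat' on Unix)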

  17. Results A Condor pool composed of 17 nodes running on the Windows NT platform has been established in the Dixon Hall laboratory, operating under a Linux-based master housed at the Lane Hall offices. To date, simple tasks written in C++ have been submitted and have run successfully through the pool. Diagnostic assessment showed two CPUs unconnected to the network, as well as naming redundancies that hindered the installation of the Condor system. A sketch of such a test task is shown below.
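
The presentation does not include the test code itself; the following is a minimal sketch of the kind of C++ task that could be compiled and submitted through the pool (the file name test_job.cpp is hypothetical).

    // test_job.cpp -- trivial test task for the Condor pool (illustrative example)
    #include <iostream>

    int main() {
        long long sum = 0;
        for (long long i = 1; i <= 1000000; ++i) {   // a small amount of busy work
            sum += i;
        }
        // This line ends up in the file named by the submit file's "output" entry
        std::cout << "sum of 1..1000000 = " << sum << std::endl;
        return 0;
    }

Compiled to an executable and listed on the executable line of a submit description file, a program like this exercises the full submit, match, run, and output cycle described on the previous slides.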

  18. Conclusions Installation of Condor was a success. Expansion of the cluster is easy and can be done efficiently with minimal cost in resources. Management and programming with Condor can be done at an undergraduate level and is encouraged.

  19. Future Work Familiarize more of the CERSER teams with the Condor software. Continue the expansion of the Condor pool. Link ECSU to the Polar Grid network. Encourage the development of programs to aid future CERSER research projects.

  20. References • Andrew S. Tanenbaum, Maarten Van Steen (2002): Distributed Systems: Principles and Paradigms. New Jersey: Prentice-Hall Inc. • Amza, C., A. L. Cox, S. Dwarkadas, P. Keleher, R. Rajamony, H. Lu, W. Yu, and W. Zwaenepoel. TreadMarks: Shared memory computing on networks of workstations, to appear in IEEE Computer (draft copy): www.cs.rice.edu/willy/TreadMarks/papers.html. • A. J. van der Steen, An evaluation of some Beowulf clusters, Technical Report WFI-00-07, Utrecht University, Dept. of Computational Physics, December 2000. (Also available through www.euroben.nl, directory reports/.) • A. J. van der Steen, Overview of recent supercomputers and high-end servers, June 2005, www.euroben.nl, directory reports/. • http://www.cs.wisc.edu/condor/manual/v7.0/ • http://boinc.berkeley.edu/trac/wiki/BoincIntro • http://www.supercomputingonline.com/ads.php

  21. Questions
