
Grid-Enabling Applications in a Heterogeneous Environment with Globus and Condor

Grid-Enabling Applications in a Heterogeneous Environment with Globus and Condor. Jeffrey Wells – SUNY Institute of Technology – wellsj1@csunyit.edu Scott Spetka – SUNYIT and ITT Corp. – scott@cs.sunyit.edu


Presentation Transcript


  1. Grid-Enabling Applications in a Heterogeneous Environment with Globus and Condor
     Jeffrey Wells – SUNY Institute of Technology – wellsj1@csunyit.edu
     Scott Spetka – SUNYIT and ITT Corp. – scott@cs.sunyit.edu
     Virginia Ross – Air Force Research Laboratory, Information Directorate – Virginia.Ross@rl.af.mil
     Mardi Gras Distributed Applications Conference, Baton Rouge, LA, January 30 – February 2, 2008

  2. Test Environment
     • AFRL Globus Grid Testbed
     • AFRL Grid-Enabled FrameWork
     • Regional VPN Based Grid
     • Condor Globus Case Studies
     • Heterogeneous Grid
       • Corning Community College runs a Condor Submit/Execute machine and the Globus Toolkit on a Debian network.
       • SUNY Geneseo runs the Globus Toolkit on a Debian network.
       • SUNYIT runs a Condor Scheduler, a Submit/Execute machine and the Globus Toolkit on a Linux network.

  3. AFRL Globus Grid Testbed

  4. AFRL Grid-Enabled FrameWork

  5. SUNY Geneseo Debian Linux Cluster: Globus Services
     • Services used, tested and evaluated:
       • GridFTP, RFT (Reliable File Transfer)
       • Delegation, authentication, authorization
       • Credential management
       • Grid Security Infrastructure (GSI)

  6. SUNY Institute of Technology Linux Cluster
     [Diagram labels: Globus Services, Condor Scheduler, Condor Workstation Pool]
     Condor-G manages jobs through the resource manager of the Globus Toolkit. Results of a job passed to the Globus Toolkit are returned via the Condor-G interface.
     • condor_master keeps all the other Condor daemons running.
     • condor_schedd holds the job queue and submits queued jobs to remote resources.
     • condor_negotiator is responsible for the matchmaking between jobs and resources.
     • condor_startd advertises a machine's resources and executes jobs on it.
     • condor_starter spawns the remote job.
     • condor_shadow runs on the submit machine and acts as the resource manager for the running job.

  7. Corning Community College Linux Cluster
     [Diagram labels: Globus Services, Condor Workstation Pool, Condor-G interface]
     Condor-G uses the Globus resource manager, which starts a job on the remote machine and manages it while it runs on the remote resource. Condor-G waits for the job to complete and then returns the results.
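The "wait for completion" step can be pictured with a small shell sketch that inspects the job's Condor user log (the file named by `log =` in a submit description). This is only an illustration: the `job_state` helper is hypothetical, it assumes the classic text log format in which each event line begins with a three-digit code (000 submitted, 001 executing, 005 terminated), and the events actually written for a grid-universe job may differ by Condor version. In practice the `condor_wait` command serves this purpose.

```shell
#!/bin/sh
# job_state LOGFILE  (hypothetical helper, not part of Condor)
# Reports the most advanced event found in a Condor user log, based on the
# three-digit event code that starts each event line:
#   000 = job submitted, 001 = job executing, 005 = job terminated
job_state() {
    log=$1
    if grep -q '^005 ' "$log"; then
        echo terminated
    elif grep -q '^001 ' "$log"; then
        echo executing
    elif grep -q '^000 ' "$log"; then
        echo submitted
    else
        echo unknown
    fi
}
```

A caller could poll `job_state test.log` until it prints `terminated` and only then read the job's output files.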

  8. Condor Central Manager (Scheduler)
     [Diagram labels: Central Manager, Submit/Execute, Globus, Job Request, ClassAd/Results]
     • The Condor Central Manager (Scheduler) submits jobs to either a Condor Submit/Execute machine or a Globus machine.
     • Each machine "advertises" its resources to the Central Manager via a ClassAd.
     • The Central Manager matches resources with the requirements of each submitted job.
     • The Central Manager sends the executable to a remote resource that matches the requirements.
     • Once the job is completed, the Execute machine reports back to the Central Manager.
     • The Central Manager reports the final results.
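The advertise-and-match steps on this slide can be pictured with a toy shell sketch. It is not how Condor implements ClassAds: the one-line ad format (name, operating system, memory in MB), the machine names and values, and the `match_job` function are all assumptions made for illustration.

```shell
#!/bin/sh
# Toy matchmaker illustrating the idea only; real ClassAds are far richer.
# Each hypothetical "machine ad" line is: name opsys memory_mb
MACHINE_ADS='nodeA LINUX 256
nodeB LINUX 1024
nodeC SOLARIS 2048'

# match_job OPSYS MIN_MEMORY_MB
# Prints the name of the first advertised machine that satisfies the job's
# requirements. Prints nothing and returns 1 when no machine matches.
match_job() {
    want_opsys=$1
    want_mem=$2
    printf '%s\n' "$MACHINE_ADS" | while read -r name opsys mem; do
        if [ "$opsys" = "$want_opsys" ] && [ "$mem" -ge "$want_mem" ]; then
            echo "$name"
            break
        fi
    done | grep .   # grep succeeds only if a machine name was printed
}
```

With the ads above, `match_job LINUX 512` prints `nodeB`, because nodeA advertises too little memory. A request such as `match_job LINUX 4096` prints nothing and returns a non-zero status, which stands in for a job that stays queued until a suitable machine advertises itself.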

  9. Various Jobs Implemented
     • Condor jobs: Vanilla, Standard, Java, Parallel, Globus
     • Globus jobs:
       • Forwarded a job to Condor machines.
       • Sent a job from a Condor scheduler to a Globus machine (Globus job).

  10. Job Examples

     Condor Job and Globus Script

     Submit description file (test.submit):

         # Condor to Globus: test.submit
         universe = grid
         executable = myscript.sh
         arguments = TestJob 10
         JobManager_type = Condor
         grid_type = gt4
         globusscheduler = https://stengal.cs.sunyit.edu:8443/wsrf/services/ManagedJobFactoryService/
         log = test.log
         output = test.output
         error = test.error
         should_transfer_files = YES
         when_to_transfer_output = ON_EXIT
         Queue

     Job script (myscript.sh):

         #!/bin/sh
         echo "I'm process id $$ on" `hostname`
         echo "This is sent to standard error" 1>&2
         date
         echo "Running as binary $0" "$@"
         echo "My name (argument 1) is $1"
         echo "My sleep duration (argument 2) is $2"
         sleep $2
         echo "Sleep of $2 seconds finished. Exiting"
         echo "RESULT: 0 SUCCESS"

     Condor Job and MPI Program

     Submit description file:

         # Submit description file for /bin/hostname (Parallel)
         universe = parallel
         executable = /bin/hostname
         machine_count = 2
         log = parallellogfile
         output = outfileMPI.$(NODE)
         error = errfileMPI.$(NODE)
         should_transfer_files = YES
         when_to_transfer_output = ON_EXIT
         queue

     MPI Program:

         #include "mpi.h"
         #include <stdio.h>

         int main( int argc, char* argv[] )
         {
             int rank, size;

             MPI_Init( &argc, &argv );
             MPI_Comm_rank( MPI_COMM_WORLD, &rank );  /* this process's rank */
             MPI_Comm_size( MPI_COMM_WORLD, &size );  /* total number of processes */
             printf( "I am %d of %d\n", rank, size );
             MPI_Finalize();
             return 0;
         }
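Because the job script ends by printing a "RESULT: 0 SUCCESS" line, a submitter can check the returned standard output for that line once the job has finished. The following is a minimal sketch under that assumption; the `job_succeeded` helper is hypothetical and is not part of Condor or Globus.

```shell
#!/bin/sh
# job_succeeded OUTPUT_FILE  (hypothetical helper)
# Returns 0 when the captured standard output contains the final
# "RESULT: 0 SUCCESS" line printed by the job script, non-zero otherwise.
job_succeeded() {
    grep -q '^RESULT: 0 SUCCESS' "$1"
}
```

For the submit file above, the check would be `job_succeeded test.output`, since `output = test.output` is where the job's standard output is returned.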

  11. Lessons Learned
     • The basic Globus configuration and functionality used in the AFRL implementation are mature, but can be tedious.
     • mpiexec.py and mpdlib.py were modified so that WS-GRAM could send a distributed job to MPICH2. Thanks to Dr. Ralph Butler of Middle Tennessee State University.
     • Applications are changing and maturing faster than the documentation.
     • Mailing lists and groups are not always helpful, and they do not always respond to questions.
     • Documentation on the connection between MPI-2 and the Globus Toolkit is scarce and outdated.
     • Documentation on the Condor and Globus interface is outdated. This was resolved by installing Condor first and then Globus with the Condor scheduler.

