Condor and the NGS John Kewley NGS Support Centre Manager


Presentation Transcript


  1. Condor and the NGS
     John Kewley
     NGS Support Centre Manager

  2. Outline
     • What is Condor?
     • What is High Throughput Computing?
     • Condor and the NGS
     NGS Innovation Forum, Manchester

  3. What is Condor?
     • A job submission framework that uses the spare computing power within a heterogeneous computer network
     • Desktop PCs, Linux workstations, servers, clusters and teaching lab resources can all be included in the Condor pool
     • It supports High Throughput Computing (HTC), maximising the amount of processing capacity that is used over long periods of time
     • Developed over the past 20 years at the University of Wisconsin-Madison

  4. Terminology
     HPC (High Performance Computing)
     • Large amounts of [simultaneous] computing power for short periods of time
     HTC (High Throughput Computing)
     • Large amounts of computing over longer periods, not necessarily all at once
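To make the HPC/HTC distinction concrete, the short Python sketch below compares total core-hours delivered in each style. The core counts and durations are purely hypothetical figures chosen for illustration, not measurements from any NGS or Condor resource.

```python
# Illustrative only: every figure below is hypothetical, not an NGS measurement.

def core_hours(cores: int, hours: float) -> float:
    """Total processing delivered: number of cores multiplied by wall-clock hours."""
    return cores * hours


# HPC style: many processors used simultaneously, for a short period.
hpc = core_hours(cores=256, hours=4)

# HTC style: a modest number of otherwise idle desktops,
# harvested overnight (14 hours a night) for 30 nights.
htc = core_hours(cores=50, hours=14 * 30)

print(f"HPC burst: {hpc:.0f} core-hours")
print(f"HTC over a month: {htc:.0f} core-hours")
```

The HTC total is larger even though no more than 50 cores are ever in use at once; what matters is sustained use over a long period.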

  5. Useful Features
     • Automatic resubmission when jobs fail
     • Ability to cluster groups of jobs
     • Checkpointing / migration
     • DAGMan: Directed Acyclic Graph / workflow manager
     • Integration with Grid resources, especially through Condor-G
     • Staging and retrieval of data
     • Glide-in: dynamically add Grid worker nodes to your Condor pool
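The idea behind a DAG workflow manager such as DAGMan, releasing a job only once all of its parent jobs have finished, can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The job names and dependencies are hypothetical, and this is not how DAGMan itself is implemented; DAGMan reads a DAG input file that declares `JOB`s and their `PARENT ... CHILD` relationships.

```python
# Sketch of DAG-ordered job release; job names are hypothetical.
from graphlib import TopologicalSorter

# Each key depends on (is a child of) every job in its set.
workflow = {
    "simulate_1": {"prepare"},
    "simulate_2": {"prepare"},
    "simulate_3": {"prepare"},
    "analyse": {"simulate_1", "simulate_2", "simulate_3"},
}


def run_in_waves(graph):
    """Return batches of jobs; all jobs in one batch could run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every parent of these has finished
        waves.append(ready)
        # In a real pool these jobs would be submitted and run; here we simply
        # mark them as finished so that their children become ready.
        sorter.done(*ready)
    return waves


waves = run_in_waves(workflow)
for i, wave in enumerate(waves):
    print(i, wave)
```

The three `simulate_*` jobs land in the same wave because none depends on another, which is exactly the concurrency a workflow manager can exploit.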

  6. Various job types
     Parameter Studies, OpenMP, Master-worker, Parallel Parameter Search, Serial, Embarrassingly Parallel, Sequential, Parameter Sweep, Monte Carlo, MPI, PVM
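A parameter sweep maps naturally onto Condor's `queue` statement: one submit description queues N copies of a job, and each copy receives its own `$(Process)` number (0 to N-1). The Python sketch below builds such a description and shows how a job could turn its process number into one point of a parameter grid. The executable name `sweep_job`, the parameter names and their values are all hypothetical.

```python
# Hypothetical parameter sweep: one Condor job per point in a parameter grid.
from itertools import product

temperatures = [300, 350, 400]  # hypothetical parameter values
pressures = [1.0, 2.0]
# Every combination of the parameters: 3 x 2 = 6 independent jobs.
grid = list(product(temperatures, pressures))


def params_for_process(process_id: int):
    """Each job looks up its own parameter set from its process number."""
    return grid[process_id]


def submit_description(n_jobs: int) -> str:
    """Build a basic vanilla-universe submit description that queues n_jobs
    copies of the same program, differing only in $(Process)."""
    return "\n".join([
        "universe   = vanilla",
        "executable = sweep_job",            # hypothetical executable name
        "arguments  = $(Process)",           # 0, 1, ..., n_jobs - 1
        "output     = sweep.$(Process).out",
        "error      = sweep.$(Process).err",
        "log        = sweep.log",
        f"queue {n_jobs}",
    ])


print(submit_description(len(grid)))
print(params_for_process(4))
```

Keeping the grid in the job rather than in the submit file means the submit description stays the same size however many points the sweep has.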

  7. Terminology
     Parallel
     • Tightly-coupled processes
     • Need synchronisation
     • Information sharing
     • Message passing
     • Shared memory
     • If one process fails, the whole job fails
     • Single large homogeneous resource
     • Processors used simultaneously
     Independent
     • Unordered (so not serial/sequential)
     • Nothing embarrassing about it
     • No communication once the job starts
     • Might not need all results
     • Could run on different machines with different operating systems
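The "Independent" column can be illustrated with a Monte Carlo estimate of pi split into tasks that share nothing once started. This is a generic Python sketch, not code from the talk: the task count, sample size and seeds are arbitrary choices, and in a Condor pool each `task` call would be a separate job.

```python
# Minimal sketch of an independent workload: Monte Carlo estimation of pi.
import random


def task(seed: int, samples: int) -> int:
    """One independent job: count random points inside the unit quarter-circle."""
    rng = random.Random(seed)  # own seed, so tasks never need to communicate
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def combine(results, samples: int) -> float:
    """Combine whichever results came back; missing tasks only cost precision."""
    return 4.0 * sum(results) / (len(results) * samples)


SAMPLES = 20_000
results = [task(seed, SAMPLES) for seed in range(10)]
estimate_all = combine(results, SAMPLES)
estimate_partial = combine(results[:7], SAMPLES)  # as if 3 jobs never returned
print(estimate_all, estimate_partial)
```

Losing three of the ten tasks still yields a usable estimate, which is what "might not need all results" means in practice.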

  8. Condor on the NGS
     • Cardiff: in addition to their SGI cluster, ~200 of the 1000 Windows XP machines in their Condor pool are available to the NGS
     • Bristol: ~50 Windows XP machines in a Condor pool fronted by a Linux server
     • Reading: ~400 Linux machines (CoLinux under Windows XP)

  9. Condor with the NGS
     • Cardiff: in addition to their SGI cluster, ~200 of the 1000 Windows XP machines in their Condor pool are available to the NGS
     • Bristol: ~50 Windows XP machines in a Condor pool fronted by a Linux server
     • Reading: ~400 Linux machines (CoLinux under Windows XP)

  10. University of Manchester, Research Computing Services
     • 100 cores (plus an additional 400 in a second pool)
     • Condor used as backfill for the SGE queues
     • IP tunnelling used to connect Condor to the NW-Grid backend nodes (rather than the Generic Connection Broker, GCB, provided with Condor)
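A minimal, hypothetical Python sketch of what "backfill" means: Condor jobs run only on nodes the SGE queues are not using, and are evicted and requeued when SGE claims a node. Node and job names are invented, and this does not describe Manchester's actual setup; in a real pool such behaviour is set through Condor's policy configuration rather than code like this.

```python
# Hypothetical backfill sketch: SGE has priority, Condor uses leftover nodes.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    sge_busy: bool = False            # running a job from the primary SGE queues
    condor_job: Optional[str] = None  # backfill job currently using the node


def backfill(nodes, condor_queue):
    """Place waiting Condor jobs on nodes that SGE is not using."""
    for node in nodes:
        if not node.sge_busy and node.condor_job is None and condor_queue:
            node.condor_job = condor_queue.pop(0)


def sge_claims(node, condor_queue):
    """SGE takes priority: evict any backfill job and put it back in the queue."""
    if node.condor_job is not None:
        condor_queue.insert(0, node.condor_job)  # will run elsewhere later
        node.condor_job = None
    node.sge_busy = True


nodes = [Node("n1", sge_busy=True), Node("n2"), Node("n3")]
queue = ["c1", "c2", "c3"]
backfill(nodes, queue)       # c1 -> n2, c2 -> n3; c3 keeps waiting
sge_claims(nodes[1], queue)  # SGE needs n2, so c1 is evicted and requeued
print([(n.name, n.sge_busy, n.condor_job) for n in nodes], queue)
```

Backfill suits independent jobs well: an evicted job loses only its own progress (or none, if checkpointed) and no other job is affected.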

  11. OxGrid: Overview
     [Diagram of OxGrid components: departments/colleges; Oxford e-Research Centre; storage (SRB); BDII, VOMS, SSO, CA...; resource broker/login (Condor); Condor pools; departmental clusters; other universities/institutions; National Grid Service resource; Microsoft cluster; National Grid Service cluster; supercomputing centre]

  12. [Diagram of a Condor-G based grid: user login; Condor-G portal; MyProxy server; Condor-G central manager; Condor-G submit host; CSD AMD cluster (ulgbc1); CSD-Physics cluster (ulgbc2); NW-GRID cluster (ulgbc3); NW-GRID/POL cluster (ulgp4); Condor ClassAds; Globus file staging]
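The Condor ClassAds component in this diagram refers to Condor's matchmaking: jobs and machines each advertise attributes together with Requirements and Rank expressions, and the matchmaker pairs a job with a machine that satisfies both sides. The Python sketch below is a simplified, hypothetical imitation. Real ClassAds are their own expression language; here Requirements and Rank are Python callables, and the machine names and attribute values are made up.

```python
# Simplified imitation of ClassAd matchmaking; all values are hypothetical.

machines = [
    {"Name": "desk-01", "OpSys": "WINDOWS", "Memory": 1024},  # Memory in MB
    {"Name": "lab-17", "OpSys": "LINUX", "Memory": 2048},
    {"Name": "node-3", "OpSys": "LINUX", "Memory": 4096},
]

job = {
    # The job accepts only Linux machines with at least 2 GB of memory...
    "Requirements": lambda m: m["OpSys"] == "LINUX" and m["Memory"] >= 2048,
    # ...and among those prefers the machine with the most memory.
    "Rank": lambda m: m["Memory"],
    "RequestMemory": 1500,  # MB
}


def machine_accepts(machine, job):
    """Machines have requirements too; here the job must fit in memory."""
    return job["RequestMemory"] <= machine["Memory"]


def match(job, machines):
    """Return the best machine satisfying both sides' requirements, or None."""
    candidates = [m for m in machines
                  if job["Requirements"](m) and machine_accepts(m, job)]
    if not candidates:
        return None
    return max(candidates, key=job["Rank"])


best = match(job, machines)
print(best["Name"] if best else "no match")
```

Because requirements are two-sided, the same mechanism lets a job ask for a Windows machine simply by changing its Requirements.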

  13. Novel Architecture?!
     • Condor itself is not that new
     • Some NGS users request Windows resources, but most NGS nodes to date use PBS, LSF or SGE on Linux
     • Condor can provide access to Windows resources

  14. Windows on the NGS
     Many users are looking for Windows resources on which to run their computations.
     • Cardiff: Windows XP on Condor
     • Bristol: Windows XP on Condor
     • Southampton: 100 processors running Windows Compute Cluster Server

  15. Other work
     • Jean-Alain Grunchec of the University of Edinburgh is trying Condor Glidein to add NGS resources to his Condor pool
     • The e-Minerals project used a Condor submission mechanism to submit jobs both to local Condor pools and to Grid resources such as the NGS and NW-Grid
       http://www3.interscience.wiley.com/journal/117909340/abstract?CRETRY=1

  16. Summary
     • Condor can be part of the NGS
     • Condor can be used with the NGS
     • Condor is being combined with the NGS in many campus grids
     • Condor enables the incorporation of Windows into the NGS

  17. Acknowledgements
     • Some slides are based on material from the Condor team at the University of Wisconsin-Madison
     • Slides describing the UK universities' Condor work are based on ones provided by those universities
