
Batch Scheduling at LeSC with Sun Grid Engine


Presentation Transcript


  1. Batch Scheduling at LeSC with Sun Grid Engine
  David McBride <dwm@doc.ic.ac.uk>, Systems Programmer, London e-Science Centre, Department of Computing, Imperial College

  2. Overview
  • End-user requirements
  • Brief description of compute hardware
  • Sun Grid Engine software deployment
  • Tweaks to the default SGE configuration
  • Future changes
  • References for more information, and questions

  3. End-User Requirements
  • We have many different users: high-energy physicists, bioinformaticians, chemists, parallel software researchers.
  • Jobs are many and varied:
  • Some users run relatively few long-running tasks; others submit large batches of shorter jobs.
  • Some require several cluster nodes to be co-allocated at runtime (16, 32+ MPI hosts); others simply use a single machine.
  • Some require lots of RAM (1, 2, 4, 8 GB+ per machine). Example resource requests are sketched below.
  • In general, users are fairly happy so long as they get a reasonable response time.
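  As a rough illustration of how these workloads translate into SGE submissions, the sketch below shows two hypothetical requests; the parallel environment name (mpi) and the script names are placeholders, not the actual LeSC configuration:

      # Single-node job that needs a machine with at least 4 GB of free memory
      qsub -l mem_free=4G single_node_job.sh

      # 16-way MPI job: request 16 slots from an MPI parallel environment
      qsub -pe mpi 16 mpi_job.sh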

  4. Hardware
  • Saturn: 24-way 750 MHz UltraSPARC III Sun E6800
  • 36 GB RAM, ~20 TB online RAID storage,
  • 24 TB tape library to support long-term offline backups.
  • Running Solaris 8.
  • Viking cluster: 260-node dual P4 Xeon 2 GHz+
  • 128 machines with Fast Ethernet; 2x64 machines also with Myrinet.
  • 2 front-end nodes and 2 development nodes.
  • Running Red Hat Linux 7.2 (plus local additions and updates).
  • Mars cluster: 204-node dual AMD Opteron 1.8 GHz+
  • 128 machines with Gigabit Ethernet; 72 machines also with InfiniBand.
  • Running Red Hat Enterprise Linux 3 (plus local refinements).
  • 4 front-end interactive nodes.

  5. Sun Grid Engine Deployment
  • Two separate logical SGE installations (cells).
  • Saturn acts as the master node for both cells.
  • However, Viking is running SGE 5.3 and Mars is running SGE 6.0.
  • Mars is still ‘in beta’; Viking is still providing the main production service.
  • When Mars’s configuration is finalized, end-users will be migrated to Mars; Viking will then be reinstalled with the new configuration.
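  A minimal sketch of how a user or script might target one cell or the other at submission time; SGE picks up the cell name from the SGE_CELL environment variable. The cell names viking and mars are assumptions here, not necessarily the names used at LeSC:

      # Submit into the Viking cell (SGE 5.3)
      export SGE_CELL=viking
      qsub job.sh

      # Submit into the Mars cell (SGE 6.0)
      export SGE_CELL=mars
      qsub job.sh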

  6. Changes to Default Configuration
  • Issue 1:
  • If all the available worker nodes are running long-lived jobs, then a new short-lived job added to the queue will not execute until one of the long-lived jobs has completed. (SGE does not provide a job checkpoint-and-preempt facility.)
  • Resolution: a subset of nodes is configured to only run short-lived jobs (see the sketch below).
  • This trades slightly reduced cluster utilization for a shorter average-case response time for short-lived jobs.
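  One way to express such a policy is a queue that only spans the reserved nodes and enforces a short wall-clock limit. The sketch below uses SGE 6 cluster-queue syntax; the queue name, hostgroup name and one-hour limit are illustrative assumptions, not the LeSC settings:

      # Hypothetical queue reserved for short jobs (viewed with: qconf -sq short.q)
      qname      short.q
      hostlist   @short_nodes     # hostgroup containing the reserved subset of nodes
      h_rt       1:00:00          # hard wall-clock limit: jobs are killed after one hour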

  7. Tweaks to SGE Configuration
  • Issue 2:
  • If a job is submitted that requires the co-allocation of several cluster nodes simultaneously (e.g. for a 16-way MPI job), then that job can be starved by a larger number of single-node jobs.
  • Resolution (SGE 5.3): manually intervene to manipulate the queues so that the large 16-way job will be scheduled.
  • Resolution: upgrade to SGE 6, which uses a more advanced scheduling algorithm (advance reservation with backfill); see the sketch below.
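  Under SGE 6, a large parallel job can ask the scheduler to reserve slots as they free up, rather than being repeatedly overtaken by single-node jobs. A minimal sketch, assuming an MPI parallel environment named mpi and that reservation is enabled in the scheduler configuration:

      # Request 16 slots and ask the scheduler to reserve resources for this job
      qsub -pe mpi 16 -R y mpi_job.sh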

  8. Future Changes
  • Default requirements for jobs:
  • Different cluster nodes have different resources; e.g. some have more memory or faster processors than others.
  • Sometimes a low-requirement job will be allocated to one of these more capable machines unnecessarily, because the submitter has not specified the job’s requirements.
  • This can prevent a job which does have high requirements from being run as quickly.
  • We plan to change the SGE configuration so that, by default, a job only requests the resources of the least-capable node (see the sketch below).
  • This places the onus on the user to request extra resources if needed.
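  One possible mechanism is the cluster-wide default request file, whose contents SGE treats as extra submit options for every job. A rough sketch with an illustrative value, not the actual LeSC default:

      # $SGE_ROOT/$SGE_CELL/common/sge_request  (cluster-wide defaults, sketch only)
      # Every job asks only for what the least-capable node provides; users must
      # explicitly request more, e.g.:  qsub -l mem_free=4G big_job.sh
      -l mem_free=1G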

  9. Future Changes: LCG
  • We are participating in the LHC Computing Grid (LCG) as part of the London Tier-2.
  • This has been non-trivial; the standard LCG distribution only supports PBS-based clusters.
  • We’ve developed SGE-specific Globus JobManager and Information Reporter components for use with LCG.
  • We have also been working with the developers to address issues with running on 64-bit Linux distributions.
  • We are currently deploying front-end nodes (CE, SE, etc.) to expose Mars as an LCG compute site.
  • We are also joining the LCG Certification Testbed to provide an SGE-based test site, to help ensure future support.

  10. References
  • London e-Science Centre homepage:
  • http://www.lesc.ic.ac.uk/
  • SGE integration tools for Globus Toolkit 2, 3, 4 and LCG:
  • http://www.lesc.ic.ac.uk/projects/sgeindex.html

  11. Q&A
