
Scheduling of parallel jobs in a heterogeneous grid environment




Problem Addressed

  • Scheduling of parallel jobs in a heterogeneous grid environment

    • Each site has a homogeneous cluster of processors, but processors at different sites have different speeds

    • Much of the research on scheduling in heterogeneous systems has focussed on independent sequential jobs

  • Research on parallel job scheduling has concentrated primarily on the homogeneous context

  • The algorithms used for scheduling sequential tasks on heterogeneous systems are too computationally complex to extend to parallel jobs

  • We extend the techniques used for parallel job scheduling in a homogeneous context to the heterogeneous context

Simulation Environment

  • Heterogeneous sites, with a homogeneous cluster of processors at each site

  • A 5000-job subset of either the trace from the 430-node Cornell Theory Center (CTC) system or the trace from the 128-node IBM SP2 system at the San Diego Supercomputer Center (SDSC)

  • NAS Parallel Benchmarks 2.0 used to model the heterogeneous runtimes
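One way to read the benchmark figures reported for this environment is as relative speed factors per machine. The sketch below uses the runtimes from the poster's benchmark table and normalizes each row by its best runtime. The normalization and the names `relative_factors` and `best_machine` are assumptions made for illustration; the poster does not say how benchmark runtimes were mapped onto trace jobs.

```python
# Runtimes copied from the poster's benchmark table (units not stated there).
MACHINES = ["SGI Origin 2000", "IBM SP (WN/66)", "Cray T3E 900", "IBM SP (P2SC 160 MHz)"]

RUNTIMES = {
    "IS Class B (8 nodes)": [23.3, 22.6, 16.3, 17.7],
    "MG Class B (8 nodes)": [35.5, 34.3, 25.3, 17.2],
    "MG Class B (256 nodes)": [1.3147, 2.2724, 1.8, 1.1],
    "LU Class B (256 nodes)": [20.328, 94.893, 35.6, 24.2],
}


def relative_factors(runtimes):
    """Divide each runtime by the row's best runtime (assumed normalization)."""
    best = min(runtimes)
    return [t / best for t in runtimes]


def best_machine(benchmark):
    """Return the machine with the lowest runtime for the given benchmark row."""
    times = RUNTIMES[benchmark]
    return MACHINES[times.index(min(times))]


if __name__ == "__main__":
    for name, times in RUNTIMES.items():
        factors = ", ".join(f"{f:.2f}" for f in relative_factors(times))
        print(f"{name}: best on {best_machine(name)}; factors {factors}")
```

No single machine is best for every row, which is why a job's runtime has to be modeled per site rather than with one global speed ranking.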

Benchmark                 SGI Origin 2000   IBM SP (WN/66)   Cray T3E 900   IBM SP (P2SC 160 MHz)
IS Class B (8 Nodes)      23.3              22.6             16.3*          17.7
MG Class B (8 Nodes)      35.5              34.3             25.3           17.2*
MG Class B (256 Nodes)    1.3147            2.2724           1.8            1.1*
LU Class B (256 Nodes)    20.328*           94.893           35.6           24.2

* Denotes best runtime for the job

Conservative vs. Aggressive

Restricted Multi-Site Reservations

  • Jobs are processed in arrival order by the meta-scheduler

  • Greedy assigns each job to the site with the lowest instantaneous load

  • Greedy-MR (Multiple Requests) submits each job to all sites

  • When the job starts at a site, the other instances are removed

  • We have shown this mechanism to be effective in a homogeneous context (HPDC ’02)

  • However, only a slight improvement is seen in a heterogeneous context
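The two meta-scheduling policies above can be sketched as follows. This is a minimal illustration, not the simulator used for the poster: the `Site` class, its `load` field, and the function names are hypothetical, and each site's local queue is reduced to a set of job identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    load: float                                # instantaneous load (hypothetical measure)
    queue: set = field(default_factory=set)    # job ids queued at this site


def greedy_submit(job_id, sites):
    """Greedy: send the job to the single site with the lowest instantaneous load."""
    target = min(sites, key=lambda s: s.load)
    target.queue.add(job_id)
    return target


def greedy_mr_submit(job_id, sites):
    """Greedy-MR: place an instance of the job in every site's queue."""
    for site in sites:
        site.queue.add(job_id)


def on_job_start(job_id, started_at, sites):
    """When one instance starts, remove the other instances from all other sites."""
    for site in sites:
        if site is not started_at:
            site.queue.discard(job_id)


if __name__ == "__main__":
    sites = [Site("A", 0.9), Site("B", 0.4), Site("C", 0.7)]
    print(greedy_submit("job-1", sites).name)
    greedy_mr_submit("job-2", sites)
    on_job_start("job-2", sites[2], sites)
    print([(s.name, sorted(s.queue)) for s in sites])
```

Removing the replicated instances is what opens gaps in the other sites' schedules, so a real implementation would also have to release any reservations held for those instances.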

  • When network bandwidth is limited, jobs can be submitted to a smaller number of sites; even with fewer reservations, the multi-reservation scheduler can still realize a substantial fraction of the benefit of scheduling each job at all sites

  • Use a more accurate criterion to select the sites to which a job is submitted: instead of using the instantaneous load, we query each site to determine the job's earliest completion time

  • When using completion time as the criterion, submitting to fewer sites can be almost as effective as submitting to all sites
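Restricting submission to a few sites chosen by completion time can be sketched as below. The estimates dictionary stands in for the result of querying each site; the names `estimated_completion` and `select_sites`, and the idea of combining an earliest start time with a site-specific runtime, are assumptions for illustration only.

```python
def estimated_completion(earliest_start, runtime_at_site):
    """Completion time = earliest start at the site + the job's runtime on that site."""
    return earliest_start + runtime_at_site


def select_sites(estimates, k):
    """Return the k site names with the earliest estimated completion time.

    `estimates` maps site name -> (earliest_start, runtime_at_site).
    """
    completion = {
        site: estimated_completion(start, runtime)
        for site, (start, runtime) in estimates.items()
    }
    return sorted(completion, key=completion.get)[:k]


if __name__ == "__main__":
    # Hypothetical answers from four sites for one job.
    estimates = {
        "A": (0.0, 120.0),    # starts immediately but runs slowly
        "B": (30.0, 65.0),    # starts later but finishes first
        "C": (50.0, 90.0),
        "D": (10.0, 90.0),
    }
    print(select_sites(estimates, k=2))
```

In this made-up example site A offers the earliest start but the latest-but-one completion, so ranking by completion time picks B and D instead.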

Conclusions and Future Work

  • Improvements in turnaround time and effective utilization for parallel job scheduling in a heterogeneous grid environment have been demonstrated through simulation

  • Next Steps:

    • Incorporate these changes into the Silver/Maui Scheduler

    • Deploy and evaluate the scheduler first on our research clusters, and then at the Ohio Supercomputer Center

Efficacy Based Queues

  • Explicitly take into account efficacy to improve the effective utilization

  • Use efficacy as the priority order for the jobs in the reserved and idle queue

  • Starvation free

  • Effective utilization increases and turnaround time decreases, in spite of the decrease in raw utilization
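Ordering a queue by efficacy might look like the sketch below. The poster does not define efficacy, so the definition used here (the job's best runtime over all sites divided by its runtime at this site) is an assumption, as are the `QueuedJob` fields and function names. The sketch also does not show how starvation is avoided.

```python
from typing import List, NamedTuple


class QueuedJob(NamedTuple):
    job_id: str
    arrival: float
    runtime_here: float   # expected runtime at this site
    runtime_best: float   # expected runtime at the job's fastest site


def efficacy(job: QueuedJob) -> float:
    """Assumed definition: 1.0 means this site is the job's fastest site."""
    return job.runtime_best / job.runtime_here


def order_by_efficacy(queue: List[QueuedJob]) -> List[QueuedJob]:
    """Highest efficacy first; earlier arrival breaks ties."""
    return sorted(queue, key=lambda j: (-efficacy(j), j.arrival))


if __name__ == "__main__":
    queue = [
        QueuedJob("a", 0.0, 20.0, 10.0),   # efficacy 0.50
        QueuedJob("b", 1.0, 10.0, 10.0),   # efficacy 1.00
        QueuedJob("c", 2.0, 40.0, 30.0),   # efficacy 0.75
    ]
    print([j.job_id for j in order_by_efficacy(queue)])
```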

A Characterization of Approaches to Parallel Job Scheduling

Gerald Sabin, Rajkumar Kettimuthu, Arun Rajan, P Sadayappan

Supported in part by Sandia National Laboratory

Metrics

  • We use the following metrics for evaluating the proposed schemes

    • Average Slowdown

    • Average Turnaround Time

    • Utilization

    • Effective Utilization
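The first three metrics can be computed from a finished schedule as sketched below, using their conventional definitions: turnaround time is completion minus arrival, slowdown is turnaround divided by runtime, and utilization is busy processor-time over available processor-time. These definitions and the `JobRecord` fields are assumptions, since the poster lists the metrics without defining them. Effective utilization is left out because its definition is not given here.

```python
from typing import List, NamedTuple


class JobRecord(NamedTuple):
    arrival: float
    start: float
    end: float
    procs: int


def average_turnaround(jobs: List[JobRecord]) -> float:
    """Mean of (completion time - arrival time)."""
    return sum(j.end - j.arrival for j in jobs) / len(jobs)


def average_slowdown(jobs: List[JobRecord]) -> float:
    """Mean of turnaround time divided by runtime."""
    return sum((j.end - j.arrival) / (j.end - j.start) for j in jobs) / len(jobs)


def utilization(jobs: List[JobRecord], total_procs: int) -> float:
    """Busy processor-time over available processor-time for the whole schedule."""
    span = max(j.end for j in jobs) - min(j.arrival for j in jobs)
    busy = sum(j.procs * (j.end - j.start) for j in jobs)
    return busy / (total_procs * span)


if __name__ == "__main__":
    jobs = [JobRecord(0.0, 0.0, 10.0, 4), JobRecord(0.0, 10.0, 20.0, 8)]
    print(average_turnaround(jobs), average_slowdown(jobs), utilization(jobs, 8))
```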

Backfilling

  • Backfilling

    • A later-arriving job is allowed to leapfrog previously queued jobs

Aggressive vs. Conservative

[Figure: three panels with axes Processors and Time]


  • In a heterogeneous context, the site where the job starts the earliest may not be the best site

  • To obtain the completion time of a job at a particular site, conservative backfilling has to be employed at the local site

  • Conservative backfilling performs better than aggressive backfilling in all cases, quite the opposite of the homogeneous context

  • Backfilling improves because the dynamic removal of replicated jobs creates holes in each site's schedule, and because each site has more jobs to attempt to backfill

  • Conservative

    • Every job is given a reservation when it enters the system and a job is allowed to backfill only if it does not violate any of the previous reservations.

  • EASY

    • Only the job at the head of the queue is given a reservation and a job is allowed to backfill if it does not violate this reservation
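The difference between the two rules can be sketched as a single admission check that differs only in which reservations it protects. This is a simplified model under stated assumptions: processor usage is tracked as a list of intervals, reservations are listed in queue order, and the names `Interval`, `fits`, and `can_backfill` are hypothetical.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    start: float
    end: float    # exclusive
    procs: int


def fits(total_procs: int, intervals: List[Interval],
         start: float, runtime: float, procs: int) -> bool:
    """True if `procs` processors are free during [start, start + runtime)."""
    end = start + runtime
    # Usage only rises at interval starts, so checking those points is enough.
    points = [start] + [iv.start for iv in intervals if start < iv.start < end]
    for t in points:
        used = sum(iv.procs for iv in intervals if iv.start <= t < iv.end)
        if used + procs > total_procs:
            return False
    return True


def can_backfill(total_procs: int, running: List[Interval],
                 reservations: List[Interval], now: float,
                 runtime: float, procs: int, policy: str) -> bool:
    """Check whether a job may start now without violating protected reservations."""
    if policy == "conservative":
        protected = reservations          # every queued job holds a reservation
    elif policy == "easy":
        protected = reservations[:1]      # only the head of the queue is protected
    else:
        raise ValueError(f"unknown policy: {policy}")
    return fits(total_procs, list(running) + list(protected), now, runtime, procs)


if __name__ == "__main__":
    running = [Interval(0, 10, 4)]
    reservations = [Interval(10, 20, 6), Interval(20, 30, 7)]   # queue order
    for policy in ("easy", "conservative"):
        print(policy, can_backfill(8, running, reservations, 0, 25, 2, policy))
```

In this example a 2-processor job that runs for 25 time units is admitted under EASY, because it leaves the head job's 6 processors free, but rejected under conservative backfilling, because it would delay the second queued job's reservation at time 20.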

