
Opportune Job Shredding: An Efficient Approach for Scheduling Parameter Sweep Applications



Presentation Transcript


  1. Opportune Job Shredding: An Efficient Approach for Scheduling Parameter Sweep Applications • Rohan Kurian, Pavan Balaji, P. Sadayappan • The Ohio State University

  2. Parameter Sweep Applications • An important class of applications • Set of independent tasks • MCell Application • 3D simulations for sub-cellular architecture/physiology • GTOMO (Parallel Tomography) Application • Multiple view-point simulation • Systems exist for scheduling on the Grid • Cluster-based Scheduling?

  3. Application Level Schedulers • Manage the scheduling of applications • Break the application to appropriate chunks • APST (AppLeS Parameter Sweep Template) • NIMROD • Greedy approach to schedule PSA chunks

  4. Presentation Roadmap • Job Scheduling in Clusters • Multi-Site Job Scheduling • PSA Scheduling Strategies • Multi-Site Scheduling of PSAs • Performance Evaluation • Conclusions

  5. Job Scheduling in Clusters • Mapping arriving jobs to available resources • Multiple Schemes for Scheduling • First Come First Serve (FCFS) • Conservative Scheduling • Aggressive or EASY Scheduling • Fair-Share Constraints • A user cannot have more than ‘N’ queued jobs • Submitting the multiple chunks of a PSA job separately • Violates Fair-Share constraints • Combine chunks to form a single parallel job

  6. Formation of PSAs in Clusters • [Figure: Small Independent Tasks combined into a Parallel Parameter Sweep Application]

  7. Presentation Roadmap • Job Scheduling in Clusters • Multi-Site Job Scheduling • PSA Scheduling Strategies • Multi-Site Scheduling of PSAs • Performance Evaluation • Conclusions

  8. Multi-Site Job Scheduling • Multiple Simultaneous Requests • Job submitted to multiple sites • Started on the earliest cluster • Existing schemes have limitations • Heterogeneous Clusters • Different Scheduling Schemes
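The "Multiple Simultaneous Requests" idea on this slide can be sketched as follows: the same job is submitted to every site, it runs on whichever site can start it first, and the remaining requests are withdrawn. This is only an illustrative sketch; the function name `choose_site` and the dictionary of estimated start times are assumptions, not part of the presentation.

```python
def choose_site(start_times):
    """Pick the site that can start the job earliest.

    start_times: dict mapping a site name to the time at which that site
    could start the job (hypothetical input; the slides do not say how
    the start time is obtained).
    Returns (chosen_site, sites_whose_requests_are_cancelled).
    """
    chosen = min(start_times, key=start_times.get)
    # Every other simultaneous request is withdrawn once the job starts.
    cancelled = sorted(site for site in start_times if site != chosen)
    return chosen, cancelled


if __name__ == "__main__":
    site, cancelled = choose_site({"site1": 30.0, "site2": 12.0, "site3": 45.0})
    print(site, cancelled)
```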

  9. Multiple-simultaneous-requests • [Figure: Site 1, Site 2, and Site 3, each with incoming Jobs, a Meta Scheduler, and a Local Scheduler]

  10. Presentation Roadmap • Job Scheduling in Clusters • Multi-Site Job Scheduling • PSA Scheduling Strategies • Multi-Site Scheduling of PSAs • Performance Evaluation • Conclusions

  11. PSA Scheduling Strategies • Flooding based Job Shredding • Submit all chunks in the PSA at once • Greedy approach • Improves User and System metrics • Doesn’t ensure fairness to Non-PSA jobs • Opportune Job Shredding • Uses an additional Application-Level Scheduler • Monitors the current schedule of the system • If no normal backfill is possible • Allow PSA jobs to shred and backfill
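The Opportune Job Shredding rule above (shred PSA work only when no normal backfill is possible) might look roughly like the sketch below. It assumes a scheduling "hole" is described by a number of idle processors and a time window before the next reservation, and that a job fits if it needs no more processors and no more time than the hole offers. The names `Job` and `fill_hole` are hypothetical, and the fit test is a simplification of real EASY backfilling.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    procs: int      # processors requested
    runtime: float  # estimated runtime


def fits(job, idle_procs, window):
    # Simplified fit test (assumption): the job must finish inside the hole.
    return job.procs <= idle_procs and job.runtime <= window


def fill_hole(idle_procs, window, queued_jobs, psa_tasks):
    """Return the jobs to start in the hole.

    Normal (Non-PSA) jobs get the first chance to backfill. Only if none
    of them fits are individual PSA tasks shredded off and packed in.
    """
    for job in queued_jobs:
        if fits(job, idle_procs, window):
            return [job]  # normal backfill is possible, so PSAs stay intact

    shredded = []
    free = idle_procs
    for task in psa_tasks:
        if fits(task, free, window):
            shredded.append(task)
            free -= task.procs
    return shredded
```

For example, with 2 idle processors for 20 time units, a queued 8-processor job cannot backfill, so two short single-processor PSA tasks would be started instead.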

  12. Presentation Roadmap • Job Scheduling in Clusters • Multi-Site Job Scheduling • PSA Scheduling Strategies • Multi-Site Scheduling of PSAs • Performance Evaluation • Conclusions

  13. Multi-Site Scheduling for PSAs • Two-level Application Level Schedulers • No constraints on sites • Allowed to have different speeds • Allowed to have different scheduling policies • Similar to “Multiple Simultaneous Requests” • Simultaneous requests only for PSAs

  14. Multi-Site Scheduling for PSAs • [Figure: a Meta Application-Level Scheduler above Site 1, Site 2, and Site 3; each site has an App-Level Scheduler, a Job Queue, and a Local Scheduler]

  15. Presentation Roadmap • Job Scheduling in Clusters • Multi-Site Job Scheduling • PSA Scheduling Strategies • Multi-Site Scheduling of PSAs • Performance Evaluation • Conclusions

  16. Performance Metrics • Response Time • Completion Time – Submit Time • Slowdown • Response Time / Runtime • Loss of Capacity (LOC) • Lost processors in a state = min {processors requested by waiting jobs, idle processors} • T = Time for which this state lasts • LOC contribution of the state = lost processors x T
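The three metrics on this slide can be written out as a minimal sketch. The `Job` record and the `(duration, waiting_procs, idle_procs)` interval format are assumptions made for illustration, and the Loss of Capacity here is the raw time-weighted sum over system states, without any normalization.

```python
from dataclasses import dataclass


@dataclass
class Job:
    submit: float   # time the job entered the queue
    start: float    # time the job began running
    runtime: float  # time the job spent running

    @property
    def completion(self) -> float:
        return self.start + self.runtime


def response_time(job: Job) -> float:
    # Response Time = Completion Time - Submit Time
    return job.completion - job.submit


def slowdown(job: Job) -> float:
    # Slowdown = Response Time / Runtime
    return response_time(job) / job.runtime


def loss_of_capacity(intervals) -> float:
    """Sum min(waiting procs, idle procs) x T over all system states.

    intervals: iterable of (duration, waiting_procs, idle_procs) tuples,
    one per period during which the system state does not change.
    """
    return sum(min(waiting, idle) * duration
               for duration, waiting, idle in intervals)
```

A job submitted at time 0 that starts at time 10 and runs for 5 has a response time of 15 and a slowdown of 3.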

  17. Evaluation Scheme • Simulation based Approach • CTC trace from Feitelson’s archive • EASY backfilling used • For multi-site evaluation • CTC traces from 3 different months • Processing speeds in the ratio 2:1:3

  18. Flooding Based Job Shredding • Up to 60% improvement for PSA Jobs • Up to 90% worse performance for Non-PSA Jobs

  19. Flooding: Job Category wise breakup • Narrow Short Non-PSA jobs suffer most • Loss of back-filling opportunities is the main reason

  20. Flooding: Loss of Capacity • Up to 75% improvement in the Loss of Capacity

  21. Opportune Job Shredding • Up to 70% improvement for PSA Jobs • Less than 2% worsening in performance for Non-PSA Jobs

  22. Opportune: Job Category wise breakup • No category of Non-PSA jobs suffers more than 7%

  23. Opportune: Loss of Capacity • Up to 12% improvement in the Loss of Capacity

  24. Opportune (Multi-Site) • Up to 95% improvement for PSA Jobs • No significant loss of performance for Non-PSA jobs

  25. Opportune (Multi-Site): Response Time • Up to 75% improvement for PSA Jobs • No significant loss of performance for Non-PSA jobs

  26. Opportune (Multi-Site): Slowdown • Up to 95% improvement for PSA Jobs • No significant loss of performance for Non-PSA jobs

  27. Opportune (Multi-Site): Loss of Capacity • Up to 45% improvement in the Loss of Capacity

  28. Concluding Remarks • Opportune Job Shredding • Efficient Scheduling of PSAs • Single Site and Multi-Site versions • Significant improvement for PSA jobs • Ensures that Non-PSA jobs are not affected • Plan to integrate this with production schedulers

  29. Thank You!
