Scheduling. John Easton. jkj@uk.ibm.com.


  1. Scheduling John Easton jkj@uk.ibm.com

  2. What is a scheduler? • “A means of employing one or more predictive models to evaluate the performance of an application in a system and use this information to assign tasks, communications and data to resources”


  4. What is a scheduler?
     • “A means of employing one or more predictive models to evaluate the performance of an application in a system and use this information to assign tasks, communications and data to resources”
     • Which is all pretty easy until we have to handle:
       • Serial, parallel and distributed applications
       • Simultaneous execution on shared resources
       • Enterprise-quality service levels and response times

  5. But let’s talk about locking first
     • Simple scheduling assumes no locking model
     • Most data access needs to be coordinated
     • If you get it wrong, then it doesn’t matter how good your scheduler is, it simply won’t perform
     • Which defeats the whole idea / purpose of doing it
     • So schedulers need to understand locking models
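
One way to see why a scheduler cannot ignore locking is to compare a lower bound on completion time with and without lock information. The sketch below is illustrative only: the function name `makespan_lower_bound` and the task format are hypothetical, and it assumes each task holds one exclusive lock for its whole duration.

```python
from collections import defaultdict

def makespan_lower_bound(tasks, workers):
    """Lower bound on completion time for `tasks` on `workers` identical workers.

    tasks: list of (duration, lock_name or None). Assumption: a task holds its
    exclusive lock for its entire duration, so tasks sharing a lock serialise.
    """
    total = sum(duration for duration, _ in tasks)
    per_lock = defaultdict(float)
    for duration, lock in tasks:
        if lock is not None:
            per_lock[lock] += duration
    longest = max((duration for duration, _ in tasks), default=0.0)
    serialised = max(per_lock.values(), default=0.0)
    # No schedule can beat perfect parallelism, the longest single task,
    # or the total time spent behind the most contended lock.
    return max(total / workers, longest, serialised)

independent = [(2.0, None)] * 4
contended = [(2.0, "db")] * 4
print(makespan_lower_bound(independent, workers=4))
print(makespan_lower_bound(contended, workers=4))
```

With four workers, the four independent tasks could in principle finish together, while the four tasks sharing the hypothetical `"db"` lock cannot finish faster than running one after another, however the scheduler places them.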

  6. Different sorts of scheduling
     • Job scheduling
       • Optimise the throughput of the system (measured by number of jobs executed)
     • Resource scheduling
       • Coordinate access to a resource by managing multiple requests for access whilst optimising utilisation
     • Application scheduling
       • Optimise (promote) the performance of a given application
     • These requirements conflict because each has a different view of what “performance” means

  7. So what does a scheduler need to do?
     • Select a set of resources on which to schedule the task(s) of the application
       • Resource discovery - Identify what resources are present
       • Resource location - Determine which resources are available for use
       • Resource selection - Select candidate resources
     • Assign application task(s) to compute resources
     • Distribute data or co-locate data and computation
       • The easy bit to do (really) badly
     • Order tasks on compute resources
     • Order communication between tasks
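
The discovery, location, selection, assignment and ordering steps can be sketched as a small pipeline. This is a minimal illustration, not a production scheduler: the `Resource` and `Task` types, the `min_speed` threshold and the greedy longest-task-first heuristic are all assumptions made for the example, and data placement and communication ordering are left out.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    available: bool   # hypothetical flag: usable right now?
    speed: float      # hypothetical relative speed (work units per second)

@dataclass
class Task:
    name: str
    work: float       # hypothetical amount of work in the task

def schedule(tasks, known_resources, min_speed=1.0):
    # Resource discovery: what is present (here, simply the list we were given).
    discovered = list(known_resources)
    # Resource location: which of those are available for use.
    located = [r for r in discovered if r.available]
    # Resource selection: keep candidates that meet a minimum speed.
    candidates = [r for r in located if r.speed >= min_speed]
    if not candidates:
        raise RuntimeError("no candidate resources")
    # Assignment and ordering: longest tasks first, each placed on the
    # resource that would finish it earliest (a simple greedy heuristic).
    finish = {r.name: 0.0 for r in candidates}
    plan = {r.name: [] for r in candidates}
    for task in sorted(tasks, key=lambda t: t.work, reverse=True):
        best = min(candidates, key=lambda r: finish[r.name] + task.work / r.speed)
        finish[best.name] += task.work / best.speed
        plan[best.name].append(task.name)
    return plan, finish

resources = [Resource("a", True, 2.0), Resource("b", False, 4.0), Resource("c", True, 1.0)]
tasks = [Task("t1", 4.0), Task("t2", 2.0), Task("t3", 2.0)]
plan, finish = schedule(tasks, resources)
print(plan)    # per-resource ordered task lists
print(finish)  # predicted finish time per resource
```

Resource `b` is the fastest but is filtered out at the location step because it is not available, which is the kind of detail a static view of the system would miss.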

  8. Scheduling model
     • Scheduling policy: sets of rules used to produce schedules
     • Program model: abstraction of the programs to be scheduled
     • Performance model: abstraction of the behaviour of the program in the system
     • Description of the performance activity to be optimised by the performance model
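
The three named parts of the scheduling model can be kept separate in code, which makes it easy to swap one without touching the others. The sketch below is an assumption-laden toy: the task sizes, resource speeds and the exhaustive-search policy are invented for illustration, and the quantity being optimised is taken to be the predicted completion time of the last resource.

```python
from itertools import product

# Program model: an abstraction of the program, here just per-task work (hypothetical).
program = {"t1": 4.0, "t2": 2.0, "t3": 2.0}

# System description used by the performance model (hypothetical speeds).
speeds = {"a": 2.0, "c": 1.0}

def predicted_makespan(assignment):
    """Performance model: predict when the last resource finishes."""
    busy = {resource: 0.0 for resource in speeds}
    for task, resource in assignment.items():
        busy[resource] += program[task] / speeds[resource]
    return max(busy.values())

def exhaustive_policy():
    """Scheduling policy: try every assignment, keep the best prediction."""
    tasks = list(program)
    best = None
    for choice in product(speeds, repeat=len(tasks)):
        candidate = dict(zip(tasks, choice))
        if best is None or predicted_makespan(candidate) < predicted_makespan(best):
            best = candidate
    return best

best = exhaustive_policy()
print(best, predicted_makespan(best))
```

Exhaustive search only works for tiny inputs; a real policy would replace `exhaustive_policy` with a heuristic while reusing the same program and performance models.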

  9. Grid scheduling needs
     • Produce performance predictions that are timeframe-specific
       • Since the performance of resources varies over time, predictions of performance must do likewise
     • Utilise dynamic information to represent performance variations
     • Adapt to a wide range of infrastructural factors
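
A timeframe-specific prediction can be as simple as weighting recent measurements more heavily than old ones. The sketch below uses exponential smoothing over availability samples; the function names, the `alpha` weight and the idea of measuring "fraction of the resource that is free" are assumptions for illustration, not something the slides specify.

```python
def forecast_availability(samples, alpha=0.5):
    """Exponentially smoothed estimate of the fraction of a resource that is free.

    samples: measurements in time order, oldest first, each in (0, 1].
    alpha: weight given to the newest sample (hypothetical tuning knob).
    """
    if not samples:
        raise ValueError("need at least one sample")
    estimate = samples[0]
    for sample in samples[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate

def predicted_runtime(work, peak_speed, samples, alpha=0.5):
    """Predicted runtime of `work` units given recent availability samples."""
    return work / (peak_speed * forecast_availability(samples, alpha))

# Same machine, same job, different recent history.
print(predicted_runtime(10.0, 2.0, [1.0]))            # 5.0
print(predicted_runtime(10.0, 2.0, [1.0, 0.5, 0.5]))  # 8.0
```

The prediction for the same job changes as new samples arrive, which is the dynamic behaviour a static cluster model does not capture.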

  10. Why not use an MPP or cluster scheduler for the grid?
      • MPP schedulers control all the resources
      • All resources lie in a single administrative domain
      • Resource pool is invariant
      • Impact of contention from other applications in the system is minimal
      • All compute and communications resources exhibit similar performance characteristics

  11. Unwritten truths
      • Efficient application performance and efficient system performance are NOT the same thing
      • It is not possible to obtain optimal performance for multiple applications simultaneously
      • Load balancing may not provide optimal application scheduling or system utilisation
      • You can’t create a performance-efficient schedule without modelling the system in detail
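
The load-balancing point can be checked with a few lines of arithmetic. The example below assumes "load balancing" means giving every machine the same number of tasks, and uses two hypothetical machines, one four times faster than the other, with ten equal-sized tasks.

```python
def makespan(task_counts, speeds, work_per_task=1.0):
    """Time at which the slowest-finishing machine completes its tasks."""
    return max(n * work_per_task / s for n, s in zip(task_counts, speeds))

speeds = [4.0, 1.0]                 # hypothetical: one fast machine, one slow

balanced = makespan([5, 5], speeds)  # equal task counts
weighted = makespan([8, 2], speeds)  # counts proportional to speed

print(balanced)  # 5.0
print(weighted)  # 2.0
```

Equal counts leave the fast machine idle while the slow one is still working, so the application finishes later even though the load looks balanced.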

  12. Comparison of cluster scheduling technologies - I

  13. Comparison of cluster scheduling technologies - II

  14. Comparison of cluster scheduling technologies - III

  15. The realities of the commercial world…
      • Very few of these technologies deliver the necessary functionality for them to play anything more than a niche role
        • For example, being used on a single-function cluster within a department
      • So what sort of problems do we need to address for commercial scheduling to be more of a reality?
        • Platform support
        • Guaranteed levels of service, LoB requirements, response time etc.
        • Security
        • More complex inter/intra company environments
        • Commercial ROIs, TCOs etc.

  16. Automate Cross-Enterprise Workloads in an On Demand Environment
      • Value Proposition: Maximizes infrastructure ROI by driving 90%+ cross-enterprise utilization against resource policy across a heterogeneous environment
      • [Diagram labels: Cross LOB batch workload submissions; LOB A; LOB B; CIO; ITWS-A: intelligently schedule and guarantee the completion of batch workload; LOB A / LOB B 70/30 Fair Share; Shared virtual pool of grid-enabled, heterogeneous IT resources including desktops, servers, supercomputers and mainframes]
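
A 70/30 fair share between two lines of business can be sketched as a weighted split that hands unused share to whoever still has demand. This is an illustrative assumption about how such a policy might behave, not a description of how ITWS implements it; the function name `fair_share` and the slot counts are hypothetical.

```python
def fair_share(capacity, shares, demand):
    """Split `capacity` slots by share weight, capped by each LOB's demand.

    Share left unused by one LOB is re-offered to the others in proportion
    to their weights.
    """
    alloc = {lob: 0.0 for lob in shares}
    active = {lob for lob in shares if demand.get(lob, 0) > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_weight = sum(shares[lob] for lob in active)
        satisfied = set()
        granted = 0.0
        for lob in active:
            offer = remaining * shares[lob] / total_weight
            take = min(offer, demand[lob] - alloc[lob])
            alloc[lob] += take
            granted += take
            if alloc[lob] >= demand[lob] - 1e-9:
                satisfied.add(lob)
        remaining -= granted
        if not satisfied:
            break          # everyone took their full offer; nothing to re-offer
        active -= satisfied
    return alloc

shares = {"LOB A": 70, "LOB B": 30}
print(fair_share(100, shares, {"LOB A": 100, "LOB B": 100}))
print(fair_share(100, shares, {"LOB A": 20, "LOB B": 100}))
```

When both lines of business are saturated the split is 70/30; when LOB A only wants 20 slots, LOB B can use the other 80, which is how a shared pool reaches higher utilisation than two fixed partitions.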

  17. Automate Business Scheduling Across Multiple Scheduling Clusters
      • Design and schedule batch workload and dependencies such as: time, data and job events to be executed
      • Complete automation and real-time monitoring of mission-critical batch workload and dependencies
      • Reliable and predictable results provided faster
      • Shared virtual grid-enabled pool of heterogeneous IT resources for each batch job within a complex business process
      • Value Proposition: Coordinates the cross-enterprise scheduling of workload execution across clusters of heterogeneous scheduling environments

  18. Grid reference architecture
      • [Diagram labels: Presentation & user interfaces; Management; Service virtualisation; Compute virtualisation; Data virtualisation; Grid control points; Service pools; Portal; Metascheduler; Idle pool; Data management; FW markers between components]

  19. Compute complexity
      • [Diagram labels: Service requests; Policy (rules)-based Metascheduler; Service pools 1 to n; Alternative (3rd party) service provider; 3rd party service provider; 3rd party “burst in” compute capacity; FW, VPN, and FW OR VPN markers on the connections]

  20. Data complexity
      • [Diagram labels: Storage network; Filesystem data; Structured data; Unstructured non-file data; 3rd party data sources; Public data sources; Remote disaster recovery data repository; FW markers on the connections]

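  21. OK – so let’s do this for real…
      • [Diagram: network of five sites A, B, C, D and E; link bandwidth labels 20 Mb s⁻¹, 400 Mb s⁻¹, 40 Mb s⁻¹, 200 Mb s⁻¹, 10 Mb s⁻¹, 400 Mb s⁻¹; link distance labels 9337 km, 99 km, 10058 km, 3795 km, 244 km, 213 km]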
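
Bandwidth and distance figures like these feed directly into the cost of moving data versus moving computation. The sketch below estimates a transfer time as propagation delay plus serialisation time. It reads the bandwidth labels as megabits per second and assumes signals travel at roughly 200,000 km/s in fibre; the pairing of a particular bandwidth with a particular distance is an assumption made for the example.

```python
def transfer_seconds(size_megabytes, bandwidth_mbit_s, distance_km,
                     signal_km_s=200_000.0):
    """Rough one-way transfer time: propagation delay plus serialisation.

    Ignores protocol overhead, congestion and routing detours (assumption).
    """
    propagation = distance_km / signal_km_s
    serialisation = size_megabytes * 8 / bandwidth_mbit_s
    return propagation + serialisation

# Hypothetical pairings of figures taken from the diagram labels.
slow_long = transfer_seconds(100, bandwidth_mbit_s=20, distance_km=9337)
fast_short = transfer_seconds(100, bandwidth_mbit_s=400, distance_km=99)
print(slow_long, fast_short)
```

For bulk data the bandwidth term dominates, so a scheduler choosing where to run a data-heavy task would weigh link capacity far more than distance; for many small messages the propagation term matters more.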

  22. Policy-based metascheduler implementation
      • [Diagram labels: User; Web Portal; Control script(s); Scheduler daemon; Monitor daemon; Network of Grids]

  23. Grid reference architecture
      • [Diagram labels: Presentation & user interfaces; Management; Service virtualisation; Compute virtualisation; Data virtualisation; Grid control points; Service pools; Portal; Metascheduler; Idle pool; Data management; FW markers between components]

  24. Trends & challenges
      • Trends
        • Use of increasingly dynamic information
        • Use of meta-information
        • Scheduling of more real-world programs
        • Restrictions on the program domain
        • Deriving scheduling information from programming language(s)
      • Challenges
        • Portability vs. performance
        • Grid-aware programming
        • Scalability, Efficiency, Repeatability
        • Metascheduling

  25. Questions?