PhD Confirmation 19574746



  1. PhD Confirmation 19574746 Scheduling Multiple Scientific Workflows Based on Resources in Grid Environment Sucha Smanchat Supervisors: Dr. Chris Ling Dr. Maria Indrawan

  2. Content • Introduction • Motivation • Related Work • Research Objective • Proposed Methodology • Evaluation • Project Plan • Conclusion

  3. Introduction • Workflow has been employed in the business domain to streamline business processes. • A workflow is a representation of a business process consisting of a set of tasks to achieve a goal. • Workflow became popular for its ability to represent the orchestration of services from heterogeneous sources. • Recently, workflow technology has been introduced into scientific domains to help scientists perform their work. • Scientific workflows usually require high computation power, which a grid environment can provide.

  4. Motivation • To execute a workflow, the tasks in the workflow need to be scheduled onto grid resources. • Scheduling is performed from the application side. • Several scheduling algorithms exist, but few deal with multiple workflows. • Multiple grid workflows scheduled separately might compete for the same resources, which may result in the constraint imposed on each workflow being violated. • Parameter sweep workflow – a workflow that is executed several times with different data for optimisation purposes.

  5. Motivation (2) • Execute multiple instances of a parameter sweep workflow concurrently. • Resource competition will occur, necessitating efficient scheduling of tasks to avoid bottlenecks and delays. • Each instance (or workflow) might have different constraints.

  6. Related Work • Workflow • Business Workflow & Scientific Workflow • Scientific Workflow Management Systems • Grid Workflow Scheduling

  7. Workflow • A workflow is a collection of tasks which are connected together to achieve a goal. • A task is a single unit of work. • Tasks in a workflow can be arranged in four basic control structures • Sequential • Choice • Concurrent • Loop • Single starting point and single exit point. • A workflow can be specified in a language such as BPEL4WS. • Graphical models such as Petri nets and DAGs (Directed Acyclic Graphs) can be used to model workflows.
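As an illustrative sketch (not part of the original presentation), a workflow without the loop structure can be modelled as a DAG of tasks and dependencies, and a valid execution order obtained topologically:

```python
from collections import deque

def topological_order(tasks, deps):
    """Return an execution order for a DAG workflow.
    tasks: iterable of task names; deps: dict task -> set of predecessor tasks."""
    indegree = {t: len(deps.get(t, set())) for t in tasks}
    successors = {t: [] for t in tasks}
    for t, preds in deps.items():
        for p in preds:
            successors[p].append(t)
    ready = deque(t for t in tasks if indegree[t] == 0)  # entry task(s)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for s in successors[t]:
            indegree[s] -= 1
            if indegree[s] == 0:  # all predecessors done, task becomes ready
                ready.append(s)
    if len(order) != len(indegree):
        raise ValueError("workflow contains a loop; DAG models disallow this")
    return order

# Sequential + concurrent structure: A -> (B, C) -> D
order = topological_order(["A", "B", "C", "D"],
                          {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}})
```

A loop structure, as noted later for Kepler, cannot be expressed in this model without unrolling.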

  8. Workflow Example • Example of a workflow modelled as a Petri net

  9. Executing Workflow • To execute a workflow, an instance of the workflow definition is created. • The tasks within the workflow instance are bound to actual services (e.g. web services, software packages). • Static binding (fully bound ahead of execution) • Dynamic binding (late binding during execution) • The workflow engine in a Workflow Management System executes the workflow instance.

  10. Business Workflow & Scientific Workflow • The focus of business workflow is on the composition of services, flow control, and coordination of tasks to provide more powerful and meaningful support for users. • Dynamism in business workflow mainly concerns the ability to customise the workflow to suit the user and the ability to adapt to failure of workflow execution. • Scientific workflow captures and automates scientific processes to help scientists from various domains perform their experiments. • Scientific workflow is data and computation intensive. Scientific workflows can contain a large number of tasks, which require high computation power and involve complex data of various sizes. • Dynamism in scientific workflow currently focuses on performance optimisation, which involves mapping and scheduling of workflow execution.

  11. Scientific Workflow Management Systems • Pegasus WMS [1] • Taverna [2] • Kepler [3]

  12. Pegasus WMS • Developed by the Information Sciences Institute of the University of Southern California and the Computer Science department of the University of Wisconsin-Madison. • Pegasus allows scientists to define an abstract workflow without specifying details of execution such as resources and data location. • The abstract scientific workflow is mapped to an executable workflow by the “Pegasus Mapper” and passed on to DAGMan for execution.

  13. Taverna • Designed for bioinformatics workflows as part of the myGrid project. • Taverna focuses on the orchestration of bioinformatics web services and applications using the “Scufl” workflow definition language. • Users can create and edit workflows using the Scufl workbench. The workflow is then executed by the “Freefluo” enactment engine.

  14. Kepler • Designed specifically for scientific workflows and has been used in various scientific domains. • Kepler is developed based on “Ptolemy II” and adopts the “Vergil” GUI [4]. • Kepler provides “actors” which are used to perform tasks. • A workflow is created by connecting actors via input and output ports. • Built-in actors • Actors created by users • Workflow execution in Kepler is controlled by directors.

  15. Directors in Kepler • The SDF Director and DDF Director execute tasks sequentially. A task is started once the required inputs are available. • DDF allows a choice structure (if-then) while SDF does not. • The PN Director executes tasks in parallel. • The CT Director and DE Director include a time aspect to control the execution of tasks, used for simulation.

  16. TDA Director • The TDA Director proposed by Abramson et al. [5] clones tasks involved in a parameter sweep and executes them in parallel (Nimrod/K). • Each copy can be seen as an instance of the parameter sweep workflow. • Every copy is of the same workflow definition and requires the same resources. • Efficient workflow scheduling is required to avoid delay and bottleneck.

  17. Grid Workflow Scheduling • Best-Effort • Batch mode • Dependency mode • Cluster and duplication based scheduling • Meta-heuristics based scheduling • QoS-constraint Scheduling • Deadline-constraint based scheduling • Budget-constraint based scheduling *Classification based on Yu et al. [6]

  18. Static and Dynamic Aspects • Static scheduling algorithms schedule tasks to resources once. • Dynamic scheduling usually involves • Run-time rescheduling using a static algorithm • Adaptation triggered by the availability of resources and performance

  19. Batch Mode • Task Prioritising Phase – creates a list of tasks that are ready to execute and finds the resource that can execute each task fastest, i.e. with the Minimum Estimated Completion Time (MCT) • Resource Selection Phase: • Min-Min – the pair of task and resource with the minimum MCT is scheduled first [7]. • Max-Min – the pair of task and resource with the maximum MCT is scheduled first [7]. • Sufferage – the task that would suffer most if not scheduled is picked first (sufferage = second minimum MCT − minimum MCT) [7]
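A minimal Python sketch of these three batch-mode heuristics, for illustration only; how committed work accumulates as load on a resource is a simplifying assumption, not a detail from [7]:

```python
def batch_schedule(ect, mode="min-min"):
    """Batch-mode scheduling over a ready set of independent tasks.
    ect[task][resource] = estimated completion time on an idle resource.
    mode: 'min-min', 'max-min', or 'sufferage'. Returns {task: resource}."""
    ready = set(ect)
    schedule = {}
    load = {}  # completion time already committed on each resource (assumed model)
    while ready:
        best = {}  # task -> (best resource, its MCT including current load)
        for t in ready:
            r = min(ect[t], key=lambda r: ect[t][r] + load.get(r, 0))
            best[t] = (r, ect[t][r] + load.get(r, 0))
        if mode == "min-min":
            t = min(best, key=lambda t: best[t][1])   # smallest MCT first
        elif mode == "max-min":
            t = max(best, key=lambda t: best[t][1])   # largest MCT first
        else:  # sufferage = second minimum MCT - minimum MCT
            def sufferage(t):
                cts = sorted(ect[t][r] + load.get(r, 0) for r in ect[t])
                return (cts[1] - cts[0]) if len(cts) > 1 else 0
            t = max(best, key=sufferage)
        r, ct = best[t]
        schedule[t] = r
        load[r] = ct
        ready.remove(t)
    return schedule
```

With two tasks and two resources, Min-Min and Max-Min can already disagree: Min-Min commits the quickest pair first, Max-Min reserves the best resource for the longest task.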

  20. Batch Mode - Extended • XSufferage – the sufferage value also considers the time to transfer files. A task should be assigned to a resource that already has the file that the task requires [8]. • QoS guided Min-Min – tasks are grouped into those requiring high and low bandwidth. Tasks requiring high bandwidth are scheduled first using Min-Min [9]. • Selective Min-Min Max-Min – uses the standard deviation to determine whether there are a few longer tasks and many shorter tasks. If so, Max-Min is used; otherwise Min-Min is used [10].

  21. Dependency Mode – HEFT • HEFT - Heterogeneous-Earliest-Finish-Time [11] • Ranks tasks based on • Average execution time of the task • Average time required for transferring data between resources • Position of the task in the workflow. • The tasks with higher rank are scheduled first, to the resource that can complete the task at the earliest time.
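The ranking step can be sketched as HEFT's upward rank: a task's rank is its average execution time plus the costliest path through its successors, so tasks deeper on long paths rank higher. The function and variable names below are ours, for illustration:

```python
def upward_rank(tasks, succ, avg_exec, avg_comm):
    """HEFT-style upward rank for a DAG.
    rank(t) = avg_exec[t] + max over successors s of (avg_comm[(t, s)] + rank(s));
    an exit task's rank is just its average execution time."""
    rank = {}
    def r(t):
        if t not in rank:  # memoise; the workflow is a DAG, so recursion ends
            rank[t] = avg_exec[t] + max(
                (avg_comm.get((t, s), 0) + r(s) for s in succ.get(t, [])),
                default=0)
        return rank[t]
    for t in tasks:
        r(t)
    return rank

ranks = upward_rank(["A", "B", "C"],
                    {"A": ["B"], "B": ["C"]},
                    {"A": 2, "B": 3, "C": 1},
                    {("A", "B"): 1, ("B", "C"): 2})
```

Scheduling then processes tasks in decreasing rank order, placing each on the resource giving the earliest finish time.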

  22. Dependency Mode – CPOP • CPOP - Critical-Path-on-a-Processor [11] • Finds the “critical path processor” that minimises the critical path. • Tasks on the critical path are assigned to the critical path processor. Other tasks are assigned to the resource that minimises their execution time. • Fails if there is no processor that can execute every task on the critical path, as pointed out by Shi and Dongarra [12].
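Finding the critical path that CPOP pins to a single processor can be sketched as the entry-to-exit chain with the largest total of average execution and communication times (illustrative code; it assumes a DAG whose entry task dominates all other starting points):

```python
def critical_path(tasks, succ, avg_exec, avg_comm):
    """Return (critical path as a task list, its total average length)."""
    best = {}  # task -> (length of longest path from this task to exit, next task)
    def longest(t):
        if t not in best:
            options = [(avg_comm.get((t, s), 0) + longest(s)[0], s)
                       for s in succ.get(t, [])]
            tail, nxt = max(options, default=(0, None))  # exit task: no successors
            best[t] = (avg_exec[t] + tail, nxt)
        return best[t]
    entry = max(tasks, key=lambda t: longest(t)[0])  # dominant entry task
    path, t = [], entry
    while t is not None:  # walk the stored longest chain down to the exit
        path.append(t)
        t = best[t][1]
    return path, best[entry][0]

path, length = critical_path(["A", "B", "C"],
                             {"A": ["B", "C"], "B": ["C"]},
                             {"A": 2, "B": 3, "C": 1},
                             {("A", "B"): 1, ("B", "C"): 2, ("A", "C"): 5})
```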

  23. Dependency Mode – Hybrid HEFT • Hybrid HEFT [13] • Ranks tasks as in normal HEFT. • Groups tasks that do not depend on each other, similar to batch mode algorithms. • Schedules tasks in each group using the “Balanced Minimum Completion Time” (BMCT) algorithm. Tasks scheduled on a resource can be moved to another resource to balance resource load.

  24. Dependency Mode – SDC • SDC - Scheduling algorithm for heterogeneous processors with Different Capabilities, by Shi & Dongarra [12] • A task can only be executed by certain resources. • Considers tasks with a “scarce capable resource” - the tasks that fewer resources are able to execute. • Scarcity ratio = capable resources / total resources • Tasks with scarce capable resources are scheduled first (ranked higher) to avoid delay caused by allocating a scarce resource to a task which can be executed elsewhere. • Does not deal with resource competition • For example, there may be 10 resources that can execute task T, but also 100 other tasks that need the same 10 resources.
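The scarcity idea follows directly from the ratio above; a minimal sketch (illustrative names, not SDC's full ranking):

```python
def scarcity_rank(capable, total_resources):
    """Order tasks by scarcity of capable resources, scarcest first.
    capable[task] = set of resources able to execute it; a lower ratio of
    capable/total resources means the task has fewer options, so it goes first."""
    return sorted(capable, key=lambda t: len(capable[t]) / total_resources)

order = scarcity_rank({"t1": {"r1"},
                       "t2": {"r1", "r2", "r3"},
                       "t3": {"r1", "r2"}}, total_resources=3)
```

Note this captures per-task scarcity only; as the slide points out, it says nothing about how many other tasks are competing for those same resources.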

  25. Cluster and Duplication Based Scheduling - TDS • TDS - Task Duplication Based Scheduling Scheme [14] • Tasks that have lower communication cost between each other are clustered together. • Each cluster is assigned the processor that takes the minimum execution time to execute the tasks within it. • Where possible, the predecessor task of a task is duplicated on the same processor to minimise or eliminate the communication cost.

  26. Cluster and Duplication Based Scheduling - TANH • TANH - Task Duplication Based Scheduling Algorithm for Heterogeneous Systems [15] • Clusters tasks based on the number of available processors. • If there are more processors than clusters, tasks are duplicated and scheduled to the available processors • If there are more clusters than processors, clusters are merged until the number of clusters is equal to the number of processors • Both TDS and TANH fail if there is no processor that can execute every task in a cluster [12].

  27. Meta-heuristics Based Scheduling • GRASP [16] • Greedy Randomised Adaptive Search Procedure • Iteratively generates randomised schedules • Finds a locally optimal solution in each iteration • Once the iterations stop, the best solution stored is returned as the result • Genetic Algorithm [17] • Generates an initial set of solutions (first parents) • Creates new solutions (children) based on known good solutions (parents) • Repeats until a preset condition is met. • Gives a solution based on the entire workflow but takes longer scheduling time.
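A toy GRASP for mapping independent tasks to resources might look like the following; the restricted-candidate-list size and the makespan objective are our assumptions for illustration, not details from [16]:

```python
import random

def grasp_schedule(exec_time, resources, iterations=50, rcl_size=2, seed=0):
    """GRASP sketch: repeat (randomised greedy construction), keep the best.
    exec_time[task][resource] = execution time; makespan = max resource load."""
    rng = random.Random(seed)
    best, best_makespan = None, float("inf")
    for _ in range(iterations):
        load = {r: 0.0 for r in resources}
        schedule = {}
        for t in exec_time:  # greedy construction with a randomised choice
            candidates = sorted(resources, key=lambda r: load[r] + exec_time[t][r])
            r = rng.choice(candidates[:rcl_size])  # restricted candidate list
            schedule[t] = r
            load[r] += exec_time[t][r]
        makespan = max(load.values())
        if makespan < best_makespan:  # store the best solution found so far
            best, best_makespan = schedule, makespan
    return best, best_makespan

sched, ms = grasp_schedule({t: {"r1": 1.0, "r2": 1.0}
                            for t in ["t1", "t2", "t3", "t4"]}, ["r1", "r2"])
```

A full GRASP would add a local-search step after each construction; this sketch shows only the randomised-construction and best-kept-solution loop.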

  28. Deadline-constraint Based Scheduling • Backtracking [18] • Minimises cost while meeting the deadline • Allocates the ready tasks to the resources with the least cost, then calculates the execution time. • If the deadline is violated, the last allocated task is reallocated to a faster resource. Multiple backtracking steps may be required. • Partitioning [19] • The workflow is partitioned into branches of sequential tasks. • The deadline constraint of the workflow is then distributed to each of the partitions. • If the deadline of a partition is violated during execution, the subsequent partition(s) adjust to handle the delay.
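The backtracking idea can be sketched on a simplified sequential model (our assumption for illustration: each task has a list of (cost, time) resource options sorted cheapest first, with faster options costing more):

```python
def backtrack_schedule(tasks, cost_time, deadline):
    """Allocate each task its cheapest option; whenever the total time
    exceeds the deadline, reallocate the most recent task that still has a
    faster (more expensive) option. cost_time[task] = [(cost, time), ...]."""
    choice = {t: 0 for t in tasks}  # index into each task's option list
    allocated = []
    for t in tasks:
        allocated.append(t)
        while sum(cost_time[x][choice[x]][1] for x in allocated) > deadline:
            for x in reversed(allocated):  # backtrack from the last allocation
                if choice[x] + 1 < len(cost_time[x]):
                    choice[x] += 1  # move to the next-faster option
                    break
            else:
                raise ValueError("deadline cannot be met")
    return {t: cost_time[t][choice[t]] for t in tasks}

plan = backtrack_schedule(["a", "b"],
                          {"a": [(1, 5), (3, 2)], "b": [(1, 4), (2, 2)]},
                          deadline=6)
```

In the example, the cheap allocation (5 + 4 time units) overruns the deadline of 6, so both tasks are backtracked onto their faster, costlier options.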

  29. Scheduling Multiple Grid Workflows • Merge multiple non-loop workflows (DAGs), then use any of the existing algorithms for a single workflow to schedule [20] • Merge the entry and exit points of all DAGs • Connect shorter DAGs into the middle of longer DAGs • Schedule tasks in each DAG • Sequential – one DAG after another • Round-Robin • Select tasks based on fairness • Does not deal with constraints and the resource aspect

  30. Scheduling Multiple Grid Workflows (2) • xDCP (extended Dynamic Critical Path) [21] • Initialise schedule by randomly allocating resources to parameter sweep tasks. • For each task, if there is a resource that can execute faster, move that task to that resource. • pM-S (priority based Master-Slave) [21] • Arrange the execution order of parameter sweep tasks to allow dependent tasks further down the workflow to execute earlier on another resource. • Increase parallelisation of parameter sweep tasks. • This technique assumes that resources are assigned to each set of parameter sweep tasks exclusively.

  31. Scheduling Multiple Grid Workflows (3) • Game-quick & Game-cost by Duan et al. [22] • Based on game theory • Iterating through each workflow, tasks that are ready to execute are grouped together and compete for resources. • A task can win the game and get a resource but will lose on another resource. • Does not look at tasks further down in every workflow. • Applicability to our work needs further investigation.

  32. Discussion • Little work has been done on scheduling multiple workflows, and the existing approaches neither consider the resource aspect sufficiently nor utilise information about the resources required by the entirety of every workflow.

  33. Research Objective • To propose a scheduling technique for executing multiple scientific grid workflows • The scheduling must be aware of the resources required by every task in every workflow. • The scheduling algorithm must be able to schedule tasks in multiple workflows based on the resources required, to avoid bottleneck and delay • The time constraints of every workflow should be satisfied. • Implementation in the Kepler system • Dynamic change of resources is excluded from our scope.

  34. Ongoing Scenario • Quantum chemical calculation using the GAMESS quantum chemistry package [23]. • Optimise four parameters that give the best pseudo atom surface [5].

  35. Scenario Explanation • The Parameter Sweep actor creates combinations of parameters and assigns a unique token to each combination. • The TDA director (Nimrod/K) in the Kepler system clones the four GAMESS tasks for each parameter combination. • The outputs of the four GAMESS tasks are sent to the RMS task to calculate the Root Mean Square error. • The results are ordered using the Order Tags task and displayed in a graph.

  36. Assumptions from Scenario • The same tasks in every instance require the same resource/resource types. • Within the same instance, the four GAMESS tasks may require the same or different resource types. • A task can only be executed by certain resources. • A resource cannot execute every task. • Depending on the currently available resources in the grid and the number of instances cloned, the execution plan for this scenario may differ.

  37. Proposed Methodology • Task and Resource Prioritising Phase • Resource Selection Phase • Implementation Prototype in Kepler

  38. Task and Resource Prioritising Phase • Tasks which are independent of other tasks, from multiple workflows (or instances), can be grouped together. • Resources are also ranked based on the scarcity of resources and the degree of competition. • Tasks in each group can be ranked based on: • Execution time • Dependencies between tasks • Ranks of the resources they require. • Need to know the resources required by every task in every workflow instance.
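The intended two rankings could be sketched roughly as follows; the helper names and the weighting are hypothetical, since the actual heuristic is still to be designed:

```python
def rank_resources(required_by):
    """Rank resources by degree of competition: required_by[resource] = set of
    tasks (across all workflow instances) that need it. A resource demanded by
    more tasks is more contended, so it ranks higher (appears earlier)."""
    return sorted(required_by, key=lambda r: len(required_by[r]), reverse=True)

def rank_tasks(exec_time, required_resources, resource_rank):
    """Rank tasks so those needing the most contended resources go first,
    breaking ties by longer execution time first (an assumed weighting)."""
    pos = {r: i for i, r in enumerate(resource_rank)}
    def key(t):
        contention = min(pos[r] for r in required_resources[t])
        return (contention, -exec_time[t])  # contended resource, then longer task
    return sorted(exec_time, key=key)

res_order = rank_resources({"r1": {"t1", "t2"}, "r2": {"t3"}})
task_order = rank_tasks({"t1": 5, "t2": 2, "t3": 9},
                        {"t1": {"r1"}, "t2": {"r1"}, "t3": {"r2"}},
                        res_order)
```

Dependency information, the third ranking criterion on the slide, is omitted here; a fuller version would combine it with an upward-rank-style score.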

  39. Possible Issues • A resource may be a non-scarce resource to one workflow but a scarce resource to another. • A resource might not be a highly demanded resource at the beginning of the workflow but becomes one later on. • Complexity introduced by multiple workflows

  40. Possible Issues (2) • Input model for the algorithm • Existing work mostly uses Directed Acyclic Graphs (DAGs). • Loops are not allowed • Kepler allows loops. • Petri nets, which also support workflow analysis, might be a good alternative.

  41. Resource Selection Phase • Resources are allocated to tasks based on task rank and resource rank. • At this stage, fastest execution and deadline constraints will be considered in resource allocation. • Each workflow (or workflow instance) might specify different constraints. • Allocation of resources to the tasks that are ready to execute must also consider the tasks further down the workflow. • Resource ranks • Estimated time at which these resources are in demand. • The output of this phase is the execution plan for executing multiple grid workflows.

  42. Example of Execution Plan for 3 Instances

  43. Implementation in Kepler • As a multiple workflow scheduler for Kepler. • As part of TDA director (Nimrod/K) for parameter sweep.

  44. Evaluation • The proposed scheduling technique will finally be implemented and tested on Kepler using real scientific workflow scenarios against the following criteria. • The time required for the scheduling process for different numbers of workflows • The efficiency of the algorithm in comparison with existing work • How efficient is the resulting execution plan? • Need to identify a proper comparison scheme, as most existing work is for single workflows • Comparison with existing Kepler directors, including the TDA director without the proposed algorithm.

  45. Evaluation (2) • The ability of the algorithm to satisfy the user’s requirement • Time constraints of each workflow (instance) specified by the users must be satisfied. • It might be possible to apply the verification technique proposed by Chen and Yang [24] for this purpose. • The ability of the algorithm to maintain task dependencies • The proposed algorithm should preserve the original task dependencies and the order of tasks in the original workflows.

  46. Project Plan

  47. Published Paper • S. Smanchat, S. Ling, and M. Indrawan, "A Survey on Context-aware Workflow Adaptations," in Proceedings of the 3rd International Workshop on Trustworthy Ubiquitous Computing (TwUC 2008). Linz, Austria, 2008, pp. 422-425.

  48. Conclusion • The need for scheduling multiple scientific workflows in a grid environment is identified. • Existing grid workflow scheduling techniques are described. • We aim to develop an algorithm that can schedule multiple workflows into an execution plan so that the time constraints of those workflows are considered together. • Our work should help improve the performance of the execution of multiple scientific workflows over the existing approaches.

  49. Questions?

  50. References
  [1] E. Deelman, G. Singh, M.-H. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, G. B. Berriman, J. Good, A. Laity, J. C. Jacob, and D. S. Katz, "Pegasus: A framework for mapping complex scientific workflows onto distributed systems," Sci. Program., vol. 13, pp. 219-237, 2005.
  [2] T. Oinn, M. Addis, J. Ferris, D. Marvin, M. Senger, M. Greenwood, T. Carver, K. Glover, M. R. Pocock, A. Wipat, and P. Li, "Taverna: a tool for the composition and enactment of bioinformatics workflows," Bioinformatics, vol. 20, pp. 3045-3054, 2004.
  [3] B. Ludäscher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, E. A. Lee, J. Tao, and Y. Zhao, "Scientific workflow management and the Kepler system," Concurr. Comput. : Pract. Exper., vol. 18, pp. 1039-1065, 2006.
  [4] Ptolemy II, http://ptolemy.eecs.berkeley.edu/ptolemyII/, accessed October 2008.
  [5] D. Abramson, C. Enticott, and I. Altintas, "Nimrod/K: towards massively parallel dynamic grid workflows," in Proceedings of the 2008 ACM/IEEE Conference on Supercomputing. Austin, Texas: IEEE Press, 2008.
  [6] J. Yu, R. Buyya, and K. Ramamohanarao, "Workflow Scheduling Algorithms for Grid Computing," in Metaheuristics for Scheduling in Distributed Computing Environments, 2008, pp. 173-214.
