GridFlow: Workflow Management for Grid Computing

GridFlow: Workflow Management for Grid Computing Kavita Shinde

Outline • Introduction • Grid Resource Management • Grid Workflow Management • An Example Scenario • Conclusion

Introduction • GridFlow • given a set of workflow tasks and a set of resources, how do we map the tasks to Grid resources? • workflow management system developed at the University of Warwick • built on top of ARMS, an agent-based resource management system for Grid computing • focus is on service-level scheduling and workflow management

Grid Resource Management • Three Layers of resource management system within the GridFlow system • Grid Resource • high-end computing or storage resource • accessed remotely • Multiprocessors, or clusters of workstations or PCs with large disk storage space • Local Grid • multiple grid resources that belong to one organization • resources are connected with high speed networks • Global Grid • consists of all local Grids

Grid Resource Management • PACE • a toolset for resource performance and usage analysis • takes separate resource and application models as inputs and predicts the execution time of a task prior to run time • scalability (execution time vs. level of parallelism) can be determined • helps prevent over-occupation of resources • useful when trying to interleave sub-workflows as much as possible

Grid Resource Management • Titan • grid resource manager • locates a suitable resource set and passes the sub-workflow to a local scheduler • utilizes free processors to minimize idle-time and improve throughput • supported by the PACE performance predictive data

Grid Resource Management • ARMS • main component: the agent • agent: representative of a local grid at the global level of grid resource management • agents cooperate with each other to find the available resources and their characteristics • agents dispatch requests that cannot be satisfied locally to neighboring agents

Grid Workflow Management • the implementation of grid workflow management is carried out at multiple layers • Tasks • basic building blocks of an application • e.g. MPI (Message Passing Interface) and PVM (Parallel Virtual Machine) jobs running on multiple processors • Sub-workflows • a flow of closely related tasks that is to be executed in a predefined sequence on grid resources of a local grid • usually significant communication between tasks, but resource conflicts may occur when multiple sub-workflows require the same resource simultaneously • Workflows • a flow of several different sub-workflows

GridFlow user portal • provides graphical user interface to compose workflow elements and access additional grid services • LGSS • handles conflicts - scheduled sub-workflows may belong to different workflows • ARMS • represents a local Grid at a global level of Grid resource management, and conducts local Grid sub-workflow scheduling • Globus MDS • provides information about the available resources on the Grid and their status • Titan • utilizes performance data obtained from PACE for resource scheduling

Grid Workflow Management • GGWM • Simulation • takes place before a grid workflow is actually executed and produces the workflow schedule • returns simulation results to the GridFlow portal for user agreement • Execution • the workflow is executed according to the simulated schedule • the actual execution may differ because of the dynamic nature of the grid • when delays occur, the remaining work is sent back to the simulation engine and rescheduled • Monitoring • provides access to real-time status reports of task or sub-workflow execution

Global Grid Workflow Management • Scheduling Algorithm • initialize all properties of each sub-workflow to null • look for a schedulable sub-workflow • ensure its pre- sub-workflows have all been scheduled • set the start time of the chosen sub-workflow to the latest end time of its pre- sub-workflows • submit the start time and the sub-workflow to a grid-level agent (ARMS) • the agent finds a suitable local grid using LGSS

Global Grid Workflow Management • ARMS reschedules the less critical sub-workflows • algorithm relies heavily on the simulation results of LGSS

Workflow W: a set of sub-workflows Si (i = 1, ..., n)
S1 and Sn: starting and ending points
pi: number of pre- sub-workflows of Si
qi: number of post- sub-workflows of Si
G: global grid, a set of local grids Lj (j = 1, ..., m)
k: true if the sub-workflow is scheduled, else false
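The global scheduling steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GridFlow implementation: times are plain numbers rather than fuzzy time functions, the names `schedule_workflow`, `pre`, and `estimate_end` are hypothetical, and `estimate_end` stands in for the ARMS agent that would consult LGSS to return an end time.

```python
from typing import Callable, Dict, List, Optional


def schedule_workflow(
    pre: Dict[str, List[str]],
    estimate_end: Callable[[str, float], float],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Assign start and end times to sub-workflows in dependency order.

    pre maps each sub-workflow to the list of its pre- sub-workflows.
    estimate_end(name, start) is a hypothetical stand-in for the ARMS/LGSS
    simulation and returns the estimated end time.
    """
    # Initialise every property to None (the "null" of the slide).
    props: Dict[str, Dict[str, Optional[float]]] = {
        s: {"start": None, "end": None} for s in pre
    }
    unscheduled = set(pre)
    while unscheduled:
        # Schedulable: all pre- sub-workflows already have an end time.
        ready = [
            s for s in sorted(unscheduled)
            if all(props[p]["end"] is not None for p in pre[s])
        ]
        if not ready:
            raise ValueError("dependency cycle: nothing is schedulable")
        s = ready[0]
        # Start time is the latest end time of the pre- sub-workflows.
        start = max((props[p]["end"] for p in pre[s]), default=0.0)
        props[s]["start"] = start
        props[s]["end"] = estimate_end(s, start)
        unscheduled.remove(s)
    return props


# Illustrative workflow with made-up durations.
pre = {"S1": [], "S2": ["S1"], "S3": ["S1"], "S4": ["S2", "S3"]}
durations = {"S1": 2.0, "S2": 3.0, "S3": 5.0, "S4": 1.0}
result = schedule_workflow(pre, lambda s, t: t + durations[s])
print(result["S4"])
```

Here S4 cannot start until both S2 and S3 have ended, so its start time is the later of the two end times.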

Local Grid Sub-Workflow Scheduling • Scheduling Algorithm • very similar to GGWM • has to deal with multiple tasks that may belong to different workflows • the start time of the chosen task can't be set directly to the latest end time of its pre-tasks because of resource conflicts • executes the task with the higher priority first • gives higher priority to a possibly earlier enabled task
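The priority rule can be illustrated with a small sketch. It assumes a single shared resource and crisp (non-fuzzy) times, which is a simplification of what the slides describe; the function name `resolve_conflicts` and the task values are hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_conflicts(
    tasks: List[Tuple[str, float, float]],
) -> Dict[str, Tuple[float, float]]:
    """Order tasks competing for one resource.

    Each task is (name, enabling_time, duration). The task that is enabled
    earlier gets higher priority and runs first.
    """
    schedule: Dict[str, Tuple[float, float]] = {}
    resource_free = 0.0
    for name, enabled, duration in sorted(tasks, key=lambda t: t[1]):
        # A task starts once it is enabled and the resource is free.
        start = max(enabled, resource_free)
        end = start + duration
        schedule[name] = (start, end)
        resource_free = end
    return schedule


# Two tasks from different workflows that need the same resource.
plan = resolve_conflicts([("A3", 5.0, 6.0), ("A4", 3.0, 12.0)])
print(plan)
```

A4 is enabled first, so it runs from 3 to 15, and A3 has to wait until 15 even though it was enabled at 5.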

Fuzzy Time Operations • LGSS and GGWM algorithms are implemented using fuzzy timing techniques • fuzzy time function π(τ) • gives a numerical estimate of the possibility that an event arrives at time τ • advantages: can be computed very fast; suitable for scheduling time-critical applications • they do not necessarily provide the best scheduling solution

1() = 0.5(0,2,6,7) 2() = (2,4,4,6) • a: possibility distributions of 1 and 2 • b: latest arrival distribution of 1 and 2 • c: earliest enabling time • d: operator min – intersection of 1 and 2 • e: operator max – union of 1 and 2 • f: sum of 1 and 2 • min(0.5,1)(0+2, 2+4, 6+4, 7+6)=0.5(2, 6, 10, 13)

An Example Scenario • W1, W2: Workflows • L1, L2: Local Grids • task A2 of sub-workflow S3 from W1 is being executed • S3 from W2 is to be scheduled • resource conflict between A3 and A4 • scheduling aims to find e5(τ)

An Example Scenario • task enabling times: from pre-task end times • task execution times: from the Titan system, supported by PACE functions
a3(τ) = (3,5,5,7)
d3(τ) = (5,6,7,8)
a4(τ) = (0,3,3,5)
d4(τ) = (10,12,14,16)
d5(τ) = (2,5,6,9)

An Example Scenario using LGSS
s3(τ) = min{(3,5,5,7), earliest{(3,5,5,7), (0,3,3,5)}} = min{(3,5,5,7), (0,3,3,5)} = 0.5(3,4,4,5)
s4(τ) = min{(0,3,3,5), earliest{(3,5,5,7), (0,3,3,5)}} = min{(0,3,3,5), (0,3,3,5)} = (0,3,3,5)
e13(τ) = sum{0.5(3,4,4,5), (5,6,7,8)} = 0.5(8,10,11,13)

An Example Scenario e14()= sum{latest{0.5(8,10,11,13),(0,3,3,5)},(10,12,14,16)} = sum{0.5(8,10,11,13),(10,12,14,16)} = 0.5(18,22,25,29) e24()= sum{(0,3,3,5)},(10,12,14,16)} = (10,15,17,21) e23()= sum{latest{ (10,15,17,21),0.5(3,4,4,5)},(5,6,7,8)} = sun{0.5(10,12.5,26,29),(5,6,7,8)} = 0.5(15,18.5,26,29) e4()= max{0.5(18,22,25,29),(10,15,17,21)} = (10,15,17,29)

An Example Scenario e5()= sum{(10,15,17,29),(2,5,6,9)} = (12,20,23,38) so S3 fromW2 will complete on local grid L1 most likely between 20 to 23 submit this data to GGWM – decides whether the local grid L1 should be allocated the sub-workflow S3 from W2

Conclusion • the fuzzy timing technique provides a good solution to the conflict-resolution problem arising in grid workflow management • results indicate that local and global grid workflow management can coordinate with each other to optimize workflow execution time and resolve conflicts of interest • useful in highly dynamic grid environments • large network latencies exist and application performance is difficult to predict accurately • needs more flexible cooperation among different grid services and components, which poses security challenges
