
Grid Resource Brokering and Cost-based Scheduling With Nimrod-G and Gridbus Case Studies


Presentation Transcript


  1. Grid Resource Brokering and Cost-based Scheduling With Nimrod-G and Gridbus Case Studies • Rajkumar Buyya • Cloud Computing and Distributed Systems (CLOUDS) Lab, The University of Melbourne, Melbourne, Australia • www.cloudbus.org

  2. Agenda • Introduction to Grid Scheduling • Application Models and Deployment Approaches • Economy-based “Computational” Grid Scheduling • Nimrod-G Grid Resource Broker • Scheduling Algorithms and Experiments on the World Wide Grid testbed • Economy-based “Data-Intensive” Grid Scheduling • Gridbus Grid Service Broker • Scheduling Algorithms and Experiments on the Australian Belle Data Grid testbed

  3. Grid Scheduling: Introduction

  4. Grid Resources and Scheduling (figure): a user application submits to a Grid Resource Broker, which consults a Grid Information Service and dispatches jobs to Local Resource Managers on a single CPU (time-shared allocation), an SMP (time-shared allocation), and clusters (space-shared allocation).

  5. Grid Scheduling • Grid scheduling involves: • Resources distributed over multiple administrative domains • Selecting one or more suitable resources (which may involve co-scheduling) • Assigning tasks to the selected resources and monitoring execution • Grid schedulers are global schedulers: • They have no ownership or control over resources • Jobs are submitted to local resource managers (LRMs) as a user • LRMs take care of the actual execution of jobs

  6. Example Grid Schedulers • Nimrod-G – Monash University • Computational Grid & economy-based • Condor-G – University of Wisconsin • Computational Grid & system-centric • AppLeS – University of California, San Diego • Computational Grid & system-centric • Gridbus Broker – University of Melbourne • Data Grid & economy-based

  7. Key Steps in Grid Scheduling • Phase I - Resource Discovery: 1. Authorization Filtering, 2. Application Definition, 3. Minimum Requirement Filtering • Phase II - Resource Selection: 4. Information Gathering, 5. System Selection • Phase III - Job Execution: 6. Advance Reservation, 7. Job Submission, 8. Preparation Tasks, 9. Monitoring Progress, 10. Job Completion, 11. Clean-up Tasks • Source: J. Schopf, Ten Actions When SuperScheduling, OGF Document, 2003.

  8. Movement of Jobs Between the Scheduler and a Resource • Push Model • The manager pushes jobs from its queue to a resource • Used in clusters and Grids • Pull Model • A P2P agent requests a job from the job pool for processing • Commonly used in P2P systems such as Alchemi and SETI@Home • Hybrid Model (both push and pull) • The broker deploys an agent on each resource, and the agent pulls jobs from the broker (see the sketch below) • May be used in Grids (e.g., the Nimrod-G system) • The broker also pulls data from the user host or a separate data host (distributed datasets) (e.g., Gridbus Broker)
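To make the pull model concrete, here is a minimal Python sketch of an agent loop that keeps requesting work from a job pool and reports results back; fetch_job, run, and report_result are hypothetical placeholders, not the real Alchemi, SETI@Home, or Nimrod agent interfaces:

import time

POLL_INTERVAL = 30  # seconds to wait when the pool currently has no work

def agent_loop(job_pool, resource):
    # Pull-model worker: repeatedly ask the pool for a job and return results.
    # job_pool.fetch_job/report_result and resource.run are placeholders only.
    while True:
        job = job_pool.fetch_job()           # ask the broker/pool for work
        if job is None:                      # nothing available right now
            time.sleep(POLL_INTERVAL)
            continue
        result = resource.run(job)           # execute the job locally
        job_pool.report_result(job, result)  # send the result back to the pool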

  9. Example Systems

  10. Application Models and their Deployment on Global Grids

  11. Grid Applications and Parametric Computing • Bioinformatics: Drug Design / Protein Modelling • Natural Language Engineering • Ecological Modelling: Control Strategies for Cattle Tick • Sensitivity experiments on smog formation • Data Mining • Electronic CAD: Field Programmable Gate Arrays • High Energy Physics: Searching for Rare Events • Computer Graphics: Ray Tracing • Finance: Investment Risk Analysis • VLSI Design: SPICE Simulations • Civil Engineering: Building Design • Network Simulation • Automobile: Crash Simulation • Aerospace: Wing Design • Astrophysics

  12. How to Construct and Deploy Applications on Global Grids? • Three options/solutions: • Manual scheduling - use pure Globus commands • Application-level scheduling - build your own distributed application and scheduler • Application-independent scheduling - Grid brokers • Decouple application construction from scheduling • Perform a parameter sweep (bag of tasks), utilising distributed resources, within “T” hours or earlier, at a cost not exceeding $M.

  13. Using Pure Globus Commands • Do everything yourself, manually! • Total Cost: $???

  14. Build a Distributed Application & Application-Level Scheduler • Build the application and scheduler on a case-by-case basis (e.g., the MPI approach) • Total Cost: $???

  15. Compose and Deploy using Brokers - the Nimrod-G and Gridbus Approach • Compose applications and submit them to the broker • Define QoS requirements • Aggregate view • Compose, Submit & Play!

  16. The Nimrod-G Grid Resource Broker and Economy-based Grid Scheduling [Buyya, Abramson, Giddy, 1999-2001] Deadline and Budget Constrained Algorithms for Scheduling Applications on “Computational” Grids

  17. Nimrod-G: A Grid Resource Broker • A resource broker (implemented in Python) for managing, steering, and executing task-farming (parameter sweep) applications on global Grids. • It allows dynamic leasing of resources at runtime based on their quality, cost, and availability, and on users’ QoS requirements (deadline, budget, etc.) • Key Features • A declarative parameter programming language • A single window to manage & control an experiment • Persistent and Programmable Task Farming Engine • Resource Discovery • Resource Trading • (User-Level) Scheduling & Predictions • Generic Dispatcher & Grid Agents • Transportation of data & results • Steering & data management • Accounting

  18. A Glance at the Nimrod-G Broker (architecture figure): Nimrod/G clients connect to the Nimrod/G engine, which works with the schedule advisor, trading manager, grid store, and grid dispatcher; the grid explorer queries Grid Information Server(s), and the dispatcher submits jobs through Grid middleware (Globus, Legion, Condor, etc.) to Globus-, Legion-, and Condor-enabled nodes, each running a local Resource Manager (RM) and Trade Server (TS). See the HPC Asia 2000 paper!

  19. Nimrod/G Grid Broker Architecture (layered figure): Nimrod-G clients (legacy applications, customised apps such as Active Sheet, monitoring and steering portals, and P-Tools for GUI/scripting and parameter modelling) sit above the Nimrod-G broker, whose farming engine, meta-scheduler with pluggable algorithms (Algorithm1 … AlgorithmN), schedule advisor, trading manager, grid explorer, job server, database, and dispatcher/actuators manage programmable entities (resources, jobs, tasks, channels); middleware actuators (Globus, Legion, Condor, P2P) and services (GMD, GTS, G-Bank) connect to the fabric of computers (PCs/workstations/clusters), local schedulers (Condor/LL/NQS), storage, networks, databases, and instruments such as a radio telescope. The broker forms the narrow waist of an “IP hourglass”-style stack.

  20. A Nimrod/G Monitor (screenshot): shows the cost and deadline settings and the status of Legion and Globus hosts; the host bezek appears in both the Globus and Legion domains.

  21. User Requirements: Deadline/Budget

  22. Nimrod/G Interactions (figure): on the user node, Grid tools and applications drive the Nimrod-G Grid broker (task farming engine, grid scheduler, grid dispatcher), which queries the Grid Information Server and negotiates with the Grid Trade Server (“Do this in 30 min. for $10?”); on each Grid compute node, a Nimrod agent launched via the local resource manager and process server runs the user process, with file access back to the file server on the user node.

  23. Adaptive Scheduling Steps (flowchart): Discover Resources, Establish Rates, Compose & Schedule, Distribute Jobs, then Evaluate & Reschedule; if the requirements are not met given the remaining jobs, deadline, and budget, Discover More Resources and repeat.
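A minimal Python sketch of this adaptive loop follows; every broker.* call is a hypothetical placeholder for the step named in the flowchart above, not Nimrod-G's actual interface:

def adaptive_schedule(jobs, deadline, budget, broker):
    # Repeat the discover -> price -> schedule -> distribute -> evaluate cycle
    # until no jobs remain; look for more resources whenever requirements slip.
    resources = broker.discover_resources()
    while jobs:
        rates = broker.establish_rates(resources)      # trade servers quote prices
        plan = broker.compose_schedule(jobs, resources, rates, deadline, budget)
        broker.distribute_jobs(plan)
        jobs = broker.evaluate_and_reschedule(plan)    # returns the remaining jobs
        if jobs and not broker.meets_requirements(jobs, deadline, budget):
            resources = broker.discover_resources()    # discover more resources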

  24. Deadline and Budget Constrained Scheduling Algorithms

  25. Deadline and Budget-based Cost Minimization Scheduling • Sort resources by increasing cost. • For each resource in order, assign as many jobs as possible to the resource, without exceeding the deadline. • Repeat all steps until all jobs are processed.
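A minimal Python sketch of this cost-minimisation policy, assuming each resource carries hypothetical cost_per_job and job_runtime estimates; in practice the broker re-runs this at every scheduling event with refreshed estimates:

def dbc_cost_min(jobs, resources, deadline):
    # Cheapest resources first; each takes as many jobs as it can finish
    # before the deadline. cost_per_job and job_runtime are illustrative
    # per-resource estimates, not real broker attributes.
    assignment = {r: [] for r in resources}
    pending = list(jobs)
    for r in sorted(resources, key=lambda r: r.cost_per_job):
        capacity = int(deadline // r.job_runtime)   # jobs finishable by the deadline
        assignment[r], pending = pending[:capacity], pending[capacity:]
        if not pending:
            break
    return assignment, pending   # leftover pending jobs mean the deadline is infeasible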

  26. Scheduling Algorithms and Experiments

  27. World Wide Grid (WWG) testbed: Australia (Melbourne U. cluster, VPAC Alpha; Nimrod-G + Gridbus, Globus + Legion, GRACE_TS), North America (ANL SGI/Sun/SP2, USC-ISI SGI, UVa Linux cluster, UD Linux cluster, UTK Linux cluster, UCSD Linux PCs, BU SGI IRIX), South America (Chile cluster), Europe (ZIB T3E/Onyx, AEI Onyx, Paderborn HPCLine, Lecce Compaq SC, CNR cluster, Calabria cluster, CERN cluster, CUNI/CZ Onyx, Poznan SGI/SP2, Vrije U. cluster, Cardiff Sun E6500, Portsmouth Linux PC, Manchester O3K), and Asia (Tokyo I-Tech Ultra WS, AIST Japan Solaris cluster, Kasetsart Thailand cluster, NUS Singapore O2K), interconnected over the Internet and running Globus/Legion with GRACE trade services.

  28. Application Composition Using the Nimrod Parameter Specification Language

#Parameters Declaration
parameter X integer range from 1 to 165 step 1;
parameter Y integer default 5;

#Task Definition
task main
  #Copy necessary executables depending on node type
  copy calc.$OS node:calc
  #Execute program with parameter values on remote node
  node:execute ./calc $X $Y
  #Copy results file to user home node with jobname as extension
  copy node:output ./output.$jobname
endtask

This plan generates 165 jobs: calc 1 5 → output.j1, calc 2 5 → output.j2, calc 3 5 → output.j3, …, calc 165 5 → output.j165.
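For illustration only, the 165-job list that the farming engine derives from this plan file (X swept from 1 to 165, Y fixed at 5) can be reproduced with a simple cross-product expansion in Python; this stands in for what the declarative language describes and is not Nimrod's actual parser:

from itertools import product

x_values = range(1, 166)   # parameter X integer range from 1 to 165 step 1
y_values = [5]             # parameter Y integer default 5

jobs = [
    {"jobname": f"j{i}", "command": f"./calc {x} {y}", "output": f"output.j{i}"}
    for i, (x, y) in enumerate(product(x_values, y_values), start=1)
]

print(len(jobs))   # 165 independent jobs
print(jobs[0])     # {'jobname': 'j1', 'command': './calc 1 5', 'output': 'output.j1'}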

  29. Experiment Setup • Workload: • 165 jobs, each needing 5 minutes of CPU time • Deadline: 2 hrs; budget: 396,000 G$ • Strategies: 1. Minimise cost 2. Minimise time • Execution: • Optimise Cost: 115,200 G$ (finished in 2 hrs) • Optimise Time: 237,000 G$ (finished in 1.25 hrs) • In this experiment, the time-optimised schedule cost roughly double the cost-optimised one (237,000 vs 115,200 G$). • Users can now trade off time versus cost.

  30. Resources Selected & Price/CPU-sec.

  31. Deadline and Budget Constraint (DBC) Time Minimization Scheduling • For each resource, calculate the next completion time for an assigned job, taking into account previously assigned jobs. • Sort resources by next completion time. • Assign one job to the first resource for which the cost per job is less than the remaining budget per job. • Repeat all steps until all jobs are processed. (This is performed periodically or at each scheduling-event.)
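A minimal Python sketch of this time-minimisation policy, using the same hypothetical cost_per_job and job_runtime estimates as the cost-minimisation sketch above:

def dbc_time_min(jobs, resources, budget):
    # Place each job on the resource that would finish it earliest, provided
    # its price fits within the remaining budget per job. finish_time tracks
    # work already queued on each resource.
    finish_time = {r: 0.0 for r in resources}
    assignment = {r: [] for r in resources}
    spent = 0.0
    for i, job in enumerate(jobs):
        budget_per_job = (budget - spent) / (len(jobs) - i)   # remaining budget per job
        for r in sorted(resources, key=lambda r: finish_time[r] + r.job_runtime):
            if r.cost_per_job <= budget_per_job:              # affordable here
                assignment[r].append(job)
                finish_time[r] += r.job_runtime
                spent += r.cost_per_job
                break
        else:
            raise RuntimeError("remaining budget cannot cover the remaining jobs")
    return assignment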

  32. Resource Scheduling for DBC Time Optimization

  33. Resource Scheduling for DBC Cost Optimization

  34. Nimrod-G Summary • One of the “first” and most successful Grid resource brokers worldwide! • The project continues to be active and is used in many e-Science applications. • For recent developments, please see: • http://messagelab.monash.edu.au/Nimrod

  35. Gridbus Broker “Distributed” Data-Intensive Application Scheduling

  36. Gridbus Grid Service Broker (GSB) • A Java-based resource broker for Data Grids (Nimrod-G focused on Computational Grids). • It uses the computational economy paradigm for optimal selection of computational and data services depending on their quality, cost, and availability, and on users’ QoS requirements (deadline, budget, & time/cost optimisation) • Key Features • A single window to manage & control an experiment • Programmable Task Farming Engine • Resource Discovery and Resource Trading • Optimal Data Source Discovery • Scheduling & Predictions • Generic Dispatcher & Grid Agents • Transportation of data & sharing of results • Accounting

  37. Gridbus Broker Architecture (figure): through the Gridbus user console/portal/application interface the user supplies the application, deadline (T), budget ($), and optimisation preference; the Gridbus farming engine works with the schedule advisor, trading manager, record keeper, grid dispatcher, and grid explorer; the explorer queries grid information services (GIS, NWS) and data catalogs, and the dispatcher submits the workload through core middleware to Globus-enabled nodes, data nodes, and Amazon EC2/S3 cloud resources.

  38. Gridbus Broker: separating “applications” from the “different” remote service access enablers and schedulers (figure): the application development interface and single-sign-on security sit above pluggable scheduling algorithms (Algorithm1 … AlgorithmN), plug-in actuators, and a data store; from the home node/portal the broker reaches resources via Globus (fork and batch through PBS, Condor, SGE, Aneka, XGrid), Aneka, Amazon EC2 (AMI), and an SSH job manager running the Gridbus agent, and accesses data through data catalogs, SRB, and GridFTP.

  39. Gridbus Services for eScience Applications • Application Development Environment: • XML-based language for composition of task-farming (legacy) applications as parameter sweep applications. • Task farming APIs for new applications. • Web APIs (e.g., portlets) for Grid portal development. • Threads-based programming interface. • Workflow interface and Gridbus-enabled workflow engine. • …Grid Superscalar - in cooperation with BSC/UPC • Resource Allocation and Scheduling: • Dynamic discovery of optimal computational and data nodes that meet user QoS requirements. • Hides low-level Grid middleware interfaces: • Globus (v2, v4), SRB, Aneka, Unicore, and SSH-based access to local/remote resources managed by XGrid, PBS, Condor, SGE.

  40. Drug Design Made Easy! (demo)


  42. Case Study: High Energy Physics and the Data Grid • The Belle Experiment • KEK B-Factory, Japan • Investigating a fundamental violation of symmetry in nature (charge parity), which may help explain the imbalance of matter and antimatter in the universe. • Collaboration: ~1000 people, 50 institutes • Hundreds of TB of data currently

  43. Case Study: Event Simulation and Analysis (B0 → D*+ D*- Ks) • Simulation and analysis package: Belle Analysis Software Framework (BASF) • Experiment in two parts: generation of simulated data and analysis of the distributed data • Analysed 100 data files (30 MB each) distributed among the five nodes of the Australian Belle Data Grid platform.

  44. Australian Belle Data Grid Testbed (map figure; nodes include VPAC, Melbourne).

  45. Belle Data Grid, GSP CPU service price in G$/sec (figure): node prices of G$2, G$4, G$4, and G$6, with one node marked N/A and the data node at VPAC, Melbourne.

  46. Belle Data Grid, bandwidth price in G$/MB (figure): inter-node link prices in the range G$30–38 per MB, shown alongside the same CPU prices, with the data node at VPAC, Melbourne.

  47. Deploying the Application Scenario • A Data Grid scenario with 100 jobs, each accessing ~30 MB of remote data • Deadline: 3 hrs • Budget: 60,000 G$ • Scheduling optimisation scenarios: • Minimise time • Minimise cost • Results are shown in the charts on the following slides.
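Before the measured results, a back-of-envelope feasibility check for a scenario like this can be sketched in Python as below; the per-job CPU time and both prices are hypothetical placeholders, not the actual testbed values from the preceding slides:

def estimate_cost(num_jobs, data_mb_per_job, cpu_secs_per_job,
                  cpu_price_per_sec, bw_price_per_mb):
    # Rough G$ cost if every job ran against a single priced compute/data pair.
    compute = num_jobs * cpu_secs_per_job * cpu_price_per_sec
    transfer = num_jobs * data_mb_per_job * bw_price_per_mb
    return compute + transfer

# Illustrative numbers only (60 CPU-seconds per job, G$4/sec, G$1/MB assumed).
print(estimate_cost(num_jobs=100, data_mb_per_job=30, cpu_secs_per_job=60,
                    cpu_price_per_sec=4, bw_price_per_mb=1))   # 27000, within the 60,000 G$ budget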

  48. Time Minimization in Data Grids (chart): number of jobs completed versus time in minutes (up to ~42 min) for fleagle.ph.unimelb.edu.au, belle.anu.edu.au, belle.physics.usyd.edu.au, and brecca-2.vpac.org.

  49. Results: Cost Minimization in Data Grids (chart): number of jobs completed versus time in minutes (up to ~63 min) for fleagle.ph.unimelb.edu.au, belle.anu.edu.au, belle.physics.usyd.edu.au, and brecca-2.vpac.org.

  50. Observation
