
Grid Resource Brokering and Cost-based Scheduling With Nimrod-G and Gridbus Case Studies


Presentation Transcript


  1. Grid Resource Brokering and Cost-based Scheduling With Nimrod-G and Gridbus Case Studies • Rajkumar Buyya • Cloud Computing and Distributed Systems (CLOUDS) Lab, The University of Melbourne, Melbourne, Australia • www.cloudbus.org

  2. Agenda • Introduction to Grid Scheduling • Application Models and Deployment Approaches • Economy-based “Computational” Grid Scheduling • Nimrod-G Grid Resource Broker • Scheduling Algorithms and Experiments on the World Wide Grid testbed • Economy-based “Data-Intensive” Grid Scheduling • Gridbus Grid Service Broker • Scheduling Algorithms and Experiments on the Australian Belle Data Grid testbed

  3. Grid Scheduling: Introduction

  4. Grid Resources and Scheduling (figure): a user application submits to a Grid Resource Broker, which consults a Grid Information Service and dispatches jobs to Local Resource Managers on a single CPU (time-shared allocation), an SMP (time-shared allocation), and clusters (space-shared allocation).

  5. Grid Scheduling • Grid scheduling involves: • Resources distributed over multiple administrative domains • Selecting one or more suitable resources (which may involve co-scheduling) • Assigning tasks to the selected resources and monitoring execution • Grid schedulers are global schedulers: • They have no ownership or control over resources • Jobs are submitted to local resource managers (LRMs) as a user • LRMs take care of the actual execution of jobs

  6. Example Grid Schedulers • Nimrod-G – Monash University • Computational Grid & economy-based • Condor-G – University of Wisconsin • Computational Grid & system-centric • AppLeS – University of California, San Diego • Computational Grid & system-centric • Gridbus Broker – University of Melbourne • Data Grid & economy-based

  7. Key Steps in Grid Scheduling • Phase I - Resource Discovery: 1. Authorization Filtering, 2. Application Definition, 3. Minimum Requirement Filtering • Phase II - Resource Selection: 4. Information Gathering, 5. System Selection • Phase III - Job Execution: 6. Advance Reservation, 7. Job Submission, 8. Preparation Tasks, 9. Monitoring Progress, 10. Job Completion, 11. Clean-up Tasks • Source: J. Schopf, Ten Actions When SuperScheduling, OGF Document, 2003.

  8. Movement of Jobs Between the Scheduler and a Resource • Push Model • The manager pushes jobs from its queue to a resource • Used in clusters and Grids • Pull Model • A P2P agent requests a job from the job pool for processing • Commonly used in P2P systems such as Alchemi and SETI@Home • Hybrid Model (both push and pull) • The broker deploys an agent on each resource, and the agent pulls jobs from the broker (see the sketch below) • May be used in Grids (e.g., the Nimrod-G system) • The broker also pulls data from the user host or a separate data host (distributed datasets) (e.g., Gridbus Broker)
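To make the pull model concrete, here is a minimal Python sketch of an agent loop that keeps requesting work from a job pool and reports results back; fetch_job, run, and report_result are hypothetical placeholders, not the real Alchemi, SETI@Home, or Nimrod agent interfaces:

import time

POLL_INTERVAL = 30  # seconds to wait when the pool currently has no work

def agent_loop(job_pool, resource):
    # Pull-model worker: repeatedly ask the pool for a job and return results.
    # job_pool.fetch_job/report_result and resource.run are placeholders only.
    while True:
        job = job_pool.fetch_job()           # ask the broker/pool for work
        if job is None:                      # nothing available right now
            time.sleep(POLL_INTERVAL)
            continue
        result = resource.run(job)           # execute the job locally
        job_pool.report_result(job, result)  # send the result back to the pool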

  9. Example Systems

  10. Application Models and their Deployment on Global Grids

  11. Grid Applications and Parametric Computing • Bioinformatics: Drug Design / Protein Modelling • Natural Language Engineering • Ecological Modelling: Control Strategies for Cattle Tick • Sensitivity experiments on smog formation • Data Mining • Electronic CAD: Field Programmable Gate Arrays • High Energy Physics: Searching for Rare Events • Computer Graphics: Ray Tracing • Finance: Investment Risk Analysis • VLSI Design: SPICE Simulations • Civil Engineering: Building Design • Network Simulation • Automobile: Crash Simulation • Aerospace: Wing Design • Astrophysics

  12. How to Construct and Deploy Applications on Global Grids? • Three options/solutions: • Manual scheduling - use pure Globus commands • Application-level scheduling - build your own distributed application and scheduler • Application-independent scheduling - Grid brokers • Decouple application construction from scheduling • Perform a parameter sweep (bag of tasks), utilising distributed resources, within “T” hours or earlier, at a cost not exceeding $M.

  13. Using Pure Globus Commands • Do everything yourself, manually! • Total Cost: $???

  14. Build a Distributed Application & Application-Level Scheduler • Build the application and scheduler on a case-by-case basis (e.g., the MPI approach) • Total Cost: $???

  15. Compose and Deploy using Brokers - the Nimrod-G and Gridbus Approach • Compose applications and submit them to the broker • Define QoS requirements • Aggregate view • Compose, Submit & Play!

  16. The Nimrod-G Grid Resource Broker and Economy-based Grid Scheduling [Buyya, Abramson, Giddy, 1999-2001] Deadline and Budget Constrained Algorithms for Scheduling Applications on “Computational” Grids

  17. Nimrod-G: A Grid Resource Broker • A resource broker (implemented in Python) for managing, steering, and executing task-farming (parameter sweep) applications on global Grids. • It allows dynamic leasing of resources at runtime based on their quality, cost, and availability, and on users’ QoS requirements (deadline, budget, etc.) • Key Features • A declarative parameter programming language • A single window to manage & control an experiment • Persistent and Programmable Task Farming Engine • Resource Discovery • Resource Trading • (User-Level) Scheduling & Predictions • Generic Dispatcher & Grid Agents • Transportation of data & results • Steering & data management • Accounting

  18. A Glance at the Nimrod-G Broker (architecture figure): Nimrod/G clients connect to the Nimrod/G engine, which works with the schedule advisor, trading manager, grid store, and grid dispatcher; the grid explorer queries Grid Information Server(s), and the dispatcher submits jobs through Grid middleware (Globus, Legion, Condor, etc.) to Globus-, Legion-, and Condor-enabled nodes, each running a local Resource Manager (RM) and Trade Server (TS). See the HPC Asia 2000 paper!

  19. Nimrod/G Grid Broker Architecture (layered figure): Nimrod-G clients (legacy applications, customised apps such as Active Sheet, monitoring and steering portals, and P-Tools for GUI/scripting and parameter modelling) sit above the Nimrod-G broker, whose farming engine, meta-scheduler with pluggable algorithms (Algorithm1 … AlgorithmN), schedule advisor, trading manager, grid explorer, job server, database, and dispatcher/actuators manage programmable entities (resources, jobs, tasks, channels); middleware actuators (Globus, Legion, Condor, P2P) and services (GMD, GTS, G-Bank) connect to the fabric of computers (PCs/workstations/clusters), local schedulers (Condor/LL/NQS), storage, networks, databases, and instruments such as a radio telescope. The broker forms the narrow waist of an “IP hourglass”-style stack.

  20. A Nimrod/G Monitor (screenshot): shows the cost and deadline settings and the status of Legion and Globus hosts; the host bezek appears in both the Globus and Legion domains.

  21. User Requirements: Deadline/Budget

  22. Nimrod/G Interactions (figure): on the user node, Grid tools and applications drive the Nimrod-G Grid broker (task farming engine, grid scheduler, grid dispatcher), which queries the Grid Information Server and negotiates with the Grid Trade Server (“Do this in 30 min. for $10?”); on each Grid compute node, a Nimrod agent launched via the local resource manager and process server runs the user process, with file access back to the file server on the user node.

  23. Adaptive Scheduling Steps (flowchart): Discover Resources, Establish Rates, Compose & Schedule, Distribute Jobs, then Evaluate & Reschedule; if the requirements are not met given the remaining jobs, deadline, and budget, Discover More Resources and repeat.
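A minimal Python sketch of this adaptive loop follows; every broker.* call is a hypothetical placeholder for the step named in the flowchart above, not Nimrod-G's actual interface:

def adaptive_schedule(jobs, deadline, budget, broker):
    # Repeat the discover -> price -> schedule -> distribute -> evaluate cycle
    # until no jobs remain; look for more resources whenever requirements slip.
    resources = broker.discover_resources()
    while jobs:
        rates = broker.establish_rates(resources)      # trade servers quote prices
        plan = broker.compose_schedule(jobs, resources, rates, deadline, budget)
        broker.distribute_jobs(plan)
        jobs = broker.evaluate_and_reschedule(plan)    # returns the remaining jobs
        if jobs and not broker.meets_requirements(jobs, deadline, budget):
            resources = broker.discover_resources()    # discover more resources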

  24. Deadline and Budget Constrained Scheduling Algorithms

  25. Deadline and Budget-based Cost Minimization Scheduling • Sort resources by increasing cost. • For each resource in order, assign as many jobs as possible to the resource, without exceeding the deadline. • Repeat all steps until all jobs are processed.
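A minimal Python sketch of this cost-minimisation policy, assuming each resource carries hypothetical cost_per_job and job_runtime estimates; in practice the broker re-runs this at every scheduling event with refreshed estimates:

def dbc_cost_min(jobs, resources, deadline):
    # Cheapest resources first; each takes as many jobs as it can finish
    # before the deadline. cost_per_job and job_runtime are illustrative
    # per-resource estimates, not real broker attributes.
    assignment = {r: [] for r in resources}
    pending = list(jobs)
    for r in sorted(resources, key=lambda r: r.cost_per_job):
        capacity = int(deadline // r.job_runtime)   # jobs finishable by the deadline
        assignment[r], pending = pending[:capacity], pending[capacity:]
        if not pending:
            break
    return assignment, pending   # leftover pending jobs mean the deadline is infeasible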

  26. Scheduling Algorithms and Experiments

  27. World Wide Grid (WWG) testbed: Australia (Melbourne U. cluster, VPAC Alpha; Nimrod-G + Gridbus, Globus + Legion, GRACE_TS), North America (ANL SGI/Sun/SP2, USC-ISI SGI, UVa Linux cluster, UD Linux cluster, UTK Linux cluster, UCSD Linux PCs, BU SGI IRIX), South America (Chile cluster), Europe (ZIB T3E/Onyx, AEI Onyx, Paderborn HPCLine, Lecce Compaq SC, CNR cluster, Calabria cluster, CERN cluster, CUNI/CZ Onyx, Poznan SGI/SP2, Vrije U. cluster, Cardiff Sun E6500, Portsmouth Linux PC, Manchester O3K), and Asia (Tokyo I-Tech Ultra WS, AIST Japan Solaris cluster, Kasetsart Thailand cluster, NUS Singapore O2K), interconnected over the Internet and running Globus/Legion with GRACE trade services.

  28. Application Composition Using the Nimrod Parameter Specification Language

#Parameters Declaration
parameter X integer range from 1 to 165 step 1;
parameter Y integer default 5;

#Task Definition
task main
  #Copy necessary executables depending on node type
  copy calc.$OS node:calc
  #Execute program with parameter values on remote node
  node:execute ./calc $X $Y
  #Copy results file to user home node with jobname as extension
  copy node:output ./output.$jobname
endtask

This plan generates 165 jobs: calc 1 5 → output.j1, calc 2 5 → output.j2, calc 3 5 → output.j3, …, calc 165 5 → output.j165.
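For illustration only, the 165-job list that the farming engine derives from this plan file (X swept from 1 to 165, Y fixed at 5) can be reproduced with a simple cross-product expansion in Python; this stands in for what the declarative language describes and is not Nimrod's actual parser:

from itertools import product

x_values = range(1, 166)   # parameter X integer range from 1 to 165 step 1
y_values = [5]             # parameter Y integer default 5

jobs = [
    {"jobname": f"j{i}", "command": f"./calc {x} {y}", "output": f"output.j{i}"}
    for i, (x, y) in enumerate(product(x_values, y_values), start=1)
]

print(len(jobs))   # 165 independent jobs
print(jobs[0])     # {'jobname': 'j1', 'command': './calc 1 5', 'output': 'output.j1'}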

  29. Experiment Setup • Workload: • 165 jobs, each needing 5 minutes of CPU time • Deadline: 2 hrs; budget: 396,000 G$ • Strategies: 1. Minimise cost 2. Minimise time • Execution: • Optimise Cost: 115,200 G$ (finished in 2 hrs) • Optimise Time: 237,000 G$ (finished in 1.25 hrs) • In this experiment, the time-optimised schedule cost roughly double the cost-optimised one (237,000 vs 115,200 G$). • Users can now trade off time versus cost.

  30. Resources Selected & Price/CPU-sec.

  31. Deadline and Budget Constraint (DBC) Time Minimization Scheduling • For each resource, calculate the next completion time for an assigned job, taking into account previously assigned jobs. • Sort resources by next completion time. • Assign one job to the first resource for which the cost per job is less than the remaining budget per job. • Repeat all steps until all jobs are processed. (This is performed periodically or at each scheduling-event.)
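A minimal Python sketch of this time-minimisation policy, using the same hypothetical cost_per_job and job_runtime estimates as the cost-minimisation sketch above:

def dbc_time_min(jobs, resources, budget):
    # Place each job on the resource that would finish it earliest, provided
    # its price fits within the remaining budget per job. finish_time tracks
    # work already queued on each resource.
    finish_time = {r: 0.0 for r in resources}
    assignment = {r: [] for r in resources}
    spent = 0.0
    for i, job in enumerate(jobs):
        budget_per_job = (budget - spent) / (len(jobs) - i)   # remaining budget per job
        for r in sorted(resources, key=lambda r: finish_time[r] + r.job_runtime):
            if r.cost_per_job <= budget_per_job:              # affordable here
                assignment[r].append(job)
                finish_time[r] += r.job_runtime
                spent += r.cost_per_job
                break
        else:
            raise RuntimeError("remaining budget cannot cover the remaining jobs")
    return assignment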

  32. Resource Scheduling for DBC Time Optimization

  33. Resource Scheduling for DBC Cost Optimization

  34. Nimrod-G Summary • One of the “first” and most successful Grid resource brokers worldwide! • The project continues to be active and is used in many e-Science applications. • For recent developments, please see: • http://messagelab.monash.edu.au/Nimrod

  35. Gridbus Broker “Distributed” Data-Intensive Application Scheduling

  36. Gridbus Grid Service Broker (GSB) • A Java-based resource broker for Data Grids (Nimrod-G focused on Computational Grids). • It uses the computational economy paradigm for optimal selection of computational and data services depending on their quality, cost, and availability, and on users’ QoS requirements (deadline, budget, & time/cost optimisation) • Key Features • A single window to manage & control an experiment • Programmable Task Farming Engine • Resource Discovery and Resource Trading • Optimal Data Source Discovery • Scheduling & Predictions • Generic Dispatcher & Grid Agents • Transportation of data & sharing of results • Accounting

  37. Gridbus Broker Architecture (figure): through the Gridbus user console/portal/application interface the user supplies the application, deadline (T), budget ($), and optimisation preference; the Gridbus farming engine works with the schedule advisor, trading manager, record keeper, grid dispatcher, and grid explorer; the explorer queries grid information services (GIS, NWS) and data catalogs, and the dispatcher submits the workload through core middleware to Globus-enabled nodes, data nodes, and Amazon EC2/S3 cloud resources.

  38. Gridbus Broker: separating “applications” from the “different” remote service access enablers and schedulers (figure): the application development interface and single-sign-on security sit above pluggable scheduling algorithms (Algorithm1 … AlgorithmN), plug-in actuators, and a data store; from the home node/portal the broker reaches resources via Globus (fork and batch through PBS, Condor, SGE, Aneka, XGrid), Aneka, Amazon EC2 (AMI), and an SSH job manager running the Gridbus agent, and accesses data through data catalogs, SRB, and GridFTP.

  39. Gridbus Services for eScience Applications • Application Development Environment: • XML-based language for composition of task-farming (legacy) applications as parameter sweep applications. • Task farming APIs for new applications. • Web APIs (e.g., portlets) for Grid portal development. • Threads-based programming interface. • Workflow interface and Gridbus-enabled workflow engine. • …Grid Superscalar - in cooperation with BSC/UPC • Resource Allocation and Scheduling: • Dynamic discovery of optimal computational and data nodes that meet user QoS requirements. • Hides low-level Grid middleware interfaces: • Globus (v2, v4), SRB, Aneka, Unicore, and SSH-based access to local/remote resources managed by XGrid, PBS, Condor, SGE.

  40. Drug Design Made Easy! (demo)


  42. Case Study: High Energy Physics and the Data Grid • The Belle Experiment • KEK B-Factory, Japan • Investigating a fundamental violation of symmetry in nature (charge parity), which may help explain the imbalance of matter and antimatter in the universe. • Collaboration: ~1000 people, 50 institutes • Hundreds of TB of data currently

  43. Case Study: Event Simulation and Analysis (B0 → D*+ D*- Ks) • Simulation and analysis package: Belle Analysis Software Framework (BASF) • Experiment in two parts: generation of simulated data and analysis of the distributed data • Analysed 100 data files (30 MB each) distributed among the five nodes of the Australian Belle Data Grid platform.

  44. Australian Belle Data Grid Testbed (map figure; nodes include VPAC, Melbourne).

  45. Belle Data Grid, GSP CPU service price in G$/sec (figure): node prices of G$2, G$4, G$4, and G$6, with one node marked N/A and the data node at VPAC, Melbourne.

  46. Belle Data Grid, bandwidth price in G$/MB (figure): inter-node link prices in the range G$30–38 per MB, shown alongside the same CPU prices, with the data node at VPAC, Melbourne.

  47. Deploying the Application Scenario • A Data Grid scenario with 100 jobs, each accessing ~30 MB of remote data • Deadline: 3 hrs • Budget: 60,000 G$ • Scheduling optimisation scenarios: • Minimise time • Minimise cost • Results are shown in the charts on the following slides.
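Before the measured results, a back-of-envelope feasibility check for a scenario like this can be sketched in Python as below; the per-job CPU time and both prices are hypothetical placeholders, not the actual testbed values from the preceding slides:

def estimate_cost(num_jobs, data_mb_per_job, cpu_secs_per_job,
                  cpu_price_per_sec, bw_price_per_mb):
    # Rough G$ cost if every job ran against a single priced compute/data pair.
    compute = num_jobs * cpu_secs_per_job * cpu_price_per_sec
    transfer = num_jobs * data_mb_per_job * bw_price_per_mb
    return compute + transfer

# Illustrative numbers only (60 CPU-seconds per job, G$4/sec, G$1/MB assumed).
print(estimate_cost(num_jobs=100, data_mb_per_job=30, cpu_secs_per_job=60,
                    cpu_price_per_sec=4, bw_price_per_mb=1))   # 27000, within the 60,000 G$ budget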

  48. Time Minimization in Data Grids (chart): number of jobs completed versus time in minutes (up to ~42 min) for fleagle.ph.unimelb.edu.au, belle.anu.edu.au, belle.physics.usyd.edu.au, and brecca-2.vpac.org.

  49. Results: Cost Minimization in Data Grids (chart): number of jobs completed versus time in minutes (up to ~63 min) for fleagle.ph.unimelb.edu.au, belle.anu.edu.au, belle.physics.usyd.edu.au, and brecca-2.vpac.org.

  50. Observation
