Generalized Resource Management In Computational Grids

Carl Kesselman
Information Sciences Institute, University of Southern California
http://www.globus.org/

Acknowledgements
• Presentation is on work in progress
• Joint work with Ian Foster
• Contributions by:
  • Steve Tuecke, Alain Roy (ANL)
  • Soonwook Hwang, Bob Lindell (ISI)
  • Craig Lee, James Stepanek (The Aerospace Corporation)

Computational Grids
• Assemble distributed resources ...
  • High-end computers
  • Information sources
  • Scientific instruments, etc.
• ... and apply to challenging problems
  • Smart instruments
  • Collaborative engineering
  • Data mining

Example: Real-time Microtomography (APS beamline at Argonne)
• Resource location
  • “10 Gflop/sec, 20 Mb/sec, 10 minutes; rendering, 10 GB storage”
• Resource allocation
• Configuration
• Parallel computation
• Remote I/O

Resource Management in Grids
• Resources include:
  • computers, networks, data, people, ...
• Problem: How do we manage heterogeneous collections of distributed high-performance resources?
  • Locating resources
  • Allocating resources
  • Authentication and access control
  • Activities to prepare a resource for use

Why Is It Hard?
• Site autonomy
  • No control over local administration
• Heterogeneous substrate
  • Many different platforms
• Policy extensibility
  • Application-specific allocation requirements
• Co-allocation
  • Simultaneous access to resources
• Online control
  • Must access resources from applications

Resource Allocation
• Interact with local allocation systems
  • LoadLeveler, NQE, LSF, etc.
• Coordinate allocation across multiple domains
• Control resulting resource allocation
  • status, terminate, etc.
• Must deal with unavailability of resources
  • no guarantees

Initial Approach
• Local resource managers
  • Site autonomy, heterogeneous substrate
• Resource specification language
  • Online control, policy extensibility
• Resource brokers
  • Map high-level requests into local requests
• Resource co-allocators
  • Co-allocation

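Resource Management Architecture (diagram)
• Application submits RSL
• Broker: RSL specialization, using queries & info from the Information Service
• Ground RSL goes to the co-allocator
• Co-allocator sends simple ground RSL to the local resource managers
• Local resource managers: one GRAM each in front of LSF, EASY-LL, NQE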
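The broker's job in this architecture, turning a high-level request into concrete per-site ("ground") requests, can be illustrated with a minimal sketch. This is not Globus code: `ResourceInfo`, `specialize`, the dictionary-shaped requests (standing in for RSL), and the greedy fastest-first policy are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceInfo:
    """One entry as a hypothetical information service might report it."""
    contact: str            # name of the local resource manager (illustrative)
    free_nodes: int         # nodes currently available
    gflops_per_node: float  # advertised per-node speed


def specialize(required_gflops, resources):
    """Map a high-level request onto per-site ground requests.

    Greedy policy (an assumption, not the broker's real policy): take nodes
    from the fastest sites first until the aggregate requirement is met.
    Raises LookupError if the known resources cannot satisfy the request.
    """
    ground = []
    remaining = required_gflops
    for res in sorted(resources, key=lambda r: r.gflops_per_node, reverse=True):
        if remaining <= 0:
            break
        if res.free_nodes == 0:
            continue
        needed = math.ceil(remaining / res.gflops_per_node)
        count = min(needed, res.free_nodes)
        ground.append({"contact": res.contact, "count": count})
        remaining -= count * res.gflops_per_node
    if remaining > 0:
        raise LookupError("known resources cannot satisfy the request")
    return ground
```

A co-allocator would then hand each ground request to the matching local resource manager. A real broker would also weigh policy, cost, and network constraints, which this sketch ignores.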

Local Resource Management (diagram)
• GRAM client
  • MDS client API calls to locate resources
  • GRAM client API calls to request resource allocation and process creation
• GRAM reporter
  • Query current status of resource
  • Update MDS with resource state information
• Inside the site boundary
  • Gatekeeper: authentication (Globus Security Infrastructure), create job manager
  • Job manager: parse request (RSL library), monitor & control processes
  • Local resource manager: allocate & create processes

Limitations
• Focus on resource allocation, not scheduling
  • necessary but not sufficient
• Cumbersome support for introducing quality of service constraints
  • RSL extensions
• Difficult to support advanced reservation
  • needed for effective co-allocation

Resource Scheduling
• Traditionally allocations have been “best effort”
  • IP networks, time-sharing CPU schedulers, queueing systems
• Not sufficient to support advanced applications
• Advanced reservation essential for effective co-allocation
• Integration of quality of service into resource management architecture
• Quality of service concerns

Extending the Architecture
• Support end-to-end management of networks, computers, memory, disks, etc.
  • Advance reservations, QoS, adaptation, etc.
• Integrate diverse approaches
  • Current Globus CPU scheduling
  • Klara Nahrstedt’s work on CPU/ATM scheduling
  • Work on RSVP signaling (Qualis)
  • Differentiated services (Clipper)
• Proposed new approach
  • Enhance Globus RM architecture

Generalized GRAMs + Reservations
• Enhance scope of Globus RM elements
  • Use “GRAMs” to control network, memory, etc.
  • Brokers for networks, computers, etc.
  • Treat end-to-end management as co-allocation
• Separate concepts of reservation and creation
  • Reservation as explicit abstraction in architecture
  • Reservation specifies when and how much
  • Reservation doesn't guarantee allocation success

Advanced Reservations
• Required for resource allocation request
  • default “best-effort” reservation preserves current behavior
• Specifies when and how much
  • when: start time and duration, either of which can be unknown
  • how much: initially focused on fractional resources such as CPU cycles or bandwidth
• RSL used to express reservations
• Reservation request produces a handle which can be passed around and reused
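To make "when and how much" concrete, here is a minimal sketch of an admission check for a fractional resource such as a CPU or a network link. It is an illustration under stated assumptions, not the Globus design: the `Reservation` record, `peak_usage`, `admit`, and the fixed `capacity` are hypothetical, and reservations with unknown start time or duration (the best-effort case) are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    start: float     # when: start time, in seconds from an arbitrary epoch
    duration: float  # when: length of the reservation, in seconds
    amount: float    # how much: fraction of the resource, e.g. 0.5 of a CPU

    @property
    def end(self):
        return self.start + self.duration


def peak_usage(existing, start, end):
    """Largest total amount already reserved at any instant in [start, end)."""
    # Total usage can only rise at the start of some reservation, so it is
    # enough to sample the window start and every start inside the window.
    points = [start] + [r.start for r in existing if start < r.start < end]
    return max(
        sum(r.amount for r in existing if r.start <= t < r.end)
        for t in points
    )


def admit(existing, request, capacity=1.0):
    """Accept the request only if capacity holds for its whole interval."""
    peak = peak_usage(existing, request.start, request.end)
    return peak + request.amount <= capacity
```

A resource manager that accepts a request would add it to `existing` and return a handle for it. Note that admission here only records intent; as the slides point out, a reservation by itself does not guarantee that a later allocation succeeds.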

Allocation With Reservations
• Allocation can be for range of resources
  • flow, thread, process, etc.
• Reservation provided with allocation request
• Reservation can be changed for established allocation
• Object can be destroyed without destroying reservation
• RSL used to specify objects

API Overview
• create_reservation
  • Maps RSL to reservation handle
• create_object
  • Maps RSL and reservation handle to process, flow, etc.
• modify_reservation
  • alters reservation associated with an object
• callback interface
  • monitors state of reservation and object
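The four calls listed above can be sketched as a toy in-memory manager. Only the names `create_reservation`, `create_object`, and `modify_reservation` come from the slides; the class, the signatures, `register_callback`, `destroy_object`, the event names, and the use of plain dictionaries in place of RSL are assumptions for illustration.

```python
import itertools


class ReservationManager:
    """Toy stand-in for a generalized resource manager (not the Globus API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.reservations = {}  # reservation handle -> specification dict
        self.objects = {}       # object handle -> {"spec", "reservation"}
        self._callbacks = []

    def register_callback(self, fn):
        """Callback interface: fn(event, handle) is called on state changes."""
        self._callbacks.append(fn)

    def _notify(self, event, handle):
        for fn in self._callbacks:
            fn(event, handle)

    def create_reservation(self, spec):
        """Map a specification (a dict standing in for RSL) to a handle."""
        handle = f"res-{next(self._ids)}"
        self.reservations[handle] = dict(spec)
        self._notify("reservation_created", handle)
        return handle

    def create_object(self, spec, reservation):
        """Map a specification plus reservation handle to an object handle."""
        if reservation not in self.reservations:
            raise KeyError(f"unknown reservation: {reservation}")
        handle = f"obj-{next(self._ids)}"
        self.objects[handle] = {"spec": dict(spec), "reservation": reservation}
        self._notify("object_created", handle)
        return handle

    def modify_reservation(self, obj, changes):
        """Alter the reservation associated with an existing object."""
        reservation = self.objects[obj]["reservation"]
        self.reservations[reservation].update(changes)
        self._notify("reservation_modified", reservation)

    def destroy_object(self, obj):
        """Destroy the object; its reservation deliberately survives."""
        del self.objects[obj]
        self._notify("object_destroyed", obj)
```

Keeping reservations and objects in separate tables reflects the separation of reservation from creation: a handle can be passed around, reused for a new object, or modified while the object keeps running.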

Example: Online Data Analysis (diagram)
• Requirements go to the broker(s), which use the Information Service to find candidate resources
• Reservation co-allocator: requests {net a: 100 Mb/s, MPP 1: 40 nodes, net b: 30 Mb/s, CPU: 0.5} and yields {ResHandles}
• Object creation co-allocator: yields {ObjHandles}
• Online monitor: Modify Reservation
• Resources shown, each behind a generalized GRAM (G): MPP1, MPP2, networks a, b, c, d
• Other figure labels: 40 nodes, exclusive; 100 Mb/s; 50 Mb/s; 30 Mb/s; 0.5 CPU
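The reservation co-allocation step in this example can be sketched as an all-or-nothing operation: either every resource in the request is reserved, or none is. The request values below are the ones in the figure; the `Resource` class, `co_reserve`, and the rollback strategy are assumptions for illustration and ignore time intervals entirely.

```python
class Resource:
    """Toy resource with a fixed capacity (nodes, Mb/s, or CPU fraction)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.reserved = 0.0

    def reserve(self, amount):
        if self.reserved + amount > self.capacity:
            raise RuntimeError(f"{self.name}: insufficient capacity")
        self.reserved += amount

    def release(self, amount):
        self.reserved -= amount


def co_reserve(resources, request):
    """Reserve every item in the request, or nothing at all.

    Returns a list of (name, amount) pairs standing in for reservation
    handles, or None if any single reservation failed.
    """
    acquired = []
    try:
        for name, amount in request.items():
            resources[name].reserve(amount)
            acquired.append((name, amount))
    except (KeyError, RuntimeError):
        # Roll back partial progress so no resource stays reserved in vain.
        for name, amount in reversed(acquired):
            resources[name].release(amount)
        return None
    return acquired


# Request taken from the figure; units differ per resource.
REQUEST = {"net a": 100, "MPP 1": 40, "net b": 30, "CPU": 0.5}
```

Rollback on failure is one simple strategy. A co-allocator could instead ask a broker for substitute resources, which is one of the brokering techniques the slides list as an open issue.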

Generalized GRAM Architecture (diagram)
• Components: GRAM client, MDS, GRAM reporter, auth/map server, resv. GRAM, object GRAM, object manager, objects, resource manager, resv. manager, policy manager
• Client side
  • MDS client API calls to locate resources
  • GRAM client API calls request reservation and object creation
• GRAM reporter
  • update MDS with resource state information
• Labelled interactions inside the site boundary
  • authenticate
  • create reservation, delete reservation, or register reservation, delete reservation
  • read reservation list, write reservation list
  • check policy
  • read object records, write object records
  • create object manager
  • create & monitor object
  • delete object, modify reservation
• Other figure labels: root, globus, user; API object creation; file operations

Issues
• Open versus closed systems
  • we may not control all access to resources
• Limited support for advanced reservation on current platforms
  • may have to provide reservation support as part of system
• Preemption and failure
  • notification mechanism needed
• Reservation brokering and co-allocation techniques

Summary
• Advanced reservation critical for computational grid applications
• Existing Globus resource management architecture can be extended to include reservation
• Power of brokers, RSL and GRAMs can be applied to reservations as well as allocations
  • addresses end-to-end problem
• More detailed design in progress
