
Resource Management


allie



  1. Resource Management Reading: “A Resource Management Architecture for Metacomputing Systems”

  2. What is Resource Management? • Mechanisms for locating and allocating computational resources • Authentication • Process creation • Remote job submission • Scheduling • Other resources that can be managed: • Memory • Disk • Networks

  3. Resource Management Issues for Grid Computing • Site autonomy • Resources owned by different organizations, in different administrative domains • Local policies for use, scheduling, security • Heterogeneous substrate • Different local resource management systems • Policy extensibility • Local sites need ability to customize their resource management policies

  4. More Issues for Grid Computing • Co-allocation • May need resources at several sites • Mechanism for allocating multiple resources, initiating computation, monitoring and managing • On-line control • Adapt application requirements to resource availability

  5. Specifying Resource and Job Requirements • Resource requirements: • Machine type • Number of nodes • Memory • Network • Job or scheduler parameters: • Directory • Executable • Arguments • Environment • Maximum time required

  6. Resource and Job Specification • Globus: Resource Specification Language (RSL) • &(executable=myprog) (|(&(count=5)(memory>=64)) (&(count=10)(memory>=32))) • Condor: ClassAds (classified advertisements) • Resource owners advertise abilities and constraints • Applications advertise resource requests • Matchmaking: match offers and requests
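The matchmaking idea on this slide can be sketched in a few lines of Python. This is a simplified stand-in, not Condor's actual ad language: the dictionary fields, the `requirements` predicates, and the function names are all hypothetical. Each side advertises a predicate over the other side's ad, and a match requires both predicates to hold.

```python
# Hypothetical, simplified stand-in for Condor-style matchmaking.
# Ads are plain dicts; "requirements" is a predicate over the other party's ad.

def matches(offer, request):
    """Two-way match: each side's requirements must accept the other's ad."""
    return request["requirements"](offer) and offer["requirements"](request)

def matchmake(offers, requests):
    """Pair each request with the first still-free offer that matches both ways."""
    pairs = []
    free = list(offers)
    for req in requests:
        for off in free:
            if matches(off, req):
                pairs.append((req["name"], off["name"]))
                free.remove(off)  # an offer serves at most one request
                break
    return pairs

offers = [
    # Owner constraint: nodeA refuses jobs owned by "guest".
    {"name": "nodeA", "memory": 32, "requirements": lambda job: job["owner"] != "guest"},
    {"name": "nodeB", "memory": 128, "requirements": lambda job: True},
]
requests = [
    {"name": "job1", "owner": "guest", "requirements": lambda m: m["memory"] >= 64},
    {"name": "job2", "owner": "alice", "requirements": lambda m: m["memory"] >= 16},
]

pairs = matchmake(offers, requests)
```

The first-fit pairing is chosen only for brevity; a real matchmaker would typically also rank candidate matches.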

  7. Components of Globus Resource Management Architecture • Resource specification using RSL • Resource brokers: translate resource requirements into specifications • Co-allocators: break down requests for multiple sites • Local resource managers: apply local, site-specific resource management policies • Information about available compute resources and their characteristics

  8. Resource Specification Language • Common notation for exchange of information between components • API provided for manipulating RSL

  9. RSL Syntax • Elementary form: parenthesized clauses • (attribute op value [ value … ] ) • Operators supported: • <, <=, =, >=, >, != • Some supported attributes: • executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName • Unknown attributes are passed through • May be handled by subsequent tools
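As an illustration of the elementary clause form, the sketch below splits a single clause into attribute, operator, and values. It is a toy parser written for this note, not the Globus RSL API: it assumes unquoted, whitespace-separated values and does not handle nesting, quoting, or variable substitution. The names `CLAUSE` and `parse_clause` are hypothetical.

```python
import re

# Operators from the slide. Two-character operators are listed first so that
# "<=" is not read as "<" followed by "=".
CLAUSE = re.compile(r"^\(\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*(.*?)\s*\)$")

def parse_clause(text):
    """Split '(attribute op value ...)' into (attribute, op, [values])."""
    m = CLAUSE.match(text.strip())
    if m is None:
        raise ValueError(f"not an elementary RSL clause: {text!r}")
    attribute, op, rest = m.groups()
    return attribute, op, rest.split()

clause = parse_clause("(memory>=64)")
args = parse_clause("(arguments = -v input.dat)")
```

Values are returned as strings; converting them to numbers is left to whichever tool interprets the attribute, which fits the slide's point that unknown attributes are passed through.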

  10. Constraints: “&” • For example: & (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog) • “Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”
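To make the meaning of "&" concrete, here is a minimal sketch that checks the slide's conjunction against a machine description. The request is written as already-parsed triples, and the `machine` dictionaries are hypothetical. Attributes the machine does not describe (such as `count` or `executable`) are skipped here, on the assumption that another component interprets them.

```python
import operator

# RSL relational operators mapped to Python comparison functions.
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt, "!=": operator.ne}

# The slide's request as (attribute, op, value) triples.
request = [("count", ">=", 5), ("count", "<=", 10), ("max_time", "=", 240),
           ("memory", ">=", 64), ("executable", "=", "myprog")]

def satisfies(machine, constraints):
    """True if every constraint on an attribute the machine describes holds."""
    for attribute, op, value in constraints:
        if attribute not in machine:
            continue  # not a machine property; left for other tools
        if not OPS[op](machine[attribute], value):
            return False
    return True

big = satisfies({"memory": 128}, request)
small = satisfies({"memory": 32}, request)
```

A machine with 128 MB passes the `memory>=64` clause, while one with 32 MB is rejected; the remaining clauses do not affect this particular check.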

  11. Multirequest: “+” • A multirequest allows us to specify multiple resource needs, for example: + (&(count=5)(memory>=64)(executable=p1)) (&(network=atm)(executable=p2)) • Execute 5 instances of p1 on a machine with at least 64 MB of memory • Execute p2 on a machine with an ATM connection • Multirequests are central to co-allocation
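The way a multirequest decomposes into independent subrequests can be sketched as follows. The nested tuple representation and both function names are hypothetical, chosen only to show the split; they are not part of any Globus library.

```python
# Hypothetical in-memory form of the slide's multirequest:
# ("+", [ ("&", [relation, ...]), ... ]) with relations as (attribute, op, value).
multirequest = ("+", [
    ("&", [("count", "=", 5), ("memory", ">=", 64), ("executable", "=", "p1")]),
    ("&", [("network", "=", "atm"), ("executable", "=", "p2")]),
])

def relation_to_rsl(relation):
    """Render one (attribute, op, value) triple as an RSL clause."""
    attribute, op, value = relation
    return f"({attribute}{op}{value})"

def split_multirequest(node):
    """Return one RSL string per component of a '+' request."""
    kind, components = node
    if kind != "+":
        raise ValueError("expected a multirequest")
    parts = []
    for sub_kind, relations in components:
        parts.append(sub_kind + "".join(relation_to_rsl(r) for r in relations))
    return parts

parts = split_multirequest(multirequest)
```

Each returned string is a self-contained conjunction that could be handed to a separate resource manager, which is the sense in which multirequests underpin co-allocation.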

  12. Resource Broker • Takes high-level RSL specification • Transforms into concrete specifications through “specialization” process • Locate resources that meet requirements • Multiple brokers may service single request • Application-specific brokers translate application requirements • Output: complete specification of locations of resources; given to co-allocator

  13. Examples of Resource Brokers • Nimrod-G • Automates creation and management of large parametric experiments • Run application under wide range of input conditions and aggregate results • Queries MDS to find resources • Generates number of independent jobs • GRAM allocates jobs to computational nodes • Higher-level broker: allows user to specify time and cost constraints

  14. Examples of Resource Brokers • AppLeS • Application Level Scheduler • Map large number of independent tasks to dynamically varying pool of available computers • Use GRAM to locate resources and initiate and manage computation

  15. Resource co-allocators • May request resources at multiple sites • Two or more computers and networks • Break multi-request into components • Pass each component to resource manager • Provide means for monitoring job status or terminating job • Complex: • Two or more resource managers • Global state like availability of resources difficult to determine

  16. Different co-allocation services • Require all resources to be available before job proceeds; fail globally if failure occurs at any resource • Allocate at least N out of M resources and return • Return immediately, but gradually return more resources as they become available • Each useful for some class of applications
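The three service styles listed above differ only in how they react to per-resource allocation outcomes. The sketch below models each as a small function over hypothetical outcome data (site names mapped to success flags); it illustrates the policies and is not code from any co-allocator.

```python
def all_or_nothing(results):
    """Succeed only if every resource was allocated; otherwise fail globally."""
    granted = [name for name, ok in results.items() if ok]
    return granted if len(granted) == len(results) else []

def at_least(results, n):
    """Succeed if at least n of the requested resources were allocated."""
    granted = [name for name, ok in results.items() if ok]
    return granted if len(granted) >= n else []

def incremental(events):
    """Yield the growing list of granted resources as each outcome arrives."""
    granted = []
    for name, ok in events:
        if ok:
            granted.append(name)
            yield list(granted)

results = {"siteA": True, "siteB": False, "siteC": True}

strict = all_or_nothing(results)
two_of_three = at_least(results, 2)
growing = list(incremental(results.items()))
```

With one of three sites failing, the strict policy returns nothing, the N-of-M policy with N=2 returns the two successful sites, and the incremental policy reports resources one at a time as they are granted.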

  17. Concurrent Allocation • If advance reservations are available: • Obtain list of available time slots from each participating resource manager and choose timeslot • Without reservations: • Optimistically allocate resources • Hope desired set will be available at future time • Use information service (MDS) to determine current availability of resources • Construct RSL request that is likely to succeed • If allocation fails, all started jobs must be terminated
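For the advance-reservation case, choosing a timeslot amounts to finding a window that is free at every participating resource manager. A minimal sketch, assuming each manager reports its free time as a list of `(start, end)` intervals in arbitrary time units; the function name and data layout are hypothetical.

```python
def earliest_common_slot(slot_lists, duration):
    """Earliest start at which every manager has `duration` units free, or None."""
    # Any common window can be shifted earlier until it begins at the start of
    # some manager's free interval, so those starts are the only candidates.
    candidates = sorted(start for slots in slot_lists for start, _ in slots)
    for t in candidates:
        if all(any(s <= t and t + duration <= e for s, e in slots)
               for slots in slot_lists):
            return t
    return None

# Free intervals reported by three hypothetical resource managers.
free = [
    [(0, 4), (6, 12)],
    [(3, 8)],
    [(5, 10)],
]

start = earliest_common_slot(free, 2)
too_long = earliest_common_slot(free, 3)
```

Without reservations no such guarantee exists, which is why the slide describes the alternative as optimistic allocation followed by termination of all started jobs on failure.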

  18. Disadvantages of Concurrent Allocation Scheme • Computational resources wasted while waiting for all requested resources to become available • Application must be altered to perform barrier to synchronize startup across components • Detecting failure of a resource is difficult, e.g. in queue-based local resource managers

  19. Local Resource Managers • Implemented with Globus Resource Allocation Manager (GRAM) • Processing RSL specifications representing resource requests • Deny request • Create one or more processes (jobs) that satisfy request • Enable remote monitoring and management of jobs • Periodically update MDS information service with current availability and capabilities of resources

  20. GRAM (cont.) • Interface between grid environment and entity that can create processes • E.g., Parallel scheduler or Condor pool • GRAM may schedule resource itself • More commonly, maps resource specification into a request to a local resource allocation mechanism • E.g., Condor, LoadLeveler, LSF • Co-exists with local mechanisms

  21. GRAM (cont.) • GRAM API has functions for: • Submitting a job request: produces globally unique job handle • Canceling a job request • Asking when job request is expected to run • Upon submission, can request that progress be signaled asynchronously to callback URL

  22. GRAM Scheduling Model • Jobs are either: • Pending: resources have not yet been allocated to the job • Active: resources allocated, job running • Done: all processes have terminated and resources have been deallocated • Failed: job terminated due to: • explicit termination • error in request format • failure in resource management system • denial of access to resource
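The four job states can be modeled as a small state machine with a notification callback. The slide names the states but not the allowed transitions, so the edges below (pending to active or failed, active to done or failed) are an assumption, and the `Job` class is a hypothetical illustration rather than the GRAM API.

```python
from enum import Enum

class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"

# Assumed transition edges; DONE and FAILED are treated as terminal.
TRANSITIONS = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}

class Job:
    def __init__(self, callback=None):
        self.state = JobState.PENDING
        self.callback = callback  # invoked with the new state on each change

    def transition(self, new_state):
        """Move to new_state if the assumed model allows it, then notify."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        if self.callback is not None:
            self.callback(new_state)

events = []
job = Job(callback=events.append)
job.transition(JobState.ACTIVE)
job.transition(JobState.DONE)
```

The callback here plays the role of the asynchronous progress notification that a client can request at submission time.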

  23. GRAM Components • Gatekeeper • Responds to a request: • Performs mutual authentication of user and resource • Determines local user name for remote user • Starts a job manager that executes as local user and handles request

  24. GRAM Components (cont.) • Job manager • Creates processes requested by user • Submits resource allocation requests to underlying resource management system (or does fork) • Monitors state of created processes • Notifies callback contact of state transitions • Implements control operations like termination

  25. GRAM Components (cont.) • GRAM reporter • Responsible for storing information in MDS (the information service) about: • Scheduler structure • Whether reservations are supported • Number of queues • Scheduler state • Currently active jobs • Expected wait time in queue • Total number of nodes and available nodes

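  26. Resource Management Architecture (figure) • Application passes RSL to the broker • Broker performs RSL specialization, exchanging queries and info with the Information Service • Ground RSL goes to the co-allocator • Simple ground RSL goes to the local resource managers • Local resource managers: GRAM instances in front of LSF, EASY-LL, and NQE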

  27. Job Submission Interfaces • Globus Toolkit includes several command line programs for job submission • globus-job-run: Interactive jobs • globus-job-submit: Batch/offline jobs • globusrun: Flexible scripting infrastructure
