Condor-G

FATIH UNIVERSITY, Computer Engineering. Condor-G: A Computation Management Agent for Multi-Institutional Grids. Helton MALAMBANE.



A Computation Management Agent for Multi-Institutional Grids

Outline

  • Introduction

  • Large-scale sharing of computational resources

  • Grid Protocols

  • Computation management (Condor-G Core)

  • GlideIn mechanism

1. Introduction: Grid user requirements

  • They want to be able to discover, acquire, and reliably manage computational resources dynamically, in the course of their everyday activities

  • They do not want to be bothered with the location of these resources, the mechanisms that are required to use them, with keeping track of the status of computational tasks operating on these resources, or with reacting to failure

  • They do care about how long their tasks are likely to run and how much these tasks will cost

Solution: Condor-G

Leverages software from Globus and Condor.

“allows the user to control multi-domain resources as if they all belong to one personal domain”

Globus Toolkit: inter-domain resource management protocols.

Condor: intra-domain resource management methods.

2. Large-scale sharing of computational resources

How to build and manage a multi-site computation that uses resources that belong to different sites?


  • Different sites may feature different authentication and authorization mechanisms, schedulers, hardware architectures, operating systems, file systems, etc.

  • The user has little knowledge of the characteristics of resources at remote sites, and no easy means of obtaining this information

  • Due to the distributed nature of the multi-site computing environment, computers, networks, and subcomputations can fail in various ways.

  • Keeping track of the status of different elements of a computation involves tedious bookkeeping, especially in the event of failure and dependencies among subcomputations.

Condor-G addresses these issues as follows:


  • Remote resource access issues are addressed by requiring that remote resources speak standard protocols for resource discovery and management.

  • Computation management issues are addressed via the introduction of a robust, multi-functional user computation management agent responsible for resource discovery, job submission, job management, and error recovery. This agent comes from Condor.

  • Remote execution environment issues are addressed via the use of mobile sandboxing technology that allows a user to create a tailored execution environment on a remote node.

3. Grid Protocols - Outline

Protocols used in the Condor-G system:

3.1. GSI (Grid Security Infrastructure)

3.2. GRAM (Grid Resource Allocation and Management)

3.3. MDS-2 (Monitoring and Discovery System)

3.4. GASS (Global Access to Secondary Storage)

3.1. GSI

The Globus Toolkit’s Grid Security Infrastructure

  • Makes it possible to authenticate a user just once.

  • Uses Public Key Infrastructure (PKI)

  • GSI employs the user’s private key to create a proxy credential, which serves as a new private-public key pair that allows a proxy (such as the Condor-G agent) to make remote requests on behalf of the user

3.2. GRAM protocol

The Grid Resource Allocation and Management

  • The GRAM protocol supports remote submission, monitoring, and control of a computational request to a remote computational resource, e.g. “run program P”.

  • Uses GSI for authentication/authorization.

  • Two-phase commit (using request sequence numbers and a commit command).

  • Logs details of all active jobs (useful for crash recovery).
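
The two-phase commit idea above can be sketched in a few lines. This is only an illustration under stated assumptions: `ToyResource` and `submit_exactly_once` are hypothetical names, not part of GRAM, and the lost response is simulated rather than caused by a real network fault. The point is that resending a request with the same sequence number does not create a second job, and the job starts only on commit.

```python
class ToyResource:
    """Hypothetical stand-in for a GRAM server; not the real GRAM API."""

    def __init__(self):
        self.pending = {}       # sequence number -> job awaiting commit
        self.committed = set()  # sequence numbers already committed
        self.started = []       # jobs that have actually been started

    def submit(self, seq, job):
        # A repeated sequence number means an earlier response was lost,
        # so acknowledge again without creating a second job.
        if seq not in self.pending and seq not in self.committed:
            self.pending[seq] = job
        return seq

    def commit(self, seq):
        # Phase two: start the job once, even if the commit is repeated.
        if seq in self.pending:
            self.started.append(self.pending.pop(seq))
            self.committed.add(seq)
        return seq in self.committed


def submit_exactly_once(resource, seq, job, lose_first_response=True):
    """Client side: resend if no response arrives, then commit."""
    ack = resource.submit(seq, job)
    if lose_first_response:
        ack = None                        # simulate a lost response
    if ack is None:
        ack = resource.submit(seq, job)   # retry with the same sequence number
    return resource.commit(ack)


resource = ToyResource()
submit_exactly_once(resource, seq=1, job="run program P")
print(resource.started)  # ['run program P']
```

Even though the request was sent twice, the job appears once in `started`, which is the exactly-once behaviour the protocol is after.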

3.3. MDS protocols

Monitoring and Discovery System

  • Allows discovering and disseminating information about the structure and state of Grid resources.

  • Uses GSI for access control.

    The idea:

    • A resource uses the Grid Resource Registration Protocol (GRRP) to notify other entities that it is part of the Grid.

    • Those entities can then use the Grid Resource Information Protocol (GRIP) to obtain information about resource status
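
The register-then-query idea can be imitated with a small in-memory directory. This is a sketch, not the MDS wire protocols: `ToyDirectory`, its method names, and the `free_cpus` field are hypothetical, and the expiry time (`ttl`) on registrations is an assumption of the sketch rather than something the slides state.

```python
class ToyDirectory:
    """Hypothetical in-memory directory; GRRP and GRIP are only imitated."""

    def __init__(self):
        self.entries = {}  # resource name -> (expiry time, status dict)

    def register(self, name, status, now, ttl=60):
        # GRRP-like step: a resource announces that it is part of the Grid.
        # Assumption: the entry lapses unless renewed within `ttl` seconds.
        self.entries[name] = (now + ttl, dict(status))

    def query(self, name, now):
        # GRIP-like step: return the last known status, or None if the
        # resource is unknown or its registration has lapsed.
        entry = self.entries.get(name)
        if entry is None or entry[0] <= now:
            return None
        return entry[1]


directory = ToyDirectory()
directory.register("cluster-a", {"free_cpus": 16}, now=0)
print(directory.query("cluster-a", now=30))   # {'free_cpus': 16}
print(directory.query("cluster-a", now=120))  # None
```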

3.4. GASS service

The Globus Toolkit’s Global Access to Secondary Storage

  • Provides mechanisms for transferring data to and from a remote HTTP, FTP, or GASS server

  • In the current context, we use these mechanisms to stage executables and input files to a remote computer

  • GSI mechanisms are used for authentication

4. Computation management

The Condor-G agent:

  • 4.1. User interface

  • 4.2. Supporting remote execution

  • 4.3. Credential management

  • 4.4. Resource discovery and scheduling

4.1. User interface

  • The Condor-G agent allows the user to treat the Grid as an entirely local resource, with an API and command line tools that allow the user to perform the following job management operations:

  • Submit jobs, indicating an executable name, input/output files and arguments;

  • Query a job’s status, or cancel the job;

  • Be informed of job termination or problems, via callbacks or asynchronous mechanisms such as email;

  • Obtain access to detailed logs, providing a complete history of their jobs’ execution.

  • The innovation in Condor-G is that these capabilities are provided by a personal desktop agent and supported in a Grid environment, while guaranteeing fault tolerance and exactly-once execution semantics.

  • It provides the user with a familiar and reliable single access point to all the resources he/she is authorized to use.
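
To make the job management operations of this section concrete, here is a minimal sketch of what a local agent interface could look like from a script. `ToyAgent` and its method names are hypothetical and do not reproduce the Condor-G API or command line tools; nothing is actually executed, only the bookkeeping is shown.

```python
import itertools


class ToyAgent:
    """Hypothetical local agent; names are illustrative, not Condor-G's API."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.descriptions = {}  # job id -> what was submitted
        self.states = {}        # job id -> current state
        self.log = []           # complete history of (job id, event) pairs

    def _record(self, job_id, state):
        self.states[job_id] = state
        self.log.append((job_id, state))

    def submit(self, executable, arguments=(), input_files=(), output_file=None):
        job_id = next(self._ids)
        self.descriptions[job_id] = {
            "executable": executable,
            "arguments": tuple(arguments),
            "input_files": tuple(input_files),
            "output_file": output_file,
        }
        self._record(job_id, "submitted")
        return job_id

    def status(self, job_id):
        return self.states[job_id]

    def cancel(self, job_id):
        self._record(job_id, "cancelled")

    def history(self, job_id):
        return [event for jid, event in self.log if jid == job_id]


agent = ToyAgent()
job = agent.submit("sim", arguments=("--steps", "100"), input_files=("in.dat",))
agent.cancel(job)
print(agent.status(job), agent.history(job))  # cancelled ['submitted', 'cancelled']
```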

4.2. Supporting remote execution: Job Submission Process

  • User indicates jobs to the scheduler.

  • Scheduler creates a GridManager daemon.

  • For each job, the GridManager creates a JobManager using two-phase commit GRAM.

  • GASS is used to transfer job executables and input files and to provide output.

  • JobManager submits the jobs to the local scheduling system.

4.2. Supporting remote execution: Crash Tolerance

Condor-G is built to tolerate four types of failure:

1. Crash of the Globus JobManager:

  • The GridManager then probes the Gatekeeper.

  • If the Gatekeeper responds, then a new JobManager is started.

2 & 3. Resource management machine or network failure:

  • The GridManager waits until the connection is re-established.

  • Then it reconnects to the JobManager.

4. Crash of the job submission machine:

  • The GridManager gives the JobManager its new IP address and port.
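
The first three failure types amount to a small decision procedure on the GridManager side, sketched below. The function names and the boolean probe results are hypothetical; real probing, timeouts, and the fourth case (restart of the submission machine) are left out.

```python
def recovery_action(jobmanager_alive, gatekeeper_alive):
    """Pick an action after contact with a job is lost.

    Both arguments are hypothetical probe results (True if the probe answered).
    """
    if jobmanager_alive:
        return "reconnect to JobManager"
    if gatekeeper_alive:
        # Only the JobManager process died: have a new one started.
        return "start new JobManager"
    # Neither answers: the remote machine or the network is down, so wait.
    return "wait and retry"


def recover(probes):
    """Walk through successive (jobmanager_alive, gatekeeper_alive) results."""
    actions = []
    for jobmanager_alive, gatekeeper_alive in probes:
        action = recovery_action(jobmanager_alive, gatekeeper_alive)
        actions.append(action)
        if action != "wait and retry":
            break
    return actions


# Network outage that later clears:
print(recover([(False, False), (False, False), (True, True)]))
# ['wait and retry', 'wait and retry', 'reconnect to JobManager']

# JobManager crash while the Gatekeeper still answers:
print(recover([(False, True)]))
# ['start new JobManager']
```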

4.3. Credential Management

  • A GSI proxy credential is used to authenticate with resources.

  • Because proxy credentials expire, the agent periodically checks the user's credentials.

  • When credentials expire, the jobs are put on hold and the user is notified.

  • Problem: long tasks will require frequent proxy updates.

MyProxy system (long-lived proxy credentials)

Remote services acting on behalf of the user can then obtain short-lived proxies (e.g. 12 hours) from the server.
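
The credential check described in this section can be sketched as follows. Everything here is a simplification under stated assumptions: `ToyCredentialMonitor` and `Job` are hypothetical names, times are plain hours rather than real GSI proxy lifetimes, and `fetch_new_proxy` is a stand-in callable for a MyProxy-like server, not the MyProxy client.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    state: str = "running"


@dataclass
class ToyCredentialMonitor:
    """Hypothetical monitor; times are plain hours, not real GSI proxies."""
    proxy_expires_at: float
    jobs: list = field(default_factory=list)
    notifications: list = field(default_factory=list)

    def check(self, now, fetch_new_proxy=None):
        if now < self.proxy_expires_at:
            return "valid"
        if fetch_new_proxy is not None:
            # MyProxy-like path: get a fresh short-lived proxy; the callable
            # returns the new proxy's lifetime in hours (an assumption).
            self.proxy_expires_at = now + fetch_new_proxy()
            return "refreshed"
        # No way to refresh: hold the jobs and tell the user.
        for job in self.jobs:
            job.state = "held"
        self.notifications.append("proxy expired; jobs held")
        return "held"


monitor = ToyCredentialMonitor(proxy_expires_at=12, jobs=[Job("sim")])
print(monitor.check(now=6))                               # valid
print(monitor.check(now=13, fetch_new_proxy=lambda: 12))  # refreshed
print(monitor.check(now=30))                              # held
print(monitor.jobs[0].state)                              # held
```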

4.4. Resource discovery and scheduling

  • The Simple Approach:

    • a user-supplied list of GRAM servers.

  • The resource broker:

    • gathers information about available GRAM servers using the Monitoring and Discovery System (MDS).

    • The user can then choose from the list of available servers.

      For high-throughput computations, “flooding” is applied.
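
The two ways of obtaining candidate GRAM servers can be contrasted in a short sketch. The function name, the `directory` mapping, and the free-CPU criterion are hypothetical; the mapping only stands in for information a broker would gather through MDS, and flooding is not modelled.

```python
def candidate_servers(user_list=None, directory=None, min_free_cpus=1):
    """Return the GRAM server names to consider for submission.

    `directory` is a hypothetical mapping of server name -> free CPU count,
    standing in for information gathered through MDS.
    """
    if user_list is not None:
        # The simple approach: trust the list supplied by the user.
        return list(user_list)
    # The broker approach: keep only servers that report enough free CPUs.
    return sorted(
        name for name, free in (directory or {}).items() if free >= min_free_cpus
    )


print(candidate_servers(user_list=["gram.site-a", "gram.site-b"]))
# ['gram.site-a', 'gram.site-b']

print(candidate_servers(directory={"gram.site-a": 0, "gram.site-b": 8}))
# ['gram.site-b']
```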

5. GlideIn mechanism

What happens when a job executes on a remote platform where required files are not available and local policy may not permit access to local file systems?



The Idea:

  • Starts a daemon on the remote computer that learns about the available settings and resources.

  • Runs each user task in a “sandbox”, where system calls are redirected back to the user's local system.
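
The redirection idea can be illustrated with a toy sandbox in which a task's file reads are served from the user's own machine rather than from the remote node. `ToySandbox` is hypothetical and models only a single `read` call against a dictionary; it is not how GlideIn intercepts real system calls.

```python
class ToySandbox:
    """Hypothetical sandbox: reads made by a task are served by the
    submitting machine instead of the remote node's file system."""

    def __init__(self, home_files):
        self.home_files = home_files  # path -> contents on the user's machine
        self.trace = []               # calls that were redirected

    def read(self, path):
        self.trace.append(("read", path))
        return self.home_files[path]

    def run(self, task):
        # The task only sees the sandbox, never the remote file system.
        return task(self)


def add_numbers(sandbox):
    """Example task: sum the integers stored in the user's input file."""
    return sum(int(token) for token in sandbox.read("input.txt").split())


sandbox = ToySandbox({"input.txt": "3 4"})
result = sandbox.run(add_numbers)
print(result)         # 7
print(sandbox.trace)  # [('read', 'input.txt')]
```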


