Grids and Condor Barcelona, 2006

Agenda
  • Extended user’s tutorial
  • Advanced Uses of Condor
    • Java programs
    • DAGMan
    • Stork
    • MW
    • Grid Computing
  • Case studies, and a discussion of your application’s needs
Resources
  • There are many resources (machines) in the world, and many are or can be made available!
  • Groups of machines may be labeled as grids
  • Welcome to the power of the grid!
Condor and Grids
  • Condor has always been a tool to harness grid computing
  • Condor’s mechanisms have evolved as technologies have evolved. Roughly categorized:
    • Flocking
    • Glidein
    • The grid universe
Flocking
  • A way for jobs to run within a different, separate Condor pool
  • Condor runs here, and Condor runs there

[Figure: the two pools, labeled “here” and “there”]

Connect Condor Pools with Flocking
  • Flocking is a Condor-specific technology
  • Flocking is enabled with configuration
  • Jobs flock from here to there when they cannot be run here due to lack of available machines
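
The rule in the last bullet can be sketched in a few lines of Python. This is only an illustration of the ordering (local pool first, remote pools afterwards). The `Pool` class and the `choose_pool` function are hypothetical names, and real Condor matchmaking also weighs job requirements and machine policy, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    """Hypothetical stand-in for a Condor pool: a name and a count of idle machines."""
    name: str
    idle_machines: int


def choose_pool(local: Pool, flock_to: List[Pool]) -> Optional[Pool]:
    """Return the pool where the job would run, or None if it must stay idle.

    The local pool is always tried first. Remote pools are considered,
    in the order listed, only when the local pool has no machine available.
    """
    for pool in [local] + list(flock_to):
        if pool.idle_machines > 0:
            pool.idle_machines -= 1  # the job claims one machine
            return pool
    return None


here = Pool("here", idle_machines=1)
there = Pool("there", idle_machines=2)

first = choose_pool(here, [there])   # a machine is free here, so the job stays
second = choose_pool(here, [there])  # here is now full, so the job flocks
print(first.name, second.name)
```

The second job lands in the remote pool only because the first one took the last local machine.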
Configuration
  • Configuration files contain lots of the administrative information used by Condor
  • Format is like that in submit description files:

AttributeName = Value

Configuration here
  • For jobs to be able to flock from here to there
  • In the configuration file on the pool where jobs flock from:

FLOCK_TO = <central manager machine name>

FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)

FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)

HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
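
To see what the `$(NAME)` references above turn into, here is a minimal Python sketch of macro expansion. It is not Condor's own parser: it assumes one `Name = Value` pair per line, treats names as case-sensitive, and expands undefined names to an empty string. The host names `cm.here.example.org` and `cm.there.example.org` are made up for the example.

```python
import re
from typing import Dict

# Matches $(NAME) references inside a value.
MACRO = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def parse_config(text: str) -> Dict[str, str]:
    """Parse 'Name = Value' lines into a dict, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip()
    return settings


def expand(value: str, settings: Dict[str, str], depth: int = 10) -> str:
    """Replace each $(NAME) with its value, repeating until nothing changes."""
    for _ in range(depth):  # bounded, so a self-referencing macro cannot loop forever
        expanded = MACRO.sub(lambda m: settings.get(m.group(1), ""), value)
        if expanded == value:
            break
        value = expanded
    return value


config = parse_config("""
COLLECTOR_HOST = cm.here.example.org
FLOCK_TO = cm.there.example.org
FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
""")

allowed = expand(config["HOSTALLOW_NEGOTIATOR_SCHEDD"], config)
print(allowed)  # cm.here.example.org, cm.there.example.org
```

The expanded value lists both central managers, which is why the negotiator of the remote pool is allowed to talk to the local schedd.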

Configuration there
  • In the configuration file on the pool where jobs flock to:

FLOCK_FROM = <submit machine name>, . . . , <submit machine name>

  • To make security work:

HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)

HOSTALLOW_WRITE_STARTD = $(HOSTALLOW_WRITE), $(FLOCK_FROM)

HOSTALLOW_READ_COLLECTOR = $(HOSTALLOW_READ), $(FLOCK_FROM)

HOSTALLOW_READ_STARTD = $(HOSTALLOW_READ), $(FLOCK_FROM)

Submit Description File

Enable file transfer:

universe = vanilla

executable = myjob.exe

input = myjob.input

output = myjob.output

log = myjob.log

should_transfer_files = YES

when_to_transfer_output = ON_EXIT

queue

The Glidein Concept
  • Assume:
    • We need more machines, and we have permission to use a set of machines
  • Glidein temporarily adds a set of machines to the local pool
Glidein
  • In addition, Glidein solves the problem:

“My job needs to run on that particular resource, and my job needs Condor.”

    • For example: a job that must run under the standard universe
Glidein
  • Condor sends and runs its own executables on the resource
  • The needed resource appears to temporarily join the local Condor pool!
Glidein
  • Run condor_glidein to add the remote resource to the local pool
  • The master and startd daemons become grid universe jobs using gt2

[Figure: the remote resource and the local pool]

Making Glidein Work
  • Change the configuration to give access permission (HOSTALLOW_WRITE) to the remote resource
  • No changes to jobs’ submit description files!
  • But, do enable file transfer in the submit description file:

universe = vanilla

executable = myjob.exe

input = myjob.input

output = myjob.output

log = myjob.log

should_transfer_files = YES

when_to_transfer_output = ON_EXIT

queue

Force Job to Glidein Resource

In the submit description file:

universe = standard

executable = ajob.exe

input = ajob.input

output = ajob.output

log = ajob.log

requirements = \
  ( machine == "example.mcs.anl.gov" ) \
  && Arch != "" && OpSys != ""

queue

The Grid Universe

Most useful when

  • We want to send a job off to a far away machine
  • We want to hand a job to another batch processing system on the local machine
  • We want to send a job off to a far away machine, in order to hand that job to another batch processing system on that machine
The Grid Universe
  • All handled in the submit description file
  • Supports several back end types:
    • Globus: GT2, GT3, GT4
    • NorduGrid
    • UNICORE
    • Condor
    • PBS
    • LSF
Condor-G
  • Condor-G describes jobs to be handed off to a machine that runs Globus middleware
    • gt2: Globus Toolkit 1 or 2, or the pre-web services GRAM
    • gt3: Globus Toolkit 3
    • gt4: Globus Toolkit 4, or WS GRAM
Submit Description File

For gt2:

universe = grid

input = job1.input

output = job1.result

log = job1.log

grid_resource = gt2 example.wisc.edu/jobmanager

queue

  • The jobmanager part of grid_resource is one of: jobmanager, jobmanager-condor, jobmanager-pbs, jobmanager-lsf, jobmanager-sge

Submit Description File

For gt3:

universe = grid

input = job2.input

output = job2.result

log = job2.log

grid_resource = gt3 http://198.51.254.40:8080/ogsa/services/base/gram/XXXManagedJobFactoryService

queue

  • 198.51.254.40:8080 is the IP address and port number
  • XXX is one of: Fork, Condor, PBS, LSF, SGE

Submit Description File

For gt4:

universe = grid

input = job3.input

output = job3.result

log = job3.log

grid_resource = gt4 https://198.51.254.40:8080/wsrf/service/ManagedJobFactoryService XXX

queue

  • 198.51.254.40:8080 is the IP address and port number; a host name and port number may be used instead
  • XXX is one of: Fork, Condor, PBS, LSF, SGE

NorduGrid and the Submit Description File

universe = grid

input = job4.input

output = job4.result

log = job4.log

grid_resource = nordugrid ngexample.com

queue

UNICORE and the Submit Description File

universe = grid

input = job5.input

output = job5.result

log = job5.log

grid_resource = unicore usite.example.com vsite

keystore_file = /frieda/certificates/keystore

keystore_alias = "frieda"

keystore_passphrase_file = /frieda/private/passphrase

queue

  • vsite is the name of the UNICORE virtual resource

PBS and the Submit Description File
  • Details of the PBS installation are in $(GLITE_LOCATION)/etc/batch_gahp.config

universe = grid

input = job6.input

output = job6.result

log = job6.log

grid_resource = pbs

queue

LSF and the Submit Description File
  • Details of the LSF installation are in $(GLITE_LOCATION)/etc/batch_gahp.config

universe = grid

input = job7.input

output = job7.result

log = job7.log

grid_resource = lsf

queue

Condor-C
  • Condor is running here, and Condor is running over there
  • For the case where we want to send a job off to a far away machine, in order to hand that job to another batch processing system on that machine

Condor-C and the Submit Description File

universe = grid

input = job8.input

output = job8.result

log = job8.log

grid_resource = condor joe@remotemachine.example.com remotecentralmanager.example.com

+remote_jobuniverse = 5

+remote_requirements = True

+remote_ShouldTransferFiles = "YES"

+remote_WhenToTransferOutput = "ON_EXIT"

queue

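  • joe@remotemachine.example.com is the schedd name
  • remotecentralmanager.example.com is the collector machine name
  • +remote_jobuniverse = 5 selects the vanilla universe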

Credentials
  • Not just anybody can use any resource at any time...
  • Key concepts:
    • Authentication: verification of an identity
    • Authorization: permission to do something

Authentication

If Frieda says “I am Frieda,” how do we distinguish this from Frieda saying “I am George Bush”?

Authentication
  • Bush can do whatever he pleases
  • If Frieda claims to be Bush (and this is accepted), then Frieda can do whatever she pleases
  • Authentication attempts to verify the identity of the entity that is communicating
Authorization
  • Who is allowed (permitted) to do what
    • Frieda may run gt4 jobs on the Open Science Grid machines
    • Fred may write to files in /usr/bin
    • The Unix user root may do anything!
  • Can be implemented with a list of those authorized
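
A list of those authorized can be as small as a lookup table. The Python sketch below encodes the three examples from this slide. The names `ALLOWED` and `is_authorized`, and the action labels, are hypothetical and do not correspond to any Condor setting. Note that the check assumes the identity has already been authenticated.

```python
from typing import Dict, Set

# Hypothetical allow-list: each identity maps to the actions it may perform.
ALLOWED: Dict[str, Set[str]] = {
    "frieda": {"run_gt4_job_on_osg"},
    "fred": {"write_usr_bin"},
}


def is_authorized(identity: str, action: str) -> bool:
    """Return True if the (already authenticated) identity may perform the action."""
    if identity == "root":
        return True  # the Unix user root may do anything
    return action in ALLOWED.get(identity, set())


print(is_authorized("frieda", "run_gt4_job_on_osg"))  # True
print(is_authorized("frieda", "write_usr_bin"))       # False
print(is_authorized("root", "write_usr_bin"))         # True
```

Anyone who is not on the list is denied by default, which is the safer choice for this kind of table.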
Condor and Authentication

Authentication within Condor comes in many forms. Here are three.

  • File system: Have the entity write a file. The OS attaches a name to the file owner. Condor checks that the entity’s claim is the same as the file owner.
  • GSI (Grid Security Infrastructure)
  • Kerberos
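
The file system idea can be tried out with the Python standard library on a Unix machine. This is a sketch of the concept only, not Condor's actual protocol. The function names `fs_challenge`, `fs_respond`, and `fs_verify` are hypothetical, and both sides run in one process here for brevity.

```python
import os
import tempfile


def fs_challenge(directory: str) -> str:
    """Server side: pick a fresh, unpredictable path the client must create."""
    return os.path.join(directory, "fs_auth_" + os.urandom(8).hex())


def fs_respond(path: str) -> None:
    """Client side: create the requested file; the OS records who owns it."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)


def fs_verify(path: str, claimed_uid: int) -> bool:
    """Server side: accept the claim only if the file's owner matches it."""
    try:
        owner = os.lstat(path).st_uid
    except FileNotFoundError:
        return False  # the client never wrote the file
    return owner == claimed_uid


with tempfile.TemporaryDirectory() as tmp:
    path = fs_challenge(tmp)
    fs_respond(path)
    ok = fs_verify(path, os.getuid())            # honest claim
    impostor = fs_verify(path, os.getuid() + 1)  # claim to be someone else

print(ok, impostor)  # True False
```

The method works only when both parties see the same file system, because the server relies on the operating system to say who owns the file.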
Authentication Idea
  • A centralized certificate authority (CA) verifies an entity’s identity.
  • When satisfied, the CA issues a signed certificate (also called a credential).

[Figure: the CA, and an entity stating “I am Frieda”]

Authentication
  • To authenticate, the entity presents the certificate
  • All is well if we trust the CA and the remote machine

[Figure: the CA, and an entity stating “I am Frieda”]

GSI Authentication
  • GSI uses X.509 certificates
  • Grid universe jobs submitted to back end types that use Globus middleware (gt2, gt3, gt4), as well as nordugrid and unicore, use X.509 certificates
  • Condor can also use GSI
Revocation, Trust, and Proxies
  • The CA may revoke a credential
  • Frieda gives the signed credential to the remote machine. If the remote machine is malicious, it could impersonate Frieda. Therefore, a password protects the credential.
  • A proxy is a credential that includes the password, but is only valid for a specific (short) time period.
  • MyProxy software enables GSI proxy management
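
The short lifetime is what limits the damage if a proxy falls into the wrong hands. The Python sketch below models only that expiry check. The `Proxy` class and both functions are hypothetical, and all cryptography (certificates, signatures, keys) is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Proxy:
    """Hypothetical stand-in for a proxy: who it speaks for, and when it expires."""
    subject: str
    not_after: datetime


def make_proxy(subject: str, lifetime: timedelta, now: datetime) -> Proxy:
    """Issue a proxy that stops being valid once the lifetime has passed."""
    return Proxy(subject=subject, not_after=now + lifetime)


def is_valid(proxy: Proxy, now: datetime) -> bool:
    """A proxy is usable only before its expiry time."""
    return now < proxy.not_after


issued = datetime(2006, 1, 1, 9, 0, tzinfo=timezone.utc)
proxy = make_proxy("frieda", timedelta(hours=12), issued)

print(is_valid(proxy, issued + timedelta(hours=1)))   # True
print(is_valid(proxy, issued + timedelta(hours=13)))  # False
```

A job that outlives its proxy needs a fresh one, which is the kind of bookkeeping that proxy management software takes over.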