int.eu.grid: Experiences with Condor to Run Interactive and Parallel Applications on the Grid

Elisa Heymann

Department of Computer Architecture and Operating Systems


Outline

  • Introduction

  • CrossBroker

  • Parallel Job Support

  • Interactive Job Support

  • Conclusions

Condor Week 2008, May 2008


Introduction

  • int.eu.grid Environment:

    • gLite (EGEE Grid Middleware)

    • Extensions

      • CrossBroker

      • Migrating Desktop

  • Jobs not handled by gLite:

    • Parallel jobs (MPI)

      • Run on more than one resource

    • Interactive jobs

      • The user interacts with the application during its execution

Batch execution on Grids

[Diagram labels: Job; files F1, F2, O1, O2; Services; Middleware; Internet; two remote sites]


Parallel & Interactive Job Execution

  • Use of resources from different sites

  • Resource-sets search

  • Co-allocation & synchronization

  • Fast start-up

  • Execution in high-occupancy situations

[Diagram labels: two Jobs, each with files F1 and F2; I/O forwarding; Services; Middleware; Internet; two remote sites; MPI]


Architecture

[Diagram labels: Migrating Desktop; CrossBroker with Scheduling Agent, Resource Searcher and Application Launcher; Condor-G; DAGMan; Information Index; Replica Manager; EGEE/Globus sites with CEs and WNs]


Architecture - CrossBroker

  • Scheduling Agent

    • Receives each job and keeps it in a persistent queue

    • Contacts Resource Searcher and gets a list of available resources

    • Selects resources and passes them to the Application Launcher

  • Resource Searcher

    • Given a job description (JobAd), performs the matchmaking between job needs and available resources.

    • Uses the Condor ClassAd library, originally designed for matches of a single job with a single resource.

    • A set matching has been developed to support matches of a single job to a group of resources.

  • Application Launcher

    • Responsible for providing a reliable submission service of parallel applications on the Grid.

    • Responsible for file staging at the remote site (executable and input/output files)

    • Uses the services of Condor-G
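The single-job, single-resource matchmaking described above can be sketched in plain Python. This is only an illustration: it does not use the Condor ClassAd library, and `ResourceAd`, `match_single` and the `example.org` host names are hypothetical. The two fields stand in for the `GlueCEStateStatus` and `GlueHostBenchmarkSI00` attributes that appear in the JDL examples of this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceAd:
    """Hypothetical, simplified stand-in for a resource description."""
    name: str
    status: str          # stands in for GlueCEStateStatus
    benchmark_si00: int  # stands in for GlueHostBenchmarkSI00


def match_single(resources, requirements, rank):
    """Keep the resources that satisfy `requirements`, best rank first."""
    candidates = [r for r in resources if requirements(r)]
    return sorted(candidates, key=rank, reverse=True)


# Made-up resources, for illustration only.
resources = [
    ResourceAd("ce-a.example.org", "Production", 1000),
    ResourceAd("ce-b.example.org", "Draining", 4000),
    ResourceAd("ce-c.example.org", "Production", 2000),
]

matches = match_single(
    resources,
    requirements=lambda r: r.status == "Production",
    rank=lambda r: r.benchmark_si00,
)
print([r.name for r in matches])
```

The job's Requirements act as a filter and its Rank as a sort key; set matching extends the same idea from single resources to groups of resources.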

Parallel Job Support

  • Support for parallel jobs:

    • Open MPI

    • PACX-MPI

    • MPICH-P4

    • MPICH-G2

  • Takes site capabilities into account

  • Ability to define a starter script or process to launch the parallel job

    • mpi-start is configured automatically and used by default.

Parallel Job Support

  • Job Description Language file:

    • JOBTYPE:

      • Normal: sequential jobs, just one CPU

      • Parallel: more than one CPU

    • SUBJOBTYPE:

      • openmpi

      • pacx-mpi

      • mpich

      • mpich-g2

      • plain

    • JOBSTARTER (defaults to mpi-start if not defined)

    • JOBSTARTERARGUMENTS
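As a rough illustration of how these attributes fit together, the sketch below checks a job description held in a Python dictionary and fills in the default starter. The allowed values come from the list above; the function `normalize_job`, the dictionary representation, and the rule that the starter default only applies to parallel jobs are assumptions, not CrossBroker behavior.

```python
# Values taken from the attribute list above; everything else is hypothetical.
ALLOWED_JOB_TYPES = {"normal", "parallel"}
ALLOWED_SUBJOB_TYPES = {"openmpi", "pacx-mpi", "mpich", "mpich-g2", "plain"}
DEFAULT_JOB_STARTER = "mpi-start"


def normalize_job(attrs):
    """Check JobType/SubJobType and fill in the default starter (a sketch)."""
    job = dict(attrs)
    job_type = str(job.get("JobType", "")).lower()
    if job_type not in ALLOWED_JOB_TYPES:
        raise ValueError(f"unknown JobType: {job.get('JobType')!r}")
    if job_type == "parallel":
        sub_type = str(job.get("SubJobType", "")).lower()
        if sub_type not in ALLOWED_SUBJOB_TYPES:
            raise ValueError(f"unknown SubJobType: {job.get('SubJobType')!r}")
        # Assumption: the starter default only matters for parallel jobs.
        job.setdefault("JobStarter", DEFAULT_JOB_STARTER)
    return job


job = normalize_job({"JobType": "Parallel", "SubJobType": "pacx-mpi", "NodeNumber": 5})
print(job["JobStarter"])
```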

Parallel Job Support

Type = "Job";

VirtualOrganisation = "imain";

JobType = "Parallel";

SubJobType = "pacx-mpi";

NodeNumber = 5;

Executable = "test-app";

Arguments = "-v";

InputSandbox = {"test-app", "inputfile"};

OutputSandbox = {"std.out", "std.err"};

StdErr = "std.err";

StdOutput = "std.out";

Rank = other.GlueHostBenchmarkSI00;

Requirements = other.GlueCEStateStatus == "Production";

MPI Across Sites

  • CrossBroker searches for and selects sets of resources for the jobs

  • There is no guarantee that all tasks of the same job will start at the same time

    • 1st choice: select only sites with free resources, so the job runs immediately. Unfortunately, free resources are not always available.

    • 2nd choice: allocate a resource temporarily and wait until all other tasks show up. Time-share the resource with a backfilling policy so that it does not sit idle.
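The second choice can be pictured with a toy model: resources are held as the tasks of one MPI job arrive, waiting resources are given backfill jobs, and the MPI job starts only once every task has a resource. All names here (`HeldSlot`, `co_allocate`, the site and job labels) are hypothetical, and the round-by-round timing is a simplification rather than the broker's actual logic.

```python
from dataclasses import dataclass, field


@dataclass
class HeldSlot:
    """A resource allocated early for one MPI task (hypothetical model)."""
    site: str
    backfill_log: list = field(default_factory=list)

    def backfill(self, job_name):
        # Run a batch job so the held resource does not sit idle.
        self.backfill_log.append(job_name)


def co_allocate(arrival_order, tasks_needed, backfill_jobs):
    """Hold slots as tasks arrive; start the MPI job once all are present."""
    held = []
    pending = list(backfill_jobs)
    for site in arrival_order:
        held.append(HeldSlot(site))
        if len(held) == tasks_needed:
            return "start", held
        # Not all tasks have shown up: give each waiting slot a backfill job.
        for slot in held:
            if pending:
                slot.backfill(pending.pop(0))
    return "waiting", held


state, held = co_allocate(
    ["site-a", "site-b", "site-c"], tasks_needed=3,
    backfill_jobs=["b1", "b2", "b3", "b4"],
)
print(state, [(s.site, s.backfill_log) for s in held])
```

In this model no resource is reserved in advance and none idles while waiting, which is the trade-off the second choice aims for.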

MPI Across Sites

[Diagram labels: RS and five CEs; legend: MPI enabled CE, Non-MPI enabled CE]

  • CE1 = zeus.cyf-kr.edu.pl: FreeCPUs = 2, Disk = 100, AverageSI = 2000

  • CE2 = aocegrid.uab.es: FreeCPUs = 10, Disk = 100, AverageSI = 4000

  • CE3 = bee001.ific.uv.es: FreeCPUs = 3, Disk = 100, AverageSI = 1000

  • CE4 = xgrid.icm.edu.pl: FreeCPUs = 6, Disk = 100, AverageSI = 1000

  • CE5 = lngrid02.lip.pt: FreeCPUs = 2, Disk = 100, AverageSI = 1000

Resulting groups:

  • Groups with 1 CE

    • Rank = 2000: aocegrid.uab.es:2119/jobmanager-pbs-workq (freeCPUs = 10)

  • Groups with 2 CEs

    • Rank = 1500: zeus.cyf-kr.edu.pl:2119/jobmanager-pbs-workq (freeCPUs = 2) and bee001.ific.uv.es:2119/jobmanager-pbs-workq (freeCPUs = 3)

    • Rank = 1000: bee001.ific.uv.es:2119/jobmanager-pbs-workq (freeCPUs = 3) and lngrid02.lip.pt:2129/jobmanager-pbs-workq (freeCPUs = 2)
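A minimal sketch of set matching over the five CEs on this slide follows. It rests on several assumptions: the job needs 5 nodes (the NodeNumber of the earlier JDL example), groups hold at most two CEs, a group is kept only if every member is needed to reach the node count, and the rank of a group is the mean AverageSI of its members. The slide does not state how its groups or Rank values are derived, so the output is not expected to reproduce it exactly; for example, the sketch also lists xgrid.icm.edu.pl, and ranks aocegrid.uab.es at 4000 rather than 2000.

```python
from itertools import combinations

# (name, free CPUs, AverageSI) as listed on the slide, in CE1..CE5 order.
CES = [
    ("zeus.cyf-kr.edu.pl", 2, 2000),
    ("aocegrid.uab.es", 10, 4000),
    ("bee001.ific.uv.es", 3, 1000),
    ("xgrid.icm.edu.pl", 6, 1000),
    ("lngrid02.lip.pt", 2, 1000),
]


def match_sets(ces, nodes_needed, max_group_size=2):
    """Return (rank, names) for each minimal group that covers the job."""
    groups = []
    for size in range(1, max_group_size + 1):
        for group in combinations(ces, size):
            total = sum(free for _, free, _ in group)
            if total < nodes_needed:
                continue
            # Skip groups that would still be large enough without a member.
            if any(total - free >= nodes_needed for _, free, _ in group):
                continue
            # Assumption: group rank is the mean AverageSI of its members.
            rank = sum(si for _, _, si in group) / len(group)
            groups.append((rank, tuple(name for name, _, _ in group)))
    # Stable sort: equal ranks keep their enumeration order.
    return sorted(groups, key=lambda g: g[0], reverse=True)


for rank, names in match_sets(CES, nodes_needed=5):
    print(rank, names)
```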

Time Sharing

[Animated diagram, eight frames. Labels: Grid Resource, CrossBroker, LRMS, Scheduling Agent, Application Launcher, Condor-G, Condor GlideIn, VM1, VM2, MPI JOB, MPI TASK, JOB. Frame captions, in order:]

  • Wait for the rest of MPI tasks

  • BackFilling while the MPI waits

  • All tasks ready!


Interactive Job Support

  • Scheduling priority

    • Interactive jobs are sent to sites with available machines

    • If no machines are available, use time sharing

  • Support for interactivity in all kinds of jobs

    • Sequential jobs and all the MPI flavors

  • CrossBroker injects interactive agents that enable communication between user and job

    • Transparent to the user

    • Full integration with glogin & gVid

    • Condor Bypass supported

Interactive Job Support

  • Job Description Language file:

    • INTERACTIVE: true/false. Indicates that the job is interactive and that the broker should treat it with higher priority

    • INTERACTIVEAGENT

    • INTERACTIVEAGENTARGUMENTS

      • These attributes specify the command (and its arguments) used to communicate with the user.
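One way to picture the higher priority given to interactive jobs is a queue that always serves them before batch jobs. The sketch below is an assumption about how such a policy could look, not the broker's implementation; `BrokerQueue` and the job names are hypothetical.

```python
import heapq
from itertools import count


class BrokerQueue:
    """Hypothetical queue in which interactive jobs are served first."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker keeps FIFO order within a class

    def submit(self, name, interactive=False):
        # Lower number means higher priority.
        priority = 0 if interactive else 1
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def next_job(self):
        return heapq.heappop(self._heap)[2]


queue = BrokerQueue()
queue.submit("batch-1")
queue.submit("viz-session", interactive=True)
queue.submit("batch-2")
order = [queue.next_job() for _ in range(3)]
print(order)
```

The interactive job is dequeued first even though it was submitted second, while the batch jobs keep their submission order.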

Interactive Job Support

Type = "Job";

VirtualOrganisation = "imain";

JobType = "Parallel";

SubJobType = "openmpi";

NodeNumber = 11;

Interactive = TRUE;

InteractiveAgent = "glogin";

InteractiveAgentArguments = "-r -p 195.168.105.65:23433";

Executable = "test-app";

InputSandbox = {"test-app", "inputfile"};

OutputSandbox = {"std.out", "std.err"};

StdErr = "std.err";

StdOutput = "std.out";

Rank = other.GlueHostBenchmarkSI00;

Requirements = other.GlueCEStateStatus == "Production";

Interactive Job Support

Particle trajectories in fusion devices

By increasing the temperature of a gas, we obtain a plasma state

  • At this temperature, light atomic nuclei can fuse through an exothermic process:

    • The mass after the fusion process is less than before it

    • Excess mass -> energy

Time Sharing

[Diagram, two frames. Labels: Grid Resource, CrossBroker, LRMS, Scheduling Agent, Application Launcher, Condor-G, Condor GlideIn, Agent, VM1, VM2, INT. JOB, BATCH. Captions:]

  • Startup-time reduction

  • Only one layer involved


Conclusions

  • CrossBroker supports both Parallel and Interactive jobs

    • Automatically

    • Interoperable with EGEE

  • GlideIn

    • Fast startup of jobs

    • Co-allocation without reservation or wasting resources

  • Real Applications

    • Visualization of plasma in fusion devices

    • Evolution of pollution clouds in the atmosphere

    • Ultrasound Computing Tomography: Reconstruction of a 3D volume

    • FLUIDYNAMICS application

Questions?

Elisa Heymann

Department of Computer Architecture and Operating Systems

