High throughput urgent computing
Download
1 / 16

High Throughput Urgent Computing - PowerPoint PPT Presentation


  • 60 Views
  • Uploaded on

Condor Week 2008. High Throughput Urgent Computing. Jason Cope [email protected] Project Collaborators. Argonne National Laboratory / University of Chicago Pete Beckman Suman Nadella Nick Trebon University of Wisconsin-Madison Ian Alderman Miron Livny. Urgent Computing Use Cases.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' High Throughput Urgent Computing' - abra


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
High throughput urgent computing

Condor Week 2008

High Throughput Urgent Computing

Jason Cope

[email protected]


Project collaborators
Project Collaborators

  • Argonne National Laboratory / University of Chicago

    • Pete Beckman

    • Suman Nadella

    • Nick Trebon

  • University of Wisconsin-Madison

    • Ian Alderman

    • Miron Livny



High throughput urgent computing1
High Throughput Urgent Computing

  • Urgent computing provides immediate, cohesive access to computing resources for emergency computations

  • Support for urgent high throughput computing environments is necessary

    • Support for high throughput emergency computing applications

    • Urgent cycle scavenging



Spruce
SPRUCE

  • Special PRiority Urgent Computing Environment (SPRUCE)

    • TeraGrid Science Gateway

    • http://spruce.teragrid.org

  • GOAL: Provide cohesive urgent computing infrastructure for emergency computations

    • Authorization

    • Resource Selection

    • Resource Allocation


Spruce architecture overview 1 2

Event

Automated

Trigger

2

First Responder

SPRUCE Gateway / Web Services

1

Right-of-Way

Token

Human

Trigger

Right-of-Way Token

SPRUCE Architecture Overview ( 1 / 2 )

Source: Pete Beckman, ‘SPRUCE: An Infrastructure for Urgent Computing’


Spruce architecture overview 2 2
SPRUCE Architecture Overview ( 2 / 2 )

User Team

Authentication

4

?

Urgent Computing

Job Submission

Conventional

Job Submission

Parameters

Priority Job

Queue

Choose a

Resource

SPRUCE Job

Manager

3

!

5

Local Site

Policies

Urgent Computing

Parameters

Supercomputer

Resource

Source: Pete Beckman, ‘SPRUCE: An Infrastructure for Urgent Computing’


Spruce resources
SPRUCE Resources

  • Deployed on TeraGrid resources at IU, NCSA, NCAR, Purdue, TACC, SDSC, UC/ANL

  • Supported Resource Managers

    • PBS

    • PBS Pro

    • LSF

    • SGE

    • LoadLeveler

    • Cobalt

  • Local and Grid resource managers supported


Spruce and condor
SPRUCE and Condor

User Team

Authentication

?

Urgent Computing

Job Submission

Conventional

Job Submission

Parameters

Choose a

Resource

SPRUCE Job

Manager

3

!

4

Local Site

Policies

Urgent Computing

Parameters

Condor Pool

Adapted from Pete Beckman, ‘SPRUCE: An Infrastructure for Urgent Computing’


Spruce condor integration
SPRUCE / Condor Integration

  • Added support for urgent computing ClassAds

    • SPRUCE_URGENCY

    • SPRUCE_TOKEN_VALID

    • SPRUCE_TOKEN_VALID_CHECK_TIME

  • Modifications to the Condor schedd that support identifying SPRUCE jobs

  • SPRUCE Grid ASCII Helper Protocol (GAHP) Server

    • Asynchronously invoke SPRUCE Web service operations

    • GAHP calls integrated into the Condor schedd



Spruce condor integration2
SPRUCE / Condor Integration

  • SPRUCE provides an authorization mechanism for access to Condor resources

    • “Right-of-Way” access to Condor resources

    • Same authorization infrastructure for supercomputer and Grid resource access

  • Leverage existing Condor features to enhance scheduling policies

    • Job ranking / suspension / preemption

    • Site administrators define local scheduling policies


Spruce condor status
SPRUCE / Condor Status

  • Prototype complete August, 2007

    • Demonstrated urgent authorization and scheduling capabilities

    • Deployed and tested on equipment at the University of Colorado

  • Currently revising the prototype for a stable software release

    • Condor 7.0 support

    • Final software development iteration before official release

    • Evaluation of SPRUCE-related software integrated into larger Condor pools


Future work
Future Work

  • High throughput support for urgent computing applications

    • SURA SCOOP CH3D Grid Appliance

  • Many additional evaluation tasks

    • Application requirements

    • Security

    • Deadline scheduling / response time

    • Reliability / fault tolerance analysis

    • Data management


High throughput urgent computing2
High Throughput Urgent Computing

Questions?

[email protected]

http://spruce.teragrid.org


ad