Lessons from LEAD/VGrADS Demo


Presentation Transcript


  1. Lessons from LEAD/VGrADS Demo Yang-suk Kee, Carl Kesselman ISI/USC

  2. Outline • SC’06 Demo Summary • New Features of VGES • Year-5 Development and Research Plans • VGES Support for SC’07 Demo

  3. LEAD/VGrADS Demo at SC’06 • The first integration of the LEAD and VGrADS software stacks • Identified the functionalities and requirements of core components • Demonstrated the resource slot concept and the potential of QBETS (BQP) • Showed slot-based scheduling using a performance model

  4. [Architecture diagram: LEAD/VGrADS integration] • Components shown: Portal, LEAD BPEL Workflow Engine, Workflow Configuration Service, App. Factory, Application Service (per task), myLEAD, Event Broker, LEAD Resource Broker, Virtual Grid Execution System, Scheduler/Mapper, Performance Model, Batch Queue Prediction, Adaptation • Flow labels: DAG + Constraint; Workflow; Annotated DAG; Schedule toward a workflow deadline; Create Services; Launch Services; Run job; Job Notification; Run workflow one step at a time; Workflow and File Status • myLEAD note: subscribes to messages from the broker, knows what "magic" to do with input/output files, and talks to RLS/DRS • LEAD: Linked Environments for Atmospheric Discovery

  5. [Interaction diagram: scheduling toward a workflow deadline] • Components shown: Resource Broker, Scheduler/Mapper, Performance Model, Batch Queue Prediction, Virtual Grid Execution System, Globus Gateway, GT4 GRAM, PBS, reserved and unreserved slots • Broker request: "Here is the workflow and constraints + pointer to performance model. Give me a mapping" (DAG + Constraint in, Annotated DAG out) • Query the performance model for each task's resource requirements • Find me two slots (vgFind) • Query Batch Queue Prediction about the probabilities of getting slots; return slots above threshold • Use the performance model and map the tasks to the slots; if the deadline can't be met, return • Return mapping • Bind resources (vgBind) • If reserved, submit the PBS glide-in at the slot start time; otherwise submit when BQP suggests • If not a reserved resource, ask: is it time to submit? • Run job (vgLaunch) • Query status (vgStatus) • Send job notifications • Batch Queue Prediction note: constantly collecting data over time
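
The submission rule on this slide (submit at the slot start time if the slot is reserved, otherwise submit when batch queue prediction suggests) could be sketched as below. This is a minimal illustration, not the VGES implementation: `Slot`, `should_submit`, and the `predicted_wait` callable are hypothetical names, and reading the prediction as an expected queue wait that is subtracted from the slot start time is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass(frozen=True)
class Slot:
    """Hypothetical slot record: where it runs, when it starts, and
    whether the start is backed by an advance reservation."""
    site: str
    start: datetime
    reserved: bool


def should_submit(slot: Slot, now: datetime,
                  predicted_wait: Callable[[str], timedelta]) -> bool:
    """Answer the slide's question "is it time to submit?".

    Reserved slot: submit once the slot start time is reached.
    Unreserved slot (assumed rule): submit as soon as the predicted
    queue wait would carry the job to the slot start time or later.
    """
    if slot.reserved:
        return now >= slot.start
    return now + predicted_wait(slot.site) >= slot.start
```

For instance, with a 10:00 slot and a predicted wait of 50 minutes, the unreserved branch turns true at 9:10, while the reserved branch waits until 10:00.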

  6. New Features of Current VGES • Language: support of resource equivalence (limited implementation); WS-GRAM schema wrapper for execution on the personalized resources • Execution system: probabilistic guarantee of resource binding; resource orchestration and personalization

  7. Resource Equivalence • Specifies exchangeable constraints • Provides flexibility in resource discovery • Specifies constraints with precedence in order of appearance • Example: PE = “Opteron” <> 4 * “Itanium”; vgdl = ClusterOf (node) [4] { node = [Processor == PE] }
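
A small sketch of how alternatives with precedence in order of appearance might be resolved during discovery. It is an illustration only: `resolve_equivalence` is a hypothetical helper, and reading `4 * “Itanium”` as "four Itanium nodes stand in for one Opteron node" is an assumption, since the slide does not define the multiplier.

```python
from typing import Dict, List, Optional, Tuple


def resolve_equivalence(alternatives: List[Tuple[str, int]],
                        nodes_requested: int,
                        available: Dict[str, int]) -> Optional[Tuple[str, int]]:
    """Return the first alternative that can be satisfied.

    `alternatives` lists (processor type, multiplier) pairs in the order
    they appear in the equivalence expression; earlier entries win.
    The multiplier scales the node count (assumed interpretation).
    """
    for proc_type, factor in alternatives:
        needed = nodes_requested * factor
        if available.get(proc_type, 0) >= needed:
            return proc_type, needed
    return None  # no alternative can be satisfied


# PE = "Opteron" <> 4 * "Itanium", requested as ClusterOf(node)[4]
PE = [("Opteron", 1), ("Itanium", 4)]
```

With two free Opteron nodes and 32 free Itanium nodes, the request for 4 nodes falls through to the second alternative and asks for 16 Itanium nodes.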

  8. WS-GRAM Schema Wrapper • Provides an abstract job description • Hides WS-GRAM schema elements that are irrelevant for specifying applications • Application-related WS-GRAM schema elements: argument, count, directory, environment, executable, job, jobType, library, path, stderr, stdin, stdout • Compare with elements that are not application-related: host, factoryEndpoint

  9. Guarantee of Resource Binding • Deterministic guarantee: batch with advance reservation • Probabilistic guarantee: predicts resource availability for batch-scheduled resources; models the resource allocation of each individual resource provider as a random variable with a binomial distribution
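
To make the binomial model concrete, here is a hedged sketch of the arithmetic it implies. The slides do not give a formula; the code assumes `n` independent requests to one provider, each succeeding with the same probability `p`, and asks how likely it is that at least `k` succeed. The function names are hypothetical.

```python
from math import comb
from typing import Optional


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k, n + 1))


def min_requests(k: int, p: float, target: float,
                 max_n: int = 1000) -> Optional[int]:
    """Smallest number of requests n such that at least k succeed with
    probability >= target, or None if max_n requests are not enough."""
    for n in range(k, max_n + 1):
        if prob_at_least(k, n, p) >= target:
            return n
    return None
```

Under these assumptions, a provider with p = 0.9 needs 5 requests to deliver at least 4 allocations with probability 0.9 or better, since 4 requests alone succeed together only about 66% of the time.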

  10. Resource Actualization • [Diagram] Components shown: vgES, Resource actualization engine, Application launcher, GRAM, WS-GRAM, PBS, LSF, Condor, Cluster • Numbered steps as labeled: 1 bind, 2 acquire, 3 check, 4 notification, 5 launch, 6 submit, 7 update, 8 notification

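  11. [Diagram: slot binding example over time] • Request: vgdl = ClusterOf (nd) [4] <10:00:00@1:0:0> { nd = [Processor = “P4”] } • Candidate providers with binding probabilities: sdsc (p=0.90), ncsa (p=0.85), iu (p=0.70), ada (p=0.65) • Providers listed at the bind step: sdsc (p=0.90), ncsa (p=0.85), iu (p=0.70) • Slot states: described, discovered, bound, active, inactive, unavailable • Transitions: submit, select, bind, (activate), (actualize), (cleanup) • Time markers: 9:00 A.M., 9:10 A.M., 9:55 A.M., 10:00 A.M., 11:00 A.M. • Other labels: P1, P2, P3, P4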
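
One possible selection rule that reproduces the providers in this example is sketched below. The slide does not state the rule, so this is an assumption: candidates are taken in descending order of binding probability until the chance that at least one of them binds reaches a target, treating providers as independent. `select_providers` and the 0.99 target are hypothetical.

```python
from typing import Dict, List, Tuple


def select_providers(candidates: Dict[str, float],
                     target: float) -> Tuple[List[str], float]:
    """Pick providers, most likely first, until
    P(at least one binds) >= target (independence assumed).

    Returns the chosen sites and the probability they achieve; if the
    target is unreachable, all candidates are returned.
    """
    chosen: List[str] = []
    p_none = 1.0  # probability that every chosen provider fails
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    for site, p in ranked:
        chosen.append(site)
        p_none *= 1.0 - p
        if 1.0 - p_none >= target:
            break
    return chosen, 1.0 - p_none


# Binding probabilities taken from the slide.
providers = {"sdsc": 0.90, "ncsa": 0.85, "iu": 0.70, "ada": 0.65}
sites, achieved = select_providers(providers, target=0.99)
```

With these inputs the first two providers reach 0.985, so a third is added and the combined probability becomes 0.9955.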

  12. Year-5 Plans • Extended implementation of slot allocation: support of various resource managers (e.g., PBS, LSF, LoadLeveler, Condor); personalization over multiple clusters • Consistent resource slot provisioning: efficient resource scheduling techniques; tradeoffs between quality, availability, and cost • Slot optimization: optimization of inter-slot and intra-slot allocation • Deployment to as many TeraGrid sites as possible

  13. Consistent Resource Provisioning • Motivation: can we get a slot for a specified time period in practice? Sites limit both the number of processors and the wall time • Goals: explore offline/online algorithms; present system sub-slot schedules to users

  14. Resource Slot Provisioning Problem • [Diagram] A LooseBag request for 2 days • Axes: slot size, slot duration • Limits: MaxCPU, MaxWallTime • Labels: U-slot, S-slot • Callout: “This slot will never be satisfied!”

  15. Resource Slot Provisioning Problem • [Diagram] The same LooseBag request for 2 days, with one U-slot and six S-slot labels
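
The provisioning problem on these two slides can be illustrated with a simple tiling sketch: a requested slot that exceeds a site's processor or wall-time limit is cut into sub-slots that each fit. Several points are assumptions, not taken from the slides: that a U-slot is the user's requested slot and S-slots are the system sub-slots, that a plain grid tiling is acceptable, and all names (`SubSlot`, `split_slot`) and the example limits.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SubSlot:
    start_hour: int  # offset from the start of the requested slot
    cpus: int
    hours: int


def split_slot(cpus: int, hours: int,
               max_cpu: int, max_walltime: int) -> List[SubSlot]:
    """Tile a requested slot (cpus x hours) with sub-slots that each
    respect the per-request limits max_cpu and max_walltime."""
    if min(cpus, hours, max_cpu, max_walltime) <= 0:
        raise ValueError("all arguments must be positive")
    sub_slots: List[SubSlot] = []
    t = 0
    while t < hours:
        duration = min(max_walltime, hours - t)
        remaining = cpus
        while remaining > 0:
            n = min(max_cpu, remaining)
            sub_slots.append(SubSlot(start_hour=t, cpus=n, hours=duration))
            remaining -= n
        t += duration
    return sub_slots


# A 2-day request for 100 CPUs against assumed limits of 64 CPUs and 24 hours.
pieces = split_slot(cpus=100, hours=48, max_cpu=64, max_walltime=24)
```

A grid tiling keeps the sketch short; the offline/online algorithms mentioned earlier would presumably choose sub-slots with more care.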

  16. VGES Support for the SC’07 Demo • Resource equivalence: enables flexible resource discovery; provides a more reliable resource discovery service • New semantics of binding: separates slot binding from actual resource allocation; enables the LEAD workflow manager to exploit parallelism • Probabilistic guarantee of binding: provides virtually high slot availability; minimizes resource allocation failures due to late resource arrivals

  17. VGES Support for the SC’07 Demo • Support of various resource managers: plugs in LoadLeveler (Big Red) and LSF (Tungsten); covers most resource managers in TeraGrid • Callback mechanisms for resource arrivals: provide asynchronous event notification; reduce the burden on both the client and the server • Consistent resource provisioning for LooseBag slots: provisions resources proactively; realizes slots in practice
