
Grid Middleware for High Performance Computing

Sathish Vadhiyar

Grid Applications Research Lab (GARL)

Supercomputer Education and Research Centre (SERC)

Indian Institute of Science (IISc)

Bangalore - 560012

ATIP 1st Workshop on HPC in India @ SC-09


Grid Applications Research Lab

  • Grid and Parallel Computing with primary focus on

    • developing grid applications,

    • building strategies for checkpointing, migration, rescheduling, and fault-tolerance for parallel applications on grid systems, and

    • performance modeling of parallel applications on grids



Motivation

  • Developing solutions for deployment and use of large-scale scientific applications on grids

  • Will result in exploration of large-sized problems and long-running applications



Grid Applications: Climate Modeling

CCSM

  • Enable efficient executions of long-running climate modeling simulations on grid systems with the objective of solving climate science problems

  • Community Climate System Model (CCSM) – a multi-component global general circulation model

  • Analyzed the benefits of executing different components with checkpointing and rescheduling in different batch systems of a grid with a novel execution model



Grid Applications: Climate Modeling – General Idea (IJHPCA, FGCS)

Novel Execution Model

  • Job submission to a batch system incurs queue waiting time

  • Waiting time depends on processor requirements

  • How about decomposing a job into small subjobs with small processor requirements and submitting the subjobs to multiple batch systems of a grid?

  • Efficiency depends on effective system utilization using checkpointing, migration and rescheduling

  • Leads to 55% average increase in throughput
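
The trade-off behind this execution model can be illustrated with a toy calculation. This is only a sketch: the queue-wait function, its parameters, and the assumption that the work splits evenly across subjobs with no coordination overhead are all hypothetical and are not taken from the talk.

```python
def queue_wait(procs, base=60.0, exponent=1.5):
    """Hypothetical queue-wait model (seconds): larger requests wait
    disproportionately longer. The functional form is an assumption."""
    return base * procs ** exponent


def turnaround(work, procs_per_subjob, num_subjobs):
    """Time until all subjobs finish, assuming each subjob is submitted to a
    different batch system, all queues behave alike, and the work (in
    processor-seconds) divides evenly with no communication overhead."""
    run_time = work / (procs_per_subjob * num_subjobs)
    return queue_wait(procs_per_subjob) + run_time


work = 64 * 3600.0                      # one hour of work on 64 processors
single_job = turnaround(work, 64, 1)    # one 64-processor job in one queue
four_subjobs = turnaround(work, 16, 4)  # four 16-processor subjobs, four queues

print(f"single job:   {single_job:.0f} s")
print(f"four subjobs: {four_subjobs:.0f} s")
```

Under this toy model the decomposed submission finishes sooner because the queue wait dominates. A real grid would add checkpointing, migration, and coordination costs that the sketch ignores.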



Grid Applications: DNA Sequence Evolutions (JPDC, e-Science 2009)

Master-Worker Architecture for Analyzing Mutations

  • Predictions of future sequences in an evolutionary tree important for drug discovery, pharmaceutical research and disease control

  • An ancestor sequence can transform into a progeny sequence in many different ways

  • Formulated as a search-space exploration problem and used computational grids to explore the huge space of possible mutations

  • Used popular mutations to predict future evolutionary paths

  • Performed predictions for HIV sequences and other protein sequences

  • Predictions are 40% better than those of random methods



Rescheduling

  • It is necessary to adapt application execution to grid resource and application dynamics

  • SRS – a checkpointing library for malleable applications

  • Can allow processor reconfiguration between migrations

  • Supports different data distributions, storage infrastructure, active migration and fault tolerance



Rescheduling Strategies

  • Given a parallel application consisting of multiple phases and given a set of resources, the problem is to derive a rescheduling plan

    • Where to execute the different phases and when to migrate/reschedule

  • To find {I1, I2, …, ILopt} such that [objective expression: an image on the original slide, not preserved in this transcript] is minimized

    • where Lopt – number of intervals; ti – predicted execution time of each interval; rcost – rescheduling cost

  • Developed 3 novel algorithms for deriving a rescheduling plan

    • Incremental algorithm, division heuristic and genetic algorithm

[Figure: Application Phases mapped to Cluster-1, 2, 3 over Interval 1 (t1), Interval 2 (t2), Interval 3 (t3), …, Interval i (ti); panel title: Division heuristic]
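
The planning problem on this slide can be sketched as an interval-partitioning search. The code below is not one of the three algorithms named above; it is a plain dynamic-programming illustration under stated assumptions: the objective is taken to be the sum of the interval times ti plus one fixed rcost per migration, each interval runs on a single cluster, and `phase_times` is a hypothetical table of predicted per-phase times.

```python
def plan_rescheduling(phase_times, rcost):
    """Split consecutive phases into intervals and pick a cluster per interval.

    phase_times[c][k] is the predicted time of phase k on cluster c
    (hypothetical input). Assumed objective: sum of interval times plus one
    fixed rcost for every switch between consecutive intervals.
    Returns (total_time, plan), plan = [(first_phase, last_phase, cluster)].
    """
    num_phases = len(phase_times[0])

    # prefix[c][k] = total time of phases 0..k-1 on cluster c
    prefix = []
    for row in phase_times:
        acc = [0.0]
        for t in row:
            acc.append(acc[-1] + t)
        prefix.append(acc)

    def best_cluster(i, j):
        # Cheapest cluster for phases i..j-1, returned as (time, cluster).
        return min((p[j] - p[i], c) for c, p in enumerate(prefix))

    # best[j] = minimum assumed cost of executing phases 0..j-1
    best = [0.0] + [float("inf")] * num_phases
    back = [None] * (num_phases + 1)
    for j in range(1, num_phases + 1):
        for i in range(j):
            t, c = best_cluster(i, j)
            cost = best[i] + t + (rcost if i > 0 else 0.0)
            if cost < best[j]:
                best[j] = cost
                back[j] = (i, c)

    # Walk the back-pointers to recover the intervals in order.
    plan = []
    j = num_phases
    while j > 0:
        i, c = back[j]
        plan.append((i, j - 1, c))
        j = i
    plan.reverse()
    return best[num_phases], plan


# Made-up times: cluster 0 is fast for early phases, cluster 1 for late ones.
times = [[10, 10, 50, 50],
         [40, 40, 20, 20]]
total, plan = plan_rescheduling(times, rcost=5)
print(total, plan)
```

With a cheap rcost the sketch migrates once between the two groups of phases. With a large rcost it keeps the whole run on one cluster, which is the trade-off a rescheduling plan has to balance.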



Rescheduling Strategies

  • Performed experiments with five large-scale multi-phase parallel applications

    • Molecular dynamics, n-body simulations, astrophysical gas dynamics, crack propagation, electromagnetics.

Huge Benefits due to Rescheduling



Performance Modeling (JPDC, CPE)

Performance Model Accuracy for Parallel QR

  • It is imperative to automatically derive “knowledge” (performance characteristics) of applications

  • Can be used for effective mapping of applications to resources

  • Built techniques for automatically deriving performance model functions for predicting execution costs of parallel applications on grids

  • First effort to deal with load changes during application executions

  • Less than 30% modeling errors – best reported for non-dedicated systems

  • Have also developed novel scheduling algorithms that use the model functions

  • These algorithms generate 80% better schedules than existing approaches
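
Deriving a performance model function can be pictured as fitting the coefficients of an assumed cost formula to observed runs. The sketch below is only an illustration: the model form T = a + b·(n³/p) (a cubic operation count, as for dense QR, divided over p processors), the coefficient values, and the generated "observations" are all assumptions, and it does not attempt the handling of load changes described above.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x, in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def work_term(n, p):
    """Assumed dominant cost term: cubic work shared by p processors."""
    return n ** 3 / p


# Synthetic observations generated from known coefficients (not measurements).
true_a, true_b = 2.0, 3e-9
configs = [(2000, 4), (4000, 8), (6000, 8), (8000, 16), (10000, 16)]
xs = [work_term(n, p) for n, p in configs]
ys = [true_a + true_b * x for x in xs]

a, b = fit_linear(xs, ys)


def predict(n, p):
    """Predicted execution time (seconds) from the fitted model function."""
    return a + b * work_term(n, p)


print(f"a = {a:.3f}, b = {b:.3e}, predict(12000, 32) = {predict(12000, 32):.1f} s")
```

A scheduler could then call `predict` for each candidate resource set and choose the cheapest mapping.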

[Figure: Scheduling Results by Scheduling Method; Box Elimination (BE) shown as red bars; 50-80% more efficient]



Grid Middleware

  • Created a grid middleware for parallel multi-phase applications with rescheduling capabilities

  • Have successfully run multi-phase applications on a grid consisting of multiple batch and interactive clusters at two geographically distributed sites

  • Also created a grid middleware for multi-component applications for coordinating the executions of the components on the different systems

Grid Middleware for Multi-Component Applications

Grid Middleware for Multi-Phase Applications



Other Research

  • Checkpointing Interval Selection

    • For efficient execution in the presence of failures

    • A Markov Model consisting of 3 kinds of states for performance prediction

    • Extensive simulations with 9-year real supercomputer failure traces on 8 parallel systems, 3 rescheduling policies, and 3 parallel applications

    • The checkpointing intervals chosen by our model lead to a high amount of useful work by the applications in the presence of failures

  • Compiler-aided checkpointing instrumentation

    • A source-to-source precompiler for automatic insertion of checkpointing calls

    • Performs live-variable analysis for determining data and wrappers for finding data sizes

    • Can handle parallel applications with block-distribution (molecular dynamics)
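
For a feel of what checkpointing interval selection trades off, the classical first-order rule known as Young's approximation, interval ≈ sqrt(2 · C · MTBF), gives a quick estimate. This is a swapped-in textbook formula, not the Markov model described above, and the checkpoint cost and MTBF values are made-up inputs.

```python
import math


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order approximation of the checkpoint interval that
    minimizes expected lost time. Both arguments use the same time unit."""
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


# Hypothetical inputs: 60 s to write a checkpoint, one failure per day on average.
interval = young_interval(checkpoint_cost=60.0, mtbf=24 * 3600.0)
print(f"checkpoint roughly every {interval / 60:.1f} minutes")
```

Checkpointing more often wastes time writing state, and checkpointing less often loses more work per failure. A model fitted to real failure traces refines this balance for a specific system and application.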



Summary

  • Primary endeavor: to aid scientific advancement in different domain areas using grid systems

  • Grid research in two application areas (climate modeling and DNA sequence evolution) resulted in significant application benefits

  • Contributed novel scheduling and rescheduling algorithms, performance modeling strategies and robust grid middleware for use by the scientific community



Areas of Collaborations

  • Scalability of large-scale and petascale applications

  • Fault tolerance in high performance systems

  • Setting up Indo-US grids

  • Grid middleware collaborations

Thank You
