Chris Dabrowski, Geoff Fox
Seattle, Washington, USA
October 17, 2007
II. Presentation/Review of Draft OGF Informational Document “Reliability in Grid Computing Systems”
Purpose: Make recommendations and explore methods for improving reliability and robustness of standards-based grid systems.
Main Product: Produce OGF Informational Document that
Summarizes the state of work on Grid system reliability and identifies reliability and robustness issues/requirements for grid systems
First draft in progress. Contributions and review needed!
Facilitate collaborations between researchers on grid reliability
Preliminary requirements for reliability measurement methods and tools
Web pages and reflector
Unofficial: http://gridreliability.nist.gov/ (list of resources, in progress)
Reflector: [email protected]
Title: Reliability in Grid Computing Systems:
Summarizes the state of work on Grid system reliability based on input from grid system practitioners/researchers
Identifies issues that must be addressed/solved to ensure reliability and robustness in grid systems
Provides basis for identifying requirements for establishing and maintaining high levels of reliability in large-scale Grids
Basis for preliminary requirements for methods and tools to measure grid system reliability
Focus on current practices and research that provide insight on how WS and grid specifications may affect grid reliability
Serve as resource on reliability issues for OGF working groups developing specifications and for grid developers.
First workshop (GGF16, Athens, Greece)
Site Assessment and Probabilistic Risk Analysis (PRA) of Grid Computing Facilities, by Joe Higgins and Robert Sewell of Sun Microsystems
Methods for analyzing risks involved in deploying and configuring grid computing sites
Reliable Messaging for Grids and Web Services, by Geoffrey Fox, Shrideep Pallickara, Damodar Yemme, Hasan Bulut and Sima Patel, Community Grids Lab, Indiana University
NaradaBrokering: scalable, standards-based management architecture for fault-tolerant grids
Providing Fault-tolerance for Parallel Programs on Grid (FT-MPICH), by Heon Y. Yeom of Distributed Computing Systems Laboratory, Seoul National University
Fault-tolerant MPI (FT-MPICH) with coordinated checkpointing of interacting, parallel processes
QoS-Aware Fault Tolerance in Grid Computing, by L. Valcarenghi, F. Cugini, F. Paolucci, and P. Castoldi, Scuola Superiore of Sant’Anna and CNIT, Pisa, Italy
Fault tolerance through integrating replicated services and a QoS-capable network protocol layer
A Program of Work for Understanding Emergent Behavior in Global Grid Systems, by Kevin Mills and Chris Dabrowski, of the U.S. NIST
Developing methods for understanding and controlling complex systems behavior in grids
Second workshop (OGF19, Chapel Hill, USA)
Using a Large-Scale Survivability Architecture to Control Grids: A Status Report, by Zach Hill, Jonathan Rowanhill, Jim Basney, Glenn Wasson, John Knight, Anh Nguyen-Tuong, Andrew Grimshaw and Marty Humphrey, University of Virginia and NCSA/University of Illinois, Urbana-Champaign
Reconfigurable Grid system architecture (Willow) for promoting survivability & dependability
Platform Symphony Reliability, by Nick Werstiuk, Platform Computing
Grid architecture for promoting reliability & dependability through failure detection and failover
Managing Grid and Web Services and their exchanged messages, by Harshawardhan Gadgil, Geoffrey Fox, Shrideep Pallickara, and Marlon Pierce, Indiana University
Results showing performance, scalability and cost-effectiveness of NaradaBrokering architecture
Reliability Assessment of Grid Software Systems Using Emergent Features, by Carol Song, Umut Topkara, Jungha Woo, and Sang Phill Park, Purdue University
Method for identifying centralized software components likely to impact grid system reliability
Reflections on Reliability Issues in OGSA, by Matti Hiltunen, AT&T Labs
Summary of requirements for ensuring reliability and availability of OGSA-based services
“Basic Concepts and Taxonomy of Dependable and Secure Computing,”
Checkpoint and recovery through process migration, grid resource replication, replication in data grids
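As a rough illustration of the checkpoint-and-recovery idea listed above, the following minimal Python sketch saves a task's progress to a file and resumes from it after a simulated failure. All names here (`save_checkpoint`, `load_checkpoint`, `run_with_checkpoints`, the `fail_at` parameter) are hypothetical and are not taken from the draft document or from any grid middleware. Real systems checkpoint full process state, often coordinated across many nodes, rather than a single counter.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Atomically write state to path (write a temp file, then rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # rename is atomic on the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_item": 0, "total": 0}


def run_with_checkpoints(items, path, fail_at=None):
    """Sum items, checkpointing after each one.

    fail_at is a hypothetical test hook: it raises an error before
    processing that index to simulate a node failure.
    """
    state = load_checkpoint(path)
    for i in range(state["next_item"], len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += items[i]
        state["next_item"] = i + 1
        save_checkpoint(path, state)
    return state["total"]


def demo():
    """Run once with an injected failure, then recover from the checkpoint."""
    items = [1, 2, 3, 4, 5]
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "task.ckpt")
        try:
            run_with_checkpoints(items, path, fail_at=3)
        except RuntimeError:
            pass  # the "node" failed after items 0..2 were checkpointed
        resumed_from = load_checkpoint(path)["next_item"]
        total = run_with_checkpoints(items, path)
    return resumed_from, total
```

Calling `demo()` restarts the task from the last saved index instead of from the beginning. Writing the checkpoint to a temporary file and renaming it means a crash during the write cannot leave a half-written checkpoint behind.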