
The Globus Replica Management System


Presentation Transcript


  1. The Globus Replica Management System

  2. The Problem “Enable a geographically distributed community [of thousands] to perform sophisticated, computationally intensive analyses on Petabytes of data”

  3. Example: CERN Large Hadron Collider • Multiple petabytes of data per year • Copy of everything at CERN (Tier 0) • Subsets at national centers (Tier 1) • Smaller regional centers (Tier 2) • Individual researchers have copies • How to keep track of all copies? • Select among available copies or create a new copy?

  4. Outline • Globus Replica Management • Replica catalog • Cooperation with other Information Services • Replica selection • Dynamic replica creation • Metadata catalogs • Application scenario • Outstanding issues

  5. Our Approach to Replica Management • Identify replica cataloging and reliable replication as two fundamental services • Layer on other Grid services: GSI, transport, MDS Information Service • Use LDAP as catalog format and protocol, for consistency • These services can be used as building blocks for higher-level services

  6. The Replica Catalog: An Information Service • Registers new copies of files and collections • Responds to queries about existing replicas • Maintains a mapping between logical names for files and collections and one or more physical locations • Uses the LDAP protocol • Accessed by higher-level tools that perform: • Selection of replicas based on performance information from Information Services (MDS, NWS) • Dynamic creation of replicas in response to demand

  7. Replica Catalog Structure: A Climate Modeling Example [diagram] • Replica Catalog containing Logical Collection "CO2 measurements 1998" and Logical Collection "CO2 measurements 1999" • Logical Collection "CO2 measurements 1998" lists Filename: Jan 1998, Filename: Feb 1998, … • Location jupiter.isi.edu: Filename: Mar 1998, Filename: Jun 1998, Filename: Oct 1998; Protocol: gsiftp; UrlConstructor: gsiftp://jupiter.isi.edu/nfs/v6/climate • Location sprite.llnl.gov: Filename: Jan 1998 … Filename: Dec 1998; Protocol: ftp; UrlConstructor: ftp://sprite.llnl.gov/pub/pcmdi • Logical File entries (Logical File Jan 1998, Logical File Feb 1998) with a Logical File Parent and attributes such as Size: 1468762
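The structure on this slide can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the Globus LDAP schema: the class names `Location` and `LogicalCollection`, the method names, and the rule that a physical URL is the UrlConstructor followed by the URL-quoted filename are all hypothetical. Only the filenames the slide spells out are listed; the elided ones are left out.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class Location:
    """One replica site holding some (or all) files of a logical collection."""
    hostname: str
    protocol: str
    url_constructor: str                      # URL prefix for files at this site
    filenames: set = field(default_factory=set)

    def url_for(self, filename):
        """Return a physical URL for a logical file, or None if not held here.

        Assumption: the URL is the prefix plus the URL-quoted filename.
        """
        if filename not in self.filenames:
            return None
        return self.url_constructor.rstrip("/") + "/" + quote(filename)


@dataclass
class LogicalCollection:
    """A named group of logical files with zero or more replica locations."""
    name: str
    locations: list = field(default_factory=list)

    def physical_urls(self, filename):
        """Map one logical file name to every known physical URL."""
        urls = (loc.url_for(filename) for loc in self.locations)
        return [url for url in urls if url is not None]


co2_1998 = LogicalCollection(
    name="CO2 measurements 1998",
    locations=[
        Location("jupiter.isi.edu", "gsiftp",
                 "gsiftp://jupiter.isi.edu/nfs/v6/climate",
                 {"Mar 1998", "Jun 1998", "Oct 1998"}),
        # The slide elides the files between Jan and Dec; only the two
        # named endpoints are recorded here.
        Location("sprite.llnl.gov", "ftp",
                 "ftp://sprite.llnl.gov/pub/pcmdi",
                 {"Jan 1998", "Dec 1998"}),
    ],
)

print(co2_1998.physical_urls("Mar 1998"))
```

Keeping the filename set on each location mirrors the slide's point that a location may hold only part of a collection.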

  8. Components of the Globus Replica Manager • Replica catalog definition: LDAP object classes for representing logical-to-physical mappings in an LDAP catalog • Low-level replica catalog API (globus_replica_catalog library): manipulates the replica catalog (add, delete, etc.) • High-level reliable replication API (globus_replica_manager library): combines calls to file transfer operations with calls to low-level API functions (create, destroy, etc.)
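The idea behind the high-level API, pairing each file transfer with the matching catalog update, might look roughly like the sketch below. Everything here is an assumption made for illustration: `ReplicaManager` and its methods are hypothetical names, a local `shutil.copyfile` stands in for a real transfer, and a plain dictionary stands in for the LDAP catalog. It does not reproduce the globus_replica_manager interface.

```python
import os
import shutil
import tempfile


class ReplicaManager:
    """Keeps file copies and catalog entries in step (illustrative only)."""

    def __init__(self):
        self.catalog = {}  # logical name -> list of physical paths

    def register(self, logical_name, path):
        """Record an already existing copy in the catalog."""
        self.catalog.setdefault(logical_name, []).append(path)

    def create_replica(self, logical_name, source_path, dest_path):
        """Copy the file, verify it, then register the new copy."""
        shutil.copyfile(source_path, dest_path)  # stand-in for a real transfer
        try:
            if os.path.getsize(dest_path) != os.path.getsize(source_path):
                raise OSError("size mismatch after transfer")
            self.register(logical_name, dest_path)
        except Exception:
            # Do not leave an unregistered copy behind.
            if os.path.exists(dest_path):
                os.remove(dest_path)
            raise
        return dest_path

    def destroy_replica(self, logical_name, path):
        """Unregister first, so the catalog never points at a missing file."""
        self.catalog[logical_name].remove(path)
        os.remove(path)


def demo():
    """Create and destroy one replica in a temporary directory."""
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "source.dat")
    copy = os.path.join(workdir, "copy.dat")
    with open(source, "wb") as handle:
        handle.write(b"example payload")

    manager = ReplicaManager()
    manager.register("Jan 1998", source)
    manager.create_replica("Jan 1998", source, copy)
    registered = list(manager.catalog["Jan 1998"])
    manager.destroy_replica("Jan 1998", copy)
    remaining = list(manager.catalog["Jan 1998"])
    shutil.rmtree(workdir)
    return registered, remaining
```

Ordering matters in both directions: registration happens only after the copy is verified, and removal from the catalog happens before the file is deleted.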

  9. Replica Catalog API • globus_replica_catalog_collection_create(): create a new logical collection • globus_replica_catalog_collection_open(): open a connection to an existing collection • globus_replica_catalog_location_create(): create a new location (replica) of a complete or partial logical collection • globus_replica_catalog_collection_list_filenames(): list all logical files in a collection • globus_replica_catalog_location_search_filenames(): search for the locations (replicas) that contain a copy of all the specified files
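The last call on this slide returns the locations that hold every requested file. The slide gives no signatures, so the sketch below does not imitate the C API; it only shows the matching rule as a set-containment test. The function name `search_locations` and the dictionary layout are hypothetical, and the file lists reuse the names from the climate example.

```python
def search_locations(locations, wanted_filenames):
    """Return the names of locations that hold all of the wanted files.

    `locations` maps a location name to the set of logical file names
    stored there. A location qualifies only if it has every wanted file.
    """
    wanted = set(wanted_filenames)
    return sorted(name for name, files in locations.items() if wanted <= files)


# Hypothetical catalog contents, using only names spelled out in the example.
locations = {
    "jupiter.isi.edu": {"Mar 1998", "Jun 1998", "Oct 1998"},
    "sprite.llnl.gov": {"Jan 1998", "Dec 1998"},
}

print(search_locations(locations, ["Mar 1998", "Oct 1998"]))
print(search_locations(locations, ["Jan 1998", "Mar 1998"]))
```

A request that spans files held at different sites matches no single location, which is the case a higher-level tool would have to handle by fetching from several replicas.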

  10. Replica Selection Relies on Information Services • Replica catalog identifies all existing copies of files or collections • Select among them based on performance • Consult other Information Services • Network Weather Service: network performance between source, destination • Information Service for Storage Systems: file system capacity and performance • Wide variety of selection algorithms
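One of the many possible selection algorithms is to pick the replica with the lowest estimated transfer time. The sketch below assumes that a forecast of bandwidth and latency is already available for each replica, as a service such as NWS might supply. The function names, the dictionary keys, and the forecast numbers are invented for illustration and are not measurements.

```python
def estimated_transfer_seconds(size_bytes, bandwidth_bytes_per_s, latency_s):
    """Simple cost model: fixed latency plus size divided by bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def select_replica(replicas, size_bytes):
    """Return the replica with the smallest estimated transfer time."""
    return min(
        replicas,
        key=lambda r: estimated_transfer_seconds(
            size_bytes, r["bandwidth"], r["latency"]
        ),
    )


# Illustrative forecasts only (bytes per second, seconds).
replicas = [
    {"url": "gsiftp://jupiter.isi.edu/nfs/v6/climate",
     "bandwidth": 2_000_000, "latency": 0.05},
    {"url": "ftp://sprite.llnl.gov/pub/pcmdi",
     "bandwidth": 500_000, "latency": 0.02},
]

best = select_replica(replicas, size_bytes=1468762)
print(best["url"])
```

Other policies (storage system load, monetary cost, locality) would slot in by changing the cost function while leaving the catalog lookup unchanged.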

  11. Dynamic Replica Creation and Information Services • Application manager needs to guarantee a certain level of performance • Bandwidth from source to destination • Rate of accesses • Using information services (NWS, MDS): • Determine that existing replicas can’t provide that performance • Identify location to create a new replica with desired capacity and performance • Data distribution services
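The decision described here has two steps: check whether any existing replica meets the required performance, and if not, choose a site with enough capacity and bandwidth for a new one. A minimal sketch follows, assuming bandwidth forecasts and free-space figures have already been fetched from the information services. The function name, the site names, and all numbers are hypothetical.

```python
def plan_new_replica(required_bandwidth, existing, candidates, file_size):
    """Decide where, if anywhere, a new replica should be created.

    existing:   location name -> predicted bandwidth to the client
    candidates: site name -> {"bandwidth": ..., "free_bytes": ...}
    Returns None when an existing replica already meets the requirement,
    otherwise the name of the best eligible candidate site.
    """
    if any(bw >= required_bandwidth for bw in existing.values()):
        return None  # no new replica needed

    eligible = [
        (info["bandwidth"], name)
        for name, info in candidates.items()
        if info["free_bytes"] >= file_size
        and info["bandwidth"] >= required_bandwidth
    ]
    if not eligible:
        raise LookupError("no candidate site meets the requirement")
    return max(eligible)[1]  # highest predicted bandwidth wins


# Illustrative inputs only.
existing = {"sprite.llnl.gov": 500_000}
candidates = {
    "site-a": {"bandwidth": 3_000_000, "free_bytes": 10_000_000_000},
    "site-b": {"bandwidth": 5_000_000, "free_bytes": 1_000},  # too little space
}

print(plan_new_replica(1_000_000, existing, candidates, file_size=1468762))
```

Access rate, which the slide also mentions, could be folded in by scaling the required bandwidth with the expected number of concurrent readers.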

  12. Relationship of Replica Manager and Metadata Catalogs • Metadata Services: • Information Services that describe data contents • Replica Management Service interacts with a variety of metadata catalogs • Globus: simple set of object classes • MCAT • Community-defined metadata catalogs using common set of attributes • Metadata service produces logical names needed by replica catalog: • Logical collections • Logical files
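The division of labour here is that a metadata query turns content attributes into logical names, and only those names are handed to the replica catalog. The sketch below shows that first step with a toy attribute store. The attribute names (`variable`, `year`, `month`) and the function `find_logical_files` are invented for illustration and do not correspond to the Globus object classes or to MCAT.

```python
def find_logical_files(metadata, **attributes):
    """Return logical file names whose metadata match every given attribute."""
    return sorted(
        name
        for name, attrs in metadata.items()
        if all(attrs.get(key) == value for key, value in attributes.items())
    )


# Hypothetical metadata catalog: logical file name -> descriptive attributes.
metadata = {
    "Jan 1998": {"variable": "CO2", "year": 1998, "month": "Jan"},
    "Feb 1998": {"variable": "CO2", "year": 1998, "month": "Feb"},
}

# The resulting logical names are what a replica catalog lookup would take.
print(find_logical_files(metadata, variable="CO2", month="Feb"))
```

Because the metadata store returns names rather than locations, it can be swapped for a community-defined catalog without touching replica management.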

  13. A Model Architecture for Data Grids [diagram] • Application sends an Attribute Specification to the Metadata Catalog • Metadata Catalog returns a Logical Collection and Logical File Name • Replica Catalog returns Multiple Locations • Replica Selection uses Performance Information and Predictions from MDS and NWS to choose the Selected Replica • gsiftp commands access the data at Replica Location 1, Replica Location 2, and Replica Location 3 • Storage shown at the replica locations: Disk Cache, Tape Library, Disk Array, Disk Cache

  14. Outstanding Issues for Replica Management • Early architecture assumed a read-only workload • What update models should we support? • What high-level operations are needed? • Combine storage and catalog operations • Relationship to databases • Replicating the replica catalog • Alternate catalog views: files belong to more than one logical collection
