
Data Replication Service

Data Replication Service. Sandeep Chandra, GEON Systems Group, San Diego Supercomputer Center. Outline: Motivation; Data Replication Service (DRS); Components for DRS (RLS, GridFTP, RFT); DRS Deployment; DRS setup on GEON; Next Steps.


Presentation Transcript


  1. Data Replication Service
     Sandeep Chandra, GEON Systems Group, San Diego Supercomputer Center
     www.geongrid.org

  2. Outline
     • Motivation
     • Data Replication Service (DRS)
     • Components for DRS: RLS, GridFTP, RFT
     • DRS Deployment
     • DRS setup on GEON
     • Next Steps

  3. Motivation
     • Science domains spend considerable effort collecting and managing large amounts of data
     • Science domains develop customized data management services that vary with the type of application
     • Common data management requirements:
         • Publish and replicate large datasets
         • Register data replicas in catalogs and discover them
         • Perform metadata-based discovery of datasets
         • May require the ability to validate the correctness of replicas

  4. Motivation (cont.)
     • These systems demand considerable resources to design, implement and maintain
     • They typically cannot be re-used by other applications
     • Need for a long-term solution:
         • Generalize the functionality provided by these data management systems
         • Provide a suite of application-independent services
     • Design and build on lower-level grid services:
         • Globus Reliable File Transfer (RFT) service
         • Replica Location Service (RLS)
         • GridFTP

  5. A possible solution: Data Replication Service (DRS)
     • A higher-level data management service built on lower-level data management components such as RLS and RFT
     • Its primary functionality is to:
         • Allow users to identify a set of desired files existing in their grid environment
         • Make local replicas of those data files by transferring them from one or more source locations
         • Register the new replicas in a Replica Location Service

  6. Replica Location Service (RLS)
     • A simple registry that keeps track of where replicas exist on physical storage systems
     • Users or services register files in RLS when the files are created
     • Clients query RLS servers to find these replicas
     • RLS can be a distributed registry consisting of multiple servers at different sites
     • A distributed RLS increases the overall scale and stores more mappings than would be possible in a single, centralized catalog

  7. RLS (cont.)
     • A logical file name is a unique identifier for the contents of a file.
     • A physical file name is the location of a copy of the file on a storage system.
     • RLS maintains mappings between logical file names and one or more physical file names of replicas.
     • Users can provide a logical file name to an RLS server and ask for all the registered physical file names of replicas.
     • Users can also query an RLS server to find the logical file name associated with a particular physical file location.
     [Figure: logical file name XYZ mapped to XYZ replica 1, XYZ replica 2 and XYZ replica 3, held at Site 1, Site 2 and Site 3.]
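The two lookups described on this slide (logical name to physical names, and physical name back to logical name) can be illustrated with a small in-memory catalog. This is only a sketch: the `ReplicaCatalog` class, its method names and the `example.org` URLs are hypothetical and are not part of the Globus RLS API.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory stand-in for an RLS catalog (not the Globus RLS API)."""

    def __init__(self):
        self._lfn_to_pfns = defaultdict(set)  # logical name -> physical names
        self._pfn_to_lfn = {}                 # physical name -> logical name

    def register(self, lfn, pfn):
        """Record that a replica of `lfn` exists at physical location `pfn`."""
        self._lfn_to_pfns[lfn].add(pfn)
        self._pfn_to_lfn[pfn] = lfn

    def replicas(self, lfn):
        """Return all registered physical names for a logical name."""
        return sorted(self._lfn_to_pfns.get(lfn, ()))

    def logical_name(self, pfn):
        """Return the logical name registered for a physical location, if any."""
        return self._pfn_to_lfn.get(pfn)


# Hypothetical replica locations for the logical file "XYZ" from the slide.
catalog = ReplicaCatalog()
catalog.register("XYZ", "gsiftp://site1.example.org/data/XYZ")
catalog.register("XYZ", "gsiftp://site2.example.org/data/XYZ")
catalog.register("XYZ", "gsiftp://site3.example.org/data/XYZ")

print(catalog.replicas("XYZ"))
print(catalog.logical_name("gsiftp://site2.example.org/data/XYZ"))
```

Keeping a second dictionary for the reverse direction trades a little memory for constant-time lookups in both directions.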

  8. RLS (cont.)
     • Two servers: the Local Replica Catalog (LRC) and the Replica Location Index (RLI)
     • An LRC stores mappings between logical names for data items and the physical locations of replicas.
     • Query the LRC to discover replicas associated with a logical name.
     • An RLI server collects information about the logical name mappings stored in one or more LRCs.
     • The RLI returns a list of all the LRCs it is aware of that contain mappings for the logical name contained in a query.
     • The client then queries these LRCs to find the physical locations of replicas.
     [Figure: a tier of Replica Location Index (RLI) nodes above a tier of Local Replica Catalogs (LRC).]
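The two-step lookup (ask the RLI which LRCs know a logical name, then ask those LRCs for physical locations) might be sketched as follows. The class and function names and the `example.org` URLs are assumptions made for illustration; the real RLI receives state summaries from LRCs rather than reading their contents directly as this toy version does.

```python
class LocalReplicaCatalog:
    """Toy LRC: maps logical names to physical replica locations at one site."""

    def __init__(self, name):
        self.name = name
        self.mappings = {}  # logical name -> list of physical locations

    def register(self, lfn, pfn):
        self.mappings.setdefault(lfn, []).append(pfn)

    def lookup(self, lfn):
        return list(self.mappings.get(lfn, []))


class ReplicaLocationIndex:
    """Toy RLI: remembers which LRCs hold mappings for each logical name."""

    def __init__(self):
        self.index = {}  # logical name -> list of LRCs

    def update_from(self, lrc):
        """Collect the set of logical names currently stored in an LRC."""
        for lfn in lrc.mappings:
            known = self.index.setdefault(lfn, [])
            if lrc not in known:
                known.append(lrc)

    def lrcs_for(self, lfn):
        return list(self.index.get(lfn, []))


def find_replicas(rli, lfn):
    """Step 1: ask the RLI for LRCs. Step 2: ask each LRC for locations."""
    locations = []
    for lrc in rli.lrcs_for(lfn):
        locations.extend(lrc.lookup(lfn))
    return locations


# Hypothetical sites and URLs.
lrc_a = LocalReplicaCatalog("site-a")
lrc_b = LocalReplicaCatalog("site-b")
lrc_a.register("XYZ", "gsiftp://a.example.org/data/XYZ")
lrc_b.register("XYZ", "gsiftp://b.example.org/data/XYZ")
lrc_b.register("ABC", "gsiftp://b.example.org/data/ABC")

rli = ReplicaLocationIndex()
rli.update_from(lrc_a)
rli.update_from(lrc_b)

print([lrc.name for lrc in rli.lrcs_for("XYZ")])
print(find_replicas(rli, "XYZ"))
```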

  9. RLS in Context
     • The RLS is one component in a layered data management architecture
     • Consistency management is provided by higher-level services
     [Figure: layered architecture with the components Replica Consistency Management Services, Reliable Replication Service, Metadata Service, Replica Location Service, Reliable Data Transfer Service and GridFTP.]

  10. GridFTP
     • The GridFTP protocol provides for the secure, robust, fast and efficient transfer of (especially bulk) data.
     • The Globus Toolkit provides the most commonly used implementation of the protocol, though others exist.
     • The Globus Toolkit provides:
         • a server implementation called globus-gridftp-server
         • a scriptable command-line client called globus-url-copy
         • a set of development libraries for custom clients
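Because globus-url-copy is scriptable, a transfer can be assembled from a script. The sketch below only builds and prints the command line; it does not run it. The host name and paths are hypothetical, and the helper `build_url_copy_command` is an illustration, not part of the toolkit. The flags used (`-vb` for performance output, `-p` for parallel streams, `-tcp-bs` for TCP buffer size) are standard globus-url-copy options, but actually running the command would also need a GridFTP server and a valid credential.

```python
def build_url_copy_command(source_url, dest_url,
                           parallel_streams=None, tcp_buffer_bytes=None):
    """Assemble an argument list for the globus-url-copy client."""
    cmd = ["globus-url-copy", "-vb"]  # -vb: report transfer performance
    if parallel_streams is not None:
        cmd += ["-p", str(parallel_streams)]       # number of parallel streams
    if tcp_buffer_bytes is not None:
        cmd += ["-tcp-bs", str(tcp_buffer_bytes)]  # TCP buffer size in bytes
    cmd += [source_url, dest_url]
    return cmd


# Hypothetical source server and local destination path.
cmd = build_url_copy_command(
    "gsiftp://gridftp.example.org/data/lidar.dat",
    "file:///tmp/lidar.dat",
    parallel_streams=4,
    tcp_buffer_bytes=2097152,
)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where the client is installed.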

  11. Reliable File Transfer (RFT)
     • A WSRF-compliant web service that provides “job scheduler”-like functionality for data movement.
     • You provide a list of source and destination URLs (including directories or files); the service then writes your job description into a database and moves the files on your behalf.

  12. RFT (cont.)
     • Accepts a SOAP description of the desired transfer
     • Provides service methods for querying the transfer status
     • WSRF tools can subscribe to notifications of state change events
     • Supports all the same options as globus-url-copy (buffer size, etc.)
     • Increased reliability because state is stored in a database
     • Supports concurrency: multiple files are transferred at once for better performance
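Why storing state in a database increases reliability can be shown with a toy job store: every transfer request is written down before any data moves, so a later pass can pick up whatever has not finished. This is a sketch under stated assumptions, not RFT's implementation: the table layout, the status names (`Pending`, `Finished`, `Failed`), the helper functions and the `example.org` URLs are all hypothetical, and the copy step is simulated.

```python
import sqlite3


def create_job_store(path=":memory:"):
    """Open the database that holds the transfer job description."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS transfers ("
        " id INTEGER PRIMARY KEY,"
        " source TEXT NOT NULL,"
        " dest TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'Pending')"
    )
    db.commit()
    return db


def submit(db, pairs):
    """Persist (source, destination) URL pairs before any data is moved."""
    db.executemany("INSERT INTO transfers (source, dest) VALUES (?, ?)", pairs)
    db.commit()


def run_pending(db, copy_file):
    """Attempt every transfer that has not finished, recording each outcome."""
    rows = db.execute(
        "SELECT id, source, dest FROM transfers"
        " WHERE status != 'Finished' ORDER BY id"
    ).fetchall()
    for transfer_id, source, dest in rows:
        try:
            copy_file(source, dest)
            status = "Finished"
        except OSError:
            status = "Failed"
        db.execute("UPDATE transfers SET status = ? WHERE id = ?",
                   (status, transfer_id))
        db.commit()


def status_counts(db):
    """Return a {status: count} summary, as a status query might."""
    rows = db.execute(
        "SELECT status, COUNT(*) FROM transfers GROUP BY status"
    ).fetchall()
    return dict(rows)


attempts = {}


def flaky_copy(source, dest):
    """Simulated copy: the first attempt at b.dat fails, later ones succeed."""
    attempts[source] = attempts.get(source, 0) + 1
    if source.endswith("b.dat") and attempts[source] == 1:
        raise OSError("simulated network failure")


db = create_job_store()
submit(db, [
    ("gsiftp://src.example.org/a.dat", "gsiftp://dst.example.org/a.dat"),
    ("gsiftp://src.example.org/b.dat", "gsiftp://dst.example.org/b.dat"),
])
run_pending(db, flaky_copy)
first_pass = status_counts(db)
run_pending(db, flaky_copy)   # only the unfinished transfer is retried
second_pass = status_counts(db)
print(first_pass, second_pass)
```

Pointing `create_job_store` at a file path instead of `:memory:` would let the pending work survive a restart of the process, which is the property the slide attributes to the database.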

  13. Globus Services
     • WSRF Services
         • Data Replication Service
         • Delegation Service
         • Reliable File Transfer Service
     • Pre-WSRF Components
         • Replica Location Service (Local Replica Catalog, Replica Location Index)
         • GridFTP Server
     [Figure: a Local Site whose Web Service Container hosts the Data Replication Service (Replicator resource), the Delegation Service (Delegated Credential) and the Reliable File Transfer Service (RFT Resource), alongside a Local Replica Catalog, a Replica Location Index and a GridFTP Server.]

  14. DRS Deployment
     • Local storage system
     • GridFTP server for file transfer
     • Replica Location Service:
         • LRCs store mappings from logical names to storage locations
         • The RLI collects state summaries from LRCs
     • RFT: WSRF service to perform data transfer
     • DRS: the master replication service
     [Figure: one site with a DRS Service that creates a transfer request for the RFT Service, plus a Replica Location Index, Local Replica Catalog, GridFTP Server, Database and Site Storage System.]

  15. [Figure: DRS request flow, numbered steps 1 to 13, between a Client ("Request File"), the Local Site and Remote Sites 1…N. Each site has a Web Service Container hosting the Data Replication Service (Replicator resource), the Delegation Service (Delegated Credential) and the Reliable File Transfer Service (RFT Resource), alongside a Replica Location Index, a Local Replica Catalog and a GridFTP Server.]

  16. DRS Functionality
     • Initiate a DRS request
     • Create a delegated credential (Delegate Authority)
     • Create a Replicator resource (Replication Service)
     • Monitor the Replicator resource (Status)
     • Discover replicas of files in RLS and select among replicas
     • Start data transfer to the local site with the RFT service
     • Check status
     • Register new replicas in RLS catalogs
     • Allow client inspection of DRS results
     • Destroy the Replicator resource
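The core of the sequence above (discover replicas, select one, transfer it to the local site, register the new copy, and leave per-file results for the client to inspect) can be sketched in a few lines. This is a simplified illustration, not the DRS implementation: credential delegation and the Replicator resource lifecycle are left out, the catalog is a plain dictionary, and the `replicate` function, the result strings and the `example.org` URLs are hypothetical.

```python
def replicate(lfns, catalog, local_prefix, transfer,
              select=lambda sources: sources[0]):
    """Replicate each logical file to the local site and register the copy.

    catalog:  dict mapping logical name -> list of physical locations
    transfer: callable(source_url, dest_url) that raises OSError on failure
    select:   policy for choosing among replicas (default: first one listed)
    """
    results = {}
    for lfn in lfns:
        sources = catalog.get(lfn, [])        # discover replicas
        if not sources:
            results[lfn] = "no replica found"
            continue
        source = select(sources)              # select among replicas
        dest = local_prefix + lfn
        try:
            transfer(source, dest)            # move the data
        except OSError as exc:
            results[lfn] = f"transfer failed: {exc}"
            continue
        catalog[lfn].append(dest)             # register the new replica
        results[lfn] = "replicated"
    return results                            # available for client inspection


# Hypothetical catalog contents and a transfer function that only records calls.
catalog = {
    "lidar-001": ["gsiftp://remote.example.org/data/lidar-001"],
    "lidar-002": ["gsiftp://remote.example.org/data/lidar-002"],
}
moved = []
results = replicate(
    ["lidar-001", "lidar-002", "lidar-404"],
    catalog,
    "gsiftp://local.example.org/replicas/",
    transfer=lambda source, dest: moved.append((source, dest)),
)
print(results)
```

Passing the transfer step and the selection policy as parameters keeps the sketch independent of any particular transfer service.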

  17. GEON DRS Test Setup
     [Figure: two sites, ASU and SDSC, connected by a data transfer link. Each site runs a Globus Container with a DRS Service that creates a transfer request for the RFT Service, plus a Replica Location Index, Local Replica Catalog, GridFTP Server, Database and Site Storage System.]

  18. Next Tasks
     • Transfer LIDAR data from ASU to an SDSC resource (HPSS, etc.)
     • Extend the testbed to include more nodes
     • Benchmark data movement
     • Package DRS and its components with GEON software stack version 2.0

  19. Acknowledgement
     • Ann Chervenak & Robert Schuler (ISI)
     • www.globus.org (slides)

  20. Questions?
