
PPDG Data Handling System

Authors: Reagan Moore, Bing Zhu, Arcot Rajasekar, Michael Wan, Wayne Schroeder (moore@sdsc.edu). PPDG data grid requirements: access legacy systems, interface to local storage managers, manage replicas, interoperate with the GSS API, and interoperate with the data grid manager.


Presentation Transcript


  1. PPDG Data Handling System
     Reagan Moore, Bing Zhu, Arcot Rajasekar, Michael Wan, Wayne Schroeder
     moore@sdsc.edu

  2. PPDG Data Grid Requirements
     • Access legacy systems
     • Interface to local storage managers
     • Manage replicas
     • Interoperate with GSS API
     • Interoperate with data grid manager

  3. PPDG Support Tasks
     • Software development
     • Interface to LBNL Storage Manager
     • Collection creation
     • Data handling system installation
     • Demonstration of replication

  4. Data Handling System
     • SDSC Storage Resource Broker
     • Collection-based management of distributed data sets
     • Designed to:
         • Function over a wide area network
         • Support access to archives, file systems, databases
         • Work across administration domains
         • Manage replicas, containers, metadata

  5. SDSC Storage Resource Broker & Meta-data Catalog
     [Architecture diagram. Labels: Application; User; SRB; MCAT; Resource; Remote Proxies; Third-party copy; DataCutter; Dublin Core; Application Meta-data; File SID, DBLobj SID, Obj SID; storage systems Unix, DB2, Oracle, ADSM, HPSS]

  6. Digital Library Data Management
     • Persistent identifiers
         • Ability to move a data set without the name changing
     • Data set replicas
         • Management of multiple copies of a data set
     • Archival backup of data sets
         • Integration of disk data caches with archival storage
     • Persistent archives
         • Management of a collection through multiple cycles of technology evolution
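The first two points on this slide (a name that survives a move, and several copies behind one name) can be illustrated with a small sketch. This is not SRB or MCAT code: the `Catalog` and `Replica` classes, the resource names, and the paths are all hypothetical, chosen only to show a logical name being kept separate from physical locations.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Replica:
    resource: str       # hypothetical name of the storage resource holding a copy
    physical_path: str  # where the bytes live on that resource


@dataclass
class Catalog:
    """Toy catalog mapping a persistent logical name to its replicas."""
    entries: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica) -> None:
        # Adding a copy never changes the logical name users refer to.
        self.entries.setdefault(logical_name, []).append(replica)

    def move(self, logical_name: str, old_resource: str, new: Replica) -> None:
        # Relocating a copy only rewrites the physical entry in the catalog.
        self.entries[logical_name] = [
            new if r.resource == old_resource else r
            for r in self.entries[logical_name]
        ]

    def resolve(self, logical_name: str) -> List[Replica]:
        return list(self.entries[logical_name])


catalog = Catalog()
catalog.register("/home/demo/run1", Replica("site-a", "/data/run1"))
catalog.register("/home/demo/run1", Replica("site-b", "/cache/run1"))
catalog.move("/home/demo/run1", "site-a", Replica("site-c", "/archive/run1"))
```

After the move, `/home/demo/run1` still resolves, now to copies on `site-c` and `site-b`; only the catalog entry changed, not the name.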

  7. Software Development
     • Sstage - request to an HRM to pre-stage a file
     • SfileStatus - check state
         • File cached locally
         • File being cached, time to complete is returned
         • File in staging queue
         • Query rejected, HRM down
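The four states listed for SfileStatus suggest how a client might react to each one. The following is a minimal sketch under stated assumptions: the enum, the `StageStatus` record, the `next_action` function, and the 30-second default wait are all hypothetical and do not reflect the actual SfileStatus output format or options.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class StageState(Enum):
    CACHED = "file cached locally"
    CACHING = "file being cached"
    QUEUED = "file in staging queue"
    REJECTED = "query rejected, HRM down"


@dataclass
class StageStatus:
    state: StageState
    # Only meaningful while CACHING, when a time to complete is returned.
    seconds_to_complete: Optional[float] = None


def next_action(status: StageStatus, default_wait: float = 30.0) -> Tuple[str, float]:
    """Return (action, seconds to wait) for a reported staging state."""
    if status.state is StageState.CACHED:
        return ("read", 0.0)
    if status.state is StageState.CACHING:
        wait = status.seconds_to_complete
        return ("wait", default_wait if wait is None else wait)
    if status.state is StageState.QUEUED:
        # No estimate is given for queued files, so fall back to the default.
        return ("wait", default_wait)
    return ("abort", 0.0)
```

A client loop would call a status check, pass the result to `next_action`, and sleep for the returned interval before checking again.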

  8. Software Development
     • Sget - synchronous request to stage, transfer file, and purge the local cache
     • Register a data set as a replica
         • Allows data sets to be moved independently of the SRB, and then registered
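The stage, transfer, purge sequence described for Sget can be sketched against an in-memory stand-in. `FakeHRM` and `synchronous_get` are hypothetical names, and purging in a `finally` clause (so the cache is cleared even if the transfer fails) is an assumption of this sketch, not a documented property of Sget.

```python
from typing import Dict


class FakeHRM:
    """In-memory stand-in for a storage manager with a disk cache."""

    def __init__(self, archive: Dict[str, bytes]) -> None:
        self.archive = dict(archive)       # files held in archival storage
        self.cache: Dict[str, bytes] = {}  # files currently staged to disk

    def stage(self, name: str) -> None:
        self.cache[name] = self.archive[name]

    def read_cached(self, name: str) -> bytes:
        return self.cache[name]

    def purge(self, name: str) -> None:
        self.cache.pop(name, None)


def synchronous_get(hrm: FakeHRM, name: str) -> bytes:
    """Stage a file, transfer its contents, then purge the cached copy."""
    hrm.stage(name)
    try:
        return hrm.read_cached(name)
    finally:
        # Assumption: release cache space whether or not the transfer succeeded.
        hrm.purge(name)


hrm = FakeHRM({"run1.dat": b"abc"})
data = synchronous_get(hrm, "run1.dat")
```

Once the call returns, `data` holds the file contents and the stand-in cache is empty again.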

  9. Collection Creation
     • Assembled 4750 data sets into SRB collection
         • /home/lblsrb.lbl/PPDG
         • SRB server ‘unix-test2-lbl’
     • Replicated 30 data sets
         • From starsu00.nersc.edu
         • To vulture.cs.wisc.edu

  10. PPDG Data Grid
     • Sites:
         • LBNL - HRM interface
         • LBNL - file system
         • Wisconsin - file system
         • CalTech - HPSS and file system
         • Fermi Lab - file system
         • Stanford - file system
         • SDSC - HPSS and file system

  11. Current Data Grid
     [Diagram. Labels: Wisc Client 1 and Wisc Client 2 (S-Commands); SRB Server @Wisc; SRB Server @LBL; esrb.driver; esrb.server; IPC; Stage(), purge(), fileStatus(); file caching; file purging; File caching request; FC; HRM; HPSS; Disk cache]

  12. PPDG FY2000 Tasks
     • Upgrade to version 1.1.7
         • Supports GSI authentication
     • Revise SfileStatus to meet current design changes
     • Integrate registering of replica into production system
     • Support data subsetting

  13. Data Set Management
     • Model-Based Information Management
         • Rule-based ontology mapping, conceptual-level mediation - CMIX
     • Data Grid
         • Data federation across multiple libraries - MIX
     • Digital Library
         • Interoperable services for information discovery and presentation - SDLIP
     • Data Collection
         • Tools for managing data set collections on databases - MCAT
     • Data Handling
         • Systems for data retrieval from remote storage - SRB
     • Persistent Archives
         • Storage of data collections for 30 years - HPSS

  14. Further Information
     • http://www.npaci.edu/DICE
