PPDG Data Handling System
Reagan Moore, Bing Zhu, Arcot Rajasekar, Michael Wan, Wayne Schroeder
moore@sdsc.edu
PPDG Data Grid Requirements
• Access legacy systems
• Interface to local storage managers
• Manage replicas
• Interoperate with GSS API
• Interoperate with data grid manager
PPDG Support Tasks
• Software development
• Interface to LBNL Storage Manager
• Collection creation
• Data handling system installation
• Demonstration of replication
Data Handling System
• SDSC Storage Resource Broker
• Collection-based management of distributed data sets
• Designed to:
    • Function over a Wide Area Network
    • Support access to archives, file systems, databases
    • Work across administrative domains
    • Manage replicas, containers, metadata
SDSC Storage Resource Broker & Meta-data Catalog
[Figure: architecture diagram. Labels: Application; User; SRB; MCAT; Remote Proxies; DataCutter; Third-party copy; Resource; Dublin Core; Application Meta-data; storage systems Unix, DB2, Oracle, ADSM, HPSS; identifiers File SID, DBLobj SID, Obj SID]
Digital Library Data Management
• Persistent identifiers
    • Ability to move a data set without the name changing
• Data set replicas
    • Management of multiple copies of a data set
• Archival backup of data sets
    • Integration of disk data caches with archival storage
• Persistent archives
    • Management of a collection through multiple cycles of technology evolution
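The first two points above (a name that survives data movement, and several copies under one name) can be illustrated with a small sketch. This is not the MCAT schema or any SRB interface: `ToyCatalog`, its methods, and the example paths and URLs are all hypothetical, chosen only to show a logical name staying fixed while physical locations change.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyCatalog:
    """Hypothetical in-memory catalog; NOT the real MCAT schema."""

    # Maps a persistent logical name to the physical locations of its replicas.
    replicas: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, logical_name: str, location: str) -> None:
        """Record a physical copy under a logical name (duplicates ignored)."""
        locations = self.replicas.setdefault(logical_name, [])
        if location not in locations:
            locations.append(location)

    def move(self, logical_name: str, old: str, new: str) -> None:
        """Relocate one copy; the logical name itself does not change."""
        locations = self.replicas[logical_name]
        locations[locations.index(old)] = new

    def resolve(self, logical_name: str) -> List[str]:
        """Return every known physical location for a logical name."""
        return list(self.replicas.get(logical_name, []))


# Example paths and URLs are made up for illustration.
catalog = ToyCatalog()
catalog.register("/demo/run1.dat", "hpss://archive-a/run1.dat")
catalog.register("/demo/run1.dat", "file://cache-b/run1.dat")
catalog.move("/demo/run1.dat", "file://cache-b/run1.dat", "file://cache-c/run1.dat")
print(catalog.resolve("/demo/run1.dat"))
```

Users keep asking for `/demo/run1.dat` throughout; only the catalog entry behind that name is updated when a copy moves.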
Software Development
• Sstage - request to an HRM to pre-stage a file
• SfileStatus - check state
    • File cached locally
    • File being cached, time to complete is returned
    • File in staging queue
    • Query rejected, HRM down
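The four states reported by SfileStatus suggest a simple polling pattern on the client side. The sketch below is only an illustration of that pattern: the names `StageState`, `StageStatus`, and `wait_until_cached` are hypothetical, the status check and sleep functions are passed in by the caller, and nothing here reflects the actual S-command interface or return format.

```python
import enum
from typing import Callable, NamedTuple, Optional


class StageState(enum.Enum):
    """The four outcomes listed on the slide (names are hypothetical)."""

    CACHED = "file cached locally"
    CACHING = "file being cached"
    QUEUED = "file in staging queue"
    REJECTED = "query rejected, HRM down"


class StageStatus(NamedTuple):
    state: StageState
    # Estimated time to complete; only meaningful while CACHING.
    seconds_remaining: Optional[float] = None


def wait_until_cached(
    check_status: Callable[[], StageStatus],
    sleep: Callable[[float], None],
    max_polls: int = 10,
    default_wait: float = 5.0,
) -> bool:
    """Poll until the file is cached; give up on rejection or after max_polls."""
    for _ in range(max_polls):
        status = check_status()
        if status.state is StageState.CACHED:
            return True
        if status.state is StageState.REJECTED:
            return False
        # Use the reported estimate when one is available, else a default.
        wait = default_wait
        if status.state is StageState.CACHING and status.seconds_remaining is not None:
            wait = status.seconds_remaining
        sleep(wait)
    return False


# Scripted statuses and a recording "sleep" stand in for a real HRM.
scripted = iter([
    StageStatus(StageState.QUEUED),
    StageStatus(StageState.CACHING, 2.0),
    StageStatus(StageState.CACHED),
])
waits = []
print(wait_until_cached(lambda: next(scripted), waits.append), waits)
```

Passing `time.sleep` as the `sleep` argument would turn this into a real wait loop; the recording version above just makes the behaviour easy to inspect.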
Software Development
• Sget - synchronous request to stage, transfer file, and purge the local cache
• Register a data set as a replica
    • Allows data sets to be moved independently of the SRB, and then registered
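The stage, transfer, purge sequence described for Sget can be sketched as a single synchronous call. The function below is a hypothetical illustration, not the Sget implementation: the three steps are caller-supplied callables, and purging the cache even when the transfer fails is an assumption made here, not something the slides state.

```python
from typing import Callable, List, Tuple


def sget(
    path: str,
    stage: Callable[[str], None],
    transfer: Callable[[str], bytes],
    purge: Callable[[str], None],
) -> bytes:
    """Hypothetical synchronous get: stage, transfer, then purge the cache."""
    stage(path)  # assumed to block until the file is in the local cache
    try:
        return transfer(path)
    finally:
        # Assumption: free the cache slot whether or not the transfer succeeded.
        purge(path)


# Recording stand-ins for the real operations, for illustration only.
log: List[Tuple[str, str]] = []


def fake_stage(path: str) -> None:
    log.append(("stage", path))


def fake_transfer(path: str) -> bytes:
    log.append(("transfer", path))
    return b"payload"


def fake_purge(path: str) -> None:
    log.append(("purge", path))


data = sget("/demo/run1.dat", fake_stage, fake_transfer, fake_purge)
print(data, log)
```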
Collection Creation
• Assembled 4750 datasets into SRB collection
    • /home/lblsrb.lbl/PPDG
    • SRB server ‘unix-test2-lbl’
• Replicated 30 data sets
    • From starsu00.nersc.edu
    • To vulture.cs.wisc.edu
PPDG Data Grid
• Sites:
    • LBNL - HRM interface
    • LBNL - file system
    • Wisconsin - file system
    • Caltech - HPSS and file system
    • Fermilab - file system
    • Stanford - file system
    • SDSC - HPSS and file system
Current Data Grid
[Figure: data flow diagram. Labels: Wisc Client 1; Wisc Client 2; S-Commands; SRB Server @Wisc; SRB Server @LBL; esrb.driver; IPC; esrb.server; Stage(), purge(), fileStatus(); file caching request; file purging; FC; HRM; HPSS; Disk cache]
PPDG FY2000 Tasks
• Upgrade to version 1.1.7
    • Supports GSI authentication
• Revise SfileStatus to meet current design changes
• Integrate registering of replica into production system
• Support data subsetting
Data Set Management
• Model-Based Information Management
    • Rule-based ontology mapping, conceptual-level mediation - CMIX
• Data Grid
    • Data federation across multiple libraries - MIX
• Digital Library
    • Interoperable services for information discovery and presentation - SDLIP
• Data Collection
    • Tools for managing data set collections on databases - MCAT
• Data Handling
    • Systems for data retrieval from remote storage - SRB
• Persistent Archives
    • Storage of data collections for 30 years - HPSS
Further Information
http://www.npaci.edu/DICE