
GGF Data Grid Interoperability Demonstration



  1. GGF Data Grid Interoperability Demonstration
     • Organizers:
        • Erwin Laure (Erwin.Laure@cern.ch)
        • Reagan Moore (moore@sdsc.edu)
        • Arun Jagatheesan (arun@sdsc.edu) - grid coordination
        • Sheau-Yen Chen (sheauc@sdsc.edu) - data grid administrator
        • Chien-Yi Hou (chienyi@sdsc.edu) - collection administrator
     • Goals:
        • Demonstrate federation of 17 SRB data grids (shared name spaces)
        • Demonstrate replication of a collection
     • Participants (19 data grids):
        • APAC - Australia: Stephen McMahon (stephen.mcmahon@anu.edu.au)
        • ASGC - Taiwan: Eric Yen, Wei-Long Ueng (wlueng@twgrid.org)
        • ChinaGrid - China: Li Qi (quick.qi@gmail.com)
        • DEISA - Italy: Giuseppe Fiameni (g.fiameni@cineca.it)
        • IB - New Zealand: Daniel Hanlon (d.j.hanlon@dl.ac.uk)
        • IB - UK: Daniel Hanlon (d.j.hanlon@dl.ac.uk)
        • IN2P3 - France: Jean-Yves Nief (nief@cc.in2p3.fr)
        • KEK - Japan: Yoshimi Iida (yoshimi.iida@kek.jp)
        • LCDRG - US: Chien-Yi Hou (chienyi@sdsc.edu)
        • NCHC - Taiwan: Hsu-Mei Chou (hmchou@nchc.org.tw)
        • NOAO - Chile/US: Irene Barg (ibarg@noao.edu)
        • Purdue - US: Lan Zhao (lanzhao@purdue.edu)
        • RAL - UK: Adil Hasan (a.hasan@rl.ac.uk)
        • RNP - Brazil: Marcio Faerman (marcio@rnp.br)
        • SARA - Netherlands: Bart Heupers (bart@sara.nl)
        • TeraGrid - US: Sheau-Yen Chen (sheauc@sdsc.edu)
        • U. Maryland - US: Mike Smorul (toaster@umiacs.umd.edu)
        • UERJ - Brazil: Alberto Santoro (Alberto.Santoro@cern.ch)
        • WUNGrid - UK: Sheau-Yen Chen (sheauc@sdsc.edu)
     GGF-18 Data grid interoperability

  2. Intellectual Property Policy
     • I acknowledge that participation in GGF18 is subject to the GGF Intellectual Property Policy.
     • Intellectual Property Notices. Note Well: All statements related to the activities of the GGF and addressed to the GGF are subject to all provisions of Section 17 of GFD-C.1, which grants to the GGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in GGF meetings, as well as written and electronic communications made at any time or place, which are addressed to:
        • the GGF plenary session,
        • any GGF working group or portion thereof,
        • the GFSG, or any member thereof on behalf of the GFSG,
        • the GFAC, or any member thereof on behalf of the GFAC,
        • any GGF mailing list, including any working group or research group list, or any other list functioning under GGF auspices,
        • the GFD Editor or the GWD process
     • Statements made outside of a GGF meeting, mailing list or other function, that are clearly not intended to be input to a GGF activity, group or function, are not subject to these provisions.
     • Excerpt from Section 17 of GFD-C.1: Where the GFSG knows of rights, or claimed rights, the GGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant GGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the GGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. The results will, however, be recorded by the GGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification.
     • GGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.

  3. GIN - Two Approaches
     • Virtualize the storage resource
        • Provide a standard interface to the storage system for access
        • Storage Resource Manager
        • Asynchronous interface to storage
     • Virtualize the shared collection
        • Manage the properties of a shared collection independently of the multiple storage systems where it is distributed
        • Storage Resource Broker
        • Collection management
        • Federation of independent collections

  4. SRB Data Grid Federation Status

  5. Data Grid Federation
     • Builds on:
        • Registry for data grid names - ensures each data grid has a unique identity
        • Trust establishment - explicit registration command issued by the data grid administrator of each data grid
        • Peer-to-peer server interaction - each SRB server can respond to commands from any other SRB server, provided trust has been established between the data grids
        • Administrator controlled registration of name spaces - each grid controls whether they will share user names, file names, replicate data, replicate metadata or allow remote data storage
        • Shibboleth style user authentication - a person is identified by /Zone-name/user-name.domain-name. Authentication is done by the home zone. No passwords are shared between zones.
        • Local authorization - operations are under the control of the zone being accessed, including controls on access to files, storage resources, metadata and user quotas.
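The identity format given on slide 5, /Zone-name/user-name.domain-name, can be illustrated with a short Python sketch. This is not SRB code: `FederatedUser` and `parse_federated_user` are hypothetical names, and the sketch assumes the user name contains no dot, so that the first dot separates user from domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FederatedUser:
    """Hypothetical holder for the three parts of a federated identity."""
    zone: str    # home zone, which performs the authentication
    user: str
    domain: str


def parse_federated_user(identity: str) -> FederatedUser:
    """Split '/Zone-name/user-name.domain-name' into its parts.

    Assumption: the user name has no dot, so the first dot in the
    last path element separates user from domain.
    """
    parts = identity.split("/")
    # A leading "/" yields an empty first element: ["", zone, "user.domain"].
    if len(parts) != 3 or parts[0] != "" or not parts[1]:
        raise ValueError(f"expected /zone/user.domain, got {identity!r}")
    user, sep, domain = parts[2].partition(".")
    if not sep or not user or not domain:
        raise ValueError(f"expected user.domain, got {parts[2]!r}")
    return FederatedUser(zone=parts[1], user=user, domain=domain)
```

Because the zone is part of the identity, a remote zone can tell from the name alone which home zone has to authenticate the user, which fits the statement that no passwords are shared between zones.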

  6. Federation Between Data Grids
     • Data Access Methods (Web Browser, Scommands, OAI-PMH)
     • Data Collection A and Data Collection B, each a Data Grid with:
        • Logical resource name space
        • Logical user name space
        • Logical file name space
        • Logical context (metadata)
        • Control/consistency constraints
     • Access controls and consistency constraints on cross registration of name spaces

  7. Challenge - Replicate a Collection
     • Replicate files in a collection
        • Demonstrated at GGF17
     • Replicate metadata associated with a shared collection
        • Authenticity metadata - describe provenance of file
        • Integrity metadata - state information such as checksums, access controls
     • SRB information synchronization command
        • Szonesync.pl -d -z remotezone
           • Synchronize data information with zone “remotezone”
        • Szonesync.pl -u -z remotezone
           • Synchronize user information with zone “remotezone”
        • Szonesync.pl -r -z remotezone
           • Synchronize user and resource information with zone “remotezone”

  8. Collection Management
     • Metadata extraction - user-defined metadata
        • Execute remote procedure to extract metadata from a file
        • Load the extracted metadata into the remote zone MCAT catalog
     • Demonstrated on FITS astronomy image
        • Images provided by Irene Barg - NOAO (noao-ls-t3-z1 data grid)
        • Created parsing template to extract metadata attributes from FITS header
        • Modified SRB to support extraction of multiple versions of the same metadata attribute from large files
        • Executed commands on the /GGF-RNP data grid in Brazil
        • Extracted 183 metadata attributes from a FITS header
     • ./Sufmeta ct422131.fits
           DTPI = 'Christopher Stubbs'
        89 DTPIAFFL = 'University of Washington'
        90 DTTITLE = 'A Next Generation Microlensing Survey of the LMC'
        91 DTACQUIS = 'ctioa8.ctio.noao.edu'
        92 DTACCOUN = 'mosaic '
        93 DTACQNAM = '/ua00/mosaic/tonight/sm84.051011_0516.100.fits'
        94 DTNSANAM = 'ct422131.fits '

  9. Collection Management
     • Metadata hierarchy - extensible schema
        • Create additional tables in MCAT catalog to support schema extension
        • Load a metadata hierarchy into the remote zone MCAT catalog
     • Demonstrated on a State Department collection of communiques about Amelia Earhart
        • Collection provided by Mark Conrad - NARA (LCDRG-GGF data grid)
        • Created scripts to add 70 tables to the MCAT catalog
        • Created scripts to load the Life Cycle Data Requirements Guide metadata into MCAT
        • Added LCDRG metadata hierarchy to 43 files in the Amelia Earhart collection on the UERJ-HEPGrid in Brazil
        • Queried the metadata hierarchy

  10. Information Management
     • Squery -N LCDRG_object -S LCDRG_object.object_data_id
        --------------------------- RESULTS ------------------------------
        data_id: 503
        -----------------------------------------------------------------
        data_id: 558
        -----------------------------------------------------------------
     • Squery -N LCDRG_recordgroup -S LCDRG_recordgroup.recordgroup_grno -N LCDRG_object LCDRG_object.object_data_id = 558
        --------------------------- RESULTS ------------------------------
        grno: 59
        -----------------------------------------------------------------
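A script that consumes the Squery output shown on slide 10 could split it into records as sketched below. The function `parse_squery_results` is a hypothetical helper, not part of SRB, and it assumes the output looks exactly like the slide: a RESULTS banner, then "name: value" lines, with a dashed rule closing each record.

```python
def parse_squery_results(text: str) -> list[dict[str, str]]:
    """Turn Squery-style output into a list of {attribute: value} records.

    Assumption: the layout matches the slide, i.e. a dashed RESULTS
    banner followed by 'name: value' lines separated by dashed rules.
    """
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"-"}:
            # A rule made only of dashes closes the current record.
            if current:
                records.append(current)
                current = {}
            continue
        if line.startswith("-") and "RESULTS" in line:
            continue  # skip the banner line
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)  # output that lacks a final rule
    return records


SAMPLE = """\
--------------------------- RESULTS ------------------------------
data_id: 503
-----------------------------------------------------------------
data_id: 558
-----------------------------------------------------------------
"""
```

Here `parse_squery_results(SAMPLE)` yields one dictionary per record, so the `data_id` values can be fed into a follow-up query such as the second command on the slide.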

  11. Challenges
     • Multiple software versions
        • All 3.4 versions interoperate
        • In use: SRB 3.4.0, SRB 3.4.1, SRB 3.4.2 and SRB 3.4.2-P
     • Management of firewalls
        • Require ports opened to allow control messages to be exchanged
        • Support client-initiated and server-initiated parallel I/O and bulk load operations for data and metadata transport
     • Network tuning
        • Need to ensure system buffer size and TCP window size are set for intercontinental latencies
        • Need to specify 6-16 parallel I/O streams as default
     • Management of shared collection
        • Decide what will be shared
        • Create a logical resource name that will support shared data
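The network tuning points on slide 11 follow from the bandwidth-delay product: to keep a link full, the data in flight must equal bandwidth times round-trip time, and with several parallel streams each stream needs roughly that amount divided by the stream count. The sketch below works this through in Python; the function names are hypothetical and the 1 Gbit/s link and 200 ms round trip are illustrative assumptions, not measurements from the demonstration.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the link full."""
    return bandwidth_bits_per_s * rtt_s / 8


def per_stream_window_bytes(bandwidth_bits_per_s: float, rtt_s: float, streams: int) -> float:
    """Approximate TCP window each of `streams` parallel streams needs."""
    if streams < 1:
        raise ValueError("streams must be at least 1")
    return bandwidth_delay_product_bytes(bandwidth_bits_per_s, rtt_s) / streams


if __name__ == "__main__":
    # Illustrative values only: 1 Gbit/s link, 200 ms intercontinental round trip.
    bandwidth, rtt = 1e9, 0.2
    for streams in (1, 6, 16):
        window = per_stream_window_bytes(bandwidth, rtt, streams)
        print(f"{streams:2d} stream(s): about {window / 1e6:.1f} MB window per stream")
```

Under these assumed numbers a single stream would need a window of about 25 MB, far above typical default buffer sizes, which is why both larger buffers and multiple parallel streams appear among the tuning requirements.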

  12. Challenges
     • Port of SRM interface as client API to a SRB collection
        • Established as a collaboration
           • Wayne Schroeder (schroede@sdsc.edu)
           • Wei-Long (wlueng@twgrid.org)
           • Eric Yen (eric@sinica.edu.tw)
           • Ethan Lin (ethanlin@gate.sinica.edu.tw)
           • Abhishek Singh Rana (rana@fnal.gov)
        • Wiki created at
           • http://www.sdsc.edu/srb/index.php/SRM-SRB
        • Initial draft document published on high-level approach

  13. Demonstration - Web Browser
     • https://srb.npaci.edu/mysrb331reagan.shtml
     • Log onto shared collection at SDSC
        • Collection defined by port number and host machine
     • Differentiate between local collection and shared collection
        • Local collection - /home/user.domain/collection
        • Shared collection - /Zone/home/user.domain/collection
     • Web browser displays status of federated zone
        • Select remote data grid by clicking on zone
        • Browse metadata, list files, perform authorized operations
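The two path forms on slide 13 differ only in the leading zone element, which a client could use to tell a local collection from a shared one. The Python sketch below illustrates the idea; `split_collection_path` is a hypothetical helper, not an SRB call, and it assumes no zone is literally named "home".

```python
from typing import Optional, Tuple


def split_collection_path(path: str) -> Tuple[Optional[str], str]:
    """Return (zone, path_within_zone); zone is None for a local collection.

    Assumption: local paths start with /home and shared paths with
    /Zone/home, as on the slide, and no zone is itself named 'home'.
    """
    parts = [p for p in path.split("/") if p]
    if not path.startswith("/") or not parts:
        raise ValueError(f"expected an absolute collection path, got {path!r}")
    if parts[0] == "home":
        # Local collection: /home/user.domain/collection
        return None, "/" + "/".join(parts)
    if len(parts) >= 2 and parts[1] == "home":
        # Shared collection: /Zone/home/user.domain/collection
        return parts[0], "/" + "/".join(parts[1:])
    raise ValueError(f"path does not match either form: {path!r}")
```

For example, `/zoneB/home/alice.sdsc/images` (a made-up path) splits into the zone `zoneB` and the in-zone path `/home/alice.sdsc/images`, while the same path without the zone prefix is treated as local.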

  14. Demonstration - Shell Commands
     • SRB shell commands located in ./SRB3_4_1/utilities/bin
        ./Sinit                                  /* connect to default collection specified in .srb environment file; authenticate yourself with challenge-response or GSI certificate */
        ./Sls                                    /* list collections and files */
        ./Scd collection-name                    /* change to another collection */
        ./Sufmeta -e stylesheet file             /* extract metadata from a file */
        ./Smeta file-name                        /* list user-defined metadata */
        ./Squery -N namespace -S attributename   /* query extensible schema */

  15. Preservation Application
     • More detailed information provided in
        • Preservation Environments research group
        • Tuesday 10:00 - 11:30
        • Room 158 A-B
     • Will also discuss next generation data management systems in PERG session
        • Rule-oriented data systems - iRODS
        • Support mapping of management policies to rules that are executed by the data management system
           • Assertions on integrity and authenticity
           • Assertions on data management - replication, data distribution
           • Assertions on access controls and display
        • http://www.sdsc.edu/srb/future/index.php/Main_Page
