Remote Data Access Working Group

Remote Data Access Working Group, introductory session. Grid Forum 5, Reagan Moore. Summary of working group activities. Challenges: rapid evolution of grid environments, pressure of application implementation, and interactions with Grid Forum working groups.

Presentation Transcript


  1. Remote Data Access Working Group Introductory Session

  2. Remote Data Access Working Group • Grid Forum 5 • Reagan Moore • Summary of Working Group Activities • Challenges: • Rapid evolution of grid environments • Pressure of application implementation • Interactions with Grid Forum working groups

  3. Organization • Name: Remote Data Access Working Group • Chairs: Reagan Moore, Ann Chervenak, John Karpovich • Document Editor: Eric Stephan • Charter: Interoperability between remote data access systems • Short-term goals: • Review “Summary of Data Grids” • Define framework for common functionality across data grids

  4. Working Group Liaisons for Requirements Lists • Accounting: Ed Hanna • Grid Performance: Brian Tierney • Information: Ann Chervenak • Program Models: Tracey Smith/Craig Lee • Scheduling: Judy Beiriger • Security: (Open position) • User Services: Judith Utley • (Applications and Tools): Ron Oldfield

  5. Semantics for Data Access • File based access • User owns the files • Globus, Nile, IBP • Object based access • Object is member of a class • Legion, CORBA, “Objectivity” • Collection based access • Collection owns files • Storage Resource Broker, Digital libraries

  6. Remote Data Access Architecture Convergence [Diagram: side-by-side stacks for Globus and the SDSC Storage Resource Broker. Labels: Application, FTP Client, SRB Client, Replica Catalog, Metadata Catalog, FTP Daemon, SRB Server, Storage System; two elements are marked “?”.]

  7. Evolution of Data Management • A grid supports • Data management • Access to distributed storage systems • Users also require • Information management • Tagged attributes of the stored data sets • Knowledge management • Relationships between the concepts described by the data set attributes

  8. Architecture [layered diagram; labels as extracted] • API that provides “glue” to underlying data handling systems (security, scheduling, QoS, access protocol, data format/model, adaptivity, info discovery, location control) • Application • + authentication + authorization • Data Model Management • Remote Procedure Execution • Armada, D’agents, FEL, ADR, GRAM, SRB • Information Discovery • Data Handling Systems • Condor, GASS, NILE, [SRB], I-2 caching (e.g., filtering) • API that provides “glue” to underlying storage, QoS, etc. [GASS, IBP, SRB] • Dynamic Info Discovery • Storage System Description • Storage Resources • DPSS, HPSS, ADSM, DMF, Unitree, NASstore, DFS, DB2, Oracle, Illustra, Sybase, O2, ObjectStore, Objectivity • (which perf. monitor, what QoS, location, what access control, replication) • GloPerf, Netlogger, NWS

  9. Information Based Grid Management [diagram; labels as extracted] • Access Services • Tagging of data • Information Repository • Attribute-based Query • Attributes Semantics • SDLIP • Information • XML DTD • (Data Handling System - SRB / FTP / HTTP) • Data • Fields Containers Folders • Storage (Replicas, Persistent IDs) • Grids • Feature-based Query • MCAT/HDF

  10. Knowledge Based Grid Management [diagram; labels as extracted] • Access Services • Tagging of data • Relationships Between Concepts • Knowledge Repository for Rules • Knowledge or Topic-Based Query / Browse • Knowledge • XTM DTD • Rules - KQL • (Topic Maps / Buckets / Model-based Access) • Information Repository • Attribute-based Query • Attributes Semantics • SDLIP • Information • XML DTD • (Data Handling System - SRB / FTP / HTTP) • Data • Fields Containers Folders • Storage (Replicas, Persistent IDs) • Grids • Feature-based Query • MCAT/HDF

  11. Emerging Applications • Virtual Data Products • NSF GriPhyN ITR project • Dynamically create product by application of analysis procedures • Information Repositories • Protein Data Bank • Support application of structural comparison algorithms • Collections • National Virtual Observatory • Federate sky surveys

  12. Current Papers • Remote Data Access Architectures • Presented at GF4 • Summary/survey of existing data grids • Presented at GF4 • Data Transport Protocol • GridFTP presentation at GF5

  13. Grid Forum 5 Sessions • Monday 11:00 - XML Tutorial • Information tagging • Relationship tagging • Monday 4:30 - GF/eGRID survey • Working group session to identify requirements • Tuesday 3:00 - GridFTP specification • Working group session on data transport protocol

  14. DATA Working Groups • GF/eGRID discussion • GridFTP discussion

  15. Architecture Working Group

  16. Grid Forum Architecture Working Group • Led by Charlie Catlett • Discussion of need for: • Network services perspective for designing protocols and APIs for Grid Forum services • Distributed operating system perspective for designing an architecture (naming, binding, persistence, process management, storage)

  17. Grid Forum Interactions

  18. Grid Forum Interactions

  19. GF/eGRID Discussion Group

  20. GF/eGRID Discussion • Led by Reagan Moore • What access protocols are of interest? • What latency hiding mechanisms are of interest? • Data streaming • Caching • Replicas • Containers for aggregation • Remote proxies for bundling I/O commands

  21. GF/eGRID Discussion • What are data management requirements? • Data collections • Information catalogs • Knowledge repositories • What is the granularity of the data management systems? • Collection size • Object size • Data set access size

  22. GF/eGRID Discussion • What is the time granularity? • Compute time: (Number of operations) / (Execution rate) • Transfer time: (Number of bytes) / (Transmission bandwidth) • How many operations are done per byte accessed (Ops-per-Byte)? • For your resources, is Ops-per-Byte ~ Execution rate / Bandwidth?
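
As a rough illustration of the balance this slide asks about, the sketch below takes compute time to be operations divided by execution rate and transfer time to be bytes divided by bandwidth, then compares an application's ops-per-byte with the resource ratio (execution rate / bandwidth). The function names and all numbers are hypothetical and are not taken from the presentation.

```python
def compute_time(num_ops: float, exec_rate: float) -> float:
    """Seconds spent computing: operations / (operations per second)."""
    return num_ops / exec_rate


def transfer_time(num_bytes: float, bandwidth: float) -> float:
    """Seconds spent moving data: bytes / (bytes per second)."""
    return num_bytes / bandwidth


def balance_point(exec_rate: float, bandwidth: float) -> float:
    """Ops-per-byte at which compute time equals transfer time."""
    return exec_rate / bandwidth


def is_compute_bound(num_ops: float, num_bytes: float,
                     exec_rate: float, bandwidth: float) -> bool:
    """True when the application does more work per byte than the balance point."""
    return num_ops / num_bytes > balance_point(exec_rate, bandwidth)


if __name__ == "__main__":
    # Assumed resources: 1e9 ops/s of compute, 1e7 bytes/s of network bandwidth.
    rate, bw = 1e9, 1e7
    print("balance point (ops/byte):", balance_point(rate, bw))
    # Assumed workload: 5e8 operations over a 1 MB object.
    print("compute-bound:", is_compute_bound(5e8, 1e6, rate, bw))
```

When an application's ops-per-byte is well above the balance point, moving the data to the computation costs little relative to the analysis; well below it, transfer dominates and the latency hiding mechanisms listed earlier (caching, replicas, remote proxies) matter more.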

  23. GF/eGRID Discussion • Common application exists across Japan, US, and Europe for the high energy physics community (CMS, Atlas, Babar) • NSF GriPhyN • DOE PPDG • CERN DataGrid • Japan ETL-KEK data grid • Analyze event data generated at CERN

  24. CERN Event Data • “File” oriented access • Latency is smaller than the analysis time • Objects managed as a collection • Collection - 1 PB/year, event is 1 MB in size, implies 1 billion events per year
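
The event count on this slide follows from dividing the yearly collection size by the event size. A minimal check, assuming decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and a 365-day year; the average event rate is derived here for illustration and does not appear in the presentation:

```python
PB = 10**15  # bytes, decimal petabyte (assumption)
MB = 10**6   # bytes, decimal megabyte (assumption)

collection_per_year = 1 * PB
event_size = 1 * MB

# 1 PB/year at 1 MB per event gives the number of events per year.
events_per_year = collection_per_year // event_size

# Average rate if events were spread evenly over a 365-day year.
seconds_per_year = 365 * 24 * 3600
events_per_second = events_per_year / seconds_per_year

if __name__ == "__main__":
    print(f"events per year: {events_per_year:,}")
    print(f"average events per second: {events_per_second:.1f}")
```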

  25. Data Access Requirements • Current implementation • Global object namespace • Global schema • Each site replicates the catalog that manages the global namespace and global schema • Current data model is based upon Objectivity

  26. Data Management • Objects identified by • Database/container/page/slot • Each database can be thought of as a file • Replication at the file level • Analysis time is 10-100 seconds per object • Suggests alternate management by • Object level access • Size of initial object is 1 MB • Derived products are 100 kB to 10 kB in size
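
The four-level addressing on this slide can be pictured as a small record type. The sketch below is illustrative only: the slash-separated text form, the integer fields, and the class and method names are assumptions made for the example, not the Objectivity API or its identifier syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectId:
    """Hypothetical database/container/page/slot address of one object."""
    database: int
    container: int
    page: int
    slot: int

    @classmethod
    def parse(cls, text: str) -> "ObjectId":
        """Parse an assumed 'database/container/page/slot' string."""
        parts = text.split("/")
        if len(parts) != 4:
            raise ValueError(f"expected 4 components, got {len(parts)}: {text!r}")
        return cls(*(int(p) for p in parts))

    def replication_unit(self) -> int:
        """File-level replication: the database (file) is the unit that is copied."""
        return self.database


if __name__ == "__main__":
    oid = ObjectId.parse("12/3/45/6")
    print(oid, "is replicated with database", oid.replication_unit())
```

With file-level replication, fetching one 1 MB object means copying the whole database that contains it, which is the motivation the slide gives for considering object-level access.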

  27. Object Level Access • Manage 5 billion objects • Requires ability to • Export objects (encapsulated within XML) • Access individual objects within Objectivity • Definition of procedure for manipulating/subsetting an object • Maintains • Global namespace and global schema • Allows migration between collections

  28. Common Requirements • Archive interface • Aggregation of objects into containers to minimize impact on archive namespace • Replication of objects to allow local analysis • Track where replicas are located to improve performance • Knowledge management for mapping between schema
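
Two of these requirements, aggregating objects into containers and tracking replica locations, can be sketched in a few lines. This is a toy model under stated assumptions: the greedy packing rule, the `ReplicaCatalog` class, and the site names are hypothetical and do not describe how SRB, Globus, or any other system listed in this presentation implements them.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def pack_into_containers(object_sizes: Dict[str, int], capacity: int) -> List[List[str]]:
    """Greedily group objects into containers so the archive stores few large names."""
    containers: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, size in object_sizes.items():
        # Start a new container when the next object would overflow this one.
        if current and used + size > capacity:
            containers.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        containers.append(current)
    return containers


class ReplicaCatalog:
    """Tracks which sites hold a copy of each container."""

    def __init__(self) -> None:
        self._sites = defaultdict(set)

    def register(self, container_id: str, site: str) -> None:
        self._sites[container_id].add(site)

    def locate(self, container_id: str, preferred_site: str) -> Optional[str]:
        """Prefer a local replica; otherwise return any known site, or None."""
        sites = self._sites.get(container_id, set())
        if preferred_site in sites:
            return preferred_site
        return min(sites) if sites else None


if __name__ == "__main__":
    print(pack_into_containers({"a": 4, "b": 4, "c": 4}, capacity=8))
    catalog = ReplicaCatalog()
    catalog.register("c0", "sdsc")
    catalog.register("c0", "cern")
    print(catalog.locate("c0", preferred_site="cern"))
```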

  29. GridFTP Discussion Group

  30. GridFTP Proposal • Led by Steven Tuecke • Extensions to the FTP standard • RFC 959 - FTP definition • RFC 2228 - Security • RFC 2389 - Feature negotiation • What extensions are needed by the Grid Forum to support large data transfers over wide area networks?

  31. Grid FTP • Add • Security extension - GSI • Partial file transfer - Unix semantics • Parallel I/O • Striped I/O • Buffer, window size tuning • Recoverable data transfers • Progress monitoring
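
To make the partial-transfer and parallel I/O items concrete, the sketch below splits a file into byte ranges, one per stream, and shows that ranges arriving in any order reassemble into the original data. It illustrates the general idea only; it is not the GridFTP wire protocol, and the function names are hypothetical.

```python
from typing import List, Tuple


def split_ranges(total_size: int, num_streams: int) -> List[Tuple[int, int]]:
    """Divide [0, total_size) into up to num_streams contiguous (offset, length) ranges."""
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    base, extra = divmod(total_size, num_streams)
    ranges: List[Tuple[int, int]] = []
    offset = 0
    for i in range(num_streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue
        ranges.append((offset, length))
        offset += length
    return ranges


def reassemble(source: bytes, ranges: List[Tuple[int, int]]) -> bytes:
    """Copy each range into place; processing order does not matter."""
    dest = bytearray(len(source))
    for offset, length in reversed(ranges):  # deliberately out of order
        dest[offset:offset + length] = source[offset:offset + length]
    return bytes(dest)


if __name__ == "__main__":
    data = b"0123456789"
    parts = split_ranges(len(data), 3)
    print(parts)
    print(reassemble(data, parts) == data)
```

Because each range carries its own offset, a transfer that fails partway can in principle be resumed by re-requesting only the ranges that did not complete, which is the intuition behind the recoverable-transfer item.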

  32. Timeline • E-mail discussion of current draft • Next 2 months • Complete draft by June, 2001 • Implementation by June, 2001 • Depending upon further extensions • Definition of API is scope of another working group

  33. Participants • Steven Tuecke <tuecke@mcs.anl.gov> • Bill Allcock • Lee Liming • Ann Chervenak <annc@ISI.EDU> • John Karpovich <karp@virginia.edu> • Dan Gunter <dkgunter@lbl.gov> • Tiziana Ferrari <ferrari@cnaf.infn.it> • Parkson Wong <parkson@nas.nasa.gov> • Heinz Stockinger <heinz.stockinger@cern.ch> • Samuel Meder <meder@mcs.anl.gov> • Reagan Moore <moore@sdsc.edu>
