The Data Grid: Towards an Architecture for the Large Scientific Datasets

Presentation Transcript


  1. The Data Grid: Towards an Architecture for the Large Scientific Datasets. Ann Chervenak, Ian Foster. Presented by Qing Ling

  2. Outline • Introduction: Data Grid • Data Grid Design and Architecture • Core Data Grid Services • Higher-level Data Grid Services • Implementation Experience • Current Status of Data Grid Research

  3. Introduction • Demand for a new data management infrastructure: terabyte- and petabyte-scale datasets call for Grid computing • Data Grid focus: high-speed movement of large data objects

  4. Data Grid Design • Requirements: operate in wide-area, multi-institutional, heterogeneous environments, and therefore cannot assume uniform behavior or policy • Four design principles: mechanism neutrality, policy neutrality, compatibility with Grid infrastructure, uniformity of information infrastructure

  5. Mechanism neutrality • Independent of the low-level mechanisms used to store data, store metadata, and transfer data • Achieved by defining data access, third-party transfer, catalog access, and other interfaces that hide the details of specific storage systems, catalogs, and data transfer algorithms
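
A minimal Python sketch of mechanism neutrality, assuming a hypothetical `StorageSystem` interface (the class and method names are illustrative, not taken from the paper or from Globus). Two toy backends store data in different ways, and `copy_file` works with either because it relies only on the interface.

```python
from abc import ABC, abstractmethod


class StorageSystem(ABC):
    """Hypothetical uniform data-access interface; each backend hides its mechanism."""

    @abstractmethod
    def read(self, name: str) -> bytes: ...

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...


class InMemoryStorage(StorageSystem):
    """Toy backend that keeps each file instance whole in a dict."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def read(self, name: str) -> bytes:
        return self._files[name]

    def write(self, name: str, data: bytes) -> None:
        self._files[name] = data


class ChunkedStorage(StorageSystem):
    """Toy backend that splits files into fixed-size chunks, standing in for a
    different low-level mechanism."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self._chunks: dict[str, list[bytes]] = {}

    def read(self, name: str) -> bytes:
        return b"".join(self._chunks[name])

    def write(self, name: str, data: bytes) -> None:
        step = self.chunk_size
        self._chunks[name] = [data[i:i + step] for i in range(0, len(data), step)]


def copy_file(src: StorageSystem, dst: StorageSystem, name: str) -> None:
    """Higher-level code sees only the interface, never the backend mechanism."""
    dst.write(name, src.read(name))
```

Swapping `ChunkedStorage` for any other backend leaves `copy_file` unchanged, which is the point of the principle.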

  6. Policy neutrality • Design decisions with significant performance implications are exposed to users as far as possible: the architecture is not a black box. • Data movement and replica cataloging are provided as basic low-level operations, whereas replication policy is implemented via higher-level procedures. These procedures can be substituted with application-specific code.
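
The split between low-level operations and replaceable policy can be sketched as follows. `transfer`, `register`, `replicate`, and the policy functions are hypothetical names, and sites and the catalog are plain dictionaries chosen only for illustration.

```python
from typing import Callable

# Toy state: site name -> {file name -> contents}, and logical file -> replica sites.
Sites = dict[str, dict[str, bytes]]
Catalog = dict[str, set[str]]
# A replication policy only decides *where* copies go; it returns target sites.
Policy = Callable[[str, Sites, Catalog], list[str]]


def transfer(sites: Sites, name: str, src: str, dst: str) -> None:
    """Low-level data movement: no policy, just copy bytes between sites."""
    sites[dst][name] = sites[src][name]


def register(catalog: Catalog, name: str, site: str) -> None:
    """Low-level replica cataloging: record that `site` holds `name`."""
    catalog.setdefault(name, set()).add(site)


def replicate_everywhere(name: str, sites: Sites, catalog: Catalog) -> list[str]:
    """Default policy: target every site that does not yet hold the file."""
    return sorted(s for s in sites if s not in catalog.get(name, set()))


def replicate(name: str, sites: Sites, catalog: Catalog,
              policy: Policy = replicate_everywhere) -> None:
    """Higher-level procedure composed from the primitives; `policy` is pluggable."""
    src = sorted(catalog[name])[0]  # any existing replica can serve as the source
    for dst in policy(name, sites, catalog):
        transfer(sites, name, src, dst)
        register(catalog, name, dst)
```

An application could supply its own policy, for example `replicate("f", sites, catalog, policy=lambda n, s, c: ["site-b"])`, without touching `transfer` or `register`.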

  7. Compatibility with Grid infrastructure • Integrate with the existing Grid infrastructure • This allows more specialized data grid tools to remain compatible with lower-level Grid mechanisms, and also simplifies the implementation of strategies

  8. Uniformity of information infrastructure • Use the same data model and interface as the underlying Grid information infrastructure, enabling integration with other components of the Grid computing environment • Review: the four principles lead to a layered Data Grid architecture.

  9. Layered Structure • High-level components: replica selection, replica management • Core services: storage system, metadata repository, … • Service categories: Data Grid specific services and general Grid services

  10. Core Data Grid Services • Two fundamental (low-level) services: data access and metadata access • Data access: accessing, managing, and initiating third-party transfers of data stored in storage systems • Metadata access: accessing and managing the information that describes the data (metadata repository).

  11. Storage System and Data Access • Data abstraction: storage systems • File instance: the basic unit of information in a storage system • Logical storage system: can be built over systems such as HPSS and DPSS using, for example, the Storage Resource Broker (SRB) • A set of properties is associated with each file instance
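
A toy model of this storage abstraction, assuming a round-robin placement rule chosen only for illustration: each `FileInstance` carries its own properties, and a hypothetical `LogicalStorageSystem` presents one namespace over several physical stores. The labels `HPSS` and `DPSS` are just strings here; nothing talks to the real systems or to SRB.

```python
from dataclasses import dataclass, field


@dataclass
class FileInstance:
    """Basic unit of information in a storage system, plus its properties."""
    name: str
    data: bytes
    properties: dict[str, str] = field(default_factory=dict)


class PhysicalStore:
    """Stand-in for one physical storage system, identified only by a label."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.files: dict[str, FileInstance] = {}


class LogicalStorageSystem:
    """One namespace spanning several physical stores (round-robin placement)."""

    def __init__(self, stores: list[PhysicalStore]) -> None:
        self.stores = stores
        self._next = 0

    def put(self, inst: FileInstance) -> str:
        """Place the instance on the next store and return that store's label."""
        store = self.stores[self._next % len(self.stores)]
        self._next += 1
        store.files[inst.name] = inst
        return store.label

    def get(self, name: str) -> FileInstance:
        """Look the name up across all physical stores."""
        for store in self.stores:
            if name in store.files:
                return store.files[name]
        raise KeyError(name)
```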

  12. Storage System and Data Access (Continued) • Data access APIs should support: 1. Remote requests 2. Third-party transfer operations, to support optimized implementations of replica management services.
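
Why third-party transfer matters can be shown with a small simulation, assuming hypothetical `DataServer` and `Client` classes: in a third-party copy the client issues only a control request, so none of the data passes through it. Real protocols such as GridFTP involve control connections to both servers; this sketch collapses that into a single method call.

```python
class DataServer:
    """Hypothetical storage server that accepts remote requests."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: dict[str, bytes] = {}

    def receive(self, fname: str, data: bytes) -> None:
        self.files[fname] = data

    def push(self, fname: str, dest: "DataServer") -> None:
        """Server-to-server transfer: data goes straight to the destination."""
        dest.receive(fname, self.files[fname])


class Client:
    """Client that counts how many bytes of payload pass through it."""

    def __init__(self) -> None:
        self.bytes_handled = 0

    def two_party_copy(self, fname: str, src: DataServer, dst: DataServer) -> None:
        """Client downloads the data and re-uploads it itself."""
        data = src.files[fname]
        self.bytes_handled += len(data)
        dst.receive(fname, data)

    def third_party_copy(self, fname: str, src: DataServer, dst: DataServer) -> None:
        """Client sends only a control request; the payload bypasses it."""
        src.push(fname, dst)
```

For large objects this is what lets a replica manager move data between two remote sites without becoming the bottleneck.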

  13. Metadata Service • Categories of metadata: Application metadata - domain-dependent information about the data, specified for use by the application; Replica metadata - the mapping of file instances to their storage locations; System configuration metadata - describes the fabric of the data grid itself

  14. Metadata Service (Continued) • Metadata repository (catalog): provides a mechanism for storing and querying metadata through a uniform interface.

  15. Metadata Service (Continued) • How metadata is used to identify and discover datasets: 1. Applications pose queries to a metadata service. 2. The metadata repository (catalog) maps the characteristics specified by the application to logical files. 3. The replica manager uses replica metadata to locate the physical file instances to be accessed.
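
The three retrieval steps can be sketched as two chained lookups: application metadata maps attributes to logical files, and replica metadata maps logical files to physical instances. The catalog contents, the `lf://` naming scheme, and the URLs below are invented for illustration.

```python
# Invented example data. Application metadata: logical file -> domain attributes.
METADATA_CATALOG = {
    "lf://climate/run1": {"model": "model-x", "year": "1998"},
    "lf://climate/run2": {"model": "model-x", "year": "1999"},
}
# Replica metadata: logical file -> locations of physical file instances.
REPLICA_CATALOG = {
    "lf://climate/run1": ["gsiftp://site-a.example/data/run1",
                          "gsiftp://site-b.example/run1"],
    "lf://climate/run2": ["gsiftp://site-a.example/data/run2"],
}


def query_metadata(**attrs: str) -> list[str]:
    """Steps 1-2: match application-specified characteristics to logical files."""
    return sorted(lf for lf, md in METADATA_CATALOG.items()
                  if all(md.get(k) == v for k, v in attrs.items()))


def locate(logical_file: str) -> list[str]:
    """Step 3: use replica metadata to find physical file instances."""
    return REPLICA_CATALOG.get(logical_file, [])


def resolve(**attrs: str) -> dict[str, list[str]]:
    """Full pipeline: attributes -> logical files -> physical locations."""
    return {lf: locate(lf) for lf in query_metadata(**attrs)}
```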

  16. Metadata Service (Continued) • It is difficult to specify a general structure for all metadata: application metadata is domain dependent, and large-scale data grid environments impose additional requirements • This analysis leads to a hierarchical, distributed system: a distributed directory service with an LDAP-like structure.
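
A single-process sketch of an LDAP-like hierarchical namespace, assuming made-up attribute names (`rc`, `lc`, `lf` for catalog, collection, and logical file). It shows only hierarchical naming and subtree search; in a distributed directory, different subtrees would be served by different servers, which this toy omits.

```python
def parse_dn(dn: str) -> tuple[str, ...]:
    """'lf=run1,lc=climate,rc=main' -> ('rc=main', 'lc=climate', 'lf=run1'), root first."""
    return tuple(part.strip() for part in reversed(dn.split(",")))


class Directory:
    """Toy LDAP-style directory: entries are named by hierarchical DNs."""

    def __init__(self) -> None:
        self.entries: dict[tuple[str, ...], dict[str, str]] = {}

    def add(self, dn: str, **attrs: str) -> None:
        self.entries[parse_dn(dn)] = attrs

    def search(self, base: str, **filt: str) -> list[str]:
        """Subtree search: entries at or below `base` whose attributes all match."""
        b = parse_dn(base)
        hits = []
        for key, attrs in self.entries.items():
            if key[:len(b)] == b and all(attrs.get(k) == v for k, v in filt.items()):
                hits.append(",".join(reversed(key)))
        return sorted(hits)
```

Because each entry's name encodes its position in the tree, a search rooted at one collection never touches entries in another, which is what makes splitting subtrees across servers natural.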

  17. Other Basic Services • Authentication and authorization: public-key-based Grid Security Infrastructure (GSI) • Resource reservation and co-allocation • Performance measurement and estimation technology • Instrumentation services: enable end-to-end instrumentation of storage transfers and other operations

  18. Higher-level Data Grid Components • Two major components: replica management; replica selection and data filtering

  19. Higher-level Data Grid Components • Replica management • Role of the replica manager: creates copies of file instances, maintains the replica catalog, and controls access - It does not determine when or where replicas are created, nor which replica an application accesses. Typically, replicas in a catalog are byte-for-byte copies of one another, but this is not required. A data grid may contain multiple replica catalogs.
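
A sketch of the replica manager's limited role, with hypothetical class names and a simple allow-list standing in for real access control. The manager copies, deletes, and catalogs replicas when told to, but the caller supplies source and destination, so placement policy stays outside it.

```python
class ReplicaCatalog:
    """Maps logical file names to the sites holding a replica."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.locations: dict[str, set[str]] = {}


class ReplicaManager:
    """Creates and deletes replicas and keeps one catalog consistent.
    It never chooses sites itself: *where* and *when* are the caller's decision."""

    def __init__(self, catalog: ReplicaCatalog,
                 storage: dict[str, dict[str, bytes]],
                 allowed_users: set[str]) -> None:
        self.catalog = catalog
        self.storage = storage              # site -> {logical name -> bytes}
        self.allowed_users = allowed_users  # toy stand-in for access control

    def _check(self, user: str) -> None:
        if user not in self.allowed_users:
            raise PermissionError(f"{user} may not modify {self.catalog.name}")

    def create_replica(self, user: str, lfn: str, src: str, dst: str) -> None:
        self._check(user)
        if src not in self.catalog.locations.get(lfn, set()):
            raise KeyError(f"no replica of {lfn} at {src}")
        self.storage[dst][lfn] = self.storage[src][lfn]
        self.catalog.locations[lfn].add(dst)

    def delete_replica(self, user: str, lfn: str, site: str) -> None:
        self._check(user)
        self.catalog.locations[lfn].discard(site)
        self.storage[site].pop(lfn, None)
```

Since each manager wraps a single `ReplicaCatalog`, a data grid with several catalogs would simply run several managers.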

  20. Higher-level Data Grid Components (Continued) • Replica selection and data filtering • Selection: choose among replicas based on desired criteria such as access performance, security, or cost, or even create a new replica • Data filtering: extract a subset of a replica to form a new replica with its own characteristics, which is then submitted to the replica manager.
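
Replica selection and filtering can be sketched as below, assuming a hypothetical cost model: estimated transfer time (latency plus size over bandwidth) plus a weighted per-access cost. The bandwidth, latency, and cost fields are made-up inputs of the kind performance-estimation services might supply, and the `catalog` dictionary stands in for the replica manager.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Replica:
    site: str
    records: list[int]      # toy payload
    bandwidth_mbps: float   # assumed estimate, megabits per second
    latency_s: float        # assumed estimate, seconds
    cost: float             # assumed price per access


def est_time_s(r: Replica, size_mb: float) -> float:
    """Estimated transfer time: latency + size / bandwidth (megabytes * 8 = megabits)."""
    return r.latency_s + size_mb * 8 / r.bandwidth_mbps


def select(replicas: list[Replica], size_mb: float, cost_weight: float = 0.0) -> Replica:
    """Pick the replica minimizing estimated time plus weighted cost."""
    return min(replicas, key=lambda r: est_time_s(r, size_mb) + cost_weight * r.cost)


def filter_replica(r: Replica, keep: Callable[[int], bool]) -> Replica:
    """Data filtering: a subset of a replica becomes a new replica."""
    return Replica(r.site, [x for x in r.records if keep(x)],
                   r.bandwidth_mbps, r.latency_s, r.cost)


def submit(catalog: dict[str, list[Replica]], lfn: str, r: Replica) -> None:
    """Hand the new replica to the (stand-in) replica manager."""
    catalog.setdefault(lfn, []).append(r)
```

With `cost_weight=0` the fastest replica wins; raising the weight shifts the choice toward cheaper ones, so the selection criterion stays a caller-visible parameter rather than a hidden decision.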

  21. Implementation Experience • Climate modeling application: an LDAP implementation • Data visualization application: separate metadata and replica catalogs for managing the data

  22. Current status • GridFTP, as introduced in the last class • If interested, visit www.globus.org/datagrid/deliverables/: draft RFC for the GridFTP protocol, downloadable GridFTP, libraries, etc.
