Data Grids

Data Grids. Jon Ludwig, Leor Dilmanian, Braden Allchin, Andrew Brown. Outline: What is a Data Grid; Components of a Data Grid; Data Grids of Today; Amazon S3 Web Service.

Presentation Transcript


  1. Data Grids • Jon Ludwig • Leor Dilmanian • Braden Allchin • Andrew Brown

  2. Outline • What is a Data Grid • Components of a Data Grid • Data Grids of Today • Amazon S3 Web Service

  3. What is a Data Grid? • Distributed storage mechanism providing resources to computational grids • Cheap, effective, and scalable means of recording information across multiple grid sites • The resources, tools, and information products that can be used for data discovery and delivery from a variety of sources, typically used for the production of valuable information

  4. Components of a Data Grid • Case study: NERC • CSML - the Climate Science Modelling Language information model • The CSML Toolbox - creates and manipulates documents that conform to the CSML schema • The CSML Data Services - expose documents and the data they point to • The NDG Data Graphical User Interface - uses web services to manipulate data • MOLES schema, XQuery definitions, related software, frontend browser • Discovery gateways and infrastructure • Vocabulary server

  5. Components Diagram

  6. Storage Resource Broker • Virtual data storage using namespaces • Maintains metadata on files, users, groups • Stored in relational DBMS • Queries supported • Has an API for other applications (e.g. Globus) • Sharing, transfer, backup
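The virtual namespace and metadata catalog on this slide can be illustrated with a minimal sketch. This is not the SRB API: the table layout, column names, and helper functions below are hypothetical, and SQLite stands in for whatever relational DBMS a real deployment would use. The sketch only shows the idea of resolving a logical path to a physical location and querying file metadata.

```python
import sqlite3


def open_catalog():
    """Create an in-memory metadata catalog (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE files (
               logical_path TEXT PRIMARY KEY,  -- name users see in the virtual namespace
               physical_url TEXT NOT NULL,     -- where the bytes actually live
               owner        TEXT NOT NULL,
               size_bytes   INTEGER NOT NULL
           )"""
    )
    return conn


def register(conn, logical_path, physical_url, owner, size_bytes):
    """Record a file under its logical name."""
    conn.execute(
        "INSERT INTO files VALUES (?, ?, ?, ?)",
        (logical_path, physical_url, owner, size_bytes),
    )


def resolve(conn, logical_path):
    """Map a logical path to its physical location, or None if unknown."""
    row = conn.execute(
        "SELECT physical_url FROM files WHERE logical_path = ?",
        (logical_path,),
    ).fetchone()
    return row[0] if row else None


def files_owned_by(conn, owner):
    """Metadata query: list logical paths belonging to one user."""
    rows = conn.execute(
        "SELECT logical_path FROM files WHERE owner = ? ORDER BY logical_path",
        (owner,),
    )
    return [r[0] for r in rows]


catalog = open_catalog()
register(catalog, "/home/alice/run1.dat", "site-a://disk3/0001", "alice", 2048)
register(catalog, "/home/alice/run2.dat", "site-b://tape7/0042", "alice", 4096)
register(catalog, "/home/bob/notes.txt", "site-a://disk1/0107", "bob", 512)
```

Applications address files only by logical path; the physical site can change without the logical name changing, which is the point of the virtual namespace.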

  7. Data Grids of Today • Biomedical Informatics Research Network (BIRN) • HP's Global File systems (SFS) collaboration • NSF's iVDGL (International Virtual Data Grid Laboratory) - now part of OSG • European Union's DataGrid Project - now part of Enabling Grids for E-sciencE • Natural Environment Research Council (NERC) • Amazon Simple Storage Service (S3)

  8. Amazon S3 • Amazon Simple Storage Service • Web Service - REST / SOAP / BitTorrent • Offload storage requirements to Amazon • Cost • Security • Scalable - Storage, availability, speed • Reliable - Fault tolerance, redundancy • Fast • Inexpensive - Commodity hardware • Simple - Data grid is abstracted • Flexible - Constraints

  9. Amazon S3 - Design Principles • Decentralization - Avoid single points of failure (SPoF) • Asynchrony - Avoid waiting on communications • Autonomy • Local Responsibility - Nodes take care of themselves • Controlled Concurrency - Exposed operations require little or no concurrency control • Failure Tolerance - Automatic recovery, minimal interruption • Controlled Parallelism - Recover quickly • Small Building Blocks • Symmetry - Nodes are identical in functionality, minimal configuration • Simplicity

  10. Amazon S3 - Functionality • Objects - Fundamental storage unit: 1 byte to 5 GB; carry metadata; keys uniquely identify Objects • Buckets - Namespace for managing Objects: Users own Buckets; Buckets contain Objects; unlimited Objects per Bucket • Operations - Create, Read, Write, List, Delete • Replication
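The object and bucket model on this slide can be sketched as a small in-memory class. This is an illustration only, not the S3 API: the class name, method names, and the choice of 1 GB = 1024^3 bytes are assumptions; the 1 byte to 5 GB size range and the create/read/write/list/delete operations come from the slide.

```python
MAX_OBJECT_BYTES = 5 * 1024**3  # 5 GB upper bound from the slide (binary GB assumed)


class Bucket:
    """Hypothetical in-memory stand-in for a bucket: a namespace of keyed objects."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner      # users own buckets
        self._objects = {}      # key -> (data, metadata)

    def put(self, key, data, metadata=None):
        """Create or overwrite the object stored under `key`."""
        if not 1 <= len(data) <= MAX_OBJECT_BYTES:
            raise ValueError("object size must be between 1 byte and 5 GB")
        self._objects[key] = (bytes(data), dict(metadata or {}))

    def get(self, key):
        """Return (data, metadata) for `key`; raises KeyError if absent."""
        return self._objects[key]

    def list_keys(self, prefix=""):
        """List keys in the bucket, optionally restricted to a prefix."""
        return sorted(k for k in self._objects if k.startswith(prefix))

    def delete(self, key):
        """Remove the object under `key`; deleting a missing key is a no-op."""
        self._objects.pop(key, None)


bucket = Bucket("experiment-data", owner="alice")
bucket.put("runs/2008/run1.dat", b"\x00\x01", {"content-type": "application/octet-stream"})
bucket.put("runs/2008/run2.dat", b"\x02")
bucket.put("notes.txt", b"hello")
```

Keys are flat strings; the slash-separated prefixes here only look like directories, and listing by prefix is what gives a bucket a folder-like feel.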

  11. Amazon S3 - Security • Public key authentication + HMAC • Access Control Lists for Buckets • Logging for Buckets • May use SSL • Integrity - MD5 • No data encryption
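The HMAC authentication and MD5 integrity checks listed here can be sketched with Python's standard library. The slide does not give the hash used for the HMAC or the layout of the signed string, so HMAC-SHA1 and the `string_to_sign` format below are assumptions for illustration, not the exact S3 signing scheme; the key and request values are made up.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key: str, string_to_sign: str) -> str:
    """Return a base64 HMAC signature over a request description.

    HMAC-SHA1 is an assumption; the slide only says "HMAC".
    """
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def content_md5(data: bytes) -> str:
    """Base64-encoded MD5 digest of a payload, used as an integrity check."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def verify_integrity(data: bytes, expected_md5: str) -> bool:
    """True if the payload still matches the digest sent with it."""
    return hmac.compare_digest(content_md5(data), expected_md5)


# Hypothetical request: verb, bucket/key path, and a date, joined by newlines.
payload = b"temperature,pressure\n21.5,1013\n"
string_to_sign = "PUT\n/experiment-data/run1.csv\nTue, 01 Apr 2008 12:00:00 GMT"
signature = sign_request("example-secret-key", string_to_sign)
checksum = content_md5(payload)
```

The server, which also knows the secret key, recomputes the signature and rejects the request if it differs. Because the slide notes that S3 does not encrypt data itself, a client that needs confidentiality would have to encrypt the payload before upload.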

  12. Amazon S3 - Disadvantages • No renaming or moving of Buckets • No content-based search • No way to cap usage or spending • Cost

  13. Amazon S3 - Costs • Storage: $0.15 per GB-month of storage used • Data transfer in: $0.10 per GB • Data transfer out: $0.18 per GB for the first 10 TB / month; $0.16 per GB for the next 40 TB / month; $0.13 per GB over 50 TB / month • Requests: $0.01 per 1,000 PUT or LIST requests; $0.01 per 10,000 GET and all other requests
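As a worked example of the tiered pricing, the sketch below estimates a monthly bill from the rates on this slide. The rates are the historical ones quoted here, not current pricing; the function names are hypothetical, and treating 1 TB as 1024 GB is an assumption.

```python
# Rates from the slide, in USD.
STORAGE_PER_GB_MONTH = 0.15
TRANSFER_IN_PER_GB = 0.10
# (tier size in GB, price per GB); 1 TB = 1024 GB is an assumption.
TRANSFER_OUT_TIERS = [
    (10 * 1024, 0.18),      # first 10 TB
    (40 * 1024, 0.16),      # next 40 TB
    (float("inf"), 0.13),   # everything over 50 TB
]
PUT_LIST_PER_1000 = 0.01
GET_OTHER_PER_10000 = 0.01


def transfer_out_cost(gb_out):
    """Charge outbound traffic tier by tier, cheapest rate last."""
    cost = 0.0
    remaining = gb_out
    for tier_size, price in TRANSFER_OUT_TIERS:
        used = min(remaining, tier_size)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return cost


def monthly_cost(stored_gb, gb_in, gb_out, put_list_requests, get_other_requests):
    """Sum storage, transfer, and request charges for one month."""
    return (
        stored_gb * STORAGE_PER_GB_MONTH
        + gb_in * TRANSFER_IN_PER_GB
        + transfer_out_cost(gb_out)
        + put_list_requests / 1000 * PUT_LIST_PER_1000
        + get_other_requests / 10000 * GET_OTHER_PER_10000
    )


# 100 GB stored, 10 GB in, 50 GB out, 1,000 PUTs, 10,000 GETs:
# 15.00 + 1.00 + 9.00 + 0.01 + 0.01 = 25.02 USD
example = monthly_cost(100, 10, 50, 1000, 10000)
```

For small workloads storage and transfer dominate and request charges are negligible; they only matter for workloads made of very many small objects.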

  14. References • [2] Baru, C.; Moore, R.; Rajasekar, A. & Wan, M. (1998), "The SDSC Storage Resource Broker", in CASCON '98: Proceedings of the 1998 Conference of the Centre for Advanced Studies on Collaborative Research, IBM Press, p. 5. • [3] Amazon S3: http://www.amazon.com/s3 • [4] Aktas, M. S.; Fox, G. C. & Pierce, M., "Distributed High Performance Grid Information Service", Indiana University, 2007. • [5] Garfinkel, I.; Palankar & Ripeanu, "Amazon S3 for Science Grids: a Viable Solution?", International Workshop on Data-Aware Distributed Computing, 2008.

  15. • http://eu-datagrid.web.cern.ch/eu-datagrid/ • http://www.ppdg.net/ • http://ndg.nerc.ac.uk/ • S3 - http://www.amazon.com/gp/browse.html?node=16427261 • http://www.ivdgl.org/ • http://www.hp.com/techservers/hpccn/linux_gfs/index.html • http://en.wikipedia.org/wiki/Data_grid
