OPeNDAP in the Cloud: Optimizing the Use of Storage Systems Provided by Cloud Computing Environments

James Gallagher, Nathan Potter (OPeNDAP)

and

Deirdre Byrne, Jefferson Ogata, John Relph (NOAA/NODC)

26 June 2013


Cloud Systems Now*

  • Providers: IBM, Microsoft, Amazon, Google, Rackspace, …

  • Microsoft: Azure “…handles 100 petabytes of data a day”

  • Amazon: “…hundreds of thousands of users”

  • Netflix: “…stopped building its own data centers in 2008;” all in Amazon by 2012

  • Snapchat: 4000 pictures per second; “…never owned a computer server.” (Google cloud)

*Quentin Hardy, “Google Joins a Heavyweight Competition in Cloud Computing,” NY Times, 3 December 2013


Why use OPeNDAP?

[Figure: downloading the full dataset transfers 100% of the data; an OPeNDAP request transfers 4%]

  • The OPeNDAP request is smaller: it returns just the data the user wants

  • In cloud systems, cost is a function of data transferred as well as data stored, so smaller, targeted requests reduce costs
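The cost argument above is simple arithmetic. The following Python sketch assumes a flat, hypothetical egress price (`HYPOTHETICAL_PRICE_PER_GB` is illustrative, not a published Amazon rate) and a hypothetical 500 GB dataset; only the 100% versus 4% split comes from the slide.

```python
# Hypothetical flat egress price in USD per GB; real cloud pricing is tiered and changes.
HYPOTHETICAL_PRICE_PER_GB = 0.12


def transfer_cost(dataset_gb: float, fraction_requested: float,
                  price_per_gb: float = HYPOTHETICAL_PRICE_PER_GB) -> float:
    """Cost of moving `fraction_requested` of a dataset out of the cloud."""
    if not 0.0 <= fraction_requested <= 1.0:
        raise ValueError("fraction_requested must be between 0 and 1")
    return dataset_gb * fraction_requested * price_per_gb


full = transfer_cost(500.0, 1.00)    # download the whole file
subset = transfer_cost(500.0, 0.04)  # OPeNDAP request: 4% of the bytes
print(f"full download: ${full:.2f}  OPeNDAP subset: ${subset:.2f}")
```

Because this simplified transfer cost scales linearly with bytes moved, the subset costs 4% of the full download whatever price is plugged in; tiered pricing would change the absolute numbers but not the direction of the saving.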


NOAA Environmental Data Management Conceptual Cloud Architecture*

*Adapted from NOAA Environmental Data Management Framework Draft v0.3, Appendix C, Dr. Jeff de La Beaujardière, NOAA Data Management Architect

[Figure: potential locations of cloud-enabled OPeNDAP instances]


Constraints

  • No vendor lock-in!

  • No Stovepipes! - flexible storage method

    • What will be the client of 2020?

  • Hierarchical/human browsable

[Figure: a hierarchy in which a dataset contains several files]


Data Stores: S3 and Glacier

  • S3

    • Spinning disk with a flat file system

    • Designed to make web-scale computing easier

  • Glacier

    • Near-line storage with access times of 4 hours or more

    • Secure and durable storage

  • EC2

    • EC2 was used to run the OPeNDAP data server

    • Linux


Using S3 as a Data Store

[Figure: HTTP GET and HEAD requests to S3 for the catalog and the data]
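S3 objects are reachable with plain HTTP. A minimal Python sketch of the two request types named on the slide, assuming publicly readable objects addressed with virtual-hosted-style URLs (no request signing); the bucket and key names are hypothetical. The requests are only constructed here; passing one to `urllib.request.urlopen` would send it.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def s3_object_url(bucket: str, key: str) -> str:
    # Virtual-hosted-style URL; assumes a publicly readable object (no signing).
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def head_request(bucket: str, key: str) -> Request:
    # HEAD returns only headers (Content-Length, ETag, Last-Modified), no body.
    return Request(s3_object_url(bucket, key), method="HEAD")


def get_request(bucket: str, key: str,
                first_byte: Optional[int] = None,
                last_byte: Optional[int] = None) -> Request:
    # GET returns the object; an optional Range header limits it to a byte span.
    req = Request(s3_object_url(bucket, key), method="GET")
    if first_byte is not None and last_byte is not None:
        req.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return req
```

A HEAD request is a cheap way to check whether a cached copy is still current (for example by comparing `ETag` or `Last-Modified`), while the `Range` header lets the server read only part of a large object.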


Web Requests

[Figure: a catalog or data request sent to S3 returns an XML catalog or a data file]


OPeNDAP Catalog Requests

[Figure: a user catalog request reaches the OPeNDAP server on EC2, which holds a catalog cache and a data cache; Catalog Access reads the XML file from S3, and the server returns a THREDDS catalog or HTML]

To enhance performance, data were accessed from S3 only when not already cached.
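The caching rule above amounts to a read-through cache. A minimal Python sketch, assuming a local cache directory and a caller-supplied `fetch` function that retrieves an object from S3; `ReadThroughCache` and its file layout are hypothetical and do not reproduce the OPeNDAP server's own cache, which would also need size limits and eviction.

```python
import hashlib
import os
from typing import Callable


class ReadThroughCache:
    """Serve objects from a local directory; call `fetch` (e.g. an S3 GET) only on a miss."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash the object key so any key (with "/" etc.) maps to one safe file name.
        return os.path.join(self.cache_dir, hashlib.sha256(key.encode("utf-8")).hexdigest())

    def get(self, key: str) -> bytes:
        path = self._path(key)
        if os.path.exists(path):  # cache hit: no request to S3
            with open(path, "rb") as f:
                return f.read()
        data = self.fetch(key)  # cache miss: one request to the backing store
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # rename into place so readers never see a partial file
        return data
```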


OPeNDAP Data Requests

[Figure: a user data request reaches the OPeNDAP server on EC2, which holds a catalog cache and a data cache; Data Access reads the data file from S3, and the server returns a data slice]

To enhance performance, data were accessed from S3 only when not already cached.
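The "Data Slice" step returns only the requested part of the cached file. The sketch below mirrors DAP2 hyperslab semantics (`[start:stride:stop]`, with `stop` inclusive) on plain Python lists; `hyperslab`, `subset_2d`, and `constraint` are hypothetical helpers, and a real server subsets the stored file format directly rather than in-memory lists.

```python
from typing import List, Sequence, Tuple

Slab = Tuple[int, int, int]  # (start, stride, stop), stop inclusive as in DAP2


def hyperslab(values: Sequence, start: int, stride: int, stop: int) -> list:
    """Return values[start], values[start + stride], ... up to and including index stop."""
    if stride < 1 or start < 0 or stop < start or stop >= len(values):
        raise IndexError("hyperslab outside the array bounds")
    return list(values[start:stop + 1:stride])


def subset_2d(grid: Sequence[Sequence], rows: Slab, cols: Slab) -> List[list]:
    """Apply one hyperslab to the rows and another to the columns of a 2-D grid."""
    return [hyperslab(row, *cols) for row in hyperslab(grid, *rows)]


def constraint(var: str, *dims: Slab) -> str:
    """Format a DAP2-style projection such as sst[0:1:3][10:2:20]."""
    return var + "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in dims)
```

For example, a client could append `constraint("sst", (0, 1, 3), (10, 2, 20))`, that is `sst[0:1:3][10:2:20]`, as the query string of a dataset URL to request that slice instead of the whole file.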


Observations

  • Relying on S3FS and Amazon's APIs creates vendor lock-in

  • XML catalogs were flexible:

    • Support both direct web and…

    • Subsetting server access

    • Likely adaptable to other use-cases

    • Easily support hierarchical structure

  • Catalogs didn't need to be stored in S3
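The flexible, hierarchical XML catalogs described above can be generated with the standard library. A minimal sketch, assuming a made-up schema (`<dataset>` and `<file>` elements with `name` and `href` attributes, not the THREDDS schema) and hypothetical S3 URLs:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

# A tree of names: nested dicts are sub-datasets, string values are file URLs.
Tree = Dict[str, Union["Tree", str]]


def build_catalog(name: str, tree: Tree) -> ET.Element:
    """Turn a nested dict into <dataset> elements containing <file> leaves."""
    node = ET.Element("dataset", name=name)
    for entry, value in tree.items():
        if isinstance(value, dict):
            node.append(build_catalog(entry, value))  # recurse into a sub-dataset
        else:
            ET.SubElement(node, "file", name=entry, href=value)
    return node


catalog = build_catalog("sst", {
    "2012": {"jan.nc": "https://example-bucket.s3.amazonaws.com/sst/2012/jan.nc"},
    "readme.txt": "https://example-bucket.s3.amazonaws.com/sst/readme.txt",
})
print(ET.tostring(catalog, encoding="unicode"))
```

Because the result is ordinary XML, the same file could be fetched directly by a browser, read by the subsetting server, or kept outside S3 altogether.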


Glacier and Asynchronous Responses

  • To use Glacier, a web service protocol must support asynchronous access: Glacier is a near-line device, not a spinning disk.

  • Support via protocol is not enough: typical use cases cannot be met without caching ‘metadata’

    • To support web interfaces/clients DAP metadata objects should be cached

    • To support smart clients, domain data may also need to be cached
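The asynchronous requirement can be illustrated with a toy model. This Python sketch assumes a hypothetical archive in which a retrieval job becomes readable 4 hours after it is requested, and uses HTTP-style status strings (`202 Accepted` with a retry hint) as stand-ins for whatever asynchronous mechanism the protocol actually provides. Metadata is answered at once from a cache; data requests start or check a retrieval job. Time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

RETRIEVAL_DELAY_S = 4 * 3600  # assumption, based on the slide's "4 hours or more"


@dataclass
class AsyncArchive:
    """Toy near-line store: a read must be requested first, then polled until ready."""
    objects: Dict[str, bytes]
    jobs: Dict[str, float] = field(default_factory=dict)  # key -> time it becomes readable

    def request(self, key: str, now: float) -> None:
        self.jobs.setdefault(key, now + RETRIEVAL_DELAY_S)  # don't restart a pending job

    def poll(self, key: str, now: float) -> Optional[bytes]:
        ready_at = self.jobs.get(key)
        if ready_at is None or now < ready_at:
            return None
        return self.objects[key]


def handle_request(archive: AsyncArchive, metadata_cache: Dict[str, str],
                   key: str, kind: str, now: float) -> Tuple[str, object]:
    """Serve metadata immediately from cache; for data, start or check a retrieval job."""
    if kind == "metadata":
        return ("200 OK", metadata_cache[key])
    data = archive.poll(key, now)
    if data is None:
        archive.request(key, now)
        return ("202 Accepted", f"retry after {archive.jobs[key] - now:.0f} s")
    return ("200 OK", data)
```

Without the metadata cache, even a browser that only wants to list or describe a dataset would wait hours, which is the point made in the bullets above.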


Glacier Implementation

  • Caching

    • Catalog

    • DAP metadata

  • Support for programmatic and web clients

    • Web clients are the primary users of the DAP metadata because of their ‘click and browse’ behavior

  • XML with an embedded XSL style sheet

    • Single response (XML)

    • Multiple target clients – smart and browser
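One XML response can serve both kinds of client through the W3C `xml-stylesheet` processing instruction: a browser applies the referenced XSLT and shows HTML, while a program parses the XML and can ignore the instruction. This minimal sketch references the stylesheet by URL rather than embedding it, which is the common way to attach XSLT; the element names and the stylesheet path `/xsl/catalog.xsl` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List
from xml.sax.saxutils import quoteattr


def catalog_response(stylesheet_href: str, datasets: List[str]) -> str:
    """One XML document for both smart clients and browsers."""
    root = ET.Element("catalog")
    for name in datasets:
        ET.SubElement(root, "dataset", name=name)
    body = ET.tostring(root, encoding="unicode")
    # The xml-stylesheet processing instruction asks a browser to render the
    # document with the referenced XSLT; a program reading the XML can ignore it.
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<?xml-stylesheet type="text/xsl" href={quoteattr(stylesheet_href)}?>\n'
            + body)


print(catalog_response("/xsl/catalog.xsl", ["sst", "chl"]))
```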


Comparison: S3 and Glacier*

  • Glacier provides “secure and durable storage”

  • S3 is “designed to make web-scale computing easier”

  • These graphs show only a tiny part of a complex cost model: they do not include the cost to move data out of the Amazon cloud, EC2 instances, etc.

*http://calculator.s3.amazonaws.com/calc5.html
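The shape of the trade-off between the two stores can be modeled in a few lines. The prices below are placeholders chosen only to illustrate the structure (storage charged per GB-month, retrieval charged per GB read back); they are not Amazon's rates, and, like the graphs on the slide, the model leaves out transfer out of the cloud, request charges, and EC2.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StorageTier:
    name: str
    storage_per_gb_month: float  # placeholder USD, not a real price
    retrieval_per_gb: float      # placeholder USD charged for data read back


def monthly_cost(tier: StorageTier, stored_gb: float, retrieved_gb: float) -> float:
    """Storage plus retrieval only; transfer out, requests and EC2 are ignored."""
    return tier.storage_per_gb_month * stored_gb + tier.retrieval_per_gb * retrieved_gb


def cheapest(tiers: Iterable[StorageTier], stored_gb: float, retrieved_gb: float) -> StorageTier:
    return min(tiers, key=lambda t: monthly_cost(t, stored_gb, retrieved_gb))


# Placeholder numbers that only reproduce the qualitative pattern:
# the archive tier is cheap to keep data in and costly to read from.
S3_LIKE = StorageTier("S3-like", storage_per_gb_month=0.09, retrieval_per_gb=0.00)
GLACIER_LIKE = StorageTier("Glacier-like", storage_per_gb_month=0.01, retrieval_per_gb=0.05)

for retrieved in (100, 5000):
    best = cheapest([S3_LIKE, GLACIER_LIKE], stored_gb=1000, retrieved_gb=retrieved)
    print(retrieved, best.name)
```

With these placeholder numbers the break-even point is 0.09S = 0.01S + 0.05R, i.e. R = 1.6S: below 1.6 GB retrieved per GB stored each month the archive tier is cheaper, above it the disk tier wins, which is why monitoring actual use matters.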


Summary

  • OPeNDAP server with minimal changes

  • Data stored in S3 and Glacier

  • Solution widely applicable: Web + Smart clients

  • Given the complexity of the cost model, a combination of S3 and Glacier is likely

  • Modeling and monitoring of use are required

