
OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC Deirdre Byrne, Jefferson Ogata, John Relph

OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments. OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC Deirdre Byrne, Jefferson Ogata, John Relph 26 June 2013. Cloud Systems Now*. Providers: IBM, Microsoft, Amazon, Google, Rackspace, …


Presentation Transcript


  1. OPeNDAP in the Cloud: Optimizing the Use of Storage Systems Provided by Cloud Computing Environments. OPeNDAP: James Gallagher, Nathan Potter; NOAA/NODC: Deirdre Byrne, Jefferson Ogata, John Relph. 26 June 2013

  2. Cloud Systems Now* • Providers: IBM, Microsoft, Amazon, Google, Rackspace, … • Microsoft: Azure “…handles 100 petabytes of data a day” • Amazon: “…hundreds of thousands of users” • Netflix: “…stopped building its own data centers in 2008;” all in Amazon by 2012 • Snapchat: 4000 pictures per second; “…never owned a computer server.” (Google cloud) *Quentin Hardy, “Google Joins a Heavyweight Competition in Cloud Computing,” NY Times, 3 December 2013

  3. Why use OPeNDAP? [Figure: full dataset, 100% download, versus OPeNDAP request, 4% download] • The OPeNDAP request is smaller and contains just the data the person wants • In cloud systems, cost is a function of data transferred in addition to data stored, so smaller, targeted requests reduce costs
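
The effect of a targeted request can be sketched with a DAP2 constraint expression, which selects an index range per dimension as `[start:stride:stop]`. The server URL, the variable name `sst`, and the 100 x 100 grid below are hypothetical, chosen only so that the subset works out to the 4% shown on the slide.

```python
def hyperslab(start, stop, stride=1):
    """Format one dimension of a DAP2 hyperslab; stop is inclusive."""
    return f"[{start}:{stride}:{stop}]"


def subset_url(base_url, variable, slices):
    """Build a DAP2 binary data (.dods) request URL for part of one variable.

    slices holds one inclusive (start, stop) index pair per dimension.
    """
    constraint = variable + "".join(hyperslab(a, b) for a, b in slices)
    return f"{base_url}.dods?{constraint}"


def fraction_requested(shape, slices):
    """Return the share of the variable's elements covered by the slices."""
    total = 1
    wanted = 1
    for size, (start, stop) in zip(shape, slices):
        total *= size
        wanted *= stop - start + 1
    return wanted / total


# Hypothetical dataset: a 100 x 100 grid served at an example URL.
BASE = "http://example.org/opendap/sst.nc"
SHAPE = (100, 100)
SLICES = [(0, 19), (40, 59)]  # a 20 x 20 window

url = subset_url(BASE, "sst", SLICES)
fraction = fraction_requested(SHAPE, SLICES)
print(url)
print(f"{fraction:.0%} of the variable is transferred")
```

Because transfer out of the cloud is billed, the fraction computed here scales the transfer part of the bill directly.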

  4. NOAA Environmental Data Management Conceptual Cloud Architecture* [Figure: potential locations of cloud-enabled OPeNDAP instances] *Adapted from NOAA Environmental Data Management Framework Draft v0.3, Appendix C, Dr. Jeff de La Beaujardière, NOAA Data Management Architect

  5. Constraints • No vendor lock-in! • No stovepipes! Flexible storage method • What will be the client of 2020? • Hierarchical, human-browsable datasets [Figure: a dataset containing several files]

  6. Data stores: S3 and Glacier • S3 • Spinning disk with a flat file system • Designed to make web-scale computing easier • Glacier • Near-line device with access times of 4 hours or more • Secure and durable storage • EC2 • EC2 was used to run the OPeNDAP data server • Linux

  7. Using S3 as a Data Store [Diagram: HTTP GET and HEAD requests to S3, which holds the catalog and the data]

  8. Web requests [Diagram: a catalog or data request goes to S3, which returns an XML or data file]
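
The plain-HTTP access described on slides 7 and 8 can be sketched with the Python standard library. This is a minimal illustration, not the presenters' code: the bucket name and object keys are hypothetical, and the objects are assumed to be publicly readable so that no request signing is needed.

```python
from urllib.request import Request


def s3_object_url(bucket, key):
    """Return a virtual-hosted-style S3 URL for an object (names are hypothetical)."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"


def build_request(bucket, key, method="GET"):
    """Prepare a GET (fetch the object) or HEAD (headers only) request."""
    if method not in ("GET", "HEAD"):
        raise ValueError("only GET and HEAD are used against the data store")
    return Request(s3_object_url(bucket, key), method=method)


# HEAD checks size and modification time without transferring the object;
# GET retrieves the catalog XML or the data file itself.
head_req = build_request("example-bucket", "catalog/index.xml", method="HEAD")
get_req = build_request("example-bucket", "data/sst.nc")
print(head_req.get_method(), head_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

Passing either object to `urllib.request.urlopen` would send the request; the sketch stops short of that so it runs without network access.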

  9. OPeNDAP Catalog requests [Diagram labels: user; catalog request; EC2; OPeNDAP server; catalog access; catalog cache; data cache; S3; XML file; THREDDS catalog or HTML] To enhance performance, data were accessed from S3 only when not already cached.

  10. OPeNDAP Data requests [Diagram labels: user; data request; EC2; OPeNDAP server; data access; catalog cache; data cache; S3; data file; data slice] To enhance performance, data were accessed from S3 only when not already cached.
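
The "only when not already cached" rule on slides 9 and 10 is a read-through cache. The sketch below shows the idea with an in-memory dictionary standing in for S3; the class name `CachedStore` and the object key are hypothetical, and the real server's cache (on EC2 storage) is not described in the slides.

```python
class CachedStore:
    """Read-through cache: go to the remote store only on a cache miss."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that reads one object from the remote store
        self._cache = {}         # local copies, keyed by object name
        self.remote_reads = 0    # how many times the remote store was contacted

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._fetch(key)
            self.remote_reads += 1
        return self._cache[key]


# A dictionary stands in for S3 so the example runs offline.
FAKE_S3 = {"data/sst.nc": b"netcdf-bytes"}
store = CachedStore(FAKE_S3.__getitem__)

first = store.get("data/sst.nc")   # miss: read from the remote store
second = store.get("data/sst.nc")  # hit: served from the local cache
print(store.remote_reads)
```

Every hit avoids both the latency of a remote read and, in a cloud setting, a billable request.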

  11. Observations • S3FS & Amazon's APIs: vendor lock-in • XML catalogs were flexible: • Support both direct web and… • Subsetting server access • Likely adaptable to other use-cases • Easily support hierarchical structure • Catalogs didn't need to be stored in S3
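
How an XML catalog can carry a hierarchy on top of a flat object store can be sketched as follows. The element and attribute names here are a simplified, hypothetical schema for illustration; they are not the THREDDS catalog format or the schema used in the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: nested <catalog> elements give the hierarchy,
# <dataset> elements point at objects in the flat store.
CATALOG_XML = """
<catalog name="root">
  <catalog name="2013">
    <dataset name="sst_june.nc" url="https://example.org/data/2013/sst_june.nc"/>
  </catalog>
  <dataset name="readme.txt" url="https://example.org/data/readme.txt"/>
</catalog>
"""


def list_datasets(node, prefix=""):
    """Walk the catalog tree and return (browsable path, object URL) pairs."""
    path = f"{prefix}/{node.get('name')}"
    found = []
    for child in node:
        if child.tag == "dataset":
            found.append((f"{path}/{child.get('name')}", child.get("url")))
        elif child.tag == "catalog":
            found.extend(list_datasets(child, path))
    return found


entries = list_datasets(ET.fromstring(CATALOG_XML))
for path, url in entries:
    print(path, "->", url)
```

The same document can be read by a browser-style client for navigation and by a subsetting server to locate the underlying object, which is the flexibility the slide points to.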

  12. Glacier and Asynchronous Responses • To use Glacier, a web service protocol must support asynchronous access! Glacier is a near-line device, not a spinning disk. • Support in the protocol is not enough: typical use cases cannot be met without caching ‘metadata’ • To support web interfaces and clients, DAP metadata objects should be cached • To support smart clients, domain data may also be needed in the cache
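
One way to picture the asynchronous behavior is a server that answers metadata requests from its cache immediately and answers data requests with "accepted, come back later" until the object has been retrieved from the near-line store. The slides do not say which protocol mechanism was used; the HTTP 202 status code, the class name `NearLineServer`, and the dictionaries standing in for Glacier are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int          # 200 = content returned, 202 = retrieval started
    body: bytes = b""


@dataclass
class NearLineServer:
    metadata_cache: dict                         # DAP metadata kept on fast storage
    archive: dict                                # stands in for the near-line store
    staged: dict = field(default_factory=dict)   # data already retrieved
    pending: set = field(default_factory=set)    # retrievals in progress

    def get_metadata(self, name):
        """Metadata is always answered from the cache, so browsing stays fast."""
        return Response(200, self.metadata_cache[name])

    def get_data(self, name):
        """Return data if staged; otherwise start a retrieval and reply 202."""
        if name in self.staged:
            return Response(200, self.staged[name])
        self.pending.add(name)
        return Response(202)

    def complete_retrievals(self):
        """Simulate the hours-long retrieval jobs finishing."""
        for name in self.pending:
            self.staged[name] = self.archive[name]
        self.pending.clear()


server = NearLineServer(
    metadata_cache={"sst.nc": b"<metadata/>"},
    archive={"sst.nc": b"netcdf-bytes"},
)
meta = server.get_metadata("sst.nc")   # immediate
early = server.get_data("sst.nc")      # not yet available
server.complete_retrievals()
late = server.get_data("sst.nc")       # now served
print(meta.status, early.status, late.status)
```

The sketch also shows why protocol support alone is insufficient: without the metadata cache, even a simple browse request would wait on the near-line store.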

  13. Glacier Implementation • Caching • Catalog • DAP metadata • Support for programmatic and web clients • Web clients are the primary user of the DAP metadata because of their ‘click and browse’ behavior • XML with an embedded XSL style sheet • Single response (XML) • Multiple target clients – smart and browser

  14. Comparison: S3 and Glacier* • Glacier provides “secure and durable storage” • S3 is “designed to make web-scale computing easier” • These graphs show a tiny part of a complex cost model. They do not include the cost to move data out of the Amazon cloud, EC2 instances, etc. *http://calculator.s3.amazonaws.com/calc5.html
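
A two-term version of the cost model, storage plus outbound transfer, is enough to show why the omitted transfer cost matters. All prices and volumes below are hypothetical placeholders, not Amazon's rates; real pricing is tiered and also includes request and compute charges.

```python
def monthly_cost(stored_gb, transferred_gb, storage_price, transfer_price):
    """Storage plus outbound transfer, each priced per GB per month."""
    return stored_gb * storage_price + transferred_gb * transfer_price


# Hypothetical figures for illustration only.
STORAGE_PRICE = 0.10    # currency units per GB stored per month
TRANSFER_PRICE = 0.12   # currency units per GB transferred out
STORED_GB = 1000
FULL_DOWNLOAD_GB = 500                      # users fetch whole files
SUBSET_DOWNLOAD_GB = FULL_DOWNLOAD_GB * 4 / 100   # users fetch 4% subsets

full = monthly_cost(STORED_GB, FULL_DOWNLOAD_GB, STORAGE_PRICE, TRANSFER_PRICE)
subset = monthly_cost(STORED_GB, SUBSET_DOWNLOAD_GB, STORAGE_PRICE, TRANSFER_PRICE)
print(f"full downloads: {full:.2f}, subset downloads: {subset:.2f}")
```

Swapping in a lower `storage_price` for a near-line tier changes only the first term, so the balance between the two stores depends on how often the data are actually read.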

  15. Summary • OPeNDAP server with minimal changes • Data stored in S3 and Glacier • Solution widely applicable: web and smart clients • Complexity of the cost model → a combination of both S3 and Glacier is likely • Modeling and monitoring of use are required
