Globally Distributed Data and the Issues Faced in Building Such a System

Presentation Transcript


  1. Globally Distributed Data and the Issues Faced in Building Such a System. James Gallagher, OPeNDAP

  2. OPeNDAP is … • A non-profit corporation • Funded by Federal agencies and other organizations • Develops open-source software used to provide access over the Internet to scientific data • Developed the Data Access Protocol • … also a name often used to describe compatible software developed by others

  3. OPeNDAP’s Software is Based on the Client-Server Pattern • Client makes a request for data from the server • Server processes request • Uses local interface to read from the data store • Builds response • Returns response • Client receives/decodes response
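The request/response steps on this slide can be sketched in a few lines. This is a toy, in-process illustration only: JSON stands in for the real DAP encoding, and `DATA_STORE`, `read_from_store`, `handle_request`, and `fetch` are hypothetical names, not part of any OPeNDAP library.

```python
import json

# Hypothetical in-memory data store; a real server would read files
# through a format-specific library.
DATA_STORE = {"sst": [14.2, 14.8, 15.1, 15.9, 16.4]}


def read_from_store(name, start, stop):
    """Local interface: read an inclusive index range of one variable."""
    return DATA_STORE[name][start:stop + 1]


def handle_request(raw_request: bytes) -> bytes:
    """Server side: process the request, read the data, build the response."""
    request = json.loads(raw_request.decode("utf-8"))
    values = read_from_store(request["variable"], request["start"], request["stop"])
    response = {"variable": request["variable"], "values": values}
    return json.dumps(response).encode("utf-8")


def fetch(variable, start, stop, transport=handle_request):
    """Client side: make the request, then receive and decode the response."""
    request = {"variable": variable, "start": start, "stop": stop}
    raw_response = transport(json.dumps(request).encode("utf-8"))
    return json.loads(raw_response.decode("utf-8"))["values"]
```

The `transport` argument is where a network call would go; here the server function is called directly, so the client never touches the data store itself.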

  4. Client makes request [Diagram: a Client Application (Client, DAP, Network I/O) and a Data Server (Server Logic, DAP, Data Access, Network I/O, Data) connected by The Network]

  5. Server reads data [Same diagram]

  6. … and builds & returns a response [Same diagram]

  7. DAP Provides a Common Request & Response Framework [Same diagram]

  8. OPeNDAP Servers are all Over the World … here are just a few of the locations

  9. …and because DAP provides a uniform interface for access, users can access all available data without consideration for its local storage format
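One way to picture the uniform interface is that every request looks the same whatever the server stores locally. The sketch below assumes DAP2-style URLs, where a suffix (`.dds`, `.das`, `.dods`) selects the response and a constraint expression such as `var[start:stride:stop]` selects a subset. The host, dataset path, and the helper names `hyperslab` and `dap2_url` are hypothetical.

```python
from urllib.parse import quote


def hyperslab(start, stop, stride=1):
    """Format one dimension's index range as [start:stride:stop]."""
    return f"[{start}:{stride}:{stop}]"


def dap2_url(dataset_url, response="dods", variable=None, slabs=()):
    """Build a request URL: dataset + response suffix + optional constraint.

    Each entry of `slabs` is (start, stop) or (start, stop, stride),
    one entry per array dimension.
    """
    if response not in {"dds", "das", "dods"}:
        raise ValueError(f"unknown response type: {response!r}")
    url = f"{dataset_url}.{response}"
    if variable:
        constraint = variable + "".join(hyperslab(*slab) for slab in slabs)
        # Keep the bracket/colon syntax readable; escape anything else unusual.
        url += "?" + quote(constraint, safe="[]:,.")
    return url


# Hypothetical dataset on a hypothetical server.
BASE = "http://example.org/opendap/sst.nc"
subset_url = dap2_url(BASE, "dods", "sst", [(0, 0), (10, 20), (30, 40, 2)])
```

The same URL shape would apply whether the server reads netCDF, HDF, or something else, which is the point the slide makes.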

  10. Example: IDV

  11. IDV Accesses Local and Remote Data the Same Way

  12. Format Transparency ≠ Interoperability • Making a system that provides access regardless of local storage format is good… • But even data that are stored in the same format may not ‘play’ together • Problems: • Data model features provide for a great variety of structural representations • Metadata needed to use the data (e.g., units) may use different vocabulary
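The vocabulary problem can be made concrete with units. In this minimal sketch, two datasets label the same unit differently, and a small alias table maps both to one canonical spelling before they are compared. The alias table and the function names are illustrative assumptions, not a published vocabulary.

```python
# Hypothetical alias table; a real system would rely on an agreed vocabulary.
UNIT_ALIASES = {
    "degc": "degC",
    "deg_c": "degC",
    "celsius": "degC",
    "degrees celsius": "degC",
    "k": "K",
    "kelvin": "K",
}


def canonical_unit(label):
    """Map a free-form unit label to a canonical spelling, or raise ValueError."""
    key = label.strip().lower()
    if key not in UNIT_ALIASES:
        raise ValueError(f"unrecognized unit label: {label!r}")
    return UNIT_ALIASES[key]


def same_units(a, b):
    """True when two labels name the same unit under the alias table."""
    return canonical_unit(a) == canonical_unit(b)
```

Both datasets may be perfectly readable through the same interface and still fail this check, or fail to be recognized at all, which is why format transparency alone does not give interoperability.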

  13. The Matlab Toolboxes • These run either within Matlab or as a Standalone application – caveat: the latter is still in testing. • They provide two important features beyond format transparency: • Data are presented in geophysical units (values converted from their raw form) • Uniform structure and metadata • They can save data to the client computer using netCDF format files that use the CF-1.0 metadata and structure standard.
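A common form of the raw-to-geophysical conversion is the linear unpacking described in the CF conventions: `value = raw * scale_factor + add_offset`, with a fill value marking missing data. The sketch below shows that rule only; whether the toolboxes use exactly this rule is an assumption, and `to_geophysical` is a hypothetical helper.

```python
def to_geophysical(raw_values, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """Convert packed raw values to geophysical values.

    Values equal to `fill_value` are treated as missing and returned as None.
    """
    converted = []
    for raw in raw_values:
        if fill_value is not None and raw == fill_value:
            converted.append(None)
        else:
            converted.append(raw * scale_factor + add_offset)
    return converted


# Illustrative numbers only, not taken from a real dataset.
values = to_geophysical([0, 4, -999], scale_factor=0.5, add_offset=10.0, fill_value=-999)
```

Doing this step in the client tool means the user sees temperatures or salinities directly instead of packed integers.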

  14. The Matlab Toolbox: Provides access to ten major data sources

  15. This is the interface to HYCOM….

  16. Building Clients • Using the DAP APIs: C++; C; Python; Java • These interfaces encapsulate the networking • They provide direct access to the DAP data structures • Using netCDF: Build clients using the netCDF API • Instead of working with a network-centric API (DAP) this uses a file-based API • The library hides the network calls and ‘quirks’ • The DAP data model is hidden; data and accesses use the netCDF array-based model • Any program that uses netCDF can be switched to the DAP-enabled library with almost no effort.
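The netCDF route can be illustrated with the netCDF4 Python package, whose `Dataset` constructor accepts either a local path or a URL when the underlying netCDF C library was built with DAP support. This is a sketch under that assumption; the URL and file path are hypothetical, and `is_remote`, `describe`, and `open_dataset` are helper names introduced here.

```python
from urllib.parse import urlparse


def is_remote(source):
    """True when `source` looks like an http(s) URL rather than a file path."""
    return urlparse(source).scheme in {"http", "https"}


def describe(source):
    """Label a source for logging; the open call itself does not branch on this."""
    kind = "remote DAP URL" if is_remote(source) else "local file"
    return f"{source} ({kind})"


def open_dataset(source):
    """Open a local file or a DAP URL through the same netCDF call."""
    import netCDF4  # third-party package, imported lazily so the helpers work without it

    return netCDF4.Dataset(source)


# Hypothetical sources; note that only the string changes.
LOCAL = "/data/sst.nc"
REMOTE = "http://example.org/opendap/sst.nc"
```

Client code that already calls `open_dataset(LOCAL)` would call `open_dataset(REMOTE)` with no other change, which is the "almost no effort" switch the slide describes.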

  17. ‘OPeNDAP’ Resources • Servers: Hyrax; TDS; PyDAP; Dapper; and GDS • Kinds of Clients: Portals (LAS); Direct access (IDV, Matlab Toolbox); and API-based (Ferret, GrADS) • Community: Active user and developer community; members help each other; most OPeNDAP software developed by the community • Many Data Sites • www.opendap.org

  18. Break

  19. The History of OPeNDAP* • In 1993, the University of Rhode Island began the development of a system to ‘level the playing field’ so research and Federal labs would be equals – in some sense • Work actually started in the late 1980s to early 1990s • Driving force: Data distribution was dominated by Federal archive sites • But much (most) of the data used was not really at those sites. It was held in ad hoc collections maintained by individual researchers • *The remainder of these slides is taken loosely from “NVODS and the Development of OPeNDAP”, Cornillon, et al.

  20. System Evolution • The initial work began in 1993 and resulted in the Distributed Oceanographic Data System (DODS) • DODS gave way to an effort to entrain a wider spectrum of data providers with a second project: the National Virtual Ocean Data System (NVODS) • OPeNDAP was started when work on NVODS formally ended – we split our group into two parts: one to work on Ocean Science problems and one to continue the software development activities.

  21. Projects* Using OPeNDAP Clients SWFSC PMEL-LAS Servers COLA-GDS Data APDRC Unidata TDS PMEL-Dapper OPeNDAP URI-Matlab Toolbox IRI HyCOM *A subset of the...

  22. Lessons: Issues Identified • Modularity provides flexibility: Seems obvious, but many systems are built as closed monoliths. It’s initially more work but the benefits build over time. • Data will be stored in many formats • Metadata will similarly be heterogeneous • New technology  Behavior change

  23. Issues Identified but not Addressed • While satellite and model data are easily represented by OPeNDAP, in situ and unstructured grids are not • ‘Inventory’ inconsistency is a huge barrier to wider use • Prototype success is a poor metric for operational success • Data searching systems are very fragile

  24. … still more Issues • Time to real system maturity is on the order of ten years; funding cycles are generally three years • Protocol extensions, particularly those involving server-side processing, are unorganized. They provide increased client capability at the expense of reduced interoperability • There is relatively little work on standardizing metadata to make data usable – most work is on discovery metadata
