1 / 12

End-to-End Data Services A Few Personal Thoughts Unidata Staff Meeting 2 September 2009

End-to-End Data Services A Few Personal Thoughts Unidata Staff Meeting 2 September 2009.

sven
Download Presentation

End-to-End Data Services A Few Personal Thoughts Unidata Staff Meeting 2 September 2009

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. End-to-End Data ServicesA Few Personal ThoughtsUnidata Staff Meeting2 September 2009

  2. “Unidata’s vision calls for providing comprehensive, well-integrated and end-to-end data services for the geosciences. These include an array of functions for collecting, finding, and accessing data; data management tools for generating, cataloging, and exchanging metadata; and submitting or publishing, sharing, analyzing, visualizing, and integrating data.” Vision What does this vision statement mean to each of us?

  3. Background • Providing real-time weather data (and related tools) was the primary reason why Unidata was created and it has been the bread and butter of Unidata’s mission for more than two decades. • But as our work has evolved, along with our community, it has become clear that just provision of real-time data or facilitating data access is not enough. • Hence the vision statement. • Let’s think about how some of the capabilities available on Amazon/Ebay/You Tube/Flickr can be facilitated for geosciences “data”.

  4. Objectives 1. Create “integrated” data services across all stages of data life cycle, beginning with observations and ending with data curation/archiving. • A) Observations/Sensors  Ingest  Data collection systems  Data providers  Disseminate  Users (both end users and data archival systems) • B) From beginning till end of a workflow (LEAD example): • Observations  Ingest  Analysis/Assimilation  Prediction  Output  Dissemination  Users (both end users and data repositories)

  5. Imperatives • Integrated services does not imply a monolithic system, but a set of modular services that are configurable, flexible, extensible, and scalable. • Need think what [essential] services are needed by our users and the use cases. • Users include students, faculty, scientists, data providers, outreach providers, field project personnel • Use cases include class room & lab use, research studies, weather websites, field projects, projects like LEAD, portals, and data centers • Both programmatic and interactive invocation • We may not work on all of the functionalities ourselves but we need to facilitate as many of the as possible.

  6. Strategies and Tactics • Integration achieved via both loosely and tightly coupled components and services • Incrementalism is the only practical option for a program like Unidata where many technologies already exist and resources are scarce. • Leverage as much as possible both our own technologies as well as what is available from the outside.

  7. What do I mean by Data? • Scientific data (binary, ASCII, netCDF, HDF, XML, GRIB, …and XML) • Metadata (ASCII, XML, etc.) • Data in data bases (e.g., SQL) • GIS data (Shapefiles, KML, etc.) • Derived products from scientific data • Ancillary data objects (images, videos, documents – pdf, Word, html, ppt, etc.

  8. Integration Capabilities • Different data types (feeds, obs., platforms, model output, and GIS information) • Different data formats • Data on different projections • Distributed data holdings • Data operation (e.g., GDS, netCDF operators) • Metadata addition • Integration of scientific data with metadata content, documents, and other information

  9. Not develop ourselves but perhaps provide hooks to • Collaboration tools • Wikis • Forums • Blogs • Chat and IM/SMS • Social network apps (Facebook, Twitter, etc.) • RSS, email and other notifications

  10. A list of possible data services • Data collection service (routine ingest via LDM, FTP, etc.) • Data submission service • Metadata service for submitting, editing, and exchanging metadata • Cataloging service • Data discovery service • Monitoring and notification service for new data, metadata, and products • Data access services • Data delivery/transport services, including copying/moving data to other servers and personal space and streaming data on demand • Security and authentication services • Subsetting service, including capability for progressive disclosure • Aggregation services for data and metadata • Services for CF conformance checking • Decoding and data translation services • Unit conversion services • Visualization and product generation services • Data fusion and data manipulation services (e.g., netCDF operator services) • GIS services • Output handling services

  11. IMO, Beyond Unidata’s Scope • Data mining • Ontologies • Brokering and workflow orchestration • Federation and mediation • Provenance • Curation and stewardship

  12. Final Thoughts • It is important that we develop consensus on what we mean by integrated, end-to-end data services. Therefore, we need to hear your thoughts. • Once we have an idea of what it is that we want to build, we need to agree on how to go about building it. • Again, I believe in an incremental approach, but there may be other ideas. With RAMADDA, THREDDS, netCDF, and LDM, many of the pieces already exist upon which to build E2E data services. • We need to identify the next steps and a concrete project in this potentially long journey. Develop a pilot effort? A prototype?

More Related