120 likes | 189 Views
End-to-End Data Services A Few Personal Thoughts Unidata Staff Meeting 2 September 2009.
E N D
End-to-End Data ServicesA Few Personal ThoughtsUnidata Staff Meeting2 September 2009
“Unidata’s vision calls for providing comprehensive, well-integrated and end-to-end data services for the geosciences. These include an array of functions for collecting, finding, and accessing data; data management tools for generating, cataloging, and exchanging metadata; and submitting or publishing, sharing, analyzing, visualizing, and integrating data.” Vision What does this vision statement mean to each of us?
Background • Providing real-time weather data (and related tools) was the primary reason why Unidata was created and it has been the bread and butter of Unidata’s mission for more than two decades. • But as our work has evolved, along with our community, it has become clear that just provision of real-time data or facilitating data access is not enough. • Hence the vision statement. • Let’s think about how some of the capabilities available on Amazon/Ebay/You Tube/Flickr can be facilitated for geosciences “data”.
Objectives 1. Create “integrated” data services across all stages of data life cycle, beginning with observations and ending with data curation/archiving. • A) Observations/Sensors Ingest Data collection systems Data providers Disseminate Users (both end users and data archival systems) • B) From beginning till end of a workflow (LEAD example): • Observations Ingest Analysis/Assimilation Prediction Output Dissemination Users (both end users and data repositories)
Imperatives • Integrated services does not imply a monolithic system, but a set of modular services that are configurable, flexible, extensible, and scalable. • Need think what [essential] services are needed by our users and the use cases. • Users include students, faculty, scientists, data providers, outreach providers, field project personnel • Use cases include class room & lab use, research studies, weather websites, field projects, projects like LEAD, portals, and data centers • Both programmatic and interactive invocation • We may not work on all of the functionalities ourselves but we need to facilitate as many of the as possible.
Strategies and Tactics • Integration achieved via both loosely and tightly coupled components and services • Incrementalism is the only practical option for a program like Unidata where many technologies already exist and resources are scarce. • Leverage as much as possible both our own technologies as well as what is available from the outside.
What do I mean by Data? • Scientific data (binary, ASCII, netCDF, HDF, XML, GRIB, …and XML) • Metadata (ASCII, XML, etc.) • Data in data bases (e.g., SQL) • GIS data (Shapefiles, KML, etc.) • Derived products from scientific data • Ancillary data objects (images, videos, documents – pdf, Word, html, ppt, etc.
Integration Capabilities • Different data types (feeds, obs., platforms, model output, and GIS information) • Different data formats • Data on different projections • Distributed data holdings • Data operation (e.g., GDS, netCDF operators) • Metadata addition • Integration of scientific data with metadata content, documents, and other information
Not develop ourselves but perhaps provide hooks to • Collaboration tools • Wikis • Forums • Blogs • Chat and IM/SMS • Social network apps (Facebook, Twitter, etc.) • RSS, email and other notifications
A list of possible data services • Data collection service (routine ingest via LDM, FTP, etc.) • Data submission service • Metadata service for submitting, editing, and exchanging metadata • Cataloging service • Data discovery service • Monitoring and notification service for new data, metadata, and products • Data access services • Data delivery/transport services, including copying/moving data to other servers and personal space and streaming data on demand • Security and authentication services • Subsetting service, including capability for progressive disclosure • Aggregation services for data and metadata • Services for CF conformance checking • Decoding and data translation services • Unit conversion services • Visualization and product generation services • Data fusion and data manipulation services (e.g., netCDF operator services) • GIS services • Output handling services
IMO, Beyond Unidata’s Scope • Data mining • Ontologies • Brokering and workflow orchestration • Federation and mediation • Provenance • Curation and stewardship
Final Thoughts • It is important that we develop consensus on what we mean by integrated, end-to-end data services. Therefore, we need to hear your thoughts. • Once we have an idea of what it is that we want to build, we need to agree on how to go about building it. • Again, I believe in an incremental approach, but there may be other ideas. With RAMADDA, THREDDS, netCDF, and LDM, many of the pieces already exist upon which to build E2E data services. • We need to identify the next steps and a concrete project in this potentially long journey. Develop a pilot effort? A prototype?