
The Federated Data System DataFed: Experiences in Data Homogenization and Networking

R.B. Husar, K. Hoijarvi, S. R. Falke, E. M. Robinson, Washington University, St. Louis; G. Leptoukh, NASA GSFC. Spring AGU, May 29, 2008, Ft. Lauderdale.


Presentation Transcript


  1. The Federated Data System DataFed: Experiences in Data Homogenization and Networking. R.B. Husar, K. Hoijarvi, S. R. Falke, E. M. Robinson, Washington University, St. Louis; G. Leptoukh, NASA GSFC. Spring AGU, May 29, 2008, Ft. Lauderdale

  2. DataFed Motivated by GEOSS. DataFed in a Nutshell: • A federation of autonomous, distributed data providers • Performs non-intrusive wrapping of data into web services • Provides service-based analysis services and tools. General Experience with DataFed: • It is an agile virtual data system that can deliver info products to diverse users • Third-party mediation can homogenize distributed data on the fly • Since 2005, DataFed has been used by EPA and in research • DataFed development is guided by the meme of GEOSS

  3. Five practices for agile, seamless data federation: • Space-Time Query for standardized access to all data (WCS) • Data Wrappers for turning heterogeneous data into web services • Data Mediators for transforming data into ‘Views’ • Mashups for connecting autonomous applications • DataSpaces for shared metadata by the users, for the users

  4. Parameter-Space-Time Query Using OGC WCS Data Access Protocol. Data types: Grid, Image, Station Data. Query components: Parameter, Bounding Box, Time Range, Output Format. Example queries: • Coverage=THEEDDS.T&BBOX=-126,24,-65,52,0,0&TIME=2002-07-07/2002-07-07&FORMAT=NetCDF • Coverage=SEAW.Refl&BBOX=-126,24,-65,52,0,0&TIME=2002-07-07/2002-07-07&FORMAT=GeoTIFF • Coverage=SURF.Bext&BBOX=-126,24,-65,52,0,0&TIME=2002-07-07/2002-07-07&FORMAT=NetCDF-table. Regardless of the data location, data type and format: • the parameter-space-time query is the same • the return is in a user-selectable format from the offerings
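The four-part query on this slide can be illustrated with a minimal Python sketch. The function name `build_query` is hypothetical and not part of DataFed; only the key names (Coverage, BBOX, TIME, FORMAT) and the sample values come from the slide. A real WCS request would also need a service endpoint and further request parameters, which the slide does not show.

```python
from urllib.parse import urlencode


def build_query(coverage, bbox, time_range, out_format):
    """Assemble a parameter-space-time query string (hypothetical helper).

    coverage   -- dataset.parameter name, e.g. "SURF.Bext"
    bbox       -- sequence of bounding-box numbers, as on the slide
    time_range -- (start, end) date strings
    out_format -- one of the formats the service offers
    """
    params = {
        "Coverage": coverage,
        "BBOX": ",".join(str(v) for v in bbox),
        "TIME": "/".join(time_range),
        "FORMAT": out_format,
    }
    # Keep "," and "/" unescaped so the string matches the slide's notation.
    return urlencode(params, safe=",/")


# The same call shape serves grid, image and station data alike;
# only the coverage name and the requested format change.
print(build_query("THEEDDS.T", (-126, 24, -65, 52, 0, 0),
                  ("2002-07-07", "2002-07-07"), "NetCDF"))
```

The point of the sketch is that the caller never states where the data lives or how it is stored; that detail is left to the service behind the query.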

  5. Third Party Data Wrappers: Heterogeneous input data >>> Homogeneous (WCS) Query. DataFed wrappers are non-intrusive, third party
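The wrapper idea can be sketched as an adapter pattern, under stated assumptions. The slide does not describe how DataFed wrappers are implemented, so every class, field name and sample value below is hypothetical. Two differently shaped sources are left untouched (non-intrusive) while each wrapper exposes the same `get_coverage` call. For brevity the sketch uses a four-number bounding box (west, south, east, north) rather than the six values shown in the slide's queries.

```python
import csv
import io


def in_query(lon, lat, date, bbox, time_range):
    """True if a point falls inside the bounding box and the time range."""
    west, south, east, north = bbox
    start, end = time_range
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    return west <= lon <= east and south <= lat <= north and start <= date <= end


class CsvStationWrapper:
    """Wraps a CSV text source (hypothetical columns: date, lon, lat, value)."""

    def __init__(self, csv_text):
        self._csv_text = csv_text

    def get_coverage(self, bbox, time_range):
        rows = csv.DictReader(io.StringIO(self._csv_text))
        return [
            (r["date"], float(r["lon"]), float(r["lat"]), float(r["value"]))
            for r in rows
            if in_query(float(r["lon"]), float(r["lat"]), r["date"], bbox, time_range)
        ]


class RecordListWrapper:
    """Wraps a list of dicts (hypothetical keys: t, x, y, v)."""

    def __init__(self, records):
        self._records = records

    def get_coverage(self, bbox, time_range):
        return [
            (r["t"], r["x"], r["y"], r["v"])
            for r in self._records
            if in_query(r["x"], r["y"], r["t"], bbox, time_range)
        ]


# Made-up sample data in two different native layouts.
CSV_SOURCE = (
    "date,lon,lat,value\n"
    "2002-07-07,-90.2,38.6,12.5\n"
    "2002-07-08,-90.2,38.6,9.0\n"
)
RECORD_SOURCE = [
    {"t": "2002-07-07", "x": -90.2, "y": 38.6, "v": 12.5},
    {"t": "2002-07-07", "x": 10.0, "y": 50.0, "v": 3.0},
]

BBOX = (-126, 24, -65, 52)
DAY = ("2002-07-07", "2002-07-07")

for wrapper in (CsvStationWrapper(CSV_SOURCE), RecordListWrapper(RECORD_SOURCE)):
    # One query, one result shape, regardless of the source layout.
    print(wrapper.get_coverage(BBOX, DAY))
```

Because the sources themselves are never modified, a third party can add a wrapper without the provider's involvement, which is the property the slide emphasizes.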

  6. Mediated User-Data Interface: Mediator turns data into Views. Diagram labels: Query, Data, Views. Client-Server design is demanding: User carries the burden of integration. Mediated Integration is a flexible design pattern for System of Systems

  7. Mashups: Loose Coupling of Autonomous Applications. DataFed - Wiki - GoogleEarth Mashup. Diagram labels: Workflow, SOAP, RDF

  8. Community DataSpaces Services: Shared Metadata by the Users, for the Users. DataSpaces for Datasets. [Diagram: GEOSS Core Service Offerors and Users. Actors: Data Analyst, Policy Analyst, Decision Maker, Service Offeror, User. Components: GEOSS Clearinghouse, Catalog, Service Workflow, Community AQ Portal, Community AQ Catalog, GEOSS Comp. Registry, Standards; SIF Registry. Relations: finds, composes, visualizes, extracts, reports to, invokes, searches, harvests, lists, registers, informs, provides links to, references, publishes.] Adopted from Percivall, Feb 2008, by R. Husar, March 2008

  9. Wiki ‘DataSpaces’: Creating and Sharing Metadata • Semantic Wiki: Structured (RDF) and Unstructured Content • Open, Standard Metadata - RDF • Ready for Export/Harvesting by Registries, Catalogs. Community Catalog: Find Dataset, Describe Dataset, Discuss Dataset. ESIP Communal Wiki
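To make "open, standard metadata in RDF" concrete, here is a minimal sketch that serializes a dataset description as N-Triples, one of the standard RDF text formats. The helper name, the example.org subject URI and the sample title are hypothetical; the slide does not say which vocabulary the wiki uses, so the Dublin Core `title` property is an assumption chosen for illustration. The helper handles only plain string values without line breaks.

```python
def to_ntriples(subject, properties):
    """Serialize {predicate URI: string value} pairs as N-Triples lines.

    Hypothetical helper: escapes backslashes and double quotes only,
    so values containing newlines are out of scope for this sketch.
    """
    lines = []
    for predicate, value in properties.items():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)


# Made-up dataset record; the URI and title are placeholders.
record = {
    "http://purl.org/dc/terms/title": 'Demo "AQ" dataset',
}
print(to_ntriples("http://example.org/dataset/demo", record))
```

Metadata in this form is plain text with one statement per line, which is what makes it straightforward for a registry or catalog to harvest.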

  10. Sharing Best Practices: GEO Best Practice Wiki

  11. Developments and Challenges: Favorable Engineering Developments: • A core network for Air Quality data sharing is emerging • Standards are available for sharing previously unstructured data • Third-party mediation can homogenize the distributed data • Agile SOA-based systems can deliver info products to diverse users • Since 2005, one such IS, DataFed, has been used by EPA and in research. However: • Service interfaces are still uneven; networks are still fragile • The utility of social networking in science is not understood • Users cannot provide feedback to upstream providers • Many cultural, legal and other barriers hamper progress
