
The SEEK EcoGrid: A Data Grid System for Ecology

UC DAVIS Department of Computer Science. San Diego Supercomputer Center. The SEEK EcoGrid: A Data Grid System for Ecology. Arcot Rajasekar (sekar@sdsc.edu) Matthew Jones (jones@nceas.ucsb.edu) Bertram Ludäscher (ludaesch@ucdavis.edu). Large collaborative NSF/ITR (2002-2007)


Presentation Transcript


  1. UC DAVIS Department of Computer Science San Diego Supercomputer Center The SEEK EcoGrid: A Data Grid System for Ecology Arcot Rajasekar (sekar@sdsc.edu) Matthew Jones (jones@nceas.ucsb.edu) Bertram Ludäscher (ludaesch@ucdavis.edu)

  2. Large collaborative NSF/ITR (2002-2007) Bringing together ecologists, IT experts, CS researchers, … SEEK.ecoinformatics.org Science Environment for Ecological Knowledge

  3. What is SEEK? • Multidisciplinary research project to facilitate … • Access to ecological, environmental, and biodiversity data • Enable data sharing & re-use • Enhance data discovery at global scales • Scalable analysis and synthesis • Taxonomic, Spatial, Temporal, Conceptual integration of data, addressing data heterogeneity issues • Enable communication and collaboration for analysis • Enable re-use of analytical components

  4. SEEK Components Main Components: • Kepler • Problem-solving environment for scientific data analysis and visualization (“scientific workflows”) • EcoGrid • Distributed data network for environmental, ecological, and systematics data • Making diverse environmental data systems interoperate • Semantic Mediation System • “Smart” data discovery and integration • Knowledge Representation WG • Taxon WG • BEAM WG • Education, Outreach, Training

  5. Ecological Metadata Language • Metadata: a means to manage ecological data • There is no universal data model for ecology • Accommodate heterogeneity and dispersion • EML • Common language for archiving and transporting data • Discovery information • Creator, Title, Abstract, Keyword, etc. • Content • Context • Physical, logical structure • SEEK adds semantic structure

  6. An Example EML Document Transform <?xml version="1.0"?> <eml:eml packageId="piscoUCSB.5.20" system="knb" xmlns:eml="eml://ecoinformatics.org/eml-2.0.0"> <dataset> <shortName>Alegria Temperatures</shortName> <title>PISCO: Intertidal Temperature Data: Alegria, California: 1996-1997</title> <creator id="C.Blanchette"> <individualName> <givenName>Carol</givenName> <surName>Blanchette</surName> </individualName> <organizationName>PISCO</organizationName> <address> <deliveryPoint>UCSB Marine Science Institute</deliveryPoint> <city>Santa Barbara</city> <administrativeArea>CA</administrativeArea> <postalCode>93106</postalCode> </address> </creator> <abstract> <para>These temperature data were collected at Alegria Beach, California, and were ... </para> </abstract> <keywordSet> <keyword>OceanographicSensorData</keyword> <keyword>Thermistor</keyword> <keywordThesaurus> PISCOCategories </keywordThesaurus> </keywordSet> <intellectualRights><para>Please contact the authors for permission to use these data. Please also acknowledge the authors in any publications.</para> </intellectualRights> <contact> <references>C.Blanchette</references> </contact> </dataset> </eml:eml>
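The discovery information in an EML record like the one above can be read with any XML parser. The following is a minimal sketch using Python's standard library; the helper name `discovery_info` is hypothetical, and the embedded record is an abbreviated copy of the slide's example (address, abstract, and rights omitted).

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the EML record from the slide.
EML_DOC = """<?xml version="1.0"?>
<eml:eml packageId="piscoUCSB.5.20" system="knb"
         xmlns:eml="eml://ecoinformatics.org/eml-2.0.0">
  <dataset>
    <shortName>Alegria Temperatures</shortName>
    <title>PISCO: Intertidal Temperature Data: Alegria, California: 1996-1997</title>
    <creator id="C.Blanchette">
      <individualName>
        <givenName>Carol</givenName>
        <surName>Blanchette</surName>
      </individualName>
      <organizationName>PISCO</organizationName>
    </creator>
    <keywordSet>
      <keyword>OceanographicSensorData</keyword>
      <keyword>Thermistor</keyword>
      <keywordThesaurus>PISCOCategories</keywordThesaurus>
    </keywordSet>
  </dataset>
</eml:eml>"""


def discovery_info(eml_text):
    """Collect discovery fields (hypothetical helper, not part of any EML toolkit)."""
    root = ET.fromstring(eml_text)
    dataset = root.find("dataset")  # children of eml:eml are unqualified in this record
    person = dataset.find("creator/individualName")
    return {
        "packageId": root.get("packageId"),
        "title": dataset.findtext("title"),
        "creator": f"{person.findtext('givenName')} {person.findtext('surName')}",
        "keywords": [k.text.strip() for k in dataset.iter("keyword")],
    }


info = discovery_info(EML_DOC)
print(info["title"])
```

Only the root element carries the `eml` namespace prefix in this record, which is why the child lookups use plain tag names.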

  7. SEEK Overview

  8. EcoGrid Focus • Data and Metadata • Distributed Data • XML-based Metadata • Service to Semantic Mediation Layer • Access to Ontologies and Taxon Services • Helping with Semantic Data Integration • Service to Analysis and Modelling Layer • Interaction with Kepler - Workflows • Interaction with Grid Computing Facilities • Access to Legacy Apps • LifeMapper • Spatial Data Workbench

  9. SEEK EcoGrid • Goal: allow diverse environmental data systems to interoperate • Hides complexity of underlying systems using lightweight interfaces • Integrate diverse data networks from ecology, biodiversity, and environmental sciences • Data systems • Any system can implement these interfaces • Prototyping using: • Metacat, SRB, DiGIR, Xanthoria, etc. • Supports multiple metadata standards • EML, Darwin Core as foci

  10. Web services • Service Oriented Architecture (SOA) • Remote discovery and execution of services • Network transport of data (HTTP) • Message format (SOAP/XML) • Service interface description (WSDL) • Diagram (steps 1 to 3, with Morpho as the client) from http://www.w3.org/TR/2002/WD-ws-arch-20021114/

  11. Grid Services • A Grid service is a Web service plus: • Lifecycle management (persisting the service over outages) • State management (tracking sessions across multiple requests) • Factory services (allowing many clients to connect) • Security (authorization) • … • EcoGrid defines a standard set of grid interfaces for use by many data servers

  12. EcoGrid Example • Diagram: Morpho client, EcoGrid Registry, and an EcoGrid service described by the EcoGrid WSDL: query(session, query), get(session, identifier) • 1. Publish • 2. Find service • 3. Return service description • 4. Execute search, handle response: query() • 5. Execute get, handle response: get()
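The publish, find, and execute steps above can be illustrated with in-memory stand-ins. This is only a sketch: `Registry` and `DemoNode` are hypothetical classes, the real EcoGrid exposes WSDL-described Grid services rather than Python objects, and the second record is made-up sample data.

```python
# Hypothetical in-memory stand-ins for the registry and one data node.
class Registry:
    def __init__(self):
        self._services = {}

    def publish(self, name, service):
        """Step 1: a node registers its service under a name."""
        self._services[name] = service

    def find(self, name):
        """Steps 2-3: a client looks up a service and receives it back."""
        return self._services[name]


class DemoNode:
    """Toy data node exposing the two operations named on the slide."""

    def __init__(self, records):
        self._records = records  # identifier -> bytes

    def query(self, session, query):
        """Step 4: return identifiers of records containing the query text."""
        needle = query.encode()
        return [ident for ident, body in self._records.items() if needle in body]

    def get(self, session, identifier):
        """Step 5: return the bytes stored under one identifier."""
        return self._records[identifier]


registry = Registry()
registry.publish(
    "demo-node",
    DemoNode({"mvz1": b"PEROMYSCUS LEUCOPUS", "mvz2": b"MUS MUSCULUS"}),  # mvz2 is invented
)

node = registry.find("demo-node")
hits = node.query("session-1", "PEROMYSCUS")
data = node.get("session-1", hits[0])
print(hits, data)
```

The session argument is passed through but not checked here; a real node would use it for state and access control.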

  13. EcoGrid Query Interfaces • Provides a mechanism for search and retrieval of metadata and federated data • Supports third party interaction with search results • forwarding of result set identifiers to another service instance for retrieval • Different levels of compliance • Low barrier for participation • Bulk of data will be accessible through Type I

  14. EcoGrid Query Level I • Basic, entry level exposure of data and metadata for EcoGrid and SEEK • Response contains data – intended for direct communications rather than 3rd party indirection • ResultsetType query(SessionID, QueryType) • byte[] get(SessionID, objectID)

  15. Query Conditions • Language independent representation of a query structure • Transformed into the appropriate native language of the data store Example: <AND> <condition operator="LIKE" concept="ScientificName">peromyscus%</condition> <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition> </AND>
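To show what "transformed into the native language of the data store" could look like, here is a sketch that renders the condition tree as a parameterized SQL predicate. The operator mapping (including `EQUALS`, which does not appear on the slide), the NULL handling, and the function name `to_sql` are assumptions for illustration; the actual EcoGrid wrappers may translate differently.

```python
import xml.etree.ElementTree as ET

# Assumed operator mapping for a SQL back end; "EQUALS" is a guess, not from the slide.
SQL_OPERATORS = {"LIKE": "LIKE", "EQUALS": "=", "NOT EQUALS": "<>"}


def to_sql(node, params):
    """Recursively render a condition tree as a parameterized SQL predicate."""
    if node.tag in ("AND", "OR"):
        parts = [to_sql(child, params) for child in node]
        return "(" + f" {node.tag} ".join(parts) + ")"
    # A real translator should check the concept against a list of known columns.
    column = node.get("concept")
    operator = node.get("operator")
    value = (node.text or "").strip()
    if value == "NULL":
        # SQL compares against NULL with IS / IS NOT, not with = or <>.
        return f"{column} IS NOT NULL" if operator == "NOT EQUALS" else f"{column} IS NULL"
    params.append(value)
    return f"{column} {SQL_OPERATORS[operator]} ?"


CONDITIONS = """<AND>
  <condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
  <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>
</AND>"""

params = []
where = to_sql(ET.fromstring(CONDITIONS), params)
print(where, params)
```

Values are collected as bind parameters rather than spliced into the SQL text, so the same tree could be handed to a database driver without quoting concerns.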

  16. Specifying the Resultset • Specify the list of concepts (fields) to be returned in the resultset • Simple paths used to identify elements or document subtrees • Effectively flattens the structure of the records, but allows generic representation Example: <returnfield>/ScientificName</returnfield> <returnfield>/Longitude</returnfield> <returnfield>/Latitude</returnfield>
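The flattening effect of return fields can be sketched as a projection from a record onto the requested paths. This assumes each path is resolved relative to the record root, which the slide does not spell out; the record layout and the `project` helper are hypothetical, with values borrowed from the other slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical source record; the field values are borrowed from the deck's examples.
RECORD = """<record>
  <Genus>Peromyscus</Genus>
  <ScientificName>PEROMYSCUS LEUCOPUS</ScientificName>
  <Longitude>100</Longitude>
  <Latitude>200</Latitude>
</record>"""


def project(record_xml, return_fields):
    """Flatten one record to {field name: text} for the requested simple paths."""
    record = ET.fromstring(record_xml)
    flat = {}
    for path in return_fields:
        element = record.find("." + path)  # "/ScientificName" -> "./ScientificName"
        name = path.rsplit("/", 1)[-1]     # last path step becomes the flat key
        flat[name] = None if element is None else "".join(element.itertext()).strip()
    return flat


row = project(RECORD, ["/ScientificName", "/Longitude", "/Latitude"])
print(row)
```

Fields that were not requested (here `Genus`) are dropped, and a requested path with no match yields `None`.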

  17. Full Query Example <egq:query queryId="query-digir.1.1" system="http://knb.ecoinformatics.org" xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd"> <namespace prefix="darwin">http://digir.net/schema/conceptual/darwin/2003/1.0</namespace> <returnfield>/ScientificName</returnfield> <returnfield>/Longitude</returnfield> <returnfield>/Latitude</returnfield> <title>Peromyscus genus query</title> <condition operator="LIKE" concept="Genus">Peromyscus</condition> </egq:query>
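A client could assemble a query document like the one above programmatically instead of writing XML by hand. The sketch below uses Python's standard library; `build_query` is a hypothetical helper, and the `xsi:schemaLocation` attribute from the slide is left out for brevity.

```python
import xml.etree.ElementTree as ET

EGQ_NS = "ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
DARWIN_NS = "http://digir.net/schema/conceptual/darwin/2003/1.0"
ET.register_namespace("egq", EGQ_NS)  # serialize the root element with the egq: prefix


def build_query(query_id, system, title, return_fields, concept, operator, value):
    """Assemble a single-condition query document (hypothetical helper)."""
    root = ET.Element(f"{{{EGQ_NS}}}query", {"queryId": query_id, "system": system})
    ET.SubElement(root, "namespace", {"prefix": "darwin"}).text = DARWIN_NS
    for path in return_fields:
        ET.SubElement(root, "returnfield").text = path
    ET.SubElement(root, "title").text = title
    condition = ET.SubElement(root, "condition", {"operator": operator, "concept": concept})
    condition.text = value
    return ET.tostring(root, encoding="unicode")


query_xml = build_query(
    "query-digir.1.1",
    "http://knb.ecoinformatics.org",
    "Peromyscus genus query",
    ["/ScientificName", "/Longitude", "/Latitude"],
    concept="Genus",
    operator="LIKE",
    value="Peromyscus",
)
print(query_xml)
```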

  18. Result Set Structure <rs:resultset resultsetId="foo.1.1" system="urn:not://sure/what/to/put/here" xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd"> <resultsetMetadata> <sendTime>2003-05-02T16:45:50-09:00</sendTime> <startRecord>1</startRecord> <endRecord>2</endRecord> <recordCount>2</recordCount> <namespace>http://digir.net/schema/conceptual/darwin/2003/1.0</namespace> <system id="1">http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2</system> </resultsetMetadata> <record number="1" system="1" identifier="mvz1"> <returnField name="ScientificName">PEROMYSCUS LEUCOPUS</returnField> <returnField name="Longitude">100</returnField> <returnField name="Latitude">200</returnField> </record> … </rs:resultset>
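On the client side, a result set of this shape can be turned into plain rows, with each record's `system` attribute resolved against the metadata block. This is a sketch with a hypothetical `read_records` helper; the embedded document is trimmed from the slide and contains only the one record the slide shows in full.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's result set; the second record is elided on the slide.
RESULTSET = """<rs:resultset resultsetId="foo.1.1"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
  <resultsetMetadata>
    <recordCount>2</recordCount>
    <system id="1">http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2</system>
  </resultsetMetadata>
  <record number="1" system="1" identifier="mvz1">
    <returnField name="ScientificName">PEROMYSCUS LEUCOPUS</returnField>
    <returnField name="Longitude">100</returnField>
    <returnField name="Latitude">200</returnField>
  </record>
</rs:resultset>"""


def read_records(resultset_xml):
    """Return one dict per record, with the source system id resolved to its URL."""
    root = ET.fromstring(resultset_xml)
    systems = {s.get("id"): s.text for s in root.findall("resultsetMetadata/system")}
    rows = []
    for record in root.findall("record"):
        row = {field.get("name"): field.text for field in record.findall("returnField")}
        row["identifier"] = record.get("identifier")
        row["source"] = systems.get(record.get("system"))
        rows.append(row)
    return rows


rows = read_records(RESULTSET)
print(rows[0]["ScientificName"], rows[0]["source"])
```

The `identifier` kept on each row is what a client would pass to `get` to retrieve the underlying object.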

  19. EcoGrid Get & Put • get retrieves the content of a dataset or file held in a system such as SRB or Metacat. • get also enables SQL querying of relational databases (Oracle, DB2, etc.) that are pre-registered as data sources in SRB. • put for data: allows users to create (upload) files in EcoGrid resources such as Metacat or SRB. • put for metadata: the EcoGrid put service also allows ingestion of metadata, such as EML in Metacat or user-defined metadata in SRB. • Depends on the availability of an authentication and access control system • put(sessionID, objectID, object, type) • delete(sessionID, objectID)

  20. Building the EcoGrid • LTER Network (24), including sites NTL, AND, HBR, VCR, LUQ • Natural History Collections (>> 100) • Organization of Biological Field Stations (180) • UC Natural Reserve System (36) • Partnership for Interdisciplinary Studies of Coastal Oceans (4) • Multi-agency Rocky Intertidal Network (60) • Node types: Metacat node, SRB node, VegBank node, DiGIR node, Xanthoria node, legacy system

  21. EcoGrid Client Interactions • Modes of interaction • Client-server • Fully distributed • Peer-to-peer • EcoGrid Registry • Node discovery • Service discovery • Aggregation services • Centralized access • Reliability • Data preservation

  22. Layers in EcoGrid

  23. EcoGrid Queries in Kepler

  24. Metadata-driven analysis cycle

  25. Status • Read, Query & Register Completed • Simple Registry Operational • EcoGrid Wrappers completed for: • MetaCat • SRB • DiGIR • Xanthoria • Available Interfaces • WSDL • Simple Web Interactivity • Kepler

  26. Acknowledgements This material is based upon work supported by: The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676. PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of California, Davis, University of Kansas (Center for Biodiversity Research) Kepler contributors: SEEK, Ptolemy II, DOE SDM/SciDAC, GEON, and others.

  27. Q & A

  28. Frequently Asked Questions … • Which version of Grid services do you use? • We currently use 3.2.x because it was the last stable version based on OGSA. It seems that WSRF does not support the OGSA Factory pattern, which is the main Grid Service feature that we utilize and wouldn’t want to lose. We may migrate to WSRF eventually. • How can a user (or developer) discover what catalogs are on the EcoGrid? • In Kepler, click the "Sources" button on the Data tab. The UI allows a basic query of the EcoGrid registry to discover new nodes and choose which should be searched. • Developers can program to the EcoGrid Registry API. • How much is the EcoGrid *integrated*? Is there a common query language? • Yes, there is a common query syntax for expressing path-based metadata queries. This syntax does not do any mapping among various metadata languages. We still need a system that can translate a query that uses terms from one metadata language (e.g., DarwinCore) into queries for another metadata language (e.g., EML). The SEEK SMS system will help with this mapping.

  29. Frequently Asked Questions … • Is the EcoGrid a "federation of federations"? • In a sense. The EcoGrid is an *API* (specifically a Grid Services API) that allows clients to use a common set of communication protocols to access diverse data systems. The EcoGrid API has been implemented for Metacat, DiGIR, and SRB, all of which are federations. As clients can access the various systems via EcoGrid, the latter can be considered a federation of federations. The EcoGrid Registry has a list of systems that have published EcoGrid interfaces that are accessible to clients. • Where are the WSDLs? • http://ecogrid.ecoinformatics.org/ogsa/services/org/ecoinformatics/ecogrid/EcoGridQueryInterfaceLevelOneService?wsdl • What’s on the EcoGrid right now? • The KNB network is gathering data and metadata from NCEAS, 24 LTER sites, and about 200 other field stations (KNB EcoGrid node). • The DiGIR system federates access to museum collections data in the form of Darwin Core records. The EcoGrid node at KU points at this network of about 150 museums that are accessible through DiGIR. • SRB is currently used to hold some data objects that are described via EML metadata records that are in the KNB Metacat.

  30. Frequently Asked Questions … • Where is the code for the EcoGrid? • Most code is in CVS at seek/projects/ecogrid. Some Kepler-specific client-side UI code is in the Kepler CVS. • http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/seek/projects/ecogrid • There are also EcoGrid design docs, meeting notes, etc. • Are there plans for an "EcoGrid Portal" so that end users can easily access and contribute data? • Yes, this is under development. In the interim, one can search the KNB and DiGIR sites individually, or use Kepler.
