1 / 26

EcoGrid in SEEK A Data Grid System for Ecology Bertram Ludaescher University of California, Davis

EcoGrid in SEEK A Data Grid System for Ecology Bertram Ludaescher University of California, Davis Arcot Rajasekar San Diego Supercomputer Center, University of California, San Diego. What is SEEK?. Science Environment for Ecological Knowledge (SEEK)

gaston
Download Presentation

EcoGrid in SEEK A Data Grid System for Ecology Bertram Ludaescher University of California, Davis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. EcoGrid in SEEK A Data Grid System for Ecology Bertram Ludaescher University of California, Davis Arcot Rajasekar San Diego Supercomputer Center, University of California, San Diego

  2. What is SEEK? • Science Environment for Ecological Knowledge (SEEK) • Multidisciplinary research project to create: • Distributed data network (EcoGrid) • Environmental, ecological, and systematics data • Scalable systems for scientific analysis (workflow systems) • Systems for semi-automated data and model integration • Collaborators • NCEAS, UNM, SDSC, U Kansas • Vermont, Napier, ASU, UNC

  3. Science Environment for Ecological Knowledge Research Objectives • Access to ecological, environmental, and biodiversity data • Enable data sharing & re-use • Enhance data discovery at global scales • Scalable analysis and synthesis • Taxonomic, Spatial, Temporal, Conceptual integration of data • Address data heterogeneity issues • Enable communication and collaboration for analysis • Enable re-use of analytical components • Collaborators • NCEAS, UNM, SDSC, U Kansas • Vermont, Napier, ASU, UNC

  4. SEEK Overview

  5. SEEK Components Science Environment for Ecological Knowledge • Kepler • Modeling scientific workflows • EcoGrid • Making diverse environmental data systems interoperate • Semantic Mediation System • “Smart” data discovery and integration • Knowledge Representation WG • Taxon WG • BEAM WG • Education, Outreach, Training

  6. SEEK EcoGrid • Goal: allow diverse environmental data systems to interoperate • Hides complexity of underlying systems using lightweight interfaces • Integrate diverse data networks from ecology, biodiversity, and environmental sciences • Data systems • Any system can implement these interfaces • Prototyping using: • Metacat, SRB, DiGIR, Xanthoria, etc. • Supports multiple metadata standards • EML, Darwin Core as foci

  7. EcoGrid client interactions • Modes of interaction • Client-server • Fully distributed • Peer-to-peer • EcoGrid Registry • Node discovery • Service discovery • Aggregation services • Centralized access • Reliability • Data preservation

  8. Ecogrid Focus • Data and Metadata • Distributed Data • XML-based Metadata • Service to Semantic Mediation Layer • Access to Ontologies and Taxon Services • Helping with Semantic Data Integration • Service to Analysis and Modelling Layer • Interaction with Kepler - Workflows • Interaction with Grid Computing Facilities • Access to Legacy Apps • LifeMapper • Spatial Data Workbench

  9. EcoGrid Node

  10. Layers in EcoGrid

  11. Ecological Metadata Language • Metadata: a means to manage ecological data • There is no universal data model for ecology • Accommodate heterogeneity and dispersion • EML • Common language for archiving and transporting data • Discovery information • Creator, Title, Abstract, Keyword, etc. • Content • Context • Physical, logical structure • SEEK will add semantic structure

  12. An Example EML Document Transform <?xml version="1.0"?> <eml:eml packageId="piscoUCSB.5.20" system="knb" xmlns:eml="eml://ecoinformatics.org/eml-2.0.0"> <dataset> <shortName>Alegria Temperatures</shortName> <title>PISCO: Intertidal Temperature Data: Alegria, California: 1996-1997</title> <creator id="C.Blanchette"> <individualName> <givenName>Carol</givenName> <surName>Blanchette</surName> </individualName> <organizationName>PISCO</organizationName> <address> <deliveryPoint>UCSB Marine Science Institute</deliveryPoint> <city>Santa Barbara</city> <administrativeArea>CA</administrativeArea> <postalCode>93106</postalCode> </address> </creator> <abstract> <para>These temperature data were collected at Alegria Beach, California, and were ... </para> </abstract> <keywordSet> <keyword>OceanographicSensorData</keyword> <keyword>Thermistor</keyword> <keywordThesaurus> PISCOCategories </keywordThesaurus> </keywordSet> <intellectualRights><para>Please contact the authors for permission to use these data. Please also acknowledge the authors in any publications.</para> </intellectualRights> <contact> <references>C.Blanchette</references> </contact> </dataset> </eml:eml>

  13. Metadata driven data ingestion • Key information needed to read and machine process a data file is in the metadata • File descriptors (CSV, Excel, RDBMS, etc.) • Entity (table) and Attribute (column) descriptions • Name • Type (integer, float, string, etc.) • Codes (missing values, nulls, etc.) • Integrity constraints • In the future, this will include semantic typing

  14. Heterogeneous Data integration • Requires advanced metadata and processing • Attributes must be semantically typed • Collection protocols must be known • Units and measurement scale must be known • Measurement relationships must be known • e.g., that ArealDensity=Count/Area

  15. Ecological ontologies • What was measured (e.g., biomass) • Type of measurement (e.g., Energy) • Context of measurement (e.g., Psychotria limonensis) • How it was measured (e.g., dry weight) • SEEK intends to enable community-created ecological ontologies using OWL • Represents a controlled vocabulary for ecological metadata • More about this in Bertram’s talk

  16. EcoGrid Resources LUQ AND HBR VCR NTL LTER Network (24) Natural History Collections (>> 100) Organization of Biological Field Stations (180) UC Natural Reserve System (36) Partnership for Interdisciplinary Studies of Coastal Oceans (4) Multi-agency Rocky Intertidal Network (60) Metacat node SRB node VegBank node DiGIR node Xanthoria node Legacy system

  17. EcoGrid Resources EcoGrid Registry MetaCat EcoGrid Diggir Xanthoria SRB VegBank

  18. EcoGrid Node

  19. EcoGrid Query Service <AND> <condition operator="LIKE" concept="mdasCollectionName">/home/whywhere.seek</condition> <condition operator="LIKE" concept="ORIGINAL DATA SET">%World Geodetic System%</condition> <condition operator="EQUALS" concept="max. value">39.11</condition> </AND> Ecogrid Query adopts a query schema, Query Document Schema, as a common query language within Ecogrid. <?xml version="1.0" encoding="UTF-8"?> <egq:query queryId="test.1.1" system="http://knb.ecoinformatics.org" xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd"> <namespace prefix="eml">eml://ecoinformatics.org/eml-2.0.0</namespace> <returnfield>size</returnfield> <returnfield>owner</returnfield> <returnfield>min. value</returnfield> <returnfield>max. value</returnfield> <!-- <returnfield>value units</returnfield> --> <title>metadata query for Eco Models</title> <AND> <condition operator="LIKE" concept="mdasCollectionName">/home/whywhere.seek</condition> <condition operator="LIKE" concept="ORIGINAL DATA SET">%World Geodetic System%</condition> <condition operator="EQUALS" concept="max. value">39.11</condition> </AND> </egq:query>

  20. Ecogrid Services implementation for GET/PUT • The ‘get’ call from ecogrid client enables retrieval of the content of a dataset/file such as SRB, MetaCat. • The ‘get’ function also be enables SQL querying of relational databases (Oracle, DB2, etc), which are pre-registered as a data source in SRB. • Put for data: Ecogrid put service allows users to create (upload) files into EcoGrid resources such as MetCat, SRB. • Put for metadata: Ecogrid put service also allows ingestion of metadata such as EML in MetaCat or User-defined metadata in SRB.

  21. EcoGrid Queries in Kepler

  22. EML Metadata Display in Kepler

  23. EcoGrid Sources in Kepler

  24. Query Builder

  25. Status • Read, Query & Register Completed • Simple Registry Operational • EcoGrid Wrappers completed for: • MetaCat • SRB • DiGGiR • Xanthoria • Available Interfaces • WSDL • Simple Web Interactivity • Kepler

  26. Acknowledgements This material is based upon work supported by: The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676. The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus. The Andrew W. Mellon Foundation. PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research) Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON

More Related