Science Environment for Ecological Knowledge

Bertram Ludäscher, San Diego Supercomputer Center, University of California, San Diego. Institutions listed on the title slide: UC Santa Barbara, U New Mexico, UC San Diego, U Kansas, Vermont, Napier, ASU, UNC. Project site: http://seek.ecoinformatics.org

Presentation Transcript


  1. Science Environment for Ecological Knowledge
     Bertram Ludäscher, San Diego Supercomputer Center, University of California, San Diego
     UC Santa Barbara, U New Mexico, UC San Diego, U Kansas, Vermont, Napier, ASU, UNC
     http://seek.ecoinformatics.org

  2. Architecture Overview
     • Analysis & Modeling System
       – Design and execution of ecological models and analysis
       – End user focus
       – application-/upperware
     • Semantic Mediation System
       – Data integration of hard-to-relate sources and processes
       – Semantic types and ontologies
       – upper middleware
     • EcoGrid
       – Access to ecology data and tools
       – middle-/underware (cf. GEON + Cyberinfrastructure)
     • Plus Working Groups:
       – Knowledge Representation (SEEK-KR)
       – Classification and Nomenclature (TAXON)
       – Biodiversity and Ecological Analysis and Modeling (BEAM)

  3. SEEK EcoGrid
     • Goal: standardize interfaces (using web and grid services)
       – We have standardized data via EML
     • Integrate diverse data networks from ecology, biodiversity, and environmental sciences
     • Grid-standardized interfaces
       – Uniform interface to Metacat, SRB, DiGIR, Xanthoria, etc.
       – Anyone can implement these interfaces
       – Hides complexity of underlying systems
     • Metadata-mediated data access
       – Supports multiple metadata standards
       – EML, Darwin Core as foci
     • Computational services
       – Pre-defined analytical services
       – On-the-fly analytical services
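The "uniform interface" idea on this slide can be illustrated with a minimal sketch. Everything here is hypothetical: `EcoGridNode`, `InMemoryNode`, `federated_query`, and the toy records are invented for illustration and are not the actual EcoGrid, Metacat, SRB, or DiGIR APIs. The point is only that a client can query very different backends once each one implements the same small interface.

```python
from abc import ABC, abstractmethod


class EcoGridNode(ABC):
    """Hypothetical uniform interface (an assumption, not the real EcoGrid API)."""

    @abstractmethod
    def query(self, keyword: str) -> list[str]:
        """Return identifiers of records whose metadata mentions the keyword."""

    @abstractmethod
    def get(self, identifier: str) -> bytes:
        """Return the raw data object behind an identifier."""


class InMemoryNode(EcoGridNode):
    """Stand-in for a real backend; holds toy records only."""

    def __init__(self, name, records):
        self.name = name
        # records maps identifier -> (metadata text, data bytes)
        self._records = records

    def query(self, keyword):
        kw = keyword.lower()
        return sorted(i for i, (meta, _) in self._records.items() if kw in meta.lower())

    def get(self, identifier):
        return self._records[identifier][1]


def federated_query(nodes, keyword):
    """Ask every node the same question through the same interface."""
    return {node.name: node.query(keyword) for node in nodes}


# Toy data, invented for this sketch.
nodes = [
    InMemoryNode("metacat-like", {"m1": ("Biomass plots described in EML", b"1,2,3")}),
    InMemoryNode("digir-like", {
        "d1": ("Species occurrence, Darwin Core", b"a"),
        "d2": ("Biomass survey", b"b"),
    }),
]
hits = federated_query(nodes, "biomass")
```

The client code in `federated_query` never inspects which backend it is talking to, which is the sense in which the interface "hides complexity of underlying systems".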

  4. Grid versus Web Services
     • Grid Services are Web Services
       – Add authentication, lifecycle management, notification, etc.
       – Globus Toolkit 3 (GT3): implements the Open Grid Services Architecture (OGSA)
     • Implications for use
       – Write a normal web service extending the GridService base class
       – When deployed within GT3, you get these extra functions for "free"
       – Supports distributed computation via proxy authentication
     • Problems
       – Complex system to understand
       – GT3 can be difficult to deploy
       – Proposals to incorporate grid services within the Web services community (Web Services Resource Framework [WSRF])
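The "extend a base class and get the extras for free" pattern can be sketched in a few lines. This is a Python analogy only: GT3 is a Java toolkit, and `GridServiceBase`, `SpeciesCountService`, and their methods are hypothetical names invented here, not GT3 classes. Authentication is left out entirely.

```python
import time


class GridServiceBase:
    """Toy stand-in for a grid-service base class (NOT the GT3 API)."""

    def __init__(self, lifetime_seconds):
        self._expires_at = time.monotonic() + lifetime_seconds
        self._subscribers = []

    # Lifecycle management: the service has a bounded, extendable lifetime.
    def is_alive(self):
        return time.monotonic() < self._expires_at

    def extend_lifetime(self, seconds):
        self._expires_at += seconds

    # Notification: interested clients register callbacks.
    def subscribe(self, callback):
        self._subscribers.append(callback)

    def notify(self, message):
        for callback in self._subscribers:
            callback(message)


class SpeciesCountService(GridServiceBase):
    """The 'normal' service: only domain logic is written here."""

    def count(self, observations, species):
        n = sum(1 for s in observations if s == species)
        self.notify(f"counted {n} x {species}")
        return n


service = SpeciesCountService(lifetime_seconds=60)
messages = []
service.subscribe(messages.append)
result = service.count(["oak", "pine", "oak"], "oak")
```

The subclass defines a single domain method, while lifetime and notification behavior come from the base class.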

  5. EcoGrid client interactions
     • Modes of interaction
       – Client-server
       – Fully distributed
       – Peer-to-peer
     • EcoGrid Registry
       – Node discovery
       – Service discovery
     • Aggregation services
       – Centralized access
       – Reliability
       – Data preservation

  6. Building the EcoGrid
     [Map figure. Site labels: LUQ, AND, HBR, VCR, NTL. Legend: Metacat node, SRB node, VegBank node, DiGIR node, Xanthoria node, legacy system.]
     • LTER Network (24)
     • Natural History Collections (>> 100)
     • Organization of Biological Field Stations (180)
     • UC Natural Reserve System (36)
     • Partnership for Interdisciplinary Studies of Coastal Oceans (4)
     • Multi-agency Rocky Intertidal Network (60)

  7. Kepler: Scientific Workflows
     • Query EcoGrid to find data
     • Archive output to EcoGrid
     • EML provides semi-automated data binding
     • Scientific workflows represent knowledge about the process; Kepler captures this knowledge

  8. GARP Invasive Species Model (slide from D. Pennington)
     [Workflow diagram; recoverable labels only]
     • Data items:
       – (a) Species presence & absence points (native range; invasion area), labeled DiGIR
       – (b) Environmental layers (native range; invasion area), labeled SRB
       – (c) Integrated layers (native range; invasion area)
       – (d) Training sample; test sample
       – (e) GARP rule set
       – (f) Native range prediction map; invasion area prediction map
       – (g) Model quality parameter
     • Step labels: EcoGrid Query, Layer Integration, Data Calculation, Map, Validation
     • Other labels: User, Sample, +A1, +A2, +A3
     Scientific workflows represent knowledge about the process; AMS captures this knowledge

  9. Kepler Team, Projects, Sponsors
     • Ilkay Altintas (SDM)
     • Chad Berkley (SEEK)
     • Shawn Bowers (SEEK)
     • Jeffrey Grethe (BIRN)
     • Christopher H. Brooks (Ptolemy II)
     • Zhengang Cheng (SDM)
     • Efrat Jaeger (GEON)
     • Matt Jones (SEEK)
     • Edward A. Lee (Ptolemy II)
     • Kai Lin (GEON)
     • Bertram Ludäscher (BIRN, GEON, SDM, SEEK)
     • Steve Mock (NMI)
     • Steve Neuendorffer (Ptolemy II)
     • Jing Tao (SEEK)
     • Mladen Vouk (SDM)
     • Yang Zhao (Ptolemy II)
     • …

  10. Kepler Understands EML Data (Chad Berkley, SEEK)

  11. Kepler: Ecological Modeling (Chad Berkley, SEEK)

  12. Database Access (Efrat Jaeger, GEON) Note: EML descriptions of relational sources would allow automated data ingestion

  13. Mineral Classification with Kepler … (Efrat Jaeger, GEON)

  14. … inside the Classifier

  15. Standard BrowserUI: Client-Side SVG

  16. SWF Reengineering (Ilkay, SDM; Ashraf, Efrat, Kai, GEON)

  17. DataMapper Sub-Workflow

  18. Result launched via BrowserUI actor (coupling with ESRI’s ArcIMS)

  19. Distributed Workflows in KEPLER
      • Web and Grid Service plug-ins
        – WSDL (now) and Grid services (stay tuned …)
        – ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
        – SSH, SCP, SDSC SRB, OGS?-???… coming
      • WS Harvester
        – Import query-defined WS operations as Kepler actors
      • XSLT and XQuery Data Transformers
        – to link not “designed-to-fit” web services
      • WS-deployment interface (planned)

  20. Web Service Actor (Ilkay Altintas, SDM)
      • Given a WSDL and the name of an operation of a web service, dynamically customizes itself to implement and execute that method.
      [Screenshot callout: Configure - select service operation]
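To make "dynamically customizes itself" concrete, here is a minimal sketch of the configuration step only: given a WSDL document and an operation name, derive the actor's input and output ports from the operation's messages. The class `WebServiceActorSketch`, the helper `part_names`, and the sample WSDL are all assumptions made for illustration; this is not Kepler's actual implementation, it handles only simple WSDL 1.1 `message`/`part` declarations, and it does not invoke the service.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Made-up WSDL 1.1 fragment, for illustration only.
SAMPLE_WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="Demo">
  <message name="LookupRequest">
    <part name="taxon" type="xsd:string"/>
    <part name="region" type="xsd:string"/>
  </message>
  <message name="LookupResponse">
    <part name="records" type="xsd:string"/>
  </message>
  <portType name="DemoPort">
    <operation name="lookup">
      <input message="tns:LookupRequest"/>
      <output message="tns:LookupResponse"/>
    </operation>
  </portType>
</definitions>
"""


def part_names(root, message_ref):
    """Resolve a reference such as 'tns:LookupRequest' to its part names."""
    local = message_ref.split(":")[-1]
    msg = root.find(f"wsdl:message[@name='{local}']", WSDL_NS)
    return [p.get("name") for p in msg.findall("wsdl:part", WSDL_NS)]


class WebServiceActorSketch:
    """Hypothetical actor that derives its ports from one WSDL operation."""

    def __init__(self, wsdl_text, operation):
        root = ET.fromstring(wsdl_text.strip())
        op = root.find(f".//wsdl:operation[@name='{operation}']", WSDL_NS)
        if op is None:
            raise ValueError(f"operation {operation!r} not found")
        self.operation = operation
        # One port per message part: inputs become input ports, outputs output ports.
        self.input_ports = part_names(root, op.find("wsdl:input", WSDL_NS).get("message"))
        self.output_ports = part_names(root, op.find("wsdl:output", WSDL_NS).get("message"))


actor = WebServiceActorSketch(SAMPLE_WSDL, "lookup")
```

After construction, `actor.input_ports` and `actor.output_ports` reflect the selected operation, which is the "specialized" state the next slides show after instantiation.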

  21. Set parameters and commit

  22. Specialized WS Actor (after instantiation)

  23. Web Service Harvester (Ilkay Altintas, SDM)
      • Imports the web services in a repository into the actor library.
      • Has the capability to search for web services based on a keyword.

  24. Kepler: Grid Services Access (Steve Mock, NMI)

  25. An (oversimplified) Model of the Grid
      [Diagram: X → f → Y → g → Z]
      • Hosts: {h1, h2, h3, …}
      • Data@Hosts: d1@{hi}, d2@{hj}, …
      • Functions@Hosts: f1@{hi}, f2@{hj}, …
      • Given: data/workflow:
        – … as a functional plan: […; Y := f(X); Z := g(Y); …]
        – … as a logic plan: […; f(X,Y) ∧ g(Y,Z); …]
      • Find: host assignment di → hi, fj → hj for all di, fj … s.t. […; d3@h3 := f@h2(d1@h1), …] is a valid plan
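The host-assignment problem on this slide can be sketched as a brute-force enumeration. The slide does not define "valid plan", so this sketch assumes the simplest reading: every input datum is assigned to a host that holds a copy, every function to a host that offers it, and each intermediate result is produced at the host where its function runs. The names `host_assignments` and `locate` and the sample availability tables are hypothetical.

```python
from itertools import product


def host_assignments(data_at, funcs_at, plan):
    """Yield every assignment of input data and functions to hosts.

    data_at / funcs_at map a name to the set of hosts where it is available.
    plan is a list of (output, function, input) steps, e.g. ("Y", "f", "X").
    """
    produced = {out for out, _, _ in plan}
    inputs = sorted({inp for _, _, inp in plan} - produced)
    funcs = list(dict.fromkeys(f for _, f, _ in plan))  # keep plan order, no duplicates
    names = inputs + funcs
    choices = [sorted(data_at[d]) for d in inputs] + [sorted(funcs_at[f]) for f in funcs]
    for combo in product(*choices):
        yield dict(zip(names, combo))


def locate(plan, assignment):
    """Rewrite the plan with @host annotations (outputs live where f runs)."""
    location = dict(assignment)
    steps = []
    for out, f, inp in plan:
        host = assignment[f]
        steps.append(f"{out}@{host} := {f}@{host}({inp}@{location[inp]})")
        location[out] = host
    return steps


# Toy availability tables, invented for this sketch.
data_at = {"X": {"h1"}}
funcs_at = {"f": {"h2", "h3"}, "g": {"h3"}}
plan = [("Y", "f", "X"), ("Z", "g", "Y")]

assignments = list(host_assignments(data_at, funcs_at, plan))
located = locate(plan, assignments[0])
```

A real planner would also weigh transfer costs and host load; this sketch only enumerates the candidates that satisfy availability.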

  26. Shipping & Handling Algebra (SHA)
      [Diagram: one logical view and three physical views (1)-(3), each over nodes f@a, x@b, y@c]
      • Logical view: plan Y@C = F@A of X@B =
      • Physical view: SHA plans
        (1) [ X@B to A, Y@A := F@A(X@A), Y@A to C ]
        (2) [ F@A => B, Y@B := F@B(X@B), Y@B to C ]
        (3) [ X@B to C, F@A => C, Y@C := F@C(X@C) ]
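The three physical plans differ only in where the function executes, so they can be generated from one rule: ship whatever is not already at the execution host, run there, then ship the result if needed. The function `sha_plan` below is a hypothetical helper written for this sketch; it reuses the slide's notation (`to` for shipping data, `=>` for shipping the function).

```python
def sha_plan(f_host, x_host, y_host, exec_host):
    """Build one physical plan for Y@y_host = F@f_host of X@x_host."""
    steps = []
    if x_host != exec_host:
        steps.append(f"X@{x_host} to {exec_host}")      # ship the data
    if f_host != exec_host:
        steps.append(f"F@{f_host} => {exec_host}")      # ship the function
    steps.append(f"Y@{exec_host} := F@{exec_host}(X@{exec_host})")
    if exec_host != y_host:
        steps.append(f"Y@{exec_host} to {y_host}")      # ship the result
    return steps


# One plan per candidate execution site, matching (1), (2), (3) on the slide.
plans = {site: sha_plan("A", "B", "C", site) for site in ("A", "B", "C")}
```

Choosing among the plans would need a cost model (data size, function portability, link speed), which the slide does not specify.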

  27. Grid-Enabling PTII: Handles
      [Diagram: Kepler space with actors A and B; Grid space with GA and GB; arrows numbered 1-7]
      (1) A → GA: get_handle
      (2) GA → A: return &X
      (3) A → B: send &X
      (4) B → GB: request &X
      (5) GB → GA: request &X
      (6) GA → GB: send *X
      (7) GB → B: send done(&X)
      • Example:
        – &X = “GA.17”
        – *X = <some_huge_file>
      • Candidate formalisms:
        – GridFTP
        – SSH, SCP
        – SDSC SRB
        – OGS?-??? … WSRF?
      Logical token transfer (3) requires get_handle (1, 2); then exec_handle (4, 5, 6, 7) for completion.
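The handle protocol can be simulated in a few lines to show that only the short handle travels between actors A and B, while the bulk data moves directly between the grid nodes. `GridNode`, `transfer`, and the in-memory `store` are hypothetical stand-ins invented for this sketch; a real deployment would use one of the candidate transports listed on the slide.

```python
class GridNode:
    """Holds bulk data; actors only ever see short handles."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self._next_id = 0

    def get_handle(self, payload):
        """Register a payload and return a handle such as 'GA.1'."""
        self._next_id += 1
        handle = f"{self.name}.{self._next_id}"
        self.store[handle] = payload
        return handle

    def pull(self, handle, source):
        """Copy the payload behind a handle from another grid node."""
        self.store[handle] = source.store[handle]
        return f"done({handle})"


def transfer(ga, gb, payload):
    """Replay the seven messages; return the handle and a message log."""
    log = ["1 A->GA: get_handle"]
    handle = ga.get_handle(payload)
    log.append(f"2 GA->A: return {handle}")
    log.append(f"3 A->B: send {handle}")        # the only actor-to-actor message
    log.append(f"4 B->GB: request {handle}")
    log.append(f"5 GB->GA: request {handle}")
    ack = gb.pull(handle, ga)                   # bulk data moves grid-to-grid
    log.append("6 GA->GB: send *X")
    log.append(f"7 GB->B: send {ack}")
    return handle, log


ga, gb = GridNode("GA"), GridNode("GB")
handle, log = transfer(ga, gb, b"x" * 10)
```

Message 3 carries only the handle string, so its size is independent of the payload.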

  28. Homogeneous Data Integration • Integration of homogeneous or mostly homogeneous data via EML metadata is relatively straightforward

  29. Heterogeneous Data Integration
      • Requires advanced metadata and processing
        – Attributes must be semantically typed
        – Collection protocols must be known
        – Units and measurement scale must be known
        – Measurement relationships must be known, e.g., that ArealDensity = Count/Area
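A small worked example shows why units and measurement relationships matter. Suppose one dataset reports raw counts with plot areas in square meters and another reports densities per hectare; both datasets and the function names below are invented for illustration. Knowing that ArealDensity = Count/Area and that 1 hectare = 10,000 m² is enough to bring them onto a common scale.

```python
M2_PER_HECTARE = 10_000  # 1 ha = 10,000 square meters


def areal_density_per_m2(count, area_m2):
    """ArealDensity = Count / Area, with area in square meters."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return count / area_m2


def per_hectare_to_per_m2(density_per_ha):
    """Convert individuals per hectare to individuals per square meter."""
    return density_per_ha / M2_PER_HECTARE


# Hypothetical records: 50 individuals counted in a 25 m^2 plot,
# and a second source reporting 20,000 individuals per hectare.
from_counts = areal_density_per_m2(50, 25)
from_density = per_hectare_to_per_m2(20_000)
```

Both values come out as individuals per square meter and can be compared directly; without the unit and relationship metadata, the raw numbers 50 and 20,000 would not be comparable.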

  30. Semantic Mediation
      • Label data with semantic types
      • Label inputs and outputs of analytical components with semantic types
      • Use reasoning engines to generate transformation steps
        – Beware analytical constraints
      • Use reasoning engine to discover relevant components
      [Diagram labels: Data, Ontology, Workflow Components]

  31. Ecological ontologies
      • What was measured (e.g., biomass)
      • Type of measurement (e.g., Energy)
      • Context of measurement (e.g., Psychotria limonensis)
      • How it was measured (e.g., dry weight)
      • SEEK intends to enable community-created ecological ontologies using OWL
        – Represents a controlled vocabulary for ecological metadata

  32. Extensions: Semantic Types
      • Take concepts and relationships from an ontology to “semantically type” the data-in/out ports
      • Application: e.g., design support:
        – smart/semi-automatic wiring, generation of “massaging actors”
      [Diagram: actor m1 (normalize) with ports p3 and p4. Takes: Abundance Count Measurements for Life Stages. Returns: Mortality Rate Derived Measurements for Life Stages.]

  33. Semantic Types
      • The semantic type signature
        – Type expressions over the (OWL) ontology
      [Diagram: actor m1 (normalize) with ports p3 and p4]
      SemType m1 ::
        Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty
        -> DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty
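A semantic type signature like the one for m1 can drive wiring checks: an output port may feed an input port if everything the input requires is implied by what the output provides. The sketch below is a strong simplification and not how an OWL reasoner works: each conjunct is treated as an opaque string, and the only ontology fact used, that DerivedObservation is a kind of Observation, is an assumption made for this example. The names `SUBCLASS_OF`, `is_a`, and `can_connect` are hypothetical.

```python
# Toy ontology: concept -> direct superconcept (assumed for illustration).
SUBCLASS_OF = {"DerivedObservation": "Observation"}


def is_a(concept, ancestor):
    """True if concept equals ancestor or reaches it via SUBCLASS_OF."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def can_connect(produced, required):
    """True if every required conjunct is implied by some produced conjunct."""
    return all(any(is_a(p, r) for p in produced) for r in required)


# The two sides of the m1 signature from the slide, as sets of conjuncts.
M1_IN = {
    "Observation",
    "itemMeasured.AbundanceCount",
    "hasContext.appliesTo.LifeStageProperty",
}
M1_OUT = {
    "DerivedObservation",
    "itemMeasured.MortalityRate",
    "hasContext.appliesTo.LifeStageProperty",
}

# A hypothetical downstream port that wants any observation of mortality rate.
downstream_in = {"Observation", "itemMeasured.MortalityRate"}
ok = can_connect(M1_OUT, downstream_in)
loops_back = can_connect(M1_OUT, M1_IN)
```

Here m1's output can feed the downstream port, but it cannot be wired back into m1's own input because no abundance count is produced. A full implementation would delegate these checks to a description-logic reasoner over the OWL ontology.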

  34. Extended Type System (here: OWL Semantic Types)
      SemType m1 ::
        Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty
        -> DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty
      Substructure association: XML raw-data =(X)Query=> object model =link=> OWL ontology

  35. Semantic Types for Scientific Workflows

  36. Deriving Data Transformations from Semantic Service Registration [Bowers-Ludaescher, DILS’04]

  37. Structural and Semantic Mappings [Bowers-Ludaescher, DILS’04]

  38. SEEK Impact
      • Fundamental improvements for researchers
        – Global access to ecologically relevant data
        – Rapidly locate and utilize distributed computation
        – Capture, reproduce, extend analysis process

  39. Acknowledgements
      This material is based upon work supported by the National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.
      PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)
      Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON
