Towards semantic typing support for scientific workflows

Towards Semantic Typing Support for Scientific Workflows

Bertram Ludäscher

Knowledge-Based Information Systems Lab

San Diego Supercomputer Center

University of California San Diego

http://seek.ecoinformatics.org

http://www.geongrid.org


Outline

  • Motivation: Traditional vs Scientific Data Integration

  • Semantic (a.k.a. Model-Based) Mediation

  • Scientific Workflows (a.k.a. Analysis Pipelines)

  • DB Theory Appetizer: Web Service Composition Through Declarative Queries


Information Integration Challenges

Layers: Semantics – Structure – Syntax – System aspects

  • reconciling “S4” heterogeneities (system, syntax, structure, semantics)

  • “gluing” together resources

  • bridging information and knowledge gaps computationally

Information Integration Challenges

  • System aspects: “Grid” Middleware

    • distributed data & computing

    • Web Services, WSDL/SOAP, OGSA, …

    • sources = functions, files, data sets

  • Syntax & Structure: (XML-Based) Data Mediators

    • wrapping, restructuring

    • (XML) queries and views

    • sources = (XML) databases

  • Semantics: Model-Based/Semantic Mediators

    • conceptual models and declarative views

    • Knowledge Representation: ontologies, description logics (RDF(S), OWL, ...)

    • sources = knowledge bases (DB+CMs+ICs)


Information Integration from a DB Perspective

  • Information Integration Problem

    • Given: data sources S1, ..., Sk (DBMS, web sites, ...) and user questions Q1,..., Qn that can be answered using the Si

    • Find: the answers to Q1, ..., Qn

  • The Database Perspective: source = “database”

    • Si has a schema (relational, XML, OO, ...)

    • Si can be queried

    • define virtual (or materialized) integrated/global view G over S1, ..., Sk using database query languages (SQL, XQuery, ...)

    • questions become queries Qi against G(S1, ..., Sk)


Standard (XML-Based) Mediator Architecture

[Figure: a MEDIATOR exports an integrated global (XML) view G, with an integrated view definition G(..) ⟵ S1(..), ..., Sk(..). (1) The USER/Client poses a query Q(G(S1, ..., Sk)); (3) the mediator decomposes it into subqueries Q1, Q2, Q3 against per-source (XML) views, exposed by Wrappers with web services as wrapper APIs over sources S1, S2, ..., Sk; (4) the wrappers return {answers(Q1)}, {answers(Q2)}, {answers(Q3)}; (6) the mediator returns {answers(Q)}.]


Query Planning for Mediators

  • Given:

    • User query Q: answer(…) ← … G …

    • … & { G ← … S … } global-as-view (GAV)

    • … & { S ← … G … } local-as-view (LAV)

    • … & { false ← … S … G … } integrity constraints (ICs)

  • Find:

    • equivalent (or minimally containing, maximally contained) query plan Q’: answer(…) ← … S …

  • Results:

    • A variety of results/algorithms; depending on classes of queries, views, and ICs: P, NP,…, undecidable

    • many variants still open
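The GAV case of this planning problem is the simplest: each global relation is defined as a view over the sources, and rewriting is plain view unfolding. A minimal Python sketch of that step; the relation names and the single-view-per-predicate setup are illustrative assumptions, not from the talk:

```python
# Minimal GAV rewriting: replace each global predicate in a query body
# by the source-level literals that define it (hypothetical example).

# GAV view definitions: global predicate -> list of (source predicate, formal args)
views = {
    "book": [("Src1.books", ["ISBN", "Author", "Title"])],
    "catalog": [("Src2.catalog", ["ISBN", "Author"])],
}

def unfold(query_body):
    """Rewrite a list of global literals into an executable source-level plan."""
    plan = []
    for pred, args in query_body:
        if pred in views:
            for src_pred, src_args in views[pred]:
                # rename the view's formal arguments to the query's variables
                renaming = dict(zip(src_args, args))
                plan.append((src_pred, [renaming.get(a, a) for a in src_args]))
        else:
            plan.append((pred, args))  # already a source literal
    return plan

q = [("book", ["I", "A", "T"]), ("catalog", ["I", "A"])]
plan = unfold(q)
```

LAV rewriting (answering queries using views) is the genuinely hard direction; this unfolding step is why GAV mediators scale so easily but adapt poorly to new sources.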


From Scientific Data Integration to Process & Application Integration (and back…)

  • Data Integration

    • Database mediation + Knowledge-based extension

       ⇒ Query rewriting w/ GAV, LAV, ICs, access patterns

  • “Process/Application” Integration

    • Scientific models (ocean, atmosphere, ecology, …), assimilation models (e.g., real-time data feeds), …

    • Data sets

    • Legacy tools

    • Components = web services

    • Applications = composite components (“workflows”)

    • Need for semantic type extensions


Geologic Map Integration

  • Given:

    • Geologic maps from different state geological surveys (shapefiles w/ different data schemas)

    • Different ontologies:

      • Geologic age ontology

      • Rock type ontologies:

        • Multiple hierarchies (chemical, fabric, texture, genesis) from Geological Survey of Canada (GSC)

        • Single hierarchy from British Geological Survey (BGS)

  • Problem

    • Support uniform queries against the multiple geologic maps using different ontologies

    • Support registration w/ ontology A, querying w/ ontology B


Ontology Mappings: Motivation

  • Establish correspondences between ontologies

     ⇒ Integrate data sets which are registered to different ontologies

     ⇒ Query data sets through different ontologies

[Figure: Data set 1 is registered to Ontology A, Data set 2 to Ontology B; ontology mappings between A and B allow queries posed against either ontology to reach both data sets.]


A Multi-Hierarchical Rock Classification Ontology (GSC)

The GSC ontology comprises multiple hierarchies: Genesis, Fabric, Composition, Texture.


Some enabling operations on “ontology data”

  • Concept expansion: what else to look for when asking for ‘Mafic’ (illustrated on the Composition hierarchy)


Some enabling operations on “ontology data”

  • Generalization: finding data that is “like” X and Y (illustrated on the Composition hierarchy)
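Both operations reduce to traversals of the subclass graph: expansion walks down to all subclasses, generalization walks up to a least common ancestor. A minimal Python sketch over a toy composition hierarchy; the class names are illustrative, not the actual GSC ontology:

```python
# Concept expansion (a concept plus all its subclasses) and generalization
# (least common ancestor) over a toy rock-composition hierarchy.
# Class names are illustrative, not the actual GSC ontology.

subclass_of = {            # child -> parent
    "Basalt": "Mafic",
    "Gabbro": "Mafic",
    "Granite": "Felsic",
    "Mafic": "Igneous",
    "Felsic": "Igneous",
}

def expand(concept):
    """Concept expansion: the concept together with all of its subclasses."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in subclass_of.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result

def ancestors(concept):
    """The chain from a concept up to the root of the hierarchy."""
    chain = [concept]
    while concept in subclass_of:
        concept = subclass_of[concept]
        chain.append(concept)
    return chain

def generalize(x, y):
    """Least common ancestor: the most specific concept covering both x and y."""
    ax = ancestors(x)
    return next(c for c in ancestors(y) if c in ax)
```

So asking for ‘Mafic’ also retrieves data labeled Basalt or Gabbro, and data “like” Basalt and Granite generalizes to Igneous.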


Implementation in OWL: Not only “for the machine” …


Geologic Map Integration

GEON “Metamorphism Equation” (± Energy, ± a few hundred million years):
Geoscientists + Computer Scientists + domain knowledge + knowledge representation (Ontologies!?) → Igneous Geoinformaticists

[Map figure: Nevada]


Geology Workbench: Registering Data to an Ontology – Step 1: Choose Classes

[Screenshot: click on Submission, select a shapefile, enter the data set name, and choose an ontology class.]


Geology Workbench: Data Registration – Step 2: Choose Columns for Selected Classes

[Screenshot: the selected column contains information about geologic age.]


Geology Workbench: Data Registration – Step 3: Resolve Mismatches

[Screenshot: two terms are not matched to any ontology terms; ‘algonkian’ is manually mapped into the ontology.]


Geology Workbench: Ontology-Enabled Map Integrator

[Screenshot: choose the classes of interest, click on the name, and all areas with the age Paleozoic are shown.]


Geology Workbench: Change Ontology

[Screenshot: switch from the Canadian Rock Classification to the British Rock Classification, submit an ontology mapping between the two, and run the query against the new query interface.]


Ontologies and Data Management

[Figure: design artifacts – conceptual models and schemas down to the data – use concepts from an ontology, explicitly or implicitly; schemas together with their ontology links constitute metadata.]

  • How to define and refine an ontology?

  • How to register a dataset to an ontology?


Refining an Ontology – the Logic Way – Enables “Source Contextualization”

Biomedical Informatics Research Network: http://nbirn.net


Connecting Datasets to Ontologies: “Semantic Registration”

Ontology (snippet):

  • DataCollectionEvent ⊑ ∃contains.Measurement
  • Measurement ⊑ ∃measureOf.MeasurableItem ⊓ ∃hasContext.MeasurementContext
  • MeasurementContext ⊑ ∃hasTime.DateTime ⊓ ∃hasLocation.Location
  • MeasurableItem ⊑ ∃hasUnit.Unit ⊓ ∃hasValue.UnitValue
  • SpeciesCount ⊑ MeasurableItem ⊓ ∃hasSpecies.Species ⊓ ∃hasUnit.RatioUnit …
  • SpeciesAbundance ⊑ Measurement ⊓ ∃measureOf.SpeciesCount
  • AbundanceCollectionEvent ⊑ DataCollectionEvent ⊓ ∃contains.SpeciesAbundance
  • Location ⊑ ∃position.Coordinate
  • LTERSite ⊑ Location
  • SBLTERSite ⊑ LTERSite ⊓ ∃position.SBLTERCoordinate
  • {naples, …} ⊑ SBLTERSite

How can we “register” the dataset to concepts in the ontology?

Dataset:

  Date        Site  Transect  SP_Code  Count
  2000-09-08  CARP  1         CRGI     0
  2000-09-08  CARP  4         LOCH     0
  2000-09-08  CARP  7         MUCA     1
  2000-09-22  NAPL  7         LOCH     1
  2000-09-18  NAPL  1         PAPA     5
  2000-09-28  BULL  1         CYOS     57


Purpose of Semantic Registration

Expose “hidden” information:

  • What do attributes represent?

  • What do specific values represent?

  • What conceptual “objects” are in the dataset?

    Capture connections between the dataset and the ontology to:

  • Find existing datasets (or parts of datasets) via ontological concepts (discovery)

  • Enable integration of datasets (mediation)

  • Generate metadata for new data products (in a pipeline)


Semantic Registration Framework

Step 1: Data provider selects relevant ontological concepts (for the dataset)

Step 2: The semantic registration system creates a structural representation based on the chosen concepts (the data provider refines it if needed)

Step 3: The data provider maps the dataset information to the generated structural representation
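The outcome of the three steps can be seen as a mapping from dataset columns (and selected values) to ontology concepts. A minimal Python sketch of such a registration record, using the column names of the example dataset; the path syntax and the helper function are hypothetical, not the system's actual format:

```python
# Semantic registration as a plain data structure: dataset columns mapped
# to the ontology concepts chosen in Step 1. The dotted concept paths and
# the record layout are illustrative, not the system's actual format.

registration = {
    "dataset": "species_abundance",
    "column_map": {
        "Date":    "AbundanceCollectionEvent.hasTime",
        "Site":    "AbundanceCollectionEvent.hasLocation.SBLTERSite",
        "SP_Code": "SpeciesAbundance.measureOf.hasSpecies",
        "Count":   "SpeciesAbundance.hasValue",
    },
    # value-level registration: specific codes mapped to ontology instances
    "value_map": {("Site", "NAPL"): "naples"},
}

def concepts_for(column):
    """Discovery use: which ontology concepts/properties does a column expose?"""
    path = registration["column_map"].get(column, "")
    return path.split(".")
```

Such a record is what makes discovery queries (“find datasets containing SpeciesAbundance measurements at an SBLTERSite”) answerable without inspecting the raw data.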


Step 1: Selecting Relevant Concepts

Concepts from an Ontology

  • DataCollectionEvent

    • AbundanceCollectionEvent

  • Measurement

    • Abundance

      • SpeciesAbundance

  • MeasurementContext

  • Location

    • LTERSite

      • SBLTERSite

        • naples

  • Species

  • MeasurableItem

    • SpeciesCount

Dataset

  Date        Site  Transect  SP_Code  Count
  2000-09-08  CARP  1         CRGI     0
  2000-09-08  CARP  4         LOCH     0
  2000-09-08  CARP  7         MUCA     1
  2000-09-22  NAPL  7         LOCH     1
  2000-09-18  NAPL  1         PAPA     5
  2000-09-28  BULL  1         CYOS     57



Step 2: Generate Object Model

From the concepts selected in Step 1, the system generates an object model:

[Figure: object model – AbundanceCollectionEvent contains SpeciesAbundance, with hasTime DateTime and hasLoc SBLTERSite; SpeciesAbundance is measureOf SpeciesCount, hasValue RatioValue, and hasUnit RatioUnit; SpeciesCount hasSpecies Species.]


Scientific Workflows


Promoter Identification Workflow (PIW)

Source: Matt Coleman (LLNL)


[Workflow screenshot]

Source: NIH BIRN (Jeffrey Grethe, UCSD)


Ecology: GARP Analysis Pipeline for Invasive Species Prediction

[Pipeline figure: species presence & absence points (native range) (a) and environmental layers (native range) (b) are obtained via EcoGrid queries from registered EcoGrid databases; layer integration produces integrated layers (native range) (c); sampling yields training and test samples (d); the GARP rule set (e) is computed from the training sample and validated against the test sample, producing a native range prediction map (f) and a model quality parameter (g); applying the rule set to integrated layers for the invasion area (c), built from environmental layers (b) and species presence & absence points (a) for the invasion area, yields invasion area prediction maps (f); selected prediction maps (h) are archived to the EcoGrid, with metadata generated along the way.]

Source: NSF SEEK (Deana Pennington et al., UNM)


Scientific Workflows: Some Findings

  • More dataflow than (business) workflow

  • Need for “programming extension”

    • Iterations over lists (foreach); filtering; functional composition; generic & higher-order operations (zip, map(f), …)

  • Need for abstraction and nested workflows

  • Need for data transformations

  • Need for rich user interaction & workflow steering:

    • pause / revise / resume

    • select & branch; e.g., web browser capability at specific steps as part of a coordinated SWF

  • Need for high-throughput transfers (“grid-enabling”, “streaming”)

  • Need for persistence of intermediate products

     ⇒ data provenance (“virtual data” concept)


Our Starting Point: Dataflow Process Networks and Ptolemy II

see! read! try!

Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/


Kepler Team, Projects, Sponsors
Ilkay Altintas SDM

Chad Berkley SEEK

Shawn Bowers SEEK

Jeffrey Grethe BIRN

Christopher H. Brooks Ptolemy II

Zhengang Cheng SDM

Efrat Jaeger GEON

Matt Jones SEEK

Edward A. Lee Ptolemy II

Kai Lin GEON

Ashraf Memon GEON

Bertram Ludaescher BIRN, GEON, SDM, SEEK

Steve Mock NMI

Steve Neuendorffer Ptolemy II

Mladen Vouk SDM

Yang Zhao Ptolemy II


Commercial Workflow/Dataflow Systems


SCIRun: Problem Solving Environments for Large-Scale Scientific Computing

  • SCIRun: PSE for interactive construction, debugging, and steering of large-scale scientific computations

  • Component model, based on generalized dataflow programming

Steve Parker (cs.utah.edu)


E-Science and Link-Up Buddies

  • … <UPDATE ME> …

    • Taverna, Scufl, Freefluo, …

    • DiscoveryNet

    • Triana

    • ICENI


Dataflow Process Networks: Putting Computation Models First!

[Figure: two actors connected through typed i/o ports by a FIFO queue.]

  • Synchronous Dataflow Network (SDF)

    • Statically schedulable single-threaded dataflow

      • Can execute multi-threaded, but the firing-sequence is known in advance

    • Maximally well-behaved, but also limited expressiveness

  • Process Network (PN)

    • Multi-threaded dynamically scheduled dataflow

    • More expressive than SDF (dynamic token rate prevents static scheduling)

    • Natural streaming model

  • Other Execution Models (“Domains”)

    • Implemented through different “Directors”

    • advanced push/pull models
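The SDF/PN contrast can be illustrated with Python generators as a toy model (this is not Ptolemy II's implementation). Token rates per firing are fixed below, so the same graph could be fired from a static schedule (SDF); the generator pipeline itself evaluates PN-style, each actor pulling tokens on demand:

```python
# Toy process network: actors as generators connected FIFO-style.
# Evaluation is demand-driven (PN); because every actor has a fixed
# consumption/production rate, a static firing schedule also exists (SDF).

def source(n):
    """Produces the tokens 0 .. n-1."""
    for i in range(n):
        yield i

def scale(stream, factor):
    """Rate 1:1 actor — consumes one token, produces one token per firing."""
    for token in stream:
        yield token * factor

def pairwise_sum(stream):
    """Rate 2:1 actor — consumes two tokens, produces one token per firing."""
    for a in stream:
        b = next(stream)
        yield a + b

pipeline = pairwise_sum(scale(source(6), 10))
result = list(pipeline)
```

A dynamic token rate (e.g. an actor that sometimes emits nothing) would break the static schedule but not the demand-driven execution, which is exactly why PN is more expressive than SDF.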


Promoter Identification Workflow (PIW)

Source: Matt Coleman (LLNL)


Promoter Identification Workflow in Ptolemy II [SSDBM’03]

[Screenshot: PIW in Ptolemy II; the chosen director defines the execution semantics.]


Problems with the first PIW design:

  • actors hand-crafted and “designed to fit” one another

  • hand-crafted Web-service actor

  • hand-crafted control solution; also: forces sequential execution!

  • no data transformations available

  • complex backward control-flow


Simplified Process Network PIW

Back to a purely functional dataflow process network (= a data streaming model!), re-introducing map(f) to Ptolemy II (it was there in PT Classic):

  • no control-flow spaghetti; a forward-only, abstractable sub-workflow piw(GeneId)

  • map(f)-style iterators for data-intensive apps

  • free concurrent execution and free, powerful type checking

  • generic, declarative “programming” constructs and generic data transformation actors

  • automatic support to go from piw(GeneId) to PIW := map(piw) over [GeneId]


Optimization by Declarative Rewriting

  • PIW as a declarative, referentially transparent functional process

  • optimization via functional rewriting possible, e.g. using map(f∘g) instead of map(f) ∘ map(g), and combinations of map and zip

  • Details: technical report & PIW specification in Haskell: http://kbi.sdsc.edu/SciDAC-SDM/scidac-tn-map-constructs.pdf
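The map-fusion rewrite is easy to check concretely. The talk's specification is in Haskell; the following plain Python sketch just illustrates the equivalence:

```python
# map fusion: map(f) o map(g) traverses the stream twice (conceptually),
# while map(f o g) traverses it once — and both are observationally equal.

def compose(f, g):
    """Function composition: (f o g)(x) = f(g(x))."""
    return lambda x: f(g(x))

f = lambda x: x + 1
g = lambda x: 2 * x

xs = [1, 2, 3]
two_passes = list(map(f, map(g, xs)))      # map(f) o map(g)
one_pass = list(map(compose(f, g), xs))    # map(f o g)
```

Referential transparency is what licenses the rewrite: since actors have no hidden state, the optimizer may fuse the two traversals without changing observable behavior.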


Web Services & Scientific Workflows in Kepler

  • Web services = individual components (“actors”)

  • “Minute-Made” Application Integration:

    • Plugging-in and harvesting web service components is easy and fast

  • Rich SWF modeling semantics (“directors” and more):

    • Different and precise dataflow models of computation

    • Clear and composable component interaction semantics

       Web service composition and application integration tool

  • Coming soon:

    • Shrink-wrapped, pre-packaged “Kepler-to-Go” (v0.8)

    • SWFs with structural and semantic data types (better design support)

    • Grid-enabled web services (for big data, big computations,…)

    • Different deployment models (SWF as WS, web site, applet, …)


KEPLER Core Capabilities (1/2)

  • Designing scientific workflows

    • Composition of actors (tasks) to perform a scientific WF

  • Actor prototyping

  • Accessing heterogeneous data

    • Data access wizard to search and retrieve Grid-based resources

    • Relational DB access and query

    • Ability to link to EML data sources


KEPLER Core Capabilities (2/2)

  • Data transformation actors to link heterogeneous data

  • Executing scientific workflows

    • Distributed and/or local computation

    • Various models for computational semantics and scheduling

    • SDF and PN: Most common for scientific workflows

  • External computing environments:

    • C++, Python, C (Perl planned, …)

  • Deploying scientific tasks and workflows as web services themselves (… planned …)


The KEPLER GUI (Vergil)

Drag-and-drop utilities, director and actor libraries.


Running the workflow


Distributed SWFs in KEPLER

  • Web and Grid Service plug-ins

    • WSDL, and whatever comes after GWSDL

    • ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard

  • WS Harvester

  • Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors

  • WS-deployment interface (…ongoing work…)

  • XSLT and XQuery transformers to link non-fitting services together


A Generic Web Service Actor

[Screenshot: configure – select the service operation.]

  • Given a WSDL and the name of an operation of a web service, dynamically customizes itself to implement and execute that method.


Set Parameters and Commit

[Screenshot: set parameters and commit.]


WS Actor after Instantiation


Web Service Harvester

  • Imports the web services in a repository into the actor library.

  • Has the capability to search for web services based on a keyword.


Composing 3rd-Party WSs

[Figure: the output of the previous web service feeds the input of the next web service, with user interaction & transformations in between.]


Classifying with Kepler


Classifying with Kepler


SWF Designed in Kepler


Result launched via the BrowserUI actor


Querying Example


KEPLER and YOU

Kepler …

is a community-based, cross-project, open source collaboration

uses web services as basic building blocks

has a joint CVS repository, mailing lists, web site, …

is gaining momentum thanks to contributors and contributions

BSD-style license allows commercial spin-offs

a pre-packaged, shrink-wrapped version (“Kepler-to-Go”) coming soon to a place near you…



Now back to the “Semantics Stuff”


Semantic Types for Scientific Workflows


From Semantic to Structural Mappings


Structural and Semantic Mappings


Summary I: Putting It All Together for the Science Environment for Ecological Knowledge

  • The EcoGrid provides unified access to distributed data stores, parameter ontologies, & stored analyses, and runtime capabilities via the Execution Environment.

  • The Semantic Mediation System (SMS) and the Analysis & Modeling System (AM, i.e., KEPLER) use WSDL/UDDI to access services within the EcoGrid, enabling analytically driven data discovery and integration.

  • SEEK is the combination of EcoGrid data resources and information services, coupled with advanced semantic and modeling capabilities.

[Architecture figure: an example analytical pipeline AP0 — analysis steps ASx, ASy, ASz, ASr over data bindings TS1, TS2, with semantically annotated parameters — runs in the AM/KEPLER execution environment (SAS, MATLAB, FORTRAN, etc.); the Semantic Mediation Engine applies logic rules (ECO2-CL) and query processing over parameter ontologies (ECO2, TaxOn, ...) and a library of analysis steps, pipelines & results; raw data sets (KNB, SRB, Species, ...) are wrapped for integration w/ EML etc.; all services are accessed via WSDL/UDDI. Example query: invasive species over time.]

  • Large collaborative NSF/ITR project: UNM, UCSB, UCSD, UKansas,..

  • Goals: global access to ecologically relevant data; rapidly locate and utilize distributed computation; (semi-)automate, streamline analysis process –“Knowledge Discovery Workflows”


Outline

  • Motivation: Traditional vs Scientific Data Integration

  • Semantic (a.k.a. Model-Based) Mediation

  • Scientific Workflows (a.k.a. Analysis Pipelines)

  • DB Theory Appetizer: Web Service Composition Through Declarative Queries


Planning with Limited Access Patterns(back to GAV mediation …)

  • User query Q: answer(ISBN, Author, Title) ←

    book(ISBN, Author, Title),

    catalog(ISBN, Author),

    not library(ISBN).

  • Limited (web service) APIs (access patterns):

    • Src1.books: in: ISBN out: Author, Title

    • Src1.books: in: Author out: ISBN, Title

    • Src2.catalog: in: {} out: ISBN, Author

    • Src3.library: in: {} out: ISBN

  • Note: Q is not executable, but feasible (equivalent to executable Q’: catalog ; book ; not library)
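The reordering behind this example can be mechanized: repeatedly pick a literal whose required input variables are already bound by earlier calls. A Python sketch of that greedy “scan, rescan” loop, assuming for brevity a single access pattern per predicate (the slide lists two for Src1.books):

```python
# Greedy ordering of query literals under limited access patterns:
# a literal is callable once all of its 'in' variables are bound.
# Simplification: one access pattern per predicate.

# access patterns: predicate -> (required input variables, output variables)
patterns = {
    "catalog": (set(), {"ISBN", "Author"}),
    "book": ({"ISBN"}, {"Author", "Title"}),
    "library": (set(), {"ISBN"}),
}

def order_literals(literals):
    """Return an executable ordering of the literals, or None if none exists."""
    bound, plan, remaining = set(), [], list(literals)
    while remaining:
        for lit in remaining:
            ins, outs = patterns[lit]
            if ins <= bound:            # all required inputs already bound
                plan.append(lit)
                bound |= outs
                remaining.remove(lit)
                break
        else:
            return None                 # stuck: query not executable as written
    return plan

plan = order_literals(["book", "catalog", "library"])
```

On the slide's query this recovers exactly the executable order catalog; book; not library. (Feasibility is the harder question: whether some *equivalent* query admits such an ordering even when the given one does not.)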


Query Feasibility is as hard as Containment

  • Theorem [EDBT’04]: For UCQ¬ queries Q:

    Q is feasible iff ans(Q) ≡ Q

  • The answerable part ans(Q) can be computed in quadratic time. Idea: scan Q for answerable literals, rescan, repeat until ans(Q) is reached

  • Checking query containment Q1 ⊑ Q2 is hard:

    • Already NP-complete for CQ (conjunctive queries)

    • Undecidable for FO (first-order logic queries)


Conjunctive Query Containment

  • Given: conjunctive queries Q1, Q2 (aka Select-Project-Join queries)

  • Problem: Is answers(D, Q1) ⊆ answers(D, Q2) for all databases D?

  • If yes, we say that “Q1 is contained in Q2”; short: Q1 ⊑ Q2

  • Examples:

    Q1: answer(X)  student(X, cs)

    Q2: answer(X)  student(X,Dept), advisor(X,Y), dept(Y,cs)

    Q3: answer(X)  student(X,Dept)

  • Quiz:

    • Q1 ⊑ Q2 ?

    • No: not every student X necessarily has an advisor Y who is in the cs department!

    • Q1 ⊑ Q3 ?

    • Yes: every cs student is a student in some department

    • (crux of the “proof”: Dept = cs)

    • Homework: What about Q1 ⊑ Q2 if we know that every student must have an advisor from the same department?
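By the classical homomorphism theorem, Q1 ⊑ Q2 holds iff Q2's body maps homomorphically into Q1's body (Q1's variables frozen as constants, the canonical database), with Q2's head landing on Q1's head. A brute-force Python sketch over the quiz queries; the tuple encoding and the capitalized-variable convention are ours, not from the talk:

```python
# Conjunctive query containment via the homomorphism theorem.
# A query is (head-variables, body) with atoms as tuples; by convention,
# capitalized terms are variables and lowercase terms are constants.
from itertools import product

def is_var(t):
    return t[0].isupper()

def contained_in(q1, q2):
    """True iff Q1 is contained in Q2 (exponential search: NP-complete)."""
    head1, body1 = q1
    head2, body2 = q2
    canon = {tuple(atom) for atom in body1}       # Q1's vars frozen as constants
    universe = {t for atom in body1 for t in atom[1:]}
    vars2 = list(dict.fromkeys(
        t for atom in body2 for t in atom[1:] if is_var(t)))
    for image in product(universe, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        subst = lambda t: h.get(t, t)             # constants map to themselves
        body_ok = all(
            (atom[0],) + tuple(subst(t) for t in atom[1:]) in canon
            for atom in body2)
        if body_ok and tuple(subst(t) for t in head2) == tuple(head1):
            return True
    return False

Q1 = (("X",), [("student", "X", "cs")])
Q2 = (("X",), [("student", "X", "D"), ("advisor", "X", "Y"),
               ("dept", "Y", "cs")])
Q3 = (("X",), [("student", "X", "D")])
```

On the quiz queries this confirms both answers: the homomorphism for Q1 ⊑ Q3 maps D to cs (the “crux of the proof”), while no mapping can place Q2's advisor atom into Q1's body.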


The World’s Shortest Conjunctive Query Containment Checker (an NP-complete problem): 7 lines in Prolog …

  • Quiz:

    • Find the bug in the 7 lines of code

    • Fix the bug (hint: add one more line of code)

  • Moral: Short programs can be buggy too


Summary II: Got milk/eggs/meat/wool?Or: “Die eierlegende Wollmilchsau …”

  • Data Integration

    • query rewriting under GAV/LAV

    • w/ binding pattern constraints

    • distributed query processing

  • Semantic Mediation

    • semantic integrity constraints, reasoning w/ plans, automated deduction

    • deductive database/logic programming technology, AI “stuff”...

    • Semantic Web technology

  • Scientific Workflow Management

    • more procedural than database mediation (the scientist is the “query planner”)

    • deployment using web services




Building the EcoGrid

Source: Matthew Jones (UCSB)

[Map figure: EcoGrid node types – Metacat, SRB, VegBank, DiGIR, Xanthoria, and legacy systems – deployed at sites such as LUQ, AND, HBR, VCR, NTL, spanning the LTER Network (24), Natural History Collections (>> 100), Organization of Biological Field Stations (180), UC Natural Reserve System (36), Partnership for Interdisciplinary Studies of Coastal Oceans (4), and the Multi-agency Rocky Intertidal Network (60).]


Heterogeneous Data Integration

  • Requires advanced metadata and processing

    • Attributes must be semantically typed

    • Collection protocols must be known

    • Units and measurement scale must be known

    • Measurement relationships must be known

      • e.g., that ArealDensity = Count / Area
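A relationship like ArealDensity = Count / Area can only be applied safely when the attributes carry semantic types and units. A minimal Python sketch of a semantically typed measurement; the class and concept names are illustrative, not SEEK's actual model:

```python
# Unit-checked derivation of ArealDensity = Count / Area: each measurement
# carries a semantic type and a unit, and the derived measurement is only
# produced when the operand types match. Names are illustrative.

class Measurement:
    def __init__(self, value, sem_type, unit):
        self.value, self.sem_type, self.unit = value, sem_type, unit

def areal_density(count, area):
    """Apply the measurement relationship ArealDensity = Count / Area."""
    if count.sem_type != "Count" or area.sem_type != "Area":
        raise TypeError("expected a Count and an Area measurement")
    return Measurement(count.value / area.value, "ArealDensity",
                       f"{count.unit}/{area.unit}")

d = areal_density(Measurement(57, "Count", "individuals"),
                  Measurement(3.0, "Area", "m^2"))
```

With only syntactic types (two numeric columns), nothing would stop the division from being applied to the wrong attributes; the semantic type is what makes the check possible.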


Ecological Ontologies

  • What was measured (e.g., biomass)

  • Type of measurement (e.g., Energy)

  • Context of measurement (e.g., Psychotria limonensis)

  • How it was measured (e.g., dry weight)

  • SEEK intends to enable community-created ecological ontologies using OWL

    • Represents a controlled vocabulary for ecological metadata

  • More about this in Bertram’s talk


Semantic Mediation

  • Label data with semantic types (e.g. concept expressions in OWL)

  • Label inputs and outputs of analytical components with semantic types

  • Use reasoning engines to generate transformation steps

    • Observe analytical constraints

  • Use reasoning engine to discover relevant components

[Figure: data, ontology, and workflow components, linked via semantic types.]

