geon it advances data integration geon workbench scientific workflows n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
GEON IT Advances: ⁃ Data Integration ⁃ GEON Workbench ⁃ Scientific Workflows PowerPoint Presentation
Download Presentation
GEON IT Advances: ⁃ Data Integration ⁃ GEON Workbench ⁃ Scientific Workflows

Loading in 2 Seconds...

play fullscreen
1 / 45

GEON IT Advances: ⁃ Data Integration ⁃ GEON Workbench ⁃ Scientific Workflows - PowerPoint PPT Presentation


  • 105 Views
  • Uploaded on

GEON IT Advances: ⁃ Data Integration ⁃ GEON Workbench ⁃ Scientific Workflows. Bertram Lud ä scher Kai Lin Ilkay Altintas Efrat Jaeger. San Diego Supercomputer Center University of California, San Diego. The Problem: Scientific Data Integration or: … from Questions to Queries ….

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'GEON IT Advances: ⁃ Data Integration ⁃ GEON Workbench ⁃ Scientific Workflows' - jered


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
geon it advances data integration geon workbench scientific workflows

GEON IT Advances:⁃ Data Integration⁃ GEON Workbench⁃ Scientific Workflows

Bertram Ludäscher

Kai Lin

Ilkay Altintas

Efrat Jaeger

San Diego Supercomputer Center

University of California, San Diego

www.geongrid.org

the problem scientific data integration or from questions to queries
The Problem: Scientific Data Integrationor: … from Questions to Queries …

www.geongrid.org

information integration challenges s 4 heterogeneities
Information Integration Challenges: S4 Heterogeneities
  • Systems Integration
    • platforms, devices, data & service distribution, APIs, protocols, …

 Grid middleware technologies

+ e.g. single sign-on, platform independence, transparent use of remote resources, …

  • Syntax & Structure
    • heterogeneous data formats (one for each tool ...)
    • heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files, …)
    • heterogeneous schemas(one for each DB ...)

 Database mediation technologies

+ XML-based data exchange, integrated views, transparent query rewriting, …

  • Semantics
    • fuzzy metadata, terminology, “hidden” semantics, implicit assumptions, …

 Knowledge representation & semantic mediation technologies

+ “smart” data discovery & integration

+ e.g. ask about X (‘mafic’); find data about Y (‘diorite’); be happy anyways!

www.geongrid.org

information integration challenges s 5 heterogeneities
Information Integration Challenges: S5 Heterogeneities
  • Synthesis of analysis pipelines, integrated apps & data products, …
    • How to make use of these wonderful things & put them together to solve a scientist’s problem?
  • Scientific Problem Solving Environments
    • GEON Portal and Workbench (“scientist’s view”)

+ ontology-enhanced data registration, discovery, manipulation

+ creation and registration of new data products from existing ones, …

    • GEON Scientific Workflow System (“engineer’s view”)

+ for designing, re-engineering, deploying analysis pipelines and scientific workflows; a tool to make new tools …

+ e.g., creation of new datasets from existing ones, dataset registration,…

www.geongrid.org

ontology enabled application example geologic map integration

domain

knowledge

Knowledge representation

AGE ONTOLOGY

Show formations where AGE = ‘Paleozic’

(without age ontology)

Show formations where AGE = ‘Paleozic’

(with age ontology)

Nevada

+/- a few hundred million years

Ontology-Enabled Application Example:Geologic Map Integration

www.geongrid.org

querying by chemical composition results
Querying by Chemical Composition: Results

Note the fine differences in shades of gray:

DO know:

It’s NOT there!

DON’T know!

(not registered)

OK – we got to work on the color coding ;-)

www.geongrid.org

querying w british rock classification brc
Querying w/ British Rock Classification (BRC)

Uses a GSC  BRC inter-ontology articulation mapping

www.geongrid.org

british rock classification query results
British Rock Classification Query: Results

Uses a GSC  BRC inter-ontology articulation mapping

www.geongrid.org

need for knowledge enabled integration

DataTables

GeoChemDB

GeolAgeDB

Need for Knowledge-enabled Integration
  • A geologist analyzing chemical data from a pluton finds no recognizable correlation between variables.
    • What possible scenarios can he examine to understand this heterogeneity?
  • Measured ages also show a scatter
    • What is the significance of the observed spread in measure time?
  • Knowledge Representation

Research:

  • concept maps & ontologies
  • process maps & ontologies
  • semantic types
  • … to facilitate (even) “smarter” tools

www.geongrid.org

a prerequisite resource registration
A Prerequisite: Resource Registration

(1a) Register ontologies

  • geologic age; rock classifications (GSC, BGS), seismology; …

(1b) optionally: register inter-ontology articulations

  • e.g. GSC ontology  BGS ontology

(2a) Item-level dataset registration

  • ADN metadata; other controlled vocabularies & ontologies (e.g. geologic age timescale (USGS), SWEET (NASA), …)

(2b) Item-detail registration

  • e.g. associate values in a column with a concept

(3) Use ontology-based query UI / application

  • e.g. query by geologic age and chemical composition

www.geongrid.org

demonstration preview
Demonstration Preview

NOTE: A technology demonstration, not a content

demonstration (vocabulary, ontology, maps, …)

  • Ontology Registration (geologicAge.owl)
  • Dataset Registration (myShapeFiles.zip)
  • Item-Level Association (12)
  • GEONsearch
    • metadata, spatial, temporal, concept-based
  • GEONworkbench
    • use of workspace e.g. composing new maps from existing ones

… resume with GEON workflow overview

www.geongrid.org

demonstration preview1

Other distributed apps

Kepler, DLESE, …

Client Access (via web services)

myOntology.owl

myDataset.foo

Search condition(s)

spatialtemporalconcept

User actions

add delete manipulate

metadata

metadata

ResourceRegistration

GEONsearch

GEONworkbench

GEON

Workspace

(user)

GEON

Catalog

SRB

Log

GEONmiddleware

external services

Gazetteer,

DLESE, …

Geologic Age,

Chronos, …

Demonstration Preview

User Access (via Portal)

www.geongrid.org

dataset to ontology registration item level

Domain Knowledge Ontologies

Arizona

www.geongrid.org

CYBERINFRASTRUCTURE FOR THE GEOSCIENCES

19

Dataset to Ontology Registration (Item-level)

www.geongrid.org

scientific problem solving environments
Scientific Problem Solving Environments
  • GEON Portal and Workbench (“scientist’s view”)

 previousdemonstration

    • a workbench for using existing/integrated tools
  • Kepler Workflow System (“engineer’s view”)
    • for (semi-)automating “scientific workflows” and “analysis pipelines”
    • a tool for making and deploying new tools
    • some features:
      • … low-level plumbing to high-level conceptual flows …
      • connect reusable components (“actors”, “boxes”) to form apps
      • abstraction via nesting of subworkflows into composite actors
      • deploy automated workflows on the Grid and/or with custom Uis
    • demonstrations available (“Kepler2Go-1.” CD for Summer Institute)

www.geongrid.org

a kepler scientific workflow

www.geongrid.org

CYBERINFRASTRUCTURE FOR THE GEOSCIENCES

22

A Kepler Scientific Workflow

inline documentation

canvas for design and

execution monitoring

component (actor) libraries

www.geongrid.org

geon dataset extraction processing

Translating query xml response to web service xml input format.

worldImage

XML SOAP response

Look Inside

Sample

GEON Dataset Extraction & Processing

www.geongrid.org

geon dataset registration

Annotation form

www.geongrid.org

CYBERINFRASTRUCTURE FOR THE GEOSCIENCES

24

GEON Dataset Registration

www.geongrid.org

geon dataset registration1

ADN metadata

Metadata display

validation

www.geongrid.org

CYBERINFRASTRUCTURE FOR THE GEOSCIENCES

25

GEON Dataset Registration

Registering

www.geongrid.org

putting it all together

www.geongrid.org

CYBERINFRASTRUCTURE FOR THE GEOSCIENCES

26

Putting it all together …

www.geongrid.org

geon workflows kepler

HPC workflow

www.geongrid.org

CYBERINFRASTRUCTURE FOR THE GEOSCIENCES

27

GEON Workflows & KEPLER

http://kepler-project.org

www.geongrid.org

using kepler for geological data integration workflows

Using Kepler for Geological Data Integration Workflows

Ilkay Altintas

presenting joint GEON work of:

Efrat Jaeger Bertram Ludäscher

Kai Lin Ashraf Memon

San Diego Supercomputer Center

University of California, San Diego

www.geongrid.org

some requirements for a scientific workflow system 1 2
Some Requirements for a Scientific Workflow System (1/2)
  • …it should work… (No kidding!)

USER REQUIREMENTS:

  • Design tools-- especially for non-expert users
  • Ease of use-- fairly simple user interface having more complex features hidden in the background
  • Reusable generic features
    • Generic enough to serve to different communities but specific enough to serve one domain (e.g. geosciences)
  • Extensibility for the expert user-- almost a visual programming interface
  • Registration and publication of data products and “process products” (=workflows); provenance

www.geongrid.org

some requirements for a scientific workflow system 2 2
Some Requirements for a Scientific Workflow System (2/2)

TECHNICAL REQUIREMENTS:

  • Error detection and recovery from failure
    • Logging information for each workflow
  • Allow data-intensive and compute-intensive tasks

(Maybe at the same time)

    • HPC+X (From Dr. Berman’s last GSM talk)
  • Allow status checks and on the fly updates
  • Visualization…
  • Semantics and metadata…
  • Certification, trust, security…

Ask the experts in this room 

www.geongrid.org

kepler is
Kepler is…
  • … a scientific workflow system
  • … a cross-project collaboration

New contributing partners:

      • Cheminformatics: Resurgence (Kim Baldridge et al.)
      • Life Sciences: EOL (Mark Miller et al.)
      • Data Mining: SKIDL (Tony Fountain et al.)
      • Neuroinformatics: BIRN (coming…)
  • … an emerging open source tool for “scientific discovery workflows”

Kepler 1.0 alpha release

 Summer Institute

www.geongrid.org

www.geongrid.org

CYBERINFRASTRUCTURE FOR THE GEOSCIENCES

31

some recent actor additions
Some Recent Actor Additions

Browser-based user interface

Queries & Transformations

Generic WS Invocation

SQL Queries

File Transfer

SMTP-based messaging

SRB Access

CommandLine Execution

Globus Job Execution

Real-time data streaming

www.geongrid.org

web services actors ws harvester
Web Services  Actors (WS Harvester)

1

2

4

3

”Minute-made” (MM) WS-based application integration

  • Similarly: MM workflow design & sharing w/o implemented components

www.geongrid.org

geon contributions to kepler
GEON Contributions to Kepler
  • System demonstration
  • Using Kepler Features
  • GEON workflows in detail
  • Dataset Registration Model
  • Processing Datasets on the Fly and Registering with the GEONworkbench

www.geongrid.org

conclusions
Conclusions
  • Evolving system – GEON is a significant contributor
    • Plans for new generic and project-specific extensions
  • Second alpha release available as CD
    • Installers for Windows, Linux, MacOSX
    • Daily version tests and JWS installer generation
  • User manuals and developer documentation is coming soon!
  • More: next week during the Summer Institute …

Kepler project website: http://kepler-project.org

Thanks!

www.geongrid.org

geon it advances data integration geon workbench scientific workflows1

E N D

GEON IT Advances:⁃ Data Integration⁃ GEON Workbench⁃ Scientific Workflows

Bertram Ludäscher

Kai Lin

Ilkay Altintas

Efrat Jaeger

San Diego Supercomputer Center

UC San Diego

www.geongrid.org

related publications
Related Publications
  • Semantic Data Registration and Integration
  • On Integrating Scientific Resources through Semantic Registration, S. Bowers, K. Lin, and B. Ludäscher, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
  • A System for Semantic Integration of Geologic Maps via Ontologies, K. Lin and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003.
  • Towards a Generic Framework for Semantic Registration of Scientific Data, S. Bowers and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003.
  • The Role of XML in Mediated Data Integration Systems with Examples from Geological (Map) Data Interoperability, B. Brodaric, B. Ludäscher, and K. Lin. In Geological Society of America (GSA) Annual Meeting, volume 35(6), November 2003.
  • Semantic Mediation Services in Geologic Data Integration: A Case Study from the GEON Grid, K. Lin, B. Ludäscher, B. Brodaric, D. Seber, C. Baru, and K. A. Sinha. In Geological Society of America (GSA) Annual Meeting, volume 35(6), November 2003.
  • Query Planning and Rewriting
  • Processing First-Order Queries under Limited Access Patterns, Alan Nash and B. Ludäscher, Proc. 23rd ACM Symposium on Principles of Database Systems (PODS'04) Paris, France, June 2004.
  • Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns, Alan Nash and B. Ludäscher., 9th Intl. Conference on Extending Database Technology (EDBT'04) Heraklion, Crete, Greece, March 2004, LNCS 2992.
  • Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries with Union and Negation, B. Ludäscher and Alan Nash. Research abstract (poster), 20th Intl. Conference on Data Engineering (ICDE'04) Boston, IEEE Computer Society, April 2004.

www.geongrid.org

related publications1
Related Publications
  • Scientific Workflows
  • Kepler: An Extensible System for Design and Execution of Scientific Workflows, I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
  • Kepler: Towards a Grid-Enabled System for Scientific Workflows, Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher, Steve Mock, Workflow in Grid Systems (GGF10), Berlin, March 9th, 2004.
  • An Ontology-Driven Framework for Data Transformation in Scientific Workflows, S. Bowers and B. Ludäscher, Intl. Workshop on Data Integration in the Life Sciences (DILS'04), March 25-26, 2004 Leipzig, Germany, LNCS 2994.
  • A Web Service Composition and Deployment Framework for Scientific Workflows, I. Altintas, E. Jaeger, K. Lin, B. Ludaescher, A. Memon, In the 2nd Intl. Conference on Web Services (ICWS), San Diego, California, July 2004.

www.geongrid.org

slide40
Multi-Hierarchical Rock Classification System (GSC)… a target ontology (after conversion to OWL) for geologic map registration …

Genesis

Fabric

Composition

Texture

www.geongrid.org

inside ontology enabled map integration
Inside Ontology-Enabled Map Integration

User: “Show formations from Cenozoic!”

Age Ontology

Cenozoic

Query Rewriting

Quaternary

Tertiary

select FORMATION

where AGE=“Tertiary”

or AGE=“Quaternary”

PERIOD

FORMATION

PERIOD

LITHOLOGY

ABBREV

Arizona

Montana West

Map Rendering

Color Definition

www.geongrid.org

data source wrapping and integration
Data Source Wrapping and Integration

ABBREV

Arizona

PERIOD

FORMATION

AGE

Idaho

NAME

Colorado

PERIOD

LITHOLOGY

Utah

TYPE

PERIOD

Nevada

FMATN

TIME_UNIT

Wyoming

NAME

Livingston formation

FORMATION

PERIOD

Tertiary-Cretaceous

Montana West

AGE

New Mexico

NAME

PERIOD

LITHOLOGY

andesitic sandstone

Montana East

FORMATION

PERIOD

www.geongrid.org

gravity modeling design workflow
Gravity Modeling Design Workflow
  • Idea: Comparing observed & synthetic gravity models
  • Steps:
    • Extracting and merging gravity depths from heterogeneous data sources for a Lat/Lon bounding box (databases, web services).
    • Projecting and interpolating data sources into the same coordinate systems.
    • Differencing observed and synthetic models.
    • Displaying Differential raster image.

www.geongrid.org

grid interpolation
Grid Interpolation
  • Interpolating queried gravity data on the grid and displaying it using a color schema.
  • Currently IDW interpolation algorithm supported. Future plans: Minimum Curvature, TIN, Kriging and Spline.
  • Output: either ascii x,y,z,p or ESRI ascii grid format.
  • Display: using global mapper service.

www.geongrid.org