1. Ontologies in Data and Application Integration – an Update
2. 2 Outline Motivation
Ontology Cheat Sheet
Ontology-enabled Prototypes and Tools
Data & Service Registration (Structural + Semantic)
Scientific Workflows
3. 3
4. 4 Ontology Cheat Sheet (1/2) What is an ontology? An ontology usually …
specifies a theory (a set of models) by …
defining and relating …
concepts representing features of a domain of interest
Also an overloaded (sometimes sloppy) term for:
Controlled vocabularies
Database schema (relational, XML, …)
Conceptual schema (ER, UML, … )
Thesauri (synonyms, broader term/narrower term)
Taxonomies
Informal/semi-formal representations
“Concept spaces”, “concept maps”
Labeled graphs / semantic networks (RDF)
Formal ontologies, e.g., in [Description] Logic (OWL)
“formalization of a specification”
→ constrains the possible interpretations of terms
5. 5 A Multi-Hierarchical Rock Classification “Ontology” (GSC)
6. 6 Ontology Cheat Sheet (2/2) What are ontologies used for?
Conceptual models of a domain or application, (communication means, system design, …)
Classification of …
concepts (taxonomy) and
data/object instances through classes
Analysis of ontologies e.g.
Graph queries (reachability, path queries, …)
Reasoning (concept subsumption, consistency checking, …)
Targets for semantic data registration
Conceptual indexes and views for
searching,
browsing,
querying, and
integration of registered data
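The graph-query and subsumption uses listed above can be illustrated with a minimal Python sketch. The taxonomy, concept names, and sample identifiers below are hypothetical and are not taken from the GSC classification; real ontology reasoners handle far richer constructs than plain parent links.

```python
# Hypothetical toy taxonomy: concept -> set of direct parent concepts.
# A concept may have several parents (multi-hierarchical classification).
PARENTS = {
    "Gabbro": {"PlutonicRock", "MaficRock"},
    "Diorite": {"PlutonicRock"},
    "PlutonicRock": {"IgneousRock"},
    "MaficRock": {"IgneousRock"},
    "IgneousRock": {"Rock"},
    "Rock": set(),
}


def ancestors(concept):
    """Reachability query: every concept reachable via parent links."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in PARENTS.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subsumed_by(sub, sup):
    """Subsumption check: `sub` equals `sup` or is a descendant of it."""
    return sub == sup or sup in ancestors(sub)


def instances_of(concept, data):
    """Classify data instances through classes: keep those under `concept`."""
    return sorted(name for name, cls in data.items()
                  if is_subsumed_by(cls, concept))


# Hypothetical registered data items, each tagged with one class.
SAMPLES = {"sample-1": "Gabbro", "sample-2": "Diorite"}
```

With this encoding, a concept-based search for `"MaficRock"` returns only the gabbro sample, while a search for `"PlutonicRock"` returns both samples.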
7. Application Example: Geologic Map Integration
8. 8 Geologic Map Integration in the Portal After registering datasets, ontologies (here: “classes”), and an application (“OMI”), the datasets can be searched and displayed in an integrated way.
9. 9 Concept-Based Queries and Analysis After registering a source with one or more ontologies, concept-based queries and analysis can be launched
Here: light-weight client-side processing (SVG)
10. 10 Ontologies and Data Management Where do ontologies fit within data management architectures?
Several answers, specifically:
An ontology is similar to a schema or conceptual model if one exists, but is
Developed independently of a particular application
Probably given in a different language
Inherently more general
Usually not a very good schema (weak structure)
11. 11 Ontologies and Data Management (→ watch out for Semantic Data Registration later)
12. 12 Creating and Sharing Concept Maps (here: Seismology concept map & Cmap tool) Lock up scientists for 2+ days
Add CS/KRDB types
Create concept maps
Refine
Iterate
→ from napkin drawings, to concept maps, to ontologies
13. 13
14. 14
15. 15
16. 16 Graph (RDF) Queries on Ontologies
17. 17 Community-Based Ontology Development Draft of a geochemistry ontology developed by scientists
18. 18 Protégé (… not so ezOWL yet…)
19. 19 Sparrow (a poor man’s OWL tool …) Simple ASCII-based RDF and OWL entry and manipulation
20. Semantic Data Registration (joint work w/ Shawn Bowers)
21. 21 What is Data/Ontology/… Registration? A mechanism by which data sources, ontologies, services, …
… are published in a repository/registry
for the purpose of “smart” discovery, querying, integration
22. 22 Things to Register Data files (individual files)
Shapefile as a blob (+ file type)
Collections (of files; nested; e.g., satellite data)
Databases (has schema and can be queried)
Shapefile with schema registered
Ontologies
Services (web + grid services)
Other/external applications
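As a rough illustration of the registration idea, the following Python sketch models a registry in which items are published with a kind, optional ontology concepts, and an optional schema. The class names, field names, and example entries are assumptions made for this sketch; they do not describe the actual registry implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One registered item; all field names here are illustrative."""
    name: str
    kind: str                          # "file", "collection", "database", ...
    concepts: frozenset = frozenset()  # ontology concepts annotating the item
    schema: tuple = ()                 # attribute names, if a schema is registered


class Registry:
    """In-memory stand-in for a repository/registry."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def discover(self, concept=None, kind=None):
        """Names of entries matching the given concept and/or kind."""
        return sorted(
            e.name for e in self._entries.values()
            if (concept is None or concept in e.concepts)
            and (kind is None or e.kind == kind)
        )

    def queryable(self):
        """Entries with a registered schema can be queried, not only fetched."""
        return sorted(e.name for e in self._entries.values() if e.schema)


registry = Registry()
# A shapefile registered as a blob: only the file type is known.
registry.register(Entry("faults.shp", "file"))
# A shapefile registered with its schema and a semantic annotation.
registry.register(Entry("geology.shp", "database",
                        frozenset({"RockType"}), ("unit_id", "lithology")))
registry.register(Entry("rock-classification", "ontology"))
```

The contrast between the two shapefile entries mirrors the list above: the blob can only be found by name or kind, whereas the entry with a schema and concept annotation also supports concept-based discovery and querying.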
23. 23 Connecting Datasets to Ontologies
24. 24 Step1: Selecting Relevant Concepts
25. 25 Step1: Selecting Relevant Concepts
26. 26 Step2: Generate Object Model
27. 27
28. 28
29. 29 Applications of Semantic Registration Mentioned before:
Smart data discovery, integration etc.
New application:
Generating data transformation semi-automatically for chaining together computational services
30. 30 Problem: Service Reusability Unless “designed to fit,” independent services are structurally incompatible
Generally, the source output type will not be a subtype of the target input type
31. 31 Service Reusability A data transformation mapping is required to connect the services … artificially creating subtype compatibility
If such a mapping exists, the services are “structurally feasible”
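The role of the transformation mapping can be sketched in Python as follows. The record layouts, field names, and the hand-written `mapping` function are hypothetical; the sketch only shows how a mapping can make a source output structurally acceptable to a target input, and it does not reproduce the framework's actual type system.

```python
def type_of(value):
    """Derive a simple structural type: nested dicts of Python type names."""
    if isinstance(value, dict):
        return {key: type_of(item) for key, item in value.items()}
    return type(value).__name__


def is_structural_subtype(source, target):
    """True if `source` offers every field that `target` requires."""
    if isinstance(target, dict):
        return isinstance(source, dict) and all(
            key in source and is_structural_subtype(source[key], field_type)
            for key, field_type in target.items()
        )
    return source == target


# Hypothetical input type expected by the target service.
TARGET_IN = {"location": "str", "amount": "float"}

# Hypothetical output record produced by the source service.
source_record = {"site": "A1", "measurement": {"value": 3.5, "unit": "ppm"}}


def mapping(record):
    """Hand-written transformation from source output to target input."""
    return {"location": record["site"],
            "amount": record["measurement"]["value"]}


compatible_before = is_structural_subtype(type_of(source_record), TARGET_IN)
compatible_after = is_structural_subtype(type_of(mapping(source_record)), TARGET_IN)
```

Here `compatible_before` is false because the field names and nesting differ, while `compatible_after` is true: the two services carry the same information but only fit together once the mapping is applied.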
32. 32 Service Reusability Idea:
annotate services with semantic types (concept expressions) primarily for discovery of services
33. 33 Service Reusability Services can be semantically compatible, but structurally incompatible
34. 34 The Ontology-Driven Framework (work w/ Shawn Bowers, SEEK)
35. 35 Example Generated Data Transformation (in XQuery) Based on the structural correspondences and certain assumptions, we derive the transformation query:
36. Scientific Workflows (Efrat Jaeger et al.)
37. 37 Reverse Engineering a Scientific Workflow using the KEPLER Tool (Efrat Jaeger) Classification of igneous rocks is done in several steps:
Given sample point data in which the mafic mineral content is less than 90%, the QAPF diagram is used to classify coarse-grained crystalline rocks according to their modal mineral contents. Since the point falls in the diorite/gabbro/anorthosite field, one of the two triangular diagrams is used, depending on whether the modaldata DB has a value for the point for olivine or for hornblende.
Since these steps are computationally similar, we’ll concentrate on classifying a point using the triangular diagram.
38. 38 A Scientific Workflow in Kepler This triangular diagram for classification and nomenclature of gabbroic rock is chosen for a specific point according to the values it has in the ModalData DB. It is used when the point has values for Plagioclase, Pyroxene and Olivine.
The ModalData provides the mineral info.
39. 39 A Scientific Workflow in Kepler
40. 40 A Scientific Workflow in Kepler
41. 41
42. 42 Reverse-Engineered the Geological Map Integration in Kepler
43. 43 DataMapper Sub-Workflow
44. 44 Result launched via the BrowserUI actor
45. 45 KEPLER and YOU Kepler …
is a community-based, cross-project, open source collaboration
for “minute made” application integration
using web (grid) services as basic building blocks
has a joint CVS repository, mailing lists, web site, …
is gaining momentum thanks to contributors and contributions
BSD-style license allows commercial spin-offs
a pre-packaged, shrink-wrapped version (“Kepler-to-GO”) coming soon to a place near you…
46. F I N – Questions?
47. Additional Material
48. 48 The KEPLER GUI (Vergil from Ptolemy II)
49. 49
50. 50 Distributed Workflows in KEPLER Web and Grid Service plug-ins
WSDL
ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
SRB
SSH, SCP
Web Service Harvester
Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors
XSLT and XQuery transformers to link non-fitting services together
Web Service Deployment (…ongoing work…)
51. 51 A Generic Web Service Actor
52. 52 Set Parameters and Commit
53. 53 WS Actor after Instantiation
54. 54 Web Service Harvester
55. 55 Composing 3rd-Party WSs
56. Providing DB Access through Kepler Database connection actor:
Opening a database connection and passing it to all actors accessing this database.
Database query actor:
A generic actor that queries a database and provides its result.
DBConnection type and DBConnectionToken:
A new IOPort type and a token to distinguish a database connection from any general type.
57. Database Connection Actor OpenDBConnection actor:
Input: database connection information.
Output: A DBConnectionToken, a reference to a database connection instance, through a DBConnection output port.
58. Database Query Actor Database Query actor:
Input: A query string (SQL) and a database connection reference.
Parameters: output type (XML, Record, or String); whether to output each row separately or all at once.
Process: Execute the query and produce results according to the parameters.
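The division of labor between the two database actors can be approximated in plain Python with the standard `sqlite3` module. This is only an analogy under stated assumptions: the function names, the `modaldata` table, and its columns are hypothetical, the XML output type is omitted, and the real actors are Kepler/Ptolemy II components rather than Python functions.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DBConnectionToken:
    """Wraps a live connection so it is distinguishable from ordinary data."""
    connection: sqlite3.Connection


def open_db_connection(database):
    """Stand-in for OpenDBConnection: connection info in, token out."""
    return DBConnectionToken(sqlite3.connect(database))


def database_query(token, sql, output_type="record", each_row=False):
    """Stand-in for the Database Query actor.

    Returns a list of output tokens: one per row if `each_row` is true,
    otherwise a single token holding all rows.
    """
    cursor = token.connection.execute(sql)
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    if output_type == "string":
        rows = [", ".join(f"{k}={v}" for k, v in row.items()) for row in rows]
    elif output_type != "record":
        raise ValueError(f"unsupported output type: {output_type}")
    return rows if each_row else [rows]


# One connection is opened once and shared by every querying step.
token = open_db_connection(":memory:")
token.connection.execute("CREATE TABLE modaldata (sample TEXT, olivine REAL)")
token.connection.execute("INSERT INTO modaldata VALUES ('p1', 12.5)")

all_at_once = database_query(token, "SELECT sample, olivine FROM modaldata")
row_by_row = database_query(token, "SELECT sample, olivine FROM modaldata",
                            output_type="string", each_row=True)
```

Passing the token rather than the connection details means that downstream steps never reopen the database, which is the point of giving the connection its own port type.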
59. Querying Example
60. 60 Resource Description Framework (RDF) Simple data model that consists of
Resources (uniquely identified via URIs)
Properties
Values (resources or character strings)
Data organized into triples (subject, property, value)
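The triple model can be sketched in a few lines of Python. The `http://example.org/geo#` namespace and the sample resources are hypothetical; the `rdf:type` and `rdfs:label` URIs are the standard W3C ones. A real application would more likely use an RDF library than raw tuples.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/geo#"  # hypothetical namespace for this sketch

# Each triple is (subject, property, value); values are URIs or strings.
TRIPLES = [
    (EX + "sample1", RDF_TYPE, EX + "Gabbro"),
    (EX + "sample1", RDFS_LABEL, "Sample 1"),
    (EX + "sample2", RDF_TYPE, EX + "Diorite"),
]


def match(triples, subject=None, prop=None, value=None):
    """Return the triples that agree with every component that is not None."""
    return [
        (s, p, v) for (s, p, v) in triples
        if (subject is None or s == subject)
        and (prop is None or p == prop)
        and (value is None or v == value)
    ]


gabbros = match(TRIPLES, prop=RDF_TYPE, value=EX + "Gabbro")
about_sample1 = match(TRIPLES, subject=EX + "sample1")
```

Leaving a component as `None` acts as a wildcard, so `gabbros` selects every resource typed as gabbro and `about_sample1` collects everything stated about one resource.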
61. 61 RDF Schema Adds a set of pre-defined properties to define classes and properties
Allows instances to be connected to classes
Sub-class and sub-property (is-a) relationships
62. 62 OWL Adds additional pre-defined properties to further constrain an ontology
(See http://www.w3.org/TR/owl-guide/)
Note: RDF(S) and OWL are commonly serialized in XML (RDF/XML)
Some graphic tools exist (e.g., Protégé)