Ontologies in Data and Application Integration – an Update

yates

Presentation Transcript


    1. Ontologies in Data and Application Integration – an Update

    2. 2 Outline: Motivation; Ontology Cheat Sheet; Ontology-enabled Prototypes and Tools; Data & Service Registration (Structural + Semantic); Scientific Workflows

    3. 3

    4. 4 Ontology Cheat Sheet (1/2). What is an ontology? An ontology usually specifies a theory (a set of models) by defining and relating concepts representing features of a domain of interest. Also an overloaded (sometimes sloppy) term for: controlled vocabularies; database schemas (relational, XML, …); conceptual schemas (ER, UML, …); thesauri (synonyms, broader term/narrower term); taxonomies; informal/semi-formal representations (“concept spaces”, “concept maps”); labeled graphs / semantic networks (RDF); formal ontologies, e.g., in [Description] Logic (OWL). “Formalization of a specification” → constrains the possible interpretations of terms.

    5. 5 A Multi-Hierarchical Rock Classification “Ontology” (GSC)

    6. 6 Ontology Cheat Sheet (2/2). What are ontologies used for? Conceptual models of a domain or application (communication means, system design, …); classification of concepts (taxonomy) and of data/object instances through classes; analysis of ontologies, e.g., graph queries (reachability, path queries, …) and reasoning (concept subsumption, consistency checking, …); targets for semantic data registration; conceptual indexes and views for searching, browsing, querying, and integration of registered data.
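
The two kinds of analysis named on this slide, graph reachability and concept subsumption, can be sketched over a small multi-hierarchy. The concept names and the `PARENTS` table are illustrative assumptions, not the GSC classification itself, and subsumption is approximated here as reachability along is-a edges, which is much weaker than Description Logic reasoning.

```python
from collections import deque

# Hypothetical multi-hierarchy: a concept may have several parents (is-a edges).
PARENTS = {
    "Gabbro": ["PlutonicRock", "MaficRock"],
    "Diorite": ["PlutonicRock"],
    "PlutonicRock": ["IgneousRock"],
    "MaficRock": ["IgneousRock"],
    "IgneousRock": ["Rock"],
    "Rock": [],
}


def ancestors(concept):
    """Return every concept reachable from `concept` via is-a edges (breadth-first)."""
    seen = set()
    queue = deque(PARENTS.get(concept, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(PARENTS.get(current, []))
    return seen


def is_subsumed_by(sub, sup):
    """True if `sub` is `sup` itself or one of its descendants."""
    return sub == sup or sup in ancestors(sub)
```

For example, `is_subsumed_by("Gabbro", "Rock")` holds because `Rock` is reachable from `Gabbro` through either parent.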

    7. Application Example: Geologic Map Integration

    8. 8 Geologic Map Integration in the Portal After registering datasets, ontologies (here: “classes”), and an application (“OMI”), the datasets can be searched and displayed in an integrated way.

    9. 9 Concept-Based Queries and Analysis. After registering a source with one or more ontologies, concept-based queries and analysis can be launched. Here: light-weight client-side processing (SVG).

    10. 10 Ontologies and Data Management. Where do ontologies fit within data management architectures? Several answers, specifically: an ontology is similar to a schema or conceptual model, if one exists, but is developed independently of a particular application; probably given in a different language; inherently more general; usually not a very good schema (weak structure).

    11. 11 Ontologies and Data Management (→ watch out for Semantic Data Registration later)

    12. 12 Creating and Sharing Concept Maps (here: Seismology concept map & Cmap tool). Lock up scientists for 2+ days; add CS/KRDB types; create concept maps; refine; iterate → from napkin drawings, to concept maps, to ontologies.

    13. 13

    14. 14

    15. 15

    16. 16 Graph (RDF) Queries on Ontologies

    17. 17 Community-Based Ontology Development Draft of a geochemistry ontology developed by scientists

    18. 18 Protégé (… not so ezOWL yet…)

    19. 19 Sparrow (a poor man’s OWL tool …) Simple ASCII-based RDF and OWL entry and manipulation

    20. Semantic Data Registration (joint work w/ Shawn Bowers)

    21. 21 What is Data/Ontology/… Registration? A mechanism by which data sources, ontologies, services, … are published in a repository/registry for the purpose of “smart” discovery, querying, and integration.

    22. 22 Things to Register: data files (individual files), e.g., a Shapefile as a blob (+ file type); collections (of files; nested; e.g., satellite data); databases (have a schema and can be queried), e.g., a Shapefile with its schema registered; ontologies; services (web + grid services); other/external applications.
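
Registration as described on the two slides above can be pictured as a small in-memory registry. This is a minimal sketch only: the `Resource` and `Registry` classes, the `kind` labels, and the concept names are hypothetical and do not reflect the portal's actual registry.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    kind: str  # e.g. "data", "database", "ontology", "service" (labels assumed)
    concepts: set = field(default_factory=set)  # ontology terms it is registered to


class Registry:
    """Hypothetical registry: publish resources, then discover them by concept."""

    def __init__(self):
        self._resources = {}

    def register(self, resource):
        self._resources[resource.name] = resource

    def discover(self, concept, kind=None):
        """Names of registered resources annotated with `concept`, optionally by kind."""
        return sorted(
            r.name
            for r in self._resources.values()
            if concept in r.concepts and (kind is None or r.kind == kind)
        )
```

A shapefile registered only as a blob would carry a file type but no concepts, so it would not be returned by `discover`; registering its schema against ontology terms is what makes concept-based discovery possible.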

    23. 23 Connecting Datasets to Ontologies

    24. 24 Step1: Selecting Relevant Concepts

    25. 25 Step1: Selecting Relevant Concepts

    26. 26 Step2: Generate Object Model

    27. 27

    28. 28

    29. 29 Applications of Semantic Registration. Mentioned before: smart data discovery, integration, etc. New application: semi-automatically generating data transformations for chaining together computational services.

    30. 30 Problem: Service Reusability. Unless “designed to fit,” independent services are structurally incompatible: generally, the source output type will not be a subtype of the target input type.

    31. 31 Service Reusability. A data transformation mapping (?) is required to connect the services, artificially creating subtype compatibility. If such a mapping exists, the services are “structurally feasible”.
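
The subtype mismatch and the repairing transformation can be illustrated with record types. This is a sketch under stated assumptions: record types are modelled as plain dictionaries from field name to Python type, subtyping is simple width subtyping, and the field names and the `RENAMES` mapping are invented for illustration.

```python
def is_structural_subtype(source_type, target_type):
    """Width subtyping on records: the source must offer every target field with the same type."""
    return all(source_type.get(name) == ftype for name, ftype in target_type.items())


# Hypothetical output type of one service and input type of the next.
SOURCE_OUTPUT = {"lat": float, "lon": float, "rock": str}
TARGET_INPUT = {"latitude": float, "longitude": float, "lithology": str}

# Hypothetical transformation mapping, here nothing more than a field renaming.
RENAMES = {"lat": "latitude", "lon": "longitude", "rock": "lithology"}


def transform(record):
    """Apply the mapping to a data record."""
    return {RENAMES[name]: value for name, value in record.items()}


def transformed_type(source_type):
    """Apply the same mapping at the type level."""
    return {RENAMES[name]: ftype for name, ftype in source_type.items()}
```

Before the mapping is applied, `is_structural_subtype(SOURCE_OUTPUT, TARGET_INPUT)` is false; after it, the transformed type satisfies the check, which is the sense in which the two services are structurally feasible.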

    32. 32 Service Reusability. Idea: annotate services with semantic types (concept expressions), primarily for discovery of services.

    33. 33 Service Reusability. Services can be semantically compatible, but structurally incompatible.
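
A small sketch of this situation: two service ports whose semantic types line up while their structural types do not. The tiny `IS_A` ontology, the port descriptions, and the rule that semantic compatibility means "output concept is subsumed by input concept" are all assumptions made for illustration.

```python
# Hypothetical single-parent ontology: concept -> broader concept.
IS_A = {"Gabbro": "IgneousRock", "IgneousRock": "Rock"}


def subsumed_by(concept, ancestor):
    """Walk up the is-a chain from `concept` looking for `ancestor`."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


# Each port carries a semantic type (a concept) and a structural type (a record type).
source_port = {"semantic": "Gabbro", "structural": {"rock": str}}
target_port = {"semantic": "Rock", "structural": {"lithology": str}}


def semantically_compatible(src, tgt):
    return subsumed_by(src["semantic"], tgt["semantic"])


def structurally_compatible(src, tgt):
    return all(
        src["structural"].get(name) == ftype
        for name, ftype in tgt["structural"].items()
    )
```

Here the semantic check succeeds and the structural check fails, so the pair can be discovered as a match but still needs a data transformation before it can be chained.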

    34. 34 The Ontology-Driven Framework (work w/ Shawn Bowers, SEEK)

    35. 35 Example Generated Data Transformation (in XQuery) Based on the structural correspondences and certain assumptions, we derive the transformation query:

    36. Scientific Workflows (Efrat Jaeger et al.)

    37. 37 Reverse Engineering a Scientific Workflow using the KEPLER Tool (Efrat Jaeger). Classification of igneous rocks is done in several steps. Given sample point data in which the mafic mineral content is less than 90%, the QAPF diagram is used to classify coarse-grained crystalline rocks according to their modal mineral contents. Since the point falls in the diorite/gabbro/anorthosite field, one of the triangular diagrams is used, depending on whether the modaldata DB has a value for the point for olivine or for hornblende. Since these steps are computationally similar, we’ll concentrate on classifying a point using the triangular diagram.
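
The diagram-selection step can be sketched as follows. This is an illustration under assumptions, not the workflow's actual code: the dictionary layout of a sample, the function names, and the third component of the hornblende diagram are hypothetical, and no classification field boundaries are included. Only the normalization of three modal values to percentages that sum to 100 is standard ternary-diagram practice.

```python
def choose_triangular_diagram(modal):
    """Pick the diagram's three components based on which mineral has a value.

    `modal` maps mineral name to modal percentage; a missing key or None
    stands for "no value in the modal data DB" (assumed representation).
    """
    if modal.get("Olivine") is not None:
        return ("Plagioclase", "Pyroxene", "Olivine")
    if modal.get("Hornblende") is not None:
        # Component triple assumed for illustration.
        return ("Plagioclase", "Pyroxene", "Hornblende")
    raise ValueError("sample has neither an olivine nor a hornblende value")


def ternary_coordinates(modal, components):
    """Rescale the three chosen components so that they sum to 100%."""
    total = sum(modal[c] for c in components)
    if total <= 0:
        raise ValueError("components must have a positive total")
    return {c: 100.0 * modal[c] / total for c in components}
```

The resulting coordinates locate the point inside the triangle; looking up the named field that contains it is the part this sketch leaves out.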

    38. 38 A Scientific Workflow in Kepler. This triangular diagram for the classification and nomenclature of gabbroic rocks is chosen for a specific point according to the values it has in the ModalData DB. It is used when the point has values for Plagioclase, Pyroxene and Olivine. The ModalData DB provides the mineral information.

    39. 39 A Scientific Workflow in Kepler

    40. 40 A Scientific Workflow in Kepler

    41. 41

    42. 42 Reverse-Engineered the Geological Map Integration in Kepler

    43. 43 DataMapper Sub-Workflow

    44. 44 Result launched via the BrowserUI actor

    45. 45 KEPLER and YOU. Kepler … is a community-based, cross-project, open source collaboration for “minute made” application integration using web (grid) services as basic building blocks; has a joint CVS repository, mailing lists, web site, …; is gaining momentum thanks to contributors and contributions; has a BSD-style license that allows commercial spin-offs; has a pre-packaged, shrink-wrapped version (“Kepler-to-GO”) coming soon to a place near you…

    46. F I N – Questions?

    47. Additional Material

    48. 48 The KEPLER GUI (Vergil from Ptolemy II)

    49. 49

    50. 50 Distributed Workflows in KEPLER. Web and Grid Service plug-ins: WSDL; ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard; SRB; SSH, SCP. Web Service Harvester: imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors. XSLT and XQuery transformers to link non-fitting services together. Web Service Deployment (…ongoing work…).

    51. 51 A Generic Web Service Actor

    52. 52 Set Parameters and Commit

    53. 53 WS Actor after Instantiation

    54. 54 Web Service Harvester

    55. 55 Composing 3rd-Party WSs

    56. Providing DB Access through Kepler. Database connection actor: opens a database connection and passes it to all actors accessing this database. Database query actor: a generic actor that queries a database and provides its result. DBConnection type and DBConnectionToken: a new IOPort type and a token to distinguish a database connection from any general type.

    57. Database Connection Actor. OpenDBConnection actor. Input: database connection information. Output: a DBConnectionToken, a reference to a database connection instance, through a DBConnection output port.

    58. Database Query Actor. Input: a query string (SQL) and a database connection reference. Parameters: output type (XML, Record or String); whether to output each row separately or all at once. Process: execute the query and produce results according to the parameters.
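
The division of labour between the two actors can be mimicked in plain Python. This is a rough analogue only, not Kepler's Java actor API: SQLite stands in for the database, and the function names, the `output_type` strings, and the way "all at once" is represented (a single emission holding every row) are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_db_connection(database):
    """Stand-in for the connection actor: returns a reusable connection reference."""
    conn = sqlite3.connect(database)
    conn.row_factory = sqlite3.Row  # lets rows be read by column name
    return conn


def row_to_xml(row):
    """Serialize one row as <row><col>value</col>...</row>."""
    elem = ET.Element("row")
    for key in row.keys():
        ET.SubElement(elem, key).text = str(row[key])
    return ET.tostring(elem, encoding="unicode")


def database_query(conn, sql, output_type="record", each_row=True):
    """Stand-in for the query actor: run `sql` and shape the result per the parameters."""
    rows = conn.execute(sql).fetchall()
    if output_type == "record":
        results = [{key: row[key] for key in row.keys()} for row in rows]
    elif output_type == "xml":
        results = [row_to_xml(row) for row in rows]
    elif output_type == "string":
        results = [", ".join(str(value) for value in row) for row in rows]
    else:
        raise ValueError(f"unknown output type: {output_type}")
    # One emission per row, or a single emission that holds all rows.
    return results if each_row else [results]
```

Passing the same connection object to several `database_query` calls mirrors the idea of handing one connection token to every actor that accesses the database.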

    59. Querying Example

    60. 60 Resource Description Framework (RDF). A simple data model that consists of: resources (uniquely identified via URIs); properties; values (resources or character strings). Data are organized into triples (subject, property, value).
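
The triple model is small enough to sketch without an RDF library. The `EX` namespace, the sample resources, and the `match` helper are hypothetical; a real application would more likely use an RDF toolkit and a query language.

```python
EX = "http://example.org/"  # hypothetical namespace for illustration

# Each triple is (subject, property, value); values are URIs or plain strings.
TRIPLES = {
    (EX + "sample/1", EX + "rockType", EX + "Gabbro"),
    (EX + "sample/1", EX + "label", "Sample 1"),
    (EX + "sample/2", EX + "rockType", EX + "Diorite"),
}


def match(triples, subject=None, prop=None, value=None):
    """Return the triples that agree with every position that is not None."""
    return sorted(
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (prop is None or t[1] == prop)
        and (value is None or t[2] == value)
    )
```

For instance, `match(TRIPLES, prop=EX + "rockType")` returns the two rock-type statements, and fixing the subject as well narrows that to one.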

    61. 61 RDF Schema. Adds a set of pre-defined properties to define classes and properties. Allows instances to be connected to classes. Supports sub-class and sub-property (is-a) relationships.
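
What the sub-class relationship buys can be shown with a small fixed-point computation. The sketch covers only two inference rules (instances inherit the superclasses of their class, and sub-class is transitive), omits sub-properties, and uses made-up `ex:` names with abbreviated property identifiers.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Add inferred rdf:type and rdfs:subClassOf triples until nothing changes."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closure):
            if p not in (RDF_TYPE, SUBCLASS_OF):
                continue
            for s2, p2, o2 in list(closure):
                # (s p o) and (o subClassOf o2) together imply (s p o2).
                if p2 == SUBCLASS_OF and s2 == o:
                    inferred = (s, p, o2)
                    if inferred not in closure:
                        closure.add(inferred)
                        changed = True
    return closure
```

Given a sample typed as `ex:Gabbro` and a class chain from `ex:Gabbro` up to `ex:Rock`, the closure also types the sample as `ex:Rock`, so a query for rocks finds it without the data stating that explicitly.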

    62. 62 OWL. Adds further pre-defined properties to constrain an ontology (see http://www.w3.org/TR/owl-guide/). Note: RDF(S) and OWL use XML. Some graphical tools exist (e.g., Protégé).
