
Semantic Web Technologies and Data Management

Li Ma, Jing Mei, Yue Pan, Krishna Kulkarni, Achille Fokoue, Anand Ranganathan. Why bring together relational databases and the Semantic Web? The main motivations are capturing data semantics, achieving data integration, and reasoning.


Presentation Transcript


  1. Semantic Web Technologies and Data Management Li Ma, Jing Mei, Yue Pan, Krishna Kulkarni, Achille Fokoue, Anand Ranganathan W3C RDF/RDB Workshop

  2. Why bring together Relational Databases and the Semantic Web? The main motivations are capturing data semantics, achieving data integration, and reasoning • RDF and OWL ontologies are good at capturing data semantics • They can be used to define a “semantic” model of the underlying relational data that can be tailored to different domains or applications and that hides the actual layout of data across different tables • Allow use of additional domain knowledge in OWL ontologies while answering queries to the relational DB • Allow use of DL reasoning while answering queries to the relational DB to improve recall • Allow Semantic Web applications (that use an RDF/OWL data model) to access relational data without having to deal with a different data model

  3. Ontology-based Semantic Query [Figure: company information ontology. Business hierarchy nodes: Finance, IT, Banking, Telecom, PC, Optical, Wireless, Hardware, Solution, Software, Wireless Software, Main board, Memory, … Region hierarchy nodes: Asia, Europe, America, East Asia, France, North America, Paris, China, USA, Canada, BeiJing, NY, Vancouver, … Relation: Shareholding] Semantic query: find all direct and indirect shareholders of company EDOX that are from Europe and are IT companies. FOO is retrieved using transitive closure and subsumption inference. BAR is retrieved using classification and subsumption inference.
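The query on this slide combines two inferences: a transitive closure over shareholding and subsumption over the Business and Region hierarchies. A minimal Python sketch of that combination follows. Apart from EDOX and FOO, the company names, the shareholding edges, and the exact parent/child links in the hierarchy are hypothetical and chosen only for illustration; the classification step used for BAR is not covered.

```python
# Hypothetical hierarchy (child -> parent), loosely following the labels on the slide.
PARENT = {
    "Telecom": "IT", "Software": "IT", "Hardware": "IT",
    "Banking": "Finance",
    "Paris": "France", "France": "Europe",
    "BeiJing": "China", "China": "East Asia", "East Asia": "Asia",
}

def subsumed_by(term, ancestor):
    """True if term equals ancestor or lies below it in the hierarchy."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False

# Hypothetical direct shareholding edges: holder -> companies it holds shares in.
HOLDS = {"FOO": ["MID"], "MID": ["EDOX"], "ABC": ["EDOX"]}

# Hypothetical company facts: (business, region).
FACTS = {
    "FOO": ("Software", "Paris"),
    "MID": ("Banking", "Paris"),
    "ABC": ("Telecom", "BeiJing"),
}

def all_shareholders(company):
    """Direct and indirect shareholders, via transitive closure over HOLDS."""
    found, frontier = set(), {company}
    while frontier:
        # Holders of anything in the current frontier that were not seen before.
        frontier = {h for h, held in HOLDS.items() if frontier & set(held)} - found
        found |= frontier
    return found

def european_it_shareholders(company):
    """Shareholders whose business is subsumed by IT and whose region by Europe."""
    return sorted(
        s for s in all_shareholders(company)
        if subsumed_by(FACTS[s][0], "IT") and subsumed_by(FACTS[s][1], "Europe")
    )

print(european_it_shareholders("EDOX"))  # ['FOO']
```

In this toy data FOO is only an indirect shareholder (through MID), so it is found only because the closure is followed, and it matches only because Software and Paris are resolved to IT and Europe through the hierarchy.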

  4. Overview of talk • Survey of Two Basic Approaches for RDF Access to Relational Databases • Use Cases • Relevant Technologies from IBM • IBM’s basic positions

  5. Two Basic Approaches for RDF Access to Relational Data • Extending existing query languages for RDF Access • Extend SQL or XQuery with RDF-specific extensions • Using RDF-specific languages (like SPARQL) to allow publishing and accessing legacy data as RDF • Define an RDF interface over relational (or XML) databases and use query rewriting methods for accessing data

  6. Extending existing query languages to access new types of data

  7. Publishing and accessing legacy data using new data models and query languages.

  8. IBM’s position • Because of the semantic differences between query languages, we believe attempts to extend SQL and XQuery to support SPARQL would involve considerable complexity. • Hence, we advocate focusing research efforts on publishing and accessing relational (and XML) data as RDF data and exploiting SPARQL for semantic query and integration.

  9. Case Study: RDF Representation and Access to Master Data • Master data is the reference data that is shared by several disparate IT systems and groups. • May include lists or hierarchies of customers, suppliers, accounts, products, or organizational units • Effective Master Data Management is required to enable consistent computing between diverse system architectures and business functions • The challenge is building a common master model that is flexible enough to deal with business changes and expressive enough to represent the semantics of master data.

  10. Why RDF/OWL for Master Data Management? • Use of URIs to enable identification of common entities across different organizations • Integration of external industry-specific ontologies • Annotation of various relations based on Description Logics (e.g. symmetric, functional, inverse functional, or transitive properties) • Ability to define new classes of entities in a flexible and dynamic manner • Using intersection, union and complement operators in OWL • Defining classes based on restrictions on properties

  11. Ontology’s Value for Master Data Management [Figure: product categories and computable relationships (Composed of, Cross-sell item, Upsell item, Replacement item; two of the relationships are labeled transitive). Automatic definition of inverse relations: Replaced by <-> Replaces] • Electronic City example from IBM WebSphere Product Center • It gathers typical customer models in electronic product sales • Extended with ontology expressiveness • Offers the basic Product Information Management semantics (categories, products, catalogs, …) • Extends the definition of relationships through ontology expressiveness: transitivity, reflexivity, … • Inverse properties (Replaced by is the inverse of Replaces)

  12. [Figure, extending the previous slide: computable category definition with disjoint classes (PDA, Phone) and intersection] Categories can be defined using disjointness, intersection, and union, as well as various restrictions on parent categories. Items are then automatically categorized according to the defined categories.

  13. [Figure, extending the previous slide: new relations (Made by, Contains) and new entities (Manufacturer, Material)] New types of objects and relationships can be defined, either locally or by URI reference.

  14. [Figure, extending the previous slide with dynamic categories] • Outdated items: cardinality restriction, has 1 or more Replacement item • Promoted items: cardinality restriction, has 1 or more promotion • Metallic products: «allValuesFrom», Contains comes from Metal • Products containing batteries: «someValuesFrom», Composed of comes from Battery • Aluminum products: «hasValue», Contains aluminum
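The dynamic categories on this slide are classes defined by property restrictions. A minimal Python sketch of how such definitions could classify items is shown below. The product records, the `METALS` set, and all function names are hypothetical, and the checks use closed-world evaluation over the listed data, which is simpler than OWL's open-world semantics.

```python
# Hypothetical product descriptions: property name -> list of values.
PRODUCTS = {
    "pda-1": {"contains": ["aluminum", "copper"], "composed_of": ["battery", "screen"],
              "replacement_item": ["pda-2"], "promotion": []},
    "pda-2": {"contains": ["aluminum", "plastic"], "composed_of": ["battery"],
              "replacement_item": [], "promotion": ["spring-sale"]},
    "cable-1": {"contains": ["copper"], "composed_of": [],
                "replacement_item": [], "promotion": []},
}
METALS = {"aluminum", "copper"}  # assumed instances of the class Metal

def some_values_from(item, prop, cls):
    """someValuesFrom: at least one value of prop belongs to cls."""
    return any(v in cls for v in item[prop])

def all_values_from(item, prop, cls):
    """allValuesFrom: every value of prop belongs to cls (vacuously true if none)."""
    return all(v in cls for v in item[prop])

def has_value(item, prop, value):
    """hasValue: prop has this specific value."""
    return value in item[prop]

def min_cardinality(item, prop, n):
    """Cardinality restriction: prop has at least n values."""
    return len(item[prop]) >= n

# Category definitions mirroring the five dynamic categories on the slide.
CATEGORIES = {
    "Outdated items": lambda p: min_cardinality(p, "replacement_item", 1),
    "Promoted items": lambda p: min_cardinality(p, "promotion", 1),
    "Metallic products": lambda p: all_values_from(p, "contains", METALS),
    "Products containing batteries":
        lambda p: some_values_from(p, "composed_of", {"battery"}),
    "Aluminum products": lambda p: has_value(p, "contains", "aluminum"),
}

def classify(products):
    """Map each category name to the sorted ids of the products that satisfy it."""
    return {name: sorted(pid for pid, p in products.items() if test(p))
            for name, test in CATEGORIES.items()}

for name, members in classify(PRODUCTS).items():
    print(name, members)
```

Because membership is computed from the definitions, adding a replacement item to a product moves it into "Outdated items" without anyone editing the category by hand, which is the point the slide makes about dynamic categories.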

  15. Architecture [Figure. Scenario: business analysts define OWL classes and ontologies, user-defined rules, and SPARQL queries (ontology and rule queries); developers create new MDM services. Components shown: SPARQL Query Parser, Ontology and Rule Repository, Ontology-based Semantic Engine, Ontology Classification, Datalog Evaluation, Ontology to RDB Mapping, SQL Generator & Executor, Ontology Views, Operational Data Stores, Pub/Sub, MDM Hub] IBM Confidential

  16. Example Query
      • Find all Contracts related to those which are assembled by ContractComponents that VIP Contacts own
      • SPARQL:
          SELECT ?w
          WHERE {
            ?x :RelatedContract ?w ; :assembledby ?y .
            ?y rdf:type :ContractComponent .
            ?z rdf:type :VIP ; :playRole ?u .
            ?u rdf:type :ContractRole ; :contractRoleType "own" ; :playRoleIn ?y .
          }
      • Elements: WCC business entities and their related properties
          • ContractRelationship
          • Contract
          • ContractComponent: assemble (object property, range: Contract)
          • ContractRole: ContractRoleType (datatype property) = own; playRoleIn (object property, range: ContractComponent)
          • Contact: playRole (object property, range: ContractRole)
      • User-defined classifier: VIP contact, a contact whose client_importance property is high
      • User-defined datalog rules:
          RelatedContract(x, y) :- RelatedContract(x, z), RelatedContract(z, y);
          RelatedContract(x, y) :- Contract(y), ContractRelationship(z), Related_From(z, y), Related_To(z, x), Contract_Relationship_type(z, 'supplemental');
      • The VIP contact classifier is defined in a hierarchy tree outside the MDM system. Users can manually add new individuals under the classifier or automatically populate its instances using a class expression.
      • Ontology reasoning capability: subsumption inference, hasValue restriction
      • Rule reasoning: transitive closure
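The two user-defined rules can be evaluated bottom-up: the second rule derives base `RelatedContract` facts from supplemental contract relationships, and the first closes them transitively. The sketch below is a naive fixpoint in Python, not the Datalog engine described in the talk; the contract and relationship identifiers are hypothetical.

```python
# Hypothetical facts. Each relationship z is stored as (related_from, related_to, type).
CONTRACTS = {"c1", "c2", "c3", "c4"}
RELATIONSHIPS = {
    "r1": ("c2", "c1", "supplemental"),
    "r2": ("c3", "c2", "supplemental"),
    "r3": ("c4", "c3", "renewal"),  # ignored: type is not 'supplemental'
}

def base_facts():
    """Rule 2: RelatedContract(x, y) holds if a 'supplemental' relationship z
    has Related_From(z, y), Related_To(z, x), and y is a Contract."""
    return {(to, frm) for frm, to, kind in RELATIONSHIPS.values()
            if kind == "supplemental" and frm in CONTRACTS}

def related_contract():
    """Rule 1: RelatedContract(x, y) :- RelatedContract(x, z), RelatedContract(z, y).
    Re-applied until no new facts appear (naive fixpoint)."""
    facts = base_facts()
    while True:
        derived = {(x, y) for x, z1 in facts for z2, y in facts if z1 == z2}
        if derived <= facts:
            return facts
        facts |= derived

print(sorted(related_contract()))
# [('c1', 'c2'), ('c1', 'c3'), ('c2', 'c3')]
```

The pair `('c1', 'c3')` is not stated by any single relationship; it appears only after the transitive rule joins the two base facts. A production engine would typically use semi-naive evaluation to avoid re-deriving known facts on every pass.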

  17. IBM SOR - Scalable Ontology Repository • Efficient management for large-scale OWL ontologies (millions of statements) • DBMSs Supported • IBM DB2 (Powerful Persistent Storage) • Derby (http://incubator.apache.org/derby/, Embedded Storage) • SQL Server (Powerful Persistent Storage) • Oracle (Powerful Persistent Storage) • Query Language • W3C SPARQL Query Language • Inference Engines • Pellet • Structural TBox Engine (IBM CRL) • Memory Model • EODM (EMF based Ontology Definition Metamodel, OMG’s recommendation) (http://www.eclipse.org/emft/projects/eodm/)

  18. SOR Architecture [Figure. Layers: Import, Storage, Query Answering, Reasoning. Components shown: OWL Parser, TBox Translator, DB Translator, Persistent Store, SPARQL Processor, Query Adaptor, Simplified Datalog Engine (SPARQL2SQL translation, return results), Enhanced Datalog Engine, Rule Inference Engine, DL Reasoner, SHER DL Reasoner, ABox Summarizer, ABox Filters. Flows shown: users submit OWL documents and SPARQL queries and receive results; EODM models are generated from documents; the EODM TBox and ABox are loaded and traversed; the TBox and ABox assertions are inserted into the DB; a SPARQL memory model is generated; the TBox, subsumption, and data are retrieved for query answering and reasoning; reasoning tasks are generated for membership and relationship queries; results are returned by SQL]

  19. IBM SHER – Scalable Highly Expressive Reasoner • SHER supports SHIN (a subset of OWL-DL) ontologies. • SHER is better than state-of-the-art reasoners in terms of scalability and performance.

  20. Design Principles • 3 key steps to exposing relational data as RDF data: • creating an RDF representation (ontology) of the relational data, • building a mapping between the relational database and the ontology, and • rewriting SPARQL queries to retrieve the relational data.
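The three steps can be illustrated on a toy scale. The sketch below uses Python's built-in `sqlite3` module; the `customer` table, the `http://example.org/` namespace, the `MAPPING` dictionary, and the `rewrite` and `answer` functions are hypothetical. Only a single triple pattern of the form `?s <property> ?o` is rewritten, which is far simpler than a real SPARQL-to-SQL translator.

```python
import sqlite3

EX = "http://example.org/"  # hypothetical namespace

# Step 1: a tiny RDF vocabulary for the relational data.
NAME = EX + "name"
CITY = EX + "city"

# Step 2: mapping from each RDF property to (table, key column, value column).
MAPPING = {
    NAME: ("customer", "id", "name"),
    CITY: ("customer", "id", "city"),
}

def rewrite(prop):
    """Step 3: rewrite the pattern '?s <prop> ?o' into an SQL query string."""
    table, key, col = MAPPING[prop]  # names come from the trusted mapping, not user input
    return f"SELECT {key}, {col} FROM {table} WHERE {col} IS NOT NULL ORDER BY {key}"

def answer(conn, prop):
    """Run the rewritten query and return (subject URI, literal value) pairs."""
    table = MAPPING[prop][0]
    return [(f"{EX}{table}/{key}", value) for key, value in conn.execute(rewrite(prop))]

# A small in-memory database standing in for the operational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Alice", "Paris"), (2, "Bob", None)])

print(rewrite(CITY))
print(answer(conn, CITY))
```

The `IS NOT NULL` filter reflects one common design choice: a NULL column value produces no triple at all, so a row with no city simply has no `city` statement in the RDF view.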

  21. Design Questions • URI Generation for Relational Data • For classes, properties and instances • N-ary Relationship Representation • Representation of RDB Schema Constraints in Mapping • Effective Query Rewriting and Optimization • Reasoning • Performance, Security
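For the first design question, one common pattern is to derive URIs from schema names and primary-key values. The sketch below shows that pattern in Python; the base URI, the path layout, and the function names are assumptions made for illustration, not a scheme proposed in the talk.

```python
from urllib.parse import quote

BASE = "http://example.org/db/"  # hypothetical base URI

def class_uri(table):
    """URI for the class that represents a table."""
    return BASE + quote(table, safe="")

def property_uri(table, column):
    """URI for the property that represents a column of a table."""
    return f"{class_uri(table)}#{quote(column, safe='')}"

def instance_uri(table, key_values):
    """URI for a row, built from its primary-key values in order.
    Composite keys are joined with ';' after percent-encoding each value."""
    key = ";".join(quote(str(v), safe="") for v in key_values)
    return f"{class_uri(table)}/{key}"

print(class_uri("customer"))                    # http://example.org/db/customer
print(property_uri("customer", "full name"))    # http://example.org/db/customer#full%20name
print(instance_uri("order_line", [42, "A/7"]))  # http://example.org/db/order_line/42;A%2F7
```

Percent-encoding each key value separately keeps the URI unambiguous when a value contains a separator character, and building the URI only from the key means the same row always yields the same identifier.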

  22. Backup

  23. Master Data • “Master data is data that is shared across systems (such as lists or hierarchies of customers, suppliers, accounts, or organizational units) and is used to classify and define transactional data.” [IDC] • Example: Sell Product A to Customer X on 1/1/06 for $100. • With master data, we should be able to answer questions such as: • What is a “customer”? • It is a subclass of People with the specific attributes A, B, C, … • How do we add a new customer? • Defines the workflow • How do we know that two customers refer to the same identity? • Defines some business rules

  24. What Is Master Data Management? [Figure: a Master Data Management System holding master data, connected to new applications, existing applications, and historical/analytical systems] • Decouples master information from individual applications • Becomes a central, application-independent resource • Simplifies ongoing integration tasks and new app development • Ensures consistent master information across transactional and analytical systems • Addresses key issues such as data quality and consistency proactively rather than “after the fact” in the data warehouse Source: IBM EMDS team
