Explore ontology roles, scenarios, and aligning strategies in geospatial data integration. Address syntactic, schematic, and semantic heterogeneities for seamless information processing across multiple sources.
 
                
Geospatial Data Integration
Isabel F. Cruz
Department of Computer Science, University of Illinois at Chicago
http://www.cs.uic.edu/~advis/publications/
http://www.cs.uic.edu/~ifc/grants/DG/
http://www.cs.uic.edu/~advis/CASSIS/
ADVIS Lab – http://www.cs.uic.edu/~advis

Overview
• Introduction, motivation, and some definitions
• Semantic Integration
• Ontology roles
• Scenarios (geospatial and non-geospatial)
• Ontology alignment for semantic heterogeneities
• Research Issues and Discussion

Multiplicity of Data Sources
• From “MONO- to MULTI-” environment [Backe and Edwards]:
  • multi-sources
  • multi-sensors
  • multi-producers
  • multi-representation
  • multi-answer

Data Heterogeneity [Bishr99]
• Syntactic heterogeneity
  • Different paradigms (e.g., Relational, XML, and RDF)
• Schematic heterogeneity
  • Different aggregation or generalization hierarchies for the same “real world” facts
• Semantic heterogeneity
  • Disagreement on the meaning, interpretation or intended use of data

Syntactic Heterogeneity
Data sources may use different syntax to represent data.

Schematic Heterogeneity
Documents can contain the same element and attribute names but have different nested structures.

Semantic Heterogeneity
Documents can have the same names for elements and attributes but different meanings.

Data Integration
• Data integration: ability to manipulate (e.g., query) data transparently across multiple heterogeneous data sources
• Semantic data integration: based on conceptual representation of the data and their relationships to eliminate possible syntactic, schematic and semantic heterogeneities
• Note: Semantic data integration can be used to solve syntactic and schematic heterogeneities!

Ontology-Based Data Integration [Fonseca & Egenhofer 99]
[Figure: an application sends a query to a mediator that holds an ontology; below the mediator are three wrappers, each with a local ontology over its own source.]

Ontology
• An ontology is an explicit specification of a shared conceptualization
• RDF (Resource Description Framework)
  • A directed graph of statements: (resource, property, value)
• RDF Schema: A language that is used to describe vocabularies of RDF data
  • rdfs:Class, rdf:Property, rdfs:domain, rdfs:range, etc.
• DAML+OIL and OWL
[Figure: example RDF Schema graph with nodes and edges labeled Flying-object, Measure, place, Aircraft, name, Maintenance, airbase, Combat-aircraft, time, number, staff, and Person, typed as rdfs:Class or rdf:Property and linked by rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, and rdfs:range.]

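The (resource, property, value) statement model can be made concrete with a minimal Python sketch. The class and property names come from the labels in the figure, but the specific edges (which class is a subclass of which, which property has which domain) are assumptions made for illustration, and `superclasses` and `properties_of` are hypothetical helpers, not part of any RDF library.

```python
# RDF data as a set of (resource, property, value) statements.
# The edges below are assumed for illustration; the slide only shows the labels.
RDFS_SUBCLASS = "rdfs:subClassOf"
RDFS_DOMAIN = "rdfs:domain"

triples = {
    ("Aircraft", RDFS_SUBCLASS, "Flying-object"),
    ("Combat-aircraft", RDFS_SUBCLASS, "Aircraft"),
    ("name", RDFS_DOMAIN, "Aircraft"),
    ("airbase", RDFS_DOMAIN, "Maintenance"),
}


def superclasses(cls, graph):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, prop, value in graph:
            if subject == current and prop == RDFS_SUBCLASS and value not in found:
                found.add(value)
                frontier.append(value)
    return found


def properties_of(cls, graph):
    """Return properties whose rdfs:domain is cls or one of its superclasses."""
    applicable = {cls} | superclasses(cls, graph)
    return {s for s, p, v in graph if p == RDFS_DOMAIN and v in applicable}
```

Under these assumed edges, `properties_of("Combat-aircraft", triples)` yields `name`, because the property is inherited through the subclass chain.
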
Semantic Web: Architecture (Berners-Lee, http://www.w3.org/2000/Talks/1206-xml2k-tbl/)
[Figure: layered architecture, bottom to top: Unicode and URI; XML + NS + xmlschema; RDF + rdfschema; Ontology vocabulary; Logic; Proof; Trust. Digital Signature runs alongside the middle layers; side labels: Self-described document, Data, Rules.]

GaV and LaV
GaV (Global-as-View)
• Global schema consists of views over local schemas
• Global querying is easy – sub-query unfolding
• Global schema maintenance is difficult
LaV (Local-as-View)
• Each local source corresponds to a query over the global schema
• Global querying is difficult – inference over partial answers
• Global schema maintenance is easy
[Figure: two diagrams, each showing a global schema connected by a MAPPING of queries to two local sources.]

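A minimal Python sketch of the GaV case may help show why querying is easy there. The global relation `aircraft(name, number)` and the function names are hypothetical; the row values are borrowed from the System B and System E examples of Application Scenario 1.

```python
# Local sources, each modeled as a list of rows (values from Application Scenario 1).
source_b = [{"MODEL": "F15", "QTY": 12}, {"MODEL": "F16", "QTY": 13}]
source_e = [{"NAME": "F-15", "NUMBER": 5}]


def aircraft_view():
    """GaV: the global relation aircraft(name, number) is a view over the sources."""
    for row in source_b:
        yield {"name": row["MODEL"], "number": row["QTY"]}
    for row in source_e:
        yield {"name": row["NAME"], "number": row["NUMBER"]}


def query_global(predicate):
    """Answer a global query by unfolding it into the view definition."""
    return [t for t in aircraft_view() if predicate(t)]


result = query_global(lambda t: t["number"] > 10)
```

The sketch also hints at the maintenance cost of GaV: adding a third source means editing `aircraft_view` itself, whereas in LaV each new source would only contribute its own description over the global schema.
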
Role of Ontologies in Data Integration
• Role 1: Schema Annotation
• Role 2: High-level View of Sources
• Role 3: Support for High-level Queries
• Role 4: Declarative Mediation
• Role 5: Support for Inference

Application Scenario 1
• Data interoperation between legacy systems: System B and System E
• A typical query: List all the “F15” aircraft

System B
Table: RDYACFT
MODEL  AVAILTIME  QTY  AIRBASE      S_ID
F15    0800       12   CA, Anaheim  1214
F16    1000       13   GA, Dalton   1215
Table: STAFF
S_ID  TITLE     TEAM_LEADER  STAFF_NUM
1214  F15_team  Johnson      6
1215  F16_team  Michael      5
(S_ID is marked as a foreign key.)

System E
AIRCRAFT.DTD
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT AIRCRAFT_SCHEDULE (AIRCRAFT)*>
<!ELEMENT AIRCRAFT (NUMBER, RDYTIME, AIRBASE, MTSTAFF)+>
<!ATTLIST AIRCRAFT NAME CDATA #REQUIRED>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT RDYTIME (#PCDATA)>
<!ELEMENT AIRBASE (#PCDATA)>
<!ELEMENT MTSTAFF (#PCDATA)>
Sample document fragment:
<AIRCRAFT NAME="F-15">
  <NUMBER> 5 </NUMBER>
  <RDYTIME> 11:00 am </RDYTIME>
  <AIRBASE> Anaheim, CA </AIRBASE>
  <MTSTAFF> Eagle-1 </MTSTAFF>
</AIRCRAFT>

A Layered Model [Melnik00]
The local source and the remote source each have the same stack of layers:
• Application Layer: User Interface
• Semantic Layer (our focus): Language, Domain Models, Conceptual Models
• Object Layer (Objects/Relationships between objects)
• Syntax Layer (Serialization, Storage)

Implementation of Layers
• Application Layer
  • Provides user interface and accepts user queries
• Semantic Layer (our focus)
  • Conceptual Models: Model concepts, relationships and constraints. Implemented with RDF Schema
  • Domain Models: Express the ontologies of a particular application domain. Implemented as a global ontology for the domain of aircraft maintenance
  • Languages: Express the queries. Implemented with the RDF Query Language (RQL)
• Object Layer and Syntax Layer: RDF Schema Specific Database (RSSDB)

Architecture [Cruz03a]

Three Phases
• Constructing the Local Unified Schema
  • Schema transformation (Schema Integrator): relational schemas, XML schemas (DTD), and RDF schemas → local unified schema
  • Data transformation
• Mapping Process
  • The global ontology: used by the mediator
  • Common vocabulary facilitates the mapping between the global ontology and local unified schemas in the LaV approach
• Query Processing
  • Query rewriting algorithm
  • Based on the mapping information

Ontology Role 1 – Schema Annotation
• Annotation (or abstraction) of the schema of a local relational, XML, or RDF source
• Conceptualizing the elements and relationships between elements
• A uniform metadata representation that facilitates the mapping process
• Addition and/or preservation of the schema features, such as key information and XML document structure
  • A requirement for correct query answering

Ontology Role 1 – Example 1
Translation from Relational to RDFS.
• Rules for schema annotation for relational sources
[Figure: the local relational source (System B tables RDYACFT and STAFF, as in Application Scenario 1) next to its local RDF description: RDYACFT with properties MODEL, AVAILTIME, QTY, AIRBASE, and S_ID, and STAFF with properties S_ID, TITLE, TEAM_LEADER, and STAFF_NUM, each property pointing to Literal.]

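The translation in the figure can be sketched in Python as follows. The rules used here (each table becomes an rdfs:Class, each column becomes an rdf:Property whose domain is the table and whose range is Literal) are an assumed reading of the figure, not the authors' full rule set, and qualifying property names with the table name is an extra assumption made so that the two S_ID columns stay distinct.

```python
def annotate_relational(tables):
    """Translate {table: [columns]} into RDFS-style triples.

    Assumed rules: table -> rdfs:Class; column -> rdf:Property with
    rdfs:domain = table and rdfs:range = rdfs:Literal.
    """
    triples = []
    for table, columns in tables.items():
        triples.append((table, "rdf:type", "rdfs:Class"))
        for column in columns:
            prop = f"{table}.{column}"  # qualified name (assumption)
            triples.append((prop, "rdf:type", "rdf:Property"))
            triples.append((prop, "rdfs:domain", table))
            triples.append((prop, "rdfs:range", "rdfs:Literal"))
    return triples


# System B schema from Application Scenario 1.
system_b = {
    "RDYACFT": ["MODEL", "AVAILTIME", "QTY", "AIRBASE", "S_ID"],
    "STAFF": ["S_ID", "TITLE", "TEAM_LEADER", "STAFF_NUM"],
}
rdf_description = annotate_relational(system_b)
```

Key information such as the S_ID foreign key is not captured by this sketch; the slides note that preserving such schema features is required for correct query answering.
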
Mapping Process and Query Processing
• Mapping: mediation uses the common vocabulary
• Query processing: mediation uses the global ontology
[Figure: a user query is posed against the global ontology, which is mapped through the common vocabulary to the local unified schemas; these are produced by schema and data transformation from a relational database, an XML document source, and an RDF data source, and the query answer is returned to the user.]

Mapping
[Figure: the global ontology (Maintenance, with properties name, time, number, airbase, and title attached through rdfs:domain) is mapped to the schema for System E (AIRCRAFT with NAME, RDYTIME, NUMBER, AIRBASE, MTSTAFF) and to the schema for System B (RDYACFT with MODEL, AVAILTIME, QTY, AIRBASE, S_ID; STAFF with TITLE, STAFF_NUM, TEAM_LEADER). One annotation reads “Synonym of READYTIME”.]

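A small Python sketch of how such a mapping could drive query rewriting for the sample query (list all “F15” aircraft). The attribute correspondences are an assumed reading of the figure, the value table that reconciles “F15” with “F-15” is hypothetical, and `rewrite` is an illustration rather than the rewriting algorithm used in the system.

```python
# Global property -> local attribute, per source (assumed reading of the figure).
MAPPING = {
    "System B": {"relation": "RDYACFT", "name": "MODEL", "time": "AVAILTIME",
                 "number": "QTY", "airbase": "AIRBASE"},
    "System E": {"relation": "AIRCRAFT", "name": "NAME", "time": "RDYTIME",
                 "number": "NUMBER", "airbase": "AIRBASE"},
}

# Hypothetical value correspondences: System E writes the model as "F-15".
VALUE_MAP = {"System E": {"name": {"F15": "F-15"}}}


def rewrite(global_property, value):
    """Rewrite the global selection `property = value` into one
    (relation, attribute, value) condition per local source."""
    rewritten = {}
    for source, attrs in MAPPING.items():
        values = VALUE_MAP.get(source, {}).get(global_property, {})
        local_value = values.get(value, value)
        rewritten[source] = (attrs["relation"], attrs[global_property], local_value)
    return rewritten


conditions = rewrite("name", "F15")
```

Each resulting condition would then be translated into the local query language (SQL for System B, an XML query for System E), a step this sketch leaves out.
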
Application Scenario 2
• Goal: Integrating heterogeneous XML sources, enabling interoperation among them
• S1 and S2
  • Semantically equivalent: Two concepts, paper (or article) and author (or writer), and a relationship between them
  • Structurally different: Reverse nesting structures
Local XML Source S1: papers > paper* (@title) > author* (@name)
Local XML Source S2: writers > writer* (@fullname) > article* (@title)

Ontology Role 1 – Example 2
• Rules for schema annotation for XML sources
• A new RDFS property, rdfx:contains, is used to represent class-to-class relationships.
[Figure: local XML schema S1 (books > book* (@title) > author* (@name)) and its local RDF description S1': Books rdfx:contains Book, which has property title with range Literal; Book rdfx:contains Author, which has property name with range Literal.]

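The annotation in the figure can be approximated with a short recursive Python sketch. The rules assumed here (element becomes a class, attribute becomes a Literal-valued property, nesting becomes rdfx:contains) are read off the figure and are not the complete rule set; the `(name, attributes, children)` tuple encoding of the XML schema is a hypothetical convenience.

```python
def annotate_xml(element, parent=None, triples=None):
    """Annotate an XML schema tree given as (name, [attributes], [children]).

    Assumed rules: element -> rdfs:Class; attribute -> property with range
    rdfs:Literal; parent/child nesting -> rdfx:contains between classes.
    """
    if triples is None:
        triples = []
    name, attributes, children = element
    cls = name.capitalize()
    triples.append((cls, "rdf:type", "rdfs:Class"))
    if parent is not None:
        triples.append((parent, "rdfx:contains", cls))
    for attr in attributes:
        triples.append((attr, "rdfs:domain", cls))
        triples.append((attr, "rdfs:range", "rdfs:Literal"))
    for child in children:
        annotate_xml(child, cls, triples)
    return triples


# Local XML schema S1 as drawn in the figure.
s1 = ("books", [], [("book", ["title"], [("author", ["name"], [])])])
s1_description = annotate_xml(s1)
```

Because rdfx:contains records which class was nested inside which, the document structure of the XML source survives the translation and remains available to the mapping and query rewriting steps.
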
Ontology Role 2 – High-level View of Sources
• The global ontology is generated by integrating the RDF annotations S1' and S2' (of local XML sources S1 and S2) and the RDF schema S3
  • Using the GaV approach
• A high-level overview of local sources with explicit semantics
• The user does not need to formulate a query according to the document structure of a particular XML source

Ontology Role 2 – Example
Inter-schema Mapping
[Figure: three local RDF ontologies and the RDF-based global ontology built from them. S1': Books rdfx:contains Book rdfx:contains Author, with Literal-valued properties title and name. S2': Writers rdfx:contains Writer rdfx:contains Article, with Literal-valued properties fullname and title. S3: Book with ISBN and publishedBy Publisher, and Publisher with name. Global ontology: Books, Book, Author, Authors, and Publisher, linked by rdfx:contains and publishedBy, with Literal-valued properties booktitle, title, name, and ISBN.]

Application Scenario 3
• WLIS (Wisconsin Land Information System): web-based system linking data from distributed, heterogeneous data sources
• Case study: land use codes
• Sample query: “Find all the agricultural lands in Dane and Racine counties.”
• Different authorities use different land use coding systems, leading to syntactic, schematic, and semantic heterogeneities

Heterogeneity
“Find all the agricultural lands in Dane and Racine counties.”
Parcel-based example: each highlighted parcel has its own land use classification code.

Land Use Code Heterogeneity in WLIS
[Figure: four panels, each labeled “Land Use Code”.]
There are 72 counties and hundreds of cities and towns in the state; each may have its own system of classifying land use codes.

Classification: Semantic Issue
[Figure: land use classification hierarchies for Dane County and Racine County, with the categories Commercial, Retail Sales and Services, Retail Sales, Retail Services, Land Under Development, Intensive, and Nonintensive.]

Land Use Codes
[Figure: land use code tables, annotated on successive slides with “Synonyms” and “Value heterogeneity”.]

Agreement Document
• XML document that acts as a wrapper layer for the underlying local data source
• Stores information about how entities in the global ontology map to the entities in the local data source
• Uses XML to capture the hierarchical ordering of entities and their mappings
• Supports query operations using XPath/XSLT to hide details of how data is structured in the local data source
• Minimizes need for programmer intervention and maintenance as it is declaratively specified

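To illustrate the idea, here is a minimal Python sketch of an agreement document and an XPath-style lookup over it using the standard library's `xml.etree.ElementTree`. The element names, attributes, source name, and land use codes are all invented for the example; the actual agreement document format is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical agreement document: every name and code below is made up.
AGREEMENT = """
<agreement source="county_x">
  <concept name="Agriculture">
    <local code="A1"/>
    <concept name="Cropland">
      <local code="A2"/>
      <local code="A3"/>
    </concept>
  </concept>
  <concept name="Commercial">
    <local code="C1"/>
  </concept>
</agreement>
"""


def local_codes(agreement_xml, concept):
    """Return the local codes mapped to a global concept and its sub-concepts."""
    root = ET.fromstring(agreement_xml)
    node = root.find(f".//concept[@name='{concept}']")
    if node is None:
        return []
    # iter() walks the concept and all nested sub-concepts in document order.
    return [local.get("code") for local in node.iter("local")]


agricultural_codes = local_codes(AGREEMENT, "Agriculture")
```

A query such as “find all the agricultural lands” could then be answered per source by selecting parcels whose code is in the returned list, without the caller knowing how that source structures its data.
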
Ontology Alignment
• Alignment is the process of mapping concepts from one ontology to concepts of another ontology
• Concepts are mapped based on how “similar” they are to each other
• Similarity takes different shapes:
  • Similarity in definition
    • For example, automobile and car have very similar definitions in any given dictionary
  • Similarity in text
    • For example, agriculture and agricultural share the long common prefix “agricultur”

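Similarity in text can be sketched with Python's standard library. The two measures below (longest common prefix and `difflib.SequenceMatcher` ratio) are stand-ins chosen for illustration; the slides do not say which string measures the alignment tool uses.

```python
import os
from difflib import SequenceMatcher


def common_prefix(a, b):
    """Longest prefix shared by the two labels."""
    return os.path.commonprefix([a, b])


def text_similarity(a, b):
    """Similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


prefix = common_prefix("agriculture", "agricultural")
textual = text_similarity("agriculture", "agricultural")
definitional_pair = text_similarity("automobile", "car")
```

The second pair scores low even though the two words mean nearly the same thing, which is why similarity in definition has to be assessed separately, for instance with a dictionary or thesaurus.
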
Mapping Types
• Exact: the connected vertices are equivalent in meaning
• Subset: the vertex in the global ontology is a subset of the vertex in the local ontology, i.e., less general in meaning
• Superset: the vertex in the global ontology is a superset of the vertex in the local ontology, i.e., more general in meaning
• Approximate: the connected vertices are close in meaning (e.g., they intersect in some properties) but are not equivalent in definition
• Null: the vertex in the global ontology does not have an equivalent vertex in definition in the local ontology

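One way to make the five types concrete is to model each vertex by the set of things it covers and compare the sets. This extensional reading is an assumption made for the sketch: the slides define the types in terms of meaning, and the item sets below are hypothetical.

```python
def mapping_type(global_extent, local_extent):
    """Classify the mapping from a global vertex to a local vertex by
    comparing the sets of items each one covers (assumed model)."""
    g, l = set(global_extent), set(local_extent)
    if not g & l:
        return "Null"          # nothing in common
    if g == l:
        return "Exact"
    if g < l:
        return "Subset"        # global vertex is less general
    if g > l:
        return "Superset"      # global vertex is more general
    return "Approximate"       # overlap without containment


# Hypothetical extents for illustration.
rubber_and_glass = {"rubber", "glass"}
rubber = {"rubber"}
```

With these extents, `mapping_type(rubber_and_glass, rubber)` is `"Superset"` and the reverse direction is `"Subset"`, matching the definitions in the list above.
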
Mapping Types
[Figure: two industry classification hierarchies with vertices Industry, Mining, Manufacturing, Production, Rubber, Construction Material, Electrical Supplies, and Rubber and Glass, connected by mappings labeled Exact, Superset, and Subset.]

Agreement Maker
• Visual interface for creating agreements
• Existing mappings displayed to the user
• Displayed list of mappings updated as user identifies more mappings

User Interface

Semi-automatic Alignment
• Framework that defines the values associated with the vertices of the ontology as functions of the:
  • values of the children vertices, or
  • user input
• User (or system) establishes some mapping types
• System propagates the mapping types along the ontologies (bottom-up) as much as possible

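Full vs. Partial Mappings
[Figure, Full Mapping: vertex a (children b, c) and vertex d (children e, f); the child mappings are Exact and Superset, and a is mapped to d as Superset.]
[Figure, Partial Mapping: vertex a (children b, c) and vertex d (children e, f, g); both child mappings are Exact, g has no counterpart, and a is mapped to d as Subset.]
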
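The bottom-up propagation can be sketched in Python. The rules below are assumptions chosen only to reproduce the two examples on these slides, since the actual propagation rules are not preserved in this transcript; the sketch also assumes that a belongs to the global ontology and d to the local one.

```python
def propagate(child_mappings, global_children, local_children):
    """Derive the parent mapping type from the children's mappings.

    child_mappings: {(global_child, local_child): mapping_type}
    Assumed rules: a Superset child or an unmapped global child pushes the
    parent toward Superset; a Subset child or an unmapped local child pushes
    it toward Subset; conflicting evidence gives Approximate.
    """
    if not child_mappings:
        return "Null"
    types = set(child_mappings.values())
    mapped_global = {g for g, _ in child_mappings}
    mapped_local = {l for _, l in child_mappings}
    more_general = "Superset" in types or bool(set(global_children) - mapped_global)
    less_general = "Subset" in types or bool(set(local_children) - mapped_local)
    if "Approximate" in types or (more_general and less_general):
        return "Approximate"
    if more_general:
        return "Superset"
    if less_general:
        return "Subset"
    return "Exact"


# Full mapping example: every child on both sides is mapped.
full = propagate({("b", "e"): "Exact", ("c", "f"): "Superset"},
                 {"b", "c"}, {"e", "f"})
# Partial mapping example: local child g has no counterpart.
partial = propagate({("b", "e"): "Exact", ("c", "f"): "Exact"},
                    {"b", "c"}, {"e", "f", "g"})
```

Applying the function level by level, from the leaves upward, gives the bottom-up behavior described for the semi-automatic alignment, with user-supplied mappings taking the place of computed ones wherever they exist.
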
Propagation Rules

Deduction Results Interface

Conclusions
• Ontologies carry explicit semantics with concepts and relationships between concepts in a knowledge domain
• XML or relational schema languages encode the semantics implicitly in the schema structure, e.g., the XML nested structure
• In traditional or P2P schema-based data integration, ontologies may be used to add semantics to the local schemas, so as to facilitate the interoperation between heterogeneous data sources
  • In the mapping process
  • In query answering
• Five roles of ontologies in data integration
• Considered all kinds of heterogeneities
• Looked at geospatial and non-geospatial applications

Research Questions
• Standards
  • What problems do they address and to what degree?
    • Syntax
    • Schematic
    • Semantics
• New applications (LBS), new architectures (P2P), new technologies (sensor networks)
  • How do they affect what we already know about data interoperation?
  • How to extend what we already know?

Which Techniques Solve Fundamental/Practical Issues in Data Integration?
• Intelligent Software Agent Technology for distributed GIServices [Tsou]
• Data Mining [Shekhar, Gahegan]
• Middleware [Armstrong and Wang]
• Geospatial Web Services and Grid Services [Di]
• …

Context and Ontologies [Bishr]
• Context: Collection of relevant conditions and surroundings that make a situation unique and comprehensible
• No context-independent facts
• Can provide for simplification of:
  • Axiom formalization
  • Ontologies
• Theory of geospatial context must consider the role of time, location, and other spatio-temporal aspects in determining the truth value of a given set of axioms

Context and Ontologies [Bishr]
• Semantic interoperability in geospatial applications can only be achieved if we introduce context into our ontological models
• Capturing the difference between the system’s view and the user’s view and extending it to the use of the same concept in different contexts (working with vector spaces)
• Context-augmented ontology (ontology changes based on user or device profile, role or task) and querying [Cruz]