
Automatic Schema Matching

Nicole Oldham

CSCI 8350

(Semantic Web Course @ Univ of Georgia)

Topic Presentation

Outline
  • Introduction
  • Application Domains
  • Classification of Schema Matching Approaches
  • Current Work
  • MWSAF Matching
  • Open Research Directions
  • Conclusion
Schema Matching
  • Match: Takes two schemas as input and produces a mapping between the elements that correspond to each other semantically.
  • It is usually performed manually.
    • Tedious
    • Time Consuming
    • Error Prone
    • Expensive

We must automate this process!

Example
  • GTE telecommunications needed to integrate 40 databases with a total of 27,000 elements.
  • Project planners estimated that manual matching would take 12 person years to integrate.

Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

Various Levels of Heterogeneity

ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf

How to deal with Semantic Heterogeneity

1. Standardize: agree on a common representation

2. Translate: create mappings between different schemas
    • requires human input and machine reasoning
    • mappings can be difficult and expensive

3. Annotate: create relationships between agreed-upon conceptualizations
    • requires human input and machine reasoning
    • annotation can be difficult and expensive

ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf

Challenges
  • The actual semantics of the involved elements are typically known only to the schema's creators or recorded in documentation, so we must use clues in the schema and data instead.
  • These clues are often misleading.
    • e.g., ‘Area’ can refer to different entities.
    • e.g., the same entity can have very different names.
  • Clues are often ambiguous.
    • e.g., ‘Contact-agent’: agent name or phone number?
  • Matching process can be very costly
    • Each element of the schema must be examined to ensure discovery of the best match.
  • Matching is often subjective depending on the application.

Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

Where is Schema Matching used?
  • Database Application Domains
    • Data Integration
    • Data Warehousing
    • E-Business
    • Query Processing
  • Semantic Web
    • XML/HTML to an Ontology
    • Semantic Web Services

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Schema Integration

Problem: Construct a global view from a set of independently constructed schemas (e.g., ontologies).

- Different structure and terminologies

Solution: Schema Matching is performed to find relationships between concepts in each schema. Then the matching elements can be unified.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Data Warehouses

Problem: Integrating data sources into a data warehouse.

- Different formats between the source and warehouse.

Solution: Use matching to find the elements of the source that are also present in the warehouse. Then the details of the semantics can be examined to integrate the two.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

E-Commerce

Problem: Message translation.

-Each trading partner uses its own message format.

Solution: A match operation would reduce the amount of manual work to specify how the formats are related.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Query Processing

Problem: The terms used in the user’s query may be different from those in the database.

Solution: Matching is used to map the user-specified concepts in the query to schema elements.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Need for Data Integration on the Semantic Web
  • Problem: Web documents are not in RDF or any form suitable for the SW.
    • We must annotate them with concepts from ontologies.
  • Solution: Use schema matching to map between elements represented in OWL and the different schemas of web documents.
Semantic Web Services
  • Problem: Web Services are currently searched for using keywords.
    • We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently.
    • WSDLs are in XML, Ontologies in OWL!
  • Solution: Use schema matching approaches to map between the two different schemas.
Term Definitions
  • Schema: a set of elements connected by some structure.
  • Mapping: a set of mapping elements, each of which indicates that certain elements of schema s1 are mapped to certain elements of schema s2.
  • Mapping Expression: Tells how s1 and s2 elements are related.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Example

A mapping between s1 and s2 might contain these elements:

  • Cust.C#=Customer.CustID
  • Concatenate(Cust.FirstName, Cust.LastName) = Customer.contact
  • Cust.CName = Customer.Company

Bernstein P, Rahm E. A survey of approaches to automatic schema matching
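
The three mapping elements above can be sketched in code as pairs of source attributes and a target attribute, each with a mapping expression. This is a minimal illustration, not part of the survey: the class name MappingElement, the translate helper, the sample record, and the choice of a space as the concatenation separator are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MappingElement:
    """One mapping element: source attributes, a target attribute, an expression."""
    sources: List[str]
    target: str
    expression: Callable[..., object]


# Mapping from schema s1 (Cust) to schema s2 (Customer), following the slide.
MAPPING = [
    MappingElement(["C#"], "CustID", lambda c: c),
    # The separator used by Concatenate is not given on the slide; a space is assumed.
    MappingElement(["FirstName", "LastName"], "contact",
                   lambda first, last: f"{first} {last}"),
    MappingElement(["CName"], "Company", lambda name: name),
]


def translate(record: Dict[str, object],
              mapping: List[MappingElement]) -> Dict[str, object]:
    """Apply every mapping expression to a source record to build a target record."""
    return {m.target: m.expression(*(record[s] for s in m.sources))
            for m in mapping}


# Hypothetical Cust record used only for illustration.
cust = {"C#": 17, "FirstName": "Ada", "LastName": "Lovelace",
        "CName": "Analytical Engines"}
customer = translate(cust, MAPPING)
```

Note that the second element is a complex (2:1) match, while the other two are simple 1:1 matches.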

Example

Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

Classification of Schema Matching Approaches
  • Instance vs Schema: matching approaches can consider instance data or schema-level information.
  • Element vs Structure matching: match can be performed for individual schema elements or combinations of elements.
  • Language vs Constraint: linguistic (names) or constraint-based (keys and relationships).
  • Matching Cardinality: match result may relate one or more elements of one schema to one or more elements of another.
  • Auxiliary Information: matcher relies on other information besides the input schemas, such as dictionaries, user input, global schemas.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Classification of Schema Matching Approaches

Schema Matching Approaches
  • Individual Matchers
    • Schema-only
      • Element Level
        • Linguistic
        • Constraint
      • Structure Level
        • Constraint
    • Instance/Contents
      • Element Level
        • Linguistic
        • Constraint
  • Combining Matchers
    • Hybrid Matchers
    • Composite Matchers
      • Manual Composition
      • Automatic Composition

Further Criteria
  • Match Cardinality
  • Auxiliary information used…

Sample Approaches
  • Word Frequency
  • Name Similarity
  • Description Similarity
  • Global Namespaces
  • Type Similarity
  • Key Properties
  • Group Matching
  • Value Pattern and Ranges

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Schema Level Matchers
  • Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure.

  • Often produces multiple candidates and estimates a degree of similarity for each
  • Granularity of match (element level vs structure level)
  • Match Cardinality
  • Linguistic Approaches: Name or Description Matching
  • Constraint-Based Approaches
  • Reusing Schema and Matching Information

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Element-Level
  • Element-Level: Identifies all elements of S1 that are the same or similar to elements of S2.
  • The match comparison can be based on name, description, or data type of the element.
  • Example of name-based element-level matching:

Address = CustomerAddress

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Structure-Level
  • Structure-Level: Matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2.
  • Full Structure Match:
  • Partial Structure Match:
  • Equivalence Patterns: Can enhance structure matching by considering known equivalence patterns stored in a library.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Match Cardinality
  • One or more S1 elements can match one or more S2 elements.
    • Complex matches

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Complex Matches
  • 1:1 matches are bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema.
  • Only a few works on complex matching have been done.
    • Some hard-code complex matches into rules.
    • Some rely on a domain-specific ontology.
  • We need domain knowledge to accurately perform complex matching.
  • The best match isn’t always the top match returned by the matcher, so human involvement is still needed.

Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

Linguistic Approaches
  • Language based matchers use names and text (i.e. words or sentences) to find semantically similar schema elements.
  • Name Matching: match elements with similar names
  • Description Matching: match comments in the schemas

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Linguistic Approaches: Name Matching
  • Matches schema elements with equal or similar names.
  • How similarity is defined:

1. Equality of names

2. Equality of names after stemming (deals with prefixes/suffixes)

3. Equality of synonyms

4. Equality of hypernyms (an SUV is a type of car)

5. Similarity of names based on common substrings, Soundex, pronunciation (ShipTo = Ship2)

6. User-provided name matches

  • Can be element or structure-level.
  • Cardinality is not limited to 1:1.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching
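
The similarity criteria listed for name matching can be sketched as a cascade that tries the strictest test first. This is only an illustration under stated assumptions: the tiny SYNONYMS and HYPERNYMS tables stand in for a real thesaurus, the stem function is a deliberately crude suffix stripper, and the numeric scores (1.0, 0.9, 0.8, 0.7, 0.6) are arbitrary choices, not values from the survey.

```python
from difflib import SequenceMatcher

# Hand-made lexicons standing in for a real thesaurus (assumptions).
SYNONYMS = {frozenset({"car", "automobile"}), frozenset({"salary", "wage"})}
HYPERNYMS = {"suv": "car", "sedan": "car"}  # child -> parent ("is a type of")


def stem(name: str) -> str:
    """Very crude suffix stripping; a real matcher would use a proper stemmer."""
    if name.endswith("ss"):
        return name
    for suffix in ("ing", "es", "s"):
        if name.endswith(suffix) and len(name) > len(suffix) + 2:
            return name[: -len(suffix)]
    return name


def name_similarity(a: str, b: str) -> float:
    """Return a score in [0, 1], checking the criteria from strictest to loosest."""
    a, b = a.lower(), b.lower()
    if a == b:                                    # 1. equality of names
        return 1.0
    if stem(a) == stem(b):                        # 2. equality after stemming
        return 0.9
    if frozenset({a, b}) in SYNONYMS:             # 3. synonyms
        return 0.8
    if HYPERNYMS.get(a) == b or HYPERNYMS.get(b) == a:  # 4. hypernyms
        return 0.7
    # 5. common substrings, scaled so it never outranks the checks above
    return 0.6 * SequenceMatcher(None, a, b).ratio()
```

With these assumptions, name_similarity("SUV", "car") is decided by the hypernym table, while a pair such as ShipTo and Ship2 falls through to the substring comparison.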

Linguistic Approaches: Description Matching
  • Schemas can contain comments in natural language that express the intended semantics of the schema elements.
  • Example

S1: empn // employee name

S2: name // name of employee

  • Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching
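
The simple end of that spectrum (keyword extraction) can be sketched as follows. This is a minimal illustration, assuming a small hand-picked stop-word list and Jaccard overlap as the similarity measure; neither is prescribed by the survey.

```python
import re

# Small illustrative stop-word list (an assumption, not from the survey).
STOP_WORDS = {"a", "an", "the", "of", "for", "to", "in"}


def keywords(comment: str) -> set:
    """Extract lowercase keywords from a schema comment, dropping stop words."""
    words = re.findall(r"[a-z]+", comment.lower())
    return {w for w in words if w not in STOP_WORDS}


def description_similarity(c1: str, c2: str) -> float:
    """Jaccard overlap of the keyword sets of two comments."""
    k1, k2 = keywords(c1), keywords(c2)
    if not k1 or not k2:
        return 0.0
    return len(k1 & k2) / len(k1 | k2)


# The slide's example: empn and name carry equivalent comments.
score = description_similarity("employee name", "name of employee")
```

Here the element names empn and name look unrelated, but their comments share the same keywords once the stop word is removed.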

Constraint Based
  • Schemas often contain constraints to define data types and value ranges, optionality, relationship types, cardinalities, etc.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching
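
One way such constraints could feed a matcher is to compare the data type, key property, and optionality of two elements. The sketch below is an assumption-laden illustration: the Column class, the type families, and the weights 0.6/0.2/0.2 are hypothetical and not taken from the survey.

```python
from dataclasses import dataclass

# Hypothetical type families used to judge type compatibility.
NUMERIC = {"int", "decimal", "float"}
TEXT = {"char", "varchar", "text"}


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str
    nullable: bool
    is_key: bool


def type_score(t1: str, t2: str) -> float:
    """1.0 for identical types, 0.5 for the same family, 0.0 otherwise."""
    if t1 == t2:
        return 1.0
    if {t1, t2} <= NUMERIC or {t1, t2} <= TEXT:
        return 0.5
    return 0.0


def constraint_similarity(c1: Column, c2: Column) -> float:
    """Weighted agreement on data type, key property, and optionality."""
    return (0.6 * type_score(c1.data_type, c2.data_type)
            + 0.2 * (c1.is_key == c2.is_key)
            + 0.2 * (c1.nullable == c2.nullable))


cust_id = Column("C#", "decimal", nullable=False, is_key=True)
customer_id = Column("CustID", "int", nullable=False, is_key=True)
similarity = constraint_similarity(cust_id, customer_id)
```

Constraint information alone rarely identifies a match, so a score like this would usually be combined with a linguistic matcher.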

Reusing Schema and Mapping Information
  • The effectiveness of matching can be improved with the reuse of common schema components and previously determined mappings.
  • Many schemas are very similar to each other and to previously matched schemas.
    • e.g., in E-Commerce, substructures (address fields, name fields) often repeat within different message formats.

  • A schema library should be created and the schema editors should access the library to use predefined terms and definitions.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Schema Mapping Reuse
  • Example

[Figure: three purchase-order schemas labeled S1, S, and S2. Two are rooted at Purchase-order with Product, BillTo (Name, Address), and ShipTo (Name, Address), and end with either ContactPhone or Contact (Name, Address); the third is rooted at POrder with Article, Payee, BillAddress, Recipient, and ShipAddress.]

  • Problems:

1. Determining which part of a new schema is similar to some part of a previously matched one is itself a match problem.

2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Instance Level Approaches
  • Why?

1. Little or no schema information available.

2. Enhancement of schema-level matchers. Instance data gives insight to the contents and meaning of schema elements.

3. To match instance-level data.

  • How?

1. Preferred Method: Linguistic Characterization

2. Constraint-based Characterization (e.g., value ranges)

3. Auxiliary Information

4. Also uses both rule-based and learner-based techniques.

  • Main Problem: When comparing data at the instance level, there are likely to be a very large number of possible match combinations, many of which are irrelevant.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Rule Based Solutions
  • Rule-Based: hand crafted rules to exploit schema information
    • element names, data types, structures and subelements.
    • e.g., two elements match if they have the same name and the same number of subelements.

Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
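
The example rule above (same name and same number of subelements) is simple enough to write down directly. The Element class and the sample schemas below are hypothetical, and comparing names case-insensitively is an added assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Element:
    name: str
    children: List["Element"] = field(default_factory=list)


def rule_match(e1: Element, e2: Element) -> bool:
    """Hand-crafted rule: same name and same number of subelements."""
    return (e1.name.lower() == e2.name.lower()
            and len(e1.children) == len(e2.children))


def match_schemas(s1: List[Element], s2: List[Element]) -> List[Tuple[str, str]]:
    """Compare every element of s1 with every element of s2 using the rule."""
    return [(a.name, b.name) for a in s1 for b in s2 if rule_match(a, b)]


# Hypothetical schemas for illustration.
s1 = [Element("Address", [Element("Street"), Element("City")]), Element("Name")]
s2 = [Element("address", [Element("Road"), Element("Town")]),
      Element("Name", [Element("First")])]
matches = match_schemas(s1, s2)
```

The two Name elements are not matched because one has a subelement and the other has none, which shows how brittle a single hand-crafted rule can be.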

Learner Based Solutions
  • Learner-Based: exploit both schema and data.
  • Requires a lot of training data, but unlike rule-based matchers can exploit data instances.
  • Rule and learner based techniques combined provide an effective matching solution.

Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

Combining Different Matchers
  • The ideal matching system must exploit many different types of information and techniques for maximum accuracy.
  • More match candidates will be produced if the previous approaches are combined.
  • Two Combination Methods:

1. Hybrid: integrates multiple matching criteria.

Better performance.

2. Composite: combine the results of independently executed matchers.

More flexible.

Can be done automatically or manually.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching
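
A composite matcher, as described above, runs independent matchers and then combines their results. The sketch below assumes a weighted average as the combination step and uses two toy matchers (exact_name and shared_prefix); all names and weights are hypothetical.

```python
from typing import Callable, List, Tuple

Matcher = Callable[[str, str], float]


def exact_name(a: str, b: str) -> float:
    """1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def shared_prefix(a: str, b: str) -> float:
    """Length of the common prefix relative to the longer name."""
    a, b = a.lower(), b.lower()
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b), 1)


def composite(matchers: List[Tuple[Matcher, float]]) -> Matcher:
    """Combine independently executed matchers with a weighted average."""
    total = sum(weight for _, weight in matchers)

    def combined(a: str, b: str) -> float:
        return sum(weight * m(a, b) for m, weight in matchers) / total

    return combined


combined = composite([(exact_name, 1.0), (shared_prefix, 1.0)])
```

Because each matcher runs on its own, matchers can be added, removed, or reweighted without touching the others, which is the flexibility the slide attributes to composite matchers.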

LSD (Univ. of Washington)
  • Learning Source Descriptions
  • Uses machine learning techniques to match a new data source against a previously determined global schema.
  • Uses a name matcher and several instance-level matchers.
  • System is trained with sample user inputs and it learns patterns and matching rules.
  • Mostly instance-oriented but can use schema information too.
  • Also supports user input domain constraints on the global schema.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

SKAT (Stanford University)
  • Semantic Knowledge Articulation Tool
  • Follows a rule-based approach to semi-automatically determine matches between two ontologies.
  • User input required:

* The user must provide application specific match/mismatch relations.

* The user must approve or reject matches.

  • SKAT matching is used within the ONION architecture for ontology integration.
  • In ONION, an “articulation ontology” is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

TransScm (Tel Aviv University)
  • Uses schema matching to derive an automatic data translation between schema instances.
  • Schemas are transformed into labeled graphs.
  • Matching is performed node by node (element-level, 1:1) starting at the top.
  • Requires user intervention if no match is found (i.e. to provide a new rule).

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
  • Compares pairs of objects by their attributes and the is-a relationships that they are involved in.
  • These pairs are given a match score between 0 and 1.
  • User must specify synonyms, homonyms, and inclusion properties.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Cupid (Microsoft Research)
  • Hybrid matcher
  • Element and Structural-Level matches.

Phase 1:

Linguistic Element-Level.

- categorizes elements based on name, data types, and domains.

- calculates a linguistic similarity coefficient.

Phase 2:

- transforms the original schema into a tree, then performs bottom-up structure matching.

- calculates a similarity value.

- calculates a weighted mean of linguistic and structural similarity of pairs of elements

Phase 3:

- uses the mean from phase 2 to decide on a mapping.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching
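
The last two phases can be sketched as a weighted mean followed by a threshold-based choice. This is an illustration only: the function names, the weight w_struct = 0.5, the threshold 0.5, and the sample similarity values are assumptions, and the computation of the linguistic and structural similarities themselves is not shown.

```python
from typing import Dict, Tuple


def weighted_similarity(lsim: float, ssim: float, w_struct: float = 0.5) -> float:
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1.0 - w_struct) * lsim


def choose_mapping(scores: Dict[Tuple[str, str], Tuple[float, float]],
                   threshold: float = 0.5,
                   w_struct: float = 0.5) -> Dict[str, str]:
    """For each source element keep the best target whose weighted mean passes the threshold."""
    best: Dict[str, Tuple[str, float]] = {}
    for (source, target), (lsim, ssim) in scores.items():
        wsim = weighted_similarity(lsim, ssim, w_struct)
        if wsim >= threshold and wsim > best.get(source, ("", -1.0))[1]:
            best[source] = (target, wsim)
    return {source: target for source, (target, _) in best.items()}


# Hypothetical (lsim, ssim) values for candidate element pairs.
scores = {
    ("Cust.CName", "Customer.Company"): (0.6, 0.8),
    ("Cust.CName", "Customer.contact"): (0.5, 0.3),
    ("Cust.Fax", "Customer.CustID"): (0.1, 0.2),
}
mapping = choose_mapping(scores)
```

Raising w_struct favors pairs whose surrounding structure agrees even when their names differ.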

Clio (IBM Almaden and Univ. of Toronto)
  • Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema.
  • Three Components:

Schema Readers: read schema and translate it into an internal representation.

Correspondence Engine: is used to identify matching parts of the schemas or databases.

Mapping Generator: generates view definitions to map data in the source schema to data in the target schema.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Similarity flooding (Stanford Univ. and Univ. of Leipzig)
  • Graph Matching Algorithm.
  • Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs.
  • Uses a name matcher to get an initial element-level match that is then given to the structural matcher.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching
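
The idea of passing an initial name-based match to a structural matcher can be sketched as an iterative propagation over pairs of nodes. The code below is a heavily simplified illustration, not the published algorithm: it omits propagation coefficients, runs a fixed number of iterations instead of testing for convergence, and all names and sample data are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]   # (source node, edge label, target node)
Pair = Tuple[str, str]        # (node of graph 1, node of graph 2)


def flood(edges1: List[Edge], edges2: List[Edge],
          initial: Dict[Pair, float], iterations: int = 10) -> Dict[Pair, float]:
    """Let similarity flow between node pairs connected by equally labeled edges."""
    neighbors = defaultdict(set)
    for a, label1, a2 in edges1:
        for b, label2, b2 in edges2:
            if label1 == label2:
                neighbors[(a, b)].add((a2, b2))
                neighbors[(a2, b2)].add((a, b))
    pairs = set(neighbors) | set(initial)
    sigma = {p: initial.get(p, 0.0) for p in pairs}
    for _ in range(iterations):
        # New value: initial similarity + current value + neighbors' values.
        nxt = {p: initial.get(p, 0.0) + sigma[p]
                  + sum(sigma[n] for n in neighbors.get(p, ()))
               for p in pairs}
        top = max(nxt.values(), default=0.0) or 1.0
        sigma = {p: value / top for p, value in nxt.items()}  # normalize to [0, 1]
    return sigma


# Hypothetical schema graphs and initial name-matcher output.
g1 = [("Cust", "has", "Name")]
g2 = [("Customer", "has", "FullName"), ("Customer", "has", "Phone")]
start = {("Cust", "Customer"): 1.0, ("Name", "FullName"): 0.5,
         ("Name", "Phone"): 0.0}
result = flood(g1, g2, start)
```

In this toy run the pair (Name, Phone) gains some similarity purely from its neighbor (Cust, Customer), but stays below (Name, FullName), which also had name-based support.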

Delta (Mitre)
  • Uses attribute descriptions to determine attribute matches.
  • The method is to group the metadata about an attribute into a text string, which is presented as a document. The user is then presented with other ‘documents’ with matching attributes and can choose from those.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

Tess (Univ. of Massachusetts, Amherst)
  • System for helping to cope with schema evolution.
  • Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema.

Bernstein P, Rahm E. A survey of approaches to automatic schema matching

MWSAF: METEOR-S Web Service Annotation Framework (LSDIS Lab, UGA)
  • What is it?

A tool for semi-automatically marking up web service descriptions with ontologies.

It helps in describing services semantically and aids in efficient web service discovery and composition.

MWSAF Annotation Tool
  • Input: WSDL File
    • Individual elements of the WSDL are matched to concepts in the domain
    • The WSDL is classified into a domain.
    • The Matches are given to the user to accept or reject.
    • Upon the user’s acceptance, the annotations are written to the WSDL.
  • Output: WSDL File with semantic annotations
MWSAF Architecture

Main Components of the System:

  • Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain.
  • Parser Library: consists of the parsers used to generate the SchemaGraphs.
  • Matcher Library: provides schema matching algorithm.

Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework

MWSAF Schema Graphs

PROBLEM: The difference in expressiveness of XML Schema and ontology makes it very difficult to match these two models directly.

MWSAF converts both models to a common representation format called SchemaGraph.

A SchemaGraph is a set of nodes connected by edges that are created using conversion functions.

Then it applies a matching algorithm to find the mappings between them.

Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework

MWSAF: METEOR-S Web Service Annotation Framework: XML to SchemaGraph conversion rules

<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass" />
    <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int" />
  </xsd:sequence>
</xsd:complexType>

[Figure: SchemaNode representation of XML schema, with nodes Direction, compass, and degrees and hasElement edges.]

Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework.
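
The conversion of the Direction complex type into nodes and hasElement edges can be sketched with the standard library XML parser. This is a rough illustration, not MWSAF's actual conversion functions: the to_schema_graph name, the wrapping xsd:schema element, the 2001 XML Schema namespace URI, and the urn:example:types namespace are all assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; the slide does not show the schema's namespace declarations.
XSD_NS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                        xmlns:xsd1="urn:example:types">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                   nillable="true" type="xsd1:DirectionCompass" />
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def to_schema_graph(xsd_text: str):
    """Turn each complex type into a node with hasElement edges to its elements."""
    root = ET.fromstring(xsd_text)
    nodes, edges = set(), []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        parent = ctype.get("name")
        nodes.add(parent)
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            child = elem.get("name")
            nodes.add(child)
            edges.append((parent, "hasElement", child))
    return nodes, edges


nodes, edges = to_schema_graph(SCHEMA)
```

An ontology would be converted by a separate set of rules (for example, properties becoming hasProperty edges), after which both sides share one graph format that a single matcher can compare.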

MWSAF: METEOR-S Web Service Annotation Framework: Ontology to SchemaGraph conversion rules

<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent" />
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string" />
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="#Speed" />
</daml:Property>

[Figure: SchemaGraph representation of part of ontology, with nodes WindEvent, windSpeed, windDirection, and Speed and hasProperty edges.]

Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework.

Mapping
  • Measures of the Match Score:

-Element Level Match: linguistic similarity of two concepts based on names. Uses WordNet to check for synonyms. Abbreviations are also checked.

-Schema Match: structural similarity, sub-concept similarities.

  • The getBestMapping function then looks at the Match Scores and determines a map set.

Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework

MWSAF Matching Techniques: ElemMatch
  • Name and String Matching algorithms:

-NGram: considers the number of q-grams that the names have in common.

-CheckSynonym: uses WordNet to find synonyms.

-CheckAbbreviations: uses an abbreviation dictionary.

-TokenMatcher: uses the Porter stemmer, tokenization, and substring matching techniques.

  • Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score.

Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework
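
Two of these algorithms are easy to sketch: a q-gram overlap score and an abbreviation lookup. The code below is illustrative only. The Dice coefficient, the q = 3 default, the ABBREVIATIONS table, and taking the maximum as the final score are assumptions, since the slide does not give the equation MWSAF uses to combine the values.

```python
def qgrams(name: str, q: int = 3) -> set:
    """Set of all substrings of length q (the whole name if it is shorter)."""
    name = name.lower()
    if len(name) < q:
        return {name}
    return {name[i:i + q] for i in range(len(name) - q + 1)}


def ngram_score(a: str, b: str, q: int = 3) -> float:
    """Dice coefficient over the q-grams the two names have in common."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


# Hypothetical abbreviation dictionary.
ABBREVIATIONS = {"temp": "temperature", "addr": "address"}


def abbreviation_score(a: str, b: str) -> float:
    """1.0 if one name is a known abbreviation of the other, else 0.0."""
    a, b = a.lower(), b.lower()
    return 1.0 if ABBREVIATIONS.get(a) == b or ABBREVIATIONS.get(b) == a else 0.0


def elem_match(a: str, b: str) -> float:
    """Assumed combination: take the best score of the individual algorithms."""
    return max(ngram_score(a, b), abbreviation_score(a, b))
```

Both functions return a value between 0 and 1, matching the contract the slide describes for each ElemMatch algorithm.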

Matching
  • Once each WSDL has been compared against all of the ontologies in the store and a mapping has been created for each ontology, two measures are derived from the mapping:

-Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and ontology.

-Average Service Match: helps to categorize the service.

*We have a machine learning alternative for categorization!

Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework
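
One plausible reading of the two measures is sketched below: the average concept match divides the summed match scores by the number of mapped concepts, while the average service match divides by the total number of concepts in the WSDL. These exact formulas, the function names, and the sample scores are assumptions made for illustration and should be checked against the MWSAF paper.

```python
from typing import Dict, List


def average_concept_match(match_scores: List[float]) -> float:
    """Mean match score over the WSDL concepts that were actually mapped."""
    return sum(match_scores) / len(match_scores) if match_scores else 0.0


def average_service_match(match_scores: List[float], total_wsdl_concepts: int) -> float:
    """Summed match scores divided by all WSDL concepts, mapped or not."""
    return sum(match_scores) / total_wsdl_concepts if total_wsdl_concepts else 0.0


def categorize(service_match_by_domain: Dict[str, float]) -> str:
    """Assign the service to the domain whose ontology gives the highest score."""
    return max(service_match_by_domain, key=service_match_by_domain.get)


# Hypothetical mapping: 2 of 4 WSDL concepts were mapped to a weather ontology.
weather_scores = [0.9, 0.7]
concept_match = average_concept_match(weather_scores)
service_match = average_service_match(weather_scores, total_wsdl_concepts=4)
domain = categorize({"weather": service_match, "geography": 0.1})
```

Under this reading, a service with a few very good matches scores high on concept match but low on service match, so the second measure is the safer one for choosing a domain.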

Current and Future Issues
  • User Interaction: minimize user input but maximize impact of the feedback
  • Real World Analysis: can the current matching techniques be used in real world situations?
  • P2P data management
  • Mapping Maintenance: what happens when you map between two schemas and then one changes?
  • Developing global schemas (or ontologies) for domains.
  • Dealing with inconsistent data values for a schema element.

Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

More Issues
  • If we require user acceptance for our matches, then what happens if our matcher returns hundreds or thousands of matches?
  • Is it unrealistic to think that we will eventually perfect our matchers?

Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.

Conclusion
  • It is necessary to automate the matching process.
  • Schema matching is very difficult and expensive.
  • We have looked at a taxonomy and the descriptions of the existing approaches for matching.

-Schema vs Instance-level

-Element vs Structure-level

-Language and Constraint based matchers.

  • We also discussed several implementations of the matching techniques.
References
  • Bernstein P, Rahm E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
  • Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
  • Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework. POSV-WWW2004.pdf
  • Christophides V. Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf