
Schema Matching: Different Approaches







Khalid Saleem

LIRMM


[Figure: XML, XML Schema, RDF, RDF Schema, OWL]

Schema and Ontology

  • Schemas come from the database community

    • Schemas often do not provide explicit semantics for their data (e.g. ER models, XML document schemas).

  • Ontologies come from the AI community

    • Ontologies are logical systems that themselves obey a formal semantics; they are designed to be interpreted by computers for reasoning (e.g. OWL).

  • Schemas and ontologies are similar in that

    • both provide a vocabulary of terms that describes a domain

    • both constrain the meaning of the terms used in the vocabulary (hierarchies, relations)


    <class-def>
      <name>branch</name>
      <slot-constraint>
        <name>is-part-of</name>
        <has-value>tree</has-value>
      </slot-constraint>
    </class-def>

XML

    class-def animal
    % plants are a class that is disjoint from animals
    class-def plant subclass-of NOT animal
    % it is necessary but not sufficient for a tree to be a plant
    class-def tree subclass-of plant
    % branches are PART OF trees
    class-def branch
      slot-constraint is-part-of has-value tree
    % it is necessary and sufficient for a carnivore to be an animal that eats only animals
    class-def defined carnivore subclass-of animal
      slot-constraint eats value-type animal
    % herbivores eat only plants OR parts of plants
    class-def defined herbivore subclass-of animal
      slot-constraint eats value-type plant OR
        (slot-constraint is-part-of has-value plant)

DAML+OIL

Schema vs Ontology: Examples
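To illustrate the kind of explicit knowledge an ontology carries and a plain schema does not, here is a minimal Python sketch that encodes only the subclass and disjointness axioms of the DAML+OIL example and derives a consequence from them by transitive closure. It is a toy, not a DAML+OIL reasoner: slot constraints and the distinction between defined and primitive classes are ignored, and the names `SUBCLASS_OF`, `DISJOINT`, `superclasses` and `provably_disjoint` are hypothetical.

```python
# Toy encoding of two kinds of axioms from the example:
# "tree subclass-of plant" and "plant subclass-of NOT animal" (disjointness).
SUBCLASS_OF = {"tree": {"plant"}, "carnivore": {"animal"}, "herbivore": {"animal"}}
DISJOINT = {frozenset({"plant", "animal"})}


def superclasses(cls: str) -> set[str]:
    """Every class that cls is (transitively) a subclass of, including cls itself."""
    result, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def provably_disjoint(a: str, b: str) -> bool:
    """a and b share no instances if some pair of their ancestors is declared disjoint."""
    return any(frozenset({x, y}) in DISJOINT
               for x in superclasses(a) for y in superclasses(b))


print(provably_disjoint("tree", "carnivore"))  # True: tree is a plant, carnivore an animal
print(provably_disjoint("tree", "plant"))      # False
```

A schema that merely declares `tree` and `carnivore` elements gives a matcher no such guarantee; it would have to guess from names or data.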


Books (Source A): price, book-title, author-name

Books (Source B): listed-price, title, a-fname, a-lname

Sample records:

    price   title                 author
    16.50   Nous Les Dieux        Bernard Werber
    24      Pompei                Robert Harris
    26.60   Harry Potter          J. K. Rowling
    11.50   Marie Des Intrigues   Juliette Benzoni

Match

  • Takes two schemas/ontologies as input and produces a mapping between elements of the two schemas that correspond semantically to each other

[Figure: a 1-1 match and a complex match between elements of Source A and Source B]
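Once Match has produced a mapping, it can be used to translate instances from one source into the other. The sketch below assumes, purely for illustration, that the correspondences are price ↔ listed-price and book-title ↔ title (1-1 matches) and author-name ↔ a-fname + a-lname (a complex match); these pairings, the concatenation rule and the names `MAPPING` and `translate` are hypothetical.

```python
from typing import Callable

Record = dict[str, str]

# Hypothetical mapping from Source B to Source A: target element ->
# (source elements, function combining their values).
MAPPING: dict[str, tuple[list[str], Callable[..., str]]] = {
    "price": (["listed-price"], lambda price: price),        # 1-1 match
    "book-title": (["title"], lambda title: title),          # 1-1 match
    "author-name": (["a-fname", "a-lname"],                  # complex match
                    lambda first, last: f"{first} {last}"),
}


def translate(record_b: Record) -> Record:
    """Rewrite one Source B record in Source A's vocabulary using MAPPING."""
    return {
        target: combine(*(record_b[src] for src in sources))
        for target, (sources, combine) in MAPPING.items()
    }


row = {"listed-price": "24", "title": "Pompei", "a-fname": "Robert", "a-lname": "Harris"}
print(translate(row))
# {'price': '24', 'book-title': 'Pompei', 'author-name': 'Robert Harris'}
```

The difference between the two kinds of match shows up only in the number of source elements each entry consumes; the reverse direction of the complex match (splitting a full name) would need a separate, less reliable function.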


Schema Matching vs Ontology Matching

  • Schema matching is usually performed with techniques that try to guess the meaning encoded in the schemas.

  • Ontology matching tries to exploit the knowledge explicitly encoded in the ontologies.

In real-world applications, solutions from both domains are mutually beneficial.


Application Domains

  • Traditional (Static)

    • Schema Integration

    • Data warehousing

    • E-commerce

    • Catalogue Integration

  • New Frontiers (Dynamic)

    • Semantic Query Processing

    • Agent Communication

    • Web Services Integration

    • P2P Databases


Basic Classification of Matchers [RB01]

  • Schema vs Data Instance

  • Element vs Structure

  • Language vs Constraint

    • String-based: prefix, suffix, e.g. auth ↔ author

    • Tokenization, lemmatization, elimination [GSY04]

      Tool_Kit → (Tool, Kit); Kits → Kit; IsRelatedTo → Related

    • Data types and value domains, e.g. values in 1..12 suggest a month

  • Match cardinalities: 1:1, 1:n, n:m

    e.g. (Tel Res, Other) ↔ (Tel Day, Evening, Night)

  • Auxiliary information

    • Global schemas, dictionaries, thesauri, previous match decisions, user input
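The language-based preprocessing steps and the prefix matcher can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the stop-word list is invented, and naive plural stripping stands in for real lemmatization (a production matcher would use a proper lemmatizer and thesaurus). The function names are hypothetical.

```python
import re

# Illustrative stop-word list (an assumption, not taken from [GSY04]).
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def tokenize(label: str) -> list[str]:
    """Extract word parts, splitting at non-letters and camelCase boundaries; lower-case them."""
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", label)]


def lemmatize(token: str) -> str:
    """Naive plural stripping standing in for a real lemmatizer (kits -> kit)."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(label: str) -> list[str]:
    """Tokenize, eliminate stop words, then lemmatize what remains."""
    return [lemmatize(t) for t in tokenize(label) if t not in STOP_WORDS]


def prefix_match(a: str, b: str) -> bool:
    """String-based matcher: one label is a prefix of the other (auth ~ author)."""
    a, b = a.lower(), b.lower()
    return a.startswith(b) or b.startswith(a)


print(normalize("Tool_Kit"), normalize("Kits"), normalize("IsRelatedTo"))
# ['tool', 'kit'] ['kit'] ['related']
print(prefix_match("auth", "author"))  # True
```

Running elimination before lemmatization keeps short function words such as "is" from being altered by the suffix rule.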


Basic Classification of Matchers [SE05]

  • Structure Level Techniques

    • Graph Matching

    • Children

    • Leaves

    • Relations

  • Taxonomy-based techniques

    e.g. if two super-concepts match, their sub-concepts are likely to match as well, and vice versa

  • Model Based

    • ER, XML or XML schema, OWL, OO etc.

Combining Matchers [RB01]

  • Hybrid Matcher

  • Multiple/Composite Matcher
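In [RB01], a hybrid matcher integrates several criteria inside one algorithm, whereas a composite matcher runs independent matchers and combines their results. The Python sketch below shows the composite style using a weighted average of two toy matchers; the thesaurus entries, the equal weights and all function names are hypothetical, and averaging is only one of several possible combination strategies.

```python
from difflib import SequenceMatcher
from typing import Callable

Matcher = Callable[[str, str], float]

# Hypothetical thesaurus entries, for illustration only.
SYNONYMS = {frozenset({"author", "writer"}), frozenset({"price", "cost"})}


def string_matcher(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] computed by difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def synonym_matcher(a: str, b: str) -> float:
    """1.0 for identical or thesaurus-synonymous labels, else 0.0."""
    return 1.0 if a == b or frozenset({a, b}) in SYNONYMS else 0.0


def composite(matchers: list[tuple[Matcher, float]]) -> Matcher:
    """Composite matcher: run each matcher independently, combine by weighted average."""
    total = sum(weight for _, weight in matchers)
    return lambda a, b: sum(weight * m(a, b) for m, weight in matchers) / total


match = composite([(string_matcher, 0.5), (synonym_matcher, 0.5)])
print(match("author", "writer"), match("author", "title"))
```

Because each matcher stays independent, new matchers (structural, data-type based) can be added to the list without touching the others.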


Match Dimensions [SE05]

When designing match algorithms, we need to know the dimensions along which they can be characterised:

  • Input of the algorithm

    • Data or schema, element level or structure level

  • Characteristics of the matching process

    • Exact or approximate matching required

    • Trade-off between performance and quality

  • Output of the algorithm

    • A graded result, or the result of one of a set of match algorithms that are combined to produce the final mapping
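A graded output still has to be turned into a mapping. The sketch below keeps, for each source element, its best-scoring target provided the score reaches a threshold; a threshold of 1.0 accepts only exact matches, lower values admit approximate ones. The scores are invented for illustration, and this greedy per-element selection (which does not enforce 1:1) is an assumption, not a method prescribed by [SE05].

```python
Scores = dict[tuple[str, str], float]


def select(scores: Scores, threshold: float) -> dict[str, str]:
    """For each source element keep its best-scoring target, if that score reaches threshold."""
    best: dict[str, tuple[str, float]] = {}
    for (src, tgt), score in scores.items():
        if score >= threshold and (src not in best or score > best[src][1]):
            best[src] = (tgt, score)
    return {src: tgt for src, (tgt, _) in best.items()}


# Invented graded scores between Source A and Source B elements (illustration only).
scores = {
    ("price", "listed-price"): 0.8, ("price", "title"): 0.1,
    ("book-title", "title"): 0.9, ("book-title", "listed-price"): 0.2,
    ("author-name", "a-lname"): 0.4,
}
print(select(scores, 1.0))  # {}
print(select(scores, 0.5))  # {'price': 'listed-price', 'book-title': 'title'}
```

Lowering the threshold trades precision for recall: at 0.3 the weak author-name candidate is also accepted.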


Existing Matching Tools

  • Cupid [MBR01]

  • COMA (COMA++) [ADMR05]

  • Similarity Flooding

  • SemInt

  • Artemis

  • DIKE

  • TransScm

  • AutoMed

  • Charlie [TBBT04]

Ontology-Specific

  • NOM/ QOM

  • OLA

  • Anchor-PROMPT

  • S-Match [GSY04]

  • HICAL

  • SKAT


Matching Tools (continued)

Machine Learning

  • GLUE (LSD, CGLUE)[DMDH02]

  • Automatch

  • These tools do not fully meet the requirements of large-scale schema matching because they

    • are not fully automated

    • put less emphasis on search space optimisation


[Figure: example XML schema trees; node labels are abbreviated as listed in the legend below]

Our Approach

Legend: a: author, b: book, d: detail, f: information, g: general, h: birth, i: isbn, n: name, o: own-books, p: publisher, r: price, t: title, w: writer

  • Motivation :

    • Large Scale Scenario

      Peer-to-peer Information Systems over the XML Web

  • Our Schema Matching and Integration Approach

    • Tree Mining Techniques

      • Name Matcher

      • Element Level Matching

      • Structure Level Matching

[Figure: correspondences a = w (author = writer), b = o (book = own-books), f = d (information = detail); search sub-trees]
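The element-level name matcher groups labels that are identical or synonymous into clusters. A minimal union-find sketch is shown below, assuming a small hand-made synonym list that reproduces the three correspondences above; the synonym list, the choice of the alphabetically smallest label as cluster representative and the function name are hypothetical.

```python
def cluster_labels(labels: list[str], synonyms: list[tuple[str, str]]) -> dict[str, str]:
    """Union-find over distinct labels: identical labels share a key by construction,
    and each synonym pair merges two clusters. Returns label -> cluster representative."""
    parent = {label: label for label in labels}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in synonyms:
        if a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra != rb:
                # keep the alphabetically smaller label as representative
                parent[max(ra, rb)] = min(ra, rb)
    return {label: find(label) for label in labels}


labels = ["author", "book", "detail", "information", "own-books", "title", "writer"]
synonyms = [("author", "writer"), ("book", "own-books"), ("information", "detail")]
clusters = cluster_labels(labels, synonyms)
print(clusters["writer"], clusters["own-books"], clusters["information"])  # author book detail
```

Union-find keeps clustering close to linear in the number of labels, which matters when many schema trees are matched at once.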


[Figure: example schema tree with node scopes]

    book n0 [0,5]
      author n1 [1,2]
        name n2 [2,2]
      publisher n3 [3,4]
        name n4 [4,4]
      title n5 [5,5]

Tree Mining Approach

Inspired by the tree mining algorithms and data structures of [Z02], which are based on node scope values calculated by a top-down, depth-first pre-order traversal.

  • Our work extends these data structures to the schema matching and integration process, in order to handle large sets of XML schema trees.

  • Employs

    • an element-level name matcher (same node label or synonym)

      • clusters similar or synonymous labels

    • the properties of node scope values, to extract semantics from structure

      • e.g. the node labelled name, n2 [2,2], is a descendant of the node labelled author, n1 [1,2], and not of the node labelled publisher, n3 [3,4], as verified by the descendant test

Descendant node check:

If node x has scope [X, Y] and node xd has scope [Xd, Yd], then xd is a descendant of x if and only if Xd > X and Yd <= Y.
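The scope computation and the descendant check can be sketched as follows. This assumes the scope convention used in the example tree, where a node's scope is [its own pre-order number, the pre-order number of its last descendant]; the `Node` class and function names are hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)
    scope: tuple[int, int] = (-1, -1)  # [X, Y], filled in by assign_scopes


def assign_scopes(root: Node) -> list[Node]:
    """Number nodes in depth-first pre-order; a node's scope is [own number,
    number of its last descendant], so a leaf numbered k gets [k, k]."""
    order: list[Node] = []

    def visit(node: Node) -> int:
        start = len(order)
        order.append(node)
        end = start
        for child in node.children:
            end = visit(child)  # the last child's subtree ends last in pre-order
        node.scope = (start, end)
        return end

    visit(root)
    return order


def is_descendant(xd: Node, x: Node) -> bool:
    """Descendant node check: xd lies below x iff Xd > X and Yd <= Y."""
    (x_lo, x_hi), (xd_lo, xd_hi) = x.scope, xd.scope
    return xd_lo > x_lo and xd_hi <= x_hi


# The example tree: book(author(name), publisher(name), title)
n2, n4 = Node("name"), Node("name")
author, publisher = Node("author", [n2]), Node("publisher", [n4])
book = Node("book", [author, publisher, Node("title")])
order = assign_scopes(book)
print([n.scope for n in order])  # [(0, 5), (1, 2), (2, 2), (3, 4), (4, 4), (5, 5)]
print(is_descendant(n2, author), is_descendant(n2, publisher))  # True False
```

Once scopes are assigned, each ancestor/descendant question costs two integer comparisons instead of a tree walk.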


Tree Mining Approach … continued

  • Data structures used

    • Label List: sorted list of all node labels in the forest of XML schema trees

    • xGrid: a matrix in which each row represents a participating XML tree and each column a node label. Each cell contains the node's scope values, its parent node number and mapping information.

  • Output

    • A mediated schema tree, created from the given forest of participating XML schema trees

    • Mapping information between the participating schema trees and the mediated schema tree
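A rough Python sketch of the Label List and the xGrid follows, under several assumptions: each schema tree is supplied as pre-order (label, scope, parent number) triples; rows are stored sparsely as dictionaries keyed by column number; a cell holds a list because a label such as name can occur more than once in one tree; and the mapping field is set to the label's column number, following the slide statement that mapping information is the column number of the node. Synonym clustering and construction of the mediated schema are omitted, and the names `Cell` and `build_xgrid` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    scope: tuple[int, int]  # pre-order scope [X, Y] of the node
    parent: int             # pre-order number of the parent node (-1 for the root)
    mapping: int            # column number of the node's label (assumed meaning)


# A schema tree as pre-order (label, scope, parent) triples.
TreeNodes = list[tuple[str, tuple[int, int], int]]


def build_xgrid(trees: list[TreeNodes]) -> tuple[list[str], list[dict[int, list[Cell]]]]:
    """Return the sorted Label List and an xGrid with one row per tree;
    each row maps a column number to the cells of the nodes carrying that label."""
    label_list = sorted({label for tree in trees for label, _, _ in tree})
    column = {label: i for i, label in enumerate(label_list)}
    grid: list[dict[int, list[Cell]]] = []
    for tree in trees:
        row: dict[int, list[Cell]] = {}
        for label, scope, parent in tree:
            col = column[label]
            row.setdefault(col, []).append(Cell(scope, parent, col))
        grid.append(row)
    return label_list, grid


s1 = [("book", (0, 5), -1), ("author", (1, 2), 0), ("name", (2, 2), 1),
      ("publisher", (3, 4), 0), ("name", (4, 4), 3), ("title", (5, 5), 0)]
s2 = [("book", (0, 2), -1), ("title", (1, 1), 0), ("price", (2, 2), 0)]
labels, grid = build_xgrid([s1, s2])
print(labels)  # ['author', 'book', 'name', 'price', 'publisher', 'title']
```

Storing rows sparsely keeps memory proportional to the number of nodes rather than trees × labels, which is an assumption made here for large forests.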


[Figure: mediated schema tree Sm and participating schema trees S1, S2, S3 and S4]

Mapping information is the column number of the node.

Tree Mining Approach … continued


Conclusion

  • Element-level name and linguistic matching, supported by a thesaurus, is an integral part of every match system.

  • As systems move towards schema/ontology-based manipulation, and in the absence of global schemas or previous match results, structure-level matching is equally important for recovering semantics.

  • Peer-to-peer environments require new methods that deliver both performance and mapping quality, e.g. integrating tree mining techniques into matching and optimising the search space.

  • Machine learning algorithms can be beneficial in P2P environments at later stages, once training examples have been created from instance data, provided the target domain remains the same.


References

  • [AH04] Antoniou G. and van Harmelen F. (2004) A Semantic Web Primer. The MIT Press.

  • [ADMR05] Aumueller D., Do H. H., Massmann S. and Rahm E. (2005) Schema and Ontology Matching with COMA++. Proceedings of the International Conference on Management of Data (SIGMOD 2005).

  • [BR04] Bellahsène Z. and Roantree M. (2004) Querying Distributed Data in a Super-Peer Based Architecture. DEXA 2004.

  • [BMP04] Bernstein P. A., Melnik S., Petropoulos M. and Quix C. (2004) Industrial-Strength Schema Matching. SIGMOD Record 33(4), December 2004.

  • [DMDH02] Doan A., Madhavan J., Domingos P. and Halevy A. (2002) Learning to Map between Ontologies on the Semantic Web. WWW 2002.

  • [MBR01] Madhavan J., Bernstein P. A. and Rahm E. (2001) Generic Schema Matching with Cupid. VLDB 2001.

  • [RB01] Rahm E. and Bernstein P. A. (2001) A Survey of Approaches to Automatic Schema Matching. VLDB Journal 10(4):334-350.

  • [SE05] Shvaiko P. and Euzenat J. (2005) A Survey of Schema-Based Matching Approaches. Journal on Data Semantics, 2005.

  • [TBBT04] Tranier J., Baraer R., Bellahsène Z. and Teisseire M. (2004) Where's Charlie: Family-Based Heuristics for Peer-to-Peer Schema Integration. IDEAS 2004, pp. 227-235.

  • [Z02] Zaki M. J. (2002) Efficiently Mining Frequent Trees in a Forest. 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2002.

  • http://www.w3.org/TR/daml+oil-reference

  • http://www.doc.ic.ac.uk/automed/


Thank you

