1 LRI – Paris Sud University & CNRS, 2 ETIS, Cergy-Pontoise, 3 INRIA Saclay Île de France, WOD 2013

N2R-Part: Identity Link Discovery using Partially Aligned Ontologies, by Nathalie Pernelle 1, Fatiha Saïs 1, Brigitte Safar 1, Maria Koutraki 2 and Tushar Ghosh 1,3. 1 LRI – Paris Sud University & CNRS; 2 ETIS, Cergy-Pontoise; 3 INRIA Saclay Île de France. WOD 2013.


Presentation Transcript


N2R-Part: Identity Link Discovery using Partially Aligned Ontologies, by Nathalie Pernelle 1, Fatiha Saïs 1, Brigitte Safar 1, Maria Koutraki 2 and Tushar Ghosh 1,3

1 LRI – Paris Sud University & CNRS

2 ETIS, Cergy-Pontoise

3 INRIA Saclay Île de France

WOD 2013


Context

  • Discover identity links between data items in RDF data sources structured by distinct OWL ontologies

    e.g. same restaurant, same laboratory …

  • Existing data linking tools exploit the mapped entities (classes, properties) of the ontologies to define linking rules [Silk, LDIF]

    A = {HumanBeing ↔ Person, foaf:name ↔ name, …}

  • Some of these mappings can be declared or discovered by (semi-)automatic alignment tools [Shvaiko et al. 2012]

    • But the set of mappings can be incomplete, in particular the set of property mappings


Two simple ontologies

[Figure: two restaurant ontologies, O1 and O2. Classes shown: Restaurant, Person and Address (in both ontologies) and Chief; datatype: String. The legend distinguishes subsumption links, mapped properties and unmapped properties. Property labels shown: street, city, location / hasLocation, own / hasOwner, hasChief, hasCook, food, cuisineType, smoking, name, phonenum / phone, acceptedCard / creditCard, rname / title.]

Class mappings (complete set): {Restaurant ↔ Restaurant, …}

Property mappings: {street ↔ street, rname ↔ title, city ↔ city, hasLocation ↔ location}


Two Restaurants to compare

[Figure: RDF descriptions of two restaurants, r1 and r2, one expressed in O1 and the other in O2, each linked to an address (a1, a2) and a person (p1, p2). Both restaurants are called "Lotus bleu" (name / title). Other edge labels: food / cuisineType (values thai, asian, chinese), creditCard / acceptedCard (values Visa card, Mastercard), smoking (value "Only at bar"), location / hasLocation, own / hasOwner, phone / phonenum (value 3368555158).]


Aim

The “values” of the mapped properties can be very heterogeneous, or even unknown for some instances

Street: Downing St, London, SW1A 2AA

10 Downing street

How can recall be improved in such a context?


Main ideas

  • Exploit unmapped properties to increase the similarity scores

    • Exploit the ontology semantics and the property values to select the best comparable properties for two compared class instances

    • Combine similarities between mapped properties and selected unmapped properties

  • Propagate the similarities thanks to a graph-based data linking approach

    same Restaurant ↔ same Address ↔ same City ↔ same Country

    • Focus on data sources that can be replicated locally

  • Extend an existing graph-based data linking tool

    (N2R [Saïs et al. 09])


N2R Linking Tool

  • Knowledge-based approach (i.e. keys)

    Common mapped keys of O1/O2 (Cartesian product)

    O1: name; O2: birthDate, deathDate → name + birthDate + deathDate

  • Non-linear equation system

    • Each equation represents how a similarity score xi can be computed using related similarity scores

      fi(X) = max(fi-df(X), fi-ndf(X))

    • Solved thanks to an iterative method
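
The iterative resolution can be illustrated with a minimal fixed-point sketch. It is not the N2R implementation: the `solve` function, the stopping threshold `eps`, the two toy equations and all numeric values are assumptions made for illustration, and the reading of fi-df as the key-based part and fi-ndf as the remaining part is also an assumption. Each equation only keeps the max(…, …) shape shown above.

```python
from typing import Callable, Dict, Tuple

Pair = Tuple[str, str]
Scores = Dict[Pair, float]
Equation = Callable[[Scores], float]


def solve(equations: Dict[Pair, Equation], eps: float = 1e-4, max_iter: int = 100) -> Scores:
    """Iterate x <- f(x), starting from 0, until no score moves by more than eps."""
    if not equations:
        return {}
    scores: Scores = {pair: 0.0 for pair in equations}
    for _ in range(max_iter):
        # Every equation is evaluated on the scores of the previous iteration.
        updated = {pair: f(scores) for pair, f in equations.items()}
        if max(abs(updated[p] - scores[p]) for p in scores) <= eps:
            return updated
        scores = updated
    return scores


# Hypothetical toy system: each equation is max(key-based part, other part).
equations: Dict[Pair, Equation] = {
    # Restaurants: similarity of their addresses, or a weighted name similarity (0.9).
    ("r1", "r2"): lambda x: max(x[("a1", "a2")], 0.5 * 0.9),
    # Addresses: a street similarity (0.8), or a weighted restaurant similarity.
    ("a1", "a2"): lambda x: max(0.8, 0.5 * x[("r1", "r2")]),
}
scores = solve(equations)
```

In this toy system the address score reaches 0.8 in the first iteration and is propagated to the restaurant pair in the second, which is the kind of propagation the graph-based approach relies on.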


Impacts and propagation

[Figure: propagation graph over the instance pairs (r1,r2), (a1,a2) and (p1,p2); edges between pairs are labelled "key". Value sets attached to the pairs:

  • Mapped: {Le lotus bleu}, {le lotus bleu}

  • Mapped: {17 rue Polar}, {rue Polar}

  • Mapped: {Paris}, {Paris}

  • Mapped: {Chang Lee}, {Chang lee}

  • Best comparable datatype properties: {thai, asian}, {thai, asian, chinese}; {3368555158}, {3368555158, 33888…}; {Visacard}, {Mastercard, Visacard}

  • Best comparable object properties]


Comparable properties

  • Exploit the ontology to select comparable properties

  • Comparable object properties

    there exist one compatible (more specific or equivalent) domain and one compatible range; inverse properties are also considered

    own (domain Person, range Restaurant)

    is comparable to

    Inverse(hasOwner) (domain Restaurant, range Person)

    Inverse(hasChief) (domain Restaurant, range Chief)

  • Comparable datatype properties

    compatible w.r.t. the XML Schema datatypes

    cuisineType is comparable to food, acceptedCard … (domain Restaurant, range string)
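
The object-property test can be sketched as follows. This is a hypothetical reading of the rule, not the tool's code: the `Prop` class, the helper names, and the assumption that Chief is a subclass of Person in the second ontology are all illustrative. Two classes are treated as compatible when one is mapped to the other or to one of its superclasses, and each candidate property is tried both directly and inverted. The domain and range given in parentheses on the slide are read here as those of hasOwner and hasChief themselves.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Prop:
    name: str
    domain: str
    rng: str

    def inverse(self) -> "Prop":
        # The inverse property swaps domain and range.
        return Prop(f"Inverse({self.name})", self.rng, self.domain)


def ancestors_or_self(cls: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return cls together with all its superclasses."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def compatible(c1: str, c2: str, mappings: Set[Tuple[str, str]],
               parents1: Dict[str, List[str]], parents2: Dict[str, List[str]]) -> bool:
    """c1 and c2 are equivalent, or one is more specific than the other's counterpart."""
    return any(
        (c1 == a and b in ancestors_or_self(c2, parents2))
        or (c2 == b and a in ancestors_or_self(c1, parents1))
        for a, b in mappings
    )


def comparable(p1: Prop, props2: List[Prop], mappings: Set[Tuple[str, str]],
               parents1: Dict[str, List[str]], parents2: Dict[str, List[str]]) -> List[str]:
    """Names of the properties (or inverses) whose domain and range are compatible with p1."""
    found = []
    for p2 in props2:
        for cand in (p2, p2.inverse()):
            if (compatible(p1.domain, cand.domain, mappings, parents1, parents2)
                    and compatible(p1.rng, cand.rng, mappings, parents1, parents2)):
                found.append(cand.name)
    return found


class_mappings = {("Person", "Person"), ("Restaurant", "Restaurant")}
parents_first: Dict[str, List[str]] = {}
parents_second = {"Chief": ["Person"]}  # assumption: Chief is a subclass of Person

own = Prop("own", "Person", "Restaurant")
candidates = [Prop("hasOwner", "Restaurant", "Person"), Prop("hasChief", "Restaurant", "Chief")]
result = comparable(own, candidates, class_mappings, parents_first, parents_second)
```

Under these assumptions `result` contains Inverse(hasOwner) and Inverse(hasChief), matching the example above. Datatype properties would need a separate check on their XML Schema datatypes, which this sketch leaves out.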


Similarity of best comparable properties

Exploit property values to select the best comparable properties for two compared class instances

  • For two datatype property values: elementary similarity measures

    sim("asian", "asian") = 1

  • Sum of the similarity scores above a given threshold

    (i1, i2, prop1, prop2, sum, maxNumberOfPropertyInstances)

    (r1, r2, cuisineType, food, 2, 3)

  • Finally, similarity of (r1, r2) based on unmapped datatype properties

    simNAP(r1, r2) = (1 + 2 + 1) / (2 + 3 + 2) ≈ 0.57

  • Same process for object property values, but with propagation
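
The computation for datatype properties can be replayed with a small sketch. Several points are assumptions: the exact-match measure `exact_sim` stands in for the elementary similarity measures, the threshold value 0.8 is arbitrary, the assignment of phone / creditCard to r1 and phonenum / acceptedCard to r2 is a guess, and `"<other number>"` is a placeholder for the phone number that is truncated in the slides. Only the tuple (r1, r2, cuisineType, food, 2, 3) is taken directly from the example.

```python
from typing import Callable, Dict, List, Tuple


def exact_sim(v1: str, v2: str) -> float:
    """Elementary measure (assumption): 1 for a case-insensitive match, else 0."""
    return 1.0 if v1.strip().lower() == v2.strip().lower() else 0.0


def value_set_score(vals1: List[str], vals2: List[str],
                    sim: Callable[[str, str], float] = exact_sim,
                    threshold: float = 0.8) -> Tuple[float, int]:
    """Return (sum of best similarities above the threshold, max number of values)."""
    total = 0.0
    for v1 in vals1:
        best = max((sim(v1, v2) for v2 in vals2), default=0.0)
        if best > threshold:
            total += best
    return total, max(len(vals1), len(vals2))


def sim_nap(desc1: Dict[str, List[str]], desc2: Dict[str, List[str]]) -> float:
    """Combine, for each property of desc1, its best comparable property of desc2."""
    numerator, denominator = 0.0, 0
    for vals1 in desc1.values():
        # Best comparable property: the one with the highest sum of similarities.
        best_sum, best_max = max(
            (value_set_score(vals1, vals2) for vals2 in desc2.values()),
            key=lambda score: score[0],
            default=(0.0, len(vals1)),
        )
        numerator += best_sum
        denominator += best_max
    return numerator / denominator if denominator else 0.0


r1 = {
    "cuisineType": ["thai", "asian"],
    "phone": ["3368555158"],
    "creditCard": ["Visa card"],
}
r2 = {
    "food": ["thai", "asian", "chinese"],
    "phonenum": ["3368555158", "<other number>"],  # placeholder for the truncated value
    "acceptedCard": ["Mastercard", "Visa card"],
}
score = sim_nap(r1, r2)
```

With these inputs the selected tuples are (phone, phonenum, 1, 2), (cuisineType, food, 2, 3) and (creditCard, acceptedCard, 1, 2), so `score` is (1 + 2 + 1) / (2 + 3 + 2) = 4/7, roughly 0.57. The sketch treats all string-valued properties as comparable and does not apply the ontology-based filter.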


Extension of N2R

  • Keep the key importance in the equation

  • Give a bigger importance to the mapped properties

    fi(X) = max(fi-df(X), fi-map(X) + α fi-unmap(X))
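
The effect of the weight α can be seen on a small numeric sketch. The function name `extended_score`, the input values and the assumption that α lies between 0 and 1 (so that unmapped properties weigh less than mapped ones) are illustrative only; the slides do not give concrete values.

```python
def extended_score(f_df: float, f_map: float, f_unmap: float, alpha: float) -> float:
    """max(key-based part, mapped part + alpha * unmapped part)."""
    return max(f_df, f_map + alpha * f_unmap)


# Hypothetical inputs: key-based score 0.5, mapped-property score 0.4,
# unmapped-property score 4/7.
by_alpha = {alpha: extended_score(0.5, 0.4, 4 / 7, alpha) for alpha in (0.0, 0.25, 0.5)}
```

With α = 0 the unmapped properties are ignored and the key-based part (0.5) dominates; as α grows, the combined mapped and unmapped part overtakes it, which is how unmapped properties can raise a score without displacing the keys.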


Conclusions – Future Work

  • Conclusions

    • Extension of a graph-based data linking tool to take into account unmapped properties

  • Future Work

    • Evaluation of this strategy on real data sets

    • Focus on declared (or learned) unmapped keys/unmapped discriminative properties [symeonidou11, atencia12]

      (i.e. select phone, but not creditCard)

    • Discover new mappings between properties thanks to discovered links


Thank you for your attention!

Questions?

