KD2R: a Key Discovery method for semantic Reference Reconciliation

Danai Symeonidou, Nathalie Pernelle and Fatiha Saïs, LRI (University Paris-Sud). WOD’2013, June 3rd.

Presentation Transcript


  1. KD2R: a Key Discovery method for semantic Reference Reconciliation. Danai Symeonidou, Nathalie Pernelle and Fatiha Saïs, LRI (University Paris-Sud). WOD’2013, June 3rd

  2. Data Linking
     • More and more heterogeneous RDF sources are available
     • Links can be asserted between them
     • sameAs is one of the most important types of links: it combines information given in different data sources
     • LOD: the number of already existing links is very small
     • How to create links automatically?
     [Figure: Linked Open Data cloud]

  3. Data Linking Problem
     Person descriptions from Dataset1 and Dataset2:
     • P1: FirstName: George, LastName: Thomson, SSN: 011223456, Job: Artist
     • P3: FirstName: George, LastName: Thomson, SSN: 011223456, Age: 45
     • P2: FirstName: George, LastName: Thomson, SSN: 444223456, Job: Professor

  4. Data Linking Problem
     Person descriptions from Dataset1 and Dataset2:
     • P1: FirstName: George, LastName: Thomson, SSN: 011223456, Job: Artist
     • SameAs
     • P3: FirstName: George, LastName: Thomson, SSN: 011223456, Age: 45
     • P2: FirstName: George, LastName: Thomson, SSN: 444223456, Job: Professor

  5. Data Linking Problem
     Person descriptions from Dataset1 and Dataset2:
     • P1: FirstName: George, LastName: Thomson, SSN: 011223456, Job: Artist
     • SameAs
     • P3: FirstName: George, LastName: Thomson, SSN: 011223456, Age: 45
     • SameAs
     • P2: FirstName: George, LastName: Thomson, SSN: 444223456, Job: Professor

  6. Data Linking with or without key constraints
     • No knowledge given about the properties: all the properties have the same importance.
     • Knowledge given by an expert:
       • Specific expert rules [Arasu et al.’09, Low et al.’01, Volz et al.’09 (Silk)]. Example: max(jaro(phone-number; phone-number); jaro-winkler(SSN; SSN)) > 0.88
       • Key constraints [Saïs, Pernelle and Rousset’09]. Example: hasKey(Museum (museumName) (museumAddress))
     • OWL2 Key for a class expression: a combination of (inverse) properties which uniquely identifies an entity
       • hasKey( CE ( OPE1 ... OPEm ) ( DPE1 ... DPEn ) )
       • Example: hasKey(Museum (museumName) (museumAddress)) expresses: Museum(x1) ∧ Museum(x2) ∧ museumName(x1, y) ∧ museumName(x2, y) ∧ museumAddress(x1, w) ∧ museumAddress(x2, w) → sameAs(x1, x2)
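
The key semantics on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' linking tool: the function name `same_as_pairs` and the museum records are hypothetical, and each instance is assumed to be a dictionary mapping a property to a set of values.

```python
from itertools import combinations

def same_as_pairs(instances, key):
    """Return pairs of instance ids that share a value for every key property."""
    links = []
    for (id1, d1), (id2, d2) in combinations(sorted(instances.items()), 2):
        # The key fires only if the two descriptions agree on all key properties.
        if all(d1.get(p, set()) & d2.get(p, set()) for p in key):
            links.append((id1, id2))
    return links

# Hypothetical data, invented for this example only.
museums = {
    "m1": {"museumName": {"Louvre"}, "museumAddress": {"Rue de Rivoli, Paris"}},
    "m2": {"museumName": {"Louvre"}, "museumAddress": {"Rue de Rivoli, Paris"}},
    "m3": {"museumName": {"Louvre"}, "museumAddress": {"Abu Dhabi"}},
}

links = same_as_pairs(museums, ["museumName", "museumAddress"])
```

Here only m1 and m2 are linked: m3 shares the name but not the address, so the key does not apply to it.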

  7. Key Discovery Problem
     • Problem: when data sources contain numerous data and/or complex ontologies
       • Some keys are not obvious to find.
       • Erroneous keys can be given by the expert.
     • Aim: automatic discovery of a complete set of keys from data
     • Naïve automatic way to discover keys: examine all the possible combinations of properties
       • Example: given an instance described by 15 properties, the number of candidate keys is 2^15 - 1 = 32767
       • For each candidate key we have to scan all the instances of the data
     • Objective: find keys efficiently by:
       • Reducing the combinations
       • Partially scanning the data
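
To make the cost of the naïve approach concrete, here is a minimal sketch that enumerates every non-empty property combination and checks one candidate with a full scan. It assumes single-valued, fully instantiated properties, which is simpler than the setting of the talk. The names `candidate_keys` and `is_key` and the sample data are hypothetical.

```python
from itertools import combinations

def candidate_keys(properties):
    """Yield every non-empty combination of properties."""
    for size in range(1, len(properties) + 1):
        yield from combinations(properties, size)

def is_key(instances, props):
    """Full scan: props is a key if no two instances share all its values."""
    seen = set()
    for inst in instances:
        signature = tuple(inst.get(p) for p in props)
        if signature in seen:
            return False
        seen.add(signature)
    return True

# 15 properties give 2**15 - 1 candidate keys.
n_candidates = sum(1 for _ in candidate_keys(range(15)))

# Hypothetical data, invented for this example only.
people = [
    {"firstName": "Elodie", "lastName": "Martin"},
    {"firstName": "Marie", "lastName": "Martin"},
]
first_name_is_key = is_key(people, ("firstName",))
last_name_is_key = is_key(people, ("lastName",))
```

Every call to `is_key` reads all the instances, so the total work grows with the number of candidates times the size of the data.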

  8. Key Discovery Problem
     • RDF data sources (conforming to an OWL 2 ontology)
     • Mappings between classes and properties of the different ontologies
     • Open world assumption (incomplete data) and multivalued properties may exist
     • How to discover keys when we do not know whether:
       • i1 =?= i2 =?= i3 =?= i4
       • hasFriend(i1, i4), hasFriend(i2, i3) …. ??
       • firstName(i1, Elodie) … ?

  9. Key Discovery Problem: Assumptions
     • Unique Name Assumption (UNA): two different URIs refer to distinct entities (data sources generated from relational databases, Yago)
       • i1 ≠ i2 ≠ i3 ≠ i4
     • Two literals that are syntactically different are semantically different
       • (e.g. “Napoleon Bonaparte” ≠ “Napoleon”)

  10. Key Discovery: Heuristics
     Heuristic 1 - Pessimistic:
     • Not instantiated property → all the values are possible
       • Example: hasFriend(i2, i3), hasFriend(i4, i2) are possible.
     • Instantiated property → only given values are considered
       • Example: not hasFriend(i1, i4)
     Non keys: {lastName}, {hasFriend}
     Keys: {firstName}, {lastName, firstName}, {firstName, hasFriend}
     Undetermined keys: {hasFriend, lastName}

  11. Key Discovery: Heuristics
     Heuristic 2 - Optimistic:
     • Not instantiated property → the value is not one of the already existing ones
       • Example: not hasFriend(i2, i3), not hasFriend(i2, i1), not hasFriend(i2, i4).
     • Instantiated property → only given values are considered
       • Example: not hasFriend(i1, i4)
     Non keys: {lastName}, {hasFriend}
     Keys: {firstName}, {lastName, firstName}, {firstName, hasFriend}, {hasFriend, lastName}
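
The difference between the two heuristics can be sketched as follows. This is one reading of the slides, not the KD2R implementation: the function `classify` and the four person descriptions are hypothetical. The data was chosen so that the output matches the non keys, keys and undetermined keys listed above, since the original figure is not part of the transcript. A missing dictionary entry stands for a non instantiated property.

```python
from itertools import combinations

def classify(instances, props, heuristic):
    """Return 'non key', 'undetermined key' or 'key' for a set of properties."""
    undetermined = False
    for a, b in combinations(instances.values(), 2):
        certain = True    # the pair shares a known value on every property
        possible = True   # the pair could share a value on every property
        for p in props:
            va, vb = a.get(p), b.get(p)
            if va is None or vb is None:
                certain = False
                if heuristic == "optimistic":
                    # A missing value is assumed to differ from existing ones.
                    possible = False
            elif not (va & vb):
                certain = possible = False
        if certain:
            return "non key"
        if possible:
            undetermined = True
    return "undetermined key" if undetermined else "key"

# Hypothetical data: i2 and i4 have no hasFriend value.
people = {
    "i1": {"firstName": {"Elodie"}, "lastName": {"Dupont"}, "hasFriend": {"i2"}},
    "i2": {"firstName": {"Marie"}, "lastName": {"Dupont"}},
    "i3": {"firstName": {"Claire"}, "lastName": {"Martin"}, "hasFriend": {"i2"}},
    "i4": {"firstName": {"Lea"}, "lastName": {"Durand"}},
}

pessimistic = classify(people, ["hasFriend", "lastName"], "pessimistic")
optimistic = classify(people, ["hasFriend", "lastName"], "optimistic")
```

With this data, {hasFriend, lastName} is undetermined under the pessimistic heuristic, because i1 and i2 share a last name and i2 might have the same friend. Under the optimistic heuristic it is a key.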

  12. KD2R approach
     • Topological sort of the classes (subsumption)
     • Key Finder
       • Discover non keys. Ex: {lastName}, {hasFriend}
       • Derive keys using non keys. Ex: {firstName}, {lastName, firstName}, {firstName, hasFriend}, {hasFriend, lastName}
     • Key Merge
       • Cartesian product of minimal key sets in S1, S2
       • Ex: Ks1 = {firstName}, Ks2 = {hasFriend}, Ks1-s2 = {firstName, hasFriend}
     Technical report available: https://www.lri.fr/~bibli/Rapports-internes/2013/RR1559.pdf
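
The two derivation steps can be illustrated with a brute-force sketch. It assumes that the list of non keys is complete, so that any property set not contained in a non key is a key. The algorithm in the technical report may proceed differently, and the names `minimal_keys` and `merge_keys` are hypothetical. Unlike the slide, which also lists non-minimal keys, the sketch keeps only minimal ones.

```python
from itertools import combinations, product

def minimal_keys(properties, non_keys):
    """Derive minimal keys: sets contained in no non key, with no smaller key inside."""
    non_keys = [frozenset(nk) for nk in non_keys]
    keys = []
    for size in range(1, len(properties) + 1):
        for combo in combinations(sorted(properties), size):
            candidate = frozenset(combo)
            if any(candidate <= nk for nk in non_keys):
                continue  # a subset of a non key is a non key
            if any(k <= candidate for k in keys):
                continue  # a smaller key already covers this candidate
            keys.append(candidate)
    return keys

def merge_keys(keys1, keys2):
    """Key Merge: union every pair of keys from two sources, keep minimal results."""
    merged = {frozenset(k1) | frozenset(k2) for k1, k2 in product(keys1, keys2)}
    return [k for k in merged if not any(other < k for other in merged)]

keys = minimal_keys(
    {"firstName", "lastName", "hasFriend"},
    [{"lastName"}, {"hasFriend"}],
)
merged = merge_keys([{"firstName"}], [{"hasFriend"}])
```

With the slide's non keys, the minimal keys are {firstName} and {hasFriend, lastName}. Merging Ks1 = {firstName} with Ks2 = {hasFriend} gives {firstName, hasFriend}, as in the example.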

  13. KD2R approach: Key Finder
     • Computation of maximal non keys and undetermined keys
     • Represent data in a prefix-tree (a compact representation of the data of one class)
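
A prefix tree over instance descriptions might look like the sketch below. It is a simplified structure with single-valued properties in a fixed order, and it is not the exact tree used by KD2R. The names `build_prefix_tree` and `count_nodes` and the sample data are hypothetical.

```python
def build_prefix_tree(instances, properties):
    """Insert each instance as a path of (property, value) nodes."""
    root = {"children": {}, "ids": set()}
    for instance_id, description in instances.items():
        node = root
        for prop in properties:
            value = description.get(prop)  # None stands for a missing value
            node = node["children"].setdefault(
                (prop, value), {"children": {}, "ids": set()}
            )
            node["ids"].add(instance_id)  # instances sharing this prefix
    return root

def count_nodes(node):
    """Number of nodes below the given node."""
    return sum(1 + count_nodes(child) for child in node["children"].values())

# Hypothetical data, invented for this example only.
people = {
    "i1": {"lastName": "Dupont", "firstName": "Elodie"},
    "i2": {"lastName": "Dupont", "firstName": "Marie"},
    "i3": {"lastName": "Martin", "firstName": "Claire"},
}

tree = build_prefix_tree(people, ["lastName", "firstName"])
shared = tree["children"][("lastName", "Dupont")]["ids"]
size = count_nodes(tree)
```

Instances i1 and i2 share the node for lastName = Dupont, so the tree stores 5 nodes instead of 6 separate values. A node reached by more than one instance shows directly that the properties along that path do not distinguish those instances.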

  14. Validation of approach
     • Datasets where KD2R has been tested:

  15. Demo
     • Ontologies
       • Data conforming to one ontology
     • RDF data
       • DBpedia NaturalPlace dataset (78400 instances)
       • OAEI Person dataset (2000 instances)
     • Data linking
       • Link data using LN2R
       • Measure quality of linking using: recall, precision, f-measure
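
The three quality measures can be computed from a set of discovered links and a reference set of correct links. This is a minimal sketch using the standard definitions. The function name `linking_quality` and the link pairs are hypothetical and are not results from the talk.

```python
def linking_quality(found, reference):
    """Return (precision, recall, f_measure) for a set of discovered links."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical links, invented for this example only.
found_links = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
reference_links = {("a1", "b1"), ("a2", "b2"), ("a4", "b4"), ("a5", "b5")}

precision, recall, f_measure = linking_quality(found_links, reference_links)
```

Two of the three discovered links are correct (precision 2/3) and two of the four reference links are found (recall 1/2), which gives an f-measure of 4/7.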

  16. QUESTIONS???

  17. THANK YOU!!!
