A Novel Approach for Entity Linkage

A Novel Approach for Entity Linkage. IEEE-IRI2009, Las Vegas, 2009-08-11. Heiko Stoermer, Paolo Bouquet, University of Trento, Italy. This work is co-funded by the European Commission in the context of the Large-scale Integrated project OKKAM (GA 215032).

Presentation Transcript


  1. A Novel Approach for Entity Linkage. IEEE-IRI2009, Las Vegas, 2009-08-11. Heiko Stoermer, Paolo Bouquet, University of Trento, Italy. This work is co-funded by the European Commission in the context of the Large-scale Integrated project OKKAM (GA 215032).

  2. Outline
     • Part 1: Background and Context
     • Part 2: Problem, Approach, Implementation, Results

  3. Web 2.0 seen from Outer Space
     • Billions of people who create and share information, and content producers (Web 2.0)
     • Intelligent (semantic-driven) mash-ups and their use in new, complex, and ubiquitous services

  4. However ...

  5. Flood of Identifiers
     • http://www.reuters.com/news/globalcoverage/barackobama
     • http://www.OPENCALAIS.com/watch?v=z4W2_raF_iw
     • http://en.wikipedia.org/wiki/Barack_obama
     • http://www.facebook.com/home.php#/barackobama?ref=s
     • http://dbpedia.org/resource/Barack_Obama
     • http://farm4.static.flickr.com/3193/2437394249_824e76ed76.jpg?v=0
     • http://www.linkedin.com/in/barackobama

  6. The Flood of Identifiers
     Too many identifiers for the same thing out there …
     … not much used in content production …
     … and poorly interlinked.
     • How do I find out what Web users have to say about our product XYZ?
     • How can I avoid advertising restaurants in Venice (FL) for a query about Venice (IT)?
     • How do we collect distributed information about a specific customer or project in a complex Intranet environment?
     In short: how can we enable mash-ups based on
       select * from Web where ID="…"
     on the Web of Data or in an enterprise-wide Intranet?
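The `select * from Web where ID="…"` query only works once content from different sources carries the same identifier. The following is a minimal Python sketch of that idea, assuming each source exposes its items as records annotated with an `entity_id`; the `Item` type, the source labels, and the ID strings (`id-venice-it`, `id-venice-fl`) are hypothetical placeholders, not real ENS identifiers.

```python
from dataclasses import dataclass


@dataclass
class Item:
    source: str     # where the content came from (hypothetical labels)
    entity_id: str  # identifier of the entity the item is annotated with (hypothetical)
    text: str


def select_by_id(items: list[Item], entity_id: str) -> list[Item]:
    """Rough equivalent of: select * from Web where ID = entity_id."""
    return [item for item in items if item.entity_id == entity_id]


# Hypothetical content from three sources, annotated with placeholder IDs.
web = [
    Item("news", "id-venice-it", "Venice (IT) flood barrier tested"),
    Item("blog", "id-venice-fl", "Best seafood restaurants in Venice (FL)"),
    Item("forum", "id-venice-it", "Vaporetto tips for Venice (IT)"),
]

for item in select_by_id(web, "id-venice-it"):
    print(item.source, "-", item.text)
```

Both Venice (IT) items are returned because they share one ID, while the Venice (FL) item is excluded even though a plain keyword search for "Venice" would match it.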

  7. Our Wish for The Web X.0 ...

  8. A Possible Solution: an Entity Name System (ENS) for the (Semantic) Web
     • Open, decentralized service
     • Provides IDs for annotating any content in any application
     • Supports reuse of IDs
     • Maps ID schemas onto each other
     • Based on HTTP
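One way to picture "maps ID schemas onto each other" is to group identifiers known to refer to the same entity and pick one representative per group. The sketch below assumes the equivalence links are already given (the slides do not say how the ENS obtains them) and uses a union-find (disjoint-set) structure, which also makes the links transitive. The class `IdentifierMap` is hypothetical, and linking these three URIs from the "Flood of Identifiers" slide is purely illustrative.

```python
class IdentifierMap:
    """Groups identifiers that denote the same entity (union-find / disjoint-set)."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, uri: str) -> str:
        # Unseen identifiers start out as a group of their own.
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: re-point every identifier on the path at the root.
        while uri != root:
            next_uri = self._parent[uri]
            self._parent[uri] = root
            uri = next_uri
        return root

    def link(self, a: str, b: str) -> None:
        """Record that identifiers a and b refer to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def canonical(self, uri: str) -> str:
        """Return one representative identifier for uri's group."""
        return self._find(uri)

    def same_entity(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)


DBPEDIA = "http://dbpedia.org/resource/Barack_Obama"
WIKIPEDIA = "http://en.wikipedia.org/wiki/Barack_obama"
LINKEDIN = "http://www.linkedin.com/in/barackobama"

ids = IdentifierMap()
ids.link(DBPEDIA, WIKIPEDIA)
ids.link(WIKIPEDIA, LINKEDIN)               # DBPEDIA ~ LINKEDIN follows transitively
print(ids.same_entity(DBPEDIA, LINKEDIN))   # True
print(ids.canonical(LINKEDIN))              # http://dbpedia.org/resource/Barack_Obama
```

Which identifier becomes the representative here is an artifact of link order; a real system would more likely mint its own identifier and map every external URI onto it.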

  9. The ENS: a large "phonebook"
     • Input:
       • a simple search query, or
       • a reference record
     • Output: a reusable entity identifier
     • Under the hood:
       • a large-scale entity repository, pre-populated and collaboratively growing
       • an entity-matching architecture
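A toy version of the "phonebook" contract: the caller passes either a simple search string or a reference record and gets back an identifier that stays the same on later lookups. This sketch rests on strong simplifying assumptions: matching is an exact comparison of a normalized `name` value, whereas the slides describe a full entity-matching architecture. The class `ToyEntityNameSystem`, the `name` key, and the `urn:uuid:` identifier format are all hypothetical.

```python
import uuid
from typing import Mapping, Union

# A query is either a free-text string or a reference record (attribute -> value).
Query = Union[str, Mapping[str, str]]


class ToyEntityNameSystem:
    """In-memory stand-in for an entity repository plus a (trivial) matcher."""

    def __init__(self) -> None:
        self._ids_by_key: dict[str, str] = {}

    @staticmethod
    def _key(query: Query) -> str:
        # A plain string is treated as a name; a record contributes its "name" value.
        name = query if isinstance(query, str) else query.get("name", "")
        # Normalize case and whitespace so trivial variants map to the same key.
        return " ".join(name.lower().split())

    def lookup_or_create(self, query: Query) -> str:
        key = self._key(query)
        if not key:
            raise ValueError("query has no usable name")
        if key not in self._ids_by_key:
            # New entity: mint an identifier once; later lookups reuse it.
            self._ids_by_key[key] = uuid.uuid4().urn
        return self._ids_by_key[key]


ens = ToyEntityNameSystem()
a = ens.lookup_or_create("Barack Obama")
b = ens.lookup_or_create({"name": "barack  obama", "type": "person"})
print(a == b)  # True: both normalize to the key "barack obama"
```

Exact matching on a normalized name is precisely what breaks down with unknown entity representations and multilingual input, which is the gap the similarity-based matching in Part 2 addresses.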

  10. ENS Overview

  11. Part 2

  12. Entity Matching
     • Related work under different names: merge-purge, record linkage, deduplication, entity consolidation, entity linkage, ...
     • New aspects:
       • unknown entity representation
       • unknown query representation
       • multilinguality
     • Our problem:
       • answer an entity search query with a high top-1 success rate in a very short time

  13. Bottom-up Study
     • We asked about 250 individuals from all over the world which feature names they would use to describe a certain set of entity types
     • Key results:
       • the "name" feature is shared by all analyzed types
       • the "name" feature has very high relevance for all analyzed types
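Because the study found a name feature in every analyzed type, a matcher can rely on it even when the rest of an entity's description is unknown. A minimal sketch, assuming entities arrive as free-form attribute/value dictionaries and that name-like attributes can be recognized by their key; the key list `NAME_KEYS` and both helper functions are hypothetical choices for illustration, not the vocabulary collected in the study.

```python
# Hypothetical attribute keys treated as carrying a name; not taken from the study.
NAME_KEYS = {"name", "label", "title", "fullname", "full_name", "nickname"}


def name_values(entity: dict[str, object]) -> list[str]:
    """Collect all name-like string values from an arbitrary attribute/value description."""
    values: list[str] = []
    for key, value in entity.items():
        if key.lower() not in NAME_KEYS:
            continue
        # Attributes may be single-valued or multi-valued.
        items = value if isinstance(value, (list, tuple)) else [value]
        values.extend(v for v in items if isinstance(v, str) and v.strip())
    return values


def name_tokens(entity: dict[str, object]) -> list[str]:
    """Lower-cased tokens of all name values, as input for a name-based similarity."""
    return [tok for v in name_values(entity) for tok in v.lower().split()]


person = {"fullName": "Barack Obama", "born": "1961", "label": ["Obama"]}
print(name_values(person))  # ['Barack Obama', 'Obama']
print(name_tokens(person))  # ['barack', 'obama', 'obama']
```

Note that the repeated token "obama" survives extraction; how repeated values should affect a similarity score is the subject of the "Avoiding Spam" slide.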

  14. Name-feature-based Entity Similarity

  15. Avoiding "Spam"
     • Example:
       • Q = {q1, q2}
       • E = {e1, e2, e3}
     • Establish fsim() for every pair (q, e)
     • Select only the maximally similar pairs
     • Build the final score between Q and E
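The slide lists the steps but not the formulas. One plausible reading, stated here as an assumption: for each query element q keep only its best-matching entity element, then average those maxima, i.e. score(Q, E) = (1/|Q|) · Σ_q max_e fsim(q, e). Under this rule an entity cannot raise its score by repeating a name value many times. The sketch uses Python's `difflib.SequenceMatcher` ratio as a stand-in for `fsim()`; the similarity function actually used in the talk is not given in the transcript.

```python
from difflib import SequenceMatcher


def fsim(q: str, e: str) -> float:
    """Stand-in string similarity in [0, 1]; not the function used in the talk."""
    return SequenceMatcher(None, q, e).ratio()


def naive_score(query: list[str], entity: list[str]) -> float:
    """Sums fsim over all pairs: repeated entity values inflate the score."""
    return sum(fsim(q, e) for q in query for e in entity)


def max_pair_score(query: list[str], entity: list[str]) -> float:
    """For each q keep only its best e, then average over Q (result stays in [0, 1])."""
    if not query or not entity:
        return 0.0
    return sum(max(fsim(q, e) for e in entity) for q in query) / len(query)


Q = ["barack", "obama"]
honest = ["barack", "obama"]
spam = ["obama"] * 10  # one name value repeated to game the ranking

print(naive_score(Q, honest) < naive_score(Q, spam))        # True: spam wins
print(max_pair_score(Q, honest) > max_pair_score(Q, spam))  # True: honest wins
```

A stricter variant would also stop two query elements from claiming the same entity element (a one-to-one assignment); the slide does not say which of the two is meant.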

  16. Results
     • Benchmark based on 67 example queries and ~590k entities
     • Top-1 improvement of ~12% over the reference algorithm
     • No performance penalty
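Top-1 success rate, the target metric named in the problem statement, is the fraction of queries whose first-ranked result is the correct entity. A small sketch of how it, and an improvement between two algorithms, could be computed; the rankings below are made-up toy data, not the 67-query benchmark, and the transcript does not say whether "~12%" is a relative change or a difference in percentage points, so both are printed.

```python
def top1_success_rate(rankings: dict[str, list[str]], gold: dict[str, str]) -> float:
    """Fraction of queries whose first-ranked entity ID equals the expected one."""
    if not gold:
        raise ValueError("no queries to evaluate")
    hits = sum(
        1
        for q, expected in gold.items()
        if rankings.get(q) and rankings[q][0] == expected
    )
    return hits / len(gold)


# Toy data (hypothetical): query -> expected entity, and query -> ranked entity IDs.
gold = {"q1": "e1", "q2": "e7", "q3": "e4", "q4": "e2"}
baseline = {"q1": ["e1", "e3"], "q2": ["e5", "e7"], "q3": ["e9"], "q4": ["e2"]}
improved = {"q1": ["e1", "e3"], "q2": ["e7", "e5"], "q3": ["e9"], "q4": ["e2"]}

b = top1_success_rate(baseline, gold)  # 2/4 = 0.5
n = top1_success_rate(improved, gold)  # 3/4 = 0.75
print(f"absolute: {n - b:+.2f}, relative: {(n - b) / b:+.0%}")  # absolute: +0.25, relative: +50%
```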

  17. Future Work
     • Improved similarity measure based on a knowledge model inferred from our study
     • Evaluation in the context of the 2009 Ontology Matching Contest (entity track)

  18. Thank You! Contact stoermer@disi.unitn.it if you are interested in using the ENS in your experiments, projects, or solutions.