
Christian Bizer, Freie Universität Berlin

3rd Asian Semantic Web Conference (ASWC 2008) DIST Workshop, Bangkok, Thailand 8 December 2008 Fusing the Web of Data. Christian Bizer, Freie Universität Berlin. Overview. The Web of Data Linked Data Principles Linked Data Deployment Applications that consume Linked Data Linked Data Fusion


Presentation Transcript


  1. 3rd Asian Semantic Web Conference (ASWC 2008), DIST Workshop, Bangkok, Thailand, 8 December 2008. Fusing the Web of Data. Christian Bizer, Freie Universität Berlin

  2. Overview • The Web of Data • Linked Data Principles • Linked Data Deployment • Applications that consume Linked Data • Linked Data Fusion • The Linking Process • Inconsistency Resolution • Provenance Tracking and Explanations

  3. The Classic Web: Single global information space • URLs as • globally unique IDs • retrieval mechanism • HTML as shared content format • Hyperlinks. Shortcomings: • Content is not well structured • You cannot ask expressive queries • You cannot process content within applications [Figure: search engines and Web browsers over HTML documents A, B and C connected by hyperlinks]

  4. Linked Data • Use Semantic Web technologies to • publish structured data on the Web, • set links between data from one data source to data within other data sources. [Figure: things in data sources A to E connected by typed links]

  5. Linked Data Principles • Use URIs as names for things. • Use HTTP URIs so that people can look up those names. • When someone looks up a URI, provide useful RDF information. • Include RDF statements that link to other URIs so that they can discover related things. Tim Berners-Lee 2007 http://www.w3.org/DesignIssues/LinkedData.html

  6. The RDF Data Model • pd:cygri rdf:type foaf:Person • pd:cygri foaf:name "Richard Cyganiak" • pd:cygri foaf:based_near dbpedia:Berlin

  7. Data objects are identified with HTTP URIs • pd:cygri rdf:type foaf:Person • pd:cygri foaf:name "Richard Cyganiak" • pd:cygri foaf:based_near dbpedia:Berlin • pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri • dbpedia:Berlin = http://dbpedia.org/resource/Berlin
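The graph on slides 6 and 7 amounts to three subject-predicate-object triples whose prefixed names expand to HTTP URIs. A minimal Python sketch using plain tuples rather than an RDF library: the `pd:` and `dbpedia:` namespaces are taken from the slide, the `rdf:` and `foaf:` namespace URIs are the standard ones, and the names `PREFIXES`, `expand` and `objects` are illustrative helpers, not part of the talk.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
}


def expand(qname):
    """Expand a prefixed name such as 'pd:cygri' into a full HTTP URI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


# (subject, predicate, object); the name is a literal, everything else is a URI.
TRIPLES = [
    (expand("pd:cygri"), expand("rdf:type"), expand("foaf:Person")),
    (expand("pd:cygri"), expand("foaf:name"), "Richard Cyganiak"),
    (expand("pd:cygri"), expand("foaf:based_near"), expand("dbpedia:Berlin")),
]


def objects(triples, subject, predicate):
    """Return all objects of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

Because the object of `foaf:based_near` is itself an HTTP URI, a client can look it up in turn, which is what the next slides show.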

  8. Dereferencing URIs over the Web • pd:cygri rdf:type foaf:Person • pd:cygri foaf:name "Richard Cyganiak" • pd:cygri foaf:based_near dbpedia:Berlin • dbpedia:Berlin dp:population 3,405,259 • dbpedia:Berlin skos:subject dp:Cities_in_Germany

  9. Dereferencing URIs over the Web • pd:cygri rdf:type foaf:Person • pd:cygri foaf:name "Richard Cyganiak" • pd:cygri foaf:based_near dbpedia:Berlin • dbpedia:Berlin dp:population 3,405,259 • dbpedia:Berlin skos:subject dp:Cities_in_Germany • dbpedia:Hamburg skos:subject dp:Cities_in_Germany • dbpedia:Muenchen skos:subject dp:Cities_in_Germany
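Dereferencing a URI means issuing an ordinary HTTP GET and asking for RDF instead of HTML. The sketch below uses only the Python standard library. It assumes the server honours an `Accept: application/rdf+xml` header, which is a common convention but is not stated on the slides, and the function names are illustrative.

```python
import urllib.request
from urllib.parse import urldefrag


def build_lookup_request(uri, accept="application/rdf+xml"):
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    # A fragment such as '#cygri' is never sent to the server, so strip it
    # and request the document that contains the description.
    document_uri, _fragment = urldefrag(uri)
    return urllib.request.Request(document_uri, headers={"Accept": accept})


def dereference(uri, timeout=10):
    """Fetch the document describing `uri`; requires network access."""
    request = build_lookup_request(uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type"), response.read()
```

A Linked Data browser repeats this step for every URI it encounters in the returned triples, for example moving from `pd:cygri` to `dbpedia:Berlin`.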

  10. The Disco – Hyperdata Browser

  11. 2. Linked Data Deployment on the Web • Is this real? [Figure: things in data sources A to E connected by typed links]

  12. W3C Linking Open Data Project • Community effort to • publish existing open license datasets as Linked Data on the Web • interlink things between different data sources

  13. LOD Datasets on the Web: May 2007 • Over 500 million RDF triples • Around 120,000 RDF links between data sources

  14. Example RDF Links
     • RDF links from DBpedia to other data sources
       <http://dbpedia.org/resource/Berlin> owl:sameAs <http://sws.geonames.org/2950159> .
       <http://dbpedia.org/resource/Tim_Berners-Lee> owl:sameAs <http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007> .
     • RDF link from a FOAF profile to DBpedia
       <http://richard.cyganiak.de/foaf.rdf#cygri> foaf:topic_interest <http://dbpedia.org/resource/Semantic_Web> .

  15. LOD Datasets on the Web: February 2008

  16. LOD Datasets on the Web: September 2008 > 2 billion RDF triples > 6 million RDF links

  17. The Bio2RDF Project • Goals • Make bioinformatics data available in RDF format on the Web. • Promote the linked data vision within the bioinformatics community. • Answer questions which were not possible or practical to ask before. • Participants • Université Laval, Canada • Queensland University of Technology, Australia

  18. The Bio2RDF Cloud • 27 data sources • 260 million records • 2.7 billion RDF triples

  19. 3. Applications • What can I do with this? • Linked Data Browsers • Linked Data Mashups • Search Engines [Figure: things in data sources A to E connected by typed links]

  20. Linked Data Browsers • Tabulator Browser (MIT, USA) • Disco Hyperdata Browser (FU Berlin, DE) • OpenLink RDF Browser (OpenLink, UK) • Zitgist RDF Browser (Zitgist, USA) • Humboldt (HP Labs, UK) • Fenfire (DERI, Ireland) • Marbles (FU Berlin, DE)

  21. Linked Data Mashups • Domain-specific applications using Linked Data from the Web

  22. DBtune Slashfacet • Visualizes music-related Linked Data • Uses LastFM, MySpace, and BBC data

  23. DBpedia Mobile • Geospatial entry point into the Web of Data • Starts with DBpedia, Revyu and Flickr data

  24. DERI Semantic Web Pipes

  25. Web of Data Search Engines • Falcons (IWS, China) • Sindice (DERI, Ireland) • MicroSearch (Yahoo, Spain) • Watson (Open University, UK) • SWSE (DERI, Ireland) • Swoogle (UMBC, USA)

  26. Falcons

  27. Is this good enough? No.

  28. 2. Linked Data Fusion • Users want an integrated view of all data that is available about a real-world entity! [Figure: an application builds an integrated view over data objects 1 to 6 from data sources A, B and C, connected by owl:sameAs links]

  29. Linked Data Fusion - Requirements • Map data into a single schema • so that data can be rendered and queried properly. • Smush data from all sources about a single real-world entity • while keeping track of information provenance. • Resolve inconsistencies in the data • by applying different data fusion heuristics. • Be able to explain the fusion process • Tim Berners-Lee's "Oh, yeah?" button.
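Smushing can be read as computing equivalence classes over `owl:sameAs` links and collecting every statement about a class member under one canonical URI, while remembering which source each statement came from. The following is one possible sketch using a union-find structure. The quad layout `(subject, predicate, object, source)` and the rule of picking the lexicographically smallest URI as canonical are assumptions made for illustration, not something the slides prescribe.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def find(parent, x):
    """Return the representative of x's equivalence class (path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def smush(quads):
    """Merge descriptions of URIs that are connected by owl:sameAs.

    `quads` is an iterable of (subject, predicate, object, source).
    Returns {canonical_uri: [(predicate, object, source), ...]}.
    """
    quads = list(quads)
    parent = {}
    for s, p, o, _source in quads:
        if p == OWL_SAME_AS:
            a, b = find(parent, s), find(parent, o)
            if a != b:
                # Assumption: the smaller URI string becomes the canonical one.
                parent[max(a, b)] = min(a, b)
    merged = {}
    for s, p, o, source in quads:
        if p == OWL_SAME_AS:
            continue
        merged.setdefault(find(parent, s), []).append((p, o, source))
    return merged


# Usage with placeholder values ("X", "Y" are not real data):
EXAMPLE = [
    ("http://dbpedia.org/resource/Berlin", OWL_SAME_AS,
     "http://sws.geonames.org/2950159", "dbpedia"),
    ("http://dbpedia.org/resource/Berlin", "ex:population", "X", "dbpedia"),
    ("http://sws.geonames.org/2950159", "ex:population", "Y", "geonames"),
]
```

Calling `smush(EXAMPLE)` groups both population statements under the DBpedia URI, each still tagged with its source, which is exactly the point where inconsistency resolution has to take over.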

  30. Roles in the Linked Data Scenario • Client Application • Map data into single schema. • Smush data from different sources about real-world entity. • Resolve inconsistencies in the data. • Keep track of information provenance and lineage. • Explain fusion process. • Data Publisher • Publish data itself • Set RDF links to other data items describing the same real-world entity. • Reuse terms from existing vocabularies or set links to related schemata. • Publish metadata about • provenance • timeliness • data license

  31. 2.1 Setting RDF Links • Today: • Simple pattern- and graph-matching based techniques are used to generate links. • Usually proprietary code. • There is a lot of existing work on identity resolution in the database and knowledge representation communities that can be used: • Rule-based approaches • Distance-based techniques • Probabilistic matching • Supervised and unsupervised learning • Using a wide range of distance metrics. See: Elmagarmid et al.: Duplicate Record Detection: A Survey. TKDE, 2007.

  32. Linking Frameworks • Goal: (Semi-)automatically generate RDF Links based on declarative rules. • Ongoing work • Oktie Hassanzadeh (University of Toronto): ODDLinker • Andriy Nikolov et al. (Open University): KnoFuss • Julius Volz (Freie Universität Berlin): XXXX • seeAlso: http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/EquivalenceMining
     CREATE LINKS owl:sameAs
       BETWEEN a FROM dbpedia AND b FROM factbook
       RESTRICT a TO { ?a rdf:type dbpedia-owl:Country }
       METRIC {
         STRING_SIMILARITY(a/rdfs:label, b/rdfs:label),
         NUM_SIMILARITY(a/p:populationEstimate, b/factbook:population_total),
         NUM_SIMILARITY(a/p:areaKm, b/factbook:area_total)
       }
       THRESHOLDS MATCH 0.9 VERIFY 0.7;
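To make the declarative rule above more concrete, here is a rough Python reading of it. Everything beyond the slide is an assumption: the slide does not say how the three metric values are combined (an unweighted mean is used here), how `STRING_SIMILARITY` and `NUM_SIMILARITY` are defined (a `difflib` ratio and a relative difference stand in for them), or what exactly `VERIFY` triggers (taken here to mean "queue for manual review"). The record keys `label`, `population` and `area_km` are hypothetical.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Similarity in [0, 1] of two labels, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def num_similarity(x, y):
    """Similarity in [0, 1] of two non-negative numbers (1.0 when equal)."""
    if x == y:
        return 1.0
    return 1.0 - abs(x - y) / max(abs(x), abs(y))


def classify(a, b, match=0.9, verify=0.7):
    """Compare two country records.

    Returns (decision, score) where decision is 'match', 'verify' or 'no-link'.
    """
    scores = [
        string_similarity(a["label"], b["label"]),
        num_similarity(a["population"], b["population"]),
        num_similarity(a["area_km"], b["area_km"]),
    ]
    score = sum(scores) / len(scores)  # assumption: unweighted mean
    if score >= match:
        return "match", score      # emit an owl:sameAs link
    if score >= verify:
        return "verify", score     # assumption: hand over for manual review
    return "no-link", score
```

Pairs classified as `match` would be written out as `owl:sameAs` triples. The two thresholds trade precision against the amount of manual verification.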

  33. Schema Level RDF Links • Today: Simple mappings: • owl:equivalentClass • owl:equivalentProperty • rdfs:subClassOf • rdfs:subPropertyOf • UMBEL effort • Lots of existing work on schema/ontology matching to build on. • Missing: Agreed-upon way to publish more expressive mapping rules on the Web.
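Even the simple mappings listed above are enough for a client to translate data into a single target schema. The sketch below handles properties only and treats the two relations differently: equivalence can be applied in either direction, while `rdfs:subPropertyOf` only licenses rewriting a sub-property to its super-property. The function names and any concrete mappings passed in are hypothetical; none of the datasets is claimed to publish these mappings.

```python
EQUIVALENT = "owl:equivalentProperty"
SUB_PROPERTY = "rdfs:subPropertyOf"


def build_property_map(mappings, target_terms):
    """Derive {source_property: target_property} from simple mapping triples."""
    property_map = {}
    for a, relation, b in mappings:
        if relation == EQUIVALENT:
            # Equivalence is symmetric: rewrite towards the target schema side.
            if b in target_terms:
                property_map[a] = b
            elif a in target_terms:
                property_map[b] = a
        elif relation == SUB_PROPERTY and b in target_terms:
            # Subsumption is one-directional: sub-property -> super-property.
            property_map[a] = b
    return property_map


def map_to_schema(triples, property_map):
    """Rewrite predicates into the target schema; unmapped ones pass through."""
    return [(s, property_map.get(p, p), o) for s, p, o in triples]
```

Anything that needs value transformation, such as unit conversion or splitting a name into parts, is beyond these four terms, which is the gap the slide's last bullet points at.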

  34. 2.2 Publish Metadata • Document Metadata • Dublin Core, Semantic Web Publishing Vocabulary • Licensing Metadata • Creative Commons Licensing Framework • Open Data Commons Public Domain Dedication & Licence (PDDL)
     # Metadata and Licensing Information
     <http://dbpedia.org/data/Alec_Empire>
         rdf:type foaf:Document ;
         dc:publisher <http://dbpedia.org/resource/DBpedia> ;
         dc:date "2007-07-13"^^xsd:date ;
         dc:rights <http://en.wikipedia.org/wiki/WP:GFDL> .
     # The Document Content
     <http://dbpedia.org/resource/Alec_Empire>
         rdf:type foaf:Person ;
         foaf:name "Empire, Alec" ;
         dbpedia-owl:associatedBand dbpedia:Atari_Teenage_Riot .

  35. 2.3. Provenance and Lineage Tracking • Named Graphs data model • part of W3C SPARQL Recommendation • implemented by an increasing number of RDF stores
     # TriG Representation of three Named Graphs
     :G1 { :Monica ex:name "Monica Murphy" .
           :Monica ex:homepage <http://www.monicamurphy.org> .
           :Monica ex:email <mailto:monica@monicamurphy.org> . }
     :G2 { :Monica rdf:type ex:Person .
           :Monica ex:hasSkill ex:Programming }
     :G3 { :G1 swp:assertedBy _:w1 .
           _:w1 swp:authority :Chris .
           _:w1 dc:date "2003-10-02"^^xsd:date .
           :G2 swp:quotedBy _:w2 .
           _:w2 swp:authority :Chris .
           _:w2 dc:date "2003-09-03"^^xsd:date . }
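Named graphs make provenance queryable: given a statement, a client can find the graph that contains it and then read what the metadata graph says about that graph. The sketch below transcribes the three graphs from the slide into Python dictionaries and answers "who stands behind this statement, and how?". The dictionary layout and the `provenance` function are illustrative only, and terms are kept as prefixed strings rather than expanded URIs.

```python
GRAPHS = {
    ":G1": [
        (":Monica", "ex:name", "Monica Murphy"),
        (":Monica", "ex:homepage", "<http://www.monicamurphy.org>"),
        (":Monica", "ex:email", "<mailto:monica@monicamurphy.org>"),
    ],
    ":G2": [
        (":Monica", "rdf:type", "ex:Person"),
        (":Monica", "ex:hasSkill", "ex:Programming"),
    ],
    ":G3": [
        (":G1", "swp:assertedBy", "_:w1"),
        ("_:w1", "swp:authority", ":Chris"),
        ("_:w1", "dc:date", "2003-10-02"),
        (":G2", "swp:quotedBy", "_:w2"),
        ("_:w2", "swp:authority", ":Chris"),
        ("_:w2", "dc:date", "2003-09-03"),
    ],
}


def provenance(statement, graphs=GRAPHS, meta_graph=":G3"):
    """Return (graph, stance, authority, date) for graphs containing `statement`."""
    meta = graphs[meta_graph]
    results = []
    for name, triples in graphs.items():
        if name == meta_graph or statement not in triples:
            continue
        for s, p, warrant in meta:
            if s == name and p in ("swp:assertedBy", "swp:quotedBy"):
                # Collect the properties of the warrant node (_:w1, _:w2).
                props = {pp: o for ss, pp, o in meta if ss == warrant}
                results.append(
                    (name, p, props.get("swp:authority"), props.get("dc:date"))
                )
    return results
```

The difference between `swp:assertedBy` and `swp:quotedBy` matters for fusion: in this example Chris vouches for the contact details in `:G1` but merely quotes the skill claim in `:G2`.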

  36. 2.4. Inconsistency Resolution • There is lots of overlap between LOD datasets • Places: DBpedia, Geonames, Riese, … • People: Freebase, LinkedMDB, DBLP, … • Music: DBpedia, MusicBrainz, Jamendo, … • There are naturally lots of inconsistencies • DBpedia: Person born at date X. • Freebase: Person born at date Y. • DBpedia: Band album X. • MusicBrainz: Band album Y. • Geonames: City has geo-coordinates • Freebase: City has geo-coordinates

  37. Inconsistency Resolution Strategies • Pass it on • Pass conflicting values to the user and let them decide. • Take the information • If a value is missing in dataset 1, use the value from dataset 2. • Trust your friends • Prefer information from certain sources. • Cry with the wolves • Choose the most common value. • Meet in the middle • Take the average of all values. • Keep up to date • Use the newest value. SeeAlso: Bleiholder and Naumann: Conflict Handling Strategies in an Integrated Information System. WWW2006.
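Each strategy on this slide boils down to a small function over the conflicting claims. The sketch below is one way to write them down, assuming every claim is a `(value, source, date)` tuple with ISO 8601 date strings. The tuple layout and function signatures are illustrative and are not taken from Bleiholder and Naumann's paper.

```python
from collections import Counter
from statistics import mean


def pass_it_on(claims):
    """Pass it on: return every distinct value and let the user decide."""
    return sorted({value for value, _source, _date in claims})


def take_the_information(primary, fallback):
    """Take the information: use the fallback only if the primary is missing."""
    return primary if primary is not None else fallback


def trust_your_friends(claims, preferred_sources):
    """Trust your friends: the value from the most preferred source available."""
    for source in preferred_sources:
        for value, claim_source, _date in claims:
            if claim_source == source:
                return value
    return None


def cry_with_the_wolves(claims):
    """Cry with the wolves: the most common value."""
    return Counter(value for value, _s, _d in claims).most_common(1)[0][0]


def meet_in_the_middle(claims):
    """Meet in the middle: the average of all (numeric) values."""
    return mean(value for value, _s, _d in claims)


def keep_up_to_date(claims):
    """Keep up to date: the value carrying the newest date."""
    return max(claims, key=lambda claim: claim[2])[0]
```

Which function to apply is a per-property decision: averaging may suit geo-coordinates, whereas a birth date is better served by source preference or majority vote, since the mean of two dates is unlikely to be anyone's birthday.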

  38. 2.5. Explain Data Provenance and Fusion Steps • Tim Berners-Lee's "Oh, yeah?" button. • Existing Work: • Deborah McGuinness et al: Inference Web: Portable Explanations for the Web. • Chris Bizer: Web Information Quality Assessment Framework (WIQA)

  39. Example WIQA Explanations

  40. Outlook • Lots of exciting open issues to solve! • DIST-related technologies will be one of the hot topics for the coming years (see for instance WWW2009) • Important for LOD • Progress with Publishing Schema Mappings on the Web • Progress with Data Fusion • Linked Data client applications that address all issues mentioned • Please submit such solutions and client applications to the • Semantic Web Challenge 2009 • Linked Data on the Web (LDOW2009) workshop at WWW2009 • IJSWIS Special Issue on Linked Data

  41. Thanks! References • Linking Open Data Project Wiki http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData • Tutorial on How to Publish Linked Data on the Web http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
