
Introduction to the Semantic Web and Linked Open Data

Introduction to the Semantic Web and Linked Open Data. Dramatis Personae: Christopher Gutteridge, Nick Gibbins (in spirit). Goals: overview of issues relating to the publication and use of linked data in HEIs; the lessons that we’ve learned; pragmatism rather than perfection.


Presentation Transcript


  1. Introduction to the Semantic Web and Linked Open Data

  2. Dramatis Personae: Christopher Gutteridge, Nick Gibbins (in spirit)

  3. Goals • Overview of issues relating to the publication and use of linked data in HEIs • The lessons that we’ve learned! • Pragmatism rather than perfection • General guidelines rather than detailed specifications • Coining cool URIs • Publication alongside existing resources • Licensing

  4. Don't Panic!

  5. http://is.gd/dqiJc (The only URL you need to write down)

  6. Non-Goals • Detailed tutorial on the finer points of: • RDF • RDFa • RDF Schema • OWL • SPARQL • … (an hour and a half isn’t enough for this – and there are good tutorials available online)

  7. “If HP knew what HP knows, we’d be three times more profitable” Lew Platt, Hewlett-Packard Chairman and CEO

  8. Linked Data in a Nutshell http://www.flickr.com/photos/arielarielariel/322301228/

  9. Linked Data is about providing structured data on the Web • Doesn’t necessarily require RDF (though it usually uses it)

  10. The triple • Underlying model of triples used to describe the relations between entities in linked data • This is the basis of the RDF data model • (subject, predicate, object) • e.g. (“The Hobbit”, “created by”, “JRR Tolkien”)
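As a minimal illustration (plain Python, not any RDF library), a triple is just a three-part record. The labels below are plain strings; real linked data would use URIs for the subject and predicate.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement in the (subject, predicate, object) model."""
    subject: str
    predicate: str
    object: str


# The slide's example, with plain-string labels instead of URIs.
hobbit = Triple("The Hobbit", "created by", "JRR Tolkien")
print(hobbit.subject, "--", hobbit.predicate, "->", hobbit.object)
```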

  11. Example • Take a citation: • Tim Berners-Lee, James Hendler and Ora Lassila. The Semantic Web. Scientific American, May 2001 • We can identify a number of distinct statements in this citation: • There is an article titled “The Semantic Web” • One of its authors is a person named “Tim Berners-Lee” (etc) • It appeared in a publication titled “Scientific American” • It was published in May 2001

  12. Example • We can represent these statements graphically: [Diagram: the article node has a title (“The Semantic Web”) and a date (2001-05), three creator links to people named “Tim Berners-Lee”, “James Hendler” and “Ora Lassila”, and a publishedIn link to a publication titled “Scientific American”]
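The citation graph can be sketched as a plain set of triples. The short node names ("sw", "tbl", ...) are placeholders, not real URIs, and `match` is a hypothetical helper that does simple pattern matching, roughly the idea behind a basic SPARQL triple pattern.

```python
# Each triple is (subject, predicate, object); node names are placeholders.
graph = {
    ("sw", "title", "The Semantic Web"),
    ("sw", "date", "2001-05"),
    ("sw", "creator", "tbl"),
    ("sw", "creator", "jh"),
    ("sw", "creator", "ora"),
    ("sw", "publishedIn", "sciam"),
    ("tbl", "name", "Tim Berners-Lee"),
    ("jh", "name", "James Hendler"),
    ("ora", "name", "Ora Lassila"),
    ("sciam", "title", "Scientific American"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None."""
    return {
        (ts, tp, to)
        for (ts, tp, to) in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    }


# "Who wrote the article?": follow creator links, then look up each name.
authors = sorted(
    name
    for (_, _, person) in match(graph, s="sw", p="creator")
    for (_, _, name) in match(graph, s=person, p="name")
)
print(authors)
```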

  13. Example • There are two types of node in this graph: • Literals, which have a value but no identity (a string, a number, a date) • Resources, which represent objects with identity (a web page, a person, a journal)

  14. Example • Resources are identified by URIs • Property labels are also identified by URIs, and are drawn from a vocabulary or ontology [Diagram: http://www.sciam.com/ (subject) → http://purl.org/dc/elements/1.1/title (predicate) → “Scientific American” (object)]
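To make the resource/literal distinction concrete, here is a simplified sketch that writes the slide's triple in N-Triples, a line-based W3C RDF syntax where URIs go in angle brackets and literals in quotes. It only handles plain string literals (no language tags or datatypes); the `URI` and `Literal` classes are illustrative, not a real library's types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    """A resource or property with identity, named by a URI."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain value with no identity of its own."""
    value: str


def term_to_ntriples(term):
    if isinstance(term, URI):
        return f"<{term.value}>"
    # Escape characters that would otherwise break the quoted literal.
    escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def triple_to_ntriples(s, p, o):
    return f"{term_to_ntriples(s)} {term_to_ntriples(p)} {term_to_ntriples(o)} ."


line = triple_to_ntriples(
    URI("http://www.sciam.com/"),
    URI("http://purl.org/dc/elements/1.1/title"),
    Literal("Scientific American"),
)
print(line)
```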

  15. Mixing Vocabularies • The triple-based graph model makes it possible to mix terms from different vocabularies in the same graph • Simplifies the task of information integration [Diagram: the same citation graph, with its properties drawn from the FOAF, Dublin Core (dc) and BIBO vocabularies]
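A sketch of what mixing vocabularies looks like in practice: prefixed names are expanded to full URIs, so terms from Dublin Core and FOAF can sit side by side in one graph. The `expand` helper and the example.org subject URIs are hypothetical; the two namespace URIs are the published ones for DC elements and FOAF.

```python
# Namespace URIs for two of the vocabularies named on the slide.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(curie):
    """Turn a prefixed name such as 'dc:title' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


# One graph, two vocabularies: dc for the article, foaf for the person.
# The example.org URIs are hypothetical.
article = "http://example.org/article/sw"
person = "http://example.org/person/tbl"
graph = [
    (article, expand("dc:title"), "The Semantic Web"),
    (article, expand("dc:creator"), person),
    (person, expand("foaf:name"), "Tim Berners-Lee"),
]
for triple in graph:
    print(triple)
```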

  16. Linked Data Principles • A set of publishing practices for Semantic Web data: 1. Use URIs as names for things 2. Use HTTP URIs so that people can look up those names 3. When someone looks up a URI, provide useful information 4. Include links to other URIs, so that they can discover more things • Effectively, this puts the hypertext back into the Semantic Web • Simplifies integration between datasets while maintaining loose coupling

  17. Example [Diagram: the citation graph split into one graph per resource: a graph describing ‘sw’ (title “The Semantic Web”, date 2001-05, creators tbl, jh and ora, publishedIn sciam) and graphs describing ‘tbl’ (name “Tim Berners-Lee”), ‘jh’ (name “James Hendler”), ‘ora’ (name “Ora Lassila”) and ‘sciam’ (title “Scientific American”), linked to one another by their URIs]
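The "follow your nose" idea behind this slide can be sketched with an in-memory stand-in for the Web: looking up a URI returns the small graph describing it, and any object that is itself a resolvable URI is queued for lookup. `WEB`, the `ex:` names and `crawl` are all hypothetical; a real client would make HTTP requests and parse the RDF it gets back.

```python
from collections import deque

# Stand-in for the Web: each URI "dereferences" to the graph describing it.
WEB = {
    "ex:sw": {
        ("ex:sw", "title", "The Semantic Web"),
        ("ex:sw", "date", "2001-05"),
        ("ex:sw", "creator", "ex:tbl"),
        ("ex:sw", "creator", "ex:jh"),
        ("ex:sw", "creator", "ex:ora"),
        ("ex:sw", "publishedIn", "ex:sciam"),
    },
    "ex:tbl": {("ex:tbl", "name", "Tim Berners-Lee")},
    "ex:jh": {("ex:jh", "name", "James Hendler")},
    "ex:ora": {("ex:ora", "name", "Ora Lassila")},
    "ex:sciam": {("ex:sciam", "title", "Scientific American")},
}


def crawl(start):
    """Breadth-first: look up a URI, merge its graph, queue URIs it links to."""
    merged, seen, queue = set(), {start}, deque([start])
    while queue:
        uri = queue.popleft()
        for triple in WEB.get(uri, set()):
            merged.add(triple)
            obj = triple[2]
            if obj in WEB and obj not in seen:  # a link we can follow
                seen.add(obj)
                queue.append(obj)
    return merged


graph = crawl("ex:sw")
print(len(graph))
```

Because each document only describes its own resource, the publishers of the person and journal graphs stay independent; the links are what let a client assemble the whole picture.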

  18. Person ≠ Document • URIs must only identify one concept. Ever. • I am not my homepage.

  19. Publishing Example • The URI represents a person. • Requesting the URI via the web gets a “See Other” (HTTP 303) response. • The requester is redirected to the most appropriate document URL, usually HTML or RDF/XML.
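One way to implement the redirect, sketched as a pure function rather than a real web server: read the client's `Accept` header, pick RDF/XML or HTML by q-value, and answer 303 with a `Location` pointing at the matching document. The `.rdf`/`.html` suffix follows the URI pattern on the "Beauty" slide; the function names and the example.org URI are hypothetical, and wildcard media types such as `*/*` are deliberately ignored to keep it short.

```python
def preferred_format(accept):
    """Pick 'rdf' or 'html' from an HTTP Accept header using its q-values."""
    weights = {"application/rdf+xml": 0.0, "text/html": 0.0}
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if media_type in weights:
            weights[media_type] = max(weights[media_type], q)
    # Default to HTML (what browsers want) unless RDF is strictly preferred.
    if weights["application/rdf+xml"] > weights["text/html"]:
        return "rdf"
    return "html"


def resolve(person_uri, accept):
    """Return (status, headers) for a request on a URI that names a person.

    A person is not a document, so reply 303 See Other and point the
    client at a document about them, e.g. .../1248 -> .../1248.rdf.
    """
    return 303, {"Location": f"{person_uri}.{preferred_format(accept)}"}


print(resolve("http://id.example.org/person/1248", "application/rdf+xml"))
```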

  20. Publishing RDF [Image: a screenful of RDF/XML angle brackets] • DON’T worry about understanding the XML. It’s the equivalent of “view-source” in a webpage! • Use a tool to convert it to something less icky! (http://graphite.ecs.soton.ac.uk/browser/ for example)

  21. Access Control • Worry about it later! • Start with data you can make freely available

  22. Licensing • You want your data to be used & reused, right? • Don’t prevent commercial use. • Don’t prevent derivative works (this can prevent people using it at all!) • If there are any things which your data should not be used for, why are you publishing it?

  23. Licensing Options • Must-Attribute license • Public Domain license (your info still can’t be used in illegal ways, of course) • Procrastinate and worry about it later (much better than not publishing your data)

  24. Breakout

  25. Task • What datasets does your organisation already maintain? • What is the business case for making them available? • in a machine readable form • to all members • without bureaucracy or restriction. • What are the barriers to putting them online and maintaining them? • What are the benefits to the wider community? • What are the risks?

  26. Task • List your 3 easiest wins: the lowest-hanging fruit. • Starting suggestion: every building & campus in your organisation with: • Number • Building Name • Site (Campus) • Lat & Long • This data changes very slowly and is also already freely available.

  27. ECS Demo

  28. http://id.ecs.soton.ac.uk/docs/ http://rdf.ecs.soton.ac.uk/person/1248 http://rdf.ecs.soton.ac.uk/project/42

  29. Cool URIs

  30. Beauty • http://domain/classOfThing/scheme/identifier • http://domain/classOfThing/scheme/identifier.rdf • http://domain/classOfThing/scheme/identifier.html • e.g. http://mysite.org/person/username/t23 • http://mysite.org/person/username/t23.rdf • http://mysite.org/person/username/t23.html • The scheme is optional, but it future-proofs you against the next time the university reorganises everything.
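The URI pattern above is regular enough to generate mechanically, which helps keep it consistent across systems. A minimal sketch; `cool_uri` is a hypothetical helper, not part of any ECS tool.

```python
def cool_uri(domain, class_of_thing, identifier, scheme=None, fmt=None):
    """Build http://domain/classOfThing[/scheme]/identifier[.fmt]."""
    parts = [domain, class_of_thing]
    if scheme:  # optional, but leaves room for a second identifier scheme later
        parts.append(scheme)
    parts.append(identifier)
    uri = "http://" + "/".join(parts)
    return f"{uri}.{fmt}" if fmt else uri


# The slide's example: person t23 in the "username" scheme.
thing = cool_uri("mysite.org", "person", "t23", scheme="username")
rdf_doc = cool_uri("mysite.org", "person", "t23", scheme="username", fmt="rdf")
html_doc = cool_uri("mysite.org", "person", "t23", scheme="username", fmt="html")
print(thing, rdf_doc, html_doc, sep="\n")
```

The URI without a suffix names the thing itself; the `.rdf` and `.html` variants name documents about it, which keeps the person/document distinction intact.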

  31. And The Beast http://www.diy.com/diy/jsp/bq/nav.jsp?action=detail&fh_oneslice=true&fh_view_size=10&fh_reffacet=styleStyle&fh_location=%2f%2fcatalog01%2fen_GB%2fcategories%3C{9372014}%2fcategories%3C{9372039}%2fcategories%3C{9372150}%2fspecificationsProductType%3done_hole_taps%2fstyleStyle%3E{adelaide}&fh_refview=summary&fh_refpath=facet_159017215&fh_secondid=10507747&fh_eds=%C3%9F&ts=1279018688652

  32. Further Reading http://www.flickr.com/photos/markhillary/337685031/

  33. W3C Specifications • http://www.w3.org/standards/semanticweb/ • http://www.w3.org/standards/techs/rdf • http://www.w3.org/standards/techs/owl • http://www.w3.org/TR/swbp-vocab-pub/

  34. Tools Graphite Browser http://graphite.ecs.soton.ac.uk/browser/ Tabulator http://www.w3.org/2005/ajar/tab

  35. Linked Data Help Linked Data Website http://linkeddata.org/ The Patterns Book http://patterns.dataincubator.org/book/ Semantic Overflow http://www.semanticoverflow.com/

  36. Common Namespaces • SKOS (Simple Knowledge Organization System) • Taxonomies and thesauri • SIOC (Semantically Interlinked Online Communities) • Web forums, mailing lists, etc • FOAF (Friend of a Friend) • People, social networks • DC (Dublin Core) • Basic bibliographic information • BIBO (Bibliographic Ontology) • Advanced bibliographic information • GEO • Simple geolocation (lat/long) ontology

  37. Cool URIs Cool URIs don't change (by TimBL) http://www.w3.org/Provider/Style/URI Cool URIs for the Semantic Web http://www.w3.org/TR/cooluris/ ECS URI scheme documentation http://id.ecs.soton.ac.uk/docs/

  38. Infrastructure Namespaces • RDF & RDFS: these describe classes & predicates which are used to tie everything together. rdf:type is used to give a URI a class, e.g. <http://id.ecs.soton.ac.uk/person/1248> rdf:type foaf:Person . • OWL: used to describe the meaning of predicates & classes in machine-readable form. Start with human-readable documentation; OWL is not widely consumed (yet?) • XSD: describes datatypes like String, Positive Integer, etc.
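A sketch of why `rdf:type` is useful to consumers: it lets them ask "which things are people?" with a single pattern. Only the first triple below comes from the slide; the other subjects and the `Project` class are hypothetical. This plain match does no RDFS reasoning, so instances of subclasses of foaf:Person would not be found.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# The first triple is the slide's example; the others are hypothetical.
graph = {
    ("http://id.ecs.soton.ac.uk/person/1248", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/person/42", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/project/7", RDF_TYPE, "http://example.org/ns/Project"),
}


def instances_of(graph, cls):
    """Every subject the graph explicitly declares to be of class cls."""
    return sorted(s for (s, p, o) in graph if p == RDF_TYPE and o == cls)


people = instances_of(graph, FOAF_PERSON)
print(people)
```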

  39. Take Home Messages http://www.flickr.com/photos/71894657@N00/2696793132/

  40. Good URI Selection • ‘Cool URIs don’t change’ – once you’ve chosen a URI convention for your organisation, it’s a pain to change it • Getting this right is key to having your linked data used more widely We think that we got this one mostly right… …but we still had too many anonymous nodes around

  41. Start with the easy stuff • Go for an incremental approach • …but keep an eye on possible avenues for future expansion • RDFa is not for beginners! • Don’t do as we did: we tried to build linked data for all of our internal data in one go

  42. Don’t reinvent the wheel • Regardless of your application domain, there is probably already an ontology that does some of what you want • …but don’t be afraid to invent relationships and classes if you can’t find suitable ones • Don’t do as we did: we wrote a new ontology from scratch, rather than reusing FOAF+DC

  43. Eat your own dogfood • Build linked data for your own consumption first • You know what your use cases are – better to support these than to second guess those of unknown future users • Don’t do as we did: we overcomplicated our data by trying to support all of the plausible scenarios that we could think of, rather than concentrating on what mattered to us (be glad I couldn't find any clip art for this slide)

  44. Don’t underestimate CSV • You should aim to publish as RDF • Publishing as CSV may get your data out there faster as an interim measure We used CSV as a ‘glue’ data format between different systems, but chose not to expose data until we could do so as RDF.
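Since CSV is often the starting point, converting it to RDF can be a small script. This sketch turns the buildings list suggested in the breakout task into N-Triples, using `rdfs:label` for the name and the W3C WGS84 geo vocabulary for latitude and longitude. The base URI, column names and sample row are hypothetical, and the Site (Campus) column is left out because choosing a predicate for it is a local decision.

```python
import csv
import io

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/building/"  # hypothetical URI scheme


def literal(value):
    """Quote a string as an N-Triples plain literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def buildings_csv_to_ntriples(text):
    """Convert CSV rows with number,name,lat,long columns into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row['number']}>"
        lines.append(f"{subject} <{RDFS_LABEL}> {literal(row['name'])} .")
        lines.append(f"{subject} <{GEO}lat> {literal(row['lat'])} .")
        lines.append(f"{subject} <{GEO}long> {literal(row['long'])} .")
    return lines


sample = "number,name,lat,long\n32,Example Building,50.937,-1.396\n"  # made-up row
print("\n".join(buildings_csv_to_ntriples(sample)))
```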

  45. Thanks cjg@ecs.soton.ac.uk @cgutteridge http://blogs.ecs.soton.ac.uk/webteam/ http://is.gd/dqiJc
