Linked Data COMS E6125 Prof. Gail Kaiser Presented By : MandarMohe ( msm2181 )
Basics
• Linked Data: when you have some of it, you can find other, related data.
• Why do you need it? So that machines can understand the meaning of the links that connect web pages.
• Difference between the current Web and the Web using Linked Data
• Linked Data is about publishing structured data in RDF, using URIs.
Web of Data
• The fundamental idea of Linked Data is to connect web pages in a machine-understandable way, and to connect them based on the relationships between the objects those pages contain.
• This linking of objects, whether on different web pages or on the same page, is called a web of data.
• With a web of data, everything on the Web would be connected in an object-oriented way.
• Web of Data = Linked Data + vocabularies + embedded metadata (RDFa, microformats, etc.)
RDF
• Encodes meaning in sets of triples (subject, predicate and object), analogous to the subject, verb and object of an elementary sentence
• Makes assertions that particular things (people, Web pages or whatever) have properties (such as “is a sister of”, “is the author of”) with certain values (another person, another Web page)
• This structure can describe much of the data processed by machines
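The triple structure described above can be sketched in a few lines of Python. This is only an illustration, assuming plain strings stand in for RDF terms; the `Triple` type, the `match` helper and all `ex:` identifiers are hypothetical and do not come from any RDF library.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style assertion: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical example data; the ex: names are placeholders, not real URIs.
GRAPH: List[Triple] = [
    Triple("ex:Alice", "ex:isSisterOf", "ex:Bob"),
    Triple("ex:Alice", "ex:isAuthorOf", "ex:HomePage"),
    Triple("ex:Bob", "ex:isAuthorOf", "ex:Report"),
]


def match(graph: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]
```

For instance, `match(GRAPH, predicate="ex:isAuthorOf")` returns every "is the author of" assertion in the toy graph, which is the same kind of pattern matching a triple store performs at much larger scale.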
URI
• URI = URL (Uniform Resource Locator) + URN (Uniform Resource Name)
• A URI can be classified as a locator (URL), as a name (URN), or as both.
• A URN functions like a person's name, while a URL resembles that person's street address: the URN defines an item's identity, while the URL provides a method for finding it.
• A particular URI may be a name and a locator at the same time.
• URIs have links to other URIs. These links are RDF properties such as foaf:knows and rdfs:seeAlso.
• This linking can help improve the precision and recall of search engines.
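As a rough illustration of the name/locator distinction, the sketch below classifies a URI by its scheme alone. This is a deliberate simplification: the choice of schemes treated as locators is an assumption made for the example, and a scheme check cannot capture the case where one URI serves as both a name and a locator. The function name `classify_uri` is hypothetical.

```python
from urllib.parse import urlparse

# Assumption for this sketch: these schemes are treated as locators.
LOCATOR_SCHEMES = {"http", "https", "ftp"}


def classify_uri(uri: str) -> str:
    """Classify a URI by scheme only (a simplification, see lead-in)."""
    scheme = urlparse(uri).scheme.lower()
    if scheme == "urn":
        return "name (URN)"
    if scheme in LOCATOR_SCHEMES:
        return "locator (URL)"
    return "other URI"
```

Under these assumptions, `classify_uri("urn:isbn:0451450523")` is reported as a name, while `classify_uri("http://dbpedia.org/resource/Berlin")` is reported as a locator.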
Principles of Linked Data
The four principles of Linked Data design, as suggested by Tim Berners-Lee:
• Use URIs to identify things.
• Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
• Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.
• Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
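Dereferencing, as described in the principles above, amounts to an ordinary HTTP GET on the URI. The sketch below uses only the Python standard library and assumes the server honours an `Accept: application/rdf+xml` header (content negotiation), which the slides do not state and which varies between publishers. The helper names `build_request` and `dereference` are hypothetical.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def build_request(uri: str, media_type: str = RDF_XML) -> urllib.request.Request:
    """Prepare an HTTP GET that asks for an RDF representation of the URI."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("expected an HTTP URI so that it can be dereferenced")
    # Assumption: the server uses the Accept header to pick a format.
    return urllib.request.Request(uri, headers={"Accept": media_type})


def dereference(uri: str, timeout: float = 10.0) -> bytes:
    """Fetch whatever the server returns for the URI (needs network access)."""
    with urllib.request.urlopen(build_request(uri), timeout=timeout) as response:
        return response.read()
```

Calling `dereference("http://dbpedia.org/resource/Berlin")` would attempt a live lookup; what comes back depends entirely on the publisher, so the result should be parsed and checked rather than assumed to be RDF/XML.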
The Cloud
• Started with 2 datasets in 2007; since then, the number of organizations providing their datasets has doubled every 10 months
• Over 500 million RDF triples
• Around 120,000 RDF links between data sources
• The interlinking of these diverse datasets promises a “Web of Data” that will enable users to navigate easily between these datasets, in a manner analogous to how users currently navigate from one webpage to another in the “Web of Documents”
Properties of the Web of Linked Data
• Anyone can publish data to the Web of Linked Data.
• Entities are connected by links, creating a global data graph that spans data sources and enables the discovery of new data sources.
• Data is self-describing: if an application encounters data represented using an unfamiliar vocabulary, it can resolve the URIs that identify vocabulary terms in order to find their RDFS or OWL definitions.
• The Web of Data is open: applications can discover new data sources at run-time by following links.
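Discovery by following links can be sketched as a breadth-first traversal. The example below is a toy under stated assumptions: `TOY_WEB` is a hypothetical in-memory stand-in for dereferencing URIs over HTTP, the `*.example` hosts are placeholders, and a "data source" is approximated by the host name of a URI.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link structure: URI -> URIs mentioned in its description.
TOY_WEB = {
    "http://a.example/thing1": ["http://b.example/thing2"],
    "http://b.example/thing2": ["http://c.example/thing3",
                                "http://a.example/thing1"],
    "http://c.example/thing3": [],
}


def discover_sources(start_uri, lookup=TOY_WEB.get):
    """Follow links breadth-first and list hosts in order of discovery."""
    seen = {start_uri}
    queue = deque([start_uri])
    sources = []
    while queue:
        uri = queue.popleft()
        host = urlparse(uri).netloc
        if host not in sources:
            sources.append(host)
        # lookup() returns None for unknown URIs; treat that as "no links".
        for linked in lookup(uri) or []:
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return sources
```

Starting from a single URI on `a.example`, the traversal reaches `b.example` and `c.example` without either being known in advance. A real crawler would replace `lookup` with an HTTP fetch plus RDF parsing and would need limits on depth and request rate.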
Relation between Semantic Web and Linked Data
• Linked Data is a subset of the Semantic Web; it uses a small slice of the technologies standardized for the Semantic Web.
• Ontologies are essential for proper integration of data. They are used to express and add semantics to terms across datasets, for example different terms in different datasets or the same term in different datasets.
DBpedia and Geonames
• DBpedia extracts RDF triples from the "Infoboxes" commonly seen on the right-hand side of Wikipedia articles, and makes these available on the Web in RDF to be crawled or queried with SPARQL.
• Geonames in turn provides RDF descriptions of millions of geographical locations worldwide.
• DBpedia and Geonames provide URIs (and RDF descriptions) for many of the things in the world to which we want to refer. As these URIs are reused within other datasets, DBpedia and Geonames develop into hubs to which an increasing number of other datasets are connected, thereby increasing the potential for network effects.
• Two open datasets are generally connected using: <http URI> owl:sameAs <http URI>
• Example: <http://dbpedia.org/resource/Berlin> owl:sameAs <http://sws.geonames.org/2950159>
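One way an application might use owl:sameAs links is to group URIs that denote the same thing. The sketch below does this with a small union-find structure; it is an illustration, not how DBpedia or Geonames process their data. The Berlin triple is the example from the slide, while the `http://example.org/places/berlin` URI is a hypothetical third dataset added only to show that the grouping is transitive.

```python
OWL_SAME_AS = "owl:sameAs"

TRIPLES = [
    # From the slide.
    ("http://dbpedia.org/resource/Berlin", OWL_SAME_AS,
     "http://sws.geonames.org/2950159"),
    # Hypothetical extra link, for illustration only.
    ("http://sws.geonames.org/2950159", OWL_SAME_AS,
     "http://example.org/places/berlin"),
]


def same_as_groups(triples):
    """Group URIs connected, directly or indirectly, by owl:sameAs."""
    parent = {}

    def find(uri):
        # Register unseen URIs as their own root, then walk up to the root.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            parent[find(subject)] = find(obj)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())
```

With the two links above, all three URIs end up in a single group, so a property stated about any one of them can be looked up through the others.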
OpenStreetMap
• Individuals everywhere are contributing tiny bits of information to OpenStreetMap.org, a free world map that anyone can add to or edit, building up geographical data in a collaborative way.
• At the end of 2009, the OpenStreetMap of Port-au-Prince, Haiti, showed very few specifics about the city.
• Just after the massive earthquake hit in January 2010, a company called GeoEye released satellite imagery of Haiti with a license that allowed people to use it.
• Immediately, individuals from all over the world who wanted to help Haiti scrutinized the imagery (zoomed in and peered into it) and added all sorts of details about the devastated area to the OpenStreetMap.
• The online map had an immediate impact on rescue efforts, as teams accessed the information via portable navigation devices.
Information extraction from Linked Data
• SPARQL is the query language used. It has certain shortcomings:
• Schema heterogeneity: users need to know the schemas of multiple, heterogeneous datasets.
• Entity disambiguation: DBpedia references Geonames using the owl:sameAs property, so it may be unclear to the user which source is the best one to answer a query. E.g. DBpedia quotes the population of Barcelona as 1,615,908, whereas according to Geonames it is 1,581,595. (The discrepancy was detected because of the owl:sameAs link.) It might arise because the two sources differ in their notion of the city of Barcelona.
• So the drawback of SPARQL leads back to a drawback of Linked Data itself: its lack of expressiveness.
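The Barcelona discrepancy can be reproduced with a small check that compares a property across URIs linked by owl:sameAs. This is a sketch under stated assumptions: the population figures are the ones quoted on the slide, but the identifiers `dbpedia:Barcelona` and `geonames:Barcelona`, the `population` key and the `conflicting_values` helper are placeholders invented for the example, not the vocabularies either dataset actually uses.

```python
# Placeholder identifiers and property name; figures are from the slide.
DESCRIPTIONS = {
    "dbpedia:Barcelona": {"population": 1_615_908},
    "geonames:Barcelona": {"population": 1_581_595},
}
SAME_AS = [("dbpedia:Barcelona", "geonames:Barcelona")]


def conflicting_values(descriptions, same_as_pairs, prop):
    """List sameAs pairs whose values for `prop` are both present and differ."""
    conflicts = []
    for left, right in same_as_pairs:
        left_value = descriptions.get(left, {}).get(prop)
        right_value = descriptions.get(right, {}).get(prop)
        if (left_value is not None and right_value is not None
                and left_value != right_value):
            conflicts.append((left, left_value, right, right_value))
    return conflicts
```

The check only reports that the two sources disagree. Deciding which value to trust, or whether the two URIs really denote the same notion of the city, is exactly the disambiguation problem the slide describes and is not solved by the code.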
Is Linked Data limited to publishing and interlinking datasets?
• Linked Data browsers: Tabulator, Disco, OpenLink data browser, Zitgist browser
• Web of Data search engines: Falcons, Sindice, Swoogle and Watson
• Other applications, such as Linked Data crawlers, also use the Linked Data on the Web.
Conclusion
• As more and more organizations provide their datasets and become part of the Linked Open Data Cloud, the Web can finally become one global database.
• Subscribing to the Linking Open Data project: http://lists.w3.org/Archives/public/public-lod/
References
• http://bank.cs.columbia.edu/classes/cs6125/lectures/22Feb11.ppt
• http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
• http://www.w3.org/DesignIssues/LinkedData
• http://knoesis.wright.edu/library/publications/linkedai2010_submission_13.pdf
• http://www.scientificamerican.com/article.cfm?id=berners-lee-linked-data
• http://www2008.org/papers/pdf/p1265-bizer.pdf
• http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf
Videos:
• http://www.youtube.com/watch?v=zwbs4ej0gpc