
Exploiting large scale web semantics to build end user applications

Learn about the Semantic Web and how it can be used to build semantic applications that provide useful information to users. Explore the potential of the Semantic Web as a 'web of data' and its applications in various domains. Presented by Enrico Motta, Professor of Knowledge Technologies at The Open University.


Presentation Transcript


  1. Exploiting large scale web semantics to build end user applications Enrico Motta Professor of Knowledge Technologies Knowledge Media Institute The Open University

  2. Aims of the Talk • What is the Semantic Web • Perspectives • The SW as a ‘web of data’ • The SW as a new context in which to build semantic applications and an unprecedented opportunity in which to address some classic AI problems • Typical misconceptions • What the SW is not! • Semantic Web for Users • Applications that do something interesting and useful to users, by exploiting available web semantics

  3. The Semantic Web as a ‘Web of Data’ Making data available to SW-aware software

  4. <foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
       <foaf:name>Enrico Motta</foaf:name>
       <foaf:firstName>Enrico</foaf:firstName>
       <foaf:surname>Motta</foaf:surname>
       <foaf:phone rdf:resource="tel:+44-(0)1908-653506"/>
       <foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
       <foaf:workplaceHomepage rdf:resource="http://kmi.open.ac.uk/"/>
       <foaf:depiction rdf:resource="http://kmi.open.ac.uk/img/members/enrico.jpg"/>
       <foaf:topic_interest>Knowledge Technologies</foaf:topic_interest>
       <foaf:topic_interest>Semantic Web</foaf:topic_interest>
       <foaf:topic_interest>Ontologies</foaf:topic_interest>
       <foaf:topic_interest>Problem Solving Methods</foaf:topic_interest>
       <foaf:topic_interest>Knowledge Modelling</foaf:topic_interest>
       <foaf:topic_interest>Knowledge Management</foaf:topic_interest>
       <foaf:based_near>
         <geo:Point>
           <geo:lat>52.024868</geo:lat>
           <geo:long>-0.707143</geo:long>
         </geo:Point>
       </foaf:based_near>
       <contact:nearestAirport>
         <airport:name>London Luton Airport</airport:name>
         <airport:iataCode>LTN</airport:iataCode>
         <airport:location>Luton, United Kingdom</airport:location>
         <geo:lat>51.866666666667</geo:lat>
         <geo:long>-0.36666666666667</geo:long>
         <rdfs:seeAlso rdf:resource="http://www.daml.org/cgi-bin/airport?LTN"/>
       </contact:nearestAirport>
       <foaf:currentProject>
         <foaf:Project>
           <foaf:name>AquaLog</foaf:name>
         </foaf:Project>
       </foaf:currentProject>
     </foaf:Person>
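
To make "data available to SW-aware software" concrete, here is a minimal sketch of a program reading a FOAF description like the one above. It uses only Python's standard `xml.etree.ElementTree` rather than a full RDF parser, and it works on a shortened copy of the slide's data. The enclosing `rdf:RDF` element and the namespace declarations are not on the slide and are added here as assumptions (they are the standard RDF and FOAF namespace URIs); the helper name `person_triples` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and FOAF (the slide omits the declarations).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Shortened copy of the slide's FOAF data, wrapped in rdf:RDF (an assumption).
FOAF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
    <foaf:name>Enrico Motta</foaf:name>
    <foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
    <foaf:topic_interest>Semantic Web</foaf:topic_interest>
    <foaf:topic_interest>Ontologies</foaf:topic_interest>
  </foaf:Person>
</rdf:RDF>"""


def person_triples(xml_text):
    """Return (subject, property, value) tuples for each foaf:Person."""
    root = ET.fromstring(xml_text)
    about = "{%s}about" % NS["rdf"]
    resource = "{%s}resource" % NS["rdf"]
    triples = []
    for person in root.findall("foaf:Person", NS):
        subject = person.get(about)
        for prop in person:
            # Strip the "{namespace}" prefix ElementTree puts on tag names.
            name = prop.tag.split("}", 1)[1]
            # A property is either a link (rdf:resource) or a literal (text).
            value = prop.get(resource) or (prop.text or "").strip()
            triples.append((subject, name, value))
    return triples


triples = person_triples(FOAF_XML)
for triple in triples:
    print(triple)
```

Each printed tuple is one statement about the person, which is the sense in which the document is data rather than a page to be read. A real application would more likely use a dedicated RDF library, since plain XML parsing does not handle the many equivalent ways RDF/XML can serialise the same statements.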

  5. The web of SW documents

  6. Current status of the semantic web • 10-20 million semantic web documents • Expressed in RDF, OWL, DAML+OIL • 7K-10K ontologies • These cover a variety of domains: music, multimedia, computing, management, bio-medical sciences, upper level concepts, etc. • Hence: • To a significant extent the semantic web is already in place • However, domain coverage is very uneven • Still primarily a research enterprise, but interest is rapidly increasing in both governmental and business organizations • “Early adopters” phase • Note: the above figures refer to resources which are publicly accessible on the web

  7. <data data data> <data data data> <data data data> <data data data> <data data data> <data data data>

  8. Bibliographic Data CS Dept Data Geography AKT Reference Ontology RDF Data

  9. “Corporate Semantic Webs” • A ‘corporate ontology’ is used to provide a homogeneous view over heterogeneous data sources. • Often tackle Enterprise Information Integration scenarios • Hailed by Gartner as one of the key emerging strategic technology trends • E.g., Garlik is a multi-million startup recently set up in the UK to support personal information management, which uses an ontology to integrate data mined from the web on a large scale

  10. AquaLog

  11. Applications that exploit large scale semantic content

  12. The web of data

  13. Gateways to the SW • Semantic Web Application • Semantic Web Gateway

  14. Sophisticated quality control mechanism • Detects duplications • Fixes obvious syntax problems • E.g., duplicated ontology IDs, namespaces, etc. • Structures ontologies in a network • Using relations such as: extends, inconsistentWith, duplicates • Provides interfaces for both human users and software programs • Provides an efficient API • Supports formal queries (SPARQL) • Variety of ontology ranking mechanisms • Modularization/Combination support • Plug-ins for Protégé and NeOn Toolkit • Very cool logo!

  15. Case Study 1: Automatic Alignment of Thesauri in the Agricultural/Fishery Domain

  16. Method • SCARLET - matching by Harvesting the SW • Automatically select and combine multiple online ontologies to derive a relation • [Diagram: given Concept_A (e.g., Supermarket) and Concept_B (e.g., Building), Scarlet accesses the Semantic Web and deduces the semantic relation between them]

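  17. Two strategies • Deriving relations from (A) one ontology and (B) across ontologies. • [Diagram (A): Scarlet relates Supermarket and Building through a single ontology on the Semantic Web containing Supermarket, Shop, PublicBuilding, Building] • [Diagram (B): Scarlet relates Cholesterol and OrganicChemical by combining two ontologies, one containing Cholesterol and Steroid, the other containing Steroid, Lipid, OrganicChemical]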
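
The two strategies can be illustrated with a small sketch. This is not the SCARLET implementation: the ontologies are hypothetical in-memory dictionaries built from the Supermarket/Building and Cholesterol/OrganicChemical examples on the slides, only subclass relations are handled, and the function names are invented for illustration. Strategy (A) searches inside each ontology separately; strategy (B) merges the subclass links of several ontologies before searching.

```python
from collections import deque

# Hypothetical toy ontologies: each maps a concept to its direct superclasses.
ONTOLOGIES = {
    "buildings": {
        "Supermarket": ["Shop"],
        "Shop": ["PublicBuilding"],
        "PublicBuilding": ["Building"],
    },
    "chem_a": {"Cholesterol": ["Steroid"]},
    "chem_b": {"Steroid": ["Lipid"], "Lipid": ["OrganicChemical"]},
}


def is_subclass(edges, a, b):
    """Breadth-first search: is b reachable from a via subclass links?"""
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        for parent in edges.get(node, []):
            if parent == b:
                return True
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def relation_in_one(a, b):
    """Strategy (A): look for the relation inside a single ontology."""
    for edges in ONTOLOGIES.values():
        if is_subclass(edges, a, b):
            return (a, "subClassOf", b)
    return None


def relation_across(a, b):
    """Strategy (B): merge subclass links from all ontologies, then search."""
    merged = {}
    for edges in ONTOLOGIES.values():
        for child, parents in edges.items():
            merged.setdefault(child, []).extend(parents)
    if is_subclass(merged, a, b):
        return (a, "subClassOf", b)
    return None


print(relation_in_one("Supermarket", "Building"))
print(relation_in_one("Cholesterol", "OrganicChemical"))
print(relation_across("Cholesterol", "OrganicChemical"))
```

In this toy setup no single ontology links Cholesterol to OrganicChemical, so only the cross-ontology strategy finds the relation. Merging by concept name, as done here, silently assumes that "Steroid" means the same thing in both ontologies, which is exactly the kind of assumption a real matcher has to check.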

  18. Experiment • Matching: • AGROVOC • UN’s Food and Agriculture Organisation (FAO) thesaurus • 28,174 descriptor terms • 10,028 non-descriptor terms • NALT • US National Agricultural Library Thesaurus • 41,577 descriptor terms • 24,525 non-descriptor terms

  19. 226 Used Ontologies • http://139.91.183.30:9090/RDF/VRP/Examples/tap.rdf • http://reliant.teknowledge.com/DAML/SUMO.daml • http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml • http://gate.ac.uk/projects/htechsight/Technologies.daml • http://reliant.teknowledge.com/DAML/Economy.daml

  20. Evaluation 1 - Precision • Manual assessment of 1000 mappings (15%) • Evaluators: • Researchers in the area of the Semantic Web • 6 people split in two groups • Results: • Comparable to best results for background knowledge based matchers.

  21. Evaluation 2 – Error Analysis

  22. Case Study 2: Folksonomy Tagspace Enrichment

  23. Features of Web 2.0 sites • Tagging as opposed to rigid classification • Dynamic vocabulary does not require much annotation effort and evolves easily • Shared vocabulary emerges over time • Certain tags become particularly popular

  24. Limitations of tagging • Different granularity of tagging • rome vs colosseum vs roman monument • flower vs tulip • Etc. • Multilinguality • Spelling errors, different terminology, plural vs singular, etc. • This has a number of negative implications for the effective use of tagged resources • E.g., search exhibits very poor recall

  25. Giving meaning to tags

  26. What does it mean to add semantics to tags? • 1. Mapping a tag to a SW element: "japan" → <akt:Country Japan> • 2. Linking two "SW tags" using semantic relations: {japan, asia} → <japan subRegionOf asia>

  27. Applications of the approach • To improve recall in keyword search • To support annotation by dynamically suggesting relevant tags or visualizing the structure of relevant tags • To enable formal queries over a space of tags • Hence, going beyond keyword search • To support new forms of intelligent navigation • i.e., using the 'semantic layer' to support navigation
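
As a rough illustration of the first application, the sketch below shows how a relation such as <japan subRegionOf asia> could improve recall in keyword search. Everything in it is hypothetical: the `SUB_REGION_OF` table, the tagged resources, and the function names are invented for the example, and a real system would take the relations from online ontologies rather than a hard-coded dictionary.

```python
# Hypothetical relations between tags, in the spirit of <japan subRegionOf asia>.
SUB_REGION_OF = {
    "japan": "asia",
    "tokyo": "japan",
    "rome": "italy",
    "italy": "europe",
}

# Hypothetical tagged resources.
RESOURCES = {
    "photo1": {"tokyo", "night"},
    "photo2": {"japan", "temple"},
    "photo3": {"rome", "colosseum"},
}


def narrower_terms(tag):
    """All tags that are, directly or transitively, a subRegionOf the given tag."""
    found = set()
    frontier = [tag]
    while frontier:
        current = frontier.pop()
        for child, parent in SUB_REGION_OF.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def keyword_search(tag):
    """Plain keyword search: only resources carrying exactly this tag."""
    return sorted(r for r, tags in RESOURCES.items() if tag in tags)


def semantic_search(tag):
    """Expand the query with narrower tags before matching."""
    wanted = {tag} | narrower_terms(tag)
    return sorted(r for r, tags in RESOURCES.items() if tags & wanted)


print(keyword_search("asia"))
print(semantic_search("asia"))
```

No resource is tagged "asia", so the plain search returns nothing, while the expanded search also retrieves the resources tagged "japan" and "tokyo". This is the recall gain the slide refers to, at the cost of depending on the quality of the harvested relations.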

  28. Folksonomy • Pre-processing: tags → clean tags → group similar tags → filter infrequent tags → concise tags • Clustering: analyze co-occurrence of tags → co-occurrence matrix → cluster tags → Cluster1, Cluster2, …, Clustern • Concept and relation identification: take 2 “related” tags → find mappings & relation for the pair of tags (SW search engine, Wikipedia, Google) → <concept, relation, concept> → remaining tags? Yes: repeat; No: END
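
The pre-processing and co-occurrence steps of this pipeline can be sketched as follows. This is only an assumed reading of the slide, not the authors' code: the cleaning rule (lowercasing plus a crude plural-stripping heuristic), the `min_count` threshold, and the sample posts are all invented for illustration, and the final clustering step is left out.

```python
from collections import Counter
from itertools import combinations


def clean(tag):
    """Lowercase and trim a tag, and drop a trailing plural 's' (crude heuristic)."""
    tag = tag.strip().lower()
    if len(tag) > 3 and tag.endswith("s") and not tag.endswith("ss"):
        tag = tag[:-1]
    return tag


def preprocess(posts, min_count=2):
    """Clean tags, then drop those used in fewer than min_count posts."""
    cleaned = [{clean(t) for t in post} for post in posts]
    counts = Counter(t for post in cleaned for t in post)
    return [{t for t in post if counts[t] >= min_count} for post in cleaned]


def cooccurrence(posts):
    """Count how often each unordered pair of tags appears in the same post."""
    pairs = Counter()
    for post in posts:
        for a, b in combinations(sorted(post), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical tagged posts.
posts = [
    ["Japan", "asia", "travel"],
    ["japan", "Asia", "temples"],
    ["flowers", "tulip"],
    ["Flower", "tulip", "japan"],
]
concise = preprocess(posts)
matrix = cooccurrence(concise)
for pair, count in sorted(matrix.items()):
    print(pair, count)
```

Pairs with high counts are the candidates for "related" tags that the later concept and relation identification step would try to map to Semantic Web elements. A clustering algorithm would normally be run over the matrix before that step.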

  29. Examples • [Diagram labels: Information Object, archive, in-event, has-mention-of, event, resource, participatesIn, application, participant, activity, typeRange, user, component, creator, admin, interface, example, innovation, planning, developer]

  30. Examples • [Diagram labels, with source ontologies in brackets: activities [4], education, learning [4], teaching [4], training [1, 4], school [2], qualification, corporate [1], postSecondarySchool [2], institution, studiesAt, college [2], student [3], university [2, 3], takesCourse, offersCourse, course [3]] • Sources: [1] http://gate.ac.uk/projects/htechsight/Employment.daml [2] http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml [3] http://www.mondeca.com/owl/moses/ita.owl [4] http://www.cs.utexas.edu/users/mfkb/RKF/tree/CLib-core-office.owl

  31. Faceted Ontology • Ontology creation and maintenance is automated • Ontology evolution is driven by task features and by user changes • Large scale integration of ontology elements from massively distributed online ontologies • Very different from traditional top-down-designed ontologies

  32. Case Study 3: Reviewing and Rating on the Web

  33. Revyu.com

  34. Trust Factors

  35. [Diagram labels: solution, subjective, objective, affinity, expertise, experience, factors emphasised]

  36. Applying the framework to revyu.com • Affinity • Operationalised as the degree of overlap in items reviewed, and in ratings given • Experience • Proxy metric: Usage of particular tags (as proxies for topics) • Experience scores based on tagging data • Integrates also data from del.icio.us for those users who have chosen to publish their del.icio.us account on FOAF • Expertise • Proxy metric: Credibility • Captures the social aspect of expertise: endorsement
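
The slide says affinity is operationalised as the degree of overlap in items reviewed and in ratings given, but it does not give a formula. The sketch below is one possible reading, offered purely as an assumption: item overlap is measured as Jaccard similarity, rating agreement as one minus the mean normalised rating gap on shared items, and the two are multiplied. The 1-to-5 rating scale, the function name, and the sample reviewers are hypothetical.

```python
def affinity(reviews_a, reviews_b, max_rating=5):
    """Combine item overlap and rating agreement into a score in [0, 1].

    Each argument maps an item name to that reviewer's rating (1..max_rating).
    """
    items_a, items_b = set(reviews_a), set(reviews_b)
    shared = items_a & items_b
    if not shared:
        return 0.0
    # Item overlap: Jaccard similarity of the two sets of reviewed items.
    item_overlap = len(shared) / len(items_a | items_b)
    # Rating agreement: 1 minus the mean normalised rating gap on shared items.
    gap = sum(abs(reviews_a[i] - reviews_b[i]) for i in shared) / len(shared)
    rating_overlap = 1 - gap / (max_rating - 1)
    return item_overlap * rating_overlap


# Hypothetical reviewers.
alice = {"pub": 5, "film": 4, "book": 2}
bob = {"pub": 5, "film": 2}
carol = {"hotel": 3}

print(affinity(alice, bob))
print(affinity(alice, carol))
```

Reviews could then be ranked for a given reader by the affinity between the reader and each review's author, alongside the experience and expertise scores. Other combinations, such as a weighted sum, would be equally consistent with the slide.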

  37. Using trust factors for ranking reviews
