
The Semantic Web: there and back again



Presentation Transcript


  1. The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi http://ebiq.org/r/353

  2. LOD 123: Making the semantic web easier to use Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

  3. Semantic Web: then and now Ten years ago we developed complex ontologies used to encode and reason over small datasets of 1000s of facts Recently the focus has shifted to using simple ontologies and minimal reasoning over very large datasets of 100s of millions of facts Major companies are moving: Google Knowledge Graph, Facebook Open Graph, Microsoft Satori, Apple Siri KB, IBM Watson KB Linked open data or “Things, not strings”

  4. Linked Open Data (LOD) • Linked data is just RDF data, lots of it, with a small schema • RDF data is a graph of triples (subject, predicate, object) • URI URI String: dbr:Barack_Obama dbo:spouse “Michelle Obama” • URI URI URI: dbr:Barack_Obama dbo:spouse dbpedia:Michelle_Obama • Best linked data practice prefers the 2nd pattern, using nodes rather than strings for “entities” • Things, not strings! • Linked open data is just linked data freely accessible on the Web along with its ontologies
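The two triple patterns on this slide can be sketched in plain Python tuples (the prefix URLs are real DBpedia namespaces; the tuples and helper are illustrative, not the RDF data model itself):

```python
# A minimal sketch, in plain Python, of the two triple patterns above.
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"

# Pattern 1: the object is a plain string literal
literal_triple = (DBR + "Barack_Obama", DBO + "spouse", "Michelle Obama")

# Pattern 2 (preferred practice): the object is itself a URI node,
# so further triples can attach to it -- "things, not strings"
entity_triple = (DBR + "Barack_Obama", DBO + "spouse", DBR + "Michelle_Obama")

def is_entity_valued(triple):
    """True when the object position holds a URI rather than a literal."""
    return triple[2].startswith("http://")
```

Only the second pattern lets other datasets link onto the spouse node, which is why linked-data practice prefers it.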

  5. Semantic Web Use Semantic Web technology to publish shared data & knowledge Semantic web technologies allow machines to share data and knowledge using common web languages and protocols ~1997: Semantic Web beginning

  6. Semantic Web => Linked Open Data 2007 Use Semantic Web technology to publish shared data & knowledge Data is interlinked to support integration and fusion of knowledge LOD beginning

  7. Semantic Web => Linked Open Data 2008 Use Semantic Web technology to publish shared data & knowledge Data is interlinked to support integration and fusion of knowledge LOD growing

  8. Semantic Web => Linked Open Data 2009 Use Semantic Web technology to publish shared data & knowledge Data is interlinked to support integration and fusion of knowledge … and growing

  9. Linked Open Data 2010 Use Semantic Web technology to publish shared data & knowledge LOD is the new Cyc: a common source of background knowledge Data is interlinked to support integration and fusion of knowledge … growing faster

  10. Linked Open Data 2011: 31B facts in 295 datasets interlinked by 504M assertions on ckan.net Use Semantic Web technology to publish shared data & knowledge LOD is the new Cyc: a common source of background knowledge Data is interlinked to support integration and fusion of knowledge

  11. Exploiting LOD not (yet) Easy • Publishing or using LOD data has inherent difficulties for the potential user • It’s difficult to explore LOD data and to query it for answers • It’s challenging to publish data using appropriate LOD vocabularies & link it to existing data • Problem: O(10^4) schema terms, O(10^11) instances • I’ll describe two ongoing research projects that are addressing these problems

  12. GoRelations: Intuitive Query System for Linked Data Research with Lushan Han http://ebiq.org/j/93

  13. DBpedia is the Stereotypical LOD • DBpedia is an important example of Linked Open Data • Extracts structured data from infoboxes in Wikipedia • Stores it in RDF using a custom ontology and Yago terms • The major integration point for the entire LOD cloud • Explorable as HTML, but harder to query in SPARQL

  14. Browsing DBpedia’s Mark Twain

  15. Why it’s hard to query LOD • Querying DBpedia requires a lot of a user • Understand the RDF model • Master SPARQL, a formal query language • Understand ontology terms: 320 classes & 1600 properties! • Know instance URIs (>2M entities!) • Term heterogeneity (Place vs. PopulatedPlace) • Querying large LOD sets is overwhelming • Natural language query systems are still a research goal

  16. Goal • Let users with a basic understanding of RDF query DBpedia and other LOD collections • Explore what data is in the system • Get answers to questions • Create SPARQL queries for reuse or adaptation • Desiderata • Easy to learn and to use • Good accuracy (e.g., precision and recall) • Fast

  17. Key Idea Structured keyword queries reduce problem complexity: • User enters a simple graph, and • Annotates the nodes and arcs with words and phrases

  18. Structured Keyword Queries • Nodes are entities and links are binary relations • Entities described by two unrestricted terms: name or value and type or concept • Outputs marked with ? • Compromise between a natural language Q&A system and formal query • Users provide compositional structure of the question • Free to use their own terms to annotate structure
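A structured keyword query of this kind can be sketched as a tiny Python data structure (the class and field names here are my own, not the system's):

```python
# Illustrative sketch of a structured keyword query: nodes carry the user's
# own concept word plus an optional name/value, output nodes are flagged
# with '?', and arcs carry free-text relation phrases.
class QueryNode:
    def __init__(self, concept, name=None, output=False):
        self.concept = concept   # unrestricted user term, e.g. "Author"
        self.name = name         # optional name or value
        self.output = output     # shown as '?' in the query interface

place = QueryNode("Place", output=True)   # the answer node
author = QueryNode("Author")
book = QueryNode("Book")
# arcs annotated with the user's own relation phrases
edges = [(author, "born in", place),
         (author, "wrote", book)]
```

The user supplies the compositional structure (the graph) but is free to pick the words, which is the compromise between natural language and a formal query.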

  19. Translation – Step One: finding semantically similar ontology terms For each graph concept/relation, generate the k most semantically similar ontology classes/properties • Lexical similarity metric based on distributional similarity, LSA, and WordNet
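Step one can be sketched as a top-k ranking over ontology terms. The real metric combines distributional similarity, LSA, and WordNet; the plain string similarity below is only a stand-in for it:

```python
from difflib import SequenceMatcher

# Sketch of step one: rank ontology terms by similarity to a user term and
# keep the top k. String similarity here is a toy stand-in for the system's
# hybrid distributional/LSA/WordNet metric.
def top_k_similar(user_term, ontology_terms, k=3):
    scored = sorted(ontology_terms,
                    key=lambda t: SequenceMatcher(None, user_term.lower(),
                                                  t.lower()).ratio(),
                    reverse=True)
    return scored[:k]

candidates = top_k_similar("born in", ["birthPlace", "deathPlace", "born"], k=2)
```

A string metric would miss pairs like "wrote"/"author", which is exactly why the system needs a semantic rather than lexical-surface measure.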

  20. Semantic similarity: http://bit.ly/SEMSIM

  21. Semantic Textual Similarity task • 2013 lexical and computational semantics conference • Do two sentences have the same meaning (0…5)? 1: “The woman is playing the violin” vs. “The young lady enjoys listening to the guitar” 4: "In May 2010, the troops attempted to invade Kabul” vs. "The US army invaded Kabul on May 7th last year, 2010" • 2012: 35 teams, 88 runs; 2013: 36 teams, 89 runs • 2250 sentence pairs from four domains • Our three runs ranked #1, #2 and #4

  22. Translation – Step Two: disambiguation algorithm • Assemble best interpretation using statistics of the data • Use pointwise mutual information (PMI) between RDF terms in the LOD collection • Measures degree to which two RDF terms co-occur in the knowledge base • In a good interpretation, ontology terms associate like their corresponding user terms connect in the query
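The PMI statistic used here can be computed directly from occurrence and co-occurrence counts over the triple store; the counts in the example below are made up for illustration:

```python
import math

# Sketch of the step-two statistic: pointwise mutual information between
# two RDF terms, from how often each occurs and co-occurs in the knowledge
# base. PMI > 0 means the terms co-occur more than chance predicts.
def pmi(cooccur, count_a, count_b, total):
    p_ab = cooccur / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log(p_ab / (p_a * p_b))

# e.g. a class and a property that co-occur far more often than chance
score = pmi(cooccur=800, count_a=2000, count_b=4000, total=1_000_000)
```

High pairwise PMI among the candidate ontology terms is what makes one interpretation of the whole query graph "hang together" better than another.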

  23. Translation – Step Two: disambiguation algorithm Three aspects are combined to derive an overall goodness measure for each candidate interpretation: joint disambiguation, resolving direction, link reasonableness

  24. Translation result Concepts: Place => Place, Author => Writer, Book => Book Properties: born in => birthPlace, wrote => author (inverse direction)

  25. SPARQL Generation The translation of a semantic graph query to SPARQL is straightforward given the mappings • Concepts • Place => Place • Author => Writer • Book => Book • Relations • born in => birthPlace • wrote => author
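The generation step can be sketched with a small helper (my own, not the system's code) that emits SPARQL from the concept and relation mappings shown above; the `inverse` flag flips subject and object, as for wrote => author:

```python
# Minimal sketch of SPARQL generation from the mappings on this slide.
# Variables are named after the user's concepts; prefix handling and
# SELECT-variable choice are simplified.
def to_sparql(concepts, relations, select="?place"):
    lines = ["PREFIX dbo: <http://dbpedia.org/ontology/>",
             f"SELECT DISTINCT {select} WHERE {{"]
    for var, cls in concepts.items():
        lines.append(f"  ?{var} a dbo:{cls} .")
    for subj, prop, obj, inverse in relations:
        s, o = (obj, subj) if inverse else (subj, obj)
        lines.append(f"  ?{s} dbo:{prop} ?{o} .")
    lines.append("}")
    return "\n".join(lines)

query = to_sparql({"place": "Place", "author": "Writer", "book": "Book"},
                  [("author", "birthPlace", "place", False),
                   ("author", "author", "book", True)])  # wrote => author, inverse
```

Once the mapping is fixed, each node becomes a typed variable and each arc a triple pattern, so translation really is mechanical.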

  26. Evaluation • 33 test questions from 2011 Workshop on Question Answering over Linked Data answerable using DBpedia • Three human subjects unfamiliar with DBpedia translated the test questions into semantic graph queries • Compared with two top natural language QA systems: PowerAqua and True Knowledge

  27. Current work • Baseline system works well for DBpedia; we’re testing a second use case now • Better entity matching • Relaxing the need for type information • A better Web interface with user feedback & advice • See http://ebiq.org/93 for more information & try our alpha version at http://ebiq.org/GOR

  28. Generating Linked Data by Inferring the Semantics of Tables Research with Varish Mulwad http://ebiq.org/j/96

  29. Goal: Table => LOD* http://dbpedia.org/class/yago/NationalBasketballAssociationTeams dbprop:team http://dbpedia.org/resource/Allen_Iverson Player height in meters * DBpedia

  30. Goal: Table => LOD* @prefix dbpedia: <http://dbpedia.org/resource/> . @prefix dbo: <http://dbpedia.org/ontology/> . @prefix yago: <http://dbpedia.org/class/yago/> . "Name"@en is rdfs:label of dbo:BasketballPlayer . "Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams . "Michael Jordan"@en is rdfs:label of dbpedia:Michael_Jordan . dbpedia:Michael_Jordan a dbo:BasketballPlayer . "Chicago Bulls"@en is rdfs:label of dbpedia:Chicago_Bulls . dbpedia:Chicago_Bulls a yago:NationalBasketballAssociationTeams . RDF Linked Data All this in a completely automated way * DBpedia
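The target output per interpreted cell is a label triple plus a type triple, as in the Michael Jordan lines above. A sketch (the helper and its tuple format are illustrative, not the system's actual serializer):

```python
# Sketch of the generated output: one interpreted table cell becomes an
# rdfs:label triple and an rdf:type triple. Namespaces are real DBpedia
# prefixes; the tuple representation is my own.
DBPEDIA = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"

def cell_to_triples(cell_text, entity, cls):
    uri = DBPEDIA + entity
    return [(uri, "rdfs:label", f'"{cell_text}"@en'),
            (uri, "rdf:type", DBO + cls)]

triples = cell_to_triples("Michael Jordan", "Michael_Jordan", "BasketballPlayer")
```

The hard part, of course, is inferring `entity` and `cls` from raw strings, which the following slides address.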

  31. Tables are everywhere!! … yet … The web has 154 million high-quality relational tables Fewer than 1% of the 400K tables at data.gov have rich semantic schemas

  32. A Domain Independent Framework Pre-processing modules Sampling Acronym detection Query and generate initial mappings 1 2 Joint Inference/Assignment Generate Linked RDF Verify (optional) Store in a knowledge base & publish as LOD

  33. Query and Rank Can be replaced by domain-specific / other LOD knowledge bases Rank(String Similarity, Popularity) Chicago Boston Allen Iverson Possible entities for Chicago: 1. Chicago_Bulls 2. Chicago 3. Judy_Chicago
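The query-and-rank step scores each candidate entity by string similarity plus popularity. A toy sketch (the weights and popularity figures are made up, and this toy ignores column context, so its ordering need not match the slide's):

```python
from difflib import SequenceMatcher

# Sketch of candidate ranking: combine string similarity to the cell text
# with a popularity prior. Weights and popularity values are invented.
def rank(cell_text, candidates):
    def score(cand):
        name, popularity = cand
        sim = SequenceMatcher(None, cell_text.lower(),
                              name.replace("_", " ").lower()).ratio()
        return 0.7 * sim + 0.3 * popularity
    return sorted(candidates, key=score, reverse=True)

ranked = rank("Chicago", [("Chicago_Bulls", 0.8),
                          ("Chicago", 0.9),
                          ("Judy_Chicago", 0.2)])
```

These initial rankings are only seeds; the joint inference that follows can promote Chicago_Bulls over Chicago once the column is recognized as NBA teams.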

  34. Generating candidate ‘types’ for Columns For each cell, collect the classes of its candidate entities, e.g. for 1. Chicago_Bulls: {dbpedia-owl:Place, dbpedia-owl:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams} 2. Chicago: {dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film, yago:NationalBasketballAssociationTeams, …} 3. Judy Chicago: {…} The union of these instance classes gives the candidate classes for the column: dbpedia-owl:Place, dbpedia-owl:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film, …
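Candidate column classes can be sketched as a support-counting union over the per-cell class sets (a simplified reading of this step: classes shared by many cells get more support):

```python
from collections import Counter

# Sketch of candidate-class generation for a column: union the classes of
# every cell's candidate entities, counting how many cells support each,
# so shared classes like NationalBasketballAssociationTeams bubble up.
def column_class_candidates(cell_class_sets):
    counts = Counter()
    for classes in cell_class_sets:
        counts.update(set(classes))   # one vote per cell
    return counts.most_common()

candidates = column_class_candidates([
    {"Place", "City", "NationalBasketballAssociationTeams"},
    {"Place", "PopulatedPlace", "NationalBasketballAssociationTeams"},
])
```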

  35. Joint Inference / Assignment

  36. A graphical model for tablesJoint inference over evidence in a table Class C2 C3 C1 R21 R31 R11 R12 R22 R32 R13 R23 R33 Instance

  37. Parameterized Graphical Model Captures interaction between row values R33 R11 R12 R13 R21 R22 R23 R31 R32 Row value Factor Node C2 C1 C3 Function capturing affinity between column headers and row values Variable Node: Column header Captures interaction between column headers

  38. Standard message passing P(C1, C2, C3, R11, R12, R13, R21, R22, R23, R31, R32, R33) Joint Assignment: Graphical Models: Exploit Conditional Independences P(C1, R11, R12, R13) P(C2, R21, R22, R23) P(C3, R31, R32, R33) Still … 4 variables, each having 25 options -- 390,625 entries!
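The size argument on this slide is simple arithmetic: a joint table over n variables with d values each needs d**n entries, so factoring by column shrinks the tables dramatically, yet even one column factor is still large:

```python
# Back-of-envelope sketch of why factoring helps -- and why even the
# factored model is still big, matching the 390,625 figure on the slide.
def joint_table_size(num_vars, domain):
    return domain ** num_vars

full = joint_table_size(12, 25)        # all 12 variables jointly
per_factor = joint_table_size(4, 25)   # one column factor: C_i plus 3 rows
```

This residual blow-up is what motivates the "semantic message passing" variant on the next slide, which exchanges discrete change/no-change messages instead of full distributions.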

  39. Semantic message passing “No Change” “No Change” “Change” Related to Chicago_Bulls R11:[Michael_I_Jordan] R21:[Chicago_Bulls] R13:[Allen_Iverson] R12:[Yao_Ming] R31:[Shooting_Guard] …… Shooting_guard Yao_Ming Allen_Iverson Michael_I_Jordan Chicago_Bulls “No Change” “Change” BasketBallPlayer “No Change” “No Change” BasketBallPositions NBATeam C1:[BasketballPlayer] C3:[BasketBallPositions] C2:[NBATeam] BasketballPlayer “No Change” “No Change” “No Change”

  40. Inference – Example R11:[Michael_I_Jordan] R12:[Allen_Iverson] R13:[Yao_Ming] Yao_Ming Allen_Iverson Michael_I_Jordan R11 “No Change” “No Change” “Change” Michael Jordan “BasketBallPlayer” 1. Michael_I_Jordan (Professor)2. ….. 3. Michael_Jordan (BasketballPlayer) …. (Michael_I_Jordan, Yao_Ming, Allen_Iverson) C1:[Name] “BasketBallPlayer”

  41. Column header – row value agreement [Michael_I_Jordan, Allen_Iverson, Yao_Ming] 1: Majority Voting LivingPeople GeoPopulatedPlace BasketBallPlayer Art Work WomenArtist City PopulatedPlace Athlete Film LivingPeople AI_Researchers +1 +1 +1 BasketballPlayer Athlete LivingPeople +1 Athlete City … 1. LivingPeople 2. BasketBallPlayer 3. GeoPopulatedPlace …. +1 2: Choose the top class Yago Tie-breaker/Re-order: Choose the more ‘descriptive’ class, e.g. BasketBallPlayer better than LivingPeople ClassGranularityScore = 1-[] Top Yago: BasketBallPlayer Top DBpedia: Athlete

  42. Column header – row value agreement topClassScore = (numberOfVotes)/(numberOfRows) Compute topScoreYago & topScoreDBpedia (topScoreYago || topScoreDBpedia) >= Threshold (both) Score < Threshold Check for Alignment Is Athlete a sub/superClass of BasketBallPlayer? Column Header Annotation = BasketBallPlayer, Athlete Update Column Header Annotation = “No-Annotation” LivingPeople AI_Researchers Change BasketballPlayer Athlete LivingPeople No-Change
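The agreement check on these two slides can be sketched as majority voting with a vote-share threshold (threshold value and data are illustrative; the real system scores Yago and DBpedia classes separately and applies a granularity tie-breaker):

```python
from collections import Counter

# Sketch of column-header agreement: each row value votes for the classes
# of its current entity; the top class is kept only when its vote share
# (topClassScore on the slide) clears a threshold.
def top_class(row_entity_classes, threshold=0.5):
    votes = Counter()
    for classes in row_entity_classes:
        votes.update(classes)
    cls, n = votes.most_common(1)[0]
    score = n / len(row_entity_classes)   # topClassScore
    return (cls, score) if score >= threshold else (None, score)

annotation = top_class([
    ["BasketballPlayer", "Athlete", "LivingPeople"],
    ["BasketballPlayer", "Athlete"],
    ["BasketballPlayer", "LivingPeople"],
])
```

When no class clears the threshold, the column gets "No-Annotation" and change messages flow back to the row-value variables.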

  43. Update row value entity annotations R11 “CHANGE” Entity Class : BasketBallPlayer or Athlete Michael Jordan LivingPeople AI_Researchers 1. Michael_I_Jordan 2. ….. 3. Michael_Jordan …. Candidate Entities for R11 BasketBallPlayer Athlete R11 Michael_Jordan Michael Jordan

  44. Evaluation • Dataset of 80 tables (Wikipedia tables; part of larger dataset released by IIT-Bombay) • Evaluated Column Header Annotation Accuracy • How good was the mapping Team to NationalBasketballAssociationTeams • Evaluated Entity Linking Accuracy • Mapping Michael Jordan to Michael_Jordan

  45. Column Header Annotation Accuracy Incorrect Okay Accurate • System produced a ranked list of Yago & DBpedia classes • Human judges evaluated each class • For precision, judges scored each class • 1 if the class was accurate • 0.5 if the class was okay, but not best (e.g., Place vs. City) • 0 if it was incorrect • For recall, score 1 if accurate/correct, 0 for incorrect
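The precision side of this scheme is just an average over the graded judgments; a minimal sketch (the sample judgments are invented):

```python
# Sketch of precision with partial credit, per the scoring above:
# 1 = accurate, 0.5 = okay but not best, 0 = incorrect.
def graded_precision(judgments):
    return sum(judgments) / len(judgments)

p = graded_precision([1, 0.5, 1, 0])  # two accurate, one okay, one incorrect
```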

  46. Top K classes, F-measure

  47. Entity linking accuracy Allen Iverson http://dbpedia.org/resource/Allen_Iverson Accuracy: 75.91%

  48. Other Challenges • Using table captions and other text in associated documents to provide context • Size of some data.gov tables (> 400K rows!) makes using the full graphical model impractical • Sample the table and run the model on the subset • Achieving acceptable accuracy may require human input • 100% accuracy unattainable automatically • How best to let humans offer advice and/or correct interpretations?

  49. Final Conclusions • Linked data is great for sharing structured and semi-structured data • Backed by machine-understandable semantics • Uses successful Web languages and protocols • Generating and exploring linked data resources is challenging • Schemas are too large, too many URIs • New tools mapping tables to linked data and translating structured natural language queries reduce the barriers

  50. http://ebiq.org/
