Natural Language Questions for the Web of Data

Natural Language Questions for the Web of Data. Mohamed Yahya, Klaus Berberich, Gerhard Weikum (Max Planck Institute for Informatics, Germany); Shady Elbassuoni (Qatar Computing Research Institute); Maya Ramanath (Dept. of CSE, IIT-Delhi, India); Volker Tresp

Presentation Transcript


  1. Natural Language Questions for the Web of Data. Mohamed Yahya, Klaus Berberich, Gerhard Weikum (Max Planck Institute for Informatics, Germany); Shady Elbassuoni (Qatar Computing Research Institute); Maya Ramanath (Dept. of CSE, IIT-Delhi, India); Volker Tresp (Siemens AG, Corporate Technology, Munich, Germany). EMNLP 2012.

  2. Translating qNL to qFL
     • qNL: a natural language question, e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
     • qFL: a formal-language query (SPARQL 1.0) with the triple patterns:
         ?x hasGender female
         ?x isa actor
         ?x actedIn Casablanca_(film)
         ?x marriedTo ?w
         ?w isa writer
         ?w bornIn Rome
     • Problem: writing such a complex query is difficult for the user.
     • Solution: automatically translate qNL to qFL.
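
The relation between the triple patterns and a query string can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `to_sparql` is hypothetical, and the relation and entity names are kept exactly as written on the slide, so a real endpoint would additionally need proper prefixes or IRIs.

```python
def to_sparql(select_var, triples):
    """Render (subject, predicate, object) patterns as a schematic SELECT query."""
    body = " .\n  ".join(" ".join(t) for t in triples)
    return f"SELECT {select_var} WHERE {{\n  {body} .\n}}"


# Triple patterns for the running example, names as given on the slide.
triples = [
    ("?x", "hasGender", "female"),
    ("?x", "isa", "actor"),
    ("?x", "actedIn", "Casablanca_(film)"),
    ("?x", "marriedTo", "?w"),
    ("?w", "isa", "writer"),
    ("?w", "bornIn", "Rome"),
]

query = to_sparql("?x", triples)
print(query)
```

The two variables ?x and ?w are what make the query a join over several facts, which is the part users find hard to write by hand.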

  3. Knowledge base: YAGO2 is a huge semantic knowledge base derived from Wikipedia, WordNet and GeoNames. [Figure: knowledge-base excerpt with relations, classes and entities labeled.]

  4. System architecture: DEANNA (DEep Answers for maNy Naturally Asked questions).

  5. Phrase detection (qNL → phrase detection → phrases)
     • A detected phrase p is a pair <Toks, l>, where Toks is the phrase and l is its label, l ∈ {concept, relation}.
     • Pr = {<*, relation>}: the detected relation phrases.
     • Pc = {<*, concept>}: the detected concept phrases.

  6. Phrase detection: concept phrases
     • Example: “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
     • Concept phrases are found by a detector that works against a phrase-concept dictionary.
     • The phrase-concept dictionary consists of the instances of the means relation in Yago2.
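
Dictionary-based concept phrase detection can be sketched as below. The dictionary contents, the identifiers in it, the whitespace tokenizer and the `max_len` limit are all assumptions made for illustration; the slides only say that the real dictionary comes from the means relation in Yago2.

```python
# Hypothetical toy phrase-concept dictionary: phrase -> candidate semantic items.
PHRASE_CONCEPT = {
    "female": ["female"],
    "actor": ["actor"],
    "casablanca": ["Casablanca", "Casablanca_(film)"],
    "writer": ["writer"],
    "rome": ["Rome"],
}


def detect_concept_phrases(question, dictionary, max_len=3):
    """Return <Toks, l> pairs (as tuples) for every n-gram found in the dictionary."""
    tokens = question.lower().rstrip("?").split()
    found = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length > len(tokens):
                break
            phrase = " ".join(tokens[start:start + length])
            if phrase in dictionary:
                found.append((phrase, "concept"))
    return found


question = ("Which female actor played in Casablanca and is married "
            "to a writer who was born in Rome?")
phrases = detect_concept_phrases(question, PHRASE_CONCEPT)
print(phrases)
```

Note that "casablanca" keeps two candidate items in the dictionary; choosing between them is left to the later disambiguation step.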

  7. Phrase detection: relation phrases
     • Relation phrases are found by a relation detector based on ReVerb (Fader et al., 2011), extended with additional POS tag patterns.
     • Example: “Which female actor played in Casablanca and is married to a writer who was born in Rome?”

  8. Phrase Mapping (phrases → phrase mapping → mappings)
     • There are two kinds of phrase mapping:
     • the mapping of concept phrases
     • the mapping of relation phrases

  9. Phrase Mapping: concept phrases
     • The mapping of concept phrases also uses the phrase-concept dictionary.
     • The phrase-concept dictionary consists of the instances of the means relation in Yago2.
     • Example: “Which female actor played in Casablanca and is married to a writer who was born in Rome?”

  10. Phrase Mapping: relation phrases
     • The mapping of relation phrases relies on a corpus that maps textual patterns to relations.
     • Example: “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
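
A pattern-to-relation lookup of this kind might look like the sketch below. The table entries are hypothetical; only the relation names actedIn, marriedTo and bornIn appear in the slides, and the second candidates are added purely to show that a phrase can map to several relations.

```python
# Hypothetical excerpt of a textual-pattern -> relations corpus.
PATTERN_TO_RELATIONS = {
    "played in": ["actedIn", "playsFor"],
    "is married to": ["marriedTo"],
    "was born in": ["bornIn", "bornOnDate"],
}


def map_relation_phrase(phrase):
    """Return all candidate relations for a relation phrase (empty if unknown)."""
    return PATTERN_TO_RELATIONS.get(phrase.lower().strip(), [])


print(map_relation_phrase("played in"))
```

Ambiguous phrases such as "played in" keep all their candidates at this stage, so the choice can be made jointly with the concept mappings later.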

  11. Q-Unit Generation (mappings → q-unit generation → candidate graph)
     • The q-unit generation step has two parts:
     • dependency parsing
     • forming q-units, where a q-unit is a triple of sets of phrases

  12. Q-Unit Generation: dependency parsing
     • Dependency parsing identifies triples of tokens <trel, targ1, targ2>, where trel, targ1, targ2 ∈ qNL.
     • Example: “who was born in Rome?”
     • Dependencies: nsubjpass(born-3, who-1), auxpass(born-3, was-2), root(ROOT-0, born-3), prep_in(born-3, Rome-5)
     • Here trel = born, targ1 = who and targ2 = Rome, giving the triple <born, who, Rome>.
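
The step from the dependency list to the token triple can be sketched as follows. The extraction rule used here (a subject dependency plus a preposition or direct-object dependency on the same head) is a simplifying assumption rather than the authors' actual pattern set, and heads are keyed by word only, which would not be enough for longer questions with repeated words.

```python
import re

# Dependencies for "who was born in Rome?", copied from the slide.
DEPS = [
    "nsubjpass(born-3, who-1)",
    "auxpass(born-3, was-2)",
    "root(ROOT-0, born-3)",
    "prep_in(born-3, Rome-5)",
]

DEP_RE = re.compile(r"(\w+)\((\S+)-\d+, (\S+)-\d+\)")


def parse_dep(line):
    """Split 'label(head-i, dependent-j)' into (label, head, dependent)."""
    match = DEP_RE.fullmatch(line)
    return match.group(1), match.group(2), match.group(3)


def extract_triples(dep_lines):
    """Collect <trel, targ1, targ2> for heads that have both a subject and an object."""
    subjects, objects = {}, {}
    for label, head, dependent in map(parse_dep, dep_lines):
        if label in ("nsubj", "nsubjpass"):
            subjects[head] = dependent
        elif label.startswith("prep_") or label == "dobj":
            objects[head] = dependent
    return [(head, subjects[head], objects[head])
            for head in subjects if head in objects]


print(extract_triples(DEPS))
```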

  13. Q-Unit Generation: q-units
     • A q-unit is a triple of sets of phrases <{prel ∈ Pr}, {parg1 ∈ Pc}, {parg2 ∈ Pc}>, where trel ∈ prel, targ1 ∈ parg1 and targ2 ∈ parg2.
     • Example: <{born, was born}, {a writer}, {Rome}>, with the relation phrases taken from Pr and the argument phrases from Pc.

  14. Joint Disambiguation
     • Rule 1: resolve phrase-boundary ambiguity (only non-overlapping phrases are mapped).
     • Rule 2: each phrase is assigned to at most one semantic item.

  15. Joint Disambiguation: disambiguation graph
     • Joint disambiguation takes place over a disambiguation graph DG = (V, E).
     • V = Vs ∪ Vp ∪ Vq
     • E = Esim ∪ Ecoh ∪ Eq

  16. Joint Disambiguation: disambiguation graph vertices
     • Vs: the set of s-nodes (semantic items)
     • Vp: the set of p-nodes (phrases)
     • Vrp ⊆ Vp: the set of relation phrases
     • Vrc ⊆ Vp: the set of concept phrases
     • Vq: a set of placeholder nodes for q-units

  17. Disambiguation Graph: edges
     • Eq ⊆ Vq × Vp × d, with d ∈ {rel, arg1, arg2}: the q-edges
     • Esim ⊆ Vp × Vs: a set of weighted similarity edges (sim-edges)
     • Ecoh ⊆ Vs × Vs: a set of weighted coherence edges
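
One possible in-memory representation of DG = (V, E) is sketched below. The class, its method names and the example weights are hypothetical; only the node sets, the edge types and the roles rel, arg1 and arg2 come from the slides.

```python
from dataclasses import dataclass, field

ROLES = ("rel", "arg1", "arg2")


@dataclass
class DisambiguationGraph:
    s_nodes: set = field(default_factory=set)       # Vs: semantic items
    p_nodes: set = field(default_factory=set)       # Vp: phrases
    q_nodes: set = field(default_factory=set)       # Vq: q-unit placeholders
    sim_edges: dict = field(default_factory=dict)   # Esim: (phrase, item) -> weight
    coh_edges: dict = field(default_factory=dict)   # Ecoh: {item, item} -> weight
    q_edges: set = field(default_factory=set)       # Eq: (q-unit, phrase, role)

    def add_sim_edge(self, phrase, item, weight):
        self.p_nodes.add(phrase)
        self.s_nodes.add(item)
        self.sim_edges[(phrase, item)] = weight

    def add_coh_edge(self, item1, item2, weight):
        self.s_nodes.update((item1, item2))
        self.coh_edges[frozenset((item1, item2))] = weight

    def add_q_edge(self, qunit, phrase, role):
        if role not in ROLES:
            raise ValueError(f"role must be one of {ROLES}")
        self.q_nodes.add(qunit)
        self.p_nodes.add(phrase)
        self.q_edges.add((qunit, phrase, role))


dg = DisambiguationGraph()
dg.add_sim_edge("Casablanca", "Casablanca_(film)", 0.4)   # illustrative weights
dg.add_sim_edge("played in", "actedIn", 0.5)
dg.add_coh_edge("Casablanca_(film)", "actedIn", 0.9)
dg.add_q_edge("q1", "played in", "rel")
dg.add_q_edge("q1", "Casablanca", "arg2")
print(len(dg.s_nodes), len(dg.p_nodes), len(dg.q_edges))
```

Coherence edges are stored under an unordered pair because the Jaccard-based weight described on the following slides is symmetric.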

  18. Disambiguation Graph: edge weights
     • Cohsem (semantic coherence) between two semantic items s1 and s2 is defined as the Jaccard coefficient of their sets of inlinks.
     • There are three kinds of inlinks: InLinks(e) for entities, InLinks(c) for classes and InLinks(r) for relations.

  19. Disambiguation Graph: edge weights, Cohsem: inlinks of an entity
     • InLinks(e): the set of Yago2 entities whose corresponding Wikipedia pages link to the entity e.
     • E.g. InLinks(Casablanca) = {Marwan_al-Shehhi, Ingrid_Bergman, …, Morocco, …}
     • Source: https://d5gate.ag5.mpi-sb.mpg.de/webyagospo/Browser

  20. Disambiguation Graph: edge weights, Cohsem: inlinks of a class
     • InLinks(c) = ∪e∈c InLinks(e), the union over all entities e in the class c.
     • E.g. InLinks(wikicategory_Metropolitan_areas_of_Morocco) = InLinks(Casablanca) ∪ InLinks(Marrakech) ∪ … ∪ InLinks(Rabat)

  21. Disambiguation Graph: edge weights, Cohsem: inlinks of a relation
     • InLinks(r) = ∪(e1, e2)∈r (InLinks(e1) ∩ InLinks(e2)), the union over all entity pairs (e1, e2) in the relation r.
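
The three inlink definitions and the Jaccard-based coherence can be sketched together as below. The link data is a made-up toy example, the function names are hypothetical, and returning 0 for two empty inlink sets is an assumption the slides do not address.

```python
# Hypothetical toy inlink data: entity -> entities whose pages link to it.
ENTITY_INLINKS = {
    "Casablanca": {"Ingrid_Bergman", "Morocco"},
    "Rabat": {"Morocco"},
    "Ingrid_Bergman": {"Casablanca_(film)", "Humphrey_Bogart"},
    "Casablanca_(film)": {"Ingrid_Bergman", "Humphrey_Bogart"},
}


def inlinks_entity(entity):
    return ENTITY_INLINKS.get(entity, set())


def inlinks_class(members):
    """InLinks(c): union of the inlinks of all entities in the class."""
    result = set()
    for entity in members:
        result |= inlinks_entity(entity)
    return result


def inlinks_relation(pairs):
    """InLinks(r): union over (e1, e2) in r of InLinks(e1) ∩ InLinks(e2)."""
    result = set()
    for e1, e2 in pairs:
        result |= inlinks_entity(e1) & inlinks_entity(e2)
    return result


def coh_sem(links1, links2):
    """Jaccard coefficient of two inlink sets (0.0 if both are empty)."""
    union = links1 | links2
    return len(links1 & links2) / len(union) if union else 0.0


morocco_cities = inlinks_class({"Casablanca", "Rabat"})
acted_in = inlinks_relation([("Ingrid_Bergman", "Casablanca_(film)")])
print(coh_sem(inlinks_entity("Casablanca"), inlinks_entity("Rabat")))
```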

  22. Similarity Weights
     • Entities: the weight reflects how often a phrase refers to a certain entity in Wikipedia.
     • Classes: the weight reflects the number of members in a class.
     • Relations: the weight reflects the maximum n-gram similarity between the phrase and any of the relation’s surface forms.
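
For the relation case, "maximum n-gram similarity" could be computed along the following lines. The slides do not say which n-gram measure is used, so the choice of character trigrams compared with a Jaccard coefficient is an assumption, as are the function names and the example surface forms.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams of the lower-cased text, padded with spaces."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_sim(a, b, n=3):
    """Jaccard overlap of the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def relation_similarity(phrase, surface_forms):
    """Maximum n-gram similarity between the phrase and any surface form."""
    return max((ngram_sim(phrase, form) for form in surface_forms), default=0.0)


# Hypothetical surface forms for two relations.
print(relation_similarity("born in", ["was born in", "birthplace"]))
print(relation_similarity("born in", ["married to"]))
```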

  23. Joint Disambiguation: disambiguation graph processing
     • The result of disambiguation is a subgraph of the disambiguation graph that yields the most coherent mappings.
     • An ILP (integer linear program) is used to compute it.
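
To make the idea of "most coherent mappings" concrete, the sketch below picks one semantic item per phrase by exhaustive search. It is deliberately not an ILP and not the authors' formulation: the candidates, all weights, the score (a weighted sum of similarity and pairwise coherence) and the parameters `w_sim` and `w_coh` are assumptions, every phrase is forced to take exactly one item, and phrase-boundary conflicts are not modeled.

```python
from itertools import product

# Hypothetical candidates: phrase -> {semantic item: similarity weight}.
CANDIDATES = {
    "Casablanca": {"Casablanca": 0.6, "Casablanca_(film)": 0.4},
    "played in": {"actedIn": 0.5, "playsFor": 0.5},
}

# Hypothetical coherence weights between semantic items.
COHERENCE = {
    frozenset({"Casablanca_(film)", "actedIn"}): 0.9,
    frozenset({"Casablanca", "playsFor"}): 0.1,
}


def best_mapping(candidates, coherence, w_sim=1.0, w_coh=1.0):
    """Try every combination of one item per phrase and keep the best score."""
    phrases = list(candidates)
    best, best_score = None, float("-inf")
    for choice in product(*(candidates[p] for p in phrases)):
        sim = sum(candidates[p][s] for p, s in zip(phrases, choice))
        coh = sum(coherence.get(frozenset({a, b}), 0.0)
                  for i, a in enumerate(choice) for b in choice[i + 1:])
        score = w_sim * sim + w_coh * coh
        if score > best_score:
            best, best_score = choice, score
    return dict(zip(phrases, best)), best_score


mapping, score = best_mapping(CANDIDATES, COHERENCE)
print(mapping, score)
```

In this toy input the city reading of "Casablanca" has the higher similarity on its own, but the film reading wins once coherence with actedIn is counted, which is the effect joint disambiguation is meant to capture. Exhaustive search grows exponentially with the number of phrases, which is why a solver-based formulation is attractive.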

  24. Joint Disambiguation: ILP definitions

  25. Joint Disambiguation: ILP objective function

  26. Joint Disambiguation: ILP constraints

  27. Joint Disambiguation: ILP, resulting subgraph

  28. Query Generation
     • Subject/object roles are not assigned in triploids and q-units.
     • Each semantic class is replaced with a distinct type-constrained variable.
     • Example: “Which singer is married to a singer?” becomes ?x type singer, ?x marriedTo ?y, and ?y type singer.

  29. Query Generation: example of replacing each semantic class
     • Each q-unit has the form arg1, rel, arg2.
     • Type constraints: ?x type writer, ?y type person
     • Generated triple patterns: ?x bornIn Rome, ?y actedIn Casablanca, ?y married ?x
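
Replacing class mentions with type-constrained variables can be sketched as follows. The input encoding is an assumption: arguments are mention identifiers such as "singer#1" so that two mentions of the same class receive different variables, and the helper names are hypothetical.

```python
def var_name(index):
    """?x, ?y, ?z for the first three variables, then ?v3, ?v4, ..."""
    return "?" + "xyz"[index] if index < 3 else f"?v{index}"


def generate_patterns(triples, class_of):
    """triples: (arg1, rel, arg2) over mention ids or entity names.
    class_of: mention id -> semantic class; other arguments are kept as entities."""
    variables = {}
    patterns = []

    def term(arg):
        if arg not in class_of:
            return arg                      # an entity stays as it is
        if arg not in variables:
            variables[arg] = var_name(len(variables))
            patterns.append((variables[arg], "type", class_of[arg]))
        return variables[arg]

    for arg1, rel, arg2 in triples:
        subject, obj = term(arg1), term(arg2)
        patterns.append((subject, rel, obj))
    return patterns


# "Which singer is married to a singer?"
patterns = generate_patterns(
    [("singer#1", "marriedTo", "singer#2")],
    {"singer#1": "singer", "singer#2": "singer"},
)
print(patterns)
```

The same three patterns as on the slide are produced, only in a different order, which does not matter for a conjunctive query.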

  30. Evaluation has three parts:
     • Datasets
     • Evaluation metrics
     • Results and discussion

  31. Datasets
     • Experiments are based on two datasets:
     • QALD-1: from the 1st Workshop on Question Answering over Linked Data (QALD-1)
     • NAGA collection: from the context of the NAGA project; based on linking data from the Yago2 knowledge base
     • Training set: 23 QALD-1 questions and 43 NAGA questions
     • Test set: 27 QALD-1 questions and 44 NAGA questions
     • Hyperparameters (α, β, γ) in the ILP objective function
     • 19 QALD-1 questions in the test set

  32. Evaluation Metrics
     • The output of DEANNA was evaluated at three stages:
     • after the disambiguation of phrases
     • after the generation of the SPARQL query
     • after obtaining answers from the underlying linked-data sources
     • Judgement: two human assessors judged each item; if they disagreed, a third person resolved the judgement.

  33. Evaluation Metrics: disambiguation stage
     • The judges looked at each q-node/s-node pair and assessed:
     • whether the mapping was correct or not
     • whether any expected mappings were missing

  34. Evaluation Metrics: query-generation stage
     • The judges looked at each triple pattern and assessed:
     • whether the pattern was meaningful for the question or not
     • whether any expected triple pattern was missing
     • Example triple patterns: ?x bornIn Rome, ?y actedIn Casablanca, ?y married ?x

  35. Query-answering stage
     • The judges were asked to identify whether the result sets for the generated queries were satisfactory.

  36. Results: for a question q and an item set s,
     • correct(q, s): the number of correct items in s
     • ideal(q): the size of the ideal item set
     • retrieved(q, s): the number of retrieved items
     • Coverage and precision are defined as:
     • cov(q, s) = correct(q, s) / ideal(q)
     • prec(q, s) = correct(q, s) / retrieved(q, s)

  37. For a question q and item set s in one of the stages of evaluation, cov(q, s) and prec(q, s) are aggregated in two ways:
     • Micro-averaging aggregates over all assessed items, regardless of the questions to which they belong.
     • Macro-averaging first aggregates the items for the same question, and then averages the quality measure over all questions.
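
The difference between the two averaging schemes is easiest to see on numbers. The per-question counts below are invented for illustration, the function names are hypothetical, and the code assumes every question has a non-empty ideal set and at least one retrieved item.

```python
# Hypothetical counts per question: (correct, ideal, retrieved).
COUNTS = {
    "q1": (2, 4, 2),
    "q2": (3, 3, 4),
}


def micro_average(counts):
    """Pool all items first: (sum correct / sum ideal, sum correct / sum retrieved)."""
    correct = sum(c for c, _, _ in counts.values())
    ideal = sum(i for _, i, _ in counts.values())
    retrieved = sum(r for _, _, r in counts.values())
    return correct / ideal, correct / retrieved


def macro_average(counts):
    """Compute cov and prec per question, then average over questions."""
    n = len(counts)
    cov = sum(c / i for c, i, _ in counts.values()) / n
    prec = sum(c / r for c, _, r in counts.values()) / n
    return cov, prec


print(micro_average(COUNTS))   # coverage 5/7, precision 5/6
print(macro_average(COUNTS))   # coverage 0.75, precision 0.875
```

Micro-averaging gives more influence to questions with many items, while macro-averaging weights every question equally.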

  38. Results: example questions, the generated SPARQL queries and their answers. Note that in Yago2 the relation bornIn relates people to cities, not countries.

  39. Results with relaxation, using the approach of (Elbassuoni et al., 2009).

  40. Natural Language Questions for the Web of Data

  41. Conclusions
     • The authors presented a method for translating natural language questions into structured queries.
     • Although the model, in principle, leads to high combinatorial complexity, the authors observed that the Gurobi solver could handle their judiciously designed ILP very efficiently.
     • The experimental studies showed very high precision and good coverage for the query translation, and good results for the actual question answering.
