
Natural Language Questions for the Web of Data

Natural Language Questions for the Web of Data. Mohamed Yahya (1), Klaus Berberich (1), Shady Elbassuoni (2), Maya Ramanath (3), Volker Tresp (4), Gerhard Weikum (1). (1) Max Planck Institute for Informatics, Germany; (2) Qatar Computing Research Institute; (3) Dept. of CSE, IIT-Delhi, India


Presentation Transcript


  1. Natural Language Questions for the Web of Data
     Mohamed Yahya (1), Klaus Berberich (1), Shady Elbassuoni (2), Maya Ramanath (3), Volker Tresp (4), Gerhard Weikum (1)
     (1) Max Planck Institute for Informatics, Germany
     (2) Qatar Computing Research Institute
     (3) Dept. of CSE, IIT-Delhi, India
     (4) Siemens AG, Corporate Technology, Munich, Germany
     EMNLP 2012

  2. Example of a question
     • “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
     • Translation to SPARQL triple patterns:
       • ?x hasGender female
       • ?x isa actor
       • ?x actedIn Casablanca_(film)
       • ?x marriedTo ?w
       • ?w isa writer
       • ?w bornIn Rome
     • Characteristics of SPARQL:
       • Supports complex queries
       • Gives good results
       • Difficult for the user to write
     • Goal of the authors: automatically create such structured queries by mapping the user’s question into this representation.
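
The triple patterns on this slide can be assembled into a SPARQL query string. The following is a minimal sketch, assuming the Yago2-style identifiers from the slide are written as bare names without namespace prefixes (a real endpoint would need prefixes or full IRIs). The helper name `build_select` is hypothetical and not part of DEANNA.

```python
def build_select(target, patterns):
    """Join (subject, predicate, object) triples into a SELECT query string."""
    body = " .\n  ".join(" ".join(triple) for triple in patterns)
    return f"SELECT {target} WHERE {{\n  {body} .\n}}"


# Triple patterns copied from the slide; ?x and ?w are query variables.
patterns = [
    ("?x", "hasGender", "female"),
    ("?x", "isa", "actor"),
    ("?x", "actedIn", "Casablanca_(film)"),
    ("?x", "marriedTo", "?w"),
    ("?w", "isa", "writer"),
    ("?w", "bornIn", "Rome"),
]

query = build_select("?x", patterns)
print(query)
```

The shared variables ?x and ?w are what join the six patterns into one conjunctive query.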

  3. Translating qNL to qFL
     • qNL → qFL
     • qNL: natural language question
     • qFL: formal language query (SPARQL 1.0)

  4. Yago2
     • YAGO2s is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames.
     • http://www.mpi-inf.mpg.de/yago-naga/yago/

  5. Sample facts from Yago2
     • Examples of relations: type, subclassOf, and actedIn.
     • Examples of classes: person and film.
     • Examples of entities: entities are represented in canonical form, such as ‘Ingrid_Bergman’ and ‘Casablanca_(film)’.
     • Special types of entities: strings, numbers, and dates.

  6. DEANNA
     • DEANNA (DEep Answers for maNy Naturally Asked questions)

  7. Question sentence
     • qNL = (t_0, t_1, ..., t_n)
     • Phrase = (t_i, t_{i+1}, ..., t_{i+l}) ⊆ qNL, 0 ≤ i, 0 ≤ l ≤ n
     • Phrases focus on entities, classes, and relations

  8. Phrase detection
     • Phrases that potentially correspond to semantic items are detected, such as ‘Who’, ‘played in’, ‘movie’ and ‘Casablanca’.

  9. Phrase detection
     • A detected phrase p is a pair <Toks, l>
       • Toks: the tokens of the phrase
       • l: label (l ∈ {concept, relation})
     • Pr: the set of relation phrases, {<*, relation>}
     • Pc: the set of concept phrases, {<*, concept>}

  10. Concept detection
     • Works against a phrase-concept dictionary
     • The phrase-concept dictionary consists of instances of the means relation in Yago2
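
Dictionary-based concept detection can be sketched as a lookup over all contiguous token spans of the question. This is a minimal illustration, not the authors' implementation: the function name `detect_concept_phrases`, the `max_len` cutoff, and the toy dictionary entries are assumptions standing in for the phrase-concept dictionary built from the Yago2 means relation.

```python
def detect_concept_phrases(tokens, dictionary, max_len=4):
    """Return (start, end, phrase) for every contiguous span found in the dictionary."""
    found = []
    for i in range(len(tokens)):
        # Try every span tokens[i:j] of at most max_len tokens.
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            phrase = " ".join(tokens[i:j]).lower()
            if phrase in dictionary:
                found.append((i, j, phrase))
    return found


# Toy phrase-concept dictionary (illustrative entries only).
toy_dictionary = {
    "casablanca": {"Casablanca_(film)", "Casablanca,_Morocco"},
    "actor": {"actor"},
    "writer": {"writer"},
}

tokens = "Which female actor played in Casablanca".split()
phrases = detect_concept_phrases(tokens, toy_dictionary)
print(phrases)
```

Each hit would be labeled as a concept phrase; overlapping hits are kept at this stage because phrase boundaries are only resolved later, during disambiguation.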

  11. Relation detection
     • Relies on a relation detector based on ReVerb (Fader et al., 2011) with additional POS tag patterns, in addition to the authors' own detector, which looks for patterns in dependency parses.

  12. Phrase Mapping

  13. Phrase Mapping
     • Each phrase is mapped to a set of semantic items.
     • To map concept phrases:
       • The mapping also relies on the phrase-concept dictionary.
     • To map relation phrases:
       • The mapping relies on a corpus of textual-pattern-to-relation mappings of the form
       • {‘play’, ‘star in’, ‘act’, ‘leading role’} → actedIn
       • {‘married’, ‘spouse’, ‘wife’} → marriedTo

  14. Example of Phrase Mapping
     • ‘played in’ can refer either to the semantic relation actedIn or to playedForTeam.
     • ‘Casablanca’ can potentially refer to Casablanca_(film) or Casablanca,_Morocco.
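
The mapping step can be pictured as two lookups that return candidate sets rather than single answers. The sketch below is only illustrative: the pattern sets for actedIn and marriedTo come from the slides, while the pattern set for playedForTeam, the substring matching rule, and the function names are assumptions (a real system would match lemmatized patterns instead of raw substrings).

```python
# Pattern-to-relation corpus; the playedForTeam entry is an assumed example.
RELATION_PATTERNS = {
    "actedIn": {"play", "star in", "act", "leading role"},
    "marriedTo": {"married", "spouse", "wife"},
    "playedForTeam": {"play"},
}

# Phrase-concept dictionary; candidates taken from the slide's example.
CONCEPT_DICTIONARY = {
    "casablanca": ["Casablanca_(film)", "Casablanca,_Morocco"],
}


def map_relation_phrase(phrase, patterns=RELATION_PATTERNS):
    """Return all relations with at least one pattern occurring in the phrase."""
    phrase = phrase.lower()
    return sorted(
        rel for rel, pats in patterns.items() if any(p in phrase for p in pats)
    )


def map_concept_phrase(phrase, dictionary=CONCEPT_DICTIONARY):
    """Return all candidate entities or classes for a concept phrase."""
    return dictionary.get(phrase.lower(), [])


print(map_relation_phrase("played in"))
print(map_concept_phrase("Casablanca"))
```

Both lookups deliberately keep every candidate; choosing among them is left to the joint disambiguation step.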

  15. Dependency Parsing & Q-Unit Generation

  16. Dependency parsing
     • Dependency parsing identifies triples of tokens, or triploids, <t_rel, t_arg1, t_arg2>, where t_rel, t_arg1, t_arg2 ∈ qNL
       • t_rel: the seed for the relation phrase
       • t_arg1, t_arg2: seeds for the concept phrases
     • There is no attempt to assign subject/object roles to the arguments.

  17. Which female actor played in Casablanca? Who is married to a writer who was born in Rome?

  18. Q-Unit Generation
     • By combining triploids with detected phrases, we obtain q-units.
     • A q-unit is a triple of sets of phrases, <{p_rel ∈ Pr}, {p_arg1 ∈ Pc}, {p_arg2 ∈ Pc}>, where t_rel ∈ p_rel, t_arg1 ∈ p_arg1, and t_arg2 ∈ p_arg2.
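
Q-unit generation can be sketched as collecting, for each triploid token, every detected phrase that contains it. This is a simplified illustration under stated assumptions: phrases are represented as (tokens, label) tuples, tokens are compared as strings (so repeated words in a question are not distinguished), and the name `make_q_unit` is hypothetical.

```python
def make_q_unit(triploid, phrases):
    """Build a q-unit: three sets of phrases containing the triploid's seed tokens."""
    t_rel, t_arg1, t_arg2 = triploid
    rel = {p for p in phrases if p[1] == "relation" and t_rel in p[0]}
    arg1 = {p for p in phrases if p[1] == "concept" and t_arg1 in p[0]}
    arg2 = {p for p in phrases if p[1] == "concept" and t_arg2 in p[0]}
    return rel, arg1, arg2


# Detected phrases for "Which female actor played in Casablanca" (toy example).
detected = [
    (("played",), "relation"),
    (("played", "in"), "relation"),
    (("actor",), "concept"),
    (("Casablanca",), "concept"),
]

# Triploid <t_rel, t_arg1, t_arg2> as it might come from a dependency parse.
q_unit = make_q_unit(("played", "actor", "Casablanca"), detected)
print(q_unit)
```

Here the relation slot holds two overlapping phrases, ‘played’ and ‘played in’, which is exactly the boundary ambiguity that disambiguation later resolves.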

  19. Joint Disambiguation

  20. Natural Language Questions for the Web of Data

  21. Goal of the disambiguation step
     • Each phrase is assigned to at most one semantic item.
     • Phrase boundary ambiguity is resolved (only nonoverlapping phrases are mapped).

  22. Resulting subgraph for the disambiguation graph of Figure 3

  23. Disambiguation Graph
     • Joint disambiguation takes place over a disambiguation graph DG = (V, E), where
       • V = Vs ∪ Vp ∪ Vq
       • E = Esim ∪ Ecoh ∪ Eq

  24. Types of vertices
     • V = Vs ∪ Vp ∪ Vq
     • Vs: the set of s-nodes
       • An s-node is a semantic item
     • Vp: the set of p-nodes
       • A p-node is a phrase
       • Vrp: the set of relation phrases
       • Vrc: the set of concept phrases
     • Vq: a set of placeholder nodes for q-units

  25. Types of edges
     • Esim ⊆ Vp × Vs
       • A set of weighted similarity edges
     • Ecoh ⊆ Vs × Vs
       • A set of weighted coherence edges
     • Eq ⊆ Vq × Vp × d, d ∈ {rel, arg1, arg2}
       • Called q-edges

  26. Cohsem (Semantic Coherence)
     • The semantic coherence (Cohsem) between two semantic items s1 and s2 is defined as the Jaccard coefficient of their sets of inlinks.
     • Three kinds of inlinks:
       • InLinks(e)
       • InLinks(c)
       • InLinks(r)

  27. InLinks(e)
     • InLinks(e): the set of Yago2 entities whose corresponding Wikipedia pages link to the entity.
     • E.g., let e = Casablanca:
       • InLinks(Casablanca) = {Marwan_al-Shehhi, Ingrid_Bergman, …, Morocco, …}

  28. InLinks(c)
     • InLinks(c) = ∪_{e ∈ c} InLinks(e)
     • E.g., let c = wikicategory_Metropolitan_areas_of_Morocco:
       • InLinks(wikicategory_Metropolitan_areas_of_Morocco) = InLinks(Casablanca) ∪ InLinks(Marrakech) ∪ InLinks(Fes) ∪ InLinks(Agadir) ∪ InLinks(Safi,_Morocco) ∪ InLinks(Oujda) ∪ InLinks(Tangier) ∪ InLinks(Rabat)

  29. InLinks(r)
     • InLinks(r) = ∪_{(e1, e2) ∈ r} (InLinks(e1) ∩ InLinks(e2))
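
The three inlink definitions and the Jaccard-based coherence translate directly into set operations. The sketch below uses invented placeholder page names (p1 to p6) as inlinks, purely for illustration; they are not real Wikipedia link data, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient |a ∩ b| / |a ∪ b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def inlinks_class(members, entity_inlinks):
    """InLinks(c): union of the inlinks of all entities e in class c."""
    result = set()
    for e in members:
        result |= entity_inlinks.get(e, set())
    return result


def inlinks_relation(pairs, entity_inlinks):
    """InLinks(r): union over (e1, e2) in r of InLinks(e1) ∩ InLinks(e2)."""
    result = set()
    for e1, e2 in pairs:
        result |= entity_inlinks.get(e1, set()) & entity_inlinks.get(e2, set())
    return result


# Toy inlink sets with placeholder page names (not real data).
entity_inlinks = {
    "Casablanca_(film)": {"p1", "p2", "p3"},
    "Ingrid_Bergman": {"p2", "p3", "p4"},
    "Casablanca,_Morocco": {"p5", "p6"},
}

acted_in = inlinks_relation([("Ingrid_Bergman", "Casablanca_(film)")], entity_inlinks)
coh_film = jaccard(acted_in, entity_inlinks["Casablanca_(film)"])
coh_city = jaccard(acted_in, entity_inlinks["Casablanca,_Morocco"])
print(coh_film, coh_city)
```

With this toy data, actedIn is more coherent with the film than with the city, which is the kind of signal the coherence edges contribute to disambiguation.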

  30. Similarity Weights
     • For entities:
       • The weight reflects how often a phrase refers to a certain entity in Wikipedia.
     • For classes:
       • The weight reflects the number of members in a class.
     • For relations:
       • The weight reflects the maximum n-gram similarity between the phrase and any of the relation’s surface forms.

  31. Disambiguation Graph Processing
     • The result of disambiguation is a subgraph of the disambiguation graph, yielding the most coherent mappings.
     • The authors employ an integer linear program (ILP) to this end.
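
To give a feel for what joint disambiguation optimizes, the following sketch replaces the ILP with an exhaustive search over a tiny candidate space. It is explicitly not the authors' formulation: it has no solver, ignores the q-unit constraints, forces exactly one item per phrase (the slides allow at most one), and uses only two assumed weights `alpha` and `beta`. All similarity and coherence values are invented toy numbers.

```python
from itertools import product


def disambiguate(candidates, sim, coh, alpha=1.0, beta=1.0):
    """Pick one semantic item per phrase, maximizing similarity plus pairwise coherence."""
    phrases = list(candidates)
    best_score, best = float("-inf"), None
    for choice in product(*(candidates[p] for p in phrases)):
        # Similarity term: weight of each chosen phrase-to-item edge.
        score = alpha * sum(sim[(p, s)] for p, s in zip(phrases, choice))
        # Coherence term: weight between every pair of chosen items.
        score += beta * sum(
            coh.get(frozenset((s1, s2)), 0.0)
            for i, s1 in enumerate(choice)
            for s2 in choice[i + 1:]
        )
        if score > best_score:
            best_score, best = score, dict(zip(phrases, choice))
    return best, best_score


candidates = {
    "played in": ["actedIn", "playedForTeam"],
    "Casablanca": ["Casablanca_(film)", "Casablanca,_Morocco"],
}
sim = {  # toy similarity weights
    ("played in", "actedIn"): 0.5,
    ("played in", "playedForTeam"): 0.5,
    ("Casablanca", "Casablanca_(film)"): 0.4,
    ("Casablanca", "Casablanca,_Morocco"): 0.6,
}
coh = {frozenset(("actedIn", "Casablanca_(film)")): 0.5}  # toy coherence weight

best, score = disambiguate(candidates, sim, coh)
print(best, score)
```

In this toy setting the city has the higher similarity weight on its own, yet the film wins once coherence with actedIn is counted, which illustrates why the mappings are chosen jointly rather than phrase by phrase.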

  32. Definitions (part 1)

  33. Definitions (part 2)

  34. Objective function

  35. Constraints (1~3)

  36. Constraints (4~7)

  37. Constraints (8~9): this is not invoked for existential questions

  38. Resulting subgraph for the disambiguation graph of Figure 3

  39. Query Generation
     • Subject/object roles are not assigned in triploids and q-units.
     • Example:
       • “Which singer is married to a singer?”
       • ?x type singer, ?x marriedTo ?y, and ?y type singer
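
The singer example can be reproduced with a small sketch in which every class argument is replaced by a fresh variable plus a type pattern. This is an assumption-laden illustration: it simply treats the first argument as the subject, whereas the slides state that roles are not fixed in triploids and q-units, and the name `generate_patterns` and the variable naming scheme are hypothetical.

```python
from itertools import count


def generate_patterns(relation, arg1, arg2, classes):
    """Turn one disambiguated q-unit into SPARQL-style triple patterns."""
    fresh = (f"?v{i}" for i in count())
    patterns = []

    def term(item):
        # A class becomes a fresh variable constrained by a type pattern;
        # an entity is used directly as a constant.
        if item in classes:
            var = next(fresh)
            patterns.append((var, "type", item))
            return var
        return item

    subj = term(arg1)
    obj = term(arg2)
    patterns.append((subj, relation, obj))
    return patterns


singer_patterns = generate_patterns("marriedTo", "singer", "singer", {"singer"})
print(singer_patterns)
```

Up to variable naming, the output matches the three patterns on the slide: two type patterns and one marriedTo pattern joining them.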

  40. 5 Evaluation
     • Datasets
     • Evaluation Metrics
     • Results & Discussion

  41. Datasets
     • The authors' experiments are based on two collections of questions:
       • QALD-1
         • 1st Workshop on Question Answering over Linked Data (QALD-1)
         • the context of the NAGA project
       • NAGA collection
         • The NAGA collection is based on linking data from the Yago2 knowledge base
     • Training set
       • 23 QALD-1 questions
       • 43 NAGA questions
     • Test set
       • 27 QALD-1 questions
       • 44 NAGA questions
     • Get hyperparameters (α, β, γ) in the ILP objective function.
     • 19 QALD-1 questions in the test set

  42. Evaluation Metrics
     • The authors evaluated the output of DEANNA at three stages:
       • 1. after the disambiguation of phrases
       • 2. after the generation of the SPARQL query
       • 3. after obtaining answers from the underlying linked-data sources
     • Judgement
       • Two human assessors judged whether an output item was good or not.
       • If the two disagreed, a third person resolved the judgment.

  43. Disambiguation stage
     • The task of the judges:
       • They looked at each q-node/s-node pair, in the context of the question and the underlying data schemas,
       • determined whether the mapping was correct or not, and
       • determined whether any expected mappings were missing.

  44. Query-generation stage
     • The task of the judges:
       • They looked at each triple pattern,
       • determined whether the pattern was meaningful for the question or not, and
       • determined whether any expected triple pattern was missing.

  45. Query-answering stage
     • The judges were asked to identify whether the result sets for the generated queries were satisfactory.

  46. For a question q and item set s in one of the stages of evaluation:
     • correct(q, s): the number of correct items in s
     • ideal(q): the size of the ideal item set
     • retrieved(q, s): the number of retrieved items
     • Coverage and precision are defined as follows:
       • cov(q, s) = correct(q, s) / ideal(q)
       • prec(q, s) = correct(q, s) / retrieved(q, s)
     • Micro-averaging
       • aggregates over all assessed items regardless of the questions to which they belong.
     • Macro-averaging
       • first aggregates the items for the same question, and then averages the quality measure over all questions.
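
The difference between micro- and macro-averaging is easiest to see on a small worked example. The sketch below implements the two definitions from this slide; the per-question counts are invented toy numbers, not results from the paper, and the function names are hypothetical.

```python
def micro_average(rows):
    """Pool the counts of all questions, then compute coverage and precision once."""
    correct = sum(r["correct"] for r in rows)
    cov = correct / sum(r["ideal"] for r in rows)
    prec = correct / sum(r["retrieved"] for r in rows)
    return cov, prec


def macro_average(rows):
    """Compute coverage and precision per question, then average over questions."""
    cov = sum(r["correct"] / r["ideal"] for r in rows) / len(rows)
    prec = sum(r["correct"] / r["retrieved"] for r in rows) / len(rows)
    return cov, prec


# Toy judgments for two questions (illustrative numbers only).
rows = [
    {"correct": 2, "ideal": 4, "retrieved": 2},
    {"correct": 1, "ideal": 1, "retrieved": 2},
]

micro_cov, micro_prec = micro_average(rows)
macro_cov, macro_prec = macro_average(rows)
print(micro_cov, micro_prec, macro_cov, macro_prec)
```

Micro coverage here is 3/5 = 0.6, while macro coverage is (0.5 + 1.0)/2 = 0.75: micro-averaging weights questions by their number of items, whereas macro-averaging weights every question equally.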

  47. Natural Language Questions for the Web of Data

  48. Conclusions
     • The authors presented a method for translating natural language questions into structured queries.
     • Although the authors’ model, in principle, leads to high combinatorial complexity, they observed that the Gurobi solver could handle their judiciously designed ILP very efficiently.
     • The authors’ experimental studies showed very high precision and good coverage of the query translation, and good results in the actual question answers.
