Natural language questions for the web of data
Natural Language Questions for the Web of Data. 1 Mohamed Yahya , Klaus Berberich , Gerhard Weikum Max Planck Institute for Informatics, Germany 2 Shady Elbassuoni Qatar Computing Research Institute 3 Maya Ramanath Dept. of CSE, IIT-Delhi, India 4 Volker Tresp


Natural Language Questions for the Web of Data

1 Mohamed Yahya, Klaus Berberich, Gerhard Weikum

Max Planck Institute for Informatics, Germany

2 Shady Elbassuoni

Qatar Computing Research Institute

3 Maya Ramanath

Dept. of CSE, IIT-Delhi, India

4 Volker Tresp

Siemens AG, Corporate Technology, Munich, Germany

EMNLP 2012



QNL Translation to

QNL: Natural Language Questions

“Which female actor played in Casablanca and is married to a writer who was born in Rome?”

Translation

  • QFL: SPARQL 1.0

    ?x hasGender female

    ?x isa actor

    ?x actedIn Casablanca_(film)

    ?x marriedTo ?w

    ?w isa writer

    ?w bornIn Rome

  • Characteristics of SPARQL:

    • Complex queries

    • Good results

    • Difficult for the user
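As a rough illustration of the target representation, the six triple patterns for the example question can be rendered as a SELECT query string. This is only a sketch: the helper name `to_sparql` is made up here, and the identifiers are written as on the slide, without the prefixes or IRIs a real SPARQL endpoint would require.

```python
# Triple patterns for the example question, as (subject, predicate, object).
TRIPLES = [
    ("?x", "hasGender", "female"),
    ("?x", "isa", "actor"),
    ("?x", "actedIn", "Casablanca_(film)"),
    ("?x", "marriedTo", "?w"),
    ("?w", "isa", "writer"),
    ("?w", "bornIn", "Rome"),
]


def to_sparql(triples, select="?x"):
    """Render triple patterns as a SELECT query string (hypothetical helper)."""
    body = " .\n  ".join(" ".join(t) for t in triples)
    return f"SELECT {select} WHERE {{\n  {body} .\n}}"


QUERY = to_sparql(TRIPLES)
print(QUERY)
```

The two variables ?x and ?w are what make the query hard to write by hand: the user has to know both the schema and how the sub-questions join.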



Yago2

YAGO2s is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames.

Relation

Class

Entities


Architecture of DEANNA.



Phrase detection

A detected phrase p is a pair <Toks, l>

Toks: the tokens of the phrase

l: the label (l ∈ {concept, relation})

Figure: QNL → Phrase detection → phrases

Pr: the set of relation phrases, {<*, relation>}

Pc: the set of concept phrases, {<*, concept>}
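A minimal sketch of this data structure, assuming a phrase is stored as a tuple of tokens plus a label. The class name `Phrase` and the variable names are illustrative only; the sample phrases are taken from the q-unit example later in the deck.

```python
from dataclasses import dataclass
from typing import Literal, Tuple


@dataclass(frozen=True)
class Phrase:
    """A detected phrase <Toks, l>: a token sequence and its label."""
    toks: Tuple[str, ...]
    label: Literal["concept", "relation"]


detected = [
    Phrase(("was", "born"), "relation"),
    Phrase(("born",), "relation"),
    Phrase(("a", "writer"), "concept"),
    Phrase(("Rome",), "concept"),
]

# P_r and P_c partition the detected phrases by label.
P_r = {p for p in detected if p.label == "relation"}
P_c = {p for p in detected if p.label == "concept"}
```

Overlapping phrases such as "born" and "was born" are both kept at this stage; choosing between them is left to joint disambiguation.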


Phrase detection

Concept phrase detection:

e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”

Concept phrases are detected by searching the instances of the means relation in Yago2.



Phrase detection

Relation phrase detection:

Relies on a relation detector based on ReVerb (Fader et al., 2011), extended with additional POS tag patterns.

e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”



Phrase Mapping

  • To map concept phrases:

    • again search the instances of the means relation in Yago2

  • To map relation phrases:

    • rely on a corpus of mappings from textual patterns to relations

Figure: phrases → Phrase Mapping → mappings

e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”



Q-Unit Generation

Figure: mappings → Q-Unit Generation → candidate graph

Dependency parsing:

A q-unit is a triple of sets of phrases.



Q-Unit Generation

Dependency parsing:

Identifies triples of tokens <trel, targ1, targ2>, where trel, targ1, targ2 ∈ qNL.

e.g. “who was born in Rome?”

nsubjpass(born-3, who-1)

auxpass(born-3, was-2)

root(ROOT-0, born-3)

prep_in(born-3, Rome-5)

Resulting token triple, with trel = born, targ1 = who, targ2 = Rome:

<born, who, Rome>
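The step from the dependency list to the token triple can be sketched as below. The pairing rule used here (a subject dependency plus a prepositional or direct-object dependency on the same head) is an assumption made for illustration; the slides do not state which dependency patterns DEANNA actually uses.

```python
import re

# Dependencies for "who was born in Rome?" as listed on the slide.
DEPS = [
    "nsubjpass(born-3, who-1)",
    "auxpass(born-3, was-2)",
    "root(ROOT-0, born-3)",
    "prep_in(born-3, Rome-5)",
]

# relation(head-index, dependent-index)
DEP_RE = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def token_triples(deps):
    """Pair each subject with an object-like dependent of the same head token."""
    parsed = [DEP_RE.fullmatch(d).groups() for d in deps]
    subjects = {
        (head, head_idx): dep
        for rel, head, head_idx, dep, _ in parsed
        if rel.startswith("nsubj")
    }
    triples = []
    for rel, head, head_idx, dep, _ in parsed:
        is_object_like = rel.startswith("prep_") or rel == "dobj"
        if is_object_like and (head, head_idx) in subjects:
            triples.append((head, subjects[(head, head_idx)], dep))
    return triples


TRIPLES = token_triples(DEPS)
```

For the four dependencies above, the only head with both a subject and an object-like dependent is born, which gives the triple shown on the slide.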



Q-Unit Generation

A q-unit is a triple of sets of phrases:

<{prel ∈ Pr}, {parg1 ∈ Pc}, {parg2 ∈ Pc}>, where trel ∈ prel, targ1 ∈ parg1, and targ2 ∈ parg2.

e.g. the token triple <born, writer, Rome> is matched against the detected phrases:

<born, relation>

<was born, relation>

<Rome, concept>

<a writer, concept>
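A small sketch of how a token triple and the detected phrases could be combined into a q-unit under the definition above. Phrases are plain (tokens, label) tuples and the function name `q_unit` is hypothetical; containment is tested on whole tokens.

```python
def q_unit(triple, phrases):
    """Collect, for each token of the triple, the phrases that contain it.

    triple:  (t_rel, t_arg1, t_arg2)
    phrases: iterable of (tokens, label) pairs
    """
    t_rel, t_arg1, t_arg2 = triple
    rel = {p for p in phrases if p[1] == "relation" and t_rel in p[0]}
    arg1 = {p for p in phrases if p[1] == "concept" and t_arg1 in p[0]}
    arg2 = {p for p in phrases if p[1] == "concept" and t_arg2 in p[0]}
    return rel, arg1, arg2


PHRASES = [
    (("born",), "relation"),
    (("was", "born"), "relation"),
    (("Rome",), "concept"),
    (("a", "writer"), "concept"),
]

UNIT = q_unit(("born", "writer", "Rome"), PHRASES)
```

The relation slot ends up with two candidate phrases ("born" and "was born"), which is exactly the phrase-boundary ambiguity that joint disambiguation later resolves.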



Joint Disambiguation

Rules:

1. Each phrase is assigned to at most one semantic item.

2. Phrase-boundary ambiguity is resolved (only nonoverlapping phrases are mapped).



Joint Disambiguation

Disambiguation Graph

  • Joint disambiguation takes place over a disambiguation graph DG = (V, E), where

    • V = Vs ∪ Vp ∪ Vq

    • E = Esim ∪ Ecoh ∪ Eq



Joint Disambiguation

Disambiguation Graph

  • V = Vs ∪ Vp ∪ Vq

    • Vs: the set of s-nodes (an s-node is a semantic item)

    • Vp: the set of p-nodes (a p-node is a phrase)

      • Vrp: the set of relation phrases

      • Vrc: the set of concept phrases

    • Vq: a set of placeholder nodes for q-units



Disambiguation Graph

  • E = Esim ∪ Ecoh ∪ Eq

    • Esim ⊆ Vp × Vs: a set of weighted similarity edges

    • Ecoh ⊆ Vs × Vs: a set of weighted coherence edges

    • Eq ⊆ Vq × Vp × d, d ∈ {rel, arg1, arg2}: the q-edges



Disambiguation Graph

Edge Weights

  • Cohsem (semantic coherence)

    • defined between two semantic items s1 and s2 as the Jaccard coefficient of their sets of inlinks

  • Three kinds of inlink sets

    • InLinks(e)

    • InLinks(c)

    • InLinks(r)
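The Jaccard coefficient itself is simple to state in code. The sketch below assumes inlink sets are Python sets of identifiers; the function name `coh_sem` and the handling of two empty sets (returning 0.0) are choices made here, not something the slides specify.

```python
def coh_sem(inlinks_1, inlinks_2):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two inlink sets."""
    union = inlinks_1 | inlinks_2
    if not union:
        # Assumption: items with no inlinks at all are treated as incoherent.
        return 0.0
    return len(inlinks_1 & inlinks_2) / len(union)


# Toy sets: two shared inlinks out of four distinct ones.
EXAMPLE = coh_sem({"a", "b", "c"}, {"b", "c", "d"})
```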



InLinks(e)

  • InLinks(e): the set of Yago2 entities whose corresponding Wikipedia pages link to the entity.

  • e.g.

    • Let e = Casablanca

    • InLinks(Casablanca) = {Marwan_al-Shehhi , Ingrid_Bergman, …, Morocco…}


InLinks(c)

  • InLinks(c) = ∪ e ∈ c InLinks(e)

  • e.g. let c = wikicategory_Metropolitan_areas_of_Morocco

    • InLinks(wikicategory_Metropolitan_areas_of_Morocco) = InLinks(Casablanca) ∪ InLinks(Marrakech) ∪ InLinks(Fes) ∪ InLinks(Agadir) ∪ InLinks(Safi,_Morocco) ∪ InLinks(Oujda) ∪ InLinks(Tangier) ∪ InLinks(Rabat)


InLinks(r)

  • InLinks(r) = ∪ (e1, e2) ∈ r (InLinks(e1) ∩ InLinks(e2))
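The class and relation variants both reduce to set operations over entity inlinks. The sketch below uses a toy dictionary with placeholder entity names (e1, e2, e3) and page names (p1 to p4); none of these are real Yago2 data, and the function names are illustrative.

```python
def inlinks_class(members, inlinks_e):
    """InLinks(c): union of the inlink sets of all entities in class c."""
    result = set()
    for entity in members:
        result |= inlinks_e.get(entity, set())
    return result


def inlinks_relation(pairs, inlinks_e):
    """InLinks(r): union, over all (e1, e2) in r, of the shared inlinks."""
    result = set()
    for e1, e2 in pairs:
        result |= inlinks_e.get(e1, set()) & inlinks_e.get(e2, set())
    return result


# Toy inlink table with placeholder names.
INLINKS_E = {
    "e1": {"p1", "p2"},
    "e2": {"p2", "p3"},
    "e3": {"p4"},
}

CLASS_LINKS = inlinks_class(["e1", "e2"], INLINKS_E)
REL_LINKS = inlinks_relation([("e1", "e2"), ("e2", "e3")], INLINKS_E)
```

For a relation, a page only counts if it links to both entities of some pair, so pairs with no shared inlinks (here e2 and e3) contribute nothing.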


Similarity Weights

  • For entities

    • the weight reflects how often a phrase refers to a certain entity in Wikipedia

  • For classes

    • the weight reflects the number of members in a class

  • For relations

    • the weight reflects the maximum n-gram similarity between the phrase and any of the relation’s surface forms
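For the relation case, one possible reading is sketched below. The slides do not say which n-gram similarity is used, so this assumes character trigrams compared with the Jaccard coefficient; all function names are hypothetical.

```python
def char_ngrams(text, n=3):
    """Character n-grams of the lower-cased text, padded with one space per side."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_sim(a, b, n=3):
    """Jaccard overlap of the two n-gram sets (an assumed similarity measure)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def relation_weight(phrase, surface_forms):
    """Maximum similarity between the phrase and any surface form of the relation."""
    return max((ngram_sim(phrase, form) for form in surface_forms), default=0.0)


WEIGHT = relation_weight("was born", ["born in", "birthplace"])
```

Taking the maximum means a relation with one closely matching surface form scores well even if its other surface forms look nothing like the phrase.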


Disambiguation Graph Processing

  • The result of disambiguation is a subgraph of the disambiguation graph, yielding the most coherent mappings.

  • An integer linear program (ILP) is employed to this end.
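To make the idea of "most coherent mappings" concrete, here is a toy stand-in that replaces the ILP with exhaustive search. It is not the authors' formulation: it only enforces "at most one semantic item per phrase", scores a mapping as α · similarity + β · coherence, and ignores q-units, phrase overlap and γ. The candidate item playedFor and all weights are invented for illustration.

```python
from itertools import product


def disambiguate(candidates, sim, coh, alpha=1.0, beta=1.0):
    """Pick at most one semantic item per phrase, maximising a toy objective.

    candidates: {phrase: [semantic items]}
    sim:        {(phrase, item): weight}
    coh:        {frozenset({item1, item2}): weight}
    """
    phrases = list(candidates)
    best_score, best = float("-inf"), {}
    # None stands for "leave the phrase unmapped".
    options = [[None] + list(candidates[p]) for p in phrases]
    for choice in product(*options):
        mapping = {p: s for p, s in zip(phrases, choice) if s is not None}
        items = list(mapping.values())
        score = alpha * sum(sim.get((p, s), 0.0) for p, s in mapping.items())
        score += beta * sum(
            coh.get(frozenset((items[i], items[j])), 0.0)
            for i in range(len(items))
            for j in range(i + 1, len(items))
        )
        if score > best_score:
            best_score, best = score, mapping
    return best


CANDIDATES = {
    "Casablanca": ["Casablanca", "Casablanca_(film)"],
    "played in": ["actedIn", "playedFor"],
}
SIM = {
    ("Casablanca", "Casablanca"): 0.6,
    ("Casablanca", "Casablanca_(film)"): 0.4,
    ("played in", "actedIn"): 0.5,
    ("played in", "playedFor"): 0.5,
}
COH = {frozenset(("Casablanca_(film)", "actedIn")): 0.5}

BEST = disambiguate(CANDIDATES, SIM, COH)
```

With these made-up weights the city has the higher similarity on its own, but the coherence edge between the film and actedIn tips the joint choice toward the film. Exhaustive search grows exponentially with the number of phrases, which is why a solver-backed ILP is the practical choice.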


Definitions (part1)


Definitions (part2)


Objective Function


Constraints(1~3)


Constraints(4~7)


Constraints(8~9)

This is not invoked for existential questions


Resulting subgraph for the disambiguation graph of Figure 3


Query Generation

  • Subject/object roles are not assigned in triploids and q-units

  • Example:

    • “Which singer is married to a singer?”

      • ?x type singer, ?x marriedTo ?y, and ?y type singer
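The singer example can be sketched as follows, assuming one disambiguated q-unit whose relation is marriedTo and whose two arguments are both the class singer. The function name and its signature are illustrative; the point is only that each class argument gets its own variable plus a type pattern.

```python
def generate_patterns(relation, arg1_class, arg2_class, var1="?x", var2="?y"):
    """Turn one q-unit with two class arguments into three triple patterns."""
    return [
        (var1, "type", arg1_class),
        (var1, relation, var2),
        (var2, "type", arg2_class),
    ]


PATTERNS = generate_patterns("marriedTo", "singer", "singer")
for subject, predicate, obj in PATTERNS:
    print(subject, predicate, obj)
```

Using two distinct variables matters here: binding both arguments to a single ?x would ask for a singer married to themselves.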


5 Evaluation

  • Datasets

  • Evaluation Metrics

  • Results & Discussion


Datasets

  • The authors' experiments are based on two collections of questions:

    • QALD-1

      • from the 1st Workshop on Question Answering over Linked Data (QALD-1)

    • NAGA collection

      • created in the context of the NAGA project

      • based on linking data from the Yago2 knowledge base

  • Training set

    • 23 QALD-1 questions

    • 43 NAGA questions

  • Test set

    • 27 QALD-1 questions

    • 44 NAGA questions

  • Hyperparameters (α, β, γ) in the ILP objective function were obtained using:

    • 19 QALD-1 questions in the test set


Evaluation Metrics

  • The authors evaluated the output of DEANNA at three stages:

    • 1. after the disambiguation of phrases

    • 2. after the generation of the SPARQL query

    • 3. after obtaining answers from the underlying linked-data sources

  • Judgement

    • Two human assessors judged whether an output item was good or not.

    • If the two disagreed, a third person resolved the judgment.


Disambiguation Stage

  • The task of the judges

    • looked at each q-node/s-node pair, in the context of the question and the underlying data schemas

    • determined whether the mapping was correct or not

    • determined whether any expected mappings were missing


Query-Generation Stage

  • The task of the judges

    • looked at each triple pattern

    • determined whether the pattern was meaningful for the question or not

    • determined whether any expected triple pattern was missing


Query-Answering Stage

  • The judges were asked to decide whether the result sets for the generated queries were satisfactory.


  • For a question q and item set s in one of the stages of evaluation:

    • correct(q, s): the number of correct items in s

    • ideal(q): the size of the ideal item set

    • retrieved(q, s): the number of retrieved items

  • Coverage and precision are defined as follows:

    • cov(q, s) = correct(q, s) / ideal(q)

    • prec(q, s) = correct(q, s) / retrieved(q, s)

  • Micro-averaging

    • aggregates over all assessed items regardless of the questions to which they belong.

  • Macro-averaging

    • first aggregates the items for the same question, and then averages the quality measure over all questions.
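The difference between the two averaging schemes is easiest to see on a small example. The sketch below takes per-question counts (correct, ideal, retrieved) and assumes none of the denominators is zero; the counts are invented and the function name is illustrative.

```python
def micro_macro(per_question):
    """per_question: list of (correct, ideal, retrieved) counts, one per question."""
    total_correct = sum(c for c, _, _ in per_question)
    total_ideal = sum(i for _, i, _ in per_question)
    total_retrieved = sum(r for _, _, r in per_question)
    n = len(per_question)
    return {
        # Micro: pool all items first, then divide once.
        "micro_cov": total_correct / total_ideal,
        "micro_prec": total_correct / total_retrieved,
        # Macro: compute the measure per question, then average.
        "macro_cov": sum(c / i for c, i, _ in per_question) / n,
        "macro_prec": sum(c / r for c, _, r in per_question) / n,
    }


# Toy counts: one question answered perfectly, one with low coverage.
SCORES = micro_macro([(2, 2, 2), (1, 4, 2)])
```

Here micro coverage is 3/6 = 0.5 while macro coverage is (1.0 + 0.25) / 2 = 0.625: the question with the larger ideal set dominates the micro average, whereas macro averaging weights every question equally.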




Conclusions

  • The authors presented a method for translating natural language questions into structured queries.

  • Although the authors’ model, in principle, leads to high combinatorial complexity, they observed that the Gurobi solver could handle their judiciously designed ILP very efficiently.

  • The authors’ experimental studies showed very high precision and good coverage of the query translation, and good results in the actual question answers.
