Question Answering

  • Lecture 1 (two weeks ago): Introduction; History of QA; Architecture of a QA system; Evaluation.

  • Lecture 2 (last week): Question Classification; NLP techniques for question analysis; Tokenisation; Lemmatisation; POS-tagging; Parsing; WordNet.

  • Lecture 3 (today): Named Entity Recognition; Anaphora Resolution; Matching; Reranking; Answer Validation.



A panda…

A panda walks into a cafe.

He orders a sandwich, eats it, then draws a gun and fires two shots in the air.


A panda…

“Why?” asks the confused waiter, as the panda makes towards the exit.

The panda produces a dictionary and tosses it over his shoulder.

“I am a panda,” he says. “Look it up.”


The panda’s dictionary

Panda. Large black-and-white bear-like mammal, native to China. Eats, shoots and leaves.


Ambiguities

Eats/VBZ, shoots/VBZ and/CC leaves/VBZ.


Ambiguities

Eats/VBZ shoots/NNS and/CC leaves/NNS.


Question Answering

  • Lecture 1 (two weeks ago): Introduction; History of QA; Architecture of a QA system; Evaluation.

  • Lecture 2 (last week): Question Classification; NLP techniques for question analysis; Tokenisation; Lemmatisation; POS-tagging; Parsing; WordNet.

  • Lecture 3 (today): Named Entity Recognition; Anaphora Resolution; Matching; Reranking; Answer Validation.


Architecture of a QA system

[Pipeline diagram: the question enters Question Analysis, which produces a query for the IR component (run over the corpus) together with an answer-type and a question representation; IR returns documents/passages to Document Analysis, which builds a passage representation; Answer Extraction combines the question representation, passage representation and answer-type to produce the answers.]


Recall the Answer-Type Taxonomy

  • We divided questions according to their expected answer type

  • Simple Answer-Type Typology

  • PERSON

  • NUMERAL

  • DATE

  • MEASURE

  • LOCATION

  • ORGANISATION

  • ENTITY


Named Entity Recognition

  • In order to make use of the answer types, we need to be able to recognise named entities of the same types in the corpus

  • PERSON

  • NUMERAL

  • DATE

  • MEASURE

  • LOCATION

  • ORGANISATION

  • ENTITY


Example Text

Italy’s business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of Arthur Andersen.


Named Entity Recognition

<ENAMEX TYPE="LOCATION">Italy</ENAMEX>'s business world was rocked by the announcement <TIMEX TYPE="DATE">last Thursday</TIMEX> that Mr. <ENAMEX TYPE="PERSON">Verdi</ENAMEX> would leave his job as vice-president of <ENAMEX TYPE="ORGANIZATION">Music Masters of Milan, Inc</ENAMEX> to become operations director of <ENAMEX TYPE="ORGANIZATION">Arthur Andersen</ENAMEX>.


NER difficulties

  • Several types of entities are too numerous to include in dictionaries

  • New names turn up every day

  • Different forms of the same entity in the same text

    • Brian Jones … Mr. Jones

  • Capitalisation


NER approaches

  • Rule-based approach

    • Hand-crafted rules

    • Help from databases of known named entities

  • Statistical approach

    • Features

    • Machine learning
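
A minimal sketch of the rule-based approach just described, assuming a toy gazetteer and two hand-written patterns (the names and patterns here are invented purely for illustration; a real system would use far larger resources):

```python
import re

# Toy gazetteer and hand-crafted patterns (illustrative only).
GAZETTEER = {
    "Italy": "LOCATION",
    "Arthur Andersen": "ORGANIZATION",
}
PATTERNS = [
    (re.compile(r"\bMr\.\s+[A-Z][a-z]+"), "PERSON"),   # honorific + capitalised word
    (re.compile(r"\blast\s+(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b"), "DATE"),
]

def tag_entities(text):
    """Return (entity text, type) pairs found by gazetteer lookup and rules."""
    found = []
    for name, etype in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            found.append((m.group(), etype))
    for pattern, etype in PATTERNS:
        for m in pattern.finditer(text):
            found.append((m.group(), etype))
    return found

print(tag_entities("Italy's business world was rocked by the announcement "
                   "last Thursday that Mr. Verdi would leave his job."))
```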



What is anaphora?

  • Relation between a pronoun (or other anaphoric expression) and another element in the same or an earlier sentence

  • Anaphoric pronouns:

    • he, she, it, they

  • Anaphoric noun phrases:

    • the country,

    • that idiot,

    • his hat, her dress


Anaphora (pronouns)

  • Question: What is the biggest sector in Andorra’s economy?

  • Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of its tiny, well-to-do economy, accounts for roughly 80% of the GDP.

  • Answer: ?


Anaphora (definite descriptions)

  • Question: What is the biggest sector in Andorra’s economy?

  • Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of the country’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.

  • Answer: ?


Anaphora Resolution

  • Anaphora Resolution is the task of finding the antecedents of anaphoric expressions

  • Example system:

    • Mitkov, Evans & Orasan (2002)

    • http://clg.wlv.ac.uk/MARS/
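
MARS and similar resolvers rely on much richer features; for contrast, here is a deliberately naive recency-only heuristic (a sketch, not MARS) that resolves each pronoun to the most recent preceding capitalised token. On the Andorra passage it picks “Tourism” rather than “Andorra” for “its”, which is exactly why real resolvers need agreement and salience information:

```python
import re

PRONOUNS = {"he", "she", "it", "they", "his", "her", "its", "their"}

def naive_antecedents(text):
    """Recency-only heuristic: resolve each pronoun to the most recent
    preceding capitalised token (a crude stand-in for the closest NP)."""
    candidates, resolved = [], []
    for tok in re.findall(r"[A-Za-z-]+", text):
        if tok.lower() in PRONOUNS:
            resolved.append((tok, candidates[-1] if candidates else None))
        elif tok[0].isupper():
            candidates.append(tok)
    return resolved

text = ("Andorra is a tiny land-locked country in southwestern Europe, "
        "between France and Spain. Tourism, the largest sector of its "
        "tiny, well-to-do economy, accounts for roughly 80% of the GDP.")
print(naive_antecedents(text))   # [('its', 'Tourism')] -- wrong, should be Andorra
```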


Anaphora (pronouns)

  • Question: What is the biggest sector in Andorra’s economy?

  • Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of Andorra’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.

  • Answer: Tourism


Architecture of a QA system

[Pipeline diagram repeated: Question Analysis → IR → Document Analysis → Answer Extraction, as above.]


Matching

  • Given a question and an expression with a potential answer, calculate a matching score S = match(Q,A) that indicates how well Q matches A

  • Example

    • Q: When was Franz Kafka born?

    • A1: Franz Kafka died in 1924.

    • A2: Kafka was born in 1883.
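
A crude word-overlap scorer along these lines (a sketch only; the stop-word list and normalisation are invented here). Notice that it scores A1 and A2 identically at 2/3, which is precisely the weakness the semantic matching on the following slides addresses:

```python
STOP = {"when", "was", "were", "is", "are", "the", "in", "a", "an", "of", "to"}

def match(question, answer):
    """Crude overlap score: fraction of question content words that also
    appear in the candidate answer sentence."""
    norm = lambda s: {w.lower().strip("?.,") for w in s.split()} - STOP
    q, a = norm(question), norm(answer)
    return len(q & a) / len(q) if q else 0.0

q = "When was Franz Kafka born?"
for cand in ["Franz Kafka died in 1924.", "Kafka was born in 1883."]:
    print(round(match(q, cand), 2), cand)   # both candidates score 0.67
```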


Semantic Matching

Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)

A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)


Semantic Matching

Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)

A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)

X=x2


Semantic Matching

Q:  answer(x2), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,x2)

A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)

Y=x1


Semantic Matching

Q:  answer(x2), franz(x1), kafka(x1), born(E), patient(E,Y), temp(E,x2)

A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)

Y=x1


Semantic Matching

Q:  answer(x2), franz(x1), kafka(x1), born(E), patient(E,Y), temp(E,x2)

A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)

Match score = 3/6 = 0.50


Semantic Matching

Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)

A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)


Semantic Matching

Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)

A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)

X=x2


Semantic Matching

Q:  answer(x2), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,x2)

A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)

Y=x1


Semantic Matching

Q:  answer(x2), franz(x1), kafka(x1), born(E), patient(E,x1), temp(E,x2)

A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)

E=x3


Semantic Matching

Q:  answer(x2), franz(x1), kafka(x1), born(x3), patient(x3,x1), temp(x3,x2)

A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)

E=x3


Semantic Matching

Q:  answer(x2), franz(x1), kafka(x1), born(x3), patient(x3,x1), temp(x3,x2)

A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)

Match score = 4/6 = 0.67
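
Below is a toy re-implementation of this predicate-overlap scoring (uppercase arguments are treated as variables, lowercase as constants). It makes no attempt to bind the answer variable X through the date, so its absolute figures differ from the 3/6 and 4/6 above, but it preserves the ranking: A2 scores higher than A1.

```python
def unify(q_fact, a_fact, subst):
    """Try to match one question fact against one answer fact under the
    current substitution; return the extended substitution or None."""
    (qp, qargs), (ap, aargs) = q_fact, a_fact
    if qp != ap or len(qargs) != len(aargs):
        return None
    new = dict(subst)
    for qa, aa in zip(qargs, aargs):
        qa = new.get(qa, qa)          # apply earlier bindings
        if qa.isupper():              # unbound variable: bind it
            new[qa] = aa
        elif qa != aa:                # constant clash
            return None
    return new

def match_score(question, answer):
    """Greedily unify question facts against answer facts and score
    matched facts / total question facts."""
    subst, matched = {}, 0
    for qf in question:
        for af in answer:
            new = unify(qf, af, subst)
            if new is not None:
                subst, matched = new, matched + 1
                break
    return matched / len(question)

Q  = [("franz", ("Y",)), ("kafka", ("Y",)), ("born", ("E",)),
      ("patient", ("E", "Y")), ("temp", ("E", "X")), ("answer", ("X",))]
A1 = [("franz", ("x1",)), ("kafka", ("x1",)), ("die", ("x3",)),
      ("agent", ("x3", "x1")), ("in", ("x3", "x2")), ("1924", ("x2",))]
A2 = [("kafka", ("x1",)), ("born", ("x3",)), ("patient", ("x3", "x1")),
      ("in", ("x3", "x2")), ("1883", ("x2",))]
print(match_score(Q, A1), match_score(Q, A2))   # A2 ranks above A1
```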


Matching Techniques

  • Weighted matching

    • Higher weight for named entities

  • WordNet

    • Hyponyms

  • Inference rules

    • Example:

      BORN(E) & IN(E,Y) & DATE(Y) → TEMP(E,Y)
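
A sketch of forward-applying that rule to the A2 facts from the previous slides, assuming the recogniser has also contributed a date(x2) fact for 1883 (that extra fact is an assumption here, not something shown on the slides); once the rule fires, temp(E,X) in the question can also be matched:

```python
def apply_born_in_date_rule(facts):
    """born(E) & in(E,Y) & date(Y) => temp(E,Y): add the derived temp facts."""
    born  = {args[0] for pred, args in facts if pred == "born"}
    dates = {args[0] for pred, args in facts if pred == "date"}
    derived = list(facts)
    for pred, args in facts:
        if pred == "in" and args[0] in born and args[1] in dates:
            derived.append(("temp", args))
    return derived

A2 = [("kafka", ("x1",)), ("born", ("x3",)), ("patient", ("x3", "x1")),
      ("in", ("x3", "x2")), ("1883", ("x2",)),
      ("date", ("x2",))]                       # assumed extra fact
print(apply_born_in_date_rule(A2))             # now also contains ('temp', ('x3', 'x2'))
```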



Reranking

  • Most QA systems first produce a list of possible answers…

  • This is usually followed by a process called reranking

  • Reranking promotes correct answers to a higher rank


Factors in reranking

  • Matching score

    • The better the match with the question, the more likely the answer is to be correct

  • Frequency

    • If the same answer occurs many times, it is likely to be correct
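
A toy combination of these two factors (the 0.7/0.3 weights are invented for illustration; a real system would tune such weights):

```python
from collections import Counter

def rerank(candidates):
    """candidates: list of (answer string, match score).
    Combine each distinct answer's best match score with its frequency
    among all candidates, then sort best-first."""
    freq = Counter(ans for ans, _ in candidates)
    best = {}
    for ans, score in candidates:
        best[ans] = max(score, best.get(ans, 0.0))
    total = len(candidates)
    return sorted(best, reverse=True,
                  key=lambda a: 0.7 * best[a] + 0.3 * freq[a] / total)

print(rerank([("1883", 0.67), ("1924", 0.50), ("1883", 0.40)]))  # ['1883', '1924']
```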


Sanity Checking

Answer should be informative

Q: Who is Tom Cruise married to?

A: Tom Cruise

Q: Where was Florence Nightingale born?

A: Florence
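
A minimal sanity check along these lines: reject any candidate whose words are all already contained in the question.

```python
def is_informative(question, answer):
    """Reject candidates that merely repeat part of the question."""
    words = lambda s: {w.lower().strip("?.,") for w in s.split()}
    return not words(answer) <= words(question)

print(is_informative("Who is Tom Cruise married to?", "Tom Cruise"))       # False
print(is_informative("Where was Florence Nightingale born?", "Florence"))  # False
print(is_informative("Who is Tom Cruise married to?", "Katie Holmes"))     # True
```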


Answer Validation

  • Given a ranked list of answers, some of these might not make sense at all

  • Promote answers that make sense

  • How?

  • Use an even larger corpus!

    • “Sloppy” approach

    • “Strict” approach



Answer validation (sloppy)

  • Given a question Q and a set of answers A1…An

  • For each i, generate query Q Ai

  • Count the number of hits for each i

  • Choose the Ai with the highest number of hits

  • Use existing search engines

    • Google, AltaVista

    • Magnini et al. 2002 (CCP)
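
A sketch of the sloppy strategy; `hits` here is only a placeholder for a web search-engine hit count (the slides mention Google and AltaVista), and the counts below are made up purely to exercise the function:

```python
def validate_sloppy(question_words, answers, hits):
    """Pick the candidate whose combination with the question's content
    words gets the most search-engine hits."""
    return max(answers, key=lambda ans: hits(question_words + [ans]))

# Made-up hit counts standing in for a real search engine:
fake_counts = {"Oklahoma": 120, "Okemah, Okla.": 35, "New York": 8,
               "Britain": 3, "Newport": 1}
best = validate_sloppy(["Guthrie", "born"], list(fake_counts),
                       lambda terms: fake_counts[terms[-1]])
print(best)   # 'Oklahoma'
```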


Corrected Conditional Probability

  • Treat Q and A as a bag of words

    • Q = content words of the question

    • A = answer

  • CCP(Qsp, Asp) = hits(A NEAR Q) / (hits(A) × hits(Q))

  • Accept answers above a certain CCP threshold
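
A sketch of the CCP computation; as above, `hits` stands in for a search-engine hit count rather than any real API, and the threshold and counts below are invented for illustration, not values from the slides:

```python
def ccp(q, a, hits):
    """Corrected Conditional Probability: hits(A NEAR Q) / (hits(A) * hits(Q))."""
    denom = hits(a) * hits(q)
    return hits(f"{a} NEAR {q}") / denom if denom else 0.0

def validate(q, a, hits, threshold):
    """Accept the answer only if its CCP clears the threshold."""
    return ccp(q, a, hits) > threshold

# Tiny fake table of hit counts, purely to exercise the functions:
fake = {"1883": 500, "Kafka born": 200, "1883 NEAR Kafka born": 150}
print(validate("Kafka born", "1883", fake.get, threshold=1e-3))   # True
```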


Answer validation (strict)

  • Given a question Q and a set of answers A1…An

  • Create a declarative sentence with the focus of the question replaced by Ai

  • Use the strict search option in Google

    • High precision

    • Low recall

  • Any terms of the target not in the sentence are added to the query


Example

  • TREC 99.3. Target: Woody Guthrie. Question: Where was Guthrie born?

  • Top-5 Answers:

    1) Britain

    * 2) Okemah, Okla.

    3) Newport

    * 4) Oklahoma

    5) New York


Example: generate queries

  • TREC 99.3. Target: Woody Guthrie. Question: Where was Guthrie born?

  • Generated queries:

    1) “Guthrie was born in Britain”

    2) “Guthrie was born in Okemah, Okla.”

    3) “Guthrie was born in Newport”

    4) “Guthrie was born in Oklahoma”

    5) “Guthrie was born in New York”


Example: add target words

  • TREC 99.3. Target: Woody Guthrie. Question: Where was Guthrie born?

  • Generated queries:

    1) “Guthrie was born in Britain” Woody

    2) “Guthrie was born in Okemah, Okla.” Woody

    3) “Guthrie was born in Newport” Woody

    4) “Guthrie was born in Oklahoma” Woody

    5) “Guthrie was born in New York” Woody


Example: morphological variants

TREC 99.3

Target: Woody Guthrie.

Question: Where was Guthrie born?

Generated queries:

“Guthrie is OR was OR are OR were born in Britain” Woody

“Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody

“Guthrie is OR was OR are OR were born in Newport” Woody

“Guthrie is OR was OR are OR were born in Oklahoma” Woody

“Guthrie is OR was OR are OR were born in New York” Woody
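
These queries can be generated mechanically; the sketch below simply mirrors the strings on this slide (a real search engine may need a different syntax for the OR-disjunction inside a quoted phrase):

```python
def strict_queries(focus, answers, extra_target_words, relation="born in"):
    """Build quoted declarative sentences with morphological variants of 'be',
    appending any target words not already in the sentence."""
    variants = "is OR was OR are OR were"
    return [f'"{focus} {variants} {relation} {ans}" {extra_target_words}'
            for ans in answers]

for query in strict_queries("Guthrie",
                            ["Britain", "Okemah, Okla.", "Newport",
                             "Oklahoma", "New York"],
                            "Woody"):
    print(query)
```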


Example: google hits

TREC 99.3

Target: Woody Guthrie.

Question: Where was Guthrie born?

Generated queries:

“Guthrie is OR was OR are OR were born in Britain” Woody 0

“Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody 10

“Guthrie is OR was OR are OR were born in Newport” Woody 0

“Guthrie is OR was OR are OR were born in Oklahoma” Woody 42

“Guthrie is OR was OR are OR were born in New York” Woody 2
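
Sorting the candidates by these hit counts reproduces the reranking shown on the next slide (the counts are copied from above):

```python
hit_counts = {"Britain": 0, "Okemah, Okla.": 10, "Newport": 0,
              "Oklahoma": 42, "New York": 2}

# Highest hit count first; ties keep their original order.
reranked = sorted(hit_counts, key=hit_counts.get, reverse=True)
print(reranked)   # ['Oklahoma', 'Okemah, Okla.', 'New York', 'Britain', 'Newport']
```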


Example: reranked answers

TREC 99.3. Target: Woody Guthrie. Question: Where was Guthrie born?

Original answers

1) Britain

* 2) Okemah, Okla.

3) Newport

* 4) Oklahoma

5) New York

Reranked answers

* 4) Oklahoma

* 2) Okemah, Okla.

5) New York

1) Britain

3) Newport


Summary

  • Introduction to QA

    • Typical Architecture, Evaluation

    • Types of Questions and Answers

  • Use of general NLP techniques

    • Tokenisation, POS tagging, Parsing

    • NER, Anaphora Resolution

  • QA Techniques

    • Matching

    • Reranking

    • Answer Validation


Where to go from here

  • Producing answers in real-time

  • Improving accuracy

  • Answer explanation

  • User modelling

  • Speech interfaces

  • Dialogue (interactive QA)

  • Multi-lingual QA