
Question Answering at TREC

Mark A. Greenwood


Outline of Talk

  • History of QA at TREC

  • TREC 2005

    • Task Overview

    • Evaluation Metrics

    • Official Evaluation Results

  • Answering Factoid/List Questions

    • Question Processing

    • Document Retrieval

    • Answer Extraction

  • Answering Definition Questions

    • Bare Target + Reduce + Filter Approach

    • Target Enrichment + Filter Approach

  • Conclusions

  • Future Work

NLP Meeting


History of QA at TREC

  • QA Track first introduced at TREC 8 (Voorhees, 1999)

    • 200 fact-based short-answer questions

    • Questions mainly back-formulated from documents

    • Answers could be 50-byte or 250-byte snippets

    • 5 answers could be returned for each question

    • Best systems could answer over 2/3 of the questions (Moldovan et al., 1999; Srihari and Li, 1999).

  • TREC 10 (Voorhees, 2001) introduced:

    • List questions such as “Name 20 countries that produce coffee”

    • Questions which don’t have an answer in the collection

History of QA at TREC

  • In TREC 11 (Voorhees, 2002):

    • Answers had to be exact

    • Only one answer could be returned per question.

  • TREC 12 (Voorhees, 2003) introduced definition questions:

    • Define a target such as “aspirin” or “Aaron Copland”

    • A definition should contain a number of important facts (vital nuggets)

    • Can also include other associated information (non-vital nuggets)

    • Evaluated using a length-based precision metric which penalizes long answers containing few nuggets.

History of QA at TREC

  • TREC 13 (Voorhees, 2004) combined the three question types into scenarios built around targets. For instance:

    • Target: Hale Bopp Comet

    • Factoid: When was the comet discovered?

    • Factoid: How often does it approach the earth?

    • List: In what countries was the comet visible on its last return?

    • Other: Tell me anything else not covered by the above questions

TREC 2005

  • Questions were based around 75 targets

    • 19 people

    • 19 organizations

    • 19 things

    • 18 events

  • The series of targets contained a total of:

    • 362 factoid questions

    • 93 list questions

    • 75 (one per target) other questions

  • All answers had to be supported by reference to a document in the AQUAINT collection of newswire texts.

Example Scenarios

  • AMWAY

    • F: When was AMWAY founded?

    • F: Where is it headquartered?

    • F: Who is president of the company?

    • L: Name the officials of the company

    • F: What is the name “AMWAY” short for?

    • O:

  • return of Hong Kong to Chinese sovereignty

    • F: What is Hong Kong’s population?

    • F: When was Hong Kong returned to Chinese sovereignty?

    • F: Who was the Chinese President at the time of the return?

    • F: Who was the British Foreign Secretary at the time?

    • L: What other countries formally congratulated China on the return?

    • O:

Example Scenarios

  • Shiite

    • F: Who was the first Imam of the Shiite sect of Islam?

    • F: Where is his tomb?

    • F: What was this person’s relationship to the Prophet Mohammad?

    • F: Who was the third Imam of Shiite Muslims?

    • F: When did he die?

    • F: What portion of Muslims are Shiite?

    • L: What Shiite leaders were killed in Pakistan?

    • O:

Evaluation Metrics

  • For factoid questions the metric is accuracy

    • Only exact supported answers and correct NIL responses are counted

  • For list questions the metric is F-measure (β = 1)

    • Only exact supported answers are counted

    • Set of correct answers (for recall purposes) is the union of all correct answers across all submitted runs plus any instances found during question development.

  • For other questions the metric is F-measure (β = 3)

    • Recall is the proportion of vital nuggets returned

    • Precision is a length-based penalty, where each valid nugget allows 100 non-whitespace characters to be returned.

  • These are combined to give a weighted score per target

    • Weighted Score = 0.5 × Factoid + 0.25 × ListAvgF + 0.25 × OtherAvgF
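
The metrics on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the official scoring software: the function names are made up for this example, and the form of the penalty applied once an answer exceeds its character allowance is not stated on the slide (it is assumed here to be the excess length as a fraction of the total length, which is how the TREC definition metric is usually described).

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (beta > 1 favours recall)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1.0) * precision * recall / (b2 * precision + recall)


def nugget_precision(answer_text: str, nuggets_matched: int,
                     allowance_per_nugget: int = 100) -> float:
    """Length-based precision for 'other' questions.

    Each matched nugget allows 100 non-whitespace characters.
    Assumption: beyond the allowance, precision drops by the
    fraction of the answer that is excess.
    """
    length = sum(1 for ch in answer_text if not ch.isspace())
    allowance = allowance_per_nugget * nuggets_matched
    if length <= allowance:
        return 1.0
    return 1.0 - (length - allowance) / length


def weighted_score(factoid_accuracy: float, list_avg_f: float,
                   other_avg_f: float) -> float:
    """Per-target score: 0.5 x Factoid + 0.25 x ListAvgF + 0.25 x OtherAvgF."""
    return 0.5 * factoid_accuracy + 0.25 * list_avg_f + 0.25 * other_avg_f
```

For list questions `f_measure` would be called with `beta=1`, and for other questions with `beta=3`, so recall of vital nuggets dominates the other score.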

Official Evaluation Results

  • 30 groups participated in TREC 2005

  • In all, 71 runs were submitted for evaluation

  • We submitted three runs

    • shef05lmg

    • shef05mc

    • shef05lc

  • The main evaluation is the per-series score (the average of the weighted target scores), but separate results are also given for the three different question types.

Factoid Evaluation

[Two slides of results; content not recoverable. Judgment categories shown: Wrong, Unsupported, Inexact, Right]

List Evaluation

[Two slides of results; content not recoverable]

Other Evaluation

[Two slides of results; content not recoverable]

Per-Series Evaluation

[Two slides of results; content not recoverable]

Evaluation By Group

  • 30 groups submitted one or more runs to TREC 2005, including:

    • Language Computer Corporation

    • IBM

    • NSA

    • National University of Singapore

    • Mitre Corporation

    • Microsoft

  • Examining only the best run submitted by a group places us

    • 12th for answering factoid questions (shef05lmg)

    • 10th for answering list questions (shef05lmg)

    • 11th for answering other questions (shef05mc)

    • 9th for the per-series score (shef05lmg)

Answering Factoid Questions

  • Most factoid QA systems use a three-component architecture

    • Question analysis

    • Document retrieval

    • Answer Extraction

  • We have developed two approaches to each component

  • Question Analysis

    • Expected answer type analysis

    • Grammatical answer requirements

  • Document Retrieval

    • Lucene

    • MadCow

  • Answer Extraction

    • Matching on Logical Forms

    • Shallow Multi-Strategy Approach

Answering Factoid Questions

  • shef05lmg

    • Expected answer type analysis

    • Lucene

    • Shallow Multi-Strategy Approach

  • shef05mc

    • Grammatical answer requirements

    • MadCow

    • QA-LaSIE

  • shef05lc

    • Grammatical answer requirements

    • Lucene

    • QA-LaSIE

Initial Question Processing

  • All our approaches to QA assume that each question can be both asked and answered in isolation.

  • The introduction of target-based scenarios means that this is no longer true.

  • We use a single approach based on both pronominal and nominal coreference resolution to merge the target and questions.
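
The effect of merging a target into a question can be illustrated with a deliberately simple sketch. This is not the coreference component used in the system: it only substitutes the target for a few pronouns and for caller-supplied noun phrases, and the names `merge_target` and `nominals` are hypothetical.

```python
import re

# Assumption: a tiny closed set of pronouns. "her" is left out because it is
# ambiguous between object and possessive use.
PRONOUNS = {"it", "he", "she", "they", "him", "them"}
POSSESSIVES = {"its", "his", "their"}


def merge_target(target: str, question: str, nominals: tuple = ()) -> str:
    """Rewrite a question so it can be asked in isolation.

    Pronouns (pronominal references) and the given noun phrases
    (nominal references, e.g. "the company") are replaced by the target.
    """
    alternatives = sorted(set(nominals) | PRONOUNS | POSSESSIVES,
                          key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )

    def substitute(match):
        if match.group(1).lower() in POSSESSIVES:
            return target + "'s"
        return target

    # A single pass, so words inside the inserted target are never rewritten.
    return pattern.sub(substitute, question)


print(merge_target("AMWAY", "Where is it headquartered?"))
print(merge_target("AMWAY", "Who is president of the company?",
                   nominals=("the company",)))
```

A substitution like this fails whenever a pronoun refers to the answer of an earlier question rather than to the target (as in "Where is his tomb?" in the Shiite series), which is one reason real coreference resolution is needed.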

Question Analysis

  • Grammatical answer requirements

    • Parse the sentence using SUPPLE to produce a quasi-logical form (QLF) representation

    • The QLF representation places constraints on possible answers

    • For example, “Who wrote Hamlet?”:

      qvar(e1), qattr(e1,name), person(e1), lsubj(e2,e1), write(e2), time(e2,past), aspect(e2,simple), voice(e2,active), lobj(e2,e3), name(e3,‘Hamlet’)

  • Expected answer type analysis

    • The expected answer type (EAT) is determined using a hand-built rule-based question classifier

    • A hierarchy of EATs is used to allow relaxing of constraints

    • For example “Who is Paul Newman married to?”

      Person {gender=female}
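
A rule-based EAT classifier with a type hierarchy might look roughly like the sketch below. The rules, type names and hierarchy here are invented for illustration and are far smaller than any hand-built classifier would be; attribute constraints such as `gender=female` are not modelled.

```python
import re
from typing import List, Optional

# Hypothetical ordered rules: the first pattern that matches decides the type.
RULES = [
    (re.compile(r"^(what|which) city\b", re.IGNORECASE), "city"),
    (re.compile(r"^(when|what year)\b", re.IGNORECASE), "date"),
    (re.compile(r"^where\b", re.IGNORECASE), "location"),
    (re.compile(r"^(who|whom|whose)\b", re.IGNORECASE), "person"),
]

# Hypothetical child -> parent links used to relax an answer type.
EAT_PARENT = {
    "city": "location",
    "location": "entity",
    "person": "entity",
    "date": "entity",
}


def classify(question: str) -> Optional[str]:
    """Return the expected answer type, or None if no rule fires."""
    text = question.strip()
    for pattern, eat in RULES:
        if pattern.search(text):
            return eat
    return None


def relaxations(eat: str) -> List[str]:
    """List the type followed by its increasingly general ancestors."""
    chain = [eat]
    while chain[-1] in EAT_PARENT:
        chain.append(EAT_PARENT[chain[-1]])
    return chain
```

If no entity of the most specific type is found, a system could retry with the next type in `relaxations(...)`; a `None` result corresponds to the case where the classifier fails to determine the EAT.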

Document Retrieval

  • We use document retrieval to select a small subset of the whole collection which we can then process in more detail.

  • Lucene

    • Boolean-based document selection

    • Vector space document ranking

    • Query is the processed question

    • We use it to retrieve relevant passages

    • Generally use the top 20 passages

  • MadCow

    • Boolean-based document selection

    • Iterative approach to query construction

    • We use it to retrieve relevant sentences
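
The combination of Boolean selection and vector space ranking can be shown with a toy retriever. This sketch does not use Lucene or MadCow and does not reproduce their scoring; the tokenizer, the TF-IDF weighting and the function names are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, passages: List[str], top_n: int = 20) -> List[str]:
    """Select passages containing every query term, then rank them."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return []
    vectors = [Counter(tokenize(p)) for p in passages]
    doc_freq = Counter(term for vec in vectors for term in vec)
    total = len(vectors)

    def idf(term: str) -> float:
        return math.log(1.0 + total / doc_freq[term])

    scored = []
    for index, vec in enumerate(vectors):
        # Boolean selection: every query term must be present.
        if not all(term in vec for term in query_terms):
            continue
        # Vector space ranking: cosine similarity up to the query norm,
        # which is the same for every passage and so is omitted.
        dot = sum(vec[t] * idf(t) * idf(t) for t in query_terms)
        norm = math.sqrt(sum((tf * idf(t)) ** 2 for t, tf in vec.items()))
        scored.append((dot / norm, index))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [passages[i] for _, i in scored[:top_n]]
```

The default of 20 mirrors the "top 20 passages" mentioned above; the query passed in is assumed to be the already processed question.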

Answer Extraction

  • Matching on Logical Forms

    • SUPPLE is used to parse retrieved documents

    • Discourse interpretation then attempts to find entities that satisfy the requirements to be considered an answer.

    • Equivalent answers are grouped together as part of the ranking function

  • Shallow Multi-Strategy Approach

    • All entities of the EAT are extracted from the retrieved documents

    • Equivalent answers are grouped together

    • Each answer group is then scored based on

      • The frequency of occurrence

      • The best document rank

      • Similarity between the containing sentences and the question

    • For list questions where the classifier fails to determine the EAT

      • Assume the answer is a noun phrase

      • Extract, group and rank all noun phrases in retrieved documents
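
The grouping and scoring steps of the shallow approach can be sketched as follows. The slide names the three scoring signals but not how they are combined, so the simple sum used here is an assumption, as are the case-insensitive notion of "equivalent" answers and all function names.

```python
import re
from typing import Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_answers(candidates: Iterable[Tuple[str, int, str]],
                 question: str) -> List[str]:
    """Rank candidate answers.

    Each candidate is (answer text, 1-based document rank, containing sentence).
    """
    question_words = set(tokenize(question))
    groups = {}
    for answer, rank, sentence in candidates:
        # Assumption: answers are equivalent if they match after
        # lower-casing and whitespace normalisation.
        key = " ".join(answer.lower().split())
        overlap = 0.0
        if question_words:
            shared = question_words & set(tokenize(sentence))
            overlap = len(shared) / len(question_words)
        group = groups.setdefault(
            key, {"answer": answer, "count": 0, "best_rank": rank, "similarity": 0.0}
        )
        group["count"] += 1
        group["best_rank"] = min(group["best_rank"], rank)
        group["similarity"] = max(group["similarity"], overlap)

    def score(group: dict) -> float:
        # Assumed combination: frequency + inverse best rank + similarity.
        return group["count"] + 1.0 / group["best_rank"] + group["similarity"]

    ordered = sorted(groups.values(), key=score, reverse=True)
    return [group["answer"] for group in ordered]
```

For list questions without an EAT, the same function could be fed noun phrases instead of typed entities.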

Answering Definition Questions

  • Two different systems for answering definition questions

    • Bare Target + Reduce + Filter Approach

    • Target Enrichment + Filter Approach

  • Both approaches can be used with either Lucene or MadCow

  • shef05lmg

    • Bare Target + Reduce + Filter Approach

    • Lucene

  • shef05mc

    • Target Enrichment + Filter Approach

    • MadCow

  • shef05lc

    • Target Enrichment + Filter Approach

    • Lucene

Bare Target + Reduce + Filter

  • The target is processed to determine the focus and optional qualification. For example, “Abraham in the Old Testament”:

    • Focus: Abraham

    • Qualification: Old Testament

  • Relevant sentences (those containing the focus) are retrieved

  • Sentences are reduced by removing redundant phrases

  • A two-stage filtering process removes duplicate information

    • Stage 1: two sentences are treated as equivalent if they overlap by 70% at the word level

    • Stage 2: two sentences are treated as equivalent if the sum of their overlap over increasing n-gram sizes passes a threshold

  • Keep finding relevant sentences until either

    • No more sentences

    • Definition length reaches 4000 non-whitespace characters
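
The first filtering stage and the length limit can be sketched as below. The slide does not say how the 70% overlap is normalised, so dividing by the size of the smaller sentence is an assumption; the n-gram stage is omitted because its threshold is not given, and the function names are illustrative.

```python
import re
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_overlap(first: str, second: str) -> float:
    """Shared words as a fraction of the smaller sentence's vocabulary."""
    words_a, words_b = set(tokenize(first)), set(tokenize(second))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / min(len(words_a), len(words_b))


def build_definition(sentences: Iterable[str], threshold: float = 0.7,
                     max_chars: int = 4000) -> List[str]:
    """Keep non-duplicate sentences until the length limit is reached."""
    kept: List[str] = []
    length = 0
    for sentence in sentences:
        # Stage 1 filter: skip near-duplicates of sentences already kept.
        if any(word_overlap(sentence, k) >= threshold for k in kept):
            continue
        size = sum(1 for ch in sentence if not ch.isspace())
        # Stop rather than exceed 4000 non-whitespace characters.
        if length + size > max_chars:
            break
        kept.append(sentence)
        length += size
    return kept
```

The loop ends either when the sentences run out or when the next sentence would push the definition past the limit, matching the two stopping conditions above.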

Target Enrichment + Filter

  • The focus of the target (X) is determined and used to generate patterns such as:

    • X is a

    • such as X

  • Relevant texts are retrieved from “trusted sources”

    • WordNet, the online version of Britannica, the web in general

  • Highly co-occurring terms are extracted from these texts using the generated patterns

  • Boolean retrieval is then used to locate sentences containing the target

  • Sentences are then grouped and ranked based on their similarity to each other and the mined terms

  • Maximum definition size is 14 nuggets or 4000 non-whitespace characters

Conclusions

  • Our best performing system performs above average when independently evaluated.

  • TREC is becoming harder each year

    • We keep up (9th in both 2004 and 2005)

    • We don’t significantly improve

  • We have developed multiple approaches to QA

    • At least two approaches to all three components of factoid QA

    • Two different approaches to definitional QA

  • We assume each question can be asked in isolation

    • As of TREC 2004 this is not true

    • We need a better strategy for dealing with a question series

Future Work

  • Participation in TREC 2006

    • Don’t yet know exactly what the format will be

    • Assume target based questions like 2005

  • Currently no funded QA research taking place in Sheffield

    • We rely on those with an interest contributing whatever time they can

    • Extra people always welcome

    • If we start early in the year, it will be less stressful in August!

  • Is there enough interest to (re-)start a QA reading/work group?

Any Questions?

Bibliography

Dan Moldovan, Sanda Harabagiu, Marius Paşca, Rada Mihalcea, Richard Goodrum, Roxana Gîrju and Vasile Rus. LASSO: A Tool for Surfing the Answer Net. In Proceedings of the 8th Text Retrieval Conference, 1999.

Rohini Srihari and Wei Li. Information Extraction Supported Question Answering. In Proceedings of the 8th Text Retrieval Conference, 1999.

Ellen Voorhees. The TREC-8 Question Answering Track Report. In Proceedings of the 8th Text Retrieval Conference, 1999.

Ellen Voorhees. Overview of the TREC 2002 Question Answering Track. In Proceedings of the 11th Text Retrieval Conference, 2002.

Ellen Voorhees. Overview of the TREC 2003 Question Answering Track. In Proceedings of the 12th Text Retrieval Conference, 2003.

Ellen Voorhees. Overview of the TREC 2004 Question Answering Track. In Proceedings of the 13th Text Retrieval Conference, 2004.
