Question Answering at TREC
Mark A. Greenwood

  1. Question Answering at TREC, Mark A. Greenwood

  2. Outline of Talk
    • History of QA at TREC
    • TREC 2005
      • Task Overview
      • Evaluation Metrics
      • Official Evaluation Results
    • Answering Factoid/List Questions
      • Question Processing
      • Document Retrieval
      • Answer Extraction
    • Answering Definition Questions
      • Bare Target + Reduce + Filter Approach
      • Target Enrichment + Filter Approach
    • Conclusions
    • Future Work
    NLP Meeting

  3. History of QA at TREC
    • The QA track was first introduced at TREC 8 (Voorhees, 1999):
      • 200 fact-based, short-answer questions
      • Questions were mainly back-formulated from documents
      • Answers could be 50-byte or 250-byte snippets
      • Up to 5 answers could be returned for each question
      • The best systems could answer over 2/3 of the questions (Moldovan et al., 1999; Srihari and Li, 1999)
    • TREC 10 (Voorhees, 2001) introduced:
      • List questions, such as “Name 20 countries that produce coffee”
      • Questions which don’t have an answer in the collection

  4. History of QA at TREC
    • In TREC 11 (Voorhees, 2002):
      • Answers had to be exact
      • Only one answer could be returned per question
    • TREC 12 (Voorhees, 2003) introduced definition questions:
      • Define a target such as “aspirin” or “Aaron Copland”
      • A definition should contain a number of important facts (vital nuggets)
      • It can also include other associated information (non-vital nuggets)
      • Definitions are evaluated using a length-based precision metric which penalizes long answers containing few nuggets

  5. History of QA at TREC
    • TREC 13 (Voorhees, 2004) combined the three question types into scenarios built around targets. For instance:
      • Target: Hale Bopp Comet
      • Factoid: When was the comet discovered?
      • Factoid: How often does it approach the earth?
      • List: In what countries was the comet visible on its last return?
      • Other: Tell me anything else not covered by the above questions

  6. Outline of Talk (repeat of slide 2, introducing TREC 2005)

  7. TREC 2005
    • Questions were based around 75 targets:
      • 19 people
      • 19 organizations
      • 19 things
      • 18 events
    • The series of targets contained a total of:
      • 362 factoid questions
      • 93 list questions
      • 75 other questions (one per target)
    • All answers had to be supported by a document in the AQUAINT collection of newswire texts.

  8. Example Scenarios
    • AMWAY
      • F: When was AMWAY founded?
      • F: Where is it headquartered?
      • F: Who is president of the company?
      • L: Name the officials of the company
      • F: What is the name “AMWAY” short for?
      • O: (other)
    • Return of Hong Kong to Chinese sovereignty
      • F: What is Hong Kong’s population?
      • F: When was Hong Kong returned to Chinese sovereignty?
      • F: Who was the Chinese President at the time of the return?
      • F: Who was the British Foreign Secretary at the time?
      • L: What other countries formally congratulated China on the return?
      • O: (other)

  9. Example Scenarios
    • Shiite
      • F: Who was the first Imam of the Shiite sect of Islam?
      • F: Where is his tomb?
      • F: What was this person’s relationship to the Prophet Mohammad?
      • F: Who was the third Imam of Shiite Muslims?
      • F: When did he die?
      • F: What portion of Muslims are Shiite?
      • L: What Shiite leaders were killed in Pakistan?
      • O: (other)

  10. Evaluation Metrics
    • For factoid questions the metric is accuracy
      • Only exact, supported answers and correct NIL responses are counted
    • For list questions the metric is F-measure (β = 1)
      • Only exact, supported answers are counted
      • The set of correct answers (for recall purposes) is the union of all correct answers across all submitted runs, plus any instances found during question development
    • For other questions the metric is F-measure (β = 3)
      • Recall is the proportion of vital nuggets returned
      • Precision is a length-based penalty, where each valid nugget allows 100 non-whitespace characters to be returned
    • These are combined to give a weighted score per target:
      • Weighted Score = 0.5 × FactoidAccuracy + 0.25 × ListAvgF + 0.25 × OtherAvgF
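
The three metrics and the weighted combination can be sketched in Python as below. This is an illustrative reconstruction, not the official NIST scorer: the F-measure is the standard weighted harmonic mean, and the length penalty follows the usual TREC nugget definition (precision is 1 while the response stays within its allowance, otherwise 1 − (length − allowance)/length). The function names and example numbers are hypothetical.

```python
from typing import Sequence


def f_measure(precision: float, recall: float, beta: float) -> float:
    """Weighted F-measure; beta > 1 favours recall over precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


def list_f(correct_returned: int, returned: int, known_correct: int) -> float:
    """F(beta=1) for one list question; only exact, supported answers count."""
    precision = correct_returned / returned if returned else 0.0
    recall = correct_returned / known_correct if known_correct else 0.0
    return f_measure(precision, recall, beta=1.0)


def other_f(vital_returned: int, nonvital_returned: int,
            total_vital: int, length: int) -> float:
    """F(beta=3) for one 'other' question.

    length is the number of non-whitespace characters returned; every
    matched nugget (vital or non-vital) earns a 100-character allowance.
    """
    recall = vital_returned / total_vital if total_vital else 0.0
    allowance = 100 * (vital_returned + nonvital_returned)
    if length <= allowance:
        precision = 1.0
    else:
        precision = 1.0 - (length - allowance) / length
    return f_measure(precision, recall, beta=3.0)


def weighted_score(factoid_accuracy: float,
                   list_scores: Sequence[float],
                   other_scores: Sequence[float]) -> float:
    """Per-target score: 0.5 x accuracy + 0.25 x mean list F + 0.25 x mean other F."""
    list_avg = sum(list_scores) / len(list_scores) if list_scores else 0.0
    other_avg = sum(other_scores) / len(other_scores) if other_scores else 0.0
    return 0.5 * factoid_accuracy + 0.25 * list_avg + 0.25 * other_avg


# Hypothetical example: 4 vital nuggets exist, 2 were found in 400 characters.
print(round(other_f(2, 0, 4, 400), 3))  # 0.5
```

With β = 3, recall dominates: in the example, halving precision through verbosity costs far less than missing half of the vital nuggets.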

  11. Official Evaluation Results
    • 30 groups participated in TREC 2005
    • In all, 71 runs were submitted for evaluation
    • We submitted three runs:
      • shef05lmg
      • shef05mc
      • shef05lc
    • The main evaluation is the per-series score (the average of the weighted per-target scores), but separate results are also given for the three question types.

  12. Factoid Evaluation (chart of judgements: Wrong, Unsupported, Inexact, Right)

  13. Factoid Evaluation

  14. List Evaluation

  15. List Evaluation

  16. Other Evaluation

  17. Other Evaluation

  18. Per-Series Evaluation

  19. Per-Series Evaluation

  20. Evaluation By Group
    • 30 groups submitted one or more runs to TREC 2005, including:
      • Language Computer Corporation
      • IBM
      • NSA
      • National University of Singapore
      • MITRE Corporation
      • Microsoft
      • …
    • Examining only the best run submitted by each group places us:
      • 12th for answering factoid questions (shef05lmg)
      • 10th for answering list questions (shef05lmg)
      • 11th for answering other questions (shef05mc)
      • 9th for the per-series score (shef05lmg)

  21. Outline of Talk (repeat of slide 2, introducing Answering Factoid/List Questions)

  22. Answering Factoid Questions
    • Most factoid QA systems use a three-component architecture:
      • Question Analysis
      • Document Retrieval
      • Answer Extraction
    • We have developed two approaches to each component:
      • Question Analysis
        • Expected answer type analysis
        • Grammatical answer requirements
      • Document Retrieval
        • Lucene
        • MadCow
      • Answer Extraction
        • Matching on Logical Forms
        • Shallow Multi-Strategy Approach

  23. Answering Factoid Questions
    • shef05lmg
      • Expected answer type analysis
      • Lucene
      • Shallow Multi-Strategy Approach
    • shef05mc
      • Grammatical answer requirements
      • MadCow
      • QA-LaSIE (Matching on Logical Forms)
    • shef05lc
      • Grammatical answer requirements
      • Lucene
      • QA-LaSIE (Matching on Logical Forms)

  24. Initial Question Processing
    • All our approaches to QA assume that each question can be both asked and answered in isolation.
    • The introduction of target-based scenarios means that this is no longer true.
    • We use a single approach, based on both pronominal and nominal coreference resolution, to merge the target into each question.
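
A deliberately naive sketch of merging the target into a question follows. It is not the coreference resolver used in our system; it simply assumes that a definite noun phrase whose head word appears in the target (nominal coreference), or the first third-person pronoun (pronominal coreference), refers to the target. The function name `merge_target` and the pronoun lists are hypothetical.

```python
import re

# Hypothetical pronoun lists; possessives are rewritten as "<target>'s".
PLAIN_PRONOUNS = {"it", "he", "she", "they", "him", "them"}
POSSESSIVE_PRONOUNS = {"its", "his", "her", "their"}
PRONOUN_RE = re.compile(
    r"\b(" + "|".join(sorted(PLAIN_PRONOUNS | POSSESSIVE_PRONOUNS)) + r")\b",
    re.IGNORECASE,
)


def merge_target(target: str, question: str) -> str:
    """Rewrite a question from a series so that it can be answered in isolation."""
    target_words = {w.lower() for w in re.findall(r"\w+", target)}

    # Nominal: "the comet" -> target, but only if "comet" is a word of the target.
    def nominal(match: re.Match) -> str:
        return target if match.group(2).lower() in target_words else match.group(0)

    question = re.sub(r"\b(the|this|that)\s+(\w+)", nominal, question,
                      flags=re.IGNORECASE)

    # Pronominal: replace only the first third-person pronoun.
    def pronominal(match: re.Match) -> str:
        if match.group(1).lower() in POSSESSIVE_PRONOUNS:
            return target + "'s"
        return target

    return PRONOUN_RE.sub(pronominal, question, count=1)


print(merge_target("AMWAY", "Where is it headquartered?"))
# Where is AMWAY headquartered?
print(merge_target("Hale Bopp Comet", "When was the comet discovered?"))
# When was Hale Bopp Comet discovered?
```

The sketch leaves "Who is president of the company?" unchanged for the AMWAY target, because nothing in it knows that AMWAY is a company; handling such cases is exactly what proper nominal coreference resolution is for.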

  25. Initial Question Processing

  26. Question Analysis
    • Grammatical answer requirements
      • Parse the question using SUPPLE to produce a quasi-logical form (QLF) representation
      • The QLF representation places constraints on possible answers
      • For example, “Who wrote Hamlet?” gives: qvar(e1), qattr(e1,name), person(e1), lsubj(e2,e1), write(e2), time(e2,past), aspect(e2,simple), voice(e2,active), lobj(e2,e3), name(e3,‘Hamlet’)
    • Expected answer type analysis
      • The expected answer type (EAT) is determined using a hand-built, rule-based question classifier
      • A hierarchy of EATs is used to allow constraints to be relaxed
      • For example, “Who is Paul Newman married to?” gives Person {gender=female}
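
The expected answer type analysis can be illustrated with a toy rule-based classifier plus a type hierarchy used for relaxation. The rules, the hierarchy fragment and the first-name gazetteers below are all hypothetical stand-ins, not the classifier's actual rule set.

```python
import re
from typing import List, Optional

# Hypothetical fragment of an EAT hierarchy: each type maps to its more general parent.
EAT_PARENT = {
    "Person{gender=female}": "Person",
    "Person{gender=male}": "Person",
    "Person": "Entity",
    "Date": "Entity",
    "Location": "Entity",
}

# Hypothetical gazetteers used to guess the gender of a spouse.
MALE_FIRST_NAMES = {"paul", "john", "george"}
FEMALE_FIRST_NAMES = {"mary", "anne", "susan"}


def classify(question: str) -> Optional[str]:
    """Hand-written rules, tried in order; the first rule that fires wins."""
    q = question.strip().lower()
    spouse = re.match(r"who is (\w+)\b.*\bmarried to\b", q)
    if spouse:
        first_name = spouse.group(1)
        if first_name in MALE_FIRST_NAMES:
            return "Person{gender=female}"
        if first_name in FEMALE_FIRST_NAMES:
            return "Person{gender=male}"
        return "Person"
    if q.startswith("who "):
        return "Person"
    if q.startswith("when "):
        return "Date"
    if q.startswith("where "):
        return "Location"
    return None  # EAT unknown


def relax(eat: str) -> List[str]:
    """The EAT followed by progressively more general types from the hierarchy."""
    chain = [eat]
    while chain[-1] in EAT_PARENT:
        chain.append(EAT_PARENT[chain[-1]])
    return chain


print(relax(classify("Who is Paul Newman married to?")))
# ['Person{gender=female}', 'Person', 'Entity']
```

An answer extractor would first look for entities of the most specific type and fall back along the `relax` chain only if none are found.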

  27. Document Retrieval
    • We use document retrieval to select a small subset of the whole collection, which we can then process in more detail.
    • Lucene
      • Boolean-based document selection
      • Vector space document ranking
      • The query is the processed question
      • We use it to retrieve relevant passages
      • We generally use the top 20 passages
    • MadCow
      • Boolean-based document selection
      • Iterative approach to query construction
      • We use it to retrieve relevant sentences
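
Boolean selection with iterative query construction might look like the sketch below. The slide does not say how MadCow builds its queries, so the relaxation policy here (query terms ordered from most to least important, dropping the last term until enough sentences match) is an assumption, as are all the function names. Vector space ranking is not shown.

```python
import re
from typing import List, Sequence, Set, Tuple


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def boolean_and(sentences: Sequence[str], terms: Sequence[str]) -> List[int]:
    """Indices of the sentences that contain every query term (Boolean AND)."""
    wanted = {t.lower() for t in terms}
    return [i for i, s in enumerate(sentences) if wanted <= tokens(s)]


def iterative_retrieve(sentences: Sequence[str], terms: Sequence[str],
                       min_hits: int = 1) -> Tuple[List[str], List[int]]:
    """Relax an AND query until it matches at least min_hits sentences.

    Assumption: terms are ordered from most to least important, so the
    last remaining term is dropped on each iteration.
    """
    query = list(terms)
    while query:
        hits = boolean_and(sentences, query)
        if len(hits) >= min_hits:
            return query, hits
        query.pop()
    return [], []


sentences = [
    "AMWAY was founded in 1959.",
    "AMWAY sells household products.",
    "The company is based in Michigan.",
]
print(iterative_retrieve(sentences, ["amway", "founded", "year"]))
# (['amway', 'founded'], [0])
```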

  28. Answer Extraction
    • Matching on Logical Forms
      • SUPPLE is used to parse the retrieved documents
      • Discourse interpretation then attempts to find entities that satisfy the requirements for being an answer
      • Equivalent answers are grouped together as part of the ranking function
    • Shallow Multi-Strategy Approach
      • All entities of the EAT are extracted from the retrieved documents
      • Equivalent answers are grouped together
      • Each answer group is then scored based on:
        • The frequency of occurrence
        • The best document rank
        • The similarity between the containing sentences and the question
      • For list questions where the classifier fails to determine the EAT:
        • Assume the answer is a noun phrase
        • Extract, group and rank all noun phrases in the retrieved documents
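
The grouping and scoring in the shallow multi-strategy approach can be sketched as follows. The slide names the three scoring features but not how they are combined, so the equivalence test (case and punctuation folding, dropping a leading "the"), the Jaccard word overlap used as sentence similarity, and the additive weighted score are all assumptions; `Candidate` and `rank_answers` are hypothetical names.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    text: str      # an entity of the expected answer type
    doc_rank: int  # 1 = highest-ranked retrieved document
    sentence: str  # sentence the entity was extracted from


def normalise(answer: str) -> str:
    """Equivalence key: lower-case, drop punctuation and a leading 'the'."""
    words = re.findall(r"\w+", answer.lower())
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def overlap(a: str, b: str) -> float:
    """Jaccard word overlap, standing in for sentence/question similarity."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def rank_answers(question: str, candidates: List[Candidate],
                 weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)
                 ) -> List[Tuple[str, float]]:
    groups: Dict[str, List[Candidate]] = defaultdict(list)
    for c in candidates:
        groups[normalise(c.text)].append(c)  # group equivalent answers
    w_freq, w_rank, w_sim = weights
    scored = []
    for members in groups.values():
        freq = len(members)
        best_rank = min(m.doc_rank for m in members)
        best_sim = max(overlap(m.sentence, question) for m in members)
        score = w_freq * freq + w_rank / best_rank + w_sim * best_sim
        scored.append((members[0].text, score))  # first-seen form represents the group
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


question = "Who wrote Hamlet?"
candidates = [
    Candidate("William Shakespeare", 1, "Hamlet was written by William Shakespeare."),
    Candidate("Christopher Marlowe", 2, "Christopher Marlowe was a contemporary."),
    Candidate("william shakespeare", 3, "William Shakespeare wrote Hamlet."),
]
print(rank_answers(question, candidates)[0][0])  # William Shakespeare
```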

  29. Outline of Talk (repeat of slide 2, introducing Answering Definition Questions)

  30. Answering Definition Questions
    • Two different systems for answering definition questions:
      • Bare Target + Reduce + Filter Approach
      • Target Enrichment + Filter Approach
    • Both approaches can be used with either Lucene or MadCow
    • shef05lmg
      • Bare Target + Reduce + Filter Approach
      • Lucene
    • shef05mc
      • Target Enrichment + Filter Approach
      • MadCow
    • shef05lc
      • Target Enrichment + Filter Approach
      • Lucene

  31. Bare Target + Reduce + Filter
    • The target is processed to determine the focus and an optional qualification. For example, “Abraham in the Old Testament”:
      • Focus: Abraham
      • Qualification: Old Testament
    • Relevant sentences (those containing the focus) are retrieved
    • Sentences are reduced by removing redundant phrases
    • A two-stage filtering process removes duplicate information:
      • Stage 1: two sentences are equivalent if they overlap by 70% at the word level
      • Stage 2: two sentences are equivalent if the sum of their increasing n-gram overlap passes a threshold
    • Keep finding relevant sentences until either:
      • there are no more sentences, or
      • the definition length reaches 4000 non-whitespace characters
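
A simplified sketch of this pipeline is shown below. It splits the target on the first " in " (a hypothetical rule), keeps sentences that mention the focus, applies only the first-stage 70% word-overlap filter (measured here against the shorter sentence, which is an assumption), and stops at 4000 non-whitespace characters. Phrase reduction and the n-gram filter are omitted, and all names are hypothetical.

```python
import re
from typing import List, Optional, Set, Tuple


def split_target(target: str) -> Tuple[str, Optional[str]]:
    """Hypothetical rule: the focus precedes the first ' in ', the qualification follows."""
    focus, sep, qualification = target.partition(" in ")
    if not sep:
        return target.strip(), None
    qualification = re.sub(r"^the\s+", "", qualification.strip(), flags=re.IGNORECASE)
    return focus.strip(), qualification


def words(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def is_duplicate(a: str, b: str, threshold: float = 0.7) -> bool:
    """Word-level overlap, measured against the shorter sentence (an assumption)."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return False
    return len(wa & wb) / min(len(wa), len(wb)) >= threshold


def build_definition(focus: str, sentences: List[str],
                     max_chars: int = 4000) -> List[str]:
    """Greedily keep relevant, non-duplicate sentences until the length limit."""
    definition: List[str] = []
    used = 0
    for sentence in sentences:
        if focus.lower() not in sentence.lower():
            continue  # not relevant: does not mention the focus
        if any(is_duplicate(sentence, kept) for kept in definition):
            continue  # first-stage filter: near-duplicate of a kept sentence
        cost = len(re.sub(r"\s", "", sentence))  # non-whitespace characters
        if used + cost > max_chars:
            break
        definition.append(sentence)
        used += cost
    return definition


print(split_target("Abraham in the Old Testament"))  # ('Abraham', 'Old Testament')
```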

  32. Target Enrichment + Filter
    • The focus of the target is determined and used to generate definition patterns, such as:
      • X is a
      • such as X
    • Relevant texts are retrieved from “trusted sources”:
      • WordNet, the online version of Britannica, and the web in general
    • Highly co-occurring terms are extracted from these texts using the generated patterns
    • Boolean retrieval is then used to locate sentences containing the target
    • Sentences are then grouped and ranked based on their similarity to each other and to the mined terms
    • The maximum definition size is 14 nuggets or 4000 non-whitespace characters
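
The enrichment idea can be sketched as two steps: mine terms from pattern-matching sentences in already-fetched source texts, then rank collection sentences by how many mined terms they contain. Fetching from WordNet, Britannica or the web is not shown; the stopword list, the term-count ranking and all names are hypothetical, and the grouping of similar sentences is omitted.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "is", "was", "of", "and", "in", "to",
             "such", "as", "by", "for", "that"}


def definition_patterns(focus: str) -> List[re.Pattern]:
    x = re.escape(focus)
    return [re.compile(rf"\b{x} is an?\b", re.IGNORECASE),   # "X is a"
            re.compile(rf"\bsuch as {x}\b", re.IGNORECASE)]  # "such as X"


def mine_terms(focus: str, source_texts: List[str], top_n: int = 10) -> List[str]:
    """Count words in source sentences that match a definition pattern."""
    patterns = definition_patterns(focus)
    focus_words = set(re.findall(r"\w+", focus.lower()))
    counts: Counter = Counter()
    for text in source_texts:
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if any(p.search(sentence) for p in patterns):
                for w in re.findall(r"\w+", sentence.lower()):
                    if w not in STOPWORDS and w not in focus_words:
                        counts[w] += 1
    return [w for w, _ in counts.most_common(top_n)]


def rank_sentences(focus: str, sentences: List[str], terms: List[str],
                   max_nuggets: int = 14, max_chars: int = 4000) -> List[str]:
    """Rank sentences containing the focus by how many mined terms they share."""
    term_set = set(terms)
    relevant = [s for s in sentences if focus.lower() in s.lower()]
    ranked = sorted(relevant,
                    key=lambda s: len(term_set & set(re.findall(r"\w+", s.lower()))),
                    reverse=True)
    definition: List[str] = []
    used = 0
    for sentence in ranked[:max_nuggets]:
        cost = len(re.sub(r"\s", "", sentence))  # non-whitespace characters
        if used + cost > max_chars:
            break
        definition.append(sentence)
        used += cost
    return definition


sources = ["Aspirin is a drug that reduces pain. Painkillers such as aspirin reduce fever."]
terms = mine_terms("aspirin", sources)
print(rank_sentences("aspirin", ["Aspirin was sold in shops.",
                                 "Aspirin reduces pain and fever."], terms))
# ['Aspirin reduces pain and fever.', 'Aspirin was sold in shops.']
```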

  33. Outline of Talk (repeat of slide 2, introducing Conclusions)

  34. Conclusions
    • Our best-performing system performs above average when independently evaluated.
    • TREC is becoming harder each year:
      • We keep up (9th in both 2004 and 2005)
      • We don’t significantly improve
    • We have developed multiple approaches to QA:
      • At least two approaches to all three components of factoid QA
      • Two different approaches to definitional QA
    • We assume each question can be asked in isolation:
      • As of TREC 2004 this is not true
      • We need a better strategy for dealing with a question series

  35. Outline of Talk (repeat of slide 2, introducing Future Work)

  36. Future Work
    • Participation in TREC 2006
      • We don’t yet know exactly what the format will be
      • Assume target-based questions like 2005
    • Currently no funded QA research is taking place in Sheffield
      • We rely on those with an interest contributing whatever time they can
      • Extra people are always welcome
      • If we start early in the year, August will be less stressful!
    • Is there enough interest to (re-)start a QA reading/work group?

  37. Any Questions?

  38. Bibliography
    Dan Moldovan, Sanda Harabagiu, Marius Paşca, Rada Mihalcea, Richard Goodrum, Roxana Gîrju and Vasile Rus. LASSO: A Tool for Surfing the Answer Net. In Proceedings of the 8th Text Retrieval Conference, 1999.
    Rohini Srihari and Wei Li. Information Extraction Supported Question Answering. In Proceedings of the 8th Text Retrieval Conference, 1999.
    Ellen Voorhees. The TREC-8 Question Answering Track Report. In Proceedings of the 8th Text Retrieval Conference, 1999.
    Ellen Voorhees. Overview of the TREC 2002 Question Answering Track. In Proceedings of the 11th Text Retrieval Conference, 2002.
    Ellen Voorhees. Overview of the TREC 2003 Question Answering Track. In Proceedings of the 12th Text Retrieval Conference, 2003.
    Ellen Voorhees. Overview of the TREC 2004 Question Answering Track. In Proceedings of the 13th Text Retrieval Conference, 2004.
