
IR4QA: An Unhappy Marriage Mark A. Greenwood



  1. IR4QA: An Unhappy Marriage Mark A. Greenwood

  2. Outline of Talk • Background • ‘Ancient’ History • Recent Past • An Uncertain Future • Possible New Directions

  3. Background • “Although QA is not new, the language processing community has yet to develop a clearly articulated and commonly accepted guiding framework and research methodology, parallel to that of IR, MT, or text summarization. As a result, despite ten years of system evaluations in the TREC QA track for specific kinds of questions and answers, the community does not have a clear idea how much progress was made during that period for QA in general.” (OAQA09 Call for Papers)

  4. Background • We will focus here on the selection of promising documents which can be subjected to further processing in order to extract exact answers to questions. • The common approach to this problem has been to employ an IR engine to retrieve a small set of relevant documents, a field known as IR4QA. • The rest of this talk will explain • How we got to this point • Why it is fundamentally flawed • Where we might go from here

  5. Outline of Talk • Background • ‘Ancient’ History • Recent Past • An Uncertain Future • Possible New Directions

  6. ‘Ancient’ History • Traditionally IR and QA were separate research areas • They had different users and goals • The inputs and outputs to both systems were radically different • Both had their own strengths and weaknesses

  7. ‘Ancient’ History • Early QA systems were usually just interfaces to structured data • LUNAR (Woods, 1973) • BASEBALL (Green et al., 1961) • Those systems which worked over text were usually based around reading comprehension exercises and used scenario templates • SAM (Schank and Abelson, 1977) • Questions varied in length but asked for information which wasn’t known to the user • Systems were not open-domain, e.g. LUNAR only knew about moon rocks.

  8. ‘Ancient’ History • In comparison to QA systems, early IR systems could be applied to any document collection • Performance varied from collection to collection, but in principle any collection could be used • Queries were usually quite long and described the documents the user was looking for • The CACM collection is a good example • Systems returned full documents, not exact answers • As the user already knew what they were looking for this was OK • Full documents don’t help when you don’t know what you are looking for, as you then have to read all the returned documents

  9. Outline of Talk • Background • ‘Ancient’ History • Recent Past • An Uncertain Future • Possible New Directions

  10. Recent Past • Recent QA research has been guided by the TREC evaluations • The TREC QA track was originally conceived as a task that would interest both the IR and IE communities • Focused IR • Open-Domain IE • It was hoped that over time the two communities would work together to develop new combined approaches • Unfortunately it would seem that the IR community is not, on the whole, interested in the QA task

  11. Recent Past • Most, if not all, modern QA systems have adopted a (roughly) three stage architecture: question analysis, document retrieval, and answer extraction.

  12. Recent Past • IR4QA has not been aggressively researched by the community, yet we know that... • IR performance places an upper bound on end-to-end performance – a commonly quoted figure is 60% (Tellex et al., 2003) • Even if we look at the top 1000 documents, no relevant documents are returned for 8% of the questions (Hovy et al., 2000) • Most systems use off-the-shelf IR components with little or no tuning to the task, e.g. Lucene, Okapi... • Complex multi-query strategies have been tried in an effort to solve the problem, but they only serve to highlight how bad performance at this step actually is.

  13. Recent Past • IR4QA has focused on the development and evaluation of the document retrieval component in such systems. • The main problems are • QA researchers are not IR researchers • We don’t fully understand the intricate details of IR engines • QA and IR are fundamentally different tasks

  14. Recent Past • The commonly accepted evaluation framework (Roberts and Gaizauskas, 2004) consists of • Coverage – the proportion of questions for which at least one answer-bearing document is retrieved • Redundancy – the average number of answer-bearing documents retrieved for a question
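
The two measures on slide 14 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' evaluation code: the function names, the dictionary layout, and the example question and document identifiers are all assumptions made for the sketch, and both measures are computed over the top n retrieved documents per question.

```python
from typing import Dict, List, Set


def coverage(retrieved: Dict[str, List[str]],
             answer_bearing: Dict[str, Set[str]],
             n: int) -> float:
    """Proportion of questions with at least one answer-bearing
    document among the top n retrieved documents."""
    if not retrieved:
        return 0.0
    hits = sum(
        1
        for question, docs in retrieved.items()
        if any(d in answer_bearing.get(question, set()) for d in docs[:n])
    )
    return hits / len(retrieved)


def redundancy(retrieved: Dict[str, List[str]],
               answer_bearing: Dict[str, Set[str]],
               n: int) -> float:
    """Average number of answer-bearing documents per question
    among the top n retrieved documents."""
    if not retrieved:
        return 0.0
    total = sum(
        sum(1 for d in docs[:n] if d in answer_bearing.get(question, set()))
        for question, docs in retrieved.items()
    )
    return total / len(retrieved)


# Hypothetical data: ranked document ids per question, plus the
# documents known to contain a correct answer for each question.
retrieved = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5", "d6"]}
answer_bearing = {"q1": {"d2", "d3"}, "q2": {"d9"}}

print(coverage(retrieved, answer_bearing, 3))    # 0.5
print(redundancy(retrieved, answer_bearing, 3))  # 1.0
```

In this toy data only q1 has answer-bearing documents in its top three, so coverage is 0.5, while the two answer-bearing documents found for q1 average out to a redundancy of 1.0 over both questions.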

  15. Recent Past • There have been two workshops focused on the problem of IR4QA • Sheffield, SIGIR 2004 • Manchester, Coling 2008 • The main conclusions of both were that • IR4QA is very hard • Approaches that lead to increased IR performance do not necessarily lead to appreciable increases in end-to-end performance • Selection of documents shouldn’t be performed in isolation from the rest of the system

  16. Outline of Talk • Background • ‘Ancient’ History • Recent Past • An Uncertain Future • Possible New Directions

  17. An Uncertain Future • It seems clear that, on the whole, the IR community are not interested in QA • Using off-the-shelf IR components has been shown to introduce unacceptable caps on performance • The IR4QA community need to consider radically different approaches to the problem of selecting relevant documents from large corpora

  18. Outline of Talk • Background • ‘Ancient’ History • Recent Past • An Uncertain Future • Possible New Directions

  19. Possible New Directions • Answer extraction requires complex text processing • Answer extraction techniques don’t scale well • Some form of text selection component is required • There are two orthogonal directions we could take • Continue to use traditional IR techniques but discard the traditional view of what makes a document (and/or query) • Continue to work with traditional documents but use a radically different selection approach • We need approaches that scale – working on AQUAINT-sized collections is nice for self-contained experiments but shouldn’t be the end goal!

  20. What Is A Document? • Topic Indexing and Retrieval (Ahn and Webber, 2008) throws away the common idea of documents while using a standard IR engine to directly retrieve answers, not text. • Topics are entities that answer questions • People, companies, locations etc. • Topic documents are built by simply joining together all sentences from a corpus that contain the topic (or variations of it, e.g. Bill Clinton and William Clinton) • QA is then a matter of retrieving the most relevant topic document using an IR engine and returning the associated topic as the answer
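
The topic-document idea on slide 20 can be illustrated with a toy sketch. It is not the Ahn and Webber implementation: the alias table, the example sentences, and the function names are hypothetical, and a simple count of overlapping terms stands in for the standard IR engine that the slide describes.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_topic_documents(sentences: List[str],
                          topic_aliases: Dict[str, List[str]]) -> Dict[str, str]:
    """Join every sentence that mentions a topic (under any alias)
    into a single 'topic document' for that topic."""
    collected = defaultdict(list)
    for sentence in sentences:
        lowered = sentence.lower()
        for topic, aliases in topic_aliases.items():
            if any(alias.lower() in lowered for alias in aliases):
                collected[topic].append(sentence)
    return {topic: " ".join(parts) for topic, parts in collected.items()}


def answer(question: str, topic_docs: Dict[str, str]) -> Optional[str]:
    """Return the topic whose document best matches the question.
    Term-overlap counting is an assumed stand-in for a real IR engine."""
    query_terms = set(tokenize(question))
    best_topic, best_score = None, 0
    for topic, doc in topic_docs.items():
        counts = Counter(tokenize(doc))
        score = sum(counts[term] for term in query_terms)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Hypothetical corpus and alias table, for illustration only.
sentences = [
    "Bill Clinton was the 42nd president of the United States.",
    "William Clinton was born in Hope, Arkansas.",
    "Microsoft was founded by Bill Gates and Paul Allen.",
]
topic_aliases = {
    "Bill Clinton": ["Bill Clinton", "William Clinton"],
    "Microsoft": ["Microsoft"],
}

topic_docs = build_topic_documents(sentences, topic_aliases)
print(answer("Who was born in Hope, Arkansas?", topic_docs))
print(answer("Which company was founded by Paul Allen?", topic_docs))
```

The point of the sketch is that the retrieved unit is the entity itself: whichever topic document ranks highest, its topic label is returned directly as the answer, so no separate answer extraction step runs over the retrieved text.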

  21. What Is A Document?

  22. Let The Data Guide You • A decade of recent QA research has yielded a lot of useful data • We have lots of example questions (at least a few thousand just from TREC) each of which... • Has a known correct answer • Is associated with at least one answer bearing document • We should use this data to guide new selection approaches. • A simple approach would be to perform query expansion by looking for terms which are often associated with correct answers to certain question types (Derczynski et al., 2008) • Look for patterns in the answer bearing documents and index collections based on these patterns rather than words
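
As a rough sketch of the data-driven query expansion mentioned on slide 22, one could count which terms recur in the answer-bearing documents of a given question type and add the most frequent ones to new queries of that type. This is a simplified stand-in, not the method of Derczynski et al.; the question-type labels, the stopword list, the training triples, and the function name are all assumptions.

```python
import re
from collections import Counter
from typing import FrozenSet, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expansion_terms(training: List[Tuple[str, str, str]],
                    question_type: str,
                    k: int = 2,
                    stopwords: FrozenSet[str] = frozenset()) -> List[str]:
    """Return the k terms that occur in the most answer-bearing texts
    for a question type, ignoring stopwords and terms already present
    in the corresponding question. Ties are broken alphabetically."""
    doc_freq = Counter()
    for qtype, question, answer_text in training:
        if qtype != question_type:
            continue
        candidates = set(tokenize(answer_text)) - set(tokenize(question)) - stopwords
        doc_freq.update(candidates)
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical training triples: (question type, question, answer-bearing text).
STOPWORDS = frozenset({"a", "of", "the", "is", "has", "in"})
training = [
    ("how_tall", "How tall is Everest?",
     "Everest has a height of 8848 metres."),
    ("how_tall", "How tall is the Eiffel Tower?",
     "The Eiffel Tower reaches a height of 330 metres."),
    ("where", "Where is the Louvre?",
     "The Louvre is located in Paris."),
]

print(expansion_terms(training, "how_tall", k=2, stopwords=STOPWORDS))
# ['height', 'metres']
```

A new "how tall" question could then be sent to the IR engine with "height" and "metres" appended, on the assumption that answer-bearing text for this question type tends to use those words even when the question does not.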

  23. Answer By Understanding • I’ve always been of the opinion that QA is intelligent IR • Where intelligence equates to some level of understanding • This suggests we should index meaning not just textual content. • Take into account co-reference when selecting text passages • Indexing relations should allow for more focused selection • ‘Hybrid’ search that uses annotations and text (Bhagdev et al., 2008)

  24. Discussion

  25. References
    • Kisuh Ahn and Bonnie Webber. 2008. Topic Indexing and Retrieval for Factoid QA. In Proceedings of the 2nd Workshop on Information Retrieval for Question Answering (IR4QA).
    • Ravish Bhagdev, Sam Chapman, Fabio Ciravegna, Vitaveska Lanfranchi and Daniela Petrelli. 2008. Hybrid Search: Effectively Combining Keywords and Semantic Searches. In Proceedings of the 5th European Semantic Web Conference, ESWC 08, Tenerife.
    • Leon Derczynski, Jun Wang, Robert Gaizauskas and Mark A. Greenwood. 2008. A Data Driven Approach to Query Expansion in Question Answering. In Proceedings of the 2nd Workshop on Information Retrieval for Question Answering (IR4QA).
    • Bert F. Green, Alice K. Wolf, Carol Chomsky, and Kenneth Laughery. 1961. BASEBALL: An Automatic Question Answerer. In Proceedings of the Western Joint Computer Conference, volume 19, pages 219–224.
    • Eduard Hovy, Laurie Gerber, Ulf Hermjakob, Michael Junk, and Chin-Yew Lin. 2000. Question Answering in Webclopedia. In Proceedings of the 9th Text REtrieval Conference.
    • Ian Roberts and Robert Gaizauskas. 2004. Evaluating Passage Retrieval Approaches for Question Answering. In Proceedings of 26th European Conference on Information Retrieval (ECIR’04), pages 72–84, University of Sunderland, UK.
    • Roger C. Schank and Robert Abelson. 1977. Scripts, Plans, Goals and Understanding. Hillsdale.
    • Stefanie Tellex, Boris Katz, Jimmy Lin, Aaron Fernandes, and Gregory Marton. 2003. Quantitative Evaluation of Passage Retrieval Algorithms for Question Answering. In Proceedings of the Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 41–47, Toronto, Canada, July.
    • William Woods. 1973. Progress in Natural Language Understanding - An Application to Lunar Geology. In AFIPS Conference Proceedings, volume 42, pages 441–450.
