
Learning to Find Answers to Questions


Presentation Transcript


  1. Learning to Find Answers to Questions Eugene Agichtein (Columbia University), Steve Lawrence (NEC Research), Luis Gravano (Columbia University)

  2. Motivation • Millions of natural language questions are submitted to web search engines daily. • An increasing number of search services specifically target natural language questions. • AskJeeves: databases of precompiled information, metasearching, and other proprietary methods. • AskMe.com and similar services: facilitate interaction with human experts.

  3. Problem Statement • Problem/Goal: Find documents containing answers to questions within a collection of text documents • Collection: Pages on the web as indexed by a web search engine • General Method: Transform questions into a set of new queries that maximize the probability of returning answers to the questions, using existing IR systems or search engines.

  4. Example • Question: “What is a hard disk?” • Current search engines might ignore the stopwords, process the query {hard, disk}, and return home pages of hard-drive manufacturers. • A good answer might include a definition or an explanation of what a hard disk is. • Such answers are likely to contain phrases such as “… is a …”, “… is used to …”, etc. • Submitting queries such as { {hard, disk} NEAR “is a” }, { {hard, disk} NEAR “is used to” }, etc., may bias the search engine to return answers to the original question (see the sketch below).
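A minimal sketch (not the authors' code; the transform phrases and the helper below are illustrative) of the kind of query rewriting described in the example above:

```python
# Rough illustration of the example: a matching question phrase ("what is a")
# triggers candidate transforms that are combined with the content terms.
def rewrite_question(question, transforms_by_phrase):
    """Return candidate rewritten queries for one natural-language question."""
    tokens = question.lower().rstrip("?").split()
    for phrase, transforms in transforms_by_phrase.items():
        prefix = phrase.split()
        if tokens[:len(prefix)] == prefix:
            content_terms = tokens[len(prefix):]          # e.g. ["hard", "disk"]
            return [content_terms + [f'"{t}"'] for t in transforms]
    return [tokens]                                       # fall back to the raw terms

# Example (the transform phrases here are illustrative, not the learned set):
print(rewrite_question("What is a hard disk?",
                       {"what is a": ["is a", "is used to", "refers to"]}))
# [['hard', 'disk', '"is a"'], ['hard', 'disk', '"is used to"'], ['hard', 'disk', '"refers to"']]
```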

  5. Method • Step 1 (training): Automatically learn generally applicable question-answer transformations based on the question-answer pairs in the training collection. • Step 2 (training): Automatically probe each IR system (e.g., web search engine) to discover which transformations work better than others for each IR system. • Step 3 (run-time): Transform the question using the best transformations from Step 2 and submit the transformed queries to each IR system.

  6. Background/Related Work • Decades of NLP research on Question-Answering • Manual methods: linguistics, parsing, heuristics, … • Learning-based methods • General approach to Q-A: • Find candidate documents in database (our focus) • Extract answer from documents (traditional focus) • Text Retrieval Evaluation Conference (TREC) Question-Answering Track • Retrieve a short (50 or 250 byte) answer to a set of test questions

  7. Related Work (cont.) • Most systems focus on extracting answers from documents • Most use variants of standard vector space or probabilistic retrieval models to retrieve documents, followed by heuristics and/or linguistics-based methods to extract the best passages • Evaluation has focused on questions with precise answers (Abney et al., Cardie et al., …)

  8. Related Work (cont.) • Berger et al. – independently considered statistical models for finding co-occurring terms in question/answer pairs to facilitate answer retrieval (SIGIR 2000). • Lawrence, Giles (IEEE IC 1998) • Queries transformed into specific ways of expressing an answer, e.g. “What is <X>?” is transformed into phrases such as “<X> is” and “<X> refers to”. • Transformations are manually coded and are the same for all search engines • Glover et al. (SAINT 2001) – Category-specific query modification

  9. Our Contributions: The Tritus System • Introduced a method for automatically learning multiple query transformations optimized for a specific information retrieval system, with the goal of maximizing the probability of retrieving documents containing the answers. • Developed a prototype implementation of Tritus, a meta-search engine automatically optimized for real-world web search engines. • Performed a thorough evaluation of the Tritus system, comparing it to state-of-the-art search engines.

  10. Training Tritus • Training Algorithm • Generate question phrases • Generate candidate transforms • Evaluate candidate transforms on target IR system(s) • Data • 30,000 question-answer pairs from 270 Frequently Asked Question (FAQ) files obtained from the FAQFinder project

  11. Training Step 1: Generating Question Phrases • Generate phrases that identify different categories of questions • For example, the phrase "what is a" in the question "what is a virtual private network?" tells us the goal of the question • Find commonly occurring n-grams at the beginning of questions
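A rough sketch of Training Step 1 under stated assumptions (the n-gram length limit and frequency threshold are illustrative, not the paper's parameters):

```python
from collections import Counter

def question_phrases(questions, max_n=4, min_count=50):
    """Count n-grams occurring at the *beginning* of training questions and
    keep the frequent ones as question phrases (e.g. "what is a", "how do i")."""
    counts = Counter()
    for q in questions:
        tokens = q.lower().rstrip("?").split()
        for n in range(1, min(max_n, len(tokens)) + 1):
            counts[" ".join(tokens[:n])] += 1      # prefix n-grams only
    return [phrase for phrase, c in counts.most_common() if c >= min_count]
```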

  12. Training Step 1 (cont.) • Limitations – e.g. “How do I find out what a sonic boom is?” begins like a “how do I” question but is really asking for a definition • Advantages of this approach • Very inexpensive to compute (especially important at run-time) • Domain and language independent, and can be extended in a relatively straightforward fashion to other European languages.

  13. Training Step 2: Generating Candidate Transforms • Generate candidate terms and phrases for each of the question phrases from the previous stage • For each question in the training data matching the current question phrase, we rank n-grams in the corresponding answers according to co-occurrence frequency. • To reduce domain bias, candidate transforms containing nouns were discarded using a part-of-speech tagger (Brill's) • e.g., the term "telephone" is intuitively not very useful for a question "what is a rainbow?"
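A simplified sketch of Training Step 2; the parameter values are illustrative and NLTK's tagger stands in for the Brill tagger used by the authors:

```python
from collections import Counter
import nltk  # stand-in for Brill's part-of-speech tagger

def candidate_transforms(qa_pairs, question_phrase, max_n=3, top_k=100):
    """Collect n-grams from answers whose questions match `question_phrase`,
    discard candidates containing nouns, and rank by co-occurrence frequency."""
    counts = Counter()
    for question, answer in qa_pairs:
        if not question.lower().startswith(question_phrase):
            continue
        tokens = answer.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    transforms = []
    for ngram, freq in counts.most_common():
        tags = [tag for _, tag in nltk.pos_tag(list(ngram))]
        if any(tag.startswith("NN") for tag in tags):   # drop noun-bearing n-grams
            continue
        transforms.append((" ".join(ngram), freq))
        if len(transforms) == top_k:
            break
    return transforms
```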

  14. Training Step 2: Generating Candidate Transforms (cont.) • We take the top topKphrases n-grams with the highest frequency counts and apply term weighting • Weight calculated as in Okapi BM25 (uses Robertson/Sparck Jones weights) • where r = number of relevant documents containing t, R = number of relevant documents, n = number of documents containing t, N = number of documents in collection • Estimate of selectivity/discrimination of a candidate transform with respect to a specific question type • Weighting extended to phrases
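The weight formula itself appeared as an image on the original slide; given the variables defined above, the standard Robertson/Sparck Jones relevance weight (a reconstruction, not copied from the slide) is:

```latex
w_t = \log \frac{(r + 0.5)\,/\,(R - r + 0.5)}{(n - r + 0.5)\,/\,(N - n - R + r + 0.5)}
```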

  15. Training Step 2: Sample Candidate Transforms • Final term selection weight: tw_t = qtf_t * w_t • where qtf_t = frequency of t in the relevant question type, w_t = term selectivity/discrimination weight, and tw_t = resulting candidate transform weight

  16. Training Step 3: Evaluate Candidate Transforms on a Target IR System • Search engines have different ranking methods and treat different queries in different ways (phrases, stop words, etc.) • Candidate transforms grouped into buckets according to length • Phrases of different length may be treated differently • Top n in each bucket evaluated on target IR system

  17. Training Step 3 (cont.) • For each question phrase and search engine: • For up to numExamples QA pairs matching the question phrase, sorted by answer length, test each candidate transform • e.g. for the question phrase "what is a", candidate transform "refers to", and question "what is a VPN", the rewritten query {VPN and "refers to"} is sent to each search engine • Similarity of the retrieved documents to the known answer is computed • Final weight for a transform is computed as the average similarity between known answers and documents retrieved, across all matching questions evaluated • Query syntax is adapted to each search engine: transforms are encoded as phrases, and the "NEAR" operator is used for AltaVista [Google reports including term proximity in its ranking]
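An illustrative sketch of the evaluation loop above; the search client, similarity function, and aggregation details are assumptions rather than the exact procedure:

```python
def evaluate_transform(transform, question_phrase, qa_pairs, search, similarity,
                       num_examples=10):
    """search(query) -> list of document texts (hypothetical engine client);
    similarity(answer, doc) -> float (e.g. the BM25-based score on the next slide).
    Returns the average similarity of retrieved documents to the known answers."""
    matching = [(q, a) for q, a in qa_pairs if q.lower().startswith(question_phrase)]
    scores = []
    for question, answer in sorted(matching, key=lambda p: len(p[1]))[:num_examples]:
        content = question.lower().rstrip("?")[len(question_phrase):].strip()
        query = f'{content} "{transform}"'   # engine-specific syntax (e.g. NEAR) in practice
        for doc in search(query):
            scores.append(similarity(answer, doc))
    return sum(scores) / len(scores) if scores else 0.0
```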

  18. Computing Similarity of Known Answers and Retrieved Documents • Consider subdocuments of length subdocLen within the retrieved documents, overlapping by subdocLen / 2 • Assumption that answers are localized • Find the maximum similarity of any subdocument with the known answer • docScore(D) = max_i (BM25_phrase(Answer, D_i)) • where t = term, Q = query, k1 = 1.2, k3 = 1000, K = k1 * ((1 - b) + b * dl / avdl), b = 0.5, dl is the document length in tokens, avdl is the average document length in tokens, w_t is the term relevance weight, tf_t is the frequency of term t in the document, qtf_t is the term frequency within the question phrase (query topic in original BM25), and terms include phrases
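Spelled out with the parameters defined above, the scoring function is (a reconstruction of the standard Okapi BM25 form that the slide showed as an image; D_i ranges over the overlapping subdocuments of D):

```latex
\mathrm{docScore}(D) = \max_i \; \mathrm{BM25}_{\mathrm{phrase}}(\mathrm{Answer},\, D_i)

\mathrm{BM25}_{\mathrm{phrase}}(Q,\, D_i) =
  \sum_{t \in Q} w_t \cdot
  \frac{(k_1 + 1)\, tf_t}{K + tf_t} \cdot
  \frac{(k_3 + 1)\, qtf_t}{k_3 + qtf_t}
```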

  19. Sample Transforms

  20. Evaluating Queries at Runtime • Search for matching question phrases, with preference for longer (more specific) phrases • Retrieve corresponding transforms and send transformed queries to search engine • Compute similarity of returned documents with respect to transformed query • If document retrieved by multiple transforms, use maximum similarity
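A condensed sketch of the runtime procedure (the helper signatures are assumptions): pick the longest matching question phrase, submit all of its learned transforms, and keep each document's best score across transforms:

```python
def answer_question(question, transforms_by_phrase, search, similarity, top_k=10):
    """transforms_by_phrase: question phrase -> transforms learned for this engine;
    search(query) -> list of documents; similarity(query, doc) -> float."""
    q = question.lower().rstrip("?")
    # Prefer the longest (most specific) matching question phrase.
    phrase = max((p for p in transforms_by_phrase if q.startswith(p)),
                 key=len, default=None)
    if phrase is None:
        return search(q)[:top_k]
    content = q[len(phrase):].strip()
    best = {}                                   # document -> best similarity score
    for transform in transforms_by_phrase[phrase]:
        query = f'{content} "{transform}"'
        for doc in search(query):
            best[doc] = max(similarity(query, doc), best.get(doc, 0.0))
    return sorted(best, key=best.get, reverse=True)[:top_k]
```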

  21. Sample Query

  22. Experimental Setup/Evaluation • Real questions from the query log of the Excite search engine from 12/20/99 • Evaluated the following four question types: Where, What, How, and Who • These are the four most common types of questions and account for over 90% of natural language questions to Excite • Random sample of 50 questions extracted for each question type • Potentially offensive queries removed • Checked that queries were not in the training set • None of the evaluation queries were used in any part of the training process • Results from each search engine retrieved in advance • Results shown to evaluators in random order • Evaluators do not know which engine produced the results • 89 questions evaluated

  23. Sample Questions Evaluated • Who was the first Japanese player in baseball? • Who was the original singer of fly me to the moon? • Where is the fastest passenger train located? • How do I replace a Chevy engine in my pickup? • How do I keep my refrigerator smelling good? • How do I get a day off school? • How do I improve my vocal range? • What are ways people can be motivated? • What is a sonic boom? • What are the advantages of being unicellular?

  24. Systems Evaluated • AskJeeves (AJ) – Search engine specializing in answering natural language questions • Returns different types of responses – we parse each different type • Google (GO) – The Google search engine as is • Tritus optimized for Google (TR-GO) • AltaVista (AV) – The AltaVista search engine as is • Tritus optimized for AltaVista (TR-AV)

  25. Best Performing System • Percentage of questions for which a system returns the most relevant documents at document cutoff K. • In case of ties, all tied engines are counted as best. • Results for the lowest-performing systems are not statistically significant (very small number of queries where they perform best).

  26. Average Precision • Average precision at document cutoff K.

  27. Precision by Question Type • Results indicate that the advantages of Tritus, and the best underlying search engine to use, vary by question type, but the amount of data limits strong conclusions. • Precision at K for What (a), How (b), Where (c), and Who (d) type questions.

  28. Document Overlap • Overlap of documents retrieved by transformed queries with the original system: top 150 (a), top 10 (b), and relevant documents in the top 10 (c).

  29. Future Research • Combining multiple transformations into a single query • Using multiple search engines simultaneously • Identifying and routing the questions to the best search engines for different question types • Identifying phrase transforms containing content words from the query • Dynamic query submission using results of initial transformations to guide subsequent transformations.

  30. Summary • Introduced a method for learning query transformations that improves the ability to retrieve documents containing answers to questions from an IR system • In our approach, we: • Automatically classify questions into different question types • Automatically generate candidate transforms from a training set of question/answer pairs • Automatically evaluate transforms on the target IR system(s) • Implemented and evaluated for web search engines • Blind evaluation on a set of real queries shows the method significantly outperforms the underlying search engines for common question types.

  31. Additional Information http://tritus.cs.columbia.edu/ Contact the authors: • http://www.cs.columbia.edu/~eugene/ • http://www.neci.nj.nec.com/homepages/lawrence/ • http://www.cs.columbia.edu/~gravano/

  32. Assumption • For some common types of natural language questions (e.g., “What is”, “Who is”, etc…) there exist common ways of expressing answers to the question.
