
Day 14

Day 14. Information Retrieval Question Answering. TREC. TREC – Text REtrieval Conference. Administered by the National Institute of Standards and Technology (NIST). Annual competition held since 1992.

neola

Presentation Transcript


  1. Day 14 Information Retrieval Question Answering

  2. TREC • TREC – Text REtrieval Conference • Administered by the National Institute of Standards and Technology (NIST) • Annual competition held since 1992. • First conference included leading text retrieval groups at UMass, City University London, Cornell, and a smattering of industry groups.

  3. The Aquaint Corpus • The corpus used by TREC from which answers are drawn. • LDC2002T31 (on patas) • Newswire from three sources: • Xinhua News Service (People's Republic of China) • New York Times News Service • Associated Press Worldstream News Service • Not current: years 1996-2000 for Xinhua, 1998-2000 for NYT and AP • For TREC competition: assumed current

  4. TREC QA track • Three types of questions in TREC QA track: • Factoid • List • Other • All clustered into topics

  5. TREC Question File

  6. Question Answering (QA) • Uses IR and IE techniques (and more…) • Questions posed in Natural Language • Who was Genghis Khan? • What songs did Barry Manilow compose? • What countries fly the F-16? • When was James Dean born? • What does Park Jae-sang sing? • Answers retrieved from a collection of documents (or a database, or the Web)

  7. Park Jae-sang

  8. Park Jae-sang

  9. Designing a QA System • Start with a question: • Who won the Nobel Peace Prize in 1991? • Assume you have a search engine API at your disposal • Need to return the answer: • Aung San Suu Kyi • Aung San Suu Kyi won the Nobel Peace Prize in 1991 • What do you do?

  10. Designing a QA System • Assume: • The search engine returns • Snippets, and, • Documents

  11. Designing a QA System

  12. Designing a QA System • Assume: • The search engine returns • Snippets, and, • Documents • Documents • Are in English • Contain passages of interest • Not all documents will have the answer

  13. Designing a QA System Who won the Nobel Peace Prize in 1991? But many foreign investors remain sceptical, and western governments are withholding aid because of the Slorc's dismal human rights record and the continued detention of Ms Aung San Suu Kyi, the opposition leader who won the Nobel Peace Prize in 1991. The military junta took power in 1988 as pro-democracy demonstrations were sweeping the country. It held elections in 1990, but has ignored their result. It has kept the 1991 Nobel peace prize winner, Aung San Suu Kyi - leader of the opposition party which won a landslide victory in the poll - under house arrest since July 1989. The regime, which is also engaged in a battle with insurgents near its eastern border with Thailand, ignored a 1990 election victory by an opposition party and is detaining its leader, Ms Aung San Suu Kyi, who was awarded the 1991 Nobel Peace Prize. According to the British Red Cross, 5,000 or more refugees, mainly the elderly and women and children, are crossing into Bangladesh each day.

  14. Designing a QA System • Assume: • The search engine returns • Snippets, and, • Documents • Documents • Are in English • Contain passages of interest • Not all documents will have the answer • The sky’s the limit wrt tools, resources, time, etc.

  15. Designing a QA System Who won the Nobel Peace Prize in 1991?

  16. Designing a QA System Who won the Nobel Peace Prize in 1991? But many foreign investors remain sceptical, and western governments are withholding aid because of the Slorc's dismal human rights record and the continued detention of Ms Aung San Suu Kyi, the opposition leader who won the Nobel Peace Prize in 1991. The military junta took power in 1988 as pro-democracy demonstrations were sweeping the country. It held elections in 1990, but has ignored their result. It has kept the 1991 Nobel peace prize winner, Aung San Suu Kyi - leader of the opposition party which won a landslide victory in the poll - under house arrest since July 1989. The regime, which is also engaged in a battle with insurgents near its eastern border with Thailand, ignored a 1990 election victory by an opposition party and is detaining its leader, Ms Aung San Suu Kyi, who was awarded the 1991 Nobel Peace Prize. According to the British Red Cross, 5,000 or more refugees, mainly the elderly and women and children, are crossing into Bangladesh each day.

  17. An Example Who won the Nobel Peace Prize in 1991? But many foreign investors remain sceptical, and western governments are withholding aid because of the Slorc's dismal human rights record and the continued detention of Ms Aung San Suu Kyi, the opposition leader who won the Nobel Peace Prize in 1991. The military junta took power in 1988 as pro-democracy demonstrations were sweeping the country. It held elections in 1990, but has ignored their result. It has kept the 1991 Nobel peace prize winner, Aung San Suu Kyi - leader of the opposition party which won a landslide victory in the poll - under house arrest since July 1989. The regime, which is also engaged in a battle with insurgents near its eastern border with Thailand, ignored a 1990 election victory by an opposition party and is detaining its leader, Ms Aung San Suu Kyi, who was awarded the 1991 Nobel Peace Prize. According to the British Red Cross, 5,000 or more refugees, mainly the elderly and women and children, are crossing into Bangladesh each day.

  18. Question Answering (QA) • For a QA system to work, we need to • Find documents that may contain the answer • Form search engine query from original question • Find passages within the documents that may contain the answer • What is “type” of answer? • Determine what kind of answer is expected (query classification) • Extract the answer from the relevant passage(s) • Repeated occurrences may reinforce • Return the answer

  19. A Generic QA Framework • Passage extractor needed too

  20. The UWCLMAQA System

  21. Steps for the UWCLMAQA System • Query Analysis • Query Processing (some additional steps) • Document Selection • Passage Extraction & Ranking • Answer Extraction • “Unit” evaluation done at each step

  22. Query Analysis • Grouped questions into types • Purpose: Determine what the answer will look like • Categorized by enhanced UIUC scheme (UIUC: http://l2r.cs.uiuc.edu/~cogcomp/Data/QA/QC/definition.html): • Abbreviation • Description • Entity • Human • Location – Country, State, City • Numeric – Date, Measure
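
To make the idea of question typing concrete, here is a minimal rule-based sketch. It is a stand-in for illustration only: the slides indicate that a UIUC-style classifier was used, not hand-written patterns, and the rule list, the label strings, and the `classify_question` name are all assumptions rather than part of the described system.

```python
import re

# Hypothetical pattern -> label rules, loosely following the coarse
# UIUC categories listed on the slide. Order matters: first match wins.
RULES = [
    (r"^who\b", "HUMAN"),
    (r"^where\b", "LOCATION"),
    (r"^when\b", "NUMERIC:date"),
    (r"^how (many|much|long|far|old)\b", "NUMERIC:measure"),
    (r"^what (country|countries)\b", "LOCATION:country"),
    (r"^what (city|cities)\b", "LOCATION:city"),
    (r"\bstand for\b", "ABBREVIATION"),
]


def classify_question(question, default="ENTITY"):
    """Return a coarse expected-answer type for a natural-language question."""
    q = question.strip().lower()
    for pattern, label in RULES:
        if re.search(pattern, q):
            return label
    # Anything the rules do not cover falls back to a generic entity type.
    return default


for q in ["When was James Dean born?", "What countries fly the F-16?"]:
    print(q, "->", classify_question(q))
```

A learned classifier would replace the rule list, but the output plays the same role: it tells the answer extractor what kind of noun phrase to look for.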

  23. Alternative Strategy: Query Analysis and Rewrite • Intuition: The user’s question is often syntactically quite close to sentences that contain the answer • Where is the Louvre Museum located? • The Louvre Museum is located in Paris • Who created the character of Scrooge? • Charles Dickens created the character of Scrooge.

  24. Alternative Strategy: Query Analysis and Rewrite • Hand-craft category-specific transformation rules e.g.: “Where is the Louvre Museum located?” → “is the Louvre Museum located” → “the is Louvre Museum located” → “the Louvre is Museum located” → “the Louvre Museum is located” → “the Louvre Museum located is” • Search for all permutations
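
The Louvre example can be reproduced with a short sketch that drops the wh-word and moves the following verb through every position. This is one possible reading of the transformation rule, not the original hand-crafted rule set; the function name `rewrite_question` and the `wh_words` tuple are assumptions introduced for illustration.

```python
def rewrite_question(question, wh_words=("where", "who", "what", "when")):
    """Generate candidate answer-sentence prefixes for a wh-question.

    Assumes the question has the shape "<wh-word> <verb> <rest...>?".
    The wh-word is dropped and the verb is inserted at each position
    in the remaining tokens.
    """
    tokens = question.strip().rstrip("?").split()
    if len(tokens) < 2 or tokens[0].lower() not in wh_words:
        return []
    verb, others = tokens[1], tokens[2:]
    rewrites = []
    for i in range(len(others) + 1):
        # Insert the verb before position i of the remaining tokens.
        rewrites.append(" ".join(others[:i] + [verb] + others[i:]))
    return rewrites


for phrase in rewrite_question("Where is the Louvre Museum located?"):
    print(f'"{phrase}"')
```

Most of the generated strings are ungrammatical, which is acceptable here: each one is issued as a phrase query, and only the well-formed ones are likely to match text.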

  25. Query Processing • Basic process: • Extracted question • Appended topic • “Web boosted” query • Threw against Lucene

  26. Query Processing • Web boosting strategy • Supplied question and topic to Google API • Results were • Stop-worded, query terms removed • Ranked by frequency • 5 most frequent terms added to Lucene query
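
The web boosting steps above can be sketched as follows. The snippets are passed in as plain strings, so no search API call is shown; the stop-word list, the tokenizer, and the name `boost_terms` are assumptions made for this example and not taken from the original system.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "was", "to",
              "who", "what", "for", "by"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boost_terms(query, snippets, k=5):
    """Return the k most frequent snippet terms that are neither
    stop words nor already present in the query."""
    query_terms = set(tokenize(query))
    counts = Counter()
    for snippet in snippets:
        for tok in tokenize(snippet):
            if tok not in STOP_WORDS and tok not in query_terms:
                counts[tok] += 1
    return [term for term, _ in counts.most_common(k)]


snippets = [
    "Aung San Suu Kyi won the Nobel Peace Prize in 1991",
    "The 1991 Nobel Peace Prize was awarded to Aung San Suu Kyi",
    "Suu Kyi, Nobel laureate",
]
print(boost_terms("Who won the Nobel Peace Prize in 1991?", snippets))
```

The returned terms would then be appended to the query sent to the local index, pulling in documents that mention likely answer words.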

  27. Document Selection • Lucene returned top 1,000 documents • Took top 3 for Factoid, Top 25 for List • (Hook for reranking provided, but not implemented.) • Our doc retrieval performance for 2005 Qs: • F-measure - .3517 n=3, .3620 n=1 • Mean 2005: .2958 • Max (LCC) 2005: .7920

  28. Passage Extraction & Ranking • From top documents, extracted relevant paragraphs • Paragraphs ranked by tf/idf: • tf = 1+log(word frequency in paragraph) • idf = log(total doc count/# docs containing word) • total doc count = # docs by day by news source • tf/idf score normalized by paragraph length

  29. Passage Ranking • tf/idf multiplied by count of query terms in paragraph (giving them more weight) • 10 paragraphs returned for factoids • 45 paragraphs returned for lists
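
The paragraph scoring described on the last two slides can be sketched as below. Several details are assumptions: the sum runs over query terms found in the paragraph, logarithms are natural, "count of query terms" is read as the number of distinct query terms matched, and `doc_freq` and `total_docs` are hypothetical inputs standing in for the per-day, per-source document statistics.

```python
import math
from collections import Counter


def score_paragraph(paragraph_tokens, query_terms, doc_freq, total_docs):
    """Length-normalized tf/idf score, boosted by the number of
    distinct query terms that occur in the paragraph."""
    if not paragraph_tokens:
        return 0.0
    counts = Counter(paragraph_tokens)
    score = 0.0
    matched = 0
    for term in set(query_terms):
        freq = counts.get(term, 0)
        df = doc_freq.get(term, 0)
        if freq == 0 or df == 0:
            continue  # term absent from paragraph or from the statistics
        tf = 1 + math.log(freq)           # tf = 1 + log(word frequency)
        idf = math.log(total_docs / df)   # idf = log(N / docs with word)
        score += tf * idf
        matched += 1
    # Normalize by paragraph length, then weight by matched query terms.
    return matched * score / len(paragraph_tokens)


doc_freq = {"nobel": 10, "prize": 20, "1991": 50}
query = ["nobel", "prize", "1991"]
good = "the 1991 nobel peace prize winner".split()
weak = "refugees are crossing into bangladesh since 1991".split()
print(score_paragraph(good, query, doc_freq, 100))
print(score_paragraph(weak, query, doc_freq, 100))
```

Paragraphs would be sorted by this score and the top ones passed on to answer extraction.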

  30. Passage Extraction & Ranking Who won the Nobel Peace Prize in 1991? But many foreign investors remain sceptical, and western governments are withholding aid because of the Slorc's dismal human rights record and the continued detention of Ms Aung San Suu Kyi, the opposition leader who won the Nobel Peace Prize in 1991. The military junta took power in 1988 as pro-democracy demonstrations were sweeping the country. It held elections in 1990, but has ignored their result. It has kept the 1991 Nobel peace prize winner, Aung San Suu Kyi - leader of the opposition party which won a landslide victory in the poll - under house arrest since July 1989. The regime, which is also engaged in a battle with insurgents near its eastern border with Thailand, ignored a 1990 election victory by an opposition party and is detaining its leader, Ms Aung San Suu Kyi, who was awarded the 1991 Nobel Peace Prize. According to the British Red Cross, 5,000 or more refugees, mainly the elderly and women and children, are crossing into Bangladesh each day.

  31. Answer Extraction • Most factoids need NP answer (e.g., most are NEs, such as countries, cities, dates, people’s names, company names, …) • All NPs considered as possible answers • For passages • Used Lingua::Sentence to find sentences (sentence breaking) • POS tagged (Stanford POS Tagger) • Chunked using the fnTBLChunker (ID NP-chunks) • Prior query classification used to identify kind of NP answer expected • Some other heuristics (e.g., most likely place NP would occur)

  32. An Example Who won the Nobel Peace Prize in 1991? But many foreign investors remain sceptical, and western governments are withholding aid because of the Slorc's dismal human rights record and the continued detention of Ms Aung San Suu Kyi, the opposition leader who won the Nobel Peace Prize in 1991. The military junta took power in 1988 as pro-democracy demonstrations were sweeping the country. It held elections in 1990, but has ignored their result. It has kept the 1991 Nobel peace prize winner, Aung San Suu Kyi - leader of the opposition party which won a landslide victory in the poll - under house arrest since July 1989. The regime, which is also engaged in a battle with insurgents near its eastern border with Thailand, ignored a 1990 election victory by an opposition party and is detaining its leader, Ms Aung San Suu Kyi, who was awarded the 1991 Nobel Peace Prize. According to the British Red Cross, 5,000 or more refugees, mainly the elderly and women and children, are crossing into Bangladesh each day.

  33. Answer Extraction • For lists: • Question topic appeared the most important • Heavily weighted topic terms for Lucene • Similar process to Factoids (tagging, chunking) for finding answers • Cut-off determined by 2005 data

  34. Answer Extraction • For others: • Anything left over that might be answer bearing • Top 15 returned

  35. How’d we do? • Before answering the question: • Mean Reciprocal Rank (MRR)

  36. Mean Reciprocal Rank (MRR) • Assumes: test set of questions with human-labeled answers • Assumes: system returns short ranked list of answers or passages with answers • Each question scored with the reciprocal of the rank of its first correct answer (0 if no returned answer is correct) • MRR is the mean of these scores over the N questions: MRR = (1/N) Σ 1/rank_i
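
As a worked illustration of the metric, the sketch below computes MRR using the standard definition: each question contributes the reciprocal of the rank of its first correct answer, or 0 if none of the returned answers is correct. The function name and the exact-match test against a set of gold answers are assumptions; real evaluations typically rely on human judgments or answer patterns.

```python
def mean_reciprocal_rank(ranked_answers, gold_answers):
    """ranked_answers: one ranked list of answer strings per question.
    gold_answers: one set of acceptable answers per question."""
    if not ranked_answers:
        return 0.0
    total = 0.0
    for answers, gold in zip(ranked_answers, gold_answers):
        for rank, answer in enumerate(answers, start=1):
            if answer in gold:
                total += 1.0 / rank
                break  # only the first correct answer counts
    return total / len(ranked_answers)


ranked = [
    ["Aung San Suu Kyi", "Nelson Mandela"],  # correct at rank 1
    ["1930", "1955", "1931"],                # correct at rank 3
    ["Paris"],                               # no correct answer
]
gold = [{"Aung San Suu Kyi"}, {"1931"}, {"London"}]
print(mean_reciprocal_rank(ranked, gold))
```

In this example the three questions contribute 1, 1/3, and 0, so the mean is 4/9.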

  37. How’d we do? • Factoid • UWCLMAQA: .112 and .109 • Median: .186, Best: .578, Worst: .040 • List • UWCLMAQA: .051 and .046 • Median: .087, Best: .433, Worst: .000 • Other • UWCLMAQA: .164 and .153 • Median: .125, Best: .250, Worst: .000

  38. Full List of Tools Used • SGML::Parser::OpenSP http://search.cpan.org/~bjoern/SGML-Parser-OpenSP-0.98/ • OpenSP http://openjade.sourceforge.net/ • UIUC Question Classification http://l2r.cs.uiuc.edu/~cogcomp/Data/QA/QC/ • Lucene http://lucene.apache.org/java/docs/index.html • SAX (Simple API for XML) http://www.saxproject.org/ • Maxent Toolkit http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html • PyGoogle http://pygoogle.sourceforge.net/ • SOAPy http://soapy.sourceforge.net/ • Google API http://www.google.com/apis/ • Lingua::Stem http://search.cpan.org/~snowhare/Lingua-Stem-0.82/lib/Lingua/Stem/En.pm • Lingua::Sentence http://search.cpan.org/~shlomoy/Lingua-EN-Sentence-0.25/lib/Lingua/EN/Sentence.pm • Stanford POS Tagger http://nlp.stanford.edu/software/tagger.shtml • fnTBLChunker http://nlp.cs.jhu.edu/~rflorian/fntbl/ • Lingpipe http://www.alias-i.com/lingpipe/ • LevenshteinXS.pm http://search.cpan.org/~jgoldberg/Text-LevenshteinXS-0.03/
