
Open-Domain Question Answering




  1. Open-Domain Question Answering
  Roxana Girju
  Linguistics Department, Computer Science Department (Affiliate Prof.)
  University of Illinois at Urbana-Champaign

  2. A little bit of history
  • Machine Translation (Cold War): the need for systems capable of automatically translating Russian into English
  • Late 1960s: some encouraging preliminary results
    • Simple speech recognition programs
    • A system that reads newswire articles about natural disasters
    • And so forth
  • Today, a lot of progress:
    • Speech recognition products
    • More efficient NLP systems
    • The beginning of widespread commercial use of natural language technology

  3. What is NLP?
  • An area concerned with building computer systems that perform various natural language tasks:
    • NL interfaces to databases
    • Reading/writing a book
    • Writing a letter
    • Holding a conversation
    • Machine Translation
    • Searching for useful and hidden information
  • A very broad range of applications
  • Plays an increasing role in taming the flood of information on the Internet

  4. What can NLP do today?
  "Bush/NNP/_president seeks/VB support/NN on/Pp Iraq/NNP War/NNP" (Fox News, Aug. 30, 2002)
  • Reliable surface-level preprocessing (POS tagging, shallow syntactic parsing (English), NE extraction, etc.): ~90%
  • Summarization: ~60% for extracts (DUC competitions)
  • QA: ~60-70% for factoids (TREC competitions)

  5. So, why is NLP so hard?
  • Ambiguity:
    • World knowledge
    • Cultural differences
    • Context
  • NLP has a multidisciplinary nature:
    • Linguistics
    • Psycholinguistics / cognitive psychology
    • Computational linguistics
    • Philosophy
    • Artificial intelligence
    • NL Engineering

  6. Question Answering: An NLP Application
  • Provide a (short) answer to the user's natural language question by searching massive collections of text documents

  7. What do people want?
  • To get concise answers, not a list of documents
  • Examples from the Excite log (1999):
    • How tall is the Sears Tower?
    • Where does chocolate come from?
    • What is the best Japanese restaurant in Chicago?

  8. AskJeeves
  • Manually stored question-answer pairs, selected based on frequency
  • Mainly pattern matching to match the user's question against the QA knowledge base
  • If no answer is in the knowledge base, fall back to pure web search techniques
  • Initially a very interesting idea, but pretty weak, as almost no NLP techniques were applied

  9. Question Answering
  • Not a new research topic, but open-domain QA is: TREC QA competitions (since 1999)
  • Examples from TREC:
    1. What was the monetary value of the Nobel Peace Prize in 1989?
    2. What does the Peugeot company manufacture?
    3. Why did David Koresh ask the FBI for a word processor?
    4. What is the name of the rare neurological disease with symptoms such as: involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?
    5. How did Socrates die?

  10. TREC Question Answering Competitions
  • QA systems have to answer a set of factoid questions (~500)
  • In the first three years, systems were asked to return 5 ranked text passages (50/250 bytes) for each question:
    • Mean Reciprocal Rank scoring: 1, 0.5, 0.33, 0.25, 0.2, 0 (for 1st, 2nd, 3rd, 4th, 5th, rest)
  • Starting in 2002, a single exact answer was required, based on the notion of confidence

  11. TREC QA – Answer Ranking
  Q: When did Lincoln die?
  A:
  • During the civil war
  • In the spring time
  • at a theatre
  • April 15, 1865 *** (correct)
  • In April, 1965
  • MRR = 1/4 = 0.25 (the correct answer is ranked 4th)
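The reciprocal-rank scoring above can be sketched in a few lines of Python (a minimal illustration; the function names are mine, not from any TREC tooling):

```python
def reciprocal_rank(ranked_answers, is_correct, max_rank=5):
    """Return 1/rank of the first correct answer in the top max_rank, else 0."""
    for rank, answer in enumerate(ranked_answers[:max_rank], start=1):
        if is_correct(answer):
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(all_rankings, is_correct):
    """Average the reciprocal rank over all questions."""
    return sum(reciprocal_rank(r, is_correct) for r in all_rankings) / len(all_rankings)

# The Lincoln example: the correct answer ("April 15, 1865") is ranked 4th.
answers = ["During the civil war", "In the spring time", "at a theatre",
           "April 15, 1865", "In April, 1965"]
print(reciprocal_rank(answers, lambda a: a == "April 15, 1865"))  # → 0.25
```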

  12. The TREC QA Text Collection
  • 3 GB of text
  • News articles from various sources:
    • New York Times (1998-2000)
    • AP newswire (1998-2000)
    • Xinhua News Agency newswire (1998-2000)
  • Requires Information Retrieval techniques before applying NLP
  • Other information sources can also be used

  13. State-of-the-art in Open-domain QA
  • The best performing systems answer ~70% of the questions
  • Various approaches:
    • Knowledge-rich (Texas system): http://www.languagecomputer.com
    • Pattern-matching (MIT): http://start.csail.mit.edu
    • Statistical QA (AnswerBus): http://www.answerbus.com

  14. General Approach
  • Question analysis and parsing
  • Query formulation for an IR system (search engine)
  • Retrieval of ranked documents
  • Retrieval of passages
  • Application of NLP techniques
  • Snippet ranking based on NLP processing

  15. The Architecture of a Generic QA System
  question → Question Processing → query → Document Retrieval → Passage Retrieval → Answer Extraction → answers
  The examples in the next slides are from LCC's QA system.

  16. Question Processing
  • Captures the semantics of the question
  • Tasks:
    • Determine the question type
    • Determine the answer type
    • Extract keywords from the question and formulate a query

  17. Question and Potential Answer Types
  • What is the question looking for? Who, where, when, how many, etc.
  • For factoid questions, the answers usually fall into a somewhat predictable set of categories:
    • Who questions ask for PERSON/ORGANIZATION:
      • Who invented the telegraph?
      • Who sells the most hybrid cars?
    • Where questions ask for LOCATION
  • Generally, systems rely on a set of Named Entities that are relatively easy to extract

  18. Question and Potential Answer Types
  • However, the answer type is not always obvious:
    • Which president went to war with Mexico?
    • What type of car is most reliable?
    • How tall is Mt. Everest?

  19. Taxonomy of Potential Answer Types
  • Contains ~9000 concepts reflecting expected answer types
  • Merges named entities with the WordNet hierarchy
  (LCC's taxonomy)

  20. Potential Answer Type Detection
  • Most systems use a look-up table (simple manual patterns) and/or supervised machine learning to determine the right answer type for a question
  • The answer type is very important later on for answer matching
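A look-up table of manual patterns can be as simple as the following toy sketch (the rules and type labels are illustrative, not LCC's actual taxonomy):

```python
import re

# Question-pattern → expected answer type, checked in order.
ANSWER_TYPE_RULES = [
    (re.compile(r"^who\b", re.I), "PERSON"),
    (re.compile(r"^where\b", re.I), "LOCATION"),
    (re.compile(r"^when\b", re.I), "DATE"),
    (re.compile(r"^how (many|much)\b", re.I), "QUANTITY"),
    (re.compile(r"^how tall\b", re.I), "DIMENSION"),
]

def answer_type(question):
    for pattern, atype in ANSWER_TYPE_RULES:
        if pattern.search(question):
            return atype
    return "UNKNOWN"   # ambiguous questions need richer analysis

print(answer_type("Who invented the telegraph?"))   # → PERSON
print(answer_type("How tall is Mt. Everest?"))      # → DIMENSION
```

Questions like "Which president went to war with Mexico?" fall through to UNKNOWN, which is exactly why real systems back the table up with machine learning over richer question analysis.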

  21. Keyword Selection
  • A list of keywords from the question helps in finding relevant texts
  • Some systems expand the keywords with lexical/semantic alternations for better matching:
    • inventor -> invent
    • have been sold -> sell
    • dog -> animal

  22. Keyword Selection Examples:

  23. Keyword Selection
  Some QA systems prioritize words by importance:
  1. non-stopwords in quotations
  2. all NNP words in recognized NEs
  3. all complex nominals (plus adjectives)
  4. all other nouns
  5. all verbs (regardless of tense)
  6. the potential answer type
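A priority ordering like this can be sketched as follows. Tokens arrive already annotated with a POS tag and an in-quotes flag; in a real system these come from a tagger and an NE recognizer, here they are hand-supplied for illustration (the stopword list and the collapsed set of priority classes are mine):

```python
STOPWORDS = {"the", "a", "an", "of", "in", "is", "does", "did", "what", "who"}

def keyword_priority(word, pos, in_quotes):
    """Lower number = higher priority, following the heuristics above."""
    if in_quotes and word.lower() not in STOPWORDS:
        return 0                      # non-stopwords in quotations
    if pos == "NNP":
        return 1                      # proper nouns (named entities)
    if pos.startswith("NN"):
        return 2                      # other nouns
    if pos.startswith("VB"):
        return 3                      # verbs
    return 4                          # everything else: ignore

def select_keywords(tagged_tokens):
    kept = [(w, p, q) for (w, p, q) in tagged_tokens
            if keyword_priority(w, p, q) < 4 and w.lower() not in STOPWORDS]
    kept.sort(key=lambda t: keyword_priority(*t))   # stable: keeps text order within a class
    return [w for w, _, _ in kept]

# "What does the Peugeot company manufacture?"
tokens = [("What", "WP", False), ("does", "VBZ", False), ("the", "DT", False),
          ("Peugeot", "NNP", False), ("company", "NN", False),
          ("manufacture", "VB", False)]
print(select_keywords(tokens))   # → ['Peugeot', 'company', 'manufacture']
```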

  24. The Architecture of a Generic QA System
  question → Question Processing → query → Document Retrieval → Passage Retrieval → Answer Extraction → answers
  The examples in the next slides are from LCC's QA system.

  25. Passage Retrieval
  • Extracts passages that contain all selected keywords
  • Passage quality is assessed in a feedback loop:
    • In the first iteration, use the first 6 keyword selection heuristics
    • If the number of passages < a threshold → the query is too strict → drop a keyword
    • If the number of passages > a threshold → the query is too relaxed → add a keyword
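The relax/tighten loop can be sketched like this. Here retrieve() is a stand-in for a real search engine call, and a "passage" is simply a document containing all the active keywords (both simplifications are mine):

```python
def retrieve(keywords, documents):
    """Toy retrieval: documents that contain ALL keywords (case-insensitive)."""
    return [d for d in documents
            if all(k.lower() in d.lower() for k in keywords)]

def passage_retrieval(keywords, documents, min_hits=1, max_hits=10):
    """keywords must be ordered from most to least important."""
    active = list(keywords)
    dropped = []
    passages = retrieve(active, documents)
    for _ in range(2 * len(keywords)):     # guard against oscillation
        if len(passages) < min_hits and len(active) > 1:
            dropped.append(active.pop())   # too strict: drop a keyword
        elif len(passages) > max_hits and dropped:
            active.append(dropped.pop())   # too relaxed: re-add one
        else:
            break
        passages = retrieve(active, documents)
    return passages

docs = ["Christa McAuliffe was the first private citizen to fly in space.",
        "The shuttle flew in space twice that year."]
# "teacher" matches nothing, so the loop drops it and then succeeds.
print(passage_retrieval(["space", "fly", "citizen", "teacher"], docs))
```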

  26. Ranking of Passages
  • Use keyword windows
  • E.g.: Q has keywords {k1, k2, k3, k4}; in a passage, k1 and k2 are each matched twice, k3 is matched once, and k4 is not matched. Over the span "k1 k2 k3 k2 k1", one window is constructed for each combination of occurrences of the repeated keywords, giving four windows (2 occurrences of k1 x 2 occurrences of k2).

  27. Passage Scoring
  • A function of:
    • The number of question keywords recognized in the same sequence in the window
    • The number of words that separate the most distant keywords in the window
    • The number of unmatched keywords in the window
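A window-scoring function combining the three factors might look like the sketch below. The weights are illustrative, not LCC's actual values: in-order matches are rewarded, wide spans and unmatched keywords are penalized.

```python
def score_window(window_tokens, question_keywords):
    # First position of each distinct token in the window.
    positions = {}
    for i, tok in enumerate(window_tokens):
        positions.setdefault(tok.lower(), []).append(i)
    matched = [k for k in question_keywords if k.lower() in positions]
    unmatched = len(question_keywords) - len(matched)
    first_pos = [positions[k.lower()][0] for k in matched]
    # Keywords appearing in the same order as in the question.
    in_order = sum(1 for a, b in zip(first_pos, first_pos[1:]) if a < b)
    # Distance separating the most distant matched keywords.
    span = (max(first_pos) - min(first_pos)) if first_pos else 0
    return 2.0 * in_order - 0.5 * span - 1.0 * unmatched

window = "Lincoln died on April 15 1865 at a theatre".split()
print(score_window(window, ["Lincoln", "died", "April"]))  # → 2.5
```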

  28. The Architecture of a Generic QA System
  question → Question Processing → query → Document Retrieval → Passage Retrieval → Answer Extraction → answers
  The examples in the next slides are from LCC's QA system.

  29. Answer Extraction
  Q: Name the first private citizen to fly in space.
  • Answer type: PERSON
  • Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in 'Raiders of the Lost Ark', plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."

  30. Ranking Candidate Answers
  Q: Name the first private citizen to fly in space.
  • Answer type: PERSON
  • Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in 'Raiders of the Lost Ark', plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."
  • Best candidate answer: Christa McAuliffe

  31. Answer Ranking Features
  • Number of question terms matched in the answer passage
  • Number of question terms matched in the same phrase as the candidate answer
  • Number of question terms matched in the same sentence as the candidate answer
  • Number of question terms matched, separated from the candidate answer by at most three words and one comma
  • Number of terms occurring in the same order in the answer passage as in the question
  • Average distance from the candidate answer to the question term matches
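Two of these features can be sketched over tokenized text as follows (a simplification I chose for illustration; real systems compute the phrase- and sentence-level features over parsed text):

```python
def ranking_features(passage_tokens, candidate, question_terms):
    """Compute term-match count and average distance to the candidate."""
    toks = [t.lower() for t in passage_tokens]
    matched = [q.lower() for q in question_terms if q.lower() in toks]
    cand_pos = toks.index(candidate.lower())
    # Distance (in tokens) from each matched term to the candidate.
    distances = [abs(toks.index(q) - cand_pos) for q in matched]
    return {
        "n_terms_matched": len(matched),
        "avg_distance_to_candidate":
            sum(distances) / len(distances) if distances else None,
    }

passage = "Christa McAuliffe , the first private citizen to fly in space".split()
feats = ranking_features(passage, "McAuliffe",
                         ["first", "private", "citizen", "fly", "space"])
print(feats)
```

A candidate that sits close to many matched question terms, as McAuliffe does here, scores well on both features.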

  32. Evaluation
  • In QA the most frequent metric is Mean Reciprocal Rank (MRR):
    • You're allowed to return N answers
    • Your score is based on 1/rank of the first right answer
    • Averaged over all the questions you answer

  33. LCC/UTD/SMU

  34. Limitations
  • Where do lobsters like to live?
    • on a Canadian airline
  • Where are zebras most likely found?
    • near dumps
    • in the dictionary
  • Why can't ostriches fly?
    • Because of American economic sanctions
  • What's the population of Mexico?
    • Three
  • What can trigger an allergic reaction?
    • ...something that can trigger an allergic reaction

  35. A Broad Classification of Questions
  • Yes/No questions
    • The least commonly occurring type
  • Factoid questions
    • Who? What? Where? When?
    • Easy to evaluate
    • Usually based on named entities
  • List questions
    • A list of partial answers (usually facts)

  36. Answer Fusion
  • Knowledge-intensive applications
  • Q: What software products does Microsoft sell? (Girju 2001)

  37. A Broad Classification of Questions
  • Definition questions
    • Harder to evaluate
    • Q: Who is Aaron Copland?
    • A: American composer or civil rights advocate
  • Complex questions: Why? How? etc.
    • The answer can be expressed in several ways
    • The answer may be scattered among several paragraphs/documents
    • The hardest type to process/analyze

  38. Complex Question Answering
  Q: "Why do robins sing in Spring?" (Girju 2002)
  • A1: Causation (What is the cause?) ".. increases in day length trigger hormonal action."
  • A2: Development (How does it develop?) ".. they have learned the songs from their fathers and neighbors."
  • A3: Origin (How did it evolve?) "The song evolved as a means of communication early in the avian lineage."
  • A4: Function/Purpose (What is the function?) "Robins sing in spring to attract mates."

  39. Conclusion
  • Question answering is an exciting research area!
  • It sits somewhere between Information Retrieval and Natural Language Processing
  • A real-world application of NLP technologies
  • Open-domain QA is possible, but there is a lot of room for improvement

  40. Thank you!
