
Open-Domain Question Answering




  1. Open-Domain Question Answering
  Roxana Girju
  Linguistics Department, Computer Science Department (Affiliate Prof.)
  University of Illinois at Urbana-Champaign

  2. A little bit of history
  • Machine Translation (Cold War): the need for systems capable of automatically translating Russian into English
  • Late 1960s: some encouraging preliminary results
    • Simple speech recognition programs
    • A system that reads newswire articles about natural disasters
    • And so forth
  • Today, a lot of progress:
    • Speech recognition products
    • More efficient NLP systems
    • The beginning of widespread commercial use of natural language technology

  3. What is NLP?
  • An area concerned with building computer systems that perform various natural language tasks:
    • NL interfaces to databases
    • Reading/writing a book
    • Writing a letter
    • Holding a conversation
    • Machine Translation
    • Searching for useful and hidden information
  • A very broad range of applications
  • Plays an increasing role in taming the flood of information on the Internet

  4. What can NLP do today?
  "Bush/NNP/_president seeks/VB support/NN on/Pp Iraq/NNP War/NNP" (Fox News, Aug. 30, 2002)
  • Reliable surface-level preprocessing (POS tagging, shallow syntactic parsing (English), NE extraction, etc.): ~90%
  • Summarization: ~60% for extracts (DUC competitions)
  • QA: ~60-70% for factoids (TREC competitions)

  5. So, why is NLP so hard?
  • Ambiguity:
    • World knowledge
    • Cultural differences
    • Context
  • NLP has a multidisciplinary nature:
    • Linguistics
    • Psycholinguistics / cognitive psychology
    • Computational linguistics
    • Philosophy
    • Artificial intelligence
    • NL Engineering

  6. Question Answering: An NLP Application
  • Provide a (short) answer to the user's natural language question by searching massive collections of text documents

  7. What do people want?
  • To get concise answers, not a list of documents
  • Examples from the Excite log (1999):
    • How tall is the Sears Tower?
    • Where does chocolate come from?
    • What is the best Japanese restaurant in Chicago?

  8. AskJeeves
  • Manually stored question-answer pairs, selected based on frequency
  • Mainly pattern matching to match the user's question against the QA knowledge base
  • If no answer is in the knowledge base, fall back to pure web search techniques
  • Initially a very interesting idea, but pretty weak, as almost no NLP techniques were applied

  9. Question Answering
  • Not a new research topic, but open-domain QA is: TREC QA competitions (since 1999)
  • Examples from TREC:
    1. What was the monetary value of the Nobel Peace Prize in 1989?
    2. What does the Peugeot company manufacture?
    3. Why did David Koresh ask the FBI for a word processor?
    4. What is the name of the rare neurological disease with symptoms such as: involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?
    5. How did Socrates die?

  10. TREC Question Answering Competitions
  • QA systems have to answer a set of factoid questions (~500)
  • In the first three years, systems were asked to return 5 ranked text passages (50/250 bytes) for each question:
    • Mean Reciprocal Rank scoring: 1, 0.5, 0.33, 0.25, 0.2, 0 (for 1st, 2nd, 3rd, 4th, 5th, rest)
  • Starting in 2002, a single exact answer was required, based on the notion of confidence

  11. TREC QA – Answer Ranking
  Q: When did Lincoln die?
  A:
  • During the civil war
  • In the spring time
  • at a theatre
  • April 15, 1865 *** (correct)
  • In April, 1965
  • MRR = 1/4 = 0.25 (the correct answer is ranked 4th)
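The reciprocal-rank scoring above can be sketched in a few lines of Python (a minimal illustration; the function names are mine, not from any TREC tooling):

```python
def reciprocal_rank(ranked_answers, is_correct, max_rank=5):
    """Return 1/rank of the first correct answer in the top max_rank, else 0."""
    for rank, answer in enumerate(ranked_answers[:max_rank], start=1):
        if is_correct(answer):
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(all_rankings, is_correct):
    """Average the reciprocal rank over all questions."""
    return sum(reciprocal_rank(r, is_correct) for r in all_rankings) / len(all_rankings)

# The Lincoln example: the correct answer ("April 15, 1865") is ranked 4th.
answers = ["During the civil war", "In the spring time", "at a theatre",
           "April 15, 1865", "In April, 1965"]
print(reciprocal_rank(answers, lambda a: a == "April 15, 1865"))  # → 0.25
```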

  12. The TREC QA Text Collection
  • 3 GB of text
  • News articles from various sources:
    • New York Times (1998-2000)
    • AP newswire (1998-2000)
    • Xinhua News Agency newswire (1998-2000)
  • Requires Information Retrieval techniques before applying NLP
  • Other information sources can also be used

  13. State-of-the-art in Open-domain QA
  • The best performing systems answer ~70% of the questions
  • Various approaches:
    • Knowledge-rich (Texas system): http://www.languagecomputer.com
    • Pattern-matching (MIT): http://start.csail.mit.edu
    • Statistical QA (AnswerBus): http://www.answerbus.com

  14. General Approach
  • Question analysis and parsing
  • Query formulation for an IR system (search engine)
  • Retrieval of ranked documents
  • Retrieval of passages
  • Application of NLP techniques
  • Snippet ranking based on NLP processing

  15. The Architecture of a Generic QA System
  question → Question Processing → query → Document Retrieval → Passage Retrieval → Answer Extraction → answers
  The examples in the next slides are from LCC's QA system.

  16. Question Processing
  • Captures the semantics of the question
  • Tasks:
    • Determine the question type
    • Determine the answer type
    • Extract keywords from the question and formulate a query

  17. Question and Potential Answer Types
  • What is the question looking for? Who, where, when, how many, etc.
  • For factoid questions, the answers usually fall into a somewhat predictable set of categories:
    • Who questions ask for PERSON/ORGANIZATION:
      • Who invented the telegraph?
      • Who sells the most hybrid cars?
    • Where questions ask for LOCATION
  • Generally, systems rely on a set of Named Entities that are relatively easy to extract

  18. Question and Potential Answer Types
  • However, the answer type is not always obvious:
    • Which president went to war with Mexico?
    • What type of car is most reliable?
    • How tall is Mt. Everest?

  19. Taxonomy of Potential Answer Types
  • Contains ~9000 concepts reflecting expected answer types
  • Merges named entities with the WordNet hierarchy
  (LCC's taxonomy)

  20. Potential Answer Type Detection
  • Most systems use a look-up table (simple manual patterns) and/or supervised machine learning to determine the right answer type for a question
  • The answer type is very important later on for answer matching
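A look-up table of manual patterns can be as simple as the following toy sketch (the rules and type labels are illustrative, not LCC's actual taxonomy):

```python
import re

# Question-pattern → expected answer type, checked in order.
ANSWER_TYPE_RULES = [
    (re.compile(r"^who\b", re.I), "PERSON"),
    (re.compile(r"^where\b", re.I), "LOCATION"),
    (re.compile(r"^when\b", re.I), "DATE"),
    (re.compile(r"^how (many|much)\b", re.I), "QUANTITY"),
    (re.compile(r"^how tall\b", re.I), "DIMENSION"),
]

def answer_type(question):
    for pattern, atype in ANSWER_TYPE_RULES:
        if pattern.search(question):
            return atype
    return "UNKNOWN"   # ambiguous questions need richer analysis

print(answer_type("Who invented the telegraph?"))   # → PERSON
print(answer_type("How tall is Mt. Everest?"))      # → DIMENSION
```

Questions like "Which president went to war with Mexico?" fall through to UNKNOWN, which is exactly why real systems back the table up with machine learning over richer question analysis.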

  21. Keyword Selection
  • A list of keywords from the question helps in finding relevant texts
  • Some systems expand the keywords with lexical/semantic alternations for better matching:
    • inventor -> invent
    • have been sold -> sell
    • dog -> animal

  22. Keyword Selection Examples:

  23. Keyword Selection
  Some QA systems prioritize words by importance:
  1. non-stopwords in quotations
  2. all NNP words in recognized NEs
  3. all complex nominals (plus adjectives)
  4. all other nouns
  5. all verbs (regardless of tense)
  6. the potential answer type
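A priority ordering like this can be sketched as follows. Tokens arrive already annotated with a POS tag and an in-quotes flag; in a real system these come from a tagger and an NE recognizer, here they are hand-supplied for illustration (the stopword list and the collapsed set of priority classes are mine):

```python
STOPWORDS = {"the", "a", "an", "of", "in", "is", "does", "did", "what", "who"}

def keyword_priority(word, pos, in_quotes):
    """Lower number = higher priority, following the heuristics above."""
    if in_quotes and word.lower() not in STOPWORDS:
        return 0                      # non-stopwords in quotations
    if pos == "NNP":
        return 1                      # proper nouns (named entities)
    if pos.startswith("NN"):
        return 2                      # other nouns
    if pos.startswith("VB"):
        return 3                      # verbs
    return 4                          # everything else: ignore

def select_keywords(tagged_tokens):
    kept = [(w, p, q) for (w, p, q) in tagged_tokens
            if keyword_priority(w, p, q) < 4 and w.lower() not in STOPWORDS]
    kept.sort(key=lambda t: keyword_priority(*t))   # stable: keeps text order within a class
    return [w for w, _, _ in kept]

# "What does the Peugeot company manufacture?"
tokens = [("What", "WP", False), ("does", "VBZ", False), ("the", "DT", False),
          ("Peugeot", "NNP", False), ("company", "NN", False),
          ("manufacture", "VB", False)]
print(select_keywords(tokens))   # → ['Peugeot', 'company', 'manufacture']
```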

  24. The Architecture of a Generic QA System
  question → Question Processing → query → Document Retrieval → Passage Retrieval → Answer Extraction → answers
  The examples in the next slides are from LCC's QA system.

  25. Passage Retrieval
  • Extracts passages that contain all selected keywords
  • Passage quality is assessed in a feedback loop:
    • In the first iteration, use the first 6 keyword selection heuristics
    • If the number of passages < a threshold → the query is too strict → drop a keyword
    • If the number of passages > a threshold → the query is too relaxed → add a keyword
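The relax/tighten loop can be sketched like this. Here retrieve() is a stand-in for a real search engine call, and a "passage" is simply a document containing all the active keywords (both simplifications are mine):

```python
def retrieve(keywords, documents):
    """Toy retrieval: documents that contain ALL keywords (case-insensitive)."""
    return [d for d in documents
            if all(k.lower() in d.lower() for k in keywords)]

def passage_retrieval(keywords, documents, min_hits=1, max_hits=10):
    """keywords must be ordered from most to least important."""
    active = list(keywords)
    dropped = []
    passages = retrieve(active, documents)
    for _ in range(2 * len(keywords)):     # guard against oscillation
        if len(passages) < min_hits and len(active) > 1:
            dropped.append(active.pop())   # too strict: drop a keyword
        elif len(passages) > max_hits and dropped:
            active.append(dropped.pop())   # too relaxed: re-add one
        else:
            break
        passages = retrieve(active, documents)
    return passages

docs = ["Christa McAuliffe was the first private citizen to fly in space.",
        "The shuttle flew in space twice that year."]
# "teacher" matches nothing, so the loop drops it and then succeeds.
print(passage_retrieval(["space", "fly", "citizen", "teacher"], docs))
```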

  26. Ranking of Passages
  • Use keyword windows
  • E.g.: Q has keywords {k1, k2, k3, k4}; in a passage, k1 and k2 are each matched twice, k3 is matched once, and k4 is not matched. Over the span "k1 k2 k3 k2 k1", one window is constructed for each combination of occurrences of the repeated keywords, giving four windows (2 occurrences of k1 x 2 occurrences of k2).

  27. Passage Scoring
  • A function of:
    • The number of question keywords recognized in the same sequence in the window
    • The number of words that separate the most distant keywords in the window
    • The number of unmatched keywords in the window
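A window-scoring function combining the three factors might look like the sketch below. The weights are illustrative, not LCC's actual values: in-order matches are rewarded, wide spans and unmatched keywords are penalized.

```python
def score_window(window_tokens, question_keywords):
    # First position of each distinct token in the window.
    positions = {}
    for i, tok in enumerate(window_tokens):
        positions.setdefault(tok.lower(), []).append(i)
    matched = [k for k in question_keywords if k.lower() in positions]
    unmatched = len(question_keywords) - len(matched)
    first_pos = [positions[k.lower()][0] for k in matched]
    # Keywords appearing in the same order as in the question.
    in_order = sum(1 for a, b in zip(first_pos, first_pos[1:]) if a < b)
    # Distance separating the most distant matched keywords.
    span = (max(first_pos) - min(first_pos)) if first_pos else 0
    return 2.0 * in_order - 0.5 * span - 1.0 * unmatched

window = "Lincoln died on April 15 1865 at a theatre".split()
print(score_window(window, ["Lincoln", "died", "April"]))  # → 2.5
```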

  28. The Architecture of a Generic QA System
  question → Question Processing → query → Document Retrieval → Passage Retrieval → Answer Extraction → answers
  The examples in the next slides are from LCC's QA system.

  29. Answer Extraction
  Q: Name the first private citizen to fly in space.
  • Answer type: PERSON
  • Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in 'Raiders of the Lost Ark', plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."

  30. Ranking Candidate Answers
  Q: Name the first private citizen to fly in space.
  • Answer type: PERSON
  • Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in 'Raiders of the Lost Ark', plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."
  • Best candidate answer: Christa McAuliffe

  31. Answer Ranking Features
  • Number of question terms matched in the answer passage
  • Number of question terms matched in the same phrase as the candidate answer
  • Number of question terms matched in the same sentence as the candidate answer
  • Number of question terms matched, separated from the candidate answer by at most three words and one comma
  • Number of terms occurring in the same order in the answer passage as in the question
  • Average distance from the candidate answer to the question term matches
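Two of these features can be sketched over tokenized text as follows (a simplification I chose for illustration; real systems compute the phrase- and sentence-level features over parsed text):

```python
def ranking_features(passage_tokens, candidate, question_terms):
    """Compute term-match count and average distance to the candidate."""
    toks = [t.lower() for t in passage_tokens]
    matched = [q.lower() for q in question_terms if q.lower() in toks]
    cand_pos = toks.index(candidate.lower())
    # Distance (in tokens) from each matched term to the candidate.
    distances = [abs(toks.index(q) - cand_pos) for q in matched]
    return {
        "n_terms_matched": len(matched),
        "avg_distance_to_candidate":
            sum(distances) / len(distances) if distances else None,
    }

passage = "Christa McAuliffe , the first private citizen to fly in space".split()
feats = ranking_features(passage, "McAuliffe",
                         ["first", "private", "citizen", "fly", "space"])
print(feats)
```

A candidate that sits close to many matched question terms, as McAuliffe does here, scores well on both features.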

  32. Evaluation
  • In QA the most frequent metric is Mean Reciprocal Rank (MRR):
    • You're allowed to return N answers
    • Your score is based on 1/rank of the first right answer
    • Averaged over all the questions you answer

  33. LCC/UTD/SMU

  34. Limitations
  • Where do lobsters like to live?
    • on a Canadian airline
  • Where are zebras most likely found?
    • near dumps
    • in the dictionary
  • Why can't ostriches fly?
    • Because of American economic sanctions
  • What's the population of Mexico?
    • Three
  • What can trigger an allergic reaction?
    • ...something that can trigger an allergic reaction

  35. A Broad Classification of Questions
  • Yes/No questions
    • The least commonly occurring type
  • Factoid questions
    • Who? What? Where? When?
    • Easy to evaluate
    • Usually based on named entities
  • List questions
    • A list of partial answers (usually facts)

  36. Answer Fusion
  • Knowledge-intensive applications
  • Q: What software products does Microsoft sell? (Girju 2001)

  37. A Broad Classification of Questions
  • Definition questions
    • Harder to evaluate
    • Q: Who is Aaron Copland?
    • A: American composer or civil rights advocate
  • Complex questions: Why? How? etc.
    • The answer can be expressed in several ways
    • The answer may be scattered among several paragraphs/documents
    • The hardest type to process/analyze

  38. Complex Question Answering
  Q: "Why do robins sing in Spring?" (Girju 2002)
  • A1: Causation (What is the cause?) ".. increases in day length trigger hormonal action."
  • A2: Development (How does it develop?) ".. they have learned the songs from their fathers and neighbors."
  • A3: Origin (How did it evolve?) "The song evolved as a means of communication early in the avian lineage."
  • A4: Function/Purpose (What is the function?) "Robins sing in spring to attract mates."

  39. Conclusion
  • Question answering is an exciting research area!
  • It sits somewhere between Information Retrieval and Natural Language Processing
  • A real-world application of NLP technologies
  • Open-domain QA is possible, but there is a lot of room for improvement

  40. Thank you!
