Answering Questions by Computer

Terminology – Question Type • Question Type: an idiomatic categorization of questions for purposes of distinguishing between different processing strategies and/or answer formats • E.g. TREC2003 • FACTOID: “How far is it from Earth to Mars?” • LIST: “List the names of chewing gums” • DEFINITION: “Who is Vlad the Impaler?” • Other possibilities: • RELATIONSHIP: “What is the connection between Valentina Tereshkova and Sally Ride?” • SUPERLATIVE: “What is the largest city on Earth?” • YES-NO: “Is Saddam Hussein alive?” • OPINION: “What do most Americans think of gun control?” • CAUSE&EFFECT: “Why did Iraq invade Kuwait?” • …

Terminology – Answer Type • Answer Type: the class of object (or rhetorical type of sentence) sought by the question. E.g. • PERSON (from “Who …”) • PLACE (from “Where …”) • DATE (from “When …”) • NUMBER (from “How many …”) • … but also • EXPLANATION (from “Why …”) • METHOD (from “How …”) • … • Answer types are usually tied intimately to the classes recognized by the system’s Named Entity Recognizer.
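The mapping from question words to answer types listed above can be sketched as a small rule table. This is only an illustration: the rule list and the function name `guess_answer_type` are assumptions, not the rules of any particular system, and real systems use far richer question analysis.

```python
# Illustrative rules only: (question prefix, answer type), checked in order,
# so longer prefixes such as "how many" must come before "how".
RULES = [
    ("how many", "NUMBER"),
    ("who", "PERSON"),
    ("where", "PLACE"),
    ("when", "DATE"),
    ("why", "EXPLANATION"),
    ("how", "METHOD"),
]


def guess_answer_type(question):
    """Return the answer type of the first matching prefix, else UNKNOWN."""
    q = question.lower().strip()
    for prefix, answer_type in RULES:
        # Require a word boundary so that "however ..." does not match "how".
        if q == prefix or q.startswith(prefix + " "):
            return answer_type
    return "UNKNOWN"


print(guess_answer_type("How many moons does Mars have?"))
print(guess_answer_type("How do you make bread?"))
```

Questions such as "What is the height of Mt. Everest?" fall through to UNKNOWN here; handling them needs the question focus, not just the question word.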

Terminology – Question Focus • Question Focus: The property or entity that is being sought by the question. • E.g. • “In what state is the Grand Canyon?” (focus: state) • “What is the population of Bulgaria?” (focus: population) • “What colour is a pomegranate?” (focus: colour)

Terminology – Question Topic • Question Topic: the object (person, place, …) or event that the question is about. The question might well be about a property of the topic, which will be the question focus. • E.g. “What is the height of Mt. Everest?” • height is the focus • Mt. Everest is the topic

Terminology – Candidate Passage • Candidate Passage: a text passage (anything from a single sentence to a whole document) retrieved by a search engine in response to a question. • Depending on the query and kind of index used, there may or may not be a guarantee that a candidate passage has any candidate answers. • Candidate passages will usually have associated scores, from the search engine.

Terminology – Candidate Answer • Candidate Answer: in the context of a question, a small quantity of text (anything from a single word to a sentence or bigger, but usually a noun phrase) that is of the same type as the Answer Type. • In some systems, the type match may be approximate, if there is the concept of confusability. • Candidate answers are found in candidate passages • E.g. • 50 • Queen Elizabeth II • September 8, 2003 • by baking a mixture of flour and water

Terminology – Authority List • Authority List (or File): a collection of instances of a class of interest, used to test a term for class membership. • Instances should be derived from an authoritative source and be as close to complete as possible. • Ideally, class is small, easily enumerated and with members with a limited number of lexical forms. • Good: • Days of week • Planets • Elements • Good statistically, but difficult to get 100% recall: • Animals • Plants • Colours • Problematic • People • Organizations • Impossible • All numeric quantities • Explanations and other clausal quantities
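A minimal sketch of an authority-list membership test for one of the "good" classes above (days of the week). The names `DAYS_OF_WEEK` and `is_member` are illustrative; a real list would be loaded from an authoritative source instead of being typed in.

```python
# A tiny authority list for a small, closed class.
DAYS_OF_WEEK = frozenset({
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
})


def is_member(term, authority_list):
    """Test a term for class membership after simple normalization."""
    return term.strip().lower() in authority_list


print(is_member(" Friday ", DAYS_OF_WEEK))
print(is_member("Fri", DAYS_OF_WEEK))
```

The second call fails because "Fri" is a lexical variant that is not in the list, which is why classes whose members have few lexical forms work best.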

Essence of Text-based QA (Single source answers) • Need to find a passage that answers the question. • Find a candidate passage (search) • Check that semantics of passage and question match • Extract the answer

Ranking Candidate Answers • Q066: Name the first private citizen to fly in space. • Answer type: Person • Text passage: “Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in ‘Raiders of the Lost Ark’, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...”

Answer Extraction • Also called Answer Selection/Pinpointing • Given a question and candidate passages, the process of selecting and ranking candidate answers. • Usually, candidate answers are those terms in the passages which have the same answer type as that generated from the question • Ranking the candidate answers depends on assessing how well the passage context relates to the question • 3 Approaches: • Heuristic features • Shallow parse fragments • Logical proof

Features for Answer Ranking • Number of question terms matched in the answer passage • Number of question terms matched in the same phrase as the candidate answer • Number of question terms matched in the same sentence as the candidate answer • Flag set to 1 if the candidate answer is followed by a punctuation sign • Number of question terms matched, separated from the candidate answer by at most three words and one comma • Number of terms occurring in the same order in the answer passage as in the question • Average distance from candidate answer to question term matches SIGIR ‘01

Heuristics for Answer Ranking in the Lasso System • Same_Word_Sequence_score – number of words from the question that are recognized in the same sequence in the passage. • Punctuation_sign_score – a flag set to 1 if the candidate answer is followed by a punctuation sign. • Comma_3_word_score – the number of question words that follow the candidate, if the candidate is followed by a comma. • Same_parse_subtree_score – number of question words found in the parse sub-tree of the answer. • Same_sentence_score – number of question words found in the answer’s sentence. • Distance_score – adds the distance (measured in number of words) between the answer candidate and the other keywords in the window.
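Three of these heuristics can be sketched in a few lines. This is a simplified illustration, not Lasso's implementation: the tokenizer, the function name `answer_features`, and the choice to report an average distance (as in the earlier feature list) instead of a sum are all assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def answer_features(question_terms, sentence, candidate):
    """Compute three heuristic scores for one candidate in one sentence."""
    tokens = tokenize(sentence)
    cand = tokenize(candidate)
    terms = {t.lower() for t in question_terms}

    # Locate the first occurrence of the candidate's token sequence.
    start = next(
        (i for i in range(len(tokens) - len(cand) + 1)
         if tokens[i:i + len(cand)] == cand),
        None,
    )
    if start is None:
        raise ValueError("candidate not found in sentence")
    end = start + len(cand)  # index of the token right after the candidate

    # Positions of question terms outside the candidate itself.
    hits = [i for i, tok in enumerate(tokens)
            if tok in terms and not start <= i < end]

    # Distance in tokens from each hit to the nearest edge of the candidate.
    distances = [start - i if i < start else i - (end - 1) for i in hits]

    followed_by_punct = (end < len(tokens)
                         and re.fullmatch(r"[^\w\s]", tokens[end]) is not None)
    return {
        "same_sentence": len({tokens[i] for i in hits}),
        "punctuation": int(followed_by_punct),
        "avg_distance": sum(distances) / len(distances) if distances else None,
    }


features = answer_features(
    ["first", "private", "citizen", "fly", "space"],
    "Among them was Christa McAuliffe, the first private citizen to fly in space.",
    "Christa McAuliffe",
)
print(features)
```

In practice such features would be combined into a single score, with weights tuned on training questions.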

Heuristics for Answer Ranking in the Lasso System (continued) • Finally..

Evaluation • Evaluation of this kind of system is usually based on some kind of TREC-like metric. • In Q/A the most frequent metric is Mean Reciprocal Rank (MRR). • You’re allowed to return N answers per question. • Your score for a question is 1/rank of the first right answer (0 if none of the N answers is right). • The scores are averaged over all the questions you answer.
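Mean reciprocal rank can be computed as in the sketch below. The function name, the dictionary-based inputs, and the default of N = 5 are assumptions made for illustration.

```python
def mean_reciprocal_rank(ranked_answers, gold, n=5):
    """ranked_answers: {question id: ranked list of answers}
    gold: {question id: set of acceptable answers}
    Only the top n answers per question are considered."""
    if not ranked_answers:
        return 0.0
    total = 0.0
    for qid, answers in ranked_answers.items():
        for rank, answer in enumerate(answers[:n], start=1):
            if answer in gold[qid]:
                total += 1.0 / rank  # only the first right answer counts
                break
    return total / len(ranked_answers)


ranked = {"q1": ["a", "b"], "q2": ["x", "y", "z"], "q3": ["m"]}
gold = {"q1": {"a"}, "q2": {"z"}, "q3": {"n"}}
print(mean_reciprocal_rank(ranked, gold))  # (1 + 1/3 + 0) / 3
```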

Answer Types and Modifiers Name 5 French Cities • Most likely there is no type for “French Cities” • So will look for CITY • include “French/France” in bag of words, and hope for the best • include “French/France” in bag of words, retrieve documents, and look for evidence (deep parsing, logic) • use high-precision Language Identification on results • If you have a list of French cities, could either • Filter results by list • Use Answer-Based QA (see later) • Use longitude/latitude information of cities and countries

Answer Types and Modifiers Name a female figure skater • Most likely there is no type for “female figure skater” • Most likely there is no type for “figure skater” • Look for PERSON, with query terms {figure, skater} • What to do about “female”? Two approaches. • Include “female” in the bag-of-words. • Relies on logic that if “femaleness” is an interesting property, it might well be mentioned in answer passages. • Does not apply to, say “singer”. • Leave out “female” but test candidate answers for gender. • Needs either an authority file or a heuristic test. • Test may not be definitive.

Part II - Specific Approaches • By Genre • Statistical QA • Pattern-based QA • Web-based QA • Answer-based QA (TREC only) • By System • SMU • LCC • USC-ISI • Insight • Microsoft • IBM Statistical • IBM Rule-based

Statistical QA • Use statistical distributions to model likelihoods of answer type and answer • E.g. IBM (Ittycheriah, 2001) – see later section

Pattern-based QA • For a given question type, identify the typical syntactic constructions used in text to express answers to such questions • Typically very high precision, but a lot of work to get decent recall

Web-Based QA • Exhaustive string transformations • Brill et al. 2002 • Learning • Radev et al. 2001

Answer-Based QA • Problem: Sometimes it is very easy to find an answer to a question using resource A, but the task demands that you find it in resource B. • Solution: First find the answer in resource A, then locate the same answer, along with original question terms, in resource B. • Artificial problem, but real for TREC participants.

Answer-Based QA • Web-Based solution: When a QA system looks for answers within a relatively small textual collection, the chance of finding strings/sentences that closely match the question string is small. However, when a QA system looks for strings/sentences that closely match the question string on the Web, the chance of finding a correct answer is much higher. (Hermjakob et al. 2002) • Why this is true: • The Web is much larger than the TREC corpus (3,000 : 1) • TREC questions are generated from Web logs, and the style of language (and subjects of interest) in these logs is more similar to Web content than to newswire collections.

Answer-Based QA • Database/Knowledge-base/Ontology solution: • When question syntax is simple and reliably recognizable, can express as a logical form • Logical form represents entire semantics of question, and can be used to access structured resource: • WordNet • On-line dictionaries • Tables of facts & figures • Knowledge-bases such as Cyc • Having found answer • construct a query with original question terms + answer • Retrieve passages • Tell Answer Extraction the answer it is looking for

Approaches of Specific Systems • SMU Falcon • LCC • USC-ISI • Insight • Microsoft • IBM Note: Some of the slides and/or examples in these sections are taken from papers or presentations from the respective system authors

SMU Falcon Harabagiu et al. 2000

SMU Falcon • From question, dependency structure called question semantic form is created • Query is Boolean conjunction of terms • From answer passages that contain at least one instance of answer type, generate answer semantic form • 3 processing loops: • Loop 1 • Triggered when too few or too many passages are retrieved from search engine • Loop 2 • Triggered when question semantic form and answer semantic form cannot be unified • Loop 3 • Triggered when unable to perform abductive proof of answer correctness

SMU Falcon • Loops provide opportunities to perform alternations • Loop 1: morphological expansions and nominalizations • Loop 2: lexical alternations – synonyms, direct hypernyms and hyponyms • Loop 3: paraphrases • Evaluation (Pasca & Harabagiu, 2001). Increase in accuracy in 50-byte task in TREC9 • Loop 1: 40% • Loop 2: 52% • Loop 3: 8% • Combined: 76%

LCC • Moldovan & Rus, 2001 • Uses Logic Prover for answer justification • Question logical form • Candidate answers in logical form • XWN glosses • Linguistic axioms • Lexical chains • Inference engine attempts to verify answer by negating question and proving a contradiction • If proof fails, predicates in question are gradually relaxed until proof succeeds or associated proof score is below a threshold.

LCC: Lexical Chains • Q:1518 What year did Marco Polo travel to Asia? • Answer: Marco Polo divulged the truth after returning in 1292 from his travels, which included several months on Sumatra • Lexical Chains: • (1) travel_to:v#1 -> GLOSS -> travel:v#1 -> RGLOSS -> travel:n#1 • (2) travel_to:v#1 -> GLOSS -> travel:v#1 -> HYPONYM -> return:v#1 • (3) Sumatra:n#1 -> ISPART -> Indonesia:n#1 -> ISPART -> Southeast_Asia:n#1 -> ISPART -> Asia:n#1 • Q:1570 What is the legal age to vote in Argentina? • Answer: Voting is mandatory for all Argentines aged over 18. • Lexical Chains: • (1) legal:a#1 -> GLOSS -> rule:n#1 -> RGLOSS -> mandatory:a#1 • (2) age:n#1 -> RGLOSS -> aged:a#3 • (3) Argentine:a#1 -> GLOSS -> Argentina:n#1

LCC: Logic Prover • Question • Which company created the Internet Browser Mosaic? • QLF: (_organization_AT(x2) ) & company_NN(x2) & create_VB(e1,x2,x6) & Internet_NN(x3) & browser_NN(x4) & Mosaic_NN(x5) & nn_NNC(x6,x3,x4,x5) • Answer passage • ... Mosaic , developed by the National Center for Supercomputing Applications ( NCSA ) at the University of Illinois at Urbana - Champaign ... • ALF: ... Mosaic_NN(x2) & develop_VB(e2,x2,x31) & by_IN(e2,x8) & National_NN(x3) & Center_NN(x4) & for_NN(x5) & Supercomputing_NN(x6) & application_NN(x7) & nn_NNC(x8,x3,x4,x5,x6,x7) & NCSA_NN(x9) & at_IN(e2,x15) & University_NN(x10) & of_NN(x11) & Illinois_NN(x12) & at_NN(x13) & Urbana_NN(x14) & nn_NNC(x15,x10,x11,x12,x13,x14) & Champaign_NN(x16) ... • Lexical Chains develop <-> make and make <->create • exists x2 x3 x4 all e2 x1 x7 (develop_vb(e2,x7,x1) <-> make_vb(e2,x7,x1) & something_nn(x1) & new_jj(x1) & such_jj(x1) & product_nn(x2) & or_cc(x4,x1,x3) & mental_jj(x3) & artistic_jj(x3) & creation_nn(x3)). • all e1 x1 x2 (make_vb(e1,x1,x2) <-> create_vb(e1,x1,x2) & manufacture_vb(e1,x1,x2) & man-made_jj(x2) & product_nn(x2)). • Linguistic axioms • all x0 (mosaic_nn(x0) -> internet_nn(x0) & browser_nn(x0))

USC-ISI • TextMap system • Ravichandran and Hovy, 2002 • Hermjakob et al. 2003 • Use of Surface Text Patterns • When was X born -> • Mozart was born in 1756 • Gandhi (1869-1948) • Can be captured in expressions • <NAME> was born in <BIRTHDATE> • <NAME> (<BIRTHDATE> - • These patterns can be learned
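The two surface patterns above can be turned into regular expressions along the following lines. This is a sketch under stated assumptions: the helper names are invented, a birth date is assumed to be a four-digit year, and the second pattern is written without a space before the dash so that it matches "Gandhi (1869-1948)".

```python
import re

# The two patterns from the slide, with slots to be filled in.
SURFACE_PATTERNS = [
    "<NAME> was born in <BIRTHDATE>",
    "<NAME> (<BIRTHDATE>-",
]


def to_regex(pattern, name):
    """Fill the <NAME> slot and make <BIRTHDATE> a capturing group."""
    regex = re.escape(pattern)  # treat everything else as literal text
    regex = regex.replace(re.escape("<NAME>"), re.escape(name))
    # Assumption: a birth date is a four-digit year.
    regex = regex.replace(re.escape("<BIRTHDATE>"), r"(?P<answer>\d{4})")
    return re.compile(regex)


def find_birth_years(name, text):
    """Return every year captured by any pattern, in pattern order."""
    return [m.group("answer")
            for pattern in SURFACE_PATTERNS
            for m in to_regex(pattern, name).finditer(text)]


print(find_birth_years("Mozart", "Mozart was born in 1756."))
print(find_birth_years("Gandhi", "Gandhi (1869-1948) led the movement."))
```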

USC-ISI TextMap • Use bootstrapping to learn patterns. • For an identified question type (“When was X born?”), start with known answers for some values of X • Mozart 1756 • Gandhi 1869 • Newton 1642 • Issue Web search engine queries (e.g. “+Mozart +1756” ) • Collect top 1000 documents • Filter, tokenize, smooth etc. • Use suffix tree constructor to find best substrings, e.g. • Mozart (1756-1791) • Filter • Mozart (1756- • Replace query strings with e.g. <NAME> and <ANSWER> • Determine precision of each pattern • Find documents with just question term (Mozart) • Apply patterns and calculate precision

USC-ISI TextMap • Finding Answers • Determine Question type • Perform IR Query • Do sentence segmentation and smoothing • Replace question term by question tag • i.e. replace Mozart with <NAME> • Search for instances of patterns associated with question type • Select words matching <ANSWER> • Assign scores according to precision of pattern
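The precision estimate and the answer-scoring step can be sketched as follows. The patterns, sentences, and helper names here are toy assumptions, and taking the maximum precision over matching patterns is one possible scoring choice, not necessarily TextMap's.

```python
import re
from collections import defaultdict

# Hypothetical learned patterns: <NAME> is the question-term slot and the
# named group "answer" plays the role of <ANSWER>.
STRICT = r"<NAME> \((?P<answer>\d{4})-"
LOOSE = r"<NAME> [^.]*? in (?P<answer>\d{4})"


def instantiate(pattern, name):
    """Replace the <NAME> slot with the (escaped) question term."""
    return re.compile(pattern.replace("<NAME>", re.escape(name)))


def pattern_precision(pattern, examples, sentences):
    """examples: {name: known answer}. Precision = correct / all matches."""
    correct = total = 0
    for name, answer in examples.items():
        regex = instantiate(pattern, name)
        for sentence in sentences:
            for m in regex.finditer(sentence):
                total += 1
                correct += int(m.group("answer") == answer)
    return correct / total if total else 0.0


def score_answers(name, sentences, precisions):
    """Score each extracted answer by the best precision of a matching pattern."""
    scores = defaultdict(float)
    for pattern, precision in precisions.items():
        regex = instantiate(pattern, name)
        for sentence in sentences:
            for m in regex.finditer(sentence):
                answer = m.group("answer")
                scores[answer] = max(scores[answer], precision)
    return dict(scores)


training = [
    "Mozart (1756-1791) was a composer.",
    "Mozart was born in 1756.",
    "Mozart died in 1791.",
    "Newton (1642-1727) was a physicist.",
    "Newton published the Principia in 1687.",
]
known = {"Mozart": "1756", "Newton": "1642"}
precisions = {p: pattern_precision(p, known, training) for p in (STRICT, LOOSE)}

retrieved = [
    "Gandhi (1869-1948) led the independence movement.",
    "Gandhi returned to India in 1915.",
]
print(score_answers("Gandhi", retrieved, precisions))
```

The loose pattern also fires on death and publication dates, so its estimated precision is low and the answers it proposes rank below those of the strict pattern.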

Insight • Soubbotin, 2002. Soubbotin & Soubbotin, 2003. • Performed very well in TREC10/11 • Comprehensive and systematic use of “Indicative patterns” • E.g. • cap word; paren; 4 digits; dash; 4 digits; paren matches • Mozart (1756-1791) • The patterns are broader than named entities • “Semantics in syntax” • Patterns have intrinsic scores (reliability), independent of question

Insight • Patterns with more sophisticated internal structure are more indicative of answer • 2/3 of their correct entries in TREC10 were answered by patterns • E.g. • a == {countries} • b == {official posts} • w == {proper names (first and last)} • e == {titles or honorifics} • Patterns for “Who is the President (Prime Minister) of given country?” • abeww • ewwdb,a • b,aeww • Definition questions: (A is primary query term, X is answer) • <A; comma; [a/an/the]; X; [comma/period]> • For: “Moulin Rouge, a cabaret” • <X; [comma]; [also] called; A [comma]> • For: “naturally occurring gas called methane” • <A; is/are; [a/an/the]; X> • For: “Michigan’s state flower is the apple blossom”

Insight • Emphasis on shallow techniques, lack of NLP • Look in vicinity of text string potentially matching pattern for “zeroing” – e.g. for occupational roles: • Former • Elect • Deputy • Negation • Comments: • Relies on redundancy of large corpus • Works for factoid question types of TREC-QA – not clear how it extends • Not clear how they match questions to patterns • Named entities within patterns have to be recognized

Microsoft • Data-Intensive QA. Brill et al. 2002 • “Overcoming the surface string mismatch between the question formulation and the string containing the answer” • Approach based on the assumption/intuition that someone on the Web has answered the question in the same way it was asked. • Want to avoid dealing with: • Lexical, syntactic, semantic relationships (bet. Q & A) • Anaphora resolution • Synonymy • Alternate syntax • Indirect answers • Take advantage of redundancy on Web, then project to TREC corpus (Answer-based QA)

Microsoft AskMSR • Formulate multiple queries – each rewrite has intrinsic score. E.g. for “What is relative humidity?” • [“+is relative humidity”, LEFT, 5] • [“relative +is humidity”, RIGHT, 5] • [“relative humidity +is”, RIGHT, 5] • [“relative humidity”, NULL, 2] • [“relative” AND “humidity”, NULL, 1] • Get top 100 documents from Google • Extract n-grams from document summaries • Score n-grams by summing the scores of the rewrites it came from • Use tiling to merge n-grams • Search for supporting documents in TREC corpus
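The n-gram scoring and tiling steps can be sketched as below. This is a simplified illustration, not the AskMSR code: the function names, the whitespace tokenizer, the maximum n-gram length of 3, and counting each n-gram once per summary are assumptions.

```python
from collections import Counter


def ngrams(tokens, max_n=3):
    """Yield all n-grams of length 1..max_n as tuples."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def score_ngrams(snippets):
    """snippets: list of (summary text, score of the rewrite that found it).
    Each n-gram accumulates the scores of the rewrites it came from."""
    scores = Counter()
    for text, weight in snippets:
        for gram in set(ngrams(text.lower().split())):
            scores[gram] += weight
    return scores


def tile(a, b):
    """Merge b onto a when a suffix of a equals a prefix of b, else None."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
    return None


scores = score_ngrams([
    ("relative humidity is the ratio", 5),
    ("the ratio of vapor", 2),
])
print(scores[("the", "ratio")])
print(tile(("mount",), ("mount", "waialeale")))
```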

Microsoft AskMSR • Question is: “What is the rainiest place on Earth” • Answer from Web is: “Mount Waialeale” • Passage in TREC corpus is: “… In misty Seattle, Wash., last year, 32 inches of rain fell. Hong Kong gets about 80 inches a year, and even Pago Pago, noted for its prodigious showers, gets only about 196 inches annually. (The titleholder, according to the National Geographic Society, is Mount Waialeale in Hawaii, where about 460 inches of rain falls each year.) …” • Very difficult to imagine getting this passage by other means

IBM Statistical QA (Ittycheriah, 2001) • q = question, a = answer, c = “correctness”, e = answer type • $p(c \mid q,a) = \sum_e p(c,e \mid q,a) = \sum_e p(c \mid e,q,a)\, p(e \mid q,a)$ • $p(e \mid q,a)$ is the answer type model (ATM) • $p(c \mid e,q,a)$ is the answer selection model (ASM) • ATM predicts, from the question and a proposed answer, the answer type they both satisfy • Given a question, an answer, and the predicted answer type, ASM seeks to model the correctness of this configuration. • Distributions are modelled using a maximum entropy formulation • Training data = human judgments • For ATM, 13K questions annotated with 31 categories • For ASM, ~ 5K questions from TREC plus trivia
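The decomposition sums, over answer types e, the answer selection model's probability p(c|e,q,a) weighted by the answer type model's probability p(e|q,a). A toy numeric sketch follows; the probabilities are made up for illustration and the function name `p_correct` is an assumption.

```python
def p_correct(atm, asm):
    """atm: {e: p(e|q,a)}  (answer type model output)
    asm: {e: p(c|e,q,a)}   (answer selection model output)
    Returns p(c|q,a) by summing over answer types e."""
    return sum(asm[e] * p_e for e, p_e in atm.items())


# Made-up values for one (question, answer) pair.
atm = {"PERSON": 0.7, "ORGANIZATION": 0.3}
asm = {"PERSON": 0.9, "ORGANIZATION": 0.2}
print(p_correct(atm, asm))  # 0.7 * 0.9 + 0.3 * 0.2
```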

IBM Statistical QA (Ittycheriah) • Question Analysis (by ATM) • Selects one out of 31 categories • Search • Question expanded by Local Context Analysis • Top 1000 documents retrieved • Passage Extraction: Top 100 passages that: • Maximize question word match • Have desired answer type • Minimize dispersion of question words • Have similar syntactic structure to question • Answer Extraction: • Candidate answers ranked using ASM

IBM Rule-based Predictive Annotation (Prager 2000, Prager 2003) • Want to make sure passages retrieved by search engine have at least one candidate answer • Recognize that candidate answer is of correct answer type which corresponds to a label (or several) generated by Named Entity Recognizer • Annotate entire corpus and index semantic labels along with text • Identify answer types in questions and include corresponding labels in queries

IBM PIQUANT Predictive Annotation • [Figure: indexed terms “baseball have be invent by Doubleday”, with label SPORT$ on “baseball” and PERSON$ on “Doubleday”] • E.g.: Question is “Who invented baseball?” • “Who” can map to PERSON$ or ORGANIZATION$ • Suppose we assume only people invent things (it doesn’t really matter). • So “Who invented baseball?” -> {PERSON$ invent baseball} • Consider text “… but its conclusion was based largely on the recollections of a man named Abner Graves, an elderly mining engineer, who reported that baseball had been "invented" by Doubleday between 1839 and 1841. ”

IBM PIQUANT Predictive Annotation • [Figure: indexed terms “baseball have be invent by Doubleday”, with label SPORT$ on “baseball” and PERSON$ on “Doubleday”] • Previous example • “Who invented baseball?” -> {PERSON$ invent baseball} • However, same structure is equally effective at answering • “What sport did Doubleday invent?” -> {SPORT$ invent Doubleday}
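Predictive annotation can be sketched as an inverted index in which semantic labels are indexed alongside the lemmas they annotate. The data layout and function names below are illustrative assumptions, not PIQUANT's actual index format.

```python
from collections import defaultdict

# Toy annotated corpus: each sentence is a list of (lemma, labels) pairs.
SENTENCES = [
    [("baseball", ["SPORT$"]), ("have", []), ("be", []), ("invent", []),
     ("by", []), ("doubleday", ["PERSON$"])],
    [("edison", ["PERSON$"]), ("invent", []), ("the", []), ("phonograph", [])],
]


def build_index(sentences):
    """Index both lemmas and semantic labels by sentence id."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for lemma, labels in sentence:
            index[lemma].add(sid)
            for label in labels:
                index[label].add(sid)
    return index


def search(index, query):
    """Return ids of sentences containing every query term or label."""
    postings = [index.get(term, set()) for term in query]
    return set.intersection(*postings) if postings else set()


def candidates(sentence, label):
    """Lemmas in the sentence annotated with the requested answer type."""
    return [lemma for lemma, labels in sentence if label in labels]


index = build_index(SENTENCES)
for query, label in [({"PERSON$", "invent", "baseball"}, "PERSON$"),
                     ({"SPORT$", "invent", "doubleday"}, "SPORT$")]:
    for sid in search(index, query):
        print(query, "->", candidates(SENTENCES[sid], label))
```

Both queries retrieve the same sentence, and only the requested label decides which term is returned as the answer.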

IBM Rule-Based: Handling Subsumption & Disjunction • If an entity is of a type which has a parent type, then how is annotation done? • If a proposed answer type has a parent type, then what answer type should be used? • If an entity is ambiguous then what should the annotation be? • If the answer type is ambiguous, then what should be used?

Subsumption & Disjunction • Consider New York City – both a CITY and a PLACE • To answer “Where did John Lennon die?”, it needs to be a PLACE • To answer “In what city is the Empire State Building?”, it needs to be a CITY. • Do NOT want to do subsumption calculation in search engine • Two scenarios: • 1. Expand Answer Type and use most specific entity annotation • 1A {(CITY PLACE) John_Lennon die} matches CITY • 1B {CITY Empire_State_Building} matches CITY • Or 2. Use most specific Answer Type and multiple annotations of NYC • 2A {PLACE John_Lennon die} matches (CITY PLACE) • 2B {CITY Empire_State_Building} matches (CITY PLACE) • Scenario 2 is preferred for simplicity, because the disjunction in scenario 1 should contain all hyponyms of PLACE, while the disjunction in scenario 2 should contain all hypernyms of CITY • Scenario 2 suggests that a disjunction in the answer type can be used to represent ambiguity: • “Who invented the laser” -> {(PERSON ORGANIZATION) invent laser}
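Scenario 2 (annotate each entity with its most specific type plus all of its hypernyms, and query with the most specific answer type) can be sketched as follows. The tiny type hierarchy and the function names are hypothetical.

```python
# Hypothetical type hierarchy: child type -> parent type.
PARENT = {"CITY": "PLACE", "COUNTRY": "PLACE", "PLACE": None}


def annotations(entity_type):
    """Most specific type followed by all of its hypernyms."""
    labels = []
    while entity_type is not None:
        labels.append(entity_type)
        entity_type = PARENT.get(entity_type)
    return labels


def matches(answer_type, entity_type):
    """With all hypernyms indexed, matching is a plain label lookup,
    so the search engine never has to compute subsumption itself."""
    return answer_type in annotations(entity_type)


print(annotations("CITY"))        # labels indexed for New York City
print(matches("PLACE", "CITY"))   # "Where did John Lennon die?"
print(matches("CITY", "CITY"))    # "In what city is the Empire State Building?"
```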

Clausal classes • Any structure that can be recognized in text can be annotated. • Quotations • Explanations • Methods • Opinions • … • Any semantic class label used in annotation can be indexed, and hence used as a target of search: • What did Karl Marx say about religion? • Why is the sky blue? • How do you make bread? • What does Arnold Schwarzenegger think about global warming? • …

Named Entity Recognition

IBM Predictive Annotation – Improving Precision at no cost to Recall • E.g.: Question is “Where is Belize?” • “Where” can map to (CONTINENT$, WORLDREGION$, COUNTRY$, STATE$, CITY$, CAPITAL$, LAKE$, RIVER$ … ). • But we know Belize is a country. • So “Where is Belize?” -> {(CONTINENT$ WORLDREGION$) Belize} • Belize occurs 1068 times in TREC corpus • Belize and PLACE$ co-occur in only 537 sentences • Belize and CONTINENT$ or WORLDREGION$ co-occur in only 128 sentences
