
SIMS 290-2: Applied Natural Language Processing


Presentation Transcript


  1. SIMS 290-2: Applied Natural Language Processing Marti Hearst November 17, 2004

  2. Today • Using WordNet in QA • Other Resources in QA • Semantic Reasoning in QA • Definition Questions • Complex Questions

  3. Question Answering
  • Captures the semantics of the question by recognizing
    • the expected answer type (i.e., its semantic category)
    • the relationship between the answer type and the question concepts/keywords
  • The Q/A process:
    • Question processing – extract concepts/keywords from the question
    • Passage retrieval – identify passages of text relevant to the query
    • Answer extraction – extract answer words from the passage
  • Relies on standard IR and IE techniques
    • Proximity-based features – the answer often occurs in text near the question keywords
    • Named-entity recognizers – categorize proper names into semantic types (persons, locations, organizations, etc.) and map semantic types to question types (“How long”, “Who”, “What company”)
  Adapted from slide by Shauna Eggers
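To make the mapping from question types to expected named-entity types concrete, here is a minimal Python sketch. It is not the system described on these slides; the pattern table and type names are illustrative assumptions.

```python
# Minimal sketch of answer-type recognition: map a question's wh-phrase to an
# expected named-entity type. Patterns and type names are illustrative
# assumptions, not the taxonomy of any particular system.
EXPECTED_TYPE = {
    "who": "PERSON",
    "what company": "ORGANIZATION",
    "where": "LOCATION",
    "when": "DATE",
    "how long": "DURATION",
    "how many": "NUMBER",
}

def expected_answer_type(question: str) -> str:
    q = question.lower()
    # Prefer the longest matching prefix ("what company" before "what").
    for pattern in sorted(EXPECTED_TYPE, key=len, reverse=True):
        if q.startswith(pattern):
            return EXPECTED_TYPE[pattern]
    return "UNKNOWN"

print(expected_answer_type("Who founded LCC?"))       # PERSON
print(expected_answer_type("How long is the Nile?"))  # DURATION
```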

  4. The Importance of NER
  • The results of the past 5 TREC evaluations of QA systems indicate that current state-of-the-art QA is determined by the recognition of named entities
  • In TREC 2003 the LCC QA system extracted 289 correct answers for factoid questions
    • The named entity recognizer was responsible for 234 of them
  Adapted from slide by Harabagiu and Narayanan

  5. The Special Case of Names
  • Questions asking for names of authored works
  Adapted from slide by Harabagiu and Narayanan

  6. Problems
  • NE assumes all answers are named entities
    • Oversimplifies the generative power of language!
    • What about: “What kind of flowers did Van Gogh paint?”
  • Does not account well for morphological, lexical, and semantic alternations
    • Question terms may not exactly match answer terms; connections between alternations of Q and A terms are often not documented in a flat dictionary
    • Example: “When was Berlin’s Brandenburger Tor erected?” → no guarantee that erected matches built in the answer text
    • Recall suffers
  Adapted from slide by Shauna Eggers

  7. LCC Approach: WordNet to the Rescue!
  • WordNet can be used to inform all three steps of the Q/A process:
    1. Answer-type recognition (Answer Type Taxonomy)
    2. Passage retrieval (“specificity” constraints)
    3. Answer extraction (recognition of keyword alternations)
  • Using WordNet’s lexico-semantic info: examples
    • “What kind of flowers did Van Gogh paint?”
      • Answer-type recognition: need to know (a) the answer is a kind of flower, and (b) the sense of the word flower
      • WordNet encodes 470 hyponyms of flower sense #1 (flowers as plants)
      • Nouns from retrieved passages can be searched against these hyponyms
    • “When was Berlin’s Brandenburger Tor erected?”
      • Semantic alternation: erect is a hyponym of sense #1 of build
  Adapted from slide by Shauna Eggers
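The two lookups above can be sketched with NLTK's WordNet interface (an assumption: the LCC system used its own WordNet tooling, and hyponym counts and sense numbers vary slightly across WordNet versions).

```python
from nltk.corpus import wordnet as wn  # requires the NLTK 'wordnet' data

# Hyponyms of flower sense #1 ("flowers as plants" on the slide).
flower = wn.synset('flower.n.01')
hyponyms = set(flower.closure(lambda s: s.hyponyms()))
print(len(hyponyms))  # several hundred; the slide reports 470

# Is a noun from a retrieved passage a kind of flower?
def is_kind_of_flower(word):
    return any(flower in s.closure(lambda x: x.hypernyms())
               for s in wn.synsets(word, pos=wn.NOUN))

print(is_kind_of_flower('sunflower'))  # True

# Semantic alternation: is 'erect' a hyponym of build sense #1?
build = wn.synset('build.v.01')
print(any(build in s.closure(lambda x: x.hypernyms())
          for s in wn.synsets('erect', pos=wn.VERB)))  # True, per the slide
```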

  8. WN for Answer Type Recognition
  • Encodes 8707 English concepts to help recognize the expected answer type
  • Mapping to parts of WordNet done by hand
  • Can connect to Noun, Adj, and/or Verb subhierarchies

  9. WN in Passage Retrieval
  • Identify relevant passages from text
    • Extract keywords from the question, and
    • Pass them to the retrieval module
  • “Specificity” – filtering question concepts/keywords
    • Focuses the search, improves performance and precision
    • Question keywords can be omitted from the search if they are too general
    • Specificity is calculated by counting the hyponyms of a given keyword in WordNet
      • The count ignores proper names and same-headed concepts
    • A keyword is thrown out if its count is above a given threshold (currently 10)
  Adapted from slide by Shauna Eggers
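A minimal sketch of the specificity filter, assuming NLTK's WordNet and skipping the refinement that ignores proper names and same-headed concepts.

```python
from nltk.corpus import wordnet as wn

SPECIFICITY_THRESHOLD = 10  # threshold quoted on the slide

def hyponym_count(word, pos=wn.NOUN):
    """Count transitive hyponyms of the word's first sense (simplification:
    the real count also excludes proper names and same-headed concepts)."""
    synsets = wn.synsets(word, pos=pos)
    if not synsets:
        return 0
    return len(set(synsets[0].closure(lambda s: s.hyponyms())))

def keep_keyword(word):
    # Keywords with too many hyponyms are too general to constrain retrieval.
    return hyponym_count(word) <= SPECIFICITY_THRESHOLD

print(keep_keyword('tempest'))  # specific concept, likely kept
print(keep_keyword('object'))   # very general, dropped
```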

  10. WN in Answer Extraction If keywords alone cannot find an acceptable answer, look for alternations in WordNet! Adapted from slide by Shauna Eggers

  11. Evaluation • Paşca/Harabagiu (NAACL’01 Workshop) measured the approach using the TREC-8 and TREC-9 test collections • WN contributions to Answer Type Recognition • Counted the number of questions for which acceptable answers were found; 3GB text collection, 893 questions Adapted from slide by Shauna Eggers

  12. Evaluation
  • WN contributions to Passage Retrieval
    • Impact of keyword alternations
    • Impact of specificity knowledge
  Adapted from slide by Shauna Eggers

  13. Going Beyond Word Matching
  • Use techniques from artificial intelligence to try to draw inferences from the meanings of the words
    • This is a highly unusual and ambitious approach.
    • Surprising that it works at all!
    • Requires huge amounts of hand-coded information
  • Uses notions of proofs and inference from logic
    • All birds fly. Robins are birds. Thus, robins fly.
    • forall(X): bird(X) -> fly(X)
    • forall(X,Y): student(X), enrolled(X,Y) -> school(Y)
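The flavor of rule-based inference can be shown with a toy forward-chaining loop over ground facts and unary-predicate rules; this only illustrates the bird/fly example above and is not the prover used by LCC.

```python
# Toy forward chaining: "All birds fly. Robins are birds. Thus, robins fly."
facts = {("bird", "robin")}            # bird(robin)
rules = [([("bird",)], ("fly",))]      # forall(X): bird(X) -> fly(X)

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            for _, arg in list(facts):
                # If every premise predicate holds for this argument,
                # add the conclusion for the same argument.
                if all((p[0], arg) in facts for p in premises):
                    derived = (conclusion[0], arg)
                    if derived not in facts:
                        facts.add(derived)
                        changed = True
    return facts

print(forward_chain(facts, rules))  # contains ('fly', 'robin')
```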

  14. Inference via a Logic Prover
  • The LCC system attempts inference to justify an answer
  • Its inference engine is a kind of funny middle ground between logic and pattern matching
  • But quite effective: 30% improvement
  • Q: When was the internal combustion engine invented?
  • A: The first internal-combustion engine was built in 1867.
    • invent -> create_mentally -> create -> build
  Adapted from slides by Manning, Harabagiu, Kushmerick, ISI

  15. COGEX
  • World knowledge from:
    • WordNet glosses converted to logic forms in the eXtended WordNet (XWN) project
    • Lexical chains
      • game:n#3 → HYPERNYM → recreation:n#1 → HYPONYM → sport:n#1
      • Argentine:a#1 → GLOSS → Argentina:n#1
    • NLP axioms to handle complex NPs, coordinations, appositions, equivalence classes for prepositions, etc.
    • Named-entity recognizer
      • John Galt → HUMAN
  • A relaxation mechanism is used to iteratively uncouple predicates and remove terms from logic forms; proofs are penalized based on the amount of relaxation involved.
  Adapted from slide by Surdeanu and Pasca

  16. Logic Inference Example
  • “How hot does the inside of an active volcano get?”
    • get(TEMPERATURE, inside(volcano(active)))
  • “lava fragments belched out of the mountain were as hot as 300 degrees Fahrenheit”
    • fragments(lava, TEMPERATURE(degrees(300)), belched(out, mountain))
  • volcano ISA mountain
  • lava ISPARTOF volcano -> lava inside volcano
  • fragments of lava HAVEPROPERTIESOF lava
  • The needed semantic information is in WordNet definitions, and was successfully translated into a form that was used for rough ‘proofs’
  Adapted from slides by Manning, Harabagiu, Kushmerick, ISI

  17. Axiom Creation: XWN Axioms
  • A major source of world knowledge is a general-purpose knowledge base of more than 50,000 parsed and disambiguated WordNet glosses that are transformed into logical form for use during the course of a proof.
  • Gloss: kill is to cause to die
  • Logical form: kill_VB_1(e1,x1,x2) -> cause_VB_1(e1,x1,x3) & to_TO(e1,e2) & die_VB_1(e2,x2,x4)
  Adapted from slide by Harabagiu and Narayanan

  18. Lexical Chains
  • Lexical chains provide an improved source of world knowledge by supplying the logic prover with much-needed axioms to link question keywords with answer concepts.
  • Question: How were biological agents acquired by bin Laden?
  • Answer: On 8 July 1998, the Italian newspaper Corriere della Serra indicated that members of The World Front for Fighting Jews and Crusaders, which was founded by Bin Laden, purchased three chemical and biological_agent production facilities in …
  • Lexical chain: ( v - buy#1, purchase#1 ) HYPERNYM ( v - get#1, acquire#1 )
  Adapted from slide by Harabagiu and Narayanan
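The lexical-chain link above can be checked directly against WordNet, for example with NLTK (sense numbers follow the slide and may shift across WordNet versions).

```python
from nltk.corpus import wordnet as wn

buy = wn.synset('buy.v.01')   # buy#1, purchase#1
get = wn.synset('get.v.01')   # get#1, acquire#1

# Is get/acquire a (transitive) hypernym of buy/purchase?
print(get in buy.closure(lambda s: s.hypernyms()))  # expected: True
```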

  19. Axiom Selection
  • Lexical chains and the XWN knowledge base work together to select and generate the axioms needed for a successful proof when not all of the keywords in the question are found in the answer.
  • Question: How did Adolf Hitler die?
  • Answer: … Adolf Hitler committed suicide …
  • The following lexical chain is detected:
    ( n - suicide#1, self-destruction#1, self-annihilation#1 ) GLOSS ( v - kill#1 ) GLOSS ( v - die#1, decease#1, perish#1, go#17, exit#3, pass_away#1, expire#2, pass#25 )
  • The following axioms are loaded into the prover:
    exists x2 all e1 x1 (suicide_nn(x1) -> act_nn(x1) & of_in(x1,e1) & kill_vb(e1,x2,x2)).
    exists x3 x4 all e2 x1 x2 (kill_vb(e2,x1,x2) -> cause_vb_2(e1,x1,x3) & to_to(e1,e2) & die_vb(e2,x2,x4)).
  Adapted from slide by Harabagiu and Narayanan

  20. LCC System References
  • The previous set of slides drew information from these sources:
    • Pasca and Harabagiu, The Informative Role of WordNet in Open-Domain Question Answering, NAACL 2001 Workshop on WordNet and Other Lexical Resources
    • Pasca and Harabagiu, High Performance Question/Answering, SIGIR 2001
    • Moldovan, Clark, Harabagiu, Maiorano, COGEX: A Logic Prover for Question Answering, HLT-NAACL 2003
    • Moldovan, Pasca, Harabagiu, and Surdeanu, Performance Issues and Error Analysis in an Open-Domain Question Answering System, ACM Trans. Inf. Syst. 21(2): 133-154 (2003)
    • Harabagiu and Maiorano, Abductive Processes for Answer Justification, AAAI Spring Symposium on Mining Answers from Texts and Knowledge Bases, 2002

  21. Incorporating Resources • For 29% of TREC questions the LCC QA system relied on an off-line taxonomy with semantic classes such as: • Disease • Drugs • Colors • Insects • Games

  22. Incorporating Resources • How well can we do just using existing resources? • Method: • Used the 2,393 TREC questions and answer keys • Determined if it would be possible for an algorithm to locate the target answer from the resource. • So a kind of upper bound. Lita L.V., Hunt W., Nyberg E. Resource Analysis for Question Answering. ACL 2004.

  23. Gazetteers • Resources: • CIA World Factbook • www.astronomy.com • www.50states.com • Method: • “Can be directly answered” • (Not explained further) • Results: • High precision, low recall Lita L.V., Hunt W., Nyberg E. Resource Analysis for Question Answering. ACL 2004.

  24. WordNet • Resource: • WordNet glosses, synonyms, hypernyms, hyponyms • Method: • Question terms and phrases extracted and looked up. • If answer key matched any of these WordNet resources, then considered found. • Thus, measuring an upper bound. • Results: • About 27% can in principle be answered from WN alone Lita L.V., Hunt W., Nyberg E. Resource Analysis for Question Answering. ACL 2004.
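A rough sketch of this kind of upper-bound check, assuming NLTK's WordNet; the paper does not spell out its exact matching procedure, so the details here are guesses.

```python
from nltk.corpus import wordnet as wn

def wordnet_covers(question_terms, answer_key):
    """Does the answer key appear among the glosses, synonyms, hypernyms,
    or hyponyms of any question term?"""
    answer = answer_key.lower()
    for term in question_terms:
        for syn in wn.synsets(term):
            related = [syn] + syn.hypernyms() + syn.hyponyms()
            texts = [syn.definition()] + [
                lemma.name().replace('_', ' ')
                for s in related for lemma in s.lemmas()
            ]
            if any(answer in t.lower() for t in texts):
                return True
    return False

print(wordnet_covers(["volcano"], "mountain"))  # hypernym match -> True
```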

  25. Definition Resources • Resources: • Wikipedia • Google’s define operator • Method: • Formulate a query from n-grams extracted from each question. • Results: • Encyclopedia most helpful • TREC-12 had fewer define q’s so less benefit. Lita L.V., Hunt W., Nyberg E. Resource Analysis for Question Answering. ACL 2004.

  26. Web Pages • Resource: • The Web • Method: • Questions tokenized and stopwords removed. • Keywords “used” (no further details) to retrieve 100 docs via Google API. • Results: • A relevant doc is found somewhere in the results for nearly 50% of the q’s. Lita L.V., Hunt W., Nyberg E. Resource Analysis for Question Answering. ACL 2004.

  27. Web N-grams • Resource: • The Web • Method: • Retrieved top 5, 10, 15, 20, 50, and 100 docs via web query (no details provided) via Google API • Extracted the most frequent 50 n-grams (up to trigrams) • (not clear if using full text or summaries only) • Results: • The correct answer is found in the top 50 n-grams more than 50% of the time. Lita L.V., Hunt W., Nyberg E. Resource Analysis for Question Answering. ACL 2004.
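The n-gram counting step can be sketched as follows (an assumption: the paper does not specify tokenization, stopword handling, or whether full text or snippets were used, so the toy documents and settings below are illustrative).

```python
import re
from collections import Counter

def top_ngrams(docs, max_n=3, k=50, stopwords=frozenset()):
    """Return the k most frequent 1- to max_n-grams across retrieved docs."""
    counts = Counter()
    for doc in docs:
        tokens = [t for t in re.findall(r"[a-z0-9]+", doc.lower())
                  if t not in stopwords]
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return [gram for gram, _ in counts.most_common(k)]

docs = ["The first internal-combustion engine was built in 1867.",
        "An internal combustion engine was built by 1867."]
print(top_ngrams(docs, k=10))  # the answer "1867" appears near the top
```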

  28. Using Machine Learning in QA • The following slides are based on: • Ramakrishnan, Chakrabarti, Paranjpe, Bhattacharyya, Is Question Answering an Acquired Skill? WWW’04

  29. Learning Answer Type Mapping • Idea: use machine learning techniques to automatically determine answer types and query terms from questions. • Two types of answer types: • Surface patterns • Infinite set, so can’t be covered by a lexicon • DATES NUMBERS PERSON NAMES LOCATIONS • “at DD:DD” “in the ‘DDs” “in DDDD” “Xx+ said” • Can also associate with synset [date#n#7] • WordNet synsets • Consider: “name an animal that sleeps upright” • Answer: “horse” • Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04

  30. Determining Answer Types
  • The hard ones are “what” and “which” questions.
  • Two useful heuristics:
    • If the head of the NP appearing before the auxiliary or main verb is not a wh-word, mark it as an a-type (answer-type) clue
    • Otherwise, the head of the NP appearing after the auxiliary/main verb is an a-type clue
  Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04

  31. Learning Answer Types
  • Given a QA pair (q, a):
    • (“name an animal that sleeps upright”, “horse”)
  • (1a) See which atype(s) “horse” can map to
  • (1b) Look up the hypernyms of “horse” -> S
  • (2a) Record the k words to the right of the q-word
    • an, animal, that
  • (2b) For each of these k words, look up their synsets
  • (2c) Increment the counts for those synsets that also appear in S
  • Do significance testing
    • Compare synset frequencies against a background set
    • Retain only those synsets that are significantly more associated with the question word than in general (chi-square)
  Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
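Steps (1b)–(2c) can be sketched with NLTK; the chi-square significance test against background synset frequencies is omitted here, and this is only an illustration of the counting idea, not the paper's implementation.

```python
from collections import Counter
from nltk.corpus import wordnet as wn

def hypernym_set(word):
    """Synsets of the word plus all of their hypernyms (step 1b)."""
    synsets = set(wn.synsets(word, pos=wn.NOUN))
    return synsets | {h for s in synsets
                        for h in s.closure(lambda x: x.hypernyms())}

def update_atype_counts(words_after_qword, answer, counts):
    S = hypernym_set(answer)                    # (1b)
    for w in words_after_qword:                 # (2a) the k words
        for syn in wn.synsets(w, pos=wn.NOUN):  # (2b)
            if syn in S:                        # (2c)
                counts[syn] += 1

counts = Counter()
update_atype_counts(["an", "animal", "that"], "horse", counts)
print(counts.most_common(5))  # animal.n.01 gets credit as a candidate atype
```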

  32. Learning Answer Types • Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04

  33. Learning to Choose Query Terms • Which words from the question to use in the query? • A tradeoff between precision and recall. • Example: • “Tokyo is the capital of which country?” • Want to use “Tokyo” verbatim • Probably “capital” as well • But maybe not “country”; maybe “nation” or maybe this word won’t appear in the retrieved passage at all. • Also, “country” corresponds to the answer type, so probably we don’t want to require it to be in the answer text. • Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04

  34. Learning to Choose Query Terms
  • Features:
    • POS assigned to the word and its immediate neighbors
    • Starts with an uppercase letter
    • Is a stopword
    • IDF score
    • Is an answer-type for this question
    • Ambiguity indicators:
      • # of possible WordNet senses (NumSense)
      • # of other WordNet synsets that describe this sense (NumLemma), e.g., for “buck”: stag, deer, doe
  • Learner:
    • J48 decision tree worked best
  Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
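A reduced sketch of this term classifier, with scikit-learn's decision tree standing in for Weka's J48 (C4.5); the feature values and labels below are toy data, and the POS-context, answer-type, and NumLemma features are omitted.

```python
from nltk.corpus import stopwords, wordnet as wn  # requires NLTK data
from sklearn.tree import DecisionTreeClassifier

STOP = set(stopwords.words('english'))

def term_features(word, idf):
    return [
        int(word[0].isupper()),     # starts with an uppercase letter
        int(word.lower() in STOP),  # is a stopword
        idf,                        # IDF score from the retrieval corpus
        len(wn.synsets(word)),      # NumSense: number of WordNet senses
    ]

# Toy training data: (word, idf, label); 1 = keep in the query, 0 = drop.
train = [("Tokyo", 7.2, 1), ("capital", 3.1, 1), ("country", 2.0, 0), ("the", 0.1, 0)]
X = [term_features(w, idf) for w, idf, _ in train]
y = [label for _, _, label in train]

# scikit-learn's CART tree stands in for J48.
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict([term_features("nation", 2.2)]))
```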

  35. Learning to Choose Query Terms • Results • WordNet ambiguity indicators were very helpful • Raised accuracy from 71-73% to 80% • The atype flag improved accuracy by 1-3%

  36. Learning to Score Passages
  • Given a question, an answer, and a passage (q, a, r):
    • Assign +1 if r contains a
    • Assign –1 otherwise
  • Features:
    • Do the selected terms s from q appear in r?
    • Does r have an answer zone a that does not overlap with s?
    • Are the distances between tokens in a and s small?
    • Does a have a strong WordNet similarity with q’s answer type?
  • Learner:
    • Logistic regression, since it produces a ranking rather than a hard classification into +1 or –1
    • Produces a continuous estimate between 0 and 1
  Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
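A minimal sketch of the scoring-and-ranking idea with scikit-learn's logistic regression; the four feature values per passage are made up for illustration and only loosely mirror the features listed above.

```python
from sklearn.linear_model import LogisticRegression

# Features per candidate passage: [selected terms present, answer zone
# disjoint from those terms, 1/(1 + min token distance), WordNet similarity
# of the zone to the answer type].
X_train = [
    [1, 1, 0.50, 0.70],   # passages that contain the answer
    [1, 1, 0.33, 0.60],
    [1, 0, 0.10, 0.05],   # passages that do not
    [0, 0, 0.05, 0.00],
]
y_train = [1, 1, -1, -1]

model = LogisticRegression().fit(X_train, y_train)

# predict_proba gives a continuous score in [0, 1]; passages are ranked by
# this score rather than by the hard +1/-1 label.
candidates = [[1, 1, 0.25, 0.55], [1, 0, 0.08, 0.10]]
scores = model.predict_proba(candidates)[:, 1]
print(sorted(zip(scores, candidates), reverse=True))
```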

  37. Learning to Score Passages • Results: • F-scores are low (.33 - .56) • However, reranking greatly improves the rank of the corresponding passages. • Eliminates many non-answers, pushing better passages towards the top.

  38. Learning to Score Passages • Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04

  39. Computing WordNet Similarity
  • Path-based similarity measures are not all that good in WordNet
    • 3 hops from entity to artifact
    • 3 hops from mammal to elephant
  • An alternative: given a target synset t and an answer synset a, measure the overlap of the nodes on the paths from t to all noun roots and from a to all noun roots
  • Algorithm for computing the similarity of t to a:
    • If t is not a hypernym of a: assign 0
    • Else collect the sets of hypernym synsets of t and a; call them Ht and Ha
    • Compute the Jaccard overlap: |Ht ∩ Ha| / |Ht ∪ Ha|
  Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04

  40. Computing WordNet Similarity
  • [Figure: WordNet noun hypernym chain entity → object → living thing → organism → animal → chordate → vertebrate → mammal → placental mammal → proboscidean → elephant, with Ht and Ha marked]
  • Similarity of t to a: |Ht ∩ Ha| / |Ht ∪ Ha|
    • Ht = mammal, Ha = elephant: 7/10 = .7
    • Ht = animal, Ha = elephant: 5/10 = .5
    • Ht = animal, Ha = mammal: 4/7 = .57
    • Ht = mammal, Ha = fox: 7/11 = .63
  Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04
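The measure can be sketched with NLTK as a Jaccard overlap of hypernym sets; exact values depend on the WordNet version and may differ slightly from the worked examples above.

```python
from nltk.corpus import wordnet as wn

def hypernym_jaccard(t, a):
    """Similarity of target synset t to answer synset a:
    0 if t is not a hypernym of a, else |Ht & Ha| / |Ht | Ha|."""
    hyper = lambda s: s.hypernyms() + s.instance_hypernyms()
    Ht = set(t.closure(hyper))   # all hypernyms (ancestors) of t
    Ha = set(a.closure(hyper))   # all hypernyms (ancestors) of a
    if t not in Ha:
        return 0.0
    return len(Ht & Ha) / len(Ht | Ha)

mammal = wn.synset('mammal.n.01')
elephant = wn.synset('elephant.n.01')
print(round(hypernym_jaccard(mammal, elephant), 2))  # close to the slide's .7
```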

  41. System Extension: Definition Questions
  • Definition questions ask about the definition or description of a concept:
    • Who is John Galt?
    • What is anorexia nervosa?
  • Many “information nuggets” are acceptable answers
    • Who is George W. Bush?
      • … George W. Bush, the 43rd President of the United States…
      • George W. Bush defeated Democratic incumbent Ann Richards to become the 46th Governor of the State of Texas…
  • Scoring
    • Any information nugget is acceptable
    • Precision score over all information nuggets
  Adapted from slide by Surdeanu and Pasca

  42. Definition Detection with Pattern Matching
  • Question patterns:
    • What <be> a <QP> ?
    • Who <be> <QP> ?
    • example: “Who is Zebulon Pike?”
  • Answer patterns:
    • <QP>, the <AP>
    • <QP> (a <AP>)
    • <AP HumanConcept> <QP>
    • example: “explorer Zebulon Pike”
  Adapted from slide by Surdeanu and Pasca
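A rough sketch of the pattern idea with regular expressions; the patterns below are simplified stand-ins for the slide's <QP>/<AP> templates, not the system's actual patterns.

```python
import re

# Question patterns: "What <be> a <QP>?" / "Who <be> <QP>?"
QUESTION_PAT = re.compile(
    r"^(?:what|who)\s+(?:is|are|was|were)\s+(?:an?\s+)?(.+?)\s*\?$", re.I)

def answer_patterns(qp):
    qp = re.escape(qp)
    return [
        re.compile(rf"{qp},\s+(?:an?\s+|the\s+)?([^,.]+)", re.I),  # <QP>, the <AP>
        re.compile(rf"{qp}\s+\(\s*an?\s+([^)]+)\)", re.I),         # <QP> (a <AP>)
        re.compile(rf"(\w+)\s+{qp}", re.I),                        # <AP> <QP>
    ]

def find_definition(question, sentence):
    m = QUESTION_PAT.match(question.strip())
    if not m:
        return None
    for pat in answer_patterns(m.group(1)):
        hit = pat.search(sentence)
        if hit:
            return hit.group(1).strip()
    return None

print(find_definition("Who is Zebulon Pike?",
                      "The peak is named for explorer Zebulon Pike."))  # explorer
```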

  43. Answer Detection with Concept Expansion • Enhancement for Definition questions • Identify terms that are semantically related to the phrase to define • Use WordNet hypernyms (more general concepts) Adapted from slide by Surdeanu and Pasca

  44. Online QA Examples
  • Examples (none work very well):
    • AnswerBus: http://www.answerbus.com
    • Ionaut: http://www.ionaut.com:8400/
    • LCC: http://www.languagecomputer.com/demos/question_answering/index.html
  Adapted from slides by Manning, Harabagiu, Kushmerick, and ISI

  45. What about Complex Questions?
