Answering questions by computer

PowerPoint Slideshow about ' Answering Questions by Computer' - patrick-mullins

Presentation Transcript

Terminology – Question Type

  • Question Type: an idiomatic categorization of questions for purposes of distinguishing between different processing strategies and/or answer formats

    • E.g. TREC2003

      • FACTOID: “How far is it from Earth to Mars?”

      • LIST: “List the names of chewing gums”

      • DEFINITION: “Who is Vlad the Impaler?”

    • Other possibilities:

      • RELATIONSHIP: “What is the connection between Valentina Tereshkova and Sally Ride?”

      • SUPERLATIVE: “What is the largest city on Earth?”

      • YES-NO: “Is Saddam Hussein alive?”

      • OPINION: “What do most Americans think of gun control?”

      • CAUSE&EFFECT: “Why did Iraq invade Kuwait?”

Terminology – Answer Type

  • Answer Type: the class of object (or rhetorical type of sentence) sought by the question. E.g.

    • PERSON (from “Who …”)

    • PLACE (from “Where …”)

    • DATE (from “When …”)

    • NUMBER (from “How many …”)

    • but also

    • EXPLANATION (from “Why …”)

    • METHOD (from “How …”)

  • Answer types are usually tied intimately to the classes recognized by the system’s Named Entity Recognizer.

Terminology – Question Focus

  • Question Focus: The property or entity that is being sought by the question.

  • E.g.

    • “In what state is the Grand Canyon?” (focus: state)

    • “What is the population of Bulgaria?” (focus: population)

    • “What colour is a pomegranate?” (focus: colour)

Terminology – Question Topic

  • Question Topic: the object (person, place, …) or event that the question is about. The question might well be about a property of the topic, which will be the question focus.

  • E.g. “What is the height of Mt. Everest?”

    • height is the focus

    • Mt. Everest is the topic

Terminology – Candidate Passage

  • Candidate Passage: a text passage (anything from a single sentence to a whole document) retrieved by a search engine in response to a question.

  • Depending on the query and kind of index used, there may or may not be a guarantee that a candidate passage has any candidate answers.

  • Candidate passages will usually have associated scores, from the search engine.

Terminology – Candidate Answer

  • Candidate Answer: in the context of a question, a small quantity of text (anything from a single word to a sentence or bigger, but usually a noun phrase) that is of the same type as the Answer Type.

    • In some systems, the type match may be approximate, if there is the concept of confusability.

  • Candidate answers are found in candidate passages

  • E.g.

    • 50

    • Queen Elizabeth II

    • September 8, 2003

    • by baking a mixture of flour and water

Terminology – Authority List

  • Authority List (or File): a collection of instances of a class of interest, used to test a term for class membership.

  • Instances should be derived from an authoritative source and be as close to complete as possible.

  • Ideally, the class is small, easily enumerated, and its members have a limited number of lexical forms.

  • Good:

    • Days of week

    • Planets

    • Elements

  • Good statistically, but difficult to get 100% recall:

    • Animals

    • Plants

    • Colours

  • Problematic

    • People

    • Organizations

  • Impossible

    • All numeric quantities

    • Explanations and other clausal quantities

Essence of Text-based QA

(Single source answers)

  • Need to find a passage that answers the question.

    • Find a candidate passage (search)

    • Check that semantics of passage and question match

    • Extract the answer

Ranking Candidate Answers

Q066: Name the first private citizen to fly in space.

  • Answer type: Person

  • Text passage:

    “Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in ‘Raiders of the Lost Ark’, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...”

Answer Extraction

  • Also called Answer Selection/Pinpointing

  • Given a question and candidate passages, the process of selecting and ranking candidate answers.

  • Usually, candidate answers are those terms in the passages which have the same answer type as that generated from the question

  • Ranking the candidate answers depends on assessing how well the passage context relates to the question

  • 3 Approaches:

    • Heuristic features

    • Shallow parse fragments

    • Logical proof

Features for Answer Ranking

  • Number of question terms matched in the answer passage

  • Number of question terms matched in the same phrase as the candidate answer

  • Number of question terms matched in the same sentence as the candidate answer

  • Flag set to 1 if the candidate answer is followed by a punctuation sign

  • Number of question terms matched, separated from the candidate answer by at most three words and one comma

  • Number of terms occurring in the same order in the answer passage as in the question

  • Average distance from candidate answer to question term matches
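To make the features above concrete, here is a minimal sketch that computes four of them for a single candidate. It is an illustration under our own assumptions, not any system's actual scorer. It assumes the passage is already lowercased and tokenized with punctuation as separate tokens, and that the candidate is given as a token span. "Same order" is read as the longest common subsequence of question terms. The function name `answer_features` is hypothetical.

```python
from typing import Dict, List, Optional


def answer_features(question_terms: List[str], passage_tokens: List[str],
                    cand_start: int, cand_end: int) -> Dict[str, Optional[float]]:
    """Compute a few heuristic ranking features for one candidate answer.

    passage_tokens: lowercased passage tokens (punctuation as separate tokens).
    cand_start, cand_end: token span [cand_start, cand_end) of the candidate.
    """
    q = [t.lower() for t in question_terms]
    qset = set(q)
    # Positions of question-term matches outside the candidate span.
    match_pos = [i for i, tok in enumerate(passage_tokens)
                 if tok in qset and not cand_start <= i < cand_end]

    # Number of distinct question terms matched in the passage.
    n_matched = len({passage_tokens[i] for i in match_pos})

    # Flag set to 1 if the candidate is followed by a punctuation sign.
    punct_flag = int(cand_end < len(passage_tokens)
                     and passage_tokens[cand_end] in {",", ".", ";", ":", "!", "?"})

    # Average distance (in tokens) from the candidate to each term match.
    def dist(i: int) -> int:
        return cand_start - i if i < cand_start else i - (cand_end - 1)
    avg_dist = sum(map(dist, match_pos)) / len(match_pos) if match_pos else None

    # Terms in the same order as in the question: longest common subsequence.
    seq = [passage_tokens[i] for i in match_pos]
    lcs = [[0] * (len(seq) + 1) for _ in range(len(q) + 1)]
    for a in range(len(q)):
        for b in range(len(seq)):
            lcs[a + 1][b + 1] = (lcs[a][b] + 1 if q[a] == seq[b]
                                 else max(lcs[a][b + 1], lcs[a + 1][b]))

    return {"n_matched": n_matched, "punct_flag": punct_flag,
            "avg_dist": avg_dist, "same_order": lcs[len(q)][len(seq)]}


tokens = ("among them was christa mcauliffe , the first private citizen "
          "to fly in space .").split()
print(answer_features(["first", "private", "citizen", "fly", "space"], tokens, 3, 5))
# {'n_matched': 5, 'punct_flag': 1, 'avg_dist': 5.6, 'same_order': 5}
```

On the Q066 passage, with "Christa McAuliffe" as the candidate, all five question terms match in question order, and the candidate is followed by a comma.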


Heuristics for Answer Ranking in the Lasso System

  • Same_Word_Sequence_score – number of words from the question that are recognized in the same sequence in the passage.

  • Punctuation_sign_score – a flag set to 1 if the candidate answer is followed by a punctuation sign

  • Comma_3_word_score – measures the number of question words that follow the candidate, if the candidate is followed by a comma.

  • Same_parse_subtree_score – number of question words found in the parse sub-tree of the answer

  • Same_sentence_score – number of question words found in the answer’s sentence.

  • Distance_score – adds the distance (measured in number of words) between the answer candidate and the other keywords in the window.


  • Evaluation of this kind of system is usually based on some kind of TREC-like metric.

  • In Q/A the most frequent metric is

    • Mean reciprocal rank (MRR)

      You’re allowed to return N answers. Your score for a question is 1/rank of the first correct answer, or 0 if none of the N answers is correct.

      The scores are averaged over all the questions.

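Here is a minimal sketch of the MRR computation described above. It assumes that each question's ranked answers and its set of acceptable answer strings are already available. It also assumes that exact string match decides correctness, whereas TREC judging was done by human assessors. The cutoff `n=5` is an assumption.

```python
from typing import Sequence, Set


def mean_reciprocal_rank(ranked_answers: Sequence[Sequence[str]],
                         gold: Sequence[Set[str]], n: int = 5) -> float:
    """Average over all questions of 1/rank of the first correct answer
    among the top n, counting 0 when none of the top n is correct."""
    total = 0.0
    for answers, correct in zip(ranked_answers, gold):
        for rank, answer in enumerate(answers[:n], start=1):
            if answer in correct:
                total += 1.0 / rank
                break
    return total / len(gold)


ranked = [["Paris", "Lyon"], ["1755", "1756"], ["Disney", "Carl Banks"]]
gold = [{"Paris"}, {"1756"}, {"Charles Dickens", "Dickens"}]
print(mean_reciprocal_rank(ranked, gold))  # (1 + 1/2 + 0) / 3 = 0.5
```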
Answer Types and Modifiers

Name 5 French Cities

  • Most likely there is no type for “French Cities”

  • So will look for CITY

    • include “French/France” in bag of words, and hope for the best

    • include “French/France” in bag of words, retrieve documents, and look for evidence (deep parsing, logic)

    • use high-precision Language Identification on results

    • If you have a list of French cities, could either

      • Filter results by list

      • Use Answer-Based QA (see later)

    • Use longitude/latitude information of cities and countries

Answer Types and Modifiers

Name a female figure skater

  • Most likely there is no type for “female figure skater”

  • Most likely there is no type for “figure skater”

  • Look for PERSON, with query terms {figure, skater}

  • What to do about “female”? Two approaches.

    • Include “female” in the bag-of-words.

      • Relies on logic that if “femaleness” is an interesting property, it might well be mentioned in answer passages.

      • Does not apply to, say “singer”.

    • Leave out “female” but test candidate answers for gender.

      • Needs either an authority file or a heuristic test.

      • Test may not be definitive.

Part II - Specific Approaches

  • By Genre

    • Statistical QA

    • Pattern-based QA

    • Web-based QA

    • Answer-based QA (TREC only)

  • By System

    • SMU

    • LCC

    • USC-ISI

    • Insight

    • Microsoft

    • IBM Statistical

    • IBM Rule-based

Statistical QA

  • Use statistical distributions to model likelihoods of answer type and answer

  • E.g. IBM (Ittycheriah, 2001) – see later section

Pattern-based QA

  • For a given question type, identify the typical syntactic constructions used in text to express answers to such questions

  • Typically very high precision, but a lot of work to get decent recall

Web-Based QA

  • Exhaustive string transformations

    • Brill et al. 2002

  • Learning

    • Radev et al. 2001

Answer-Based QA

  • Problem: Sometimes it is very easy to find an answer to a question using resource A, but the task demands that you find it in resource B.

  • Solution: First find the answer in resource A, then locate the same answer, along with original question terms, in resource B.

  • Artificial problem, but real for TREC participants.

Answer-Based QA

  • Web-Based solution:

When a QA system looks for answers within a relatively small textual collection, the chance of finding strings/sentences that closely match the question string is small. However, when a QA system looks for strings/sentences that closely match the question string on the web, the chance of finding correct answer is much higher.

Hermjakob et al. 2002

  • Why this is true:

    • The Web is much larger than the TREC Corpus (3,000 : 1)

    • TREC questions are generated from Web logs, and the style of language (and subjects of interest) in these logs is more similar to the Web content than to newswire collections.

Answer-Based QA

  • Database/Knowledge-base/Ontology solution:

    • When question syntax is simple and reliably recognizable, can express as a logical form

    • Logical form represents entire semantics of question, and can be used to access structured resource:

      • WordNet

      • On-line dictionaries

      • Tables of facts & figures

      • Knowledge-bases such as Cyc

    • Having found answer

      • construct a query with original question terms + answer

      • Retrieve passages

      • Tell Answer Extraction the answer it is looking for

Approaches of Specific Systems

  • SMU Falcon

  • LCC

  • USC-ISI

  • Insight

  • Microsoft

  • IBM

Note: Some of the slides and/or examples in these sections are taken from papers or presentations from the respective system authors

SMU Falcon

Harabagiu et al. 2000

SMU Falcon

  • From question, dependency structure called question semantic form is created

  • Query is Boolean conjunction of terms

  • From answer passages that contain at least one instance of answer type, generate answer semantic form

  • 3 processing loops:

  • Loop 1

    • Triggered when too few or too many passages are retrieved from search engine

  • Loop 2

    • Triggered when question semantic form and answer semantic form cannot be unified

  • Loop 3

    • Triggered when unable to perform abductive proof of answer correctness

SMU Falcon

  • Loops provide opportunities to perform alternations

  • Loop 1: morphological expansions and nominalizations

  • Loop 2: lexical alternations – synonyms, direct hypernyms and hyponyms

  • Loop 3: paraphrases

  • Evaluation (Pasca & Harabagiu, 2001). Increase in accuracy in 50-byte task in TREC9

    • Loop 1: 40%

    • Loop 2: 52%

    • Loop 3: 8%

    • Combined: 76%

LCC

  • Moldovan & Rus, 2001

  • Uses Logic Prover for answer justification

    • Question logical form

    • Candidate answers in logical form

    • XWN glosses

    • Linguistic axioms

    • Lexical chains

  • Inference engine attempts to verify answer by negating question and proving a contradiction

  • If proof fails, predicates in question are gradually relaxed until proof succeeds or associated proof score is below a threshold.

LCC: Lexical Chains

Q:1518 What year did Marco Polo travel to Asia?

Answer: Marco Polo divulged the truth after returning in 1292 from his travels, which included several months on Sumatra

Lexical Chains:

(1) travel_to:v#1 -> GLOSS -> travel:v#1 -> RGLOSS -> travel:n#1

(2) travel_to:v#1 -> GLOSS -> travel:v#1 -> HYPONYM -> return:v#1

(3) Sumatra:n#1 -> ISPART -> Indonesia:n#1 -> ISPART -> Southeast_Asia:n#1 -> ISPART -> Asia:n#1

Q:1570 What is the legal age to vote in Argentina?

Answer: Voting is mandatory for all Argentines aged over 18.

Lexical Chains:

(1) legal:a#1 -> GLOSS -> rule:n#1 -> RGLOSS -> mandatory:a#1

(2) age:n#1 -> RGLOSS -> aged:a#3

(3) Argentine:a#1 -> GLOSS -> Argentina:n#1

LCC: Logic Prover

  • Question

    • Which company created the Internet Browser Mosaic?

    • QLF: (_organization_AT(x2) ) & company_NN(x2) & create_VB(e1,x2,x6) & Internet_NN(x3) & browser_NN(x4) & Mosaic_NN(x5) & nn_NNC(x6,x3,x4,x5)

  • Answer passage

    • ... Mosaic , developed by the National Center for Supercomputing Applications ( NCSA ) at the University of Illinois at Urbana - Champaign ...

    • ALF: ... Mosaic_NN(x2) & develop_VB(e2,x2,x31) & by_IN(e2,x8) & National_NN(x3) & Center_NN(x4) & for_NN(x5) & Supercomputing_NN(x6) & application_NN(x7) & nn_NNC(x8,x3,x4,x5,x6,x7) & NCSA_NN(x9) & at_IN(e2,x15) & University_NN(x10) & of_NN(x11) & Illinois_NN(x12) & at_NN(x13) & Urbana_NN(x14) & nn_NNC(x15,x10,x11,x12,x13,x14) & Champaign_NN(x16) ...

  • Lexical chains: develop <-> make and make <-> create

    • exists x2 x3 x4 all e2 x1 x7 (develop_vb(e2,x7,x1) <-> make_vb(e2,x7,x1) & something_nn(x1) & new_jj(x1) & such_jj(x1) & product_nn(x2) & or_cc(x4,x1,x3) & mental_jj(x3) & artistic_jj(x3) & creation_nn(x3)).

    • all e1 x1 x2 (make_vb(e1,x1,x2) <-> create_vb(e1,x1,x2) & manufacture_vb(e1,x1,x2) & man-made_jj(x2) & product_nn(x2)).

  • Linguistic axioms

    • all x0 (mosaic_nn(x0) -> internet_nn(x0) & browser_nn(x0))

USC-ISI

  • TextMap system

    • Ravichandran and Hovy, 2002

    • Hermjakob et al. 2003

  • Use of Surface Text Patterns

  • “When was X born?” ->

    • Mozart was born in 1756

    • Gandhi (1869-1948)

  • Can be captured in expressions

    • <NAME> was born in <BIRTHDATE>

    • <NAME> (<BIRTHDATE> -

  • These patterns can be learned

USC-ISI TextMap

  • Use bootstrapping to learn patterns.

  • For an identified question type (“When was X born?”), start with known answers for some values of X

    • Mozart 1756

    • Gandhi 1869

    • Newton 1642

  • Issue Web search engine queries (e.g. “+Mozart +1756” )

  • Collect top 1000 documents

  • Filter, tokenize, smooth etc.

  • Use suffix tree constructor to find best substrings, e.g.

    • Mozart (1756-1791)

  • Filter

    • Mozart (1756-

  • Replace query strings with e.g. <NAME> and <ANSWER>

  • Determine precision of each pattern

    • Find documents with just question term (Mozart)

    • Apply patterns and calculate precision
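The precision step can be sketched as follows. This is an illustration, not TextMap's code. It assumes whitespace-tokenized sentences, so that parentheses are separate tokens as in this deck's pattern notation, and it lets `<ANSWER>` match a single token. Precision is computed as the fraction of pattern matches whose extracted answer equals the known answer. The helper names `compile_template` and `pattern_precision` are hypothetical.

```python
import re
from typing import List


def compile_template(template: str, name: str) -> "re.Pattern[str]":
    """Turn e.g. '<NAME> ( <ANSWER> -' into a regex for a given question term."""
    out = []
    for part in re.split(r"(<NAME>|<ANSWER>)", template):
        if part == "<NAME>":
            out.append(re.escape(name))
        elif part == "<ANSWER>":
            out.append(r"(\S+)")          # assumption: the answer is one token
        else:
            out.append(re.escape(part))   # literal text, spaces included
    return re.compile("".join(out))


def pattern_precision(template: str, name: str, answer: str,
                      sentences: List[str]) -> float:
    """Fraction of pattern matches (in documents retrieved with only the
    question term) whose extracted <ANSWER> is the known correct answer."""
    regex = compile_template(template, name)
    matched = correct = 0
    for sentence in sentences:
        for m in regex.finditer(sentence):
            matched += 1
            if m.group(1) == answer:
                correct += 1
    return correct / matched if matched else 0.0


docs = ["Mozart ( 1756 - 1791 ) was a genius",
        "the Mozart ( K331 - rondo ) is popular",
        "Mozart was born in Salzburg"]
print(pattern_precision("<NAME> ( <ANSWER> -", "Mozart", "1756", docs))  # 0.5
```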

USC-ISI TextMap (cont.)

  • Finding Answers

    • Determine Question type

    • Perform IR Query

    • Do sentence segmentation and smoothing

    • Replace question term by question tag

      • i.e. replace Mozart with <NAME>

    • Search for instances of patterns associated with question type

    • Select words matching <ANSWER>

    • Assign scores according to precision of pattern
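The answer-finding steps above might look like this in outline. The sketch assumes that the question type (BIRTHDATE) and the retrieved sentences are given, that sentences are whitespace-tokenized, and that `<ANSWER>` matches one token. An answer's score is the sum of the precisions of the patterns that extracted it; summing is our choice, and taking the maximum would be another. The precision values come from the BIRTHDATE table later in this deck. The names `template_regex` and `find_answers` are hypothetical.

```python
import re
from collections import defaultdict
from typing import List, Tuple

# (precision, template) pairs; values from the BIRTHDATE precision table.
BIRTHDATE_PATTERNS = [
    (1.0, "<NAME> ( <ANSWER> - )"),
    (0.85, "<NAME> was born on <ANSWER> ,"),
    (0.6, "<NAME> was born in <ANSWER>"),
    (0.36, "<NAME> ( <ANSWER> -"),
]


def template_regex(template: str) -> "re.Pattern[str]":
    """<NAME> stays a literal tag; <ANSWER> captures one token."""
    parts = re.split(r"(<ANSWER>)", template)
    return re.compile("".join(r"(\S+)" if p == "<ANSWER>" else re.escape(p)
                              for p in parts))


def find_answers(question_term: str, sentences: List[str],
                 patterns=BIRTHDATE_PATTERNS) -> List[Tuple[str, float]]:
    scores = defaultdict(float)
    for sentence in sentences:
        # Replace the question term by the question tag, e.g. Mozart -> <NAME>.
        tagged = sentence.replace(question_term, "<NAME>")
        for precision, template in patterns:
            for m in template_regex(template).finditer(tagged):
                scores[m.group(1)] += precision   # words matching <ANSWER>
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


sentences = ["Mozart was born in 1756 in Salzburg",
             "Mozart ( 1756 - 1791 ) was a genius",
             "Mozart was born in Salzburg"]
print(find_answers("Mozart", sentences))
```

In this example, "1756" outranks "Salzburg" because two different patterns extract it.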


Insight

  • Soubbotin, 2002. Soubbotin & Soubbotin, 2003.

  • Performed very well in TREC10/11

  • Comprehensive and systematic use of “Indicative patterns”

  • E.g.

    • cap word; paren; 4 digits; dash; 4 digits; paren


    • Mozart (1756-1791)

  • The patterns are broader than named entities

  • “Semantics in syntax”

  • Patterns have intrinsic scores (reliability), independent of question

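The indicative pattern "cap word; paren; 4 digits; dash; 4 digits; paren" translates naturally into a regular expression. The rendering below is our own sketch, not Soubbotin's implementation. It accepts only an ASCII hyphen and a single capitalized word. The name `life_spans` is hypothetical.

```python
import re
from typing import List, Tuple

# cap word; paren; 4 digits; dash; 4 digits; paren
LIFE_SPAN = re.compile(r"\b([A-Z][a-z]+)\s*\((\d{4})\s*-\s*(\d{4})\)")


def life_spans(text: str) -> List[Tuple[str, str, str]]:
    """Return (name, first year, second year) for each match in text."""
    return [m.groups() for m in LIFE_SPAN.finditer(text)]


print(life_spans("The great composer Mozart (1756-1791) achieved fame"))
# [('Mozart', '1756', '1791')]
```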

Insight (cont.)

  • Patterns with more sophisticated internal structure are more indicative of answer

  • 2/3 of their correct entries in TREC10 were answered by patterns

  • E.g.

    • a == {countries}

    • b == {official posts}

    • w == {proper names (first and last)}

    • e == {titles or honorifics}

    • Patterns for “Who is the President (Prime Minister) of a given country?”

      • abeww

      • ewwdb,a

      • b,aeww

  • Definition questions: (A is primary query term, X is answer)

    • <A; comma; [a/an/the]; X; [comma/period]>

    • For: “Moulin Rouge, a cabaret”

    • <X; [comma]; [also] called; A [comma]>

    • For: “naturally occurring gas called methane”

    • <A; is/are; [a/an/the]; X>

    • For: “Michigan’s state flower is the apple blossom”
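The three definition patterns above can be approximated with regular expressions. This is a shallow sketch under our own assumptions: X runs up to the next comma or period, and the optional article is dropped. The boundary detection is crude, as befits the shallow approach. The function name `definition_candidates` is hypothetical.

```python
import re
from typing import List


def definition_candidates(term: str, text: str) -> List[str]:
    """Extract candidate definitions X of the primary query term A."""
    a = re.escape(term)
    art = r"(?:\b(?:a|an|the)\s+)?"   # optional article, not captured
    patterns = [
        # <A; comma; [a/an/the]; X; [comma/period]>
        rf"{a},\s+{art}([^,.]+)[,.]",
        # <X; [comma]; [also] called; A [comma]>
        rf"{art}([^,.]+?),?\s+(?:also\s+)?called\s+{a}\b",
        # <A; is/are; [a/an/the]; X>
        rf"{a}\s+(?:is|are)\s+{art}([^,.]+)",
    ]
    return [m.group(1).strip() for p in patterns for m in re.finditer(p, text)]


print(definition_candidates("Moulin Rouge", "The Moulin Rouge, a cabaret, opened in 1889."))
# ['cabaret']
```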


Insight (cont.)

  • Emphasis on shallow techniques, lack of NLP

  • Look in vicinity of text string potentially matching pattern for “zeroing” – e.g. for occupational roles:

    • Former

    • Elect

    • Deputy

    • Negation

  • Comments:

    • Relies on redundancy of large corpus

    • Works for factoid question types of TREC-QA – not clear how it extends

    • Not clear how they match questions to patterns

    • Named entities within patterns have to be recognized


Microsoft AskMSR

  • Data-Intensive QA. Brill et al. 2002

  • “Overcoming the surface string mismatch between the question formulation and the string containing the answer”

  • Approach based on the assumption/intuition that someone on the Web has answered the question in the same way it was asked.

  • Want to avoid dealing with:

    • Lexical, syntactic, semantic relationships (bet. Q & A)

    • Anaphora resolution

    • Synonymy

    • Alternate syntax

    • Indirect answers

  • Take advantage of redundancy on Web, then project to TREC corpus (Answer-based QA)

Microsoft AskMSR

  • Formulate multiple queries – each rewrite has intrinsic score. E.g. for “What is relative humidity?”

    • [“+is relative humidity”, LEFT, 5]

    • [“relative +is humidity”, RIGHT, 5]

    • [“relative humidity +is”, RIGHT, 5]

    • [“relative humidity”, NULL, 2]

    • [“relative” AND “humidity”, NULL, 1]

  • Get top 100 documents from Google

  • Extract n-grams from document summaries

  • Score n-grams by summing the scores of the rewrites they came from

  • Use tiling to merge n-grams

  • Search for supporting documents in TREC corpus

Microsoft AskMSR

  • Question is: “What is the rainiest place on Earth?”

  • Answer from Web is: “Mount Waialeale”

  • Passage in TREC corpus is: “… In misty Seattle, Wash., last year, 32 inches of rain fell. Hong Kong gets about 80 inches a year, and even Pago Pago, noted for its prodigious showers, gets only about 196 inches annually. (The titleholder, according to the National Geographic Society, is Mount Waialeale in Hawaii, where about 460 inches of rain falls each year.) …”

  • Very difficult to imagine getting this passage by other means

IBM Statistical QA (Ittycheriah, 2001)

q = question

a = answer

c = “correctness”

e = answer type

$$p(c \mid q,a) = \sum_e p(c,e \mid q,a) = \sum_e p(c \mid e,q,a)\, p(e \mid q,a)$$

  • ATM predicts, from the question and a proposed answer, the answer type they both satisfy

  • Given a question, an answer, and the predicted answer type, ASM seeks to model the correctness of this configuration.

  • Distributions are modelled using a maximum entropy formulation

  • Training data = human judgments

    • For ATM, 13K questions annotated with 31 categories

    • For ASM, ~ 5K questions from TREC plus trivia

p(e|q,a) is the answer type model (ATM)

p(c|e,q,a) is the answer selection model (ASM)

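The decomposition above can be made concrete. Given the ATM's distribution over answer types and the ASM's correctness probability for each type, the probability that a candidate is correct is a weighted sum over types. The probabilities below are invented for illustration; in the IBM system both models are maximum entropy models trained on human judgments. The names `p_correct` and `rank` are hypothetical.

```python
from typing import Dict, List, Tuple


def p_correct(p_type: Dict[str, float], p_correct_given_type: Dict[str, float]) -> float:
    """p(c|q,a) = sum over e of p(c|e,q,a) * p(e|q,a), for one (question, answer) pair.

    p_type:               e -> p(e|q,a)     (answer type model, ATM)
    p_correct_given_type: e -> p(c|e,q,a)   (answer selection model, ASM)
    """
    return sum(p_e * p_correct_given_type.get(e, 0.0) for e, p_e in p_type.items())


def rank(candidates: Dict[str, Tuple[Dict[str, float], Dict[str, float]]]) -> List[str]:
    """Order candidate answers by their marginal probability of correctness."""
    return sorted(candidates, key=lambda a: p_correct(*candidates[a]), reverse=True)


# Invented numbers for "Who invented baseball?"
cands = {
    "Abner Graves": ({"PERSON": 0.8, "ORGANIZATION": 0.2}, {"PERSON": 0.3, "ORGANIZATION": 0.1}),
    "Doubleday": ({"PERSON": 0.9, "ORGANIZATION": 0.1}, {"PERSON": 0.8, "ORGANIZATION": 0.1}),
}
print(rank(cands))  # ['Doubleday', 'Abner Graves']
```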
IBM Statistical QA (Ittycheriah)

  • Question Analysis (by ATM)

    • Selects one out of 31 categories

  • Search

    • Question expanded by Local Context Analysis

    • Top 1000 documents retrieved

  • Passage Extraction: Top 100 passages that:

    • Maximize question word match

    • Have desired answer type

    • Minimize dispersion of question words

    • Have similar syntactic structure to question

  • Answer Extraction:

    • Candidate answers ranked using ASM

IBM Rule-based

Predictive Annotation (Prager 2000, Prager 2003)

  • Want to make sure passages retrieved by search engine have at least one candidate answer

  • Recognize that candidate answer is of correct answer type which corresponds to a label (or several) generated by Named Entity Recognizer

  • Annotate entire corpus and index semantic labels along with text

  • Identify answer types in questions and include corresponding labels in queries

IBM PIQUANT

Predictive Annotation

  • E.g.: Question is “Who invented baseball?”

  • “Who” can map to PERSON$ or ORGANIZATION$

  • Suppose we assume only people invent things (it doesn’t really matter).

  • So “Who invented baseball?” -> {PERSON$ invent baseball}

Consider the text “… but its conclusion was based largely on the recollections of a man named Abner Graves, an elderly mining engineer, who reported that baseball had been ‘invented’ by Doubleday between 1839 and 1841.”

IBM PIQUANT (cont.)

Predictive Annotation (cont.)

  • Previous example

    • “Who invented baseball?” -> {PERSON$ invent baseball}

  • However, same structure is equally effective at answering

    • “What sport did Doubleday invent?”->{SPORT$ invent Doubleday}

IBM Rule-Based

Handling Subsumption & Disjunction

  • If an entity is of a type which has a parent type, then how is annotation done?

  • If a proposed answer type has a parent type, then what answer type should be used?

  • If an entity is ambiguous then what should the annotation be?

  • If the answer type is ambiguous, then what should be used?


Subsumption & Disjunction

  • Consider New York City – both a CITY and a PLACE

    • To answer “Where did John Lennon die?”, it needs to be a PLACE

    • To answer “In what city is the Empire State Building?”, it needs to be a CITY.

    • Do NOT want to do subsumption calculation in search engine

  • Two scenarios

    1. Expand Answer Type and use most specific entity annotation

    1A { (CITY PLACE) John_Lennon die} matches CITY

    1B {CITY Empire_State_Building} matches CITY


    2. Use most specific Answer Type and multiple annotations of NYC

    2A {PLACE John_Lennon die} matches (CITY PLACE)

    2B {CITY Empire_State_Building} matches (CITY PLACE)

  • Case 2 preferred for simplicity, because disjunction in #1 should contain all hyponyms of PLACE, while disjunction in #2 should contain all hypernyms of CITY

  • Choice #2 suggests can use disjunction in answer type to represent ambiguity:

    • “Who invented the laser?” -> {(PERSON ORGANIZATION) invent laser}

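Scenario 2 can be sketched with a toy type hierarchy. Each entity is indexed under its own type and all of that type's hypernyms. The query carries only the most specific answer type, or a disjunction of types when the answer type is ambiguous. The hierarchy is a made-up fragment. The names `PARENT`, `index_labels` and `matches` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# Made-up fragment of a type hierarchy: type -> parent type (None at the root).
PARENT: Dict[str, Optional[str]] = {"CITY": "PLACE", "COUNTRY": "PLACE", "PLACE": None,
                                    "PERSON": None, "ORGANIZATION": None}


def index_labels(entity_type: str) -> List[str]:
    """Scenario 2: annotate an entity with its type and every hypernym."""
    labels: List[str] = []
    t: Optional[str] = entity_type
    while t is not None:
        labels.append(t)
        t = PARENT[t]
    return labels


def matches(answer_types: Iterable[str], entity_type: str) -> bool:
    """A (possibly disjunctive) answer type matches if any disjunct is one of
    the entity's indexed labels."""
    return not set(answer_types).isdisjoint(index_labels(entity_type))


print(index_labels("CITY"))                                  # ['CITY', 'PLACE']
print(matches(["PLACE"], "CITY"))                            # 2A: True
print(matches(["CITY"], "CITY"))                             # 2B: True
print(matches(["PERSON", "ORGANIZATION"], "ORGANIZATION"))   # True
```

Only the hypernyms of an entity's type are stored. This mirrors the slide's argument that the hypernym chain of CITY is much shorter than the full set of hyponyms of PLACE.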
Clausal classes

  • Any structure that can be recognized in text can be annotated.

    • Quotations

    • Explanations

    • Methods

    • Opinions

  • Any semantic class label used in annotation can be indexed, and hence used as a target of search:

    • What did Karl Marx say about religion?

    • Why is the sky blue?

    • How do you make bread?

    • What does Arnold Schwarzenegger think about global warming?


Predictive Annotation – Improving Precision at no cost to Recall

  • E.g.: Question is “Where is Belize?”


  • But we know Belize is a country.

  • So “Where is Belize?” -> {(CONTINENT$ WORLDREGION$) Belize}

    • Belize occurs 1068 times in TREC corpus

    • Belize and PLACE$ co-occur in only 537 sentences

    • Belize and CONTINENT$ or WORLDREGION$ co-occur in only 128 sentences

Virtual Annotation (Prager 2001)

  • Use WordNet to find all candidate answers (hypernyms)

  • Use corpus co-occurrence statistics to select “best” ones

    • Rather like approach to WSD by Mihalcea and Moldovan (1999)

Natural Categories

  • “Basic Objects in Natural Categories” Rosch et al. (1976)

  • According to psychological testing, these are categorization levels of intermediate specificity that people tend to use in unconstrained settings.

What can we conclude?

  • There are descriptive terms that people are drawn to use naturally.

  • We can expect to find instances of these in text, in the right contexts.

  • These terms will serve as good answers.

Virtual Annotation (cont.)

  • Find all parents of query term in WordNet

  • Look for co-occurrences of query term and parent in text corpus

  • Expect to find snippets such as: “… meerkats and other Y …”

  • Many different phrasings are possible, so we just look for proximity, rather than parse.

  • Scoring:

    • Count co-occurrences of each parent with search term, and divide by level number (only levels >= 1), generating Level-Adapted Count (LAC).

    • Exclude very highest levels (too general).

    • Select parent with highest LAC plus any others with LAC within 20%.

Sample Answer Passages

Use Answer-based QA to locate answers

“What is a nematode?” ->

“Such genes have been found in nematode worms but not yet in higher animals.”

“What is a meerkat?” ->

“South African golfer Butch Kruger had a good round going in the central Orange Free State trials, until a mongoose-like animal grabbed his ball with its mouth and dropped down its hole. Kruger wrote on his card: ‘Meerkat.’”

Use of Cyc as Sanity Checker

  • Cyc: Large Knowledge-base and Inference engine (Lenat 1995)

  • A post-hoc process for

    • Rejecting “insane” answers

      • How much does a grey wolf weigh?

        • 300 tons

    • Boosting confidence for “sane” answers

  • Sanity checker invoked with

    • Predicate, e.g. “weight”

    • Focus, e.g. “grey wolf”

    • Candidate value, e.g. “300 tons”

  • Sanity checker returns

    • “Sane”: + or – 10% of value in Cyc

    • “Insane”: outside of the reasonable range

      • Plan to use distributions instead of ranges

    • “Don’t know”

  • Confidence score highly boosted when answer is “sane”
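The three-way verdict of the sanity checker can be sketched as a range test. Cyc's actual interface is not described in this deck, so several pieces below are hypothetical: the knowledge-base dictionary `KB`, its `(value, low, high)` layout, and the "reasonable range" bounds. Only the Maryland population figure comes from this deck's TREC11 example. A candidate that is neither within ±10% of the KB value nor outside the range falls back to "don't know"; that is our own choice.

```python
from typing import Dict, Optional, Tuple

Entry = Tuple[float, float, float]   # (value, lowest reasonable, highest reasonable)

# Hypothetical KB: (predicate, focus) -> entry. The range bounds are invented.
KB: Dict[Tuple[str, str], Entry] = {
    ("population", "Maryland"): (5_296_486, 1_000_000, 20_000_000),
}


def sanity_check(predicate: str, focus: str, candidate: float,
                 kb: Dict[Tuple[str, str], Entry] = KB) -> str:
    entry: Optional[Entry] = kb.get((predicate, focus))
    if entry is None:
        return "don't know"
    value, low, high = entry
    if abs(candidate - value) <= 0.10 * value:   # within +/-10% of the KB value
        return "sane"
    if not low <= candidate <= high:             # outside the reasonable range
        return "insane"
    return "don't know"


print(sanity_check("population", "Maryland", 50_000))      # insane
print(sanity_check("population", "Maryland", 5_100_000))   # sane
print(sanity_check("weight", "grey wolf", 300))            # don't know (no KB entry)
```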

Cyc Sanity Checking Example

  • TREC11 Q: “What is the population of Maryland?”

  • Without sanity checking

    • PIQUANT’s top answer: “50,000”

    • Justification: “Maryland’s population is 50,000 and growing rapidly.”

    • Passage discusses an exotic species “nutria”, not humans

  • With sanity checking

    • Cyc knows the population of Maryland is 5,296,486

    • It rejects the top “insane” answers

    • PIQUANT’s new top answer: “5.1 million” with very high confidence


  • Process the question by…

    • Forming a search engine query from the original question

    • Detecting the answer type

  • Get some results

  • Extract answers of the right type based on

    • How often they occur

Step 1: Rewrite the questions

  • Intuition: The user’s question is often syntactically quite close to sentences that contain the answer

    • Where is the Louvre Museum located?

      • The Louvre Museum is located in Paris

    • Who created the character of Scrooge?

      • Charles Dickens created the character of Scrooge.

Query rewriting

Classify question into seven categories

  • Who is/was/are/were…?

  • When is/did/will/are/were …?

  • Where is/are/were …?

    a. Hand-crafted category-specific transformation rules

    e.g.: For where questions, move ‘is’ to all possible locations

    Look to the right of the query terms for the answer.

    “Where is the Louvre Museum located?”

     “is the Louvre Museum located”

     “the is Louvre Museum located”

     “the Louvre is Museum located”

     “the Louvre Museum is located”

     “the Louvre Museum located is”
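The "move 'is'" transformation can be sketched in a few lines. The sketch assumes a question of the form "Where is …?" and uses whitespace tokenization. The function name `where_is_rewrites` is hypothetical. Real rewrite rules would also carry the weights and answer-side hints (LEFT/RIGHT) shown on the AskMSR slide.

```python
from typing import List


def where_is_rewrites(question: str) -> List[str]:
    """Drop 'Where', then place 'is' in every possible position."""
    words = question.rstrip("?").split()
    if len(words) < 3 or words[0].lower() != "where" or words[1].lower() != "is":
        raise ValueError("expected a question of the form 'Where is ...?'")
    rest = words[2:]
    return [" ".join(rest[:i] + ["is"] + rest[i:]) for i in range(len(rest) + 1)]


for rewrite in where_is_rewrites("Where is the Louvre Museum located?"):
    print(rewrite)
```

This prints the five rewrites listed above, from "is the Louvre Museum located" through "the Louvre Museum located is".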

Step 2: Query search engine

  • Send all rewrites to a Web search engine

  • Retrieve top N answers (100-200)

  • For speed, rely just on search engine’s “snippets”, not the full text of the actual document

Step 3: Gathering N-Grams

  • Enumerate all N-grams (N=1,2,3) in all retrieved snippets

  • Weight of an n-gram: occurrence count, each weighted by “reliability” (weight) of rewrite rule that fetched the document

    • Example: “Who created the character of Scrooge?”

      Dickens 117

      Christmas Carol 78

      Charles Dickens 75

      Disney 72

      Carl Banks 54

      A Christmas 41

      Christmas Carol 45

      Uncle 31
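The n-gram weighting can be sketched as below. The sketch assumes each snippet arrives paired with the weight of the rewrite that retrieved it, and that whitespace tokenization is good enough. The snippets and weights in the usage lines are invented. The name `weighted_ngrams` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def weighted_ngrams(snippets: Iterable[Tuple[str, float]], max_n: int = 3) -> Counter:
    """Each occurrence of a 1-, 2- or 3-gram adds the weight of the rewrite
    rule whose query fetched the snippet."""
    scores: Counter = Counter()
    for snippet, weight in snippets:
        tokens = snippet.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                scores[" ".join(tokens[i:i + n])] += weight
    return scores


scores = weighted_ngrams([("Charles Dickens created the character of Scrooge", 5),
                          ("Dickens wrote A Christmas Carol", 2)])
print(scores["Dickens"], scores["Charles Dickens"])  # 7 5
```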

Step 4: Filtering N-Grams

  • Each question type is associated with one or more “data-type filters” = regular expressions for answer types

  • Boost score of n-grams that match the expected answer type.

  • Lower score of n-grams that don’t match.
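A data-type filter can be as simple as one regular expression per question type. The two filters and the boost and penalty factors below are invented for illustration; they are not AskMSR's actual rules. The names `FILTERS` and `filter_ngrams` are hypothetical.

```python
import re
from typing import Dict

# Hypothetical data-type filters: question type -> regex for the answer type.
FILTERS = {
    "when": re.compile(r"\d{4}"),                        # a four-digit year
    "who": re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"),  # capitalized name(s)
}


def filter_ngrams(scores: Dict[str, float], qtype: str,
                  boost: float = 2.0, penalty: float = 0.5) -> Dict[str, float]:
    """Boost n-grams that match the expected answer type, lower the rest."""
    rx = FILTERS.get(qtype)
    if rx is None:
        return dict(scores)
    return {ng: s * (boost if rx.fullmatch(ng) else penalty)
            for ng, s in scores.items()}


print(filter_ngrams({"Dickens": 117, "Charles Dickens": 75, "the character": 20}, "who"))
# {'Dickens': 234.0, 'Charles Dickens': 150.0, 'the character': 10.0}
```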

Step 5: Tiling the Answers

  • Overlapping n-grams (e.g. “Charles Dickens”, “Mr Charles”) are tiled into the longer answer “Mr Charles Dickens” (score 45); the merged n-grams are discarded

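Tiling can be sketched as a greedy loop. It repeatedly takes the highest-scoring n-gram, merges it with any n-gram it overlaps or contains, sums their scores, and discards the pieces. This is a simplified reading of the step. The input scores are invented, and the names `tile` and `tile_answers` are hypothetical.

```python
from typing import Dict, List, Optional


def tile(a: str, b: str) -> Optional[str]:
    """Return the tiled string if the token sequences of a and b overlap
    (suffix of one = prefix of the other) or one contains the other."""
    ta, tb = a.split(), b.split()
    for x, y in ((ta, tb), (tb, ta)):        # containment
        for i in range(len(x) - len(y) + 1):
            if x[i:i + len(y)] == y:
                return " ".join(x)
    for x, y in ((ta, tb), (tb, ta)):        # partial overlap, longest first
        for k in range(min(len(x), len(y)) - 1, 0, -1):
            if x[-k:] == y[:k]:
                return " ".join(x + y[k:])
    return None


def tile_answers(scores: Dict[str, float]) -> Dict[str, float]:
    scores = dict(scores)
    while True:
        ranked: List[str] = sorted(scores, key=scores.get, reverse=True)
        pair = next(((a, b) for a in ranked for b in ranked
                     if a != b and tile(a, b) is not None), None)
        if pair is None:
            return scores
        a, b = pair
        merged = tile(a, b)
        total = scores.pop(a) + scores.pop(b)
        # The merged n-gram gets the summed score; the old n-grams are discarded.
        scores[merged] = scores.get(merged, 0) + total


print(tile_answers({"Charles Dickens": 20, "Dickens": 15, "Mr Charles": 10}))
# {'Mr Charles Dickens': 45}
```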

  • Standard TREC contest test-bed (TREC 2001): 1M documents; 900 questions

    • Technique does ok, not great (would have placed in top 9 of ~30 participants)

    • But with access to the Web they do much better: they would have come in second in TREC 2001


  • In many scenarios (e.g., monitoring an individual’s email…) we only have a small set of documents

  • Works best/only for “Trivial Pursuit”-style fact-based questions

  • Limited/brittle repertoire of

    • question categories

    • answer data types/filters

    • query rewriting rules

ISI: Surface patterns approach

  • Use of Characteristic Phrases

  • “When was <person> born?”

    • Typical answers

      • “Mozart was born in 1756.”

      • “Gandhi (1869-1948)...”

    • Suggests phrases like

      • “<NAME> was born in <BIRTHDATE>”

      • “<NAME> ( <BIRTHDATE>-”

    • which, as regular expressions, can help locate the correct answer

Use Pattern Learning

  • Example:

    • “The great composer Mozart (1756-1791) achieved fame at a young age”

    • “Mozart (1756-1791) was a genius”

    • “The whole world would always be indebted to the great music of Mozart (1756-1791)”

  • Longest matching substring for all 3 sentences is “Mozart (1756-1791)”

  • Suffix tree would extract “Mozart (1756-1791)” as an output, with score of 3

  • Reminiscent of IE pattern learning

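The substring-extraction step can be illustrated without a suffix tree. The sketch below swaps in a brute-force search, which is fine for a handful of short sentences but far slower than a suffix tree on 1000 documents. It reports the substring together with the number of sentences containing it, mirroring the score of 3 above. The function name `longest_common_substring` is hypothetical.

```python
from typing import List, Tuple


def longest_common_substring(sentences: List[str]) -> Tuple[str, int]:
    """Longest character string occurring in every sentence, with its count.
    Brute force stands in for the suffix tree used by the original system."""
    shortest = min(sentences, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in sentences):
                return candidate, len(sentences)
    return "", 0


sentences = [
    "The great composer Mozart (1756-1791) achieved fame at a young age",
    "Mozart (1756-1791) was a genius",
    "The whole world would always be indebted to the great music of Mozart (1756-1791)",
]
print(longest_common_substring(sentences))  # ('Mozart (1756-1791)', 3)
```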
Pattern Learning (cont.)

  • Repeat with different examples of same question type

    • “Gandhi 1869”, “Newton 1642”, etc.

  • Some patterns learned for BIRTHDATE

    • a. born in <ANSWER>, <NAME>

    • b. <NAME> was born on <ANSWER> ,

    • c. <NAME> ( <ANSWER> -

    • d. <NAME> ( <ANSWER> - )


    • 6 different Q types

      • from Webclopedia QA Typology (Hovy et al., 2002a)

        • BIRTHDATE

        • LOCATION

        • INVENTOR

        • DISCOVERER

        • DEFINITION

        • WHY-FAMOUS
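Repeating the process over several (question term, answer) pairs only works once the specific names and dates are abstracted away. The sketch below does that in three steps:

- it assumes single-token terms and replaces them with `<NAME>` and `<ANSWER>`;
- it finds the longest token run common to all examples;
- it keeps a run only if it contains both tags.

The token-level brute-force search again stands in for a suffix tree. With the Mozart and Gandhi sentences from the slides it recovers pattern c.

```python
from __future__ import annotations

import re

# Word tokens or single punctuation marks: "(1756-1791)" -> "(", "1756", "-", "1791", ")".
TOKEN = re.compile(r"\w+|[^\w\s]")


def generalize(sentence: str, name: str, answer: str) -> list[str]:
    """Tokenize, then replace the question term and answer term with tags."""
    tokens = []
    for tok in TOKEN.findall(sentence):
        if tok == name:
            tokens.append("<NAME>")
        elif tok == answer:
            tokens.append("<ANSWER>")
        else:
            tokens.append(tok)
    return tokens


def contains_run(tokens: list[str], run: list[str]) -> bool:
    """True if `run` appears contiguously in `tokens`."""
    n = len(run)
    return any(tokens[i:i + n] == run for i in range(len(tokens) - n + 1))


def common_pattern(examples: list[tuple[str, str, str]]) -> str | None:
    """Longest token run shared by all generalized sentences that holds both tags."""
    seqs = [generalize(s, name, answer) for s, name, answer in examples]
    shortest = min(seqs, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            run = shortest[start:start + length]
            if "<NAME>" in run and "<ANSWER>" in run and all(contains_run(s, run) for s in seqs):
                return " ".join(run)
    return None


examples = [
    ("Mozart (1756-1791) was a genius", "Mozart", "1756"),
    ("Gandhi (1869-1948)...", "Gandhi", "1869"),
]
print(common_pattern(examples))
```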

    Experiments: pattern precision

    • BIRTHDATE table:

      • 1.0 <NAME> ( <ANSWER> - )

      • 0.85 <NAME> was born on <ANSWER>,

      • 0.6 <NAME> was born in <ANSWER>

      • 0.59 <NAME> was born <ANSWER>

      • 0.53 <ANSWER> <NAME> was born

      • 0.50 - <NAME> ( <ANSWER>

      • 0.36 <NAME> ( <ANSWER> -


    • INVENTOR table:

      • 1.0 <ANSWER> invents <NAME>

      • 1.0 the <NAME> was invented by <ANSWER>

      • 1.0 <ANSWER> invented the <NAME> in
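The scores in these tables are pattern precisions. One way to compute such a score, and our reading of how the ISI work defines it, is the fraction of a pattern's matches whose `<ANSWER>` slot contains the known correct answer.

The sketch assumes patterns are space-tokenized as in the tables and that `<ANSWER>` captures a single word. `pattern_to_regex` and `precision` are hypothetical names. The two test sentences are taken from earlier slides.

```python
from __future__ import annotations

import re


def pattern_to_regex(pattern: str, name: str) -> re.Pattern:
    """Turn a space-tokenized pattern such as "<NAME> ( <ANSWER> -" into a regex.

    <NAME> becomes the literal question term; <ANSWER> captures one word token.
    Whitespace between tokens is optional, so "( <ANSWER> -" also matches "(1756-".
    """
    parts = []
    for tok in pattern.split():
        if tok == "<NAME>":
            parts.append(re.escape(name))
        elif tok == "<ANSWER>":
            parts.append(r"(\w+)")
        else:
            parts.append(re.escape(tok))
    return re.compile(r"\s*".join(parts))


def precision(pattern: str, name: str, answer: str, sentences: list[str]) -> float:
    """Fraction of the pattern's matches whose <ANSWER> slot equals the known answer."""
    regex = pattern_to_regex(pattern, name)
    matched = correct = 0
    for sentence in sentences:
        for m in regex.finditer(sentence):
            matched += 1
            correct += m.group(1) == answer
    return correct / matched if matched else 0.0


sents = ["Mozart was born in 1756.", "Mozart (1756-1791) was a genius"]
print(precision("<NAME> ( <ANSWER> -", "Mozart", "1756", sents))
print(precision("<NAME> was born <ANSWER>", "Mozart", "1756", sents))
```

This toy run hints at why a looser pattern such as “<NAME> was born <ANSWER>” can score lower than “<NAME> was born in <ANSWER>”. The looser pattern also fires on “born in 1756” and captures “in”.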

    Experiments (cont.)


    • DISCOVERER table:

      • 1.0 when <ANSWER> discovered <NAME>

      • 1.0 <ANSWER>'s discovery of <NAME>

      • 0.9 <NAME> was discovered by <ANSWER> in


    • DEFINITION table:

      • 1.0 <NAME> and related <ANSWER>

      • 1.0 form of <ANSWER>, <NAME>

      • 0.94 as <NAME>, <ANSWER> and

    Experiments (cont.)


    • WHY-FAMOUS table:

      • 1.0 <ANSWER> <NAME> called

      • 1.0 laureate <ANSWER> <NAME>

      • 0.71 <NAME> is the <ANSWER> of


    • LOCATION table:

      • 1.0 <ANSWER>'s <NAME>

      • 1.0 regional : <ANSWER> : <NAME>

      • 0.92 near <NAME> in <ANSWER>

  • Depending on question type, MRR is high (0.6–0.9), with higher scores when searching the Web than the TREC QA collection
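MRR (mean reciprocal rank) averages, over all questions, 1/rank of the first correct answer in the system's ranked list, counting 0 when no correct answer appears. A short sketch follows. The cutoff of 5 answers mirrors the early TREC QA evaluations and is an assumption here, as are the toy answer lists.

```python
from __future__ import annotations


def mean_reciprocal_rank(ranked: list[list[str]], gold: list[set[str]], cutoff: int = 5) -> float:
    """Average over questions of 1/rank of the first correct answer within the cutoff."""
    total = 0.0
    for answers, correct in zip(ranked, gold):
        for rank, answer in enumerate(answers[:cutoff], start=1):
            if answer in correct:
                total += 1.0 / rank
                break  # only the first correct answer counts
    return total / len(gold) if gold else 0.0


# Hypothetical system output for three questions: correct at rank 1, missing, rank 3.
ranked = [["1756", "1791"], ["1642"], ["1900", "1901", "1869"]]
gold = [{"1756"}, {"1643"}, {"1869"}]
print(mean_reciprocal_rank(ranked, gold))  # (1 + 0 + 1/3) / 3
```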

    Shortcomings & Extensions

    • Need for POS &/or semantic types

      • “Where are the Rocky Mountains?”

      • “Denver's new airport, topped with white fiberglass cones in imitation of the Rocky Mountains in the background , continues to lie empty”

      • <NAME> in <ANSWER>

  • An NE tagger &/or ontology could enable the system to determine that “background” is not a location
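A semantic-type check can be bolted onto a surface pattern by filtering its candidates. In this sketch a tiny hand-made gazetteer stands in for the NE tagger or ontology the slide mentions. The gazetteer contents, the handling of an optional “the”, and the function names are all assumptions.

```python
from __future__ import annotations

import re


def candidates(name: str, text: str) -> list[str]:
    """Apply the surface pattern "<NAME> in <ANSWER>", skipping an optional "the"."""
    regex = re.compile(re.escape(name) + r" in (?:the )?(\w+)")
    return [m.group(1) for m in regex.finditer(text)]


def typed_candidates(name: str, text: str, gazetteer: set[str]) -> list[str]:
    """Keep only candidates that a (hypothetical) location gazetteer recognizes."""
    return [c for c in candidates(name, text) if c in gazetteer]


denver = ("Denver's new airport, topped with white fiberglass cones in imitation of "
          "the Rocky Mountains in the background , continues to lie empty")
locations = {"Colorado", "Canada"}  # toy gazetteer, not a real resource

print(candidates("Rocky Mountains", denver))
print(typed_candidates("Rocky Mountains", denver, locations))
```

Without the filter, the pattern proposes “background”. With it, that candidate is discarded.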

    Shortcomings... (cont.)

    • Long distance dependencies

      • “Where is London?”

      • “London, which has one of the most busiest airports in the world, lies on the banks of the river Thames”

      • would require a pattern like: <QUESTION>, (<any_word>)*, lies on <ANSWER>

    • The abundance & variety of Web data help the system find an instance of the patterns w/o losing answers to long-distance dependencies
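The difference between a contiguous surface pattern and one with a word gap can be shown with two regular expressions. This is only a sketch. The group `(?: [\w']+)*` plays the role of `(<any_word>)*`, the comma handling is fitted to this one example sentence, and `gapped` is a hypothetical helper.

```python
from __future__ import annotations

import re

sentence = ("London, which has one of the most busiest airports in the world, "
            "lies on the banks of the river Thames")

# Contiguous pattern "<NAME> lies on <ANSWER>": the anchor must sit right before the cue.
adjacent = re.compile(r"London lies on (?P<answer>.+)")


def gapped(anchor: str, cue: str) -> re.Pattern:
    """'<anchor>, (<any_word>)*, <cue> <ANSWER>' where the gap is a run of plain words."""
    return re.compile(re.escape(anchor) + r",(?: [\w']+)*, " + re.escape(cue) + r" (?P<answer>.+)")


print(adjacent.search(sentence))
print(gapped("London", "lies on").search(sentence).group("answer"))
```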

    Shortcomings... (cont.)

    • System currently has only one anchor word

      • Doesn't work for Q types requiring multiple words from question to be in answer

        • “In which county does the city of Long Beach lie?”

        • “Long Beach is situated in Los Angeles County”

        • required pattern: <Q_TERM_1> is situated in <ANSWER> <Q_TERM_2>

    • Does not use case

      • “What is a micron?”

      • “...a spokesman for Micron, a maker of semiconductors, said SIMMs are...”

  • If “Micron” had been capitalized in the question, this would be a perfect answer
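A template with more than one question-term anchor can be compiled into a regular expression much like the single-anchor patterns. The sketch below is hypothetical: `compile_template` is an invented name, and only the `<Q_TERM_n>` tag syntax follows the slide's notation. It matches case-insensitively, so that “county” in the question still anchors on “County” in the text. That flag is the kind of case decision the second bullet raises.

```python
from __future__ import annotations

import re

# Tags look like <ANSWER>, <Q_TERM_1>, <Q_TERM_2>, ...
TAG = re.compile(r"(<[A-Z0-9_]+>)")


def compile_template(template: str, terms: dict[str, str], flags: int = 0) -> re.Pattern:
    """Build a regex from a template with several question-term anchors.

    <ANSWER> becomes a lazy named group; each other tag is replaced by its question term.
    """
    parts = []
    for piece in TAG.split(template):  # split keeps the tags because TAG has a group
        if piece == "<ANSWER>":
            parts.append(r"(?P<answer>.+?)")
        elif piece in terms:
            parts.append(re.escape(terms[piece]))
        else:
            parts.append(re.escape(piece))
    return re.compile("".join(parts), flags)


regex = compile_template(
    "<Q_TERM_1> is situated in <ANSWER> <Q_TERM_2>",
    {"<Q_TERM_1>": "Long Beach", "<Q_TERM_2>": "county"},
    flags=re.IGNORECASE,
)
print(regex.search("Long Beach is situated in Los Angeles County").group("answer"))
```

Because `<ANSWER>` is lazy, it should be followed by an anchor. At the very end of a template it would capture only one character.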

    QA Typology from ISI (USC)

    • Typology of typical Q forms—94 nodes (47 leaf nodes)

    • Analyzed 17,384 questions (from

    Question Answering Example

    • How hot does the inside of an active volcano get?

    • get(TEMPERATURE, inside(volcano(active)))

    • “lava fragments belched out of the mountain were as hot as 300 degrees Fahrenheit”

    • fragments(lava, TEMPERATURE(degrees(300)), belched(out, mountain))

      • volcano ISA mountain

      • lava ISPARTOF volcano ⇒ lava inside volcano

      • fragments of lava HAVEPROPERTIESOF lava

    • The needed semantic information is in WordNet definitions, and was successfully translated into a form that was used for rough ‘proofs’
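The ‘proof’ idea can be illustrated with a toy forward chainer over the relations named on the slide (ISA, ISPARTOF, HAVEPROPERTIESOF). The two inference rules and the derived INSIDE relation are our assumptions for illustration. This is not the prover used in the original system, and real axioms would come from WordNet glosses rather than a hand-written set.

```python
from __future__ import annotations

# Axioms mirroring the slide; relation names come from the slide, the rules below are assumed.
AXIOMS = {
    ("volcano", "ISA", "mountain"),
    ("lava", "ISPARTOF", "volcano"),
    ("lava fragments", "HAVEPROPERTIESOF", "lava"),
}


def derive(facts: set) -> set:
    """Forward-chain two hand-written rules until no new facts appear."""
    facts = set(facts)
    while True:
        # Rule 1: x ISPARTOF y  =>  x INSIDE y
        new = {(x, "INSIDE", y) for x, rel, y in facts if rel == "ISPARTOF"}
        # Rule 2: x HAVEPROPERTIESOF y and y INSIDE z  =>  x INSIDE z
        new |= {
            (x, "INSIDE", z)
            for x, r1, y in facts if r1 == "HAVEPROPERTIESOF"
            for y2, r2, z in facts if r2 == "INSIDE" and y2 == y
        }
        if new <= facts:
            return facts
        facts |= new


def supports(measured: str, asked_about: str, mentioned: str, facts: set) -> bool:
    """Can a temperature of `measured`, reported near `mentioned`, answer a
    question about the inside of `asked_about`?"""
    closed = derive(facts)
    return (measured, "INSIDE", asked_about) in closed and (asked_about, "ISA", mentioned) in closed


print(supports("lava fragments", "volcano", "mountain", AXIOMS))
```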


    • AskMSR: Question Answering Using the Worldwide Web

      • Michele Banko, Eric Brill, Susan Dumais, Jimmy Lin

      • In Proceedings of the 2002 AAAI Symposium on Mining Answers from Text and Knowledge Bases, March 2002

    • Web Question Answering: Is More Always Better?

      • Susan Dumais, Michele Banko, Eric Brill, Jimmy Lin, Andrew Ng

    • Learning Surface Patterns for a Question Answering System

      • Deepak Ravichandran, Eduard H. Hovy

      • ACL conference, July 2002

    Harder Questions

    • Factoid question answering is really pretty silly.

    • A more interesting task is one where the answers are fluid and depend on the fusion of material from disparate texts over time.

      • Who is Condoleezza Rice?

      • Who is Mahmoud Abbas?

      • Why was Arafat flown to Paris?