Learning Surface Text Patterns for a Question Answering System

Learning Surface Text Patterns for a Question Answering System Deepak Ravichandran Eduard Hovy Information Sciences Institute University of Southern California

From Proceedings of the ACL Conference, 2002

Goal • Explore power of surface text patterns for open-domain QA systems

Why This Paper • Fall 2001 NLP project - QA system

Winning Team • Matt Myers & Henry Longmore • "If we were asked to design another question answering system, we would keep the same basic system as a foundation. We would then use more patterns and variations of patterns in the NE recognizer. We would use Machine Learning techniques, particularly for learning patterns for the NE recognizer."

Meanwhile, back at the batcave... • Automatic learning of surface text patterns for open-domain question answering

Recent Open Domain Systems • External knowledge, tools • Named Entity taggers • WordNet • parsers • hand-tagged corpora • ontology lists

Recent O-D Systems (cont.) • Recent TREC-10 evaluation • winning system used just 1 resource • extensive list of surface patterns • surprised many

Basic Idea • Investigate potential of surface patterns • Learn patterns • Measure accuracy

Characteristic Phrases • "When was <person> born” • Typical answers • "Mozart was born in 1756.” • "Gandhi (1869-1948)...” • Suggests phrases like • "<NAME> was born in <BIRTHDATE>” • "<NAME> ( <BIRTHDATE>-” • as Regular Expressions can help locate correct answer

Auto-learn Patterns from Web • Tagged corpus using AltaVista • Hand-crafted examples of each question type • Bootstrapping to build large tagged corpus as in Information Extraction (Riloff, 96) • Abundance of data on web - reliable statistical estimates

The System • Assume sentence is a simple sequence of words • Search for repeated word orderings • Evidence for useful answer phrases

System (cont.) • Suffix trees to extract substrings of optimal length • Suffix trees from Computational Biology (Gusfield, 97) • Used to detect DNA sequences • Linear time on size of corpus • Don't restrict length of substrings

Pattern Learning Algorithm • Select example for question type • BIRTHYEAR questions select "Mozart 1756” • "Mozart" is question term • "1756" is answer term • Submit Q & A terms to AltaVista • Require both terms to be present

Pattern Learning (cont.) • Download top 1000 documents returned • Apply sentence breaker to documents • Keep only those sentences with both terms present

Pattern Learning (cont.) • Terms can be present in various forms • e.g. Mozart as: • Wolfgang Amadeus Mozart • Mozart, Wolfgang Amadeus • Amadeus Mozart • Mozart

Pattern Learning (cont.) • Specify ways in which Q term and A term can be specified in text • Easy to do for BIRTHDATE • Not so for Q types like DEFINITION • Many acceptable answers, all answers need to be used to ensure high confidence in precision

Pattern Learning (cont.) • Process (tokenize, smooth whitespace, remove tags, etc.) • simplify input for egrep (or other regular expression tool) • Pass sentence through suffix tree constructor • finds substrings (and counts) of all lengths

Pattern Learning (cont.) • Example: • “The great composer Mozart (1756-1791) achieved fame at a young age” • “Mozart (1756-1791) was a genius” • “The whole world would always be indebted to the great music of Mozart (1756-1791)” • Longest matching substring for all 3 sentences is "Mozart (1756-1791)” • Suffix tree would extract "Mozart (1756-1791)" as an output, with score of 3

Pattern Learning (cont.) • Filter phrases in suffix tree • Keep phrases containing Q & A terms • Replace question term with <NAME> • Replace answer term with <ANSWER>

Pattern Learning (cont.) • Repeat with different examples of same question type • “Gandhi 1869”, “Newton 1642”, etc. • Some patterns learned for BIRTHDATE • a. born in <ANSWER>, <NAME> • b. <NAME> was born on <ANSWER> , • c. <NAME> ( <ANSWER> - • d. <NAME> ( <ANSWER> - )

Pattern Learning (last one!) • Strings partly overlapping (c & d) saved separately • Separate counts of occurrence frequencies • Can distinguish (in this case) between pattern for person still living (d) and more general pattern (c)

Calculate Precision • Submit query to AltaVista using only Q term ("Mozart") • Download top 1000 returned documents • Segment into sentences as in pattern learning algorithm • Keep sentences containing Q term

Calculate Precision (cont.) • For each pattern learned, check presence of pattern in sentence • pattern with <ANSWER> tag matched by any word • pattern with <ANSWER> tag matched by correct A term • Mozart was born in <ANY_WORD> • Mozart was born in 1756

Calculate Precision (cont.) • Calculate precision of each pattern • P = Ca/Co • Ca = total # of patterns w/answer term present • Co = total # of patterns w/answer term replaced by any word • Keep only patterns matching sufficient # of examples (e.g. >5)

Calculate Precision (cont.) • Obtain table of Regular Expression patterns • 1 table per question type • Precision of pattern • precision is probability pattern containing answer • principle of maximum likelihood estimation

Calculate Precision (cont.) • BIRTHDATE table: • 1.0 <NAME> ( <ANSWER> - ) • 0.85 <NAME> was born on <ANSWER>, • 0.6 <NAME> was born in <ANSWER> • 0.59 <NAME> was born <ANSWER> • 0.53 <ANSWER> <NAME> was born • 0.50 - <NAME> ( <ANSWER> • 0.36 <NAME> ( <ANSWER> -

Calculate Precision (cont.) • Good range of patterns obtained with as few as 10 examples • Rather long list difficult to come up with manually • Largest number of examples the system required to get a good range of patterns?

Calculate Precision (cont.) • Precision of patterns learned from one QA-pair calculated for other examples of same question type • Helps eliminate dubious patterns • Contents of two or more sites are the same • Same document appears in search engine output for learning & precision stages

Finding Answers • To new questions! • Use existing QA system (Hovy et al., 2002b;2001) • Determine type of new question • Identify Question term

Finding Answers (cont.) • Create query from Q term & do IR • use answer document corpus such as TREC-10 or web search • Segment returned documents into sentences & process as before • Replace Q term by Q tag • e.g. <NAME> in case of BIRTHYEAR type

Finding Answers (cont.) • Using pattern table developed for Q type, search for presence of each pattern • Select words matching <ANSWER> as potential answer • Sort answers by pattern's precision scores • Discard duplicate answers (string compare) • Return top 5

Experiments • 6 different Q types • from Webclopedia QA Typology (Hovy et al., 2002a) • BIRTHDATE • LOCATION • INVENTOR • DISCOVERER • DEFINITION • WHY-FAMOUS

Experiments (cont.) • (BIRTHYEAR - previously shown) • INVENTOR • 1.0 <ANSWER> invents <NAME> • 1.0 the <NAME> was invented by <ANSWER> • 1.0 <ANSWER> invented the <NAME> in • all have precision of 1.0

Experiments (cont.) • DISCOVERER • 1.0 when <ANSWER> discovered <NAME> • 1.0 <ANSWER>'s discovery of <NAME> • 0.9 <NAME> was discovered by <ANSWER> in • DEFINITION • 1.0 <NAME> and related <ANSWER> • 1.0 form of <ANSWER>, <NAME> • 0.94 as <NAME>, <ANSWER> and

Experiments (cont.) • WHY-FAMOUS • 1.0 <ANSWER> <NAME> called • 1.0 laureate <ANSWER> <NAME> • 0.71 <NAME> is the <ANSWER> of • LOCATION • 1.0 <ANSWER>'s <NAME> • 1.0 regional : <ANSWER> : <NAME> • 0.92 near <NAME> in <ANSWER>

Experiments (cont.) • For each Q type, extract questions from TREC-10 set • Run through testing phase (precision) • Two sets of experiments

Experiments (cont.) • Set one • TREC corpus is input • IR done by IR component of their QA system (Lin, 2002) • Set two • Web is input • IR performed by AltaVista

Results • Measured by Mean Reciprocal Rank (?) • TREC • Question type # of Q's MRR • BIRTHYEAR 8 0.48 • INVENTOR 6 0.17 • DISCOVERER 4 0.13 • DEFINITION 102 0.34 • WHY-FAMOUS 3 0.33 • LOCATION 16 0.75

Results (cont.) • Web • Q type # of Q’s MRR • BIRTHYEAR 8 0.69 • INVENTOR 6 0.58 • DISCOVERER 4 0.88 • DEFINITION 102 0.39 • WHY-FAMOUS 3 0.00 • LOCATION 16 0.86

Results (cont.) • System performs better on web data than on TREC corpus • Abundant web data makes it easier for system to locate answers with high precision scores • TREC corpus does not have enough candidate answers with high precision score • must settle for answers from low precision patterns • WHY-FAMOUS exception - may be due to small # of test Q's

Shortcomings & Extensions • Need for POS &/or semantic types • "Where are the Rocky Mountains?” • "Denver's new airport, topped with white fiberglass cones in imitation of the Rocky Mountains in the background , continues to lie empty” • <NAME> in <ANSWER> • NE tagger &/or ontology could enable system to determine "background" is not a location

Shortcomings... (cont.) • DEFINITION Q's - match term too general, though correct technically • "What is nepotism?” • <ANSWER>, <NAME> • "...in the form of widespread bureaucratic abuses: graft, nepotism...” • "What is sonar?” • <NAME> and related <ANSWER> • "...while its sonar and related underseas systems are built...”

Shortcomings... (cont.) • Long distance dependencies • "Where is London?” • "London, which has one of the most busiest airports in the world, lies on the banks of the river Thames” • would require pattern like:<QUESTION>, (<any_word>)*, lies on <ANSWER> • Abundance & variety of Web data helps system to find an instance of patterns w/o losing answers to long distance dependencies

Shortcomings... (cont.) • More info in patterns regarding length of expected answer phrase • Searches in range of 50 bytes of answer phrase to capture pattern • fails under some conditions • "When was Lyndon B. Johnson born?” • "...lost to democratic Sen. Lyndon B. Johnson, who ran for both re-election and the vice presidency” • <NAME> <ANSWER> -

Shortcomings... (cont.) • Lacks info that <ANSWER> in this case should be exactly replaced by 1 word • Could extend system to search for answer in range of 1-2 chunks • basic English phrases, NP, VP, PP, etc.

Shortcomings... (cont.) • System doesn't work for Q types requiring multiple words from question to be in answer • "In which county does the city of Long Beach lie?” • "Long Beach is situated in Los Angeles County” • required pattern:<QUESTION_TERM_1> situated in <ANSWER> <QUESTION_TERM_2>

Shortcomings... (cont.) • Performance of system depends greatly on having only 1 anchor word • Multiple anchor points • would help eliminate candidate answers • require all anchor words be present in candidate answer sentence

Shortcomings... (cont.) • Does not use case • "What is micron?” • "...a spokesman for Micron, a maker of semiconductors, said SIMMs are..." • If Micron had been capitalized in question, would be a perfect answer

Shortcomings... (cont.) • Canonicalization of words • BIRTHDATE for Gandhi: 1869; Oct. 2, 1869; 2nd October 1869; October 2 1869; 02 October 1869; etc. • Use date tagger to cluster all variations and tag with same term • Extend idea to smooth out variations in Q term for names: Gandhi, Mahatma Gandhi, Mohandas Karamchand Gandhi, etc.

Learning Surface Text Patterns for a Question Answering System