Scaling Answer Type Detection to Large Hierarchies

1. Scaling Answer Type Detection to Large Hierarchies
   Kirk Roberts and Andrew Hickl
   {kirk,andy}@languagecomputer.com
   May 29, 2008

2. Introduction
   • Work in factoid question-answering (Q/A) has long leveraged answer type detection (ATD) systems to identify the semantic class (or answer type) of the entities, words, or phrases most likely to correspond to the exact answer of a question.
   • Example: Who wears #23 for the Los Angeles Galaxy?
   [Diagram: Answer Type Hierarchy (ATH) fragment: Human → Individual → Athlete → Soccer Player, with sibling nodes Organization, Group, Actor, Artist, Award, Baseball Player, Cricket Player]

4. Answer Types and Entity Types
   • While articulated ATHs clearly have value for question-answering applications, most work in ATD has been limited by the number of types recognized by current named entity recognition systems.
   • ACE Guidelines: ~35 entity types
   • Typical commercial offering: ~50 entity types
   • LCC's CiceroLite™: > 350 entity types
   • But are more types really better? Or do they make for a tougher learning problem?

5. Four Challenges
   • Challenge 1: Can we organize a large entity hierarchy into an answer type hierarchy?
   • Challenge 2: Can we reliably annotate questions with answer types?
   • Challenge 3: Can we learn models for performing fine-grained ATD?
   • Challenge 4: How do we incorporate fine-grained ATD into a Q/A system?

6. Challenge 1: Creating an Answer Type Hierarchy
   • First (published) answer type hierarchy (Li and Roth 2002, et seq.): a 2-tiered structure with 6 "coarse" answer types and ~50 "fine" answer types
   • Example entity-type groupings:
     - LAND VEHICLE (7): Automobile, Truck, Mass Transport, Train, Military Vehicle, Industrial Vehicle
     - WATER VEHICLE (4): Ships, Submarines, Civilian Watercraft, Other Watercraft
     - AIR VEHICLE (4): Commercial Airliner, Military Plane, Other Aircraft, Blimp
     - SPACE VEHICLE (3): Spacecraft, Satellite, Fictional Spacecraft
   • Questions to answer:
     - Why not just use an entity hierarchy as the answer type hierarchy?
     - What is the right set of leaf nodes for an ATH?
     - What is the right set of non-terminals for an ATH?

7. Why Not Just Use the Entity Hierarchy?
   • Short answer: entity hierarchies aren't organized according to the potential information needs expressed by natural language questions.
   • Entity types are semantic categories assigned to phrases found in text:
     - David Beckham was born on May 2, 1975. [ENTITY TYPE: DATE]
     - David Beckham was 33 years old in 2008. [ENTITY TYPE: AGE]
     - David Beckham (1975 - ) plays for the LA Galaxy. [ENTITY TYPE: YEAR_RANGE]
     - David Beckham is two years younger than Luis Figo. [ENTITY TYPE: RELATIVE_AGE]
     - David Beckham, 33, was scratched by Capello. [ENTITY TYPE: GENERIC_NUMBER]
     - David Beckham has been living for 33 years. [ENTITY TYPE: DURATION]
   • Answer types are semantic categories sought by a question:
     - How old is David Beckham?
     - The answer type is AGE, but valid entity types include AGE, RELATIVE_AGE, GENERIC_NUMBER, DURATION, and DATE / YEAR_RANGE.
   (This answer-type-to-entity-type mapping is sketched in code below.)
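
A minimal sketch of the idea that one answer type admits several entity types, assuming a hypothetical dictionary and filter function; the type names come from the slide, and nothing here is LCC's actual implementation:

```python
# Hypothetical mapping from a question's answer type to the entity types
# that may realize it in text (names taken from the slide above).
ACCEPTABLE_ENTITY_TYPES = {
    "AGE": {"AGE", "RELATIVE_AGE", "GENERIC_NUMBER", "DURATION",
            "DATE", "YEAR_RANGE"},
}

def acceptable_candidates(answer_type, candidates):
    """Keep only (phrase, entity_type) pairs whose entity type is a
    valid realization of the question's answer type."""
    valid = ACCEPTABLE_ENTITY_TYPES.get(answer_type, {answer_type})
    return [(phrase, etype) for phrase, etype in candidates if etype in valid]

# Example: "How old is David Beckham?" -> answer type AGE
print(acceptable_candidates("AGE", [("33", "GENERIC_NUMBER"),
                                    ("LA Galaxy", "ORGANIZATION")]))
# [('33', 'GENERIC_NUMBER')]
```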

8. Constructing an ATH from an ETH
   • Step 1: Initialize.
     - Create the initial ATH as a direct clone of the existing ETH.
   • Step 2: Consolidate Similar Nodes.
     - Combine similar nodes under "abstract" parent nodes corresponding to a possible Q-stem:
       SOCCER PLAYER, BASEBALL PLAYER, CRICKET PLAYER → ATHLETE (Which player?)
       CITY, FACILITY, GEOPOLITICAL ENTITY → LOCATION (Where?)
       POEM, BOOK, MOVIE, GOVERNMENT DOCUMENT → AUTHORED_WORK (What work?)
   • Step 3: Separate Existing Nodes into Subtypes.
     - Create multiple answer types for a single entity type when it belongs under different parents:
       AIRPORT → AIRPORT_LOC and AIRPORT_ORG
   • Step 4: Repeat (as necessary).
     - Perform Step 2 and Step 3 until all "merge-able" types are included in the ATH.
   (These four steps are sketched in code below.)
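
The four steps above amount to a small set of tree edits. A minimal sketch in Python, assuming a simple dict-of-children tree; the node names come from the slide, while the data structure and helper functions are hypothetical:

```python
# Step 1: initialize the ATH as a clone of the entity hierarchy (toy version).
ath = {
    "ROOT": ["SOCCER PLAYER", "BASEBALL PLAYER", "CRICKET PLAYER", "AIRPORT"],
}

def consolidate(tree, parent, children, abstract):
    """Step 2: group similar siblings under a new abstract parent that
    corresponds to a possible question stem (e.g. 'Which player?')."""
    tree[parent] = [c for c in tree[parent] if c not in children] + [abstract]
    tree[abstract] = list(children)

def split(tree, parent, node, subtypes):
    """Step 3: split one entity type into several answer types when it
    belongs under different parents (e.g. AIRPORT_LOC vs AIRPORT_ORG)."""
    tree[parent] = [c for c in tree[parent] if c != node] + list(subtypes)

consolidate(ath, "ROOT",
            ["SOCCER PLAYER", "BASEBALL PLAYER", "CRICKET PLAYER"], "ATHLETE")
split(ath, "ROOT", "AIRPORT", ["AIRPORT_LOC", "AIRPORT_ORG"])
# Step 4: repeat consolidate/split until no merge-able types remain.
print(ath)
```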

9. Resultant Answer Type Hierarchy
   • 11 coarse answer types (UIUC hierarchy: 6):
     human, location, numeric, abbreviation, entity, complex, work, temporal, title, contact-info, other-value*
   • 296 fine types (UIUC hierarchy: ~50)
     - Examples: casino, museum, city, country, state, actor, baseball player, military person, company, university, baseball team, island, planet, river, album, song, book, wrestler, soccer player, space location, moon, etc.
   • Average depth: 3.8 levels
   • Average number of "sisters": 4.2 nodes
   * Corresponds to a UIUC coarse type.
   (The computation of these statistics is sketched below.)
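
For reference, a minimal sketch of how the two statistics above can be computed over a dict-of-children hierarchy; the toy tree is hypothetical, not the real 296-type ATH, and "sisters" is approximated here as average children per non-terminal:

```python
def leaf_depths(tree, node="ROOT", depth=0):
    """Yield the depth of every leaf (fine answer type)."""
    children = tree.get(node, [])
    if not children:
        yield depth
    for child in children:
        yield from leaf_depths(tree, child, depth + 1)

def stats(tree):
    depths = list(leaf_depths(tree))
    branching = [len(kids) for kids in tree.values() if kids]
    avg_depth = sum(depths) / len(depths)
    avg_sisters = sum(branching) / len(branching)  # proxy for "sisters"
    return avg_depth, avg_sisters

toy = {"ROOT": ["HUMAN", "LOCATION"], "HUMAN": ["INDIVIDUAL", "GROUP"],
       "INDIVIDUAL": ["ATHLETE"], "ATHLETE": ["SOCCER PLAYER"]}
print(stats(toy))
```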

10. Four Challenges
    • Challenge 1: Can we organize a large entity hierarchy into an answer type hierarchy?
    • Challenge 2: Can we reliably annotate questions with answer types?
    • Challenge 3: Can we learn models for performing fine-grained ATD?
    • Challenge 4: How do we incorporate fine-grained ATD into a Q/A system?

11. Annotation Methodology
    • We experimented with three different annotation methodologies:
    • Method #1: One Pass
      - A traditional one-pass annotation where the annotator assigns the final "fine" answer type
    • Method #2: Two Passes
      - Annotators first select a "coarse" answer type, then select a "fine" answer type
    • Method #3: Multiple Passes
      - Annotators annotate each question according to each decision point in the hierarchy
      - Annotators can STOP annotation at any level in the hierarchy
    [Diagram: Method 1 yields SOCCER PLAYER; Method 2 yields HUMAN, then SOCCER PLAYER; Method 3 yields HUMAN → INDIVIDUAL → ATHLETE → SOCCER PLAYER]
    (The multi-pass loop is sketched in code below.)
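
A minimal sketch of the multi-pass (Method 3) loop: one decision per branching point, with STOP allowed at any level. The ask() prompt and the toy hierarchy are hypothetical:

```python
HIERARCHY = {"ROOT": ["HUMAN", "LOCATION"], "HUMAN": ["INDIVIDUAL", "GROUP"],
             "INDIVIDUAL": ["ATHLETE", "ACTOR"], "ATHLETE": ["SOCCER PLAYER"]}

def annotate(question, ask):
    """Walk the hierarchy top-down; ask() picks a child or 'STOP'."""
    path, node = [], "ROOT"
    while HIERARCHY.get(node):
        choice = ask(question, HIERARCHY[node])
        if choice == "STOP":
            break
        path.append(choice)
        node = choice
    return path

# Example with a scripted annotator standing in for a human:
answers = iter(["HUMAN", "INDIVIDUAL", "ATHLETE", "SOCCER PLAYER"])
print(annotate("Who wears #23 for the Los Angeles Galaxy?",
               lambda q, options: next(answers)))
# ['HUMAN', 'INDIVIDUAL', 'ATHLETE', 'SOCCER PLAYER']
```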

12. One-Pass Annotation
    • Example: Who is the owner of the Los Angeles Galaxy?
    [Diagram: the full hierarchy shown at once: coarse types (Named, Value, Complex, Abbrev, Human, Location, Work, Other), intermediate nodes (Individual, Group, Organization), and fine types (…, Actor, Artist, Athlete, Coach, Writer); the annotator selects the fine type directly]

13. Two-Pass Annotation
    • Example: Who is the owner of the Los Angeles Galaxy?
    [Diagram: the same hierarchy; the annotator first selects a coarse type (e.g. Human), then selects a fine type (…, Actor, Artist, Athlete, Coach, Writer)]

14. Multi-Pass Annotation
    • Example: Who is the owner of the Los Angeles Galaxy?
    [Diagram: the annotator makes one choice per decision point (e.g. Human → Individual → Organization …) and may select STOP at any level]

15. Annotating Questions
    • Annotated a corpus of 10,000 questions using all three annotation methods:
    • UIUC Set: factoid and definition questions (Li and Roth 2002)
      - How many villi are found in the small intestine?
    • Web Crawl Set: "What" questions taken from on-line FAQs
      - What is the e-mail address for the mayor of Miami?
    • Ferret Log: factoid and complex questions taken from previous experiments with LCC's Ferret question-answering system (Hickl et al. 2006)
      - What is the relationship between Iran and Hezbollah?
      - What power plants are in Baluchestan?
    Corpus breakdown:
      UIUC Train & Test    5,952
      Web Crawl            3,485
      FERRET Log             563
      Total               10,000

16. Experimental Methodology
    • Manually annotated 10,000 factoid questions:
      - "Warm-up" set: 1,000 questions (Method 1)
      - Method 2: 4,000 questions
      - Method 3: 4,000 questions
      - "Cool-down" set: 1,000 questions (Method 1)
    • Each set annotated by 2 different pairs of annotators
    • Annotators tasked with annotating 1K questions per session
    • Differences between annotators resolved after each session; differences between annotator pairs resolved after all questions were annotated
    • Average agreement between individuals per session (the computation is sketched below):
      - Method 1: 47.4% on the initial 1K ("warm-up"), 72.3% on the final 1K ("cool-down")
      - Method 2: 86.2% (coarse), 79.4% (fine)
      - Method 3: 85.3%, 84.7%, 91.5%, 97.0%, 99.8% (per decision level)
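
Per-session agreement here is just the fraction of questions on which two annotators chose the same label; a minimal sketch with hypothetical inputs:

```python
def agreement(labels_a, labels_b):
    """Fraction of items labeled identically by two annotators."""
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)

print(agreement(["ATHLETE", "CITY", "AGE"], ["ATHLETE", "COUNTRY", "AGE"]))
# 0.666...
```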

17. Four Challenges
    • Challenge 1: Can we organize a large entity hierarchy into an answer type hierarchy?
    • Challenge 2: Can we reliably annotate questions with answer types?
    • Challenge 3: Can we learn models for performing fine-grained ATD?
    • Challenge 4: How do we incorporate fine-grained ATD into a Q/A system?

18. Performing Answer Type Detection
    • Heuristic (Harabagiu et al. 2001, Harabagiu et al. 2002):
      - Used lexicosemantic features (e.g. WordNet synsets) to map between question terms and answer types
      - Performance dependent on the number of synset mappings
    • "Flat" classification:
      - A classifier used to map directly to one of n fine answer types
      - Performance degrades as n increases
    • (Pure) "hierarchical" classification (Li and Roth 2002, Das et al. 2005, Hickl et al. 2006):
      - Recursively identifies the best "child" node for each answer type
      - Only the children of the current type are considered as outcomes at each branching point in the hierarchy
      - Proceeds until there are no more branching points, or until a STOP type has been selected
    • "Hierarchical" classification + heuristics (Hickl et al. 2007, Roberts and Hickl 2008):
      - Uses a classifier to identify the best "child" node for selected sets of answer types
      - Uses heuristics to map to some terminal nodes
      - Proceeds until: there are no more branching points; no heuristics are available for mapping to finer types; or a STOP type has been selected
    (This procedure is sketched in code below.)
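
A minimal sketch of the top-down "hierarchical + heuristics" procedure described above, with a scripted stand-in for the trained per-node classifiers and a single toy heuristic; none of this is LCC's actual code:

```python
HIERARCHY = {"ROOT": ["HUMAN", "LOCATION", "STOP"],
             "HUMAN": ["INDIVIDUAL", "GROUP", "STOP"],
             "INDIVIDUAL": ["ATHLETE", "ACTOR", "STOP"]}

# Hypothetical heuristic table: maps some nodes to finer terminal types.
HEURISTICS = {"ATHLETE": lambda q: "SOCCER PLAYER" if "Galaxy" in q else None}

def detect_answer_type(question, classify):
    node = "ROOT"
    while True:
        rule = HEURISTICS.get(node)
        if rule:                             # heuristic mapping to a finer type
            finer = rule(question)
            if finer:
                return finer
        children = HIERARCHY.get(node)
        if not children:                     # no more branching points
            return node
        best = classify(question, children)  # per-node classifier
        if best == "STOP":                   # a STOP type has been selected
            return node
        node = best

# Example with a scripted classifier:
choices = iter(["HUMAN", "INDIVIDUAL", "ATHLETE"])
print(detect_answer_type("Who wears #23 for the Los Angeles Galaxy?",
                         lambda q, kids: next(choices)))
# SOCCER PLAYER
```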

19. Performance of ATD
    • Compared the performance of 3 classification-based approaches:
      - "Flat" classification
      - Pure "hierarchical" classification
      - "Hierarchical" classification + heuristics
    • All approaches trained / tested on the same questions:
      - UIUC hierarchy: 4,000 train / 2,000 test
      - LCC hierarchy: 8,000 train / 2,000 test

20. Four Challenges
    • Challenge 1: Can we organize a large entity hierarchy into an answer type hierarchy?
    • Challenge 2: Can we reliably annotate questions with answer types?
    • Challenge 3: Can we learn models for performing fine-grained ATD?
    • Challenge 4: How do we incorporate fine-grained ATD into a Q/A system?

21. Architecture of Ferret
    • We used a "baseline" version of LCC's question-answering system, Ferret (Hickl et al. 2006), to evaluate the impact that an expanded ATH could have on Q/A performance.
    [Diagram: Question Processing (with ATD) → Document Retrieval → Passage Retrieval → Answer Extraction → Answer Ranking → Answer Validation]
    (The pipeline is sketched as code below.)
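
A minimal sketch of the pipeline above as a chain of stages; the stage functions are trivial placeholders, not Ferret's actual API:

```python
def run_pipeline(question, stages):
    state = {"question": question}
    for stage in stages:
        state = stage(state)  # each stage enriches the shared state
    return state

stages = [
    lambda s: {**s, "answer_type": "SOCCER PLAYER"},   # Question Processing + ATD
    lambda s: {**s, "documents": ["doc1", "doc2"]},    # Document Retrieval
    lambda s: {**s, "passages": ["...Beckham..."]},    # Passage Retrieval
    lambda s: {**s, "candidates": ["David Beckham"]},  # Answer Extraction
    lambda s: {**s, "ranked": ["David Beckham"]},      # Answer Ranking
    lambda s: {**s, "answers": ["David Beckham"]},     # Answer Validation
]
print(run_pipeline("Who wears #23 for the Los Angeles Galaxy?", stages)["answers"])
```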

22. Impact on Question-Answering
    • Ran Ferret on a set of 188 factoid questions taken from past TREC QA evaluations that had known answer types in both the UIUC and LCC ATHs
    • Document collection: AQUAINT-2 newswire corpus (2 GB)
    • Answers judged by hand against the TREC QA answer keys
    • A question is considered answered correctly ("Top 1") if a valid answer is returned in the first position
    • A question is considered answered correctly ("Top 5") if a valid answer appears anywhere in the top 5 answers returned by the system
    (This scoring rule is sketched below.)
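
A minimal sketch of the Top-1 / Top-5 scoring rule, where judge() is a hypothetical stand-in for the hand-judging against the TREC answer keys:

```python
def score(ranked_answers, judge, k=5):
    """Top-1: the first answer is valid; Top-k: any of the first k is valid."""
    top1 = bool(ranked_answers) and judge(ranked_answers[0])
    topk = any(judge(a) for a in ranked_answers[:k])
    return top1, topk

print(score(["Landon Donovan", "David Beckham"],
            judge=lambda a: a == "David Beckham"))
# (False, True)
```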

23. Conclusions
    • Annotated a corpus of more than 10,000 factoid questions with appropriate "fine" answer types, with nearly 90% inter-annotator agreement
    • Constructed classifier-based ATD models capable of associating questions with their appropriate answer type with nearly 90% accuracy
    • Incorporated the new ATD system into a baseline Q/A system; showed an improvement of more than 10% over the system using the previous ATH

24. Talk Overview
    • Introduction
    • Four challenges:
      - Challenge 1: Organization. Can we organize a large entity hierarchy into a workable answer type hierarchy?
      - Challenge 2: Annotation. Can we reliably annotate questions with fine-grained types from a large ATH? What's the best way to perform annotation?
      - Challenge 3: Learning. Can we learn models for performing fine-grained ATD? How do they compare with current ATD models?
      - Challenge 4: Implementation. How do we incorporate ATD into a Q/A system (without sacrificing performance)?
    • Conclusions
