
Scaling Answer Type Detection to Large Hierarchies

Kirk Roberts and Andrew Hickl
{kirk,andy}@languagecomputer.com

May 29, 2008


Introduction

  • Work in factoid question-answering (Q/A) has long leveraged answer type detection (ATD) systems to identify the semantic class (or answer type) of the entities, words, or phrases most likely to correspond to the exact answer of a question.

[Figure: Answer Type Hierarchy (ATH) fragment for the example question "Who wears #23 for the Los Angeles Galaxy?" — the path Human → Individual → Athlete → Soccer Player, shown alongside sibling nodes such as Organization, Group, Actor, Artist, Award, Baseball Player, and Cricket Player]




Answer Types and Entity Types

  • While articulated ATHs clearly have value for question-answering applications, most work in ATD has been limited by the number of types recognized by current named entity recognition systems.

    • ACE Guidelines: ~35 entity types

    • Typical Commercial Offering: ~50 entity types

    • LCC’s CiceroLite™: > 350 entity types

  • But are more types really better? Or do they make for a tougher learning problem?


Four Challenges

  • Challenge 1: Can we organize a large entity hierarchy into an answer type hierarchy?

  • Challenge 2: Can we reliably annotate questions with answer types?

  • Challenge 3: Can we learn models for performing fine-grained ATD?

  • Challenge 4: How do we incorporate fine-grained ATD into a Q/A system?


Challenge 1: Creating an Answer Type Hierarchy

  • First (published) Answer Type Hierarchy (Li and Roth 2002, et seq.):

    • 2-Tiered Structure: 6 "Coarse" Answer Types, ~50 "Fine" Answer Types

LAND VEHICLE (7): Automobile, Truck, Mass Transport, Train, Military Vehicle, Industrial Vehicle

WATER VEHICLE (4): Ships, Submarines, Civilian Watercraft, Other Watercraft

AIR VEHICLE (4): Commercial Airliner, Military Plane, Other Aircraft, Blimp

SPACE VEHICLE (3): Spacecraft, Satellite, Fictional Spacecraft

  • Questions to answer:

    • Why not just use an entity hierarchy as the answer type hierarchy?

    • What is the right set of leaf nodes for an ATH?

    • What is the right set of non-terminals for an ATH?


Why not just use the entity hierarchy?

  • Short answer: Entity hierarchies aren’t organized according to the potential information needs expressed by natural language questions.

  • Entity Types are semantic categories assigned to phrases found in text:

    • David Beckham was born on February 9, 1976. [ENTITY TYPE: DATE]

    • David Beckham was 33 years old in 2008. [ENTITY TYPE: AGE]

    • David Beckham (1976 - ) plays for the LA Galaxy. [ENTITY TYPE: YEAR_RANGE]

    • David Beckham is one year older than Luis Figo. [ENTITY TYPE: RELATIVE_AGE]

    • David Beckham, 33, was scratched by Capello. [ENTITY TYPE: GENERIC_NUMBER]

    • David Beckham has been living for 33 years. [ENTITY TYPE: DURATION]

  • Answer Types are semantic categories sought by a question:

    • How old is David Beckham?

      • Answer Type is AGE, but valid entity types include:

        • AGE

        • RELATIVE_AGE

        • GENERIC_NUMBER

        • DURATION

        • DATE / YEAR_RANGE
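
To make the distinction concrete, here is a minimal sketch (the type names and mapping are illustrative, following the AGE example above, not LCC's actual inventory): a single answer type is backed by a set of entity types whose mentions can serve as answers.

```python
# A minimal sketch, not LCC's implementation: the answer type a question asks
# for (an ATH node) maps to several entity types that can realize it in text.
ANSWER_TYPE_TO_ENTITY_TYPES = {
    # hypothetical mapping, following the AGE example above
    "AGE": {"AGE", "RELATIVE_AGE", "GENERIC_NUMBER", "DURATION", "DATE", "YEAR_RANGE"},
}

def entity_types_for(answer_type: str) -> set:
    """Entity types whose mentions may answer a question of this answer type."""
    return ANSWER_TYPE_TO_ENTITY_TYPES.get(answer_type, set())

# "How old is David Beckham?" -> answer type AGE
print(sorted(entity_types_for("AGE")))
```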


Constructing an ATH from an ETH

  • Step 1: Initialize.

    • Create the initial ATH as a direct clone of the existing ETH

  • Step 2: Consolidate Similar Nodes.

    • Combine similar nodes under “abstract” parent nodes corresponding to a possible Q-stem

      • SOCCER PLAYER, BASEBALL PLAYER, CRICKET PLAYER → ATHLETE (Which player?)

      • CITY, FACILITY, GEOPOLITICAL ENTITY → LOCATION (Where?)

      • POEM, BOOK, MOVIE, GOVERNMENT DOCUMENT → AUTHORED_WORK (What work?)

  • Step 3: Separate Existing Nodes into Subtypes.

    • Create multiple answer types for a single entity type when it belongs under different parents

      • AIRPORT → AIRPORT_LOC and AIRPORT_ORG

  • Step 4: Repeat (as necessary).

    • Repeat Step 2 and Step 3 until all "merge-able" types are included in the ATH (see the sketch below)
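
A minimal sketch of Steps 2 and 3 on a toy child-to-parent map; the node names and the representation are illustrative assumptions, not the actual LCC ETH/ATH.

```python
# Illustrative only: a tiny hierarchy stored as a child -> parent map.
ath = {
    # Step 1: start from a clone of the entity hierarchy
    "SOCCER PLAYER": "HUMAN",
    "BASEBALL PLAYER": "HUMAN",
    "CRICKET PLAYER": "HUMAN",
    "AIRPORT": "FACILITY",
}

def consolidate(ath, children, abstract_parent):
    """Step 2: group similar nodes under an 'abstract' parent tied to a Q-stem."""
    ath[abstract_parent] = ath[children[0]]   # new node inherits the old parent
    for child in children:
        ath[child] = abstract_parent          # re-parent the similar nodes

def split(ath, node, subtype_parents):
    """Step 3: one answer type per plausible parent for an ambiguous entity type."""
    for subtype, parent in subtype_parents.items():
        ath[subtype] = parent
    del ath[node]

consolidate(ath, ["SOCCER PLAYER", "BASEBALL PLAYER", "CRICKET PLAYER"], "ATHLETE")
split(ath, "AIRPORT", {"AIRPORT_LOC": "LOCATION", "AIRPORT_ORG": "ORGANIZATION"})
print(ath)  # SOCCER PLAYER now sits under ATHLETE; AIRPORT_LOC / AIRPORT_ORG added
```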


Resultant Answer Type Hierarchy

  • 11 coarse answer types (UIUC Hierarchy: 6)

    • human, location, numeric, abbreviation, entity, complex, work, temporal, title, contact-info, other-value*

  • 296 fine types (UIUC Hierarchy: ~50)

    • Examples: casino, museum, city, country, state, actor, baseball player, military person, company, university, baseball team, island, planet, river, album, song, book, wrestler, soccer player, space location, moon, etc.

  • Average Depth: 3.8 levels

  • Average Number of “Sisters”: 4.2 nodes

* Corresponds to a UIUC coarse type




Annotation Methodology

  • We experimented with three different annotation methodologies:

    • Method #1: One Pass

      • A traditional one-pass annotation where the annotator assigns the final "fine" AT directly

    • Method #2: Two Passes

      • Annotators first select a “coarse” answer type, then select a “fine” answer type

    • Method #3: Multiple Passes

      • Annotators annotate each question according to each decision point in the hierarchy

      • Annotators can STOP annotation at any level in the hierarchy

[Figure: labels recorded for a SOCCER PLAYER question under each method — Method #1: SOCCER PLAYER in one decision; Method #2: HUMAN, then SOCCER PLAYER; Method #3: HUMAN → INDIVIDUAL → ATHLETE → SOCCER PLAYER, one decision per level]
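
A minimal sketch of the Method #3 protocol, assuming a toy hierarchy and a `choose` callback standing in for the annotator's per-level decision; none of this is the actual annotation tool.

```python
# Illustrative hierarchy and annotator interface.
CHILDREN = {
    "ROOT": ["HUMAN", "LOCATION", "NUMERIC"],
    "HUMAN": ["INDIVIDUAL", "GROUP"],
    "INDIVIDUAL": ["ACTOR", "ARTIST", "ATHLETE"],
    "ATHLETE": ["SOCCER PLAYER", "BASEBALL PLAYER", "CRICKET PLAYER"],
}

def annotate_multipass(question, choose):
    """One decision per branching point; the annotator may return "STOP" early."""
    node, path = "ROOT", []
    while node in CHILDREN:
        pick = choose(question, CHILDREN[node])
        if pick == "STOP":
            break
        path.append(pick)
        node = pick
    return path

# Scripted "annotator" for the example question:
answers = iter(["HUMAN", "INDIVIDUAL", "ATHLETE", "SOCCER PLAYER"])
print(annotate_multipass("Who wears #23 for the Los Angeles Galaxy?",
                         lambda q, options: next(answers)))
# -> ['HUMAN', 'INDIVIDUAL', 'ATHLETE', 'SOCCER PLAYER']
```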


One-Pass Annotation

[Figure: one-pass annotation of "Who is the owner of the Los Angeles Galaxy?" — the annotator assigns the final fine type in a single decision over the full hierarchy. Coarse types shown: Named, Value, Complex, Abbrev, Human, Location, Work, Other; fine types under Human include Individual, Group, Organization, Actor, Artist, Athlete, Coach, Writer, ...]

Two-Pass Annotation

[Figure: two-pass annotation of "Who is the owner of the Los Angeles Galaxy?" — pass one selects a coarse type (Named, Value, Complex, Abbrev, Human, Location, Work, Other); pass two selects a fine type beneath the chosen coarse type (Individual, Group, Organization, Actor, Artist, Athlete, Coach, Writer, ...)]


Multi-Pass Annotation

[Figure: multi-pass annotation of "Who is the owner of the Los Angeles Galaxy?" — the annotator makes one decision per branching point (e.g. Human, then Individual, then a finer type) and may select STOP at any level instead of committing to a finer type]


Annotating Questions

  • Annotated a corpus of 10,000 questions using all three different annotation methods

  • UIUC Set: Factoid and Definition Questions (Li and Roth 2002)

    • How many villi are found in the small intestine?

  • Web Crawl Set: “What” questions taken from on-line FAQs

    • What is the e-mail address for the mayor of Miami?

  • Ferret Log: Factoid and Complex Questions taken from previous experiments with LCC’s Ferret question-answering system (Hickl et al. 2006)

    • What is the relationship between Iran and Hezbollah?

    • What power plants are in Baluchestan?

Question Set          Questions
UIUC Train & Test         5,952
Web Crawl                 3,485
Ferret Log                  563
Total                    10,000


Experimental Methodology

  • Manually annotated 10,000 factoid questions

    • “Warm-up” Set: 1000 Questions (Method 1)

    • Method 2: 4000 Questions

    • Method 3: 4000 Questions

    • “Cool Down” Set: 1000 Questions (Method 1)

  • Each set annotated by 2 different pairs of annotators

  • Annotators tasked with annotating 1K questions per session

  • Differences between annotators resolved after each session; differences between annotator pairs were resolved after all questions were annotated

  • Average agreement between individuals per session:

    • Method 1 (one pass): 47.4% on the initial 1K set, 72.3% on the final 1K set

    • Method 2 (two passes): 86.2% (coarse), 79.4% (fine)

    • Method 3 (multiple passes): 85.3%, 84.7%, 91.5%, 97.0%, and 99.8% across the successive decision points
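
The per-session figures above can be read as plain percent agreement between the two annotators in a session; a minimal sketch under that assumption (the exact agreement measure is not stated on the slide):

```python
# Assumed measure: plain percent agreement between two annotators in one session.
def session_agreement(labels_a, labels_b):
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)

print(round(session_agreement(["HUMAN", "LOCATION", "HUMAN"],
                              ["HUMAN", "LOCATION", "NUMERIC"]), 1))  # 66.7
```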




Performing Answer Type Detection

  • Heuristic (Harabagiu et al. 2001, Harabagiu et al. 2002):

    • Used lexicosemantic features (e.g. WordNet synsets) to map between question terms and answer types

    • Performance dependent on number of synset mappings

  • “Flat” Classification:

    • Classifier used to directly map to one of n fine answer types

    • Performance degrades as n increases

  • (Pure) "Hierarchical" Classification (Li and Roth 2002, Das et al. 2005, Hickl et al. 2006):

    • Recursively identifies best “child” node for each answer type

    • Only the children of the current type are considered as outcomes at every branching point in the hierarchy

    • Proceeds until no more branching points, or until a STOP type has been selected

  • "Hierarchical" Classifier + Heuristics (Hickl et al. 2007, Roberts and Hickl 2008):

    • Uses classifier to identify best “child” node for selected sets of answer types

    • Uses heuristics to map to some terminal nodes

    • Proceeds until:

      • No more branching points

      • No heuristics available for mapping to finer types

      • A STOP type has been selected
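
A minimal sketch of the hierarchical + heuristics strategy, under assumed interfaces: one classifier per branching point exposing `predict(question)`, and optional per-node heuristic rules that map straight to a finer type. This is an illustration of the scheme described above, not LCC's code.

```python
def detect_answer_type(question, children, classifiers, heuristics=None):
    """Walk the ATH from the root, picking one child per branching point."""
    node = "ROOT"
    while node in children:                      # still at a branching point
        rule = (heuristics or {}).get(node)
        if rule is not None:                     # heuristic available for this subtree
            mapped = rule(question)
            if mapped is not None:
                node = mapped
                continue
        prediction = classifiers[node].predict(question)
        if prediction == "STOP":                 # cannot commit to a finer type
            break
        node = prediction
    return node

# Toy demo with stub classifiers standing in for trained models:
class Stub:
    def __init__(self, label): self.label = label
    def predict(self, question): return self.label

children = {"ROOT": ["HUMAN"], "HUMAN": ["INDIVIDUAL"],
            "INDIVIDUAL": ["ATHLETE"], "ATHLETE": ["SOCCER PLAYER"]}
classifiers = {n: Stub(children[n][0]) for n in children}
print(detect_answer_type("Who wears #23 for the Los Angeles Galaxy?",
                         children, classifiers))  # -> SOCCER PLAYER
```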


Performance of ATD

  • Compared performance of 3 classification-based approaches:

    • “Flat” Classification

    • Pure “Hierarchical” Classification

    • “Hierarchical” Classification + Heuristics

  • All approaches trained / tested on same questions:

    • UIUC Hierarchy: 4000 train / 2000 test

    • LCC Hierarchy: 8000 train / 2000 test




Architecture of Ferret

  • We used a "baseline" version of LCC's question-answering system, Ferret (Hickl et al. 2006), to evaluate the impact that an expanded ATH could have on Q/A performance.

[Figure: Ferret architecture — Question Processing (including ATD), Document Retrieval, Passage Retrieval, Answer Extraction, Answer Ranking, Answer Validation]
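
A minimal sketch of how ATD plugs into such a pipeline; every stage is a placeholder callable and the stage ordering is an assumption, not Ferret's implementation.

```python
# Placeholder pipeline: each argument is a function supplied by the Q/A system.
def answer_question(question, detect_answer_type, retrieve_documents,
                    retrieve_passages, extract_answers, rank_answers,
                    validate_answers):
    answer_type = detect_answer_type(question)             # question processing / ATD
    documents = retrieve_documents(question)                # document retrieval
    passages = retrieve_passages(question, documents)       # passage retrieval
    candidates = extract_answers(passages, answer_type)     # keep mentions of that type
    ranked = rank_answers(question, candidates)             # answer ranking
    return validate_answers(question, ranked)               # answer validation
```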


Impact on Question-Answering

  • Used Ferret on a set of 188 factoid questions taken from past TREC QA evaluations that had known answer types in both the UIUC and LCC ATHs

    • Document Collection: AQUAINT-2 Newswire Corpus (2 GB)

    • Answers judged by hand based on TREC QA keys

    • A question was considered answered correctly ("Top 1") if a valid answer was returned in the first position

    • A question was considered answered correctly ("Top 5") if a valid answer appeared among the top 5 answers returned by the system
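
A minimal sketch of the two scoring rules, assuming each system output is a ranked list of answer strings and `is_valid` stands in for the manual judgment against the TREC answer keys:

```python
def top_k_accuracy(questions, ranked_answers, is_valid, k):
    """Fraction of questions with a valid answer among the top k returned."""
    hits = sum(any(is_valid(q, a) for a in answers[:k])
               for q, answers in zip(questions, ranked_answers))
    return hits / len(questions)

# "Top 1": k = 1; "Top 5": k = 5.
```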


Conclusions

  • Annotated a corpus of more than 10,000 factoid questions with appropriate “fine” answer types with nearly 90% inter-annotator agreement

  • Constructed classifier-based ATD models capable of associating questions with their appropriate answer type with nearly 90% accuracy

  • Incorporated new ATD system into a baseline Q/A system; showed improvement of more than 10% over system using previous ATH


Talk Overview

  • Introduction

  • Four Challenges:

    • Challenge 1: Organization. Can we organize a large entity hierarchy into a workable answer type hierarchy?

    • Challenge 2: Annotation. Can we reliably annotate questions with fine-grained types from a large ATH? What's the best way to perform annotation?

    • Challenge 3: Learning. Can we learn models for performing fine-grained ATD? How do they compare with current ATD models?

    • Challenge 4: Implementation. How do we incorporate ATD into a Q/A system (without sacrificing performance)?

  • Conclusions

