Integrating Robust Semantics, Event Detection, Information Fusion, and Summarization for Multimedia Question Answering

Vasileios Hatzivassiloglou, Kathleen R. McKeown

Columbia University

Dan Jurafsky, Wayne H. Ward, James H. Martin

University of Colorado

AQUAINT One Year PI Meeting – December 2002



Our Focus

  • Distinguish between questions answerable with

    • Unique facts (TREC-like)

    • Facts that are not absolute but depend on

      • source; perspective; time

    • Opinions / subjective answers

    • Long answers

      • definitions; biographies; summaries


Research Goals

  • Technology for answering complex questions

    • Combine information from multiple sources

    • Combine information across events and time

    • Plan and generate answers

  • Domain independent semantic processing

    • Represent entities and relations in a general way

  • Dialogue interface to Q&A system

    • Context management

    • Clarification and follow-up

Architecture

[Architecture diagram; only the component labels survive in this transcript, not the connections between them: Specialized language model; Local collections, TREC; Semantic parser; MG; Recognized question; Answer extraction and combination; Spoken question; Question classification; Web; Speech recognition; Google; Query manager; Recognition feedback; Long answers; Information fusion; Answer strategy selector; Event detection; Context/dialog manager; Short answers; Typed question; Answer planning; Learned answer plans.]



Progress in the first year

  • Revised architecture and APIs

  • Integrated system implemented

  • System components prototyped

    • Baseline Q&A system for short answers

    • Semantic Parser

    • Event Detection

    • Definitions

    • Opinion Recognizer and Classifier (partially)

  • Questions and answers of different types collected

  • Participated in TREC and Definition and Opinion evaluations


Goals for the next six months

  • Complete opinion module

  • Use event information in answer planning

  • Add prototype module for biographies

  • Integrate semantic labels into answer analysis

  • Incorporate initial context management module

  • Process spoken questions

  • Use collected data on questions with multiple or long answers


Two Research Presentations

  • Producing answers for definition questions (Columbia)

  • Semantic parsing for answering complex questions (Colorado)


Producing Answers for Definition Questions

Department of Computer Science

Columbia University

Vasileios Hatzivassiloglou, Kathleen R. McKeown, Pablo A. Duboue, Elena Filatova, Sasha J. Blair-Goldensohn, Gabriel Illouz, Rebecca J. Passonneau, Andrew Hazen Schlaikjer, Hong Yu


Target Task

  • “Encyclopedic” definitions of rich terms

    • What is the Hindu Kush?

    • What are radiofrequency weapons?

    • What is a Loya Jirga?

    • What is the Iraqi presidential guard?

    • What is anthrax?


Target Input / Target Output

Input Question: What is a Loya Jirga?

[Diagram: the question is passed to an IR process, which retrieves Doc 1, Doc 2, Doc 3, …, Doc n.]

Output Definitional Answer:

"Loya Jirga" is a Pashto phrase meaning grand council. It is a forum that is unique to Afghanistan in which, traditionally, tribal elders - Pashtuns, Tajiks, Hazaras and Uzbeks - have come together to settle affairs of the nation or rally behind a cause.

A Special Independent Commission for the Convening of the Loya Jirga ("loya jirga commission"), required by the Bonn Agreement, was appointed in January. Its task was to establish rules and procedures for the loya jirga, define a process for the selection of delegates, and ensure the adequate representation of women, minorities, scholars, and representatives of civil society groups. The selection of the delegates for the loya jirga began on April 15.

If we analyse Afghan history, most of the great events, particularly the making of governments and the announcements of wars of independence, have been determined and happened because of Loya Jirga. Even their empires in the Subcontinent were established, maintained and replaced through the Jirgas, either the Loya Jirga or smaller ones compounded of the tribe of the king and other allied tribes.


Overview of the Approach

  • Dynamically created definitions

    • Take advantage of new/evolving knowledge sources

    • Allow us to define new/evolving terms

    • Can be tailored to user model (e.g., expert vs. novice)

  • Predicate-based analysis and fusion

    • Information Extraction: Use strong cross-domain similarity in what types of information are “definitional”

    • Definition Presentation: use similarity-based summarization and fusion techniques to combine information from heterogeneous documents


Current Predicates

Predicate: Example

  • Explicit Synonym: The Loya Jirga, or Grand Council, is usually held in a large open space such as a tent.

  • Etymology: Loya Jirga means "grand council" in Pashto, one of the country’s most widely spoken languages.

  • Genus, Species: A Loya Jirga is a traditional Afghan decision-making assembly.

  • History: The loya jirga has served throughout history to legitimize government decisions in the eyes of the people.

  • Cause-Effect: The tradition of Loya Jirga is responsible for many of the important decisions in Afghan government.

  • Target Partition (two instances): The Loya Jirga of 1987 was vastly more successful than the previous one, which was held in 1980.


From Predicates to Definition

  • Automatically identify predicate instances in text

    • Surface Variation

    • Semantic Similarity

  • Compile predicates into a summary definition answer

    • Grouping, ordering

    • Fusion within predicate

      • E.g., fusing all “genus-species” sections with a common genus
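As a rough illustration of fusion within a predicate, the sketch below groups genus-species instances that share a genus and merges their species phrases into one sentence. The `GenusSpecies` record and the `fuse_by_genus` helper are hypothetical names, not part of the system described in these slides, and exact string matching stands in for the semantic similarity the slides call for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenusSpecies:
    """One extracted genus-species instance (hypothetical record type)."""
    term: str
    genus: str
    species: str


def fuse_by_genus(instances):
    """Merge instances that share a term and genus into one sentence each.

    Genus matching is exact and case-insensitive, an assumption made for
    brevity; a real system would need semantic similarity here.
    """
    groups = {}  # (term, genus) in lower case -> (term, genus, [species])
    for inst in instances:
        key = (inst.term.lower(), inst.genus.lower())
        _, _, species = groups.setdefault(key, (inst.term, inst.genus, []))
        if inst.species not in species:  # skip verbatim repeats
            species.append(inst.species)
    return [
        f"{term} is a {genus} {' and '.join(species)}."
        for term, genus, species in groups.values()
    ]
```

Joining species phrases with "and" is the simplest possible realization; choosing articles, ordering the phrases, and removing near-duplicates are all left out of this sketch.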



Identifying Predicate Instances in Text

Question: What is a Loya Jirga?

[Diagram: an IR process retrieves Doc 1, Doc 2, Doc 3, …; the patterns below are matched against the retrieved documents.]

  • Patterns for “Genus-Species”:

    • TERM is a GENUS in which SPECIES.

    • … TERM, a GENUS that is SPECIES.

    • TERM is not only a GENUS, it is a SPECIES one at that.

Doc 6

Loya Jirga is a forum in which, traditionally, tribal elders - Pashtuns, Tajiks, Hazaras and Uzbeks - have come together to settle affairs of the nation or rally behind a cause.

In recent times, Loya Jirgas …

Doc 13

You may not have heard of the Loya Jirga, a forum that is unique to Afghanistan.

However, this fine document will explain…

Matches

  • Genus: forum

  • Species: (in which), traditionally, tribal elders - Pashtuns, Tajiks, Hazaras … (Rule 1)

  • Species: (that is) unique to Afghanistan (Rule 2)
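The first two "Genus-Species" patterns can be approximated with regular expressions, as in the minimal sketch below. The function names are hypothetical, GENUS is assumed to be a single word, and SPECIES is taken to run to the end of the sentence; the slides describe patterns learned from marked documents, not hand-written regexes like these.

```python
import re


def build_patterns(term):
    """Compile hypothetical regex versions of the first two patterns.

    GENUS is approximated as a single word and SPECIES as the rest of the
    sentence; both are simplifying assumptions.
    """
    t = re.escape(term)
    return [
        # Rule 1: TERM is a GENUS in which SPECIES.
        ("Rule 1", re.compile(
            rf"{t} is an? (?P<genus>\w+) (?P<species>in which\b[^.]+)\.", re.I)),
        # Rule 2: ... TERM, a GENUS that is SPECIES.
        ("Rule 2", re.compile(
            rf"{t}, an? (?P<genus>\w+) (?P<species>that is\b[^.]+)\.", re.I)),
    ]


def find_genus_species(term, text):
    """Return (rule, genus, species) for every pattern match in text."""
    return [
        (rule, m.group("genus"), m.group("species"))
        for rule, pattern in build_patterns(term)
        for m in pattern.finditer(text)
    ]
```

Applied to the Doc 6 and Doc 13 sentences, both rules pick out "forum" as the genus, which is what makes the two species phrases candidates for fusion later.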


From Predicate Instances to Summary

[Diagram: Predicate Instance 1, Predicate Instance 2, …, Predicate Instance n feed the methods below. Labels shown on the instances: Etymology; Doc 2, History; Doc m, Genus-Species; Doc n, Genus-Species.]

  • Apply methods for:

    • Grouping

    • Ordering

    • Summary / Fusion

Resulting summary:

“Loya Jirga” is a Pashto phrase meaning grand council. It is a forum that is unique to Afghanistan in which, traditionally, tribal elders - Pashtuns, Tajiks, Hazaras and Uzbeks - have come together to settle affairs of the nation or rally behind a cause.

A Special Independent Commission for the Convening of the Loya Jirga (“loya jirga commission”), required by the Bonn Agreement, was appointed in January. Its task was to establish rules and procedures for the loya jirga, define a process for the selection of delegates, and ensure the adequate representation of women, minorities, scholars, and representatives of civil society groups. The selection of the delegates for the loya jirga began on April 15.

If we analyse Afghan history, most of the great events, particularly the making of governments and the announcements of wars of independence, have been determined and happened because of Loya Jirga. Even their empires in the Subcontinent were established, maintained and replaced through the Jirgas, either the Loya Jirga or smaller ones compounded of the tribe of the king and other allied tribes.
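The grouping and ordering steps might look like the sketch below, which groups predicate instances by predicate type and sorts the groups into a fixed presentation order. The order used here (etymology, then genus-species, then history) is an assumption read off the example answer, not a policy stated in the slides, and `group_and_order` is a hypothetical name.

```python
from itertools import groupby

# Hypothetical presentation order, loosely following the example answer
# (etymology first, then genus-species, then history).
PREDICATE_ORDER = ["Etymology", "Genus-Species", "History"]


def group_and_order(instances):
    """Group (predicate, source_doc, text) tuples and order the groups.

    Predicates missing from PREDICATE_ORDER are placed last, alphabetically.
    """
    rank = {name: i for i, name in enumerate(PREDICATE_ORDER)}
    # sorted() is stable, so texts keep their input order within a predicate.
    ordered = sorted(
        instances, key=lambda inst: (rank.get(inst[0], len(rank)), inst[0]))
    return [
        (predicate, [text for _, _, text in group])
        for predicate, group in groupby(ordered, key=lambda inst: inst[0])
    ]
```

Each group would then be handed to a fusion or summarization step; that step is outside the scope of this sketch.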


Data collection

  • Create a predicate set which has validity and relevance across term domains (status: working set complete)

  • Select training terms from several semantic domains and collect set of documents for each term (status: complete – approximately 50 terms)

  • Hand-mark predicates in the documents for training (status: in progress)


Implementation Progress

  • Predicate identification

    • Use marked documents to learn patterns which identify predicates in text (status: in progress – currently focusing on “definitional” and “genus-species”)

  • Building summaries

    • Evaluate different methods of combining predicate information into summaries (status: currently only a baseline method)


Current System

  • Attempts to identify four individual predicates

    • Definitional: Large sections of “definitional text”

    • Genus-Species Phrase: Sentence containing both genus and species

    • Genus: Part of phrase indicating the “family” of the term

    • Species: Part of phrase indicating distinguishing characteristics of the term

  • Creates definition from predicate instances by baseline method

    • Simple presentation of Genus-Species information

    • Apply similarity-based summary techniques to remaining “definitional” block predicates


A Preliminary Evaluation

  • 25 definitional/biography questions supplied by NIST (AQUAINT evaluation)

  • Produced answers between 1 and 32 sentences long (13.4 sentences on average)

  • Between 6% and 100% of the answer sentences were judged relevant by us (average 53%)

  • On average, 15% of the relevant information is redundant


Future Directions

  • Identify larger predicate set

  • Evaluate different methods of combining predicate instances into definitions

  • Propagate predicate information to user interface of Q&A system

    • e.g., suggest query for “other terms with same genus”


Semantic Parsing for Answering Complex Questions

Center for Spoken Language Research

University of Colorado, Boulder

Dan Jurafsky, Wayne Ward, James Martin, Sameer Pradhan, Valerie Krugler, Steven Bethard, Ashley Thornton, Kadri Hacioglu, Honglin Sun, Huishin Tseng


Uses of Semantic Annotation

  • Question Classification

  • Document Re-ranking

  • Answer Extraction

  • Event Detection

  • Fusion

  • Opinion Questions


Thematic Roles

  • Currently there are 18

    • Agent

    • Cause

    • Degree

    • Experiencer

    • Force

    • Goal

    • Location

    • Manner

    • Path

    • Patient

    • Percept

    • Proposition

    • Result

    • Source

    • State

    • Temporal

    • Topic

    • Null


Parser Improvements

  • Added backoff for statistics

    • Backoff from target to cluster

    • Ordered Frame Element Group stats to unordered

  • Combine constituents to remove ambiguity

    • Some constituents overlap

  • Train on additional FrameNet data

  • Performance on TREC data

    • Baseline: 35% Precision 38% Recall

    • Modified: 42% Precision 50% Recall
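To make the "backoff from target to cluster" idea concrete, here is a minimal sketch of a relative-frequency role model that falls back to statistics pooled over a cluster of target words when the target itself has too little data. The class name, the single `feature` argument, the cluster map, and the `min_count` threshold are all assumptions for illustration; the slides do not give the parser's actual feature set or backoff scheme.

```python
from collections import Counter, defaultdict


class BackoffRoleModel:
    """Role model that backs off from target word to target cluster.

    Hypothetical sketch: features, clusters and threshold are assumptions.
    """

    def __init__(self, cluster_of, min_count=1):
        self.cluster_of = cluster_of            # target word -> cluster id
        self.min_count = min_count              # observations needed to trust a target
        self.by_target = defaultdict(Counter)   # (target, feature) -> role counts
        self.by_cluster = defaultdict(Counter)  # (cluster, feature) -> role counts

    def observe(self, target, feature, role):
        """Record one training example at both levels of specificity."""
        self.by_target[(target, feature)][role] += 1
        cluster = self.cluster_of.get(target)
        if cluster is not None:
            self.by_cluster[(cluster, feature)][role] += 1

    def prob(self, role, target, feature):
        """Estimate P(role | target, feature), backing off to the cluster."""
        counts = self.by_target.get((target, feature), Counter())
        if sum(counts.values()) < self.min_count:
            cluster = self.cluster_of.get(target)
            counts = self.by_cluster.get((cluster, feature), Counter())
        total = sum(counts.values())
        return counts[role] / total if total else 0.0
```

With this design a target word that never appeared in training, such as a rare synonym of a frequent verb, can still receive role estimates through its cluster.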


Parse Accuracy

  • BNC Corpus

    • 80.4% – Classification accuracy on known boundaries for frame-specific roles

    • 82.1% for classification of thematic roles

    • Integrated boundary and labeling using FEGs

      • 70.1% – 74.0% (FE: Recall – Precision)

      • 61.2% – 64.6% (Labeled: Recall – Precision)

  • TREC Corpus

    • 42% – 50% (Recall – Precision) on Thematic Roles
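For comparing the recall and precision pairs above with a single number, one common choice is the F1 score, the harmonic mean of the two. The helper below is a small sketch; F1 is not reported on the slide and is introduced here only as a convenience.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Pairs taken from the slide, written as (precision, recall) in percent.
REPORTED = {
    "BNC, FE boundaries": (74.0, 70.1),
    "BNC, labeled": (64.6, 61.2),
    "TREC, thematic roles": (50.0, 42.0),
}

F1_SCORES = {name: round(f1(p, r), 1) for name, (p, r) in REPORTED.items()}
```

For the figures above this gives roughly 72.0 and 62.9 on BNC and 45.7 on TREC. These F1 values are derived here and do not appear in the presentation; because F1 is symmetric, it does not depend on which of the two numbers is labeled precision.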


TREC Error Analysis

  • Areas where improvements might have helped with missed questions:

    • Semantic Expansion: 44.4%

    • Text Search Patterns: 22.2%

    • Shallow Semantic Parsing: 14.0%

    • Additional Named Entities: 9.9%

    • Improved IR Techniques: 9.4%

    • Deep Semantic Parsing/Inference: 5.8%


Semantic Expansion

  • Pseudo-Feedback: Use pseudo-feedback to generate a list of potential terms to add to the query

  • WordNet: Use WordNet to check each word in the query and the expansion list against all others for shared synset membership

  • Query Expansion: Add terms with shared synsets to the new, expanded query, along with other terms in the synset

  • Benefits: This Feedback + WordNet method provides two sources for potential expansion terms, with each source serving as a check on the other
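The expansion procedure can be sketched as follows. To keep the example self-contained, `synsets` is a toy dictionary standing in for WordNet, not the WordNet API, and `expand_query` is a hypothetical name; the pseudo-feedback terms are assumed to have been produced already by an earlier retrieval pass.

```python
def expand_query(query_terms, feedback_terms, synsets):
    """Expand a query with terms that share a synset with another candidate.

    `synsets` maps a synset id to the set of its member words; it is a toy
    stand-in for WordNet.
    """
    # Candidates are the query words plus the pseudo-feedback list, in order.
    candidates = list(dict.fromkeys(query_terms + feedback_terms))
    expanded = list(query_terms)
    for members in synsets.values():
        shared = [word for word in candidates if word in members]
        if len(shared) >= 2:  # two candidates vouch for this synset
            # Add the shared terms first, then the synset's other members.
            for word in shared + sorted(members):
                if word not in expanded:
                    expanded.append(word)
    return expanded
```

Requiring at least two candidates per synset is how this sketch realizes the cross-check: a feedback term is kept only when WordNet ties it to another candidate, and a synset is used only when the candidates point to it.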


Thematic Role Search Pattern

  • Question (TREC-9): What is the purpose of a car bra?

  • Required Answer Type: RESULT

    • Initial search:

      • [instrument car bra] [result X]

    • Back-off search:

      • [instrument bra] [result X]
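The initial and back-off searches can be generated mechanically, as in the sketch below. Dropping the leftmost word at each back-off step is an assumption generalized from the single "car bra" to "bra" example, and the `search` callable is a hypothetical stand-in for whatever retrieval component runs the pattern.

```python
def backoff_patterns(role, phrase, answer_role):
    """Yield search patterns from most to least specific.

    Each back-off step drops the leftmost word of the phrase (an assumption
    based on the 'car bra' -> 'bra' example).
    """
    words = phrase.split()
    for start in range(len(words)):
        yield f"[{role} {' '.join(words[start:])}] [{answer_role} X]"


def search_with_backoff(role, phrase, answer_role, search):
    """Try each pattern in turn and return the first one with any hits.

    `search` is a hypothetical callable mapping a pattern string to a list
    of hits.
    """
    for pattern in backoff_patterns(role, phrase, answer_role):
        hits = search(pattern)
        if hits:
            return pattern, hits
    return None, []
```

Calling `list(backoff_patterns("instrument", "car bra", "result"))` reproduces the two patterns shown on the slide, in the same order.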


Semantically Parsed Returns

  • The information we have so far:

    • TR Answer Type: RESULT

    • TR Search Patterns:

      • [instrument bra] [result X]

  • Semantic parse of correct return:

    • Made of radar-absorbing carbon fibers, [instrument the bra] [target enables] [agent a car] [result to fool police radar until it reaches close range, permitting drivers to spot the trap and slow down before their speed is clocked.]
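Given a return annotated in this bracket notation, matching it against the search pattern amounts to checking that the anchor role contains the expected word and then reading off the span with the required answer role. The sketch below assumes the annotation is available as a bracketed string with no nested brackets; `parse_roles` and `extract_answer` are hypothetical names.

```python
import re

# One bracketed span: "[role text...]"; nested brackets are not handled.
ROLE_SPAN = re.compile(r"\[(\w+) ([^\]]+)\]")


def parse_roles(annotated):
    """Return (role, text) pairs from annotation such as '[agent a car]'."""
    return ROLE_SPAN.findall(annotated)


def extract_answer(annotated, required_role, anchor_role, anchor_word):
    """Return the text of the required role if the anchor role matches.

    The anchor matches when some span with `anchor_role` contains
    `anchor_word` as a whole word; otherwise None is returned.
    """
    roles = parse_roles(annotated)
    anchored = any(
        role == anchor_role and anchor_word in text.lower().split()
        for role, text in roles
    )
    if not anchored:
        return None
    for role, text in roles:
        if role == required_role:
            return text
    return None
```

For the parsed return above, `extract_answer(sentence, "result", "instrument", "bra")` yields the RESULT span beginning "to fool police radar", which is the answer to the car bra question.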


Current State of Semantic Parsing

Most Significant Current Sources of Error:

  • Labeling TEMPORAL and LOCATION Thematic Roles

    • Potential Solution: Pre-tagging these Named Entities with IdentiFinder

    • Potential Problem: IdentiFinder will not have any knowledge of which NEs apply to which targets

  • Non-Core Argument Roles

    • Potential Solution: Augment training data with more peripherally related thematic roles

    • Potential Problem: Augmenting the data by hand would be time consuming; augmenting the data automatically would be error-prone


Additional Training Data

  • FrameNet - over 100,000 additional sentences

  • PropBank (Palmer et al.)

    • WSJ portion of TreeBank 3, 1 million words, 3,000 verbs

    • Map PropBank roles to Thematic Roles


PropBank

  • Example: Acquire

    • Arg0: Agent, entity acquiring something

    • Arg1: Thing acquired

    • Arg2: Seller

    • Arg3: Price paid

    • Arg4: Benefactive

      [Arg0 New England Electric] will [Rel acquire] [Arg1 PS of New Hampshire]
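As a small illustration of how such a roleset could be held in code, the sketch below stores the "acquire" argument descriptions from the slide in a dictionary and pairs them with the labeled spans of the example sentence. The data layout and the helper names are assumptions, not PropBank's own file format or tooling.

```python
# Argument descriptions for "acquire", copied from the slide.
ACQUIRE_ROLESET = {
    "Arg0": "Agent, entity acquiring something",
    "Arg1": "Thing acquired",
    "Arg2": "Seller",
    "Arg3": "Price paid",
    "Arg4": "Benefactive",
}

# (label, text) spans for the example sentence; None marks unlabeled text
# and "Rel" marks the predicate itself.
EXAMPLE_SPANS = [
    ("Arg0", "New England Electric"),
    (None, "will"),
    ("Rel", "acquire"),
    ("Arg1", "PS of New Hampshire"),
]


def to_brackets(spans):
    """Render (label, text) spans in bracket notation."""
    return " ".join(
        text if label is None else f"[{label} {text}]" for label, text in spans
    )


def describe(spans, roleset):
    """Pair each labeled argument with its roleset description."""
    return {
        label: (text, roleset[label]) for label, text in spans if label in roleset
    }
```

A mapping from these numbered arguments to the thematic roles would be a second dictionary keyed the same way; only the Arg0 to Agent correspondence is stated on the slide, so the rest is left out here.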


Plans and Milestones

  • Expand coverage / accuracy for FrameNet parser

  • Develop HMM semantic parser

  • Statistical question classifier

  • Expand answer patterns

  • Spoken dialogue interface

  • Improved query expansion
