
Presentation Transcript



Multi-Perspective Question Answering

ARDA NRRC Summer 2002 Workshop


Participants

Janyce Wiebe

Eric Breck

Chris Buckley

Claire Cardie

Paul Davis

Bruce Fraser

Diane Litman

David Pierce

Ellen Riloff

Theresa Wilson


Problem

Finding and organizing opinions in the world press and other text


Our Work Will Support

  • Finding a range of opinions expressed on a particular topic, event, issue

  • Clustering opinions and their sources

    • Attitude (positive, negative, uncertain)

    • Basis for opinion (supporting beliefs, experiences)

    • Expressive style (sarcastic, vehement, neutral)

  • Building perspective profiles of individuals and groups over many documents and topics


Task: Annotation

Manual annotation scheme for linguistic expressions of opinions

“It is heresy,” said Cao. “The `Shouters’ claim they are bigger than Jesus.”

[Slide overlay: nested-source labels (writer, Cao), (writer, Cao, Shouters), (writer, Cao), (writer, Cao) attached to spans of the quotation.]


Task: Annotation

The Foreign Ministry said Thursday that it was “surprised, to put it mildly” by the U.S. State Department’s criticism of Russia’s human rights record and objected in particular to the “odious” section on Chechnya.

[Slide overlay: nested-source labels (writer, FM), (writer, FM, FM), (writer, FM), (writer, FM, FM, SD), (writer, FM), (writer, FM) attached to spans of the sentence.]



Task: Conceptualization

  • Various ways perspective is manifested in language

  • Implications for higher-level tasks



Task: Automate Manual Annotations

  • Machine learning

  • Identification of opinionated phrases, sources of opinions, …



Task: Organizing Perspective Segments

  • Unsupervised clustering

  • Text features + features from the annotation scheme + higher-level features
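To make the clustering step concrete, here is a minimal sketch in Python, with scikit-learn's TF-IDF vectors and k-means standing in for the workshop's actual tooling; the example segments and the choice of two clusters are invented for illustration.

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Hypothetical perspective segments on one topic.
    segments = [
        "The election was free and fair.",
        "Observers praised the orderly, peaceful vote.",
        "The poll was a brazen distortion marred by intimidation.",
        "The vote was rigged, wrongful, and illegitimate.",
    ]

    # Text features only; annotation-scheme and higher-level features
    # would be concatenated onto these vectors in the full system.
    X = TfidfVectorizer(stop_words="english").fit_transform(segments)

    # Unsupervised grouping into clusters of similar opinions.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    for label, segment in zip(labels, segments):
        print(label, segment)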


Solution Architecture

[Diagram: the Annotation Architecture provides an annotation tool; the Learning Architecture turns its output into trained taggers via learning algorithms; the Application Architecture takes a question and applies document retrieval, perspective tagging (with other taggers), and segment clustering.]



Evaluation

  • Exploratory manual clustering

  • Evaluation of automatic annotations against manual annotations

  • End-user evaluation of how well the system groups text segments into clusters of similar opinions about a given topic

  • Development of other end-user evaluation tasks



Example

The Annual Human Rights Report of the US State Department has been strongly criticized and condemned by many countries. Though the report has been made public for 10 days, its contents, which are inaccurate and lacking good will, continue to be commented on by the world media.

Many countries in Asia, Europe, Africa, and Latin America have rejected the content of the US Human Rights Report, calling it a brazen distortion of the situation, a wrongful and illegitimate move, and an interference in the internal affairs of other countries.

Recently, the Information Office of the Chinese People's Congress released a report on human rights in the United States in 2001, criticizing violations of human rights there. The report quoting data from the Christian Science Monitor, points out that the murder rate in the United States is 5.5 per 100,000 people. In the United States, torture and pressure to confess crime is common. Many people have been sentenced to death for crime they did not commit as a result of an unjust legal system. More than 12 million children are living below the poverty line. According to the report, one American woman is beaten every 15 seconds. Evidence show that human rights violations in the United States have been ignored for many years.


Example (annotated)

[Slide overlay on the passage above, labeling each clause with its source and whether it is presented as fact or subjectively:
<writer>: fact
<writer> neg-attitude the report
<writer>: fact
Many countries in Asia, Europe, Africa, and Latin America neg-attitude the content of the US Human Rights Report; neg-attitude it
<writer>: fact
a report on human rights in the United States in 2001 neg-attitude there
<writer>: fact
the report: fact
<writer>: fact
<writer>: subjective
<writer>: fact
<writer>: fact
the report: fact
<writer>: subjective
The annotated passage itself is repeated from the previous slide.]


Example (source summary)

[Slide diagram summarizing the sources' attitudes: <writer> (subjectivity index 4/10 = 40%, expressive style medium) holds a medium neg-attitude toward the <HR report>; <many countries> (subjectivity index 2/2 = 100%, expressive style extreme) hold a strong neg-attitude toward the <HR report>; the <Chinese HR report> (subjectivity index 1/3 = 33%, expressive style medium) holds a medium neg-attitude toward the <USA>.]



Support the following…

  • Describe the collective perspective w.r.t. issue/object presented in an individual article, across a set of articles, …

  • Describe the perspective of a particular writer/individual/government/news service w.r.t. issue/object in an individual article, across a set of articles, …

  • Create a perspective profile for agents, groups, news sources, etc.



Outline

  • Annotation: Wiebe & Wilson

  • Conceptualization: Davis

  • Architecture: Pierce

  • End-user evaluation: Buckley



Annotation

  • Find opinions, evaluations, emotions, speculations (private states) expressed in language



Annotation

  • Explicit mentions of private states and speech events

    • The United States fears a spill-over from the anti-terrorist campaign

  • Expressive subjective elements

    • The part of the US human rights report about China is full of absurdities and fabrications.


Annotation: Nested Sources

“The US fears a spill-over,” said Xirao-Nima, a professor of foreign affairs at the central university for nationalities.  →  (writer, Xirao-Nima, US)

“The report is full of absurdities,” he continued.  →  (writer, Xirao-Nima)



Annotation

  • Whether opinions or other private states are expressed in speech

  • Type of private state (negative evaluation, positive evaluation, …)

  • Object of positive or negative evaluation

  • Strengths of expressive elements and private states
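One way to carry these attributes in code is a small record per annotated span; a minimal sketch, with field names that are illustrative rather than the scheme's actual attribute names.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class PrivateStateAnnotation:
        span: Tuple[int, int]   # character offsets of the annotated phrase
        source: List[str]       # nested source chain, e.g. ["writer", "Cao"]
        state_type: str         # e.g. "negative evaluation"
        target: str             # object of the evaluation, if any
        strength: str           # strength of the private state

    # "It is heresy," attributed through the writer to Cao:
    ann = PrivateStateAnnotation(
        span=(1, 13),
        source=["writer", "Cao"],
        state_type="negative evaluation",
        target="the Shouters",
        strength="high",
    )
    print(ann.source)  # ['writer', 'Cao']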


Example

“It is heresy,” said Cao. “The `Shouters’ claim they are bigger than Jesus.”

[Slide overlay: nested-source labels (writer, Cao), (writer, Cao, Shouters), (writer, Cao), (writer, Cao), as on the earlier annotation slide.]


Example

The Foreign Ministry said Thursday that it was “surprised, to put it mildly” by the U.S. State Department’s criticism of Russia’s human rights record and objected in particular to the “odious” section on Chechnya.

[Slide overlay: nested-source labels (writer, FM), (writer, FM, FM), (writer, FM), (writer, FM, FM, SD), (writer, FM), (writer, FM), as on the earlier annotation slide.]



Accomplishments

  • Fairly mature annotation scheme and instructions

  • Representation supporting manual annotation using GATE (Sheffield)

  • Annotation corpus

  • Significant training of 3 annotators

  • Participants understand the annotation scheme


Sample GATE Annotation

[Screenshot of the GATE annotation tool.]



Conceptualization

  • Ideology, emotions, and opinions are reflected in language

  • Language gives us a means to track and assess perspective

  • Goal: create a document to support workshop annotation and experiments, and to extend to future applications


Conceptualization Part I: Theoretical Background

  • Types of perspective: attitudes (subjectivity), spatial, temporal, sociological, etc.

  • Focus on subjectivity expressed linguistically (e.g., opinions: criticized an unfair election; emotions: applauded the election; speculations: probably will be elected)


Conceptualization: Subjectivity Theoretical Background (continued)

  • Sources have Attitudes about Objects: (writer, criticizes, election)

  • An ontology of attitudes leading to different types of private states (distinctions can range from identification, to positive and negative, to more fine-grained: reliability, source, assessment, necessity, etc.)

    This theoretical background informs the annotation strategy, experiments, and extensions


Conceptualization Part II: Looking to higher levels: larger segments

  • Subjectivity beyond the immediate occurrence of the segment:

    • sentence and paragraph level

    • document level

    • discourse and topic level


Conceptualization Part III: Looking forward: applications

  • Track perspective over time (identify changes)

  • Identify ideology (subjective expressions taken as a unit may approximate ideology)

  • Cluster agents with similar ideologies

    (similar expressions of opinions may help group those on the same side)

  • Infer ideology from limited expressions of perspective (some subjectivity for a source may suggest opinions on other topics)



Architecture Overview

  • Solution architecture includes:

    • Application Architecture

      • supports high-level QA task

    • Annotation Architecture

      • supports document annotation

    • Learning Architecture

      • supports development of low- and mid-level system components via machine learning


Solution Architecture

[Diagram: the Annotation Architecture produces annotated documents; the Learning Architecture turns these into automatic annotators; the Application Architecture uses the annotators.]


Solution Architecture

[Diagram, in more detail: the Annotation Architecture provides an annotation tool; the Learning Architecture turns its output into trained taggers via learning algorithms; the Application Architecture takes a question and applies document retrieval, perspective tagging (with other taggers), and document clustering.]


Application Architecture

[Diagram: documents and their annotations are held in an annotation database; GATE named-entity tagging, CASS parsing, and feature generators produce features for the multi-perspective classifiers, whose output feeds document clustering.]



Annotation Components

  • GATE’s ANNIE or MITRE Alembic

    • Tokenization, sentence-finding

    • Part-of-speech tagging

    • Name finding

    • Coreference resolution

  • CASS partial parser

  • SMART IR engine

  • Feature Generators
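As a rough illustration of what these low-level components produce, here is a sketch with NLTK standing in for ANNIE/Alembic; coreference resolution, CASS parsing, and SMART retrieval are omitted, and the model package names are NLTK's, not GATE's.

    import nltk

    # One-time model downloads for the stand-in components.
    for pkg in ("punkt", "averaged_perceptron_tagger",
                "maxent_ne_chunker", "words"):
        nltk.download(pkg, quiet=True)

    text = "The US fears a spill-over, said Xirao-Nima."
    for sent in nltk.sent_tokenize(text):      # sentence finding
        tokens = nltk.word_tokenize(sent)      # tokenization
        tagged = nltk.pos_tag(tokens)          # part-of-speech tagging
        tree = nltk.ne_chunk(tagged)           # name finding
        print(tree)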


Learning Architecture

[Diagram: data from the annotation database (via GATE NE, CASS, and feature generators) is split into training and evaluation sets for the Weka learners.]



Learning: Tasks

  • Identify subjective phrases

  • Identify nested sources

  • Discriminate Facts and Views

  • Classify Opinion Strength



Learning: Features

  • Name recognition

  • Syntactic features

  • Lists of words

  • Contextual features

  • Density
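A toy version of the fact-vs-view task, with scikit-learn standing in for Weka and bag-of-words features standing in for the full feature set (name, syntactic, contextual, and density features are omitted); the training sentences are adapted from the examples earlier in the talk.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    train_sents = [
        "The murder rate is 5.5 per 100,000 people.",           # fact
        "The report was made public ten days ago.",             # fact
        "The report is full of absurdities and fabrications.",  # view
        "Its contents are inaccurate and lacking good will.",   # view
    ]
    train_labels = ["fact", "fact", "view", "view"]

    # Word-list features plus a naive Bayes learner.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(train_sents, train_labels)

    # Classify an unseen sentence.
    print(model.predict(["The section on Chechnya is odious."]))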


Annotation Architecture

[Diagram: topic documents flow into the GATE annotation tool, where human annotators mark them up; the resulting GATE XML is stored in the MPQA database.]



Annotation Tool (GATE)

  • Move headers and original markup to standoff annotation database

  • Initialize document annotations

    • Initial sources and speech events

  • Verify human annotations

    • Check id existence

    • Check attribute consistency
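A sketch of the kind of verification meant here, run over flat annotation records; the record shape and the set of legal attribute values are illustrative assumptions.

    # Each annotation must reference only existing IDs, and its
    # attributes must take known values.
    VALID_STRENGTHS = {"low", "medium", "high", "extreme"}

    def verify(annotations):
        ids = {a["id"] for a in annotations}
        problems = []
        for a in annotations:
            for src in a.get("source_ids", []):      # check ID existence
                if src not in ids:
                    problems.append((a["id"], "unknown source " + src))
            if a.get("strength") not in VALID_STRENGTHS:
                problems.append((a["id"], "bad strength value"))
        return problems

    anns = [
        {"id": "writer", "source_ids": [], "strength": "medium"},
        {"id": "fm-1", "source_ids": ["writer"], "strength": "low"},
        {"id": "es-2", "source_ids": ["nobody"], "strength": "huge"},
    ]
    print(verify(anns))  # flags the unknown source and the bad strength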



Data Formats

  • Gate XML Format

    • standoff

    • structured

  • MPQA Annotation Format

    • standoff

    • flat

  • Machine Learning Formats (e.g., ARFF)
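For instance, a fact-vs-view training instance might be exported to ARFF roughly like this; the attribute names are invented for illustration.

    @relation fact-vs-view
    @attribute strong_subjective_count numeric
    @attribute inside_quotes {yes,no}
    @attribute class {fact,view}
    @data
    3,yes,view
    0,no,fact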


GATE XML Format

<Annotation Type="expressive-subjectivity"
            StartNode="215" EndNode="228">
  <Feature>
    <Name>strength</Name>
    <Value>low</Value>
  </Feature>
  <Feature>
    <Name>source</Name>
    <Value>w,foreign-ministry</Value>
  </Feature>
</Annotation>
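Such an element can be read with standard XML tooling; a minimal sketch in Python (a real GATE document wraps these elements in an annotation set, which is not shown here).

    import xml.etree.ElementTree as ET

    doc = """<Annotation Type="expressive-subjectivity" StartNode="215" EndNode="228">
      <Feature><Name>strength</Name><Value>low</Value></Feature>
      <Feature><Name>source</Name><Value>w,foreign-ministry</Value></Feature>
    </Annotation>"""

    ann = ET.fromstring(doc)
    span = (int(ann.get("StartNode")), int(ann.get("EndNode")))
    features = {f.findtext("Name"): f.findtext("Value")
                for f in ann.findall("Feature")}
    print(span, features)  # (215, 228) {'strength': 'low', 'source': 'w,foreign-ministry'}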


MPQA Annotation Format

id   span      type    name         content
42   215,228   string  MPQA-agent   id="foreign-ministry"
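Because the format is flat standoff, one record can be parsed with a simple split; a sketch assuming the columns are tab-separated.

    line = '42\t215,228\tstring\tMPQA-agent\tid="foreign-ministry"'
    ann_id, span, ann_type, name, content = line.split("\t")
    start, end = (int(x) for x in span.split(","))
    print(ann_id, (start, end), name, content)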



End-User Evaluation: Goal

  • Establish a framework for evaluating tasks that would be of direct interest to analyst users

  • Do an example evaluation



Manual Clustering

  • Human exploratory effort

  • MPQA participants manually cluster documents from 1-2 topics

  • Analyze the basis for the clusters



User Task: Topic

  • U1: User states topic of interest and interacts with IR system

  • S1: System retrieves set of “relevant” documents along with their perspective annotations
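As a minimal sketch of S1, TF-IDF cosine ranking can stand in for the SMART engine; the three-document collection and its texts are invented (two IDs are borrowed from the example on the next slide).

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = {
        "06.21.57-1967": "Mugabe predicts victory in the Zimbabwe election.",
        "06.47.23-22498": "Zambia hails the results of the Zimbabwe vote.",
        "doc-unrelated": "Vietnam values environmental protection.",
    }
    topic = "2002 election in Zimbabwe"

    # Rank documents by similarity to the topic statement.
    vec = TfidfVectorizer().fit(list(docs.values()) + [topic])
    scores = cosine_similarity(vec.transform([topic]),
                               vec.transform(list(docs.values())))[0]
    for doc_id, score in sorted(zip(docs, scores), key=lambda p: -p[1]):
        print(f"{score:.3f}  {doc_id}")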



Example: Topic

  • U1: 2002 election in Zimbabwe

  • S1: System returns

    • 03.47.06-11142 Mugabe confident of victory in

    • 04.33.07-17094 Mugabe victory leaves West in

    • 05.22.13-11526 Mugabe says he is wide awake

    • 06.21.57-1967 Mugabe predicts victory

    • 06.37.20-8125 Major deployment of troops

    • 06.47.23-22498 Zambia hails results



User Task: Question

  • U2: User states a particular perspective question on the topic.

  • The question should

    • identify the source type (e.g., governments, individuals, writers) of interest

    • be a yes/no (or pro/con) question, for now



Example: Question

  • Give range of perspective: national governments, groups of governments:

  • Was the election process fair, valid, and free of voter intimidation?


User Task: Question Response

  • S2: System clusters documents

    • based on the question, text, and annotations

    • goal: group together documents with the same answer and perspective (including expressive content)

    • The system, for now, does not attempt to label each group with specific answers.

  • Target a small number of clusters (4?)


Example: Question Response

  • Cluster 1 <keywords>

    • 07.20.20-11694

    • 08.12.40-1611

    • 08.15.19-23507

    • 09.35.06-27851

    • 13.10.41-18948

  • Cluster 2 <keywords>

    • 12.08.27-27397

    • 13.44.36-19236

    • 04.33.07-17094

    • 05.22.13-11526

  • Cluster 3 <keywords>

    • 06.47.23-22498

    • 06.51.18-1222

    • 06.56.31-3120

    • 07.16.31-13271



User Task: Cluster Feature

  • U3: User states constraints on clustered documents or segments.

    • These might be geographic, date-based, ideological, political, or religious

  • S3: System shows subclusters or highlighted documents



Example: Cluster by Features

  • U3: Highlight governments by regions

  • S3: System shows docs with African governments' opinions in red, North American in blue, European in green, and Asian in purple; docs are multicolored if they have more than one source



User Task: Results

  • U4: User gets an impression (visual or statistical) of whether the constraints match the clusters.

  • Easy visualization of exceptions



Example: Results

  • User sees that the

    • Red docs (African) are mostly in one cluster,

    • Blue and green (NA and EU) in another

    • Purple docs are scattered in both clusters.



Document Collection

  • Large collection of 270,000 foreign news documents from June 2001 to May 2002

  • Almost all are FBIS documents, with a small number of other relevant docs

  • From MITRE's MiTAP system



Document Collection Features

  • English Language

    • 60% FBIS translated

    • 40% source English

  • 20% TV/Radio

  • 5% Identified as editorials


Sample Raw Document

From day Tue Jan 22 20:13:06 2002
Received: from smtpsrv1.mitre.org …
From: FBIS@fbis.org
Date: 21 Jan 2002 00:00:00 (EST)
Subject: Vietnam Calls for Broader …

<?xml version="1.0"?>
<!DOCTYPE document
<document media_file="sep20020122000034n.html" media_type="text" scribe="Rough'n'Ready v1.1" title="Vietnam Calls for Broader Environmental Protection at ASEM Conference in Beijing" document_time="2002-01-21" create_time="2002-01-22T20:13:05" source="Worldwide Open-source News" description="Hanoi Voice of Vietnam News " reference="SEP20020122000034 Hanoi Voice of Vietnam News WWW-Text in Vietnamese 21 Jan 02">
<region>East Asia</region>
<region>China</region>
<subregion>Southeast Asia</subregion>
<subregion>China</subregion>
<country>Vietnam</country>
<country>China</country>
<section section_id="1">
<topics><topic>ENVIRONMENT</topic><topic>HEALTH</topic></topics>
<turn>
Vietnamese Minister of Science, Technology and Environment Chu Tuan Nha told the first Asia-Europe Meeting [ASEM] Environment Ministers' Meeting [ASEM EnMM] in Beijing recently that Vietnam always values environmental protection, including prevention of pollution or degradation, bio-diversity protection, and improvement of the environment in industrial zones and in both urban and rural areas.



Sample Pure Text

  • Vietnamese Minister of Science, Technology and Environment Chu Tuan Nha told the first Asia-Europe Meeting [ASEM] Environment Ministers' Meeting [ASEM EnMM] in Beijing recently that Vietnam always values environmental protection, including prevention of pollution or degradation, bio-diversity protection, and improvement of the environment in industrial zones and in both urban and rural areas.


Sample Meta-annotation

id   span  type    name                content
1    0,0   string  meta_media_file     sep20020122000034n.html
2    0,0   string  meta_media_type     text
3    0,0   string  meta_scribe         Rough'n'Ready v1.1
4    0,0   string  meta_title          Vietnam Calls for Broader Environmental Protection at ASEM Conference in Beijing
5    0,0   string  meta_document_time  2002-01-21
6    0,0   string  meta_create_time    2002-01-22T20:13:05
7    0,0   string  meta_source         Worldwide Open-source News
8    0,0   string  meta_description    Hanoi Voice of Vietnam News
9    0,0   string  meta_reference      SEP20020122000034 Hanoi Voice of Vietnam News WWW-Text in Vietnamese 21 Jan 02
10   0,0   string  meta_region         East Asia
11   0,0   string  meta_region         China
12   0,0   string  meta_subregion      Southeast Asia
13   0,0   string  meta_subregion      China
14   0,0   string  meta_country        Vietnam
15   0,0   string  meta_country        China
16   0,0   string  meta_topic          ENVIRONMENT
17   0,0   string  meta_topic          HEALTH



Topics

  • About 12 Topic statements.

    • Clause or Sentence

  • 25-50 known relevant docs per topic, with manual perspective annotations.

  • 1-5 Questions per topic



Questions

  • Type of Perspective

    • range of perspective

    • strongly felt perspective

    • identify all perspectives

  • Issue

    • Direct information

    • Opinion evidence

  • Constraints (pinpoint discrepancies)



Evaluation on Topic/Question

  • Artificially construct a 75-doc retrieved set

    • Include the known (25-50) relevant docs

    • Add top retrieved docs from SMART

  • System automatically annotates set

  • System clusters based on annotation.



Evaluation (cont)

  • Evaluate homogeneity of clusters. Compare with

    • Base Case 1: Cluster docs into same number of clusters without any annotations

    • Base Case 2: Cluster docs into same number of clusters based on manual annotations.
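One concrete way to score the comparison is cluster homogeneity against the per-document gold answers; a sketch using scikit-learn, with invented labelings.

    from sklearn.metrics import homogeneity_score

    gold_answers  = ["yes", "yes", "no", "no", "no"]  # per-document answers
    system_labels = [0, 0, 1, 1, 1]   # clusters built from annotations
    base_labels   = [0, 1, 0, 1, 1]   # clusters built without annotations

    print("system  :", homogeneity_score(gold_answers, system_labels))  # 1.0
    print("baseline:", homogeneity_score(gold_answers, base_labels))    # lower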



Evaluation Within Workshop

  • Evaluation through S2 only

    • No constraints or subclusters

  • Yes/No (Pro/Con) questions only



Current Status

  • Document collection prepared, indexed

  • 8 topics (more coming)

    • 16 questions total

    • 10-40 relevant docs per topic (more coming)

