Dialogue structure and pronoun resolution

Dialogue Structure and Pronoun Resolution. Joel Tetreault and James Allen, University of Rochester, Department of Computer Science. DAARC, September 23, 2004.


Dialogue Structure and Pronoun Resolution

Joel Tetreault and James Allen

University of Rochester

Department of Computer Science

DAARC

September 23, 2004



Reference in Spoken Dialogue

  • Resolving anaphoric expressions correctly is critical in task-oriented domains

    • Makes conversation easier for humans

  • Reference resolution module provides feedback to other components in system

    • E.g., Incremental Parsing, Interpretation Module

  • Investigate how to improve the reference resolution module (RRM):

    • Discourse Structure could be effective in reducing search space of antecedents and improving accuracy (Grosz and Sidner, 1986)

    • Paucity of empirical work: Byron and Stent (1998), Eckert and Strube (2001), Byron (2002)


Goal

  • To evaluate whether shallow approaches to dialogue structure can improve a reference resolution algorithm (LRC used as baseline model to augment)

  • Investigated two models:

    • Eckert & Strube (manual and automatic versions)

    • “Literal QUD” model (manual)


Outline

  • Background

    • Dialogue Act synchronization (Eckert and Strube model)

    • QUD (Craige Roberts)

  • Monroe Corpus

  • Algorithm

  • Results

    • 3rd person pronoun evaluation

    • Dialogue Structure

  • Summary


Past approaches in structure and reference

  • Veins: the nuclei of RST trees are the most salient discourse units, so the entities in these units are thus more salient than others

  • Tetreault (2003): Penn Treebank subset annotated with RST. Used G&S approximations to try to improve on LRC baseline.

    • Result: performed the same as baseline

    • Veins: decreased performance slightly

  • Problem: fine-grained approaches (RST) are difficult to annotate reliably and to compute in real time.

  • Perhaps shallow approaches can work?


Literal QUD

  • Questions Under Discussion (Craige Roberts, Jonathan Ginzburg) – “what are we talking about?”: topics create discourse segments

  • Literally: questions or modals can be viewed as creating a discourse segment

  • Result – questions provide a shallow discourse structuring, and that may be enough to improve performance, especially in a task-oriented domain

  • Entities in QUD main segment can be viewed as the topic

  • Segment closed when question is answered (use ack sequences, change in entities used)

  • Only entities from the answer and entities in the question are accessible

  • Can be used in TRIPS to reduce search space of entities – set context size


QUD Annotation Scheme

  • Annotate:

    • Start utterance

    • End utterance

    • Type (aside, repeated question, unanswered, open-ended, clarification)

  • Kappa (compared with reconciled data):
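
The kappa figures themselves did not survive in this transcript. As a reminder of how such an agreement score is usually computed, here is a minimal Python sketch of Cohen's kappa. The slide does not say which kappa variant was used, and the two label lists below are hypothetical, not the study's annotations.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Proportion of items on which the two label lists agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each list's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists use a single identical label
    return (observed - expected) / (1 - expected)

# Hypothetical QUD type labels: one annotator vs. the reconciled data.
annotator = ["aside", "clarification", "aside", "unanswered"]
reconciled = ["aside", "clarification", "clarification", "unanswered"]
kappa = cohen_kappa(annotator, reconciled)
```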


Example - QUD

utt06 U: Where is it?

utt07 U: Just a second

utt08 U: I can't find the Rochester airport

utt09 S: It's

--------------------------------------------------------

utt10 U: I think I have a disability with maps

utt11 U: Have I ever told you that before

utt12 S: It's located on brooks avenue

utt13 U: Oh thank you

utt14 S: Do you see it?

utt15 U: Yes

(QUD-entry

:start utt06

:end utt13

:type clarification)

(QUD-entry

:start utt10

:end utt11

:type aside)


Example - QUD (utt10-11 processed)

utt06 U: Where is it?

utt07 U: Just a second

utt08 U: I can't find the Rochester airport

utt09 S: It's

[utt10,11 removed]

--------------------------------------------------------

utt12 S: It's located on brooks avenue

utt13 U: Oh thank you

utt14 S: Do you see it?

utt15 U: Yes

(QUD-entry

:start utt06

:end utt13

:type clarification)

(QUD-entry

:start utt10

:end utt11

:type aside)


Example - QUD (utt13 processed)

[utt06-13 collapsed: {the Rochester airport, brooks avenue}]

--------------------------------------------------------

utt14 S: Do you see it?

utt15 U: Yes

(QUD-entry

:start utt06

:end utt13

:type clarification)
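
The three example slides above can be read as two operations on the discourse history: dropping a finished aside, and collapsing a closed QUD segment into a single node. The Python sketch below illustrates that reading only. The function names, the `(utterance id, entities)` representation, and the entity lists are assumptions made for illustration, not the TRIPS implementation.

```python
def drop_span(history, start, end):
    """Remove utterances start..end (inclusive), e.g. a finished aside."""
    ids = [utt for utt, _ in history]
    i, j = ids.index(start), ids.index(end)
    return history[:i] + history[j + 1:]

def collapse_span(history, start, end, kept):
    """Replace utterances start..end with one node holding only `kept`."""
    ids = [utt for utt, _ in history]
    i, j = ids.index(start), ids.index(end)
    node = (f"{start}-{end}", list(kept))
    return history[:i] + [node] + history[j + 1:]

# Hypothetical entity lists for the example dialogue.
history = [
    ("utt06", ["it"]),
    ("utt07", []),
    ("utt08", ["I", "the Rochester airport"]),
    ("utt09", ["It"]),
    ("utt10", ["I", "a disability", "maps"]),
    ("utt11", ["I", "you", "that"]),
    ("utt12", ["It", "brooks avenue"]),
    ("utt13", []),
]

# The aside utt10-11 ends: its utterances leave the history.
after_aside = drop_span(history, "utt10", "utt11")

# The clarification utt06-13 ends: only question and answer entities remain.
collapsed = collapse_span(
    after_aside, "utt06", "utt13",
    kept=["the Rochester airport", "brooks avenue"],
)
```

Which entities count as belonging to the question and the answer is passed in explicitly here, since the slides do not spell out how they are selected.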


QUD Issues

  • Issue 1: easy to detect Q’s (use Speech-Act information), but how do you know Q is answered?

  • Cue words, multiple acknowledgements, changes in entities discussed provide strong clues that question is finishing, but general questions such as “how are we going to do this?” can be ambiguous

  • Issue 2: what is more salient to a QUD pronoun – the QUD topic or a more recent entity?


Dialogue Act Segmentation

  • E&S: model to resolve all types of pronouns (3rd person and abstract) in spoken dialogue

  • Intuition: grounding is very important in spoken dialogue

  • Utterances that are not acknowledged by the listener may not be in common ground and thus not accessible to pronominal reference


Dialogue Act Segmentation

  • Each utterance marked as

    • (I): contains content (initiation), question

    • (A): acknowledgment

    • (C): combination of the above

    • (N): none of the above

  • Basic algorithm: utterances not ack’d or not in a string of I’s are removed from the discourse before next sentence is processed

  • Evaluation showed improvement for pronouns referring to abstract entities, and strong annotator reliability

  • Pronoun performance? Unclear: no comparison against results obtained without the DA model


Example – DA model

utt06 U: Where is it? (I)

utt07 U: Just a second (N)

utt08 U: I can't find the Rochester airport (I)

utt09 S: It's (N)

utt10 U: I think I have a disability with maps (I) (removed)

utt11 U: Have I ever told you that before (I)

utt12 S: It's located on brooks avenue (I)

utt13 U: Oh thank you (A)

utt14 S: Do you see it? (I)

utt15 U: Yes (A)


Parsing Monroe Domain

  • Domain: Monroe Corpus of 20 transcriptions (Stent, 2001) of human subjects collaborating on Emergency Rescue 911 tasks

  • Each dialogue was at least 10 minutes long, and most were over 300 utterances long

  • Work presented here focuses on 5 of the dialogues (1756 utterances) (278 3rd person pronouns)

  • Goals: develop a corpus of sentences parsed with rich syntactic, semantic, discourse information to

  • Able to parse 5 dialogue sub-corpus with 84% accuracy

  • For more details, see ACL Discourse Annotation ‘04


TRIPS Parser

  • Broad-coverage, deep parser

  • Uses bottom-up algorithm with CFG and domain independent ontology combined with a domain model

  • Flat, unscoped LF with events and labeled semantic roles based on FrameNet

  • Semantic information for noun phrases based on EuroWordNet


Parser information for Reference

  • Rich parser output is helpful for discourse annotation and reference resolution:

    • Referring expressions identified (pronoun, NP, impros)

    • Verb roles and temporal information (tense, aspect) identified

    • Noun phrases have semantic information associated with them

    • Speech act information (question, acknowledgment)

    • Discourse markers (so, but)

    • Semi-automatic annotation increases reliability


Semantics Example: “an ambulance”

  • (TERM :VAR V213818

    :LF (A V213818 (:* LF::LAND-VEHICLE W::AMBULANCE)

    :INPUT (AN AMBULANCE))

    :SEM ($ F::PHYS-OBJ

    (SPATIAL-ABSTRACTION SPATIAL-POINT)

    (GROUP -)

    (MOBILITY LAND-MOVABLE)

    (FORM ENCLOSURE)

    (ORIGIN ARTIFACT)

    (OBJECT-FUNCTION VEHICLE)

    (INTENTIONAL -)

    (INFORMATION -)

    (CONTAINER (OR + -))

    (TRAJECTORY -)))
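
To show how a feature vector like this could serve as a filter on antecedent candidates, here is a small Python sketch. The feature values are copied from the term above, but the dictionary encoding and the `compatible` check are assumptions for illustration; the slides do not describe how TRIPS compares feature sets.

```python
# SEM features of "an ambulance"; each feature maps to its allowed values.
# (OR + -) on the slide becomes a two-element set.
AMBULANCE_SEM = {
    "GROUP": {"-"},
    "MOBILITY": {"LAND-MOVABLE"},
    "FORM": {"ENCLOSURE"},
    "ORIGIN": {"ARTIFACT"},
    "OBJECT-FUNCTION": {"VEHICLE"},
    "INTENTIONAL": {"-"},
    "INFORMATION": {"-"},
    "CONTAINER": {"+", "-"},
    "TRAJECTORY": {"-"},
}

def compatible(required, candidate):
    """True if every required feature that the candidate also specifies
    shares at least one allowed value with it."""
    return all(
        values & candidate[feature]
        for feature, values in required.items()
        if feature in candidate
    )

# Hypothetical constraints a verb role might place on a pronoun.
needs_non_intentional = {"INTENTIONAL": {"-"}}
needs_intentional = {"INTENTIONAL": {"+"}}

ok = compatible(needs_non_intentional, AMBULANCE_SEM)
blocked = compatible(needs_intentional, AMBULANCE_SEM)
```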


Reference Annotation

  • Annotated dialogues for reference w/undergraduate researchers (created a Java Tool: PronounTool)

  • Markables determined by LF terms

  • Identification numbers determined by :VAR field of LF term

  • Used stand-off file to encode what each pronoun refers to (refers-to) and the relation between pronoun and antecedent (relation)

  • Post-processing phase assigns a unique identification number to coreference chains

  • Also annotated coreference between definite noun phrases


Reference Annotation

  • Used slightly modified MATE scheme: pronouns divided into the following types:

    • IDENTITY (Coreference) (278)

      • Includes set constructions (6)

    • FUNCTIONAL (20)

    • PROPOSITION/D.DEIXIS (41)

    • ACTION/EVENT (22)

    • INDEXICAL (417)

    • EXPLETIVE (97)

    • DIFFICULT (5)


LRC Algorithm

  • LRC: modified centering algorithm (Tetreault ’01) that does not use Cb or transitions, but keeps a Cf-list (history) for each utterance

  • While processing utterance’s entities (left to right) do:

    Push entity onto Cf-list-new; for a pronoun p, attempt to resolve:

    • Search through Cf-list-new (l-to-r), taking the first candidate that meets gender, agreement, binding, and semantic feature constraints.

    • If none found, search past utterances’ Cf-lists, starting from the previous utterance and working back to the beginning of the discourse

    • When p is resolved, push pronoun with semantic features from antecedent onto Cf-list-new

  • For more details, see SemDial ‘04
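
The search order described above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the entity dictionaries, the `agrees` check (gender and number only), and the example entities are assumptions, and binding and semantic feature constraints are left out.

```python
def agrees(pronoun, candidate):
    """Simplified constraint check: gender and number only (assumption)."""
    return (pronoun["gender"] == candidate["gender"]
            and pronoun["number"] == candidate["number"])

def lrc_resolve(utterances):
    """Return {pronoun id: antecedent id or None}.

    Each utterance is a list of entity dicts in left-to-right order.
    """
    history = []   # Cf-lists of earlier utterances, oldest first
    links = {}
    for utterance in utterances:
        cf_new = []   # Cf-list-new for the current utterance
        for entity in utterance:
            if entity.get("pronoun"):
                # Current utterance first (left to right), then earlier
                # Cf-lists from the most recent utterance backwards.
                search_order = [cf_new] + history[::-1]
                antecedent = next(
                    (cand for cf in search_order for cand in cf
                     if agrees(entity, cand)),
                    None,
                )
                links[entity["id"]] = antecedent["id"] if antecedent else None
                if antecedent:
                    # The resolved pronoun inherits the antecedent's features.
                    entity = {**antecedent, "id": entity["id"], "pronoun": True}
            cf_new.append(entity)
        history.append(cf_new)
    return links

neuter = {"gender": "neuter", "number": "sg"}
utterances = [
    [{"id": "e1", **neuter}],                      # the Rochester airport
    [{"id": "p1", "pronoun": True, **neuter},      # It('s located on)
     {"id": "e2", **neuter}],                      # brooks avenue
    [{"id": "p2", "pronoun": True, **neuter}],     # (Do you see) it
]
links = lrc_resolve(utterances)
```

In this toy run the second pronoun links to the first one rather than to "brooks avenue", because the first pronoun sits leftmost in the previous utterance's Cf-list; the coreference chain then leads back to the airport.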


LRC Algorithm with Structure Info

  • Augmented algorithm with extensions to handle QUD and E&S input

  • For QUD, at the start and end of processing an utterance, QUDs are started (pushed on stack) or ended (entities are collapsed), so the Cf-list history changes

  • For E&S, each utterance is assigned a DA code and then removed or kept depending on the next utterance (whether it is an acknowledgment, or a series of I’s)



Error Analysis

  • Though QUD and the +sem baseline performed the same (89 errors), each got 3 pronouns right that the other did not

  • Baseline:

    • 3 right where QUD's collapsing of nodes removes the correct antecedent

  • QUD:

    • 2 right associated with blocking off an aside

    • 1 associated with collapsing (intervening nodes blocked)

  • 15 pronouns that both got wrong, but with different predictions

  • Remaining 71: both made the same error
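
The counts on this slide can be checked against each other with a few lines of Python. The accuracy at the end is only an implied figure: it assumes that all 278 IDENTITY pronouns reported earlier make up the evaluation set, which this transcript does not confirm.

```python
# Counts taken from the Error Analysis slide.
same_error = 71        # both systems wrong, same prediction
different_error = 15   # both systems wrong, different predictions
unique_error = 3       # wrong for one system, right for the other

# Each system's total should match the 89 errors reported.
errors_per_system = same_error + different_error + unique_error

# Assumption: all 278 IDENTITY pronouns form the evaluation set.
total_pronouns = 278
implied_accuracy = (total_pronouns - errors_per_system) / total_pronouns
```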


Issues

  • Structuring methods are probably more trouble than they are worth with the corpora available right now

  • They also only affect a few pronouns

  • Segment ends are least reliable

    • What constitutes an end?

    • 3 errors suggest that boundaries are marked incorrectly, since pronouns are accessing elements in a “closed” discourse segment

    • Or perhaps the collapsing routine is too harsh

  • Small corpus size

    • Hard to draw definite conclusions given only 3 criss-crossed errors

    • Need more data for statistical evaluations


Issues

  • E&S model has the advantage over QUD of being easier to automate, but fares worse since it takes into account only a small window of utterances (extremely shallow)

  • QUD model can be semi-automated (detecting question starts is easy), but detecting ends and types is harder

  • QUD could definitely be improved by taking into account plan initiations and suggestions, instead of limiting it to questions only, but the tradeoff is reliability

