
Towards an integrated scheme for semantic annotation of multimodal dialogue data




  1. Towards an integrated scheme for semantic annotation of multimodal dialogue data. Volha Petukhova and Harry Bunt

  2. Motivation Several corpora with multimodal data transcriptions:
  • AMI meeting corpus (http://www.amiproject.org);
  • IFA Dialog Video corpus (http://www.fon.hum.uva.nl/IFA-SpokenLanguageCorpora/IFADVcorpus);
  • ISL meeting corpus (Burger et al., 2002)

  3. Motivation
  • Coding schemes for the analysis of nonverbal actions in terms of low-level behavioural features:
    • Facial Action Coding System (FACS);
    • HamNoSys
  • Coding schemes for semantic and pragmatic information in visual expressions:
    • SmartKom coding scheme (Steininger, 2001);
    • DIME-DAMSL (Pineda et al., 2005);
    • MUMIN annotation scheme (Allwood et al., 2004)

  4. Motivation
  • The majority of these schemes are designed for a particular purpose and are used solely by their creators (Dybkjær and Bernsen, 2002)
  • The AAMAS workshop ‘Towards a Standard Markup Language for Embodied Dialogue Acts’ in 2008 and 2009
  • ISO project 24617-2 “Semantic annotation framework, Part 2: Dialogue acts”

  5. Exploratory annotation study
  • DIT++ dialogue act annotation scheme (http://dit.uvt.nl/)
  • incorporates theoretical and empirical findings from other approaches (Petukhova & Bunt, 2009c; Bunt & Schiffrin, 2007)
  • describes not only task-oriented communicative actions, but also actions related to other communicative dimensions: Task, Auto-Feedback, Allo-Feedback, Turn Management, Time Management, Contact Management, Discourse Structuring, Social Obligation Management, Own Communication Management, Partner Communication Management
  • contains open classes, allowing suitable additions of communicative functions that are specific to a certain modality
  • offers flexible segmentation strategies

  6. Exploratory annotation study
  • Corpus material and annotations: two scenario-based dialogues with a total duration of 51 minutes from the AMI corpus
  • Tool: ANVIL (http://www.dfki.de/˜kipp/anvil)
  • Two annotation studies: (1) using only speech transcription and sound; (2) using speech transcription, sound and video provided with transcriptions of nonverbal signals (gaze, head, facial expression, posture orientation and hand movements)

  7. Exploratory annotation study Transcriptions:
  • Verbal elements: manually produced orthographic transcriptions for each speaker, including word-level timings
  • Non-verbal elements: gaze direction; head movements; hand and arm gestures; eyebrow, eye and lip movements; posture shifts. Features:
    • form of movement (head: nod, shake, jerk; hands: pointing, shoulder-shrug, etc.; eyes: narrow, widen; lips: pout, compress, purse, flatten, (half-)open, random moves);
    • direction (up, down, left, right, backward, forward);
    • trajectory (line, circle, arch);
    • size (small, medium, large, extra large);
    • speed (slow, medium, fast);
    • number of repetitions (up to 20 times);
    • FTO (floor transfer offset): the difference between the time a turn starts and the moment the previous turn ends;
    • duration
  Overall kappa = .76
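The feature inventory above can be sketched as a simple record type. This is an illustrative sketch only: the field names and value sets below are assumptions drawn from the list, not the actual ANVIL track specification used in the study.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NonverbalSignal:
    """One transcribed nonverbal signal (hypothetical schema)."""
    articulator: str                  # e.g. "head", "hands", "eyes", "lips"
    form: str                         # e.g. "nod", "pointing", "narrow", "pout"
    direction: Optional[str] = None   # up/down/left/right/backward/forward
    trajectory: Optional[str] = None  # line/circle/arch
    size: Optional[str] = None        # small/medium/large/extra large
    speed: Optional[str] = None       # slow/medium/fast
    repetitions: int = 1              # up to 20 times in this corpus
    start: float = 0.0                # onset time in seconds
    end: float = 0.0                  # offset time in seconds

    @property
    def duration(self) -> float:
        # duration is derived from the word-level-style timings
        return self.end - self.start

# A fast triple head nod, as it might be transcribed for one speaker:
nod = NonverbalSignal(articulator="head", form="nod", direction="down",
                      speed="fast", repetitions=3, start=12.4, end=13.1)
print(round(nod.duration, 2))  # 0.7
```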

  8. Exploratory annotation study The two annotations were compared with respect to the number and nature of:
  (1) functional segments identified;
  (2) communicative functions altered;
  (3) communicative functions specified;
  (4) communicative functions assigned to single functional segments.

  9. Results Nonverbal communicative behaviour may serve four purposes:
  • emphasizing or articulating the semantic content of dialogue acts;
  • emphasizing or supporting the communicative functions of synchronous verbal behaviour;
  • performing separate dialogue acts in parallel to what is contributed by the partner;
  • expressing a separate communicative function in parallel to what the same speaker is expressing verbally.

  10. Results
  • Full-fledged dialogue acts (20% new segments):
    • Feedback acts (68.5%): positive (65.3%), negative (3.2%)
    • Time Management (24.8%)
    • Turn Management (4.7%)
    • Discourse Structuring (2%)

  11. Results These results are reflected in a significant majority of annotation schemes. We analyzed 18 well-known dialogue act annotation schemes: DAMSL, SWBD-DAMSL, LIRICS, DIT++, MRDA, Coconut, Verbmobil, HCRC MapTask, Linlin, TRAINS, AMI, SLSA, Alparon, C-Star, Primula, Matis, Chiba and SPAAC.
  • Feedback is not defined only in Linlin and Primula;
  • Turn Management acts are not defined in HCRC MapTask, Verbmobil, Linlin, Alparon and C-Star;
  • Discourse Structuring is not defined in TRAINS and Alparon;
  • Time Management is not defined in MRDA, HCRC MapTask, Linlin, Maltus, Primula and Chiba.

  12. Results Communicative function alteration and specification:
  • adjusting the level of feedback (understanding vs. agreement);
  • expressing the degree of certainty about the validity of the proposition;
  • revealing the speaker’s attitude towards the addressee(s), towards the content of what he is saying, or towards the actions he is considering performing;
  • signalling the speaker’s emotional or cognitive state (Pavelin (2002): modalizers).

  13. Results Communicative function alteration and specification:
  • no existing dialogue act annotation scheme deals with this type of information.
  • Proposal: a set of qualifiers that can be attached to a communicative function in order to describe the speaker’s behaviour more accurately.
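The qualifier proposal can be sketched as a small data structure: each dialogue act keeps a single communicative function, and optional qualifiers refine it. The class, dimension, function and qualifier names below are illustrative assumptions, not the actual ISO 24617-2 inventory.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class DialogueAct:
    """A dialogue act with qualifier slots (hypothetical sketch)."""
    dimension: str                            # e.g. "Task", "Auto-Feedback"
    function: str                             # e.g. "Answer", "Suggestion"
    qualifiers: Dict[str, str] = field(default_factory=dict)

# Instead of coining a compound function like "Uncertain Answer",
# the plain function is kept and a certainty qualifier is attached:
act = DialogueAct(dimension="Task", function="Answer",
                  qualifiers={"certainty": "uncertain"})
print(act.function, act.qualifiers["certainty"])  # Answer uncertain
```

Keeping the function inventory small and pushing shades of meaning into qualifiers avoids a combinatorial explosion of labels such as ‘Amused Suggestion’.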

  14. Results

  15. Results Multifunctionality in multimodal utterances:
  • a verbal functional segment has on average 1.3 communicative functions (also confirmed in Bunt, 2009);
  • a multimodal segment has 1.4 functions on average;
  • these are very often concerned with feedback and other interaction management dimensions, e.g. Own Communication Management and Time Management; Task and Turn Management; Task and Discourse Structuring; Task and Allo-Feedback;
  • dialogue act taxonomies that take the multifunctionality of utterances into account, such as DIT++, LIRICS, DAMSL, MRDA and Coconut, are known as multidimensional dialogue act annotation schemes.
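The per-segment averages above come from counting the function labels assigned to each functional segment. A toy version of that count, with invented segments and labels purely for illustration:

```python
# Each functional segment is annotated with one or more communicative
# functions; the segments and labels here are invented for illustration,
# not taken from the AMI annotations.
segments = [
    ["Answer"],                                   # monofunctional
    ["Answer", "PositiveAutoFeedback"],           # Task + feedback
    ["TurnKeep", "Stalling"],                     # turn + time management
    ["Suggestion"],
    ["Suggestion", "PositiveAutoFeedback", "TurnTake"],
]

# Average number of communicative functions per functional segment:
avg = sum(len(funcs) for funcs in segments) / len(segments)
print(avg)  # 1.8
```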

  16. Results Articulating semantic content (about 39%): these relate to the propositional or referential meaning of an utterance. For example, deictic gestures:
      wording: Press this little presentation
      hand:    ........point.................
  Pure semantic acts, as a rule, do not have a communicative function on their own.

  17. Conclusions
  • Multidimensional schemes (such as DIT++, LIRICS, DAMSL, MRDA and Coconut) could be used for annotation of multimodal data.
  • An extension is needed with respect to uncertainty, the speaker’s attitude and the speaker’s emotions.
  • Proposal:
    • compound functions such as ‘Amused Suggestion’ and ‘Uncertain Answer’ are undesirable;
    • it is better to have qualifiers that can be attached to a communicative function.
