
Computational Models of Discourse Analysis


Presentation Transcript


  1. Computational Models of Discourse Analysis Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

  2. Warm-Up • Look at the data and find places where you disagree with the tags. • Do the AMI tags “feel” like illocutionary acts to you? Why or why not? • How would you change the definitions of the codes to seem more like illocutionary acts?

  3. Announcement • Please fill in the early course evaluation • http://www.surveymonkey.com/s/T3NCXWG

  4. Chicken and Egg… • Operationalization • Computationalization

  5. Reminder…

  6. AMI Annotation Scheme

  7. Student Comment • That being said, while looking through the conversations for this week I was struck by the fact that this scheme doesn't really cover anything related to roles or identities...or really context. The whole idea of 'figured worlds' and various discourses going on doesn't really exist. Granted, a similar thing could be said for speech acts and I'm having difficulty finding the words to distinguish these "dialogue acts" from speech acts. They seem rather similar to me.

  8. Student Comment • I could not find a lot of evidence to suggest that a dialog act is synonymous with a speech act, but I can imagine that this system makes it a lot easier to find speech acts because the scope of data is now reduced to a sentence or long paragraph. For our purposes the resulting data would be useful to apply a further analysis to split the dialog acts into speech acts.

  9. What is the goal of the DA coding scheme? • Provide information on the structure of the conversation • Capture speaker attitudes and intentions • Capture speaker roles • Capture level of involvement * If you just saw this description without having seen the coding manual or data, what would you expect about the relationship between their coding and what we have discussed from Gee, Martin & Rose, and Levinson?

  10. Are DA tags really like illocutionary acts? * What’s the real distinction here?

  11. Feature Extraction from Speech * Not used for DA recognition!

  12. Factored Language Models • Language models predict the probability of sequences of words: P(w1,w2,…,wn) • By the chain rule, this joint probability factorizes into a product of per-word conditional probabilities: P(w1,w2,…,wn) = P(w1) · P(w2|w1) · … · P(wn|w1,w2,…wn-1)
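
To make the chain-rule factorization concrete, here is a minimal Python sketch (toy counts, not the paper's model) that estimates a sentence's joint probability from bigram conditionals; a real model would smooth more carefully and condition on longer histories:

```python
from collections import Counter

# Toy corpus; these counts stand in for a trained language model.
corpus = "the meeting starts now . the meeting ends soon .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_cond(word, prev):
    """P(word | prev) with add-one smoothing (a bigram stand-in for P(wn|w1..wn-1))."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))

def p_joint(sentence):
    """Chain rule: P(w1,...,wn) = P(w1) * product over i of P(wi | history)."""
    words = sentence.split()
    prob = unigrams[words[0]] / sum(unigrams.values())
    for prev, word in zip(words, words[1:]):
        prob *= p_cond(word, prev)
    return prob

print(p_joint("the meeting starts now"))
```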

  13. Interpolated Factored Language Models • Rather than words as tokens, we can use feature bundles • previous word, DA, position in sequence (i.e., which block of 5 words) • Replace word-level conditional probabilities with P(wn|wn-1,position,DA) • Interpolation allows us to simplify the model by dropping some of the complexity, thus making the models less sparse • i.e., each probability is computed from more data, and is thus less idiosyncratic • Drop one feature from the bundle at a time, in the order listed (see the sketch below)
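
A minimal sketch of the back-off idea, under the assumption (from the slide) that features are dropped in the listed order: previous word first, then DA, then position. The events and lambda weights here are hypothetical:

```python
from collections import Counter

# Hypothetical training events: (word, (prev_word, position_block, da_tag)).
events = [("okay", ("so", 0, "Assess")), ("okay", ("so", 0, "Inform")),
          ("right", ("so", 0, "Assess"))]

# Count each event under progressively coarser contexts, dropping one
# feature at a time in the order listed on the slide.
full, no_prev, no_da, word_only = Counter(), Counter(), Counter(), Counter()
ctx_full, ctx_no_prev, ctx_no_da = Counter(), Counter(), Counter()
total = 0
for w, (prev, pos, da) in events:
    full[(w, prev, pos, da)] += 1; ctx_full[(prev, pos, da)] += 1
    no_prev[(w, pos, da)] += 1;    ctx_no_prev[(pos, da)] += 1
    no_da[(w, pos)] += 1;          ctx_no_da[(pos,)] += 1
    word_only[w] += 1;             total += 1

def p_interp(w, prev, pos, da, lambdas=(0.4, 0.3, 0.2, 0.1)):
    """Linearly interpolate the four models; coarser models rescue sparse ones."""
    safe = lambda num, den: num / den if den else 0.0
    return (lambdas[0] * safe(full[(w, prev, pos, da)], ctx_full[(prev, pos, da)])
            + lambdas[1] * safe(no_prev[(w, pos, da)], ctx_no_prev[(pos, da)])
            + lambdas[2] * safe(no_da[(w, pos)], ctx_no_da[(pos,)])
            + lambdas[3] * word_only[w] / total)

print(p_interp("okay", "so", 0, "Assess"))
```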

  14. Student Question • I can't quite tell from the paper, but it seems that they relied mostly on relatively plain word-level data - it isn't clear where or if the prosodic features were incorporated. • Not used for the DA recognition task at all (see p. 9)

  15. Student Comment • As noted by the coding manual, the labels assigned to a given utterance are very dependent on the content of the utterances before them, and not just on the preceding label. • Was this information ever used?...

  16. What does word position give us? Is it really just standing in for length?

  17. Model Structure • Note that although this model jointly predicts segmentation and DAs, once the segmentation is done, the DA assignment is redone by a CRF classifier (exact results for DA classification are not given) • Note that the model components that predict word sequences, segmentation, and the influence of prosodic information can be enhanced with data from other corpora; but if those corpora aren't annotated with the same DAs, that data won't help for predicting either sequences of DAs or the association between words and DAs.
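
For readers curious what the final CRF classification step could look like in code, here is a minimal sketch using the sklearn-crfsuite library. The features, training data, and utt_features helper are illustrative assumptions, not the paper's actual feature set:

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

def utt_features(dialogue, i):
    """Illustrative per-utterance features; the paper's features differ."""
    words = dialogue[i].split()
    return {
        "first_word": words[0].lower(),
        "last_word": words[-1].lower(),
        "length": len(words),
        "prev_first": dialogue[i - 1].split()[0].lower() if i > 0 else "<s>",
    }

# Hypothetical training data: each dialogue is a list of segmented
# utterances paired with gold DA tags.
dialogues = [["okay so we need a remote", "right", "what about the buttons"]]
tags = [["Inform", "Assess", "Elicit-Inform"]]

X = [[utt_features(d, i) for i in range(len(d))] for d in dialogues]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, tags)
print(crf.predict(X))  # the linear-chain CRF also learns tag-to-tag transitions
```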

  18. Results * Uni+POSbi+SVM gives 49.1% error * What do we conclude from these results?

  19. What about ordering information? (computed on one dialogue only) • Inform is the most common class (37.4%) • Next most frequent is Assess (18.5%) • With bigrams, if we look for conditional probabilities above 25%: • The only case where the most likely next class is not Inform is Elicit-Assessment, which is followed by Assess 36% of the time • It is followed by Inform 33% of the time • Elicit-Assessment itself occurs only about 1% of the time • Trigrams might be better, but this makes ordering information look pretty useless (a quick sketch of computing these transition probabilities follows)
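
The bigram statistics above are easy to reproduce on any tagged dialogue. A quick sketch, with a made-up DA sequence standing in for the real annotations:

```python
from collections import Counter

# Made-up DA sequence standing in for one annotated dialogue.
da_seq = ["Inform", "Assess", "Inform", "Elicit-Assessment", "Assess",
          "Inform", "Inform", "Assess", "Inform"]

from_counts = Counter(da_seq[:-1])             # occurrences that have a successor
transitions = Counter(zip(da_seq, da_seq[1:]))

# P(next | current): is ordering information actually informative?
for (cur, nxt), n in transitions.most_common():
    print(f"P({nxt} | {cur}) = {n / from_counts[cur]:.2f}")
```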

  20. Student Comment • Then they describe a whole bunch of things I don't understand and, after reading the summary and conclusion sections, I'm not even sure if these results are good compared to other work in this field, though they say their method improved recognition accuracy.

  21. Student Comment • One of the things I found most confusing was the evaluation section. I maybe skimmed through too quickly, but I couldn't figure out: a) what kind of agreement do humans get in coding, b) how well the system performs compared to something which breaks utterances into acts randomly and assigns a random category drawn from the frequent categories, c) how well the system does compared to humans, and d) whether it was possible to ascertain which features were the most discriminating, and how various features were balanced (e.g., does it make sense that probability according to the DA sequence model should be given the same weight as probability according to energy?)

  22. Assignment 2 (not due until Feb 23) • Look at the Maptask dataset and the Negotiation coding that is provided • Think about what distinguishes the codes at a linguistic level • Do an error analysis on the dataset using a simple unigram baseline (a starting sketch appears after this slide), and from that propose one or a few new types of features motivated by your linguistic understanding of the Negotiation framework • Due on Week 7, lecture 2 • Turn in your data, your feature extractors (documented code), and a formal write-up of your experimentation • Have a 5-minute PowerPoint presentation ready for class on Week 7, lecture 2
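
A possible starting point for the unigram baseline, as a hedged sketch: load_maptask is a hypothetical helper (adapt it to however the provided dataset is actually formatted), and the classifier is just one reasonable default:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

utterances, codes = load_maptask()  # hypothetical loader for the provided data

X_train, X_test, y_train, y_test = train_test_split(
    utterances, codes, test_size=0.2, random_state=0)

vec = CountVectorizer()                  # unigram bag-of-words features
clf = LogisticRegression(max_iter=1000)  # simple baseline classifier
clf.fit(vec.fit_transform(X_train), y_train)

# Error analysis: print misclassified utterances and look for linguistic
# patterns that motivate new Negotiation-aware feature types.
pred = clf.predict(vec.transform(X_test))
for utt, gold, hyp in zip(X_test, y_test, pred):
    if gold != hyp:
        print(f"gold={gold:<16} pred={hyp:<16} {utt}")
```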

  23. Questions?
