Computational Models of Discourse Analysis Carolyn Penstein Rosé Language Technologies Institute / Human-Computer Interaction Institute
Warm Up Discussion • Look at the analysis I have passed out • Note: inscribed sentiment is underlined, invoked sentiment is italicized, and relatively frequent words that appear in either of these types of expressions are marked in bold • Do you see any sarcastic comments here? What, if any, connection do you see between sentiment and sarcasm? • Keeping in mind the style of templates you read about, do you see any snippets of text in these examples that you think would make good templates?
Patterns I see • Inscribed sentiment • About 24% of the words in underlined segments are relatively high frequency • 3 “useful” patterns out of 18 underlined portions • Examples: • Like • Good • More CW than CW • Invoked sentiment • About 39% of the words were relatively high frequency • About 7 possibly useful patterns out of 17, but only 3 really look unambiguous • Examples: • CW like one • CW the CW of the CW • Makes little CW to CW to the CW • CW and CW of an CW • Like CW on CW • Leave you CW a little CW and CW • CW more like a CW
Unit 3 Plan • 3 papers we will discuss all give ideas for using context (at different grain sizes) • Local patterns without syntax • Using bootstrapping • Local patterns with syntax • Using a parser • Rhetorical patterns within documents • Using a statistical modeling technique • The first two papers introduce techniques that could feasibly be used in your Unit 3 assignment
Student Comment: Point of Discussion • To improve performance, language technologies seem to approach the task in one of two ways. Some approaches attempt to generate a better abstract model that provides the translation mechanism between a string of terms (a sentence) and our human mental model of sentiment in language. Alternatively, some start with a baseline and try to find a corpus or dictionary of terms that provides evidence for sentiment. • Please clarify
Connection between Appraisal and Sarcasm • Student Comment: I’m not exactly sure how one would go about applying appraisal theory to something as elusive as sarcasm • A sarcastic example of invoked negative sentiment from Martin and White, p. 72
Inscribed versus Invoked • Do we see signposts that tell us how to interpret invoked appraisals?
Overview of Approach • Start with a small amount of labeled data • Generate patterns from the examples • Select those that appear in the training data more than once and don’t appear in both a 1-star and a 5-star labeled example • Expand the data through search, using examples from the labeled data as queries (take the top 50 snippet results; see the sketch below) • Represent the data in terms of templatized patterns • Classify with a modified kNN approach • How could you do this with SIDE? 1. Build a feature extractor to generate the set of patterns 2. Use search to set up the expanded set of data 3. Apply the generated patterns to the expanded set of data 4. Use kNN classification
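A minimal sketch of the data-expansion step, in Python. The `search_snippets` function below is a hypothetical placeholder for whatever search engine API is used, not a real library call; the basic move is to issue each labeled example as a query and let the returned snippets inherit the query’s label:

```python
def expand_data(labeled_examples, search_snippets, top_k=50):
    """Expand a small labeled seed set via web search.

    labeled_examples: list of (text, star_label) pairs.
    search_snippets:  hypothetical callable (query, top_k) -> list of
                      snippet strings; stands in for a real search API.
    Each returned snippet inherits the label of the query that
    retrieved it, on the assumption that similar text carries
    similar sentiment.
    """
    expanded = []
    for text, label in labeled_examples:
        expanded.append((text, label))            # keep the seed example
        for snippet in search_snippets(text, top_k):
            expanded.append((snippet, label))     # label propagated from query
    return expanded
```

Note that the propagated labels inherit all the noise of the search results, which is worth keeping in mind for the “What could they have done instead?” question below.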
Pattern Generation • Classify words into high frequency words (HFWs) versus content words (CWs) • HFWs occur at least 100 times per million words • CWs occur no more than 1,000 times per million words • Also add [product], [company], and [title] as additional HFWs • Constraints on patterns: 2-6 HFWs, 1-6 slots for CWs, and patterns start and end with HFWs (see the sketch below) • Would Appraisal theory suggest other categories of words?
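A minimal sketch of the word classification and pattern enumeration, assuming token frequencies are given per million words. Note that with these thresholds a word between 100 and 1,000 occurrences per million qualifies as both an HFW and a CW; the sketch resolves that ambiguity by always treating such words as HFWs, which is a simplification:

```python
def word_classes(freq_per_million,
                 special_tokens=("[product]", "[company]", "[title]")):
    """Tag each word as HFW (>= 100 per million), CW (<= 1000 per
    million), or both; the bracketed meta-tokens count as HFWs."""
    classes = {w: set() for w in freq_per_million}
    for word, freq in freq_per_million.items():
        if freq >= 100:
            classes[word].add("HFW")
        if freq <= 1000:
            classes[word].add("CW")
    for tok in special_tokens:
        classes.setdefault(tok, set()).add("HFW")
    return classes


def generate_patterns(tokens, classes):
    """Enumerate candidate patterns: contiguous spans that start and
    end with an HFW and contain 2-6 HFWs and 1-6 CW slots.  HFWs are
    kept verbatim; CWs are abstracted to the slot symbol 'CW'."""
    def is_hfw(t):
        return "HFW" in classes.get(t, set())

    patterns = set()
    for i, start in enumerate(tokens):
        if not is_hfw(start):
            continue
        for j in range(i + 1, len(tokens)):
            if not is_hfw(tokens[j]):
                continue
            abstracted = [t if is_hfw(t) else "CW" for t in tokens[i:j + 1]]
            n_hfw = sum(1 for t in abstracted if t != "CW")
            n_cw = abstracted.count("CW")
            if 2 <= n_hfw <= 6 and 1 <= n_cw <= 6:
                patterns.add(" ".join(abstracted))
    return patterns
```

Applied to a tokenized sentence, this produces templates of the same shape as the warm-up examples, e.g. “Like CW on CW”.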
Expand Data: “Great for Insomniacs…” What could they have done instead?
Pattern Selection • Approach: select those patterns that appear in the training data more than once and don’t appear in both a 1-star and a 5-star labeled example (see the sketch below) • Could instead have used an attribute selection technique like chi-squared attribute evaluation • What do you see as the trade-offs between these approaches?
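A minimal sketch of that selection rule, assuming each training example has already been reduced to the set of patterns it contains plus its star label:

```python
from collections import Counter, defaultdict


def select_patterns(training_examples, candidate_patterns):
    """Keep a pattern only if it (a) occurs in more than one training
    example and (b) never occurs in both a 1-star and a 5-star
    example, i.e. it does not straddle the two poles.

    training_examples: list of (pattern_set, star_label) pairs.
    """
    occurrences = Counter()
    stars_seen = defaultdict(set)
    for pattern_set, star in training_examples:
        for p in pattern_set & candidate_patterns:
            occurrences[p] += 1
            stars_seen[p].add(star)
    return {p for p in candidate_patterns
            if occurrences[p] > 1 and not {1, 5} <= stars_seen[p]}
```

By contrast, a chi-squared attribute evaluation would rank every pattern by the strength of its association with the labels and keep the top k, trading this hard polarity constraint for a graded statistical one.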
Representing Data as a Vector • Most of the features were from the generated patterns (a sketch follows below) • Also included punctuation-based features • Number of !, number of ?, number of quotes, number of capitalized words • What other features would you use? • What modifications to the feature weights would you propose?
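A minimal sketch of the vector construction, using binary pattern-match features plus the punctuation counts; the original weighting of partial pattern matches is more graded than the 0/1 used here:

```python
def feature_vector(text, selected_patterns, matched_patterns):
    """Build a sparse feature dict: one binary feature per selected
    pattern that matched this text, plus surface features counting
    !, ?, quotes, and capitalized words."""
    features = {f"pat:{p}": 1.0
                for p in matched_patterns & selected_patterns}
    features["n_exclaim"] = float(text.count("!"))
    features["n_question"] = float(text.count("?"))
    features["n_quote"] = float(text.count('"'))
    features["n_capitalized"] = float(sum(1 for w in text.split()
                                          if w[:1].isupper()))
    return features
```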
Modified kNN • Is there a simpler approach? • Weighted average so that matches from the majority class count more (see the sketch below)
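One reading of the modification, sketched below: take the k most similar labeled vectors, then average their star labels with each neighbor weighted by how many of the k neighbors share its label, so the majority class among the neighbors counts more. This is a hedged reconstruction of the idea, not the paper’s exact formula:

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity over sparse feature dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def modified_knn(query, training, k=5):
    """training: list of (feature_dict, star_label) pairs."""
    neighbors = sorted(training, key=lambda ex: cosine(query, ex[0]),
                       reverse=True)[:k]
    if not neighbors:
        raise ValueError("no training data")
    label_counts = Counter(label for _, label in neighbors)
    # Each neighbor's vote is weighted by its label's frequency among
    # the k neighbors, so the majority class dominates the average.
    weight_sum = sum(label_counts[label] for _, label in neighbors)
    return sum(label_counts[label] * label
               for _, label in neighbors) / weight_sum
```

The simpler alternative raised on the slide would be plain unweighted kNN over the same feature vectors.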
Evaluation • Baseline technique: count as positive examples those that have a highly negative star rating but lots of positive words • Is this really a strong baseline? Look at the examples from the paper. • I am … rather wary of the effectiveness of their approach because it seems that they cherry-picked a heuristic ‘star-sentiment’ baseline to compare their results to in Table 3 but do not offer a similar baseline for Table 2.
Evaluation • What do you conclude from this? • What surprises you?
Revisit: Overview of Approach • Start with a small amount of labeled data • Generate patterns from the examples • Select those that appear in the training data more than once and don’t appear in both a 1-star and a 5-star labeled example • Expand the data through search, using examples from the labeled data as queries (take the top 50 snippet results) • Represent the data in terms of templatized patterns • Classify with a modified kNN approach • How could you do this with SIDE? 1. Build a feature extractor to generate the set of patterns 2. Use search to set up the expanded set of data 3. Apply the generated patterns to the expanded set of data 4. Use kNN classification
What would it take to achieve inter-rater reliability? • You can find definitions and examples on the website, just like in the book, but it’s not enough… • Strategies • Simplify – are there distinctions that don’t buy us much anyway? • Add constraints • Identify borderline cases • Use decision trees
What would it take to achieve inter-rater reliability? • Look at Beka and Elijah’s analyses in comparison with mine • What were our big disagreements? • How would we resolve them?