
Unsupervised Learning of Narrative Event Chains

Original paper by: Nate Chambers and Dan Jurafsky, ACL 2008. This presentation for discussion created by: Peter Clark (Jan 2009).


Presentation Transcript


  1. Unsupervised Learning of Narrative Event Chains. Original paper by: Nate Chambers and Dan Jurafsky, ACL 2008. This presentation for discussion created by: Peter Clark (Jan 2009). Disclaimer: these slides are by Peter Clark, not the original authors, and thus represent a (possibly flawed) interpretation of the original work!

  2. Why Scripts? • Essential for making sense of text • We typically match a narrative with expected “scripts” • to make sense of what’s happening • to fill in the gaps, fill in goals and purpose On November 26, the Japanese attack fleet of 33 warships and auxiliary craft, including 6 aircraft carriers, sailed from northern Japan for the Hawaiian Islands. It followed a route that took it far to the north of the normal shipping lanes. By early morning, December 7, 1941, the ships had reached their launch position, 230 miles north of Oahu. depart → travel → arrive

  3. Scripts • Important/essential for NLP • But: expensive to build • Can we learn them from text? “John entered the restaurant. He sat down, and ordered a meal. He ate…” ? enter sit order eat

  4. Our own (brief) attempt • Look at next events in 1GB corpus: "fly" is followed by: ("fly" 362) ("say" 223) ("be" 179) ("have" 60) ("expect" 48) ("allow" 40) ("tell" 33) ("see" 30) ("go" 27) ("take" 27) ("make" 26) ("plan" 24) ("derive" 21) ("want" 19) ("schedule" 17) ("report" 16) ("declare" 15) ("give" 15) ("leave on" 15) "shoot" is followed by: ("say" 121) ("be" 110) ("shoot" 103) ("wound" 58) ("kill" 30) ("die" 27) ("have" 23) ("tell" 23) ("fire" 15) ("refuse" 15) ("go" 13) ("think" 13) ("carry" 12) ("take" 12) ("come" 11) ("help" 10) ("run" 10) ("be arrested" 9) ("find" 9) "drive" is followed by: ("drive" 364) ("be" 354) ("say" 343) ("have" 71) ("continue" 47) ("see" 40) ("take" 32) ("make" 29) ("expect" 27) ("go" 24) ("show" 22) ("try" 19) ("tell" 18) ("think" 18) ("allow" 16) ("want" 15) ("come" 13) ("look" 13) ("close" 12) Some glimmers of hope, but not great…
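As a rough illustration of the counting above, assuming each document has already been reduced to its sequence of main verbs (the function name and data layout are illustrative, not from the original experiment):

    from collections import Counter

    def next_verb_counts(documents, target):
        """documents: list of per-document verb sequences, e.g. [["fly", "say", ...], ...].
        Tally which verb immediately follows each occurrence of the target verb."""
        counts = Counter()
        for verbs in documents:
            for i, v in enumerate(verbs[:-1]):
                if v == target:
                    counts[verbs[i + 1]] += 1
        return counts

    # Example: next_verb_counts(docs, "shoot").most_common(10)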

  5. Andrew Gordon (2007) From gordon@ict.usc.edu Thu Sep 27 09:33:04 2007 …Recently I tried to apply language modeling techniques over event sequences in a billion words of narrative text extracted from Internet weblogs, and barely exceeded chance performance on some event-ordering evaluations….

  6. Chambers and Jurafsky • Main insight: • Don’t look at all verbs, just look at those mentioning the “key player” – the protagonist – in the sequence • Capture some role relationships also: • Not just “push” → “fall”, but “push X” → “X fall” “An automatically learned Prosecution Chain. Arrows indicate the before relation.”

  7. Approach • Stage 1: • find likelihood that one event+protagonist goes with another (or more) event+protagonist • NOTE: no ordering info • e.g., given: • “X pleaded”, what other event+protagonist occur with unusually high frequencies? • → “sentenced X”, “fined X”, “fired X” • Stage 2: • order the set of event+protagonist

  8. The Training Data • Articles in the GigaWord corpus • For each article: • find all pairs of events (verbs) which have a shared argument • shared argument found by OpenNLP coreference • includes transitivity (X = Y, Y = Z → X = Z) • add each pair to the database • Example: “John entered the restaurant. The waiter came over. John sat down, and the waiter greeted him…” • events about John: {X enter, X sat, greet X} • events about the waiter: {X come, X greet} • pairs added to the database of pairs co-occurring in the article: (X enter, X sat), (X enter, greet X), (X sat, greet X), (X come, X greet)
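As a rough illustration of this collection step, assuming coreference has already reduced each article to entity chains, i.e. one list of (verb, role) events per entity (the representation and names are illustrative, not the authors' code):

    from collections import Counter
    from itertools import combinations

    pair_counts = Counter()   # database of co-occurring (event, event) pairs
    event_counts = Counter()  # per-event counts, used for PMI on the next slide

    def add_article(entity_chains):
        """entity_chains: one event list per entity in the article, e.g.
        [[("enter", "subj"), ("sit", "subj"), ("greet", "obj")],   # John
         [("come", "subj"), ("greet", "subj")]]                    # the waiter
        """
        for events in entity_chains:
            for e in events:
                event_counts[e] += 1
            for e1, e2 in combinations(events, 2):
                pair_counts[tuple(sorted((e1, e2)))] += 1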

  9. Stage 1 • Given two events with a shared protagonist, do they occur “unusually often” in a corpus? • e.g. “push X” & “X fall”: probability of seeing “push” and “fall” with coreferring arguments = (number of times “push” and “fall” have been seen with coreferring arguments) / (number of times any pair of verbs has been seen with any coreferring arguments) • More generally: Prob(“X event1” AND “X event2”) = Number(“X event1” AND “X event2”) / Σij Number(“X eventi” AND “X eventj”) • PMI (“surprisingness”): PMI(“X event1”, “X event2”) = log [ Prob(“X event1” AND “X event2”) / ( Prob(“X event1”) × Prob(“X event2”) ) ] = the “surprisingness” that the args of event1 and event2 are coreferential
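A sketch of the PMI computation over counts gathered as in the previous sketch; the marginal estimates are one plausible choice and may differ in detail from the paper's:

    import math

    def pmi(e1, e2, pair_counts, event_counts):
        """PMI of two events with coreferring arguments, estimated from the
        counts collected in the previous sketch."""
        joint = pair_counts.get(tuple(sorted((e1, e2))), 0)
        if joint == 0:
            return float("-inf")  # pair never seen together
        p_joint = joint / sum(pair_counts.values())
        p1 = event_counts[e1] / sum(event_counts.values())
        p2 = event_counts[e2] / sum(event_counts.values())
        return math.log(p_joint / (p1 * p2))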

  10. Can generalize: • PMI: given an event (+ arg), how “unusual” is it to see another event (+ same arg)? • Generalization: given N events (+ arg), how “unusual” to see another event (+ same arg)? • Thus: given a set of events sharing a protagonist, score a candidate event by summing its PMI with each event in the set
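A sketch of the generalized score, assuming a pmi() function like the one above; chain_score and rank_candidates are hypothetical names:

    def chain_score(chain, candidate, pmi_fn):
        """Generalized score: the candidate's summed PMI with every event
        already in the set (chain)."""
        return sum(pmi_fn(e, candidate) for e in chain)

    def rank_candidates(chain, vocabulary, pmi_fn, n=10):
        """Return the n candidate events with the highest total PMI
        against the current set."""
        scored = [(chain_score(chain, c, pmi_fn), c)
                  for c in vocabulary if c not in chain]
        return sorted(scored, reverse=True)[:n]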

  11. Evaluation: Cloze test • Fill in the blank… “McCann threw two interceptions early. Toledo pulled McCann aside and told him he’d start. McCann quickly completed his first two passes.” • Extracted events: {X throw, pull X, tell X, X start, X complete} (note: a set, not a list) • Cloze task: given {?, pull X, tell X, X start, X complete}, predict “?”
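A sketch of the cloze procedure, reusing the hypothetical rank_candidates() from the previous sketch and assuming the held-out event appears in the candidate vocabulary:

    def cloze_rank(chain, vocabulary, pmi_fn):
        """Hold out each event of the chain in turn, rank every candidate
        against the remaining events, and record the position at which the
        held-out event appears (lower is better)."""
        ranks = []
        for i, held_out in enumerate(chain):
            rest = chain[:i] + chain[i + 1:]
            ranking = [c for _, c in
                       rank_candidates(rest, vocabulary, pmi_fn, n=len(vocabulary))]
            ranks.append(ranking.index(held_out) + 1)
        return sum(ranks) / len(ranks)  # average rank for this chain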

  12. Results: • 69 articles, each with >= 5 protagonist+event pairs in them • System produces ~9000 guesses at each “?”

  13. Learning temporal ordering • Stage 1: add labels to corpus • Given: verb features (neighboring POS tags, neighboring auxiliaries and modals, WordNet synsets, etc.) • Assign: tense, grammatical aspect, aspectual class • [Aside: couldn’t a parser assign this directly?] • Using: SVM, trained on labeled data (TimeBank corpus) • Stage 2: learn before() classifier • Given: 2 events in a document sharing an argument • Assign: before() relation • Using: SVM, trained on labeled data (TimeBank, expanded with the transitivity rule “X before Y and Y before Z → X before Z”) • A variety of features used, including whether e1 grammatically occurs before e2 in the text
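A small sketch of the transitivity expansion used to enlarge the training data (the SVM classifiers themselves are not sketched here):

    def transitive_closure(before_pairs):
        """Expand before(x, y) labels with the transitivity rule
        "x before y and y before z -> x before z"."""
        closure = set(before_pairs)
        changed = True
        while changed:
            new = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
            changed = not new.issubset(closure)
            closure |= new
        return closure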

  14. Learning temporal ordering (cont) • Stage 3: • For all event pairs with shared arg in the main corpus • e.g., “push X”, “X fall” • count the number of before(e1,e2) vs. before(e2,e1) classifications, to get an overall ordering confidence
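A sketch of this tally, assuming the pairwise classifier's decisions are available as ordered (e1, e2) pairs; the names are illustrative:

    from collections import Counter

    # (e1, e2) -> how often the classifier labelled before(e1, e2)
    before_counts = Counter()

    def tally_orderings(classified_pairs):
        """classified_pairs: (e1, e2) pairs the classifier put in that order,
        gathered over the whole corpus."""
        before_counts.update(classified_pairs)

    def ordering_margin(e1, e2):
        """How much more often e1 was classified as before e2 than the
        reverse; the raw quantity behind the slide's confidence score."""
        return before_counts[(e1, e2)] - before_counts[(e2, e1)]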

  15. Evaluation • Test set: use same 69 documents • minus 6 which had no ordered events • Task: for each document a. manually label the before() relations b. generate a random ordering • Can system distinguish real from random order? • “Coherence” ≈ sum of confidences of before() labels on all event pairs in document • Confidence(e1→e2) = log(#before(e1,e2) − #before(e2,e1)) • (Results table, broken down by # of events+shared arg per document, not reproduced here.) Not that impressive (?)
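A sketch of one plausible reading of this coherence score, using a before-count tally like the previous sketch; the authors' exact formula and normalization may differ:

    import math

    def coherence(ordered_pairs, before_counts):
        """ordered_pairs: event pairs in the order a document (real or
        randomly shuffled) presents them. Sum the log of the positive count
        margins and normalize by the number of pairs."""
        total, n = 0.0, 0
        for e1, e2 in ordered_pairs:
            margin = before_counts[(e1, e2)] - before_counts[(e2, e1)]
            if margin > 0:
                total += math.log(margin)
            n += 1
        return total / n if n else 0.0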

  16. Agglomeration and scripts • How do we get scripts? • Could take a verb+arg, e.g., “arrest X” • Then look for the most likely 2nd verb+arg, e.g., “charge X” • Then the next most likely verb+arg, given these 2, e.g., “indict X” • etc. • Then: use ordering algorithm to produce ordering {arrest X} ↓ {arrest X, charge X} ↓ {arrest X, charge X, indict X} ↓ …
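A sketch of this greedy agglomeration, assuming a pmi() scoring function as above; the 10-event limit and cutoff follow the description on the next slide:

    def grow_script(seed, vocabulary, pmi_fn, max_events=10, cutoff=0.0):
        """Greedily grow a script: repeatedly add the candidate with the
        highest total PMI against the current set, stopping after max_events
        or once the best score drops below the cutoff, whichever comes first.
        The resulting set is then passed to the ordering algorithm."""
        chain = [seed]
        while len(chain) < max_events:
            candidates = [(sum(pmi_fn(e, c) for e in chain), c)
                          for c in vocabulary if c not in chain]
            if not candidates:
                break
            best_score, best = max(candidates)
            if best_score < cutoff:
                break
            chain.append(best)
        return chain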

  17. “Good” examples… • “Prosecution” (This was the initial seed. Agglomeration was stopped arbitrarily after 10 events, or when a cutoff for node inclusion was reached, whichever came first.)

  18. Good examples… • “Employment” (dotted lines are incorrect “before” relations)

  19. Nate Chambers’ suggested mode of use: • Given a set of events in a news article • Predict/fill in the missing events • → Do we really need scripts?

  20. Many ways of referring to the same entity… • Less common style: • More common style: John went to a restaurant. John sat down. John ate. He paid… Nagumo's fleet assembled in the remote anchorage of Tankan Bay in the Kurile Islands and departed in strictest secrecy for Hawaii on 26 November 1941. The ships' route crossed the North Pacific and avoided normal shipping lanes. At dawn 7 December 1941, the Japanese task force had approached undetected to a point slightly more than 200 miles north of Oahu. Generally, there are a lot of entities doing a lot of things! From natec@stanford.edu Tue Dec 16 12:48:58 2008 …Even with the protagonist idea, it is still difficult to name the protagonist himself as many different terms are used. Naming the other non-protagonist roles is even more sparse. I'm experiencing the same difficulties. My personal thought is that we should not aim to fill the role with one term, but a set of weighted terms. This may be a set of related nouns, or even a set of unrelated nouns with their own preference weights.

  21. Also: many ways of describing the same event! • Different levels of detail, different viewpoints: • The planes destroyed the ships • The planes dropped bombs, which destroyed the ships • The bombs exploded, destroying the ships • The Japanese destroyed the ships • Different granularities: • Planes attacked • Two waves of planes attacked • 353 dive-bombers and torpedo planes attacked

  22. Summary • Exciting work! • simple but brilliant insight of “protagonist” • But • is really only a first step towards scripts • mainly learns verb+arg co-associations in a text • temporal ordering and agglomeration is a post-processing step • quality of learned results still questionable • Cloze: needs >1000 guesses before hitting a mentioned, co-associated verb+arg • nice “Prosecution” script: a special case as most verbs in script are necessarily specific to Prosecution? • fluidity of language use (multiple ways of viewing same scene, multiple ways of referring to same entity) still a challenge • maybe don’t need to reify scripts (?) • fill in missing (implied) events on the fly in context-sensitive way
