
Learning Adjective-Noun Selectional Preference Using Probabilistic Graphical Model


Presentation Transcript


  1. Learning Adjective-Noun Selectional Preference Using Probabilistic Graphical Model Debaleena Chattopadhyay, Mandeep Singh Grang CSE 507, Spring 2011

  2. Outline • The Problem Statement • The Prior Collection • The Max-Flow Model • Results • Conclusion CSE 507, Spring 2011

  3. The Problem Statement To learn the selectional preferences of adjectives and use that knowledge for word sense disambiguation. • I want a red pen to write. • Stay away from the red-hot burners. • He likes to eat red meat. • Jones is looking at the fat guy. • He has a fat salary. CSE 507, Spring 2011

  4. Related Work • Selectional Preference and Sense Disambiguation, Resnik (1997) • Word Sense Disambiguation of Adjectives Using Probabilistic Networks, Chao et al. (2000) • Determinants of Adjective-Noun Plausibility, Lapata et al. (1999) • Evaluating and Combining Approaches to Selectional Preference Acquisition, Brockmann et al. (2007) • Web-based WSD Using Adjective-Noun Pairs, Buscaldi et al. • Explaining Away Ambiguity: Learning Verb Selectional Preference with Bayesian Networks, Ciaramita et al. (2000) CSE 507, Spring 2011

  5. Dataset • Training dataset (for prior collection): • Adjective-Noun and Verb-Adjective-Noun tuples • Hand-labeled descriptions from ImageCLEF (http://www.imageclef.org/) • Project Gutenberg eBooks (http://www.gutenberg.org/wiki/Main_Page) • Wiki text (collected with a web crawler) • Google N-gram counts • Test dataset (for testing the final model): • Adjective-Noun and Verb-Adjective-Noun tuples • SemCor 3.0 • Senseval-3 CSE 507, Spring 2011

  6. The Prior Collection Training data: sentences are parsed with a dependency parser, and co-occurrence counts are stored as Adjective-Noun and Adjective-Noun-Verb tuples (a sketch of this step follows below). Learn selectional preference from these counts to obtain posterior marginal probabilities. Use the WordNet hierarchy to create a subnetwork for each adjective-noun pair. CSE 507, Spring 2011
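
The slides do not name the dependency parser used. As a minimal sketch of the tuple-extraction step, the snippet below uses spaCy (an assumption, not the authors' parser) to count adjective-noun pairs via the amod relation, and verb-adjective-noun triples via the noun's governing verb.

```python
# Minimal sketch of the tuple-extraction step, assuming spaCy as the
# dependency parser (the slides do not name one).
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")
adj_noun = Counter()          # (adjective, noun) co-occurrence counts
verb_adj_noun = Counter()     # (verb, adjective, noun) co-occurrence counts

def count_tuples(sentence):
    for tok in nlp(sentence):
        # an adjective modifying a noun, e.g. "red" -> "meat"
        if tok.dep_ == "amod" and tok.head.pos_ == "NOUN":
            adj, noun = tok.lemma_, tok.head.lemma_
            adj_noun[(adj, noun)] += 1
            verb = tok.head.head  # the noun's governor, e.g. "eat"
            if verb.pos_ == "VERB":
                verb_adj_noun[(verb.lemma_, adj, noun)] += 1

count_tuples("He likes to eat red meat.")
print(adj_noun)        # Counter({('red', 'meat'): 1})
print(verb_adj_noun)   # Counter({('eat', 'red', 'meat'): 1})
```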

  7. The Prior Collection • For a given adjective, different nouns occur in the training data. • Each noun in WordNet belongs to certain classes, both general and specific; e.g. pen might belong to the classes 'Object', 'Writing Instrument', 'Small Things'. • For each noun in the data, query WordNet for its K most general classes (hypernyms), as in the sketch below. • For prior collection we do not consider the hypernym paths of the nouns, only the classes themselves. • Compute priors for adjective-noun tuples and adjective-class tuples from the training data. • Also compute priors for verb-adjective-noun tuples and verb-adjective-class tuples. • Use a probabilistic model to calculate these priors.
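
A minimal sketch of the WordNet query, using NLTK's interface; the value of K and the root-first ordering of hypernym paths are illustrative choices the slides leave open.

```python
# Minimal sketch: the K most general classes (hypernyms) of a noun,
# using NLTK's WordNet interface. K = 3 is an illustrative choice.
from nltk.corpus import wordnet as wn

def general_classes(noun, k=3):
    classes = set()
    for synset in wn.synsets(noun, pos=wn.NOUN):
        for path in synset.hypernym_paths():
            # hypernym_paths() runs root-first, so the first k entries
            # are the most general classes on that path
            classes.update(s.name() for s in path[:k])
    return classes

print(general_classes("pen"))
# e.g. {'entity.n.01', 'physical_entity.n.01', 'object.n.01', ...}
```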

  8. The Prior Collection A naïve Bayes model gives P(noun|adj):
• P(noun|adj) = P(adj|noun)P(noun)/P(adj)
• P(adj|noun) = P(adj|c1,c2,…,cK) = P(adj)P(c1|adj)P(c2|adj)…P(cK|adj) / P(c1,c2,…,cK) ∝ P(adj)P(c1|adj)P(c2|adj)…P(cK|adj), where the noun belongs to the set of classes C = {c1, c2, …, cK} according to the WordNet hypernym structure, and the denominator is dropped because it is constant when ranking nouns for a fixed adjective.
• The class conditionals are estimated from counts: P(class|adj) = #(class, adj)/#(adj).
Similarly, compute
• P(noun|adj,verb) = P(adj,verb|noun)P(noun)/P(adj,verb) = P(adj|noun)P(verb|noun)P(noun)/(P(adj)P(verb)), assuming the adjective and the verb are independent random variables; treating the noun prior as uniform leaves the score P(a|n)P(v|n).
• P(adj|noun) is calculated as above, and P(verb|noun) = P(verb|c1,c2,…,cK) ∝ P(verb)P(c1|verb)P(c2|verb)…P(cK|verb).
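
A minimal sketch of this scoring in log space; `pair_counts` and `adj_counts` are hypothetical count tables built from the parsed tuples, and the add-alpha smoothing is an assumption (the slides do not say how zero counts are handled). It reuses `general_classes` from the sketch above.

```python
import math

def class_given_adj(cls, adj, pair_counts, adj_counts,
                    alpha=1.0, n_classes=1000):
    # smoothed estimate of P(class | adj) = #(class, adj) / #(adj);
    # alpha and n_classes are assumptions to avoid zero probabilities
    return (pair_counts.get((adj, cls), 0) + alpha) / \
           (adj_counts.get(adj, 0) + alpha * n_classes)

def score_noun(noun, adj, pair_counts, adj_counts):
    # log P(noun | adj) up to additive constants:
    # sum over the noun's WordNet classes of log P(class | adj)
    return sum(math.log(class_given_adj(c, adj, pair_counts, adj_counts))
               for c in general_classes(noun))   # sketch above

# Usage: rank candidate nouns for an adjective.
# candidates = ["pen", "meat", "idea"]
# best = max(candidates,
#            key=lambda n: score_noun(n, "red", pair_counts, adj_counts))
```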

  9. A Max-Flow Problem • Model selectional preference as a maximum-flow problem on a graphical network. • The adjective synset is the 'source'. • The different noun synsets are the 'sinks'. • The hypernyms at different granularities are the vertices of the graph. • The edges between vertices follow the hypernym paths. • The weight on each vertex is its capacity, computed as P(class|adjective). • Find the path that maximizes the flow from source to sink, i.e. from the adjective to a noun-synset sink (see the sketch below). CSE 507, Spring 2011
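
A minimal sketch of the flow computation with networkx. Since networkx caps edges rather than vertices, each class vertex is split into an in/out pair carrying P(class | adjective); the tiny hierarchy and the numbers below are illustrative, not the authors' actual network.

```python
# Minimal sketch: selectional preference as max flow, using networkx.
# Vertex capacities P(class | adj) become edge capacities via node splitting;
# class names and probabilities below are placeholders for learned priors.
import networkx as nx

def preference_flow(adj, noun_synset, class_caps, hypernym_edges):
    G = nx.DiGraph()
    for cls, cap in class_caps.items():
        # split each class vertex so its capacity caps the flow through it
        G.add_edge((cls, "in"), (cls, "out"), capacity=cap)
    for parent, child in hypernym_edges:
        # hypernym-path edges; a missing 'capacity' attribute means unbounded
        G.add_edge((parent, "out"), (child, "in"))
    # the adjective (source) feeds the most general classes
    roots = {p for p, _ in hypernym_edges} - {c for _, c in hypernym_edges}
    for r in roots:
        G.add_edge(adj, (r, "in"))
    G.add_edge((noun_synset, "out"), "sink")
    value, _ = nx.maximum_flow(G, adj, "sink")
    return value

caps = {"entity": 1.0, "object": 0.6, "instrument": 0.4, "pen": 0.3}
edges = [("entity", "object"), ("object", "instrument"), ("instrument", "pen")]
print(preference_flow("red", "pen", caps, edges))  # 0.3: the bottleneck class
```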

  10. An Example CSE 507, Spring 2011

  11. Results Frequency Counts Enter an adjective: red • jersey, bricks, flowers, car, tee Enter an adjective: sullen • reproach, looks, gloomy Enter an adjective: dark • blue, brown, night, grey, boy, girl, eyes, room Naïve Bayes Probabilities Enter an adjective: red • ink, ornaments Enter an adjective: sullen • girl, life, sea, tone Enter an adjective: dark • weeks, damage, crimes CSE 507, Spring 2011

  12. Results Naïve Bayes Probabilities with verb-adjective-noun tuples Enter an adjective: red Enter a verb: drank • wine Enter an adjective: dark Enter a verb: see • time, book, path Enter an adjective: red Enter a verb: see • lion, tongues Enter an adjective: dark Enter a verb: was • order, traditions, people CSE 507, Spring 2011

  13. Results • Disambiguating word sense: [results table shown as an image in the original slides] CSE 507, Spring 2011

  14. Evaluation • We extracted adjective-noun and verb-adjective-noun pairs from the test set and created priors on them using the Google N-gram dataset. • We then built a graphical model for each adjective to disambiguate the sense of a noun using the adjective's selectional preference (a sketch follows below). CSE 507, Spring 2011
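
As a minimal sketch of that disambiguation step, the snippet below scores each WordNet sense of the noun by how well its hypernym classes fit the adjective's preference. It reuses `class_given_adj` from the naïve Bayes sketch; the exact sense-scoring rule is an assumption consistent with the slides, not the authors' verified procedure.

```python
# Minimal sketch: choose the WordNet sense of a noun whose hypernym classes
# best fit the adjective's selectional preference. Reuses class_given_adj
# from the naive Bayes sketch above; k = 3 is again illustrative.
from nltk.corpus import wordnet as wn
import math

def disambiguate(noun, adj, pair_counts, adj_counts, k=3):
    best_sense, best_score = None, float("-inf")
    for sense in wn.synsets(noun, pos=wn.NOUN):
        classes = {s.name() for path in sense.hypernym_paths()
                   for s in path[:k]}
        score = sum(math.log(class_given_adj(c, adj, pair_counts, adj_counts))
                    for c in classes)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# e.g. disambiguate("pen", "red", pair_counts, adj_counts) should prefer
# the writing-instrument sense over the animal-enclosure sense.
```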
