Word Sense Disambiguation Another NLP working problem for learning with constraints… Lluís Màrquez

Word Sense Disambiguation Another NLP working problem for learning with constraints… Lluís Màrquez TALP, LSI, Technical University of Catalonia UIUC, June 10 2004

Word Sense Disambiguation • The problem • WSD is the problem of assigning the correct meaning to the words occurring in a text or discourse (sense tagging) • Example: “He was mad about stars at the age#1 of nine” “About 20,000 years ago the last ice age#2 ended” age#1: the length of time something (or someone) has existed age#2: a historic period • Originated in the early days of AI (1960s), around the first MT models • Renewed interest with the explosion of statistical and ML-based approaches to NLP (1990s)

Word Sense Disambiguation • Usual approaches • Supervised learning (ML): multiclass classification problem; “word-experts”. Results: about 75% accuracy on subsets of selected polysemous words. Sometimes better (over 90%) on some specific words • “Unsupervised”, “knowledge-based” = heuristic rules based on preexisting knowledge sources (WordNet, MRDs, multilingual aligned corpora, etc.). Accuracy: around 60% (allwords WSD) • Combined approaches: 65% (allwords WSD) • Supervised methods are better but difficult to apply to “allwords” WSD

WSD: ML Approach • Usual Features: • Local context patterns (POS, words, lemmas) • the <age> of, <age> CD • <age> limit, mean <age> • Broad context features: Bag of (relevant) words • “atomic” occurs in the sentence • “dark” occurs in the sentence • Also syntactic features capturing predicate-argument relations
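The feature types listed on this slide can be sketched roughly as follows. This is a minimal illustration, not the feature extractor used in the talk: the function name `extract_features`, the window size, the feature-name strings, and the hand-assigned POS tags are all assumptions made for the example.

```python
from typing import Dict, List


def extract_features(tokens: List[str], pos_tags: List[str], target: int,
                     window: int = 2) -> Dict[str, bool]:
    """Binary features for the target word at index `target` (hypothetical helper)."""
    feats: Dict[str, bool] = {}
    # Local context: word form and POS tag at each offset inside the window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        i = target + offset
        if 0 <= i < len(tokens):
            feats[f"w[{offset:+d}]={tokens[i].lower()}"] = True
            feats[f"pos[{offset:+d}]={pos_tags[i]}"] = True
    # Local pattern around the target, e.g. "the <age> of".
    left = tokens[target - 1].lower() if target > 0 else "<s>"
    right = tokens[target + 1].lower() if target + 1 < len(tokens) else "</s>"
    feats[f"pattern={left} <{tokens[target].lower()}> {right}"] = True
    # Broad context: bag of words over the rest of the sentence.
    for i, tok in enumerate(tokens):
        if i != target and tok.isalpha():
            feats[f"bow={tok.lower()}"] = True
    return feats


# Example sentence from the talk; the POS tags are assigned by hand here.
tokens = "He was mad about stars at the age of nine".split()
pos = ["PRP", "VBD", "JJ", "IN", "NNS", "IN", "DT", "NN", "IN", "CD"]
features = extract_features(tokens, pos, target=7)
```

Each word-expert would then be trained on such feature dictionaries for its own target word.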

WSD: ML Approach • Main difficulties: • Each word is a separate classification problem => data scarcity • High granularity of the sense repositories used => many classes • Difficulty in capturing the semantic information present in the context: context words are sparse (sparseness problem) and are themselves ambiguous (no interactions between word-classifiers have been exploited).

WSD: Difficulties • Example (from WSJ) The jury further said in term end presentments that the City Executive Committee, which had over-all charge of the election, “deserves the praise and thanks of the City of Atlanta” for the manner in which the election was conducted.

WSD: Difficulties • Example (from WSJ, WordNet senses) The jury#NN#1 further#RB#2 said#VB#1 in term#NN#2 end#NN#2 presentments#NN#1 that the City_Executive_Committee#1 , which had#VB#4 over-all#JJ#2 charge#NN#6 of the election#NN#1 , “ deserves#VB#1 the praise#NN#1 and thanks#NN#1 of the City_of_Atlanta#1 ” for the manner#NN#1 in which the election#NN#1 was conducted#VB#1 .

WSD: Difficulties • Example (from WSJ, WordNet senses) jury#NN#1 further#RB#2 said#VB#1 term#NN#2 end#NN#2 presentments#NN#1 had#VB#4 over-all#JJ#2 charge#NN#6 election#NN#1 deserves#VB#1 praise#NN#1 thanks#NN#1 manner#NN#1 election#NN#1 conducted#VB#1 .

WSD: Difficulties • Example (from WSJ, WordNet senses) The jury(2) further(5) said(11) in term(6) end(15) presentments(3) that the City_Executive_Committee , which had(21) over-all(2) charge(15) of the election(2) , “ deserves the praise(2) and thanks(2) of the City_of_Atlanta ” for the manner(3) in which the election(2) was conducted(5) .
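Assuming the numbers in parentheses are the counts of candidate WordNet senses for each word (the slide does not say so explicitly), a quick calculation shows why treating the sentence as one joint labeling problem is hard. The variable names below are illustrative only.

```python
from math import prod

# (word, number of candidate senses), copied from the parenthesised
# numbers on the slide, in sentence order.
sense_counts = [
    ("jury", 2), ("further", 5), ("said", 11), ("term", 6), ("end", 15),
    ("presentments", 3), ("had", 21), ("over-all", 2), ("charge", 15),
    ("election", 2), ("praise", 2), ("thanks", 2), ("manner", 3),
    ("election", 2), ("conducted", 5),
]

# Independent word-experts only ever compare this many candidates in total.
independent_candidates = sum(n for _, n in sense_counts)

# A joint assignment over the whole sentence ranges over the product.
joint_assignments = prod(n for _, n in sense_counts)

print(f"candidates scored independently: {independent_candidates}")
print(f"joint sense assignments:         {joint_assignments:,}")
```

The gap between the sum and the product is the reason any joint approach needs to prune or factor the search space.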

WSD: ML Approach • Utility? • Useful for IR / IE / Semantic parsing / Knowledge acquisition? • Accurately resolving WSD is more difficult than most of the NLP tasks for which it is potentially helpful • Evaluation Exercises for WSD: Senseval-1/2/3 • Senseval-3 co-located with ACL-2004 • 2 major types of task: “lexical sample”, “allwords” • 10 different languages + 1 multilingual lexical sample task • Several new tasks: Automatic subcategorization acquisition, WSD of WordNet glosses, Semantic Roles (English and Swedish), Logic Forms, etc.

Word Sense Disambiguation • Our involvement in Senseval-3 (TALP research group) • As organizers: • Lexical sample tasks for Catalan and Spanish: • Coarse sense dictionary developed for the tasks with additional information (collocations, examples, etc.) • Manual annotation of about 300 examples for 50 different words in each language. Context of 3 sentences. Also POS and lemma annotation • Large corpus of about 1,500 unannotated examples for each word • Best results: 85% accuracy • But nothing new was presented!!!

Word Sense Disambiguation • As participants: • English lexical sample task: SVMs, constraint classification, thorough feature optimization and parameter tuning, (semantically) rich feature set. Accuracy: 71.6% - 78.2%, state-of-the-art. • English allwords task: combination (cascade + weighted voting scheme) of several supervised and knowledge-based modules. Supervised modules trained on frequent words of the SemCor corpus. Knowledge-based modules rely on WordNet and WordNet Domains. Accuracy: 62.40% (67.4%) • Disambiguation of WordNet glosses (best results) • Five papers already available. Resources (datasets and dictionaries) will also be available after the workshop in July.
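A cascade combined with weighted voting could look roughly like the sketch below. The slide does not describe how the actual Senseval-3 system orders or weights its modules, so the ordering (supervised first, then a vote among knowledge-based modules, then a default sense), the weights, and the toy modules are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

# A module maps a word to a sense label, or returns None when it abstains.
Module = Callable[[str], Optional[str]]


def weighted_vote(word: str, modules: List[Module],
                  weights: List[float]) -> Optional[str]:
    """Sum the weight of every module voting for each sense; return the top sense."""
    scores: Dict[str, float] = defaultdict(float)
    for module, weight in zip(modules, weights):
        sense = module(word)
        if sense is not None:
            scores[sense] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)


def cascade(word: str, supervised: Module, knowledge_based: List[Module],
            weights: List[float], default_sense: str = "1") -> str:
    """Supervised module first; fall back to voting, then to a default sense."""
    sense = supervised(word)
    if sense is not None:
        return sense
    sense = weighted_vote(word, knowledge_based, weights)
    return sense if sense is not None else default_sense


# Toy modules (hypothetical): the supervised one only covers "frequent" words.
supervised = lambda w: {"election": "1"}.get(w)
kb_a = lambda w: {"charge": "6", "term": "2"}.get(w)
kb_b = lambda w: {"charge": "1"}.get(w)
kb_c = lambda w: {"charge": "6"}.get(w)
kb_modules, kb_weights = [kb_a, kb_b, kb_c], [0.5, 0.6, 0.3]

for w in ["election", "charge", "term", "jury"]:
    print(w, cascade(w, supervised, kb_modules, kb_weights))
```

In this toy run, "charge" is decided by the vote: sense 6 collects weight 0.8 from two modules, against 0.6 for sense 1.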

New Direction • Allwords WSD in context ... The jury#NN#1 further#RB#2 said#VB#1 in term#NN#2 end#NN#2 presentments#NN#1 that the City_Executive_Committee#1 , which had#VB#4 over-all#JJ#2 charge#NN#6 of the election#NN#1 , “ deserves#VB#1 the praise#NN#1 and thanks#NN#1 of the City_of_Atlanta#1 ” for the manner#NN#1 in which the election#NN#1 was conducted#VB#1 . ...

Allwords WSD in context • Example (WSJ, only nouns) jury term end presentments charge election praise thanks manner election

Allwords WSD in context • Example (WSJ, only nouns) jury term end presentments charge election praise thanks manner election • “One sense per discourse” constraint
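One simple way to apply a “one sense per discourse” constraint is as a post-processing step that relabels every occurrence of a lemma with its majority sense. This is only one possible reading of the constraint (as a hard rule); the function name and the toy predictions are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def enforce_one_sense_per_discourse(
        tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Relabel each (lemma, sense) occurrence with the lemma's majority sense."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for lemma, sense in tagged:
        votes[lemma][sense] += 1
    majority = {lemma: counts.most_common(1)[0][0]
                for lemma, counts in votes.items()}
    return [(lemma, majority[lemma]) for lemma, _ in tagged]


# Toy per-occurrence predictions: "election" is tagged inconsistently.
predictions = [("jury", "1"), ("election", "1"), ("charge", "6"),
               ("election", "2"), ("election", "1")]
consistent = enforce_one_sense_per_discourse(predictions)
print(consistent)
```

A soft version would instead add a penalty for disagreeing occurrences and leave the final decision to the inference step.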

Allwords WSD in context • Example (WSJ, only nouns) • jury / term / end: body of citizens... ; word or expression ; point in time in which something ends ; committee, panel ; limited period of time ; surface of a three dimensional object • presentments / charge / election: an accusation of crime... ; electrical charge ; the act of presenting something ; an impetuous rush toward someone... ; a pleading ; a command to do something • praise / thanks / manner: acknowledgement of appreciation ; with the help or owing to • Sense pairs likely to occur together

Allwords WSD in context • Example (WSJ, only nouns) • jury / term / end: body of citizens... ; word or expression ; point in time in which something ends ; committee, panel ; limited period of time ; surface of a three dimensional object • presentments / charge / election: an accusation of crime... ; electrical charge ; the act of presenting something ; an impetuous rush toward someone... ; a pleading ; a command to do something • praise / thanks / manner: acknowledgement of appreciation ; with the help or owing to • Incompatible sense pairs

Allwords WSD in context • Example (WSJ, only nouns) • jury / term / end: body of citizens... ; word or expression ; point in time in which something ends ; committee, panel ; limited period of time ; surface of a three dimensional object • presentments / charge / election: an accusation of crime... ; electrical charge ; the act of presenting something ; an impetuous rush toward someone... ; a pleading ; a command to do something • praise / thanks / manner: acknowledgement of appreciation ; with the help or owing to • Lots of irrelevant/unknown sense pairs

Allwords WSD in context • Selectional preferences • To produce compatibility constraints between verbs and subject/object head nouns • For instance: “when money#1 appears as object the preferred verbs are: raise#4 (1.44), {take_in#5, collect#2} (0.45), {earn#2, garner#2} (0.23), …” • Need of syntactic information
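As a rough illustration of how such preferences could act as compatibility scores, the sketch below stores the weights quoted on this slide in a lookup table and uses them to choose among candidate verb senses. The table layout, the function name, and the candidate sense raise#1 are assumptions for the example; only the money#1 weights come from the slide.

```python
from typing import Dict, List, Optional, Tuple

# Preference weights for verb senses, keyed by (noun sense, syntactic role).
# The money#1 entries are the ones quoted on the slide.
PREFS: Dict[Tuple[str, str], Dict[str, float]] = {
    ("money#1", "object"): {
        "raise#4": 1.44,
        "take_in#5": 0.45, "collect#2": 0.45,
        "earn#2": 0.23, "garner#2": 0.23,
    },
}


def best_verb_sense(noun_sense: str, role: str,
                    candidates: List[str]) -> Optional[str]:
    """Return the candidate verb sense with the highest preference, if any."""
    table = PREFS.get((noun_sense, role), {})
    scored = [(table.get(c, 0.0), c) for c in candidates]
    if not scored:
        return None
    best_score, best = max(scored)
    # No preference on record for any candidate: abstain.
    return best if best_score > 0 else None


# "raise money": pick among candidate senses of the verb "raise".
choice = best_verb_sense("money#1", "object", ["raise#1", "raise#4"])
print(choice)
```

The syntactic role has to be supplied by a parser, which is why the slide notes the need for syntactic information.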

Allwords WSD in context • A very good starting point • Funding: MEANING, European research project • Resources: MCR, including WordNets from different languages, “ontologies” (Domains, SUMO, TopOntology, SemFile) linked to WordNet synsets, selectional preferences, etc. • Tools: the Senseval-3 allwords WSD system and all its components • People: Lluís Villarejo (PhD student at TALP) • ML approach: Inference & Learning with Linear Constraints

Allwords WSD in context • Potential problems • Computational requirements • Soft constraints • Lots of irrelevant sense pairs • Can compatibility constraints be reliably estimated from existing labeled corpora? • … • We have to codify only the most relevant constraints between pairs of “related” words at a coarse level of granularity (very general semantic class labels)

Allwords WSD in context • Current status • Semantic-class attributes of the context words have already been incorporated as features for capturing “interactions”: gain 1-2 points (but context words are very ambiguous…) • Training/testing the system assuming that we know the actual senses of context words (upper bounds) • (near) Future • Inference on top of classifiers’ output • Learning with global feedback (coming from inference)
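“Inference on top of classifiers’ output” can be illustrated with a toy joint search: each word has local classifier scores, and pairs of senses carry soft compatibility bonuses or penalties. The sketch uses exhaustive enumeration rather than the linear-constraint formulation mentioned earlier in the talk, and all sense labels, scores, and compatibility values are invented for the example.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

Scores = Dict[str, Dict[str, float]]               # word -> sense -> score
Compat = Dict[FrozenSet[Tuple[str, str]], float]   # {(word, sense), (word, sense)} -> bonus


def infer(scores: Scores, compat: Compat) -> Dict[str, str]:
    """Exhaustively pick the joint assignment with the best total score."""
    words = list(scores)
    best, best_val = None, float("-inf")
    for assignment in product(*(scores[w] for w in words)):
        # Local evidence from the per-word classifiers.
        val = sum(scores[w][s] for w, s in zip(words, assignment))
        # Soft pairwise constraints between the chosen senses.
        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                pair = frozenset({(words[i], assignment[i]),
                                  (words[j], assignment[j])})
                val += compat.get(pair, 0.0)
        if val > best_val:
            best, best_val = assignment, val
    return dict(zip(words, best))


scores = {
    "jury": {"body_of_citizens": 0.6, "committee_panel": 0.4},
    "charge": {"electrical": 0.5, "accusation": 0.4, "command": 0.1},
}
compat = {
    frozenset({("jury", "body_of_citizens"), ("charge", "accusation")}): 0.5,
    frozenset({("jury", "body_of_citizens"), ("charge", "electrical")}): -0.5,
}
result = infer(scores, compat)
print(result)
```

Taken alone, the classifier for "charge" prefers the electrical sense; the compatibility with the chosen sense of "jury" flips it to the accusation sense. Exhaustive search is only feasible for a handful of words, which is where the computational concerns raised in the talk come in.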

Thanks again for your attention!!!
