Word Sense Disambiguation Another NLP working problem for learning with constraints… Lluís Màrquez TALP, LSI, Technical University of Catalonia UIUC, June 10 2004
Word Sense Disambiguation • The problem • WSD is the problem of assigning the correct meaning to the words occurring in a text or discourse (sense tagging) • Example: “He was mad about stars at the age#1 of nine”; “About 20,000 years ago the last ice age#2 ended” • age#1: the length of time something (or someone) has existed; age#2: a historic period • Originated in the early days of AI (1960s), around the first MT models • Renewed interest with the explosion of statistical and ML-based approaches to NLP (1990s)
Word Sense Disambiguation • Usual approaches • Supervised learning (ML): multiclass classification problem; “word-experts”. Results: about 75% accuracy on subsets of selected polysemous words, sometimes better (over 90%) on some specific words • “Unsupervised”, “knowledge-based” = heuristic rules based on preexisting knowledge sources (WordNet, MRDs, multilingual aligned corpora, etc.). Accuracy: around 60% (allwords WSD) • Combined approaches: 65% (allwords WSD) • Supervised methods are better but difficult to apply to “allwords” WSD
WSD: ML Approach • Usual Features: • Local context patterns (POS, words, lemmas) • the <age> of, <age> CD • <age> limit, mean <age> • Broad context features: Bag of (relevant) words • “Atomic” occurs in the sentence • “Dark” occurs in the sentence • Also syntactic features capturing predicate-argument relations
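The feature types listed above can be sketched in a few lines of Python. This is only an illustration, not the feature extractor used in the talk: the function name `extract_features`, the feature-name strings, the window size, and the tiny stop-word list are all assumptions made for the example. The example sentence is the age#1 sentence from the problem slide, with POS tags added by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical stop-word list, used to keep only "relevant" bag-of-words items.
STOP_WORDS = {"the", "of", "at", "was", "he", "about"}

def extract_features(tokens: List[Tuple[str, str]], target: int,
                     window: int = 2) -> Dict[str, bool]:
    """Binary features for the ambiguous word at position `target`.

    `tokens` is a list of (word, POS) pairs for one sentence.
    """
    feats: Dict[str, bool] = {}
    # Local context: words and POS tags at fixed offsets around the target.
    for off in range(-window, window + 1):
        i = target + off
        if off == 0 or not (0 <= i < len(tokens)):
            continue
        word, pos = tokens[i]
        feats[f"w[{off:+d}]={word.lower()}"] = True
        feats[f"pos[{off:+d}]={pos}"] = True
    # Local pattern such as "the <age> of".
    if 0 < target < len(tokens) - 1:
        left = tokens[target - 1][0].lower()
        right = tokens[target + 1][0].lower()
        feats[f"pattern={left} <{tokens[target][0].lower()}> {right}"] = True
    # Broad context: bag of (non-stop) words occurring in the sentence.
    for i, (word, _pos) in enumerate(tokens):
        if i != target and word.lower() not in STOP_WORDS:
            feats[f"bow={word.lower()}"] = True
    return feats

sentence = [("He", "PRP"), ("was", "VBD"), ("mad", "JJ"), ("about", "IN"),
            ("stars", "NNS"), ("at", "IN"), ("the", "DT"), ("age", "NN"),
            ("of", "IN"), ("nine", "CD")]
features = extract_features(sentence, target=7)
```

Each key of `features` would become one binary attribute of the training example for the word-expert classifier of "age".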
WSD: ML Approach • Main difficulties: • Each word is a separate classification problem => data scarcity • High granularity of the sense repositories used => many classes • Difficulty in capturing the semantic information present in the context: context words are sparse and are themselves ambiguous, and no interactions between word-classifiers have been exploited so far
WSD: Difficulties • Example (from WSJ) The jury further said in term end presentments that the City Executive Committee, which had over-all charge of the election, “deserves the praise and thanks of the City of Atlanta” for the manner in which the election was conducted.
WSD: Difficulties • Example (from WSJ, WordNet senses) The jury#NN#1 further#RB#2 said#VB#1 in term#NN#2 end#NN#2 presentments#NN#1 that the City_Executive_Committee#1 , which had#VB#4 over-all#JJ#2 charge#NN#6 of the election#NN#1 , “ deserves#VB#1 the praise#NN#1 and thanks#NN#1 of the City_of_Atlanta#1 ” for the manner#NN#1 in which the election#NN#1 was conducted#VB#1 .
WSD: Difficulties • Example (from WSJ, WordNet senses) jury#NN#1 further#RB#2 said#VB#1 term#NN#2 end#NN#2 presentments#NN#1 had#VB#4 over-all#JJ#2 charge#NN#6 election#NN#1 deserves#VB#1 praise#NN#1 thanks#NN#1 manner#NN#1 election#NN#1 conducted#VB#1 .
WSD: Difficulties • Example (from WSJ, WordNet senses) The jury(2) further(5) said(11) in term(6) end(15) presentments(3) that the City_Executive_Committee , which had(21) over-all(2) charge(15) of the election(2) , “ deserves the praise(2) and thanks(2) of the City_of_Atlanta ” for the manner(3) in which the election(2) was conducted(5) .
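To get a feeling for the size of the problem, the short sketch below multiplies the numbers in parentheses. It assumes that each number is the count of candidate senses for that word (the slide does not say so explicitly), so the product is the number of joint sense assignments for this one sentence.

```python
import math
import re

# The annotated sentence from the slide, with spacing normalized.
tagged = ("The jury(2) further(5) said(11) in term(6) end(15) presentments(3) "
          "that the City_Executive_Committee , which had(21) over-all(2) "
          "charge(15) of the election(2) , “ deserves the praise(2) and "
          "thanks(2) of the City_of_Atlanta ” for the manner(3) in which the "
          "election(2) was conducted(5) .")

# Assumption: the number in parentheses is the number of senses of the word.
sense_counts = [int(n) for n in re.findall(r"\((\d+)\)", tagged)]

# Number of possible joint sense assignments for the whole sentence.
joint_assignments = math.prod(sense_counts)
print(len(sense_counts), "ambiguous words,", joint_assignments, "combinations")
```

Under that assumption, 15 ambiguous words already give several billion combinations, which is why any joint treatment of the words has to restrict the constraints it considers.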
WSD: ML Approach • Utility? • Useful for IR / IE / Semantic parsing / Knowledge acquisition? • Accurately resolving WSD is more difficult than most of the NLP tasks for which it is potentially helpful • Evaluation Exercises for WSD: Senseval-1/2/3 • Senseval-3 collocated with ACL-2004 • 2 major types of task: “lexical sample”, “allwords” • 10 different languages + 1 multilingual lexical sample task • Several new tasks: Automatic subcategorization acquisition, WSD of WordNet glosses, Semantic Roles (English and Swedish), Logic Forms, etc.
Word Sense Disambiguation • Our involvement in Senseval-3 (TALP research group) • As organizers: • Lexical sample tasks for Catalan and Spanish: • Coarse sense dictionary developed for the tasks with additional information (collocations, examples, etc.) • Manual annotation of about 300 examples for 50 different words in each language. Context of 3 sentences. Also POS and lemma annotation • Large corpus of about 1,500 unannotated examples for each word • Best results: 85% accuracy • But nothing new was presented!!!
Word Sense Disambiguation • As participants: • English lexical sample task: SVMs, constraint classification, thorough feature optimization and parameter tuning, (semantically) rich feature set. Accuracy: 71.6% - 78.2%, state-of-the-art. • English allwords task: combination (cascade + weighted voting scheme) of several supervised and knowledge-based modules. Supervised modules trained on frequent words of the SemCor corpus. Knowledge-based modules rely on WordNet and WordNet Domains. Accuracy: 62.40% (67.4%) • Disambiguation of WordNet glosses (best results) • Five papers already available. Resources (datasets and dictionaries) will also be available after the workshop in July.
New Direction • Allwords WSD in context ... The jury#NN#1 further#RB#2 said#VB#1 in term#NN#2 end#NN#2 presentments#NN#1 that the City_Executive_Committee#1 , which had#VB#4 over-all#JJ#2 charge#NN#6 of the election#NN#1 , “ deserves#VB#1 the praise#NN#1 and thanks#NN#1 of the City_of_Atlanta#1 ” for the manner#NN#1 in which the election#NN#1 was conducted#VB#1 . ...
Allwords WSD in context • Example (WSJ, only nouns) jury term end presentments charge election praise thanks manner election
Allwords WSD in context • Example (WSJ, only nouns) jury term end presentments charge election praise thanks manner election • “One sense per discourse” constraint
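One simple way to impose the “one sense per discourse” constraint on the output of independent classifiers is a majority vote per lemma. The sketch below is an assumption about how the constraint could be applied as a hard post-processing step; the slides do not say how it is enforced, and the function name and the toy predictions are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def one_sense_per_discourse(
        predictions: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Relabel every occurrence of a lemma with its majority sense.

    `predictions` holds (lemma, predicted sense number) pairs for all
    occurrences in one discourse, in text order.
    """
    votes: Dict[str, Counter] = defaultdict(Counter)
    for lemma, sense in predictions:
        votes[lemma][sense] += 1
    # Most frequent predicted sense for each lemma.
    majority = {lemma: counts.most_common(1)[0][0]
                for lemma, counts in votes.items()}
    return [(lemma, majority[lemma]) for lemma, _ in predictions]

# Hypothetical classifier output: "election" is tagged inconsistently.
predicted = [("jury", 1), ("election", 1), ("charge", 6),
             ("election", 2), ("election", 1)]
smoothed = one_sense_per_discourse(predicted)
```

After the vote, all three occurrences of "election" carry sense 1, while lemmas that occur once are left unchanged.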
Allwords WSD in context • Example (WSJ, only nouns) • jury: body of citizens...; committee, panel • term: word or expression; limited period of time • end: point in time in which something ends; surface of a three dimensional object • presentments: an accusation of crime...; the act of presenting something • charge: electrical charge; an impetuous rush toward someone...; a pleading; a command to do something • election • praise • thanks: acknowledgement of appreciation; with the help or owing to • manner • Sense pairs likely to occur together
Allwords WSD in context • Example (WSJ, only nouns) • jury: body of citizens...; committee, panel • term: word or expression; limited period of time • end: point in time in which something ends; surface of a three dimensional object • presentments: an accusation of crime...; the act of presenting something • charge: electrical charge; an impetuous rush toward someone...; a pleading; a command to do something • election • praise • thanks: acknowledgement of appreciation; with the help or owing to • manner • Incompatible sense pairs
Allwords WSD in context • Example (WSJ, only nouns) • jury: body of citizens...; committee, panel • term: word or expression; limited period of time • end: point in time in which something ends; surface of a three dimensional object • presentments: an accusation of crime...; the act of presenting something • charge: electrical charge; an impetuous rush toward someone...; a pleading; a command to do something • election • praise • thanks: acknowledgement of appreciation; with the help or owing to • manner • Lots of irrelevant/unknown sense pairs
Allwords WSD in context • Selectional preferences • Used to produce compatibility constraints between verbs and subject/object head nouns • For instance: “when money#1 appears as object the preferred verbs are: raise#4 (1.44), {take_in#5, collect#2} (0.45), {earn#2, garner#2} (0.23), …” • Requires syntactic information
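As an illustration of how such preferences could act as compatibility scores, the sketch below stores the money#1 example in a lookup table and ranks candidate verb senses by it. The table layout and the function `rank_verb_senses` are hypothetical, and two things are assumed: that every member of a set such as {take_in#5, collect#2} receives the weight of the set, and that verb senses missing from the table score 0.

```python
from typing import Dict, List, Tuple

# (noun sense, syntactic role) -> {verb sense: preference weight}.
# Weights copied from the slide; each member of a set gets the set's weight.
PREFERENCES: Dict[Tuple[str, str], Dict[str, float]] = {
    ("money#1", "object"): {
        "raise#4": 1.44,
        "take_in#5": 0.45, "collect#2": 0.45,
        "earn#2": 0.23, "garner#2": 0.23,
    },
}

def rank_verb_senses(noun_sense: str, role: str,
                     candidates: List[str]) -> List[str]:
    """Order candidate verb senses by selectional preference, best first."""
    weights = PREFERENCES.get((noun_sense, role), {})
    # Senses not listed in the table are treated as having weight 0.0.
    return sorted(candidates, key=lambda s: weights.get(s, 0.0), reverse=True)

# "raise#1" is a hypothetical competing sense with no recorded preference.
ranking = rank_verb_senses("money#1", "object",
                           ["raise#1", "earn#2", "raise#4"])
```

Here `ranking` puts raise#4 first, then earn#2, then the unlisted raise#1. Knowing that "money" is the object of the verb is the syntactic information the slide refers to.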
Allwords WSD in context • A very good starting point • Funding: MEANING, European research project • Resources: MCR, including WordNets from different languages, “ontologies” (Domains, SUMO, TopOntology, SemFile) linked to WordNet synsets, selectional preferences, etc. • Tools: the Senseval-3 allwords WSD system and all its components • People: Lluís Villarejo (PhD student at TALP) • ML approach: Inference & Learning with Linear Constraints
Allwords WSD in context • Potential problems • Computational requirements • Soft constraints • Lots of irrelevant sense pairs • Can compatibility constraints be reliably estimated from existing labeled corpora? • … • We have to encode only the most relevant constraints between pairs of “related” words at a coarse level of granularity (very general semantic class labels)
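One common way to write sense assignment with soft pairwise constraints as an integer linear program is given below. This is a generic textbook-style formulation offered as a sketch, not necessarily the one the authors intend. All symbols are assumptions: x_{i,s} = 1 when word i takes sense s from its candidate set S_i, c_{i,s} is the local classifier score, w_{i,s,j,t} is the (positive or negative) compatibility weight of a sense pair, and y_{i,s,j,t} indicates that both senses are selected.

```latex
\begin{aligned}
\max_{x,\,y}\quad & \sum_{i}\sum_{s \in S_i} c_{i,s}\, x_{i,s}
  \;+\; \sum_{i<j}\sum_{s \in S_i}\sum_{t \in S_j} w_{i,s,j,t}\, y_{i,s,j,t} \\
\text{s.t.}\quad & \sum_{s \in S_i} x_{i,s} = 1 \quad \forall i \\
& y_{i,s,j,t} \le x_{i,s}, \quad y_{i,s,j,t} \le x_{j,t}, \quad
  y_{i,s,j,t} \ge x_{i,s} + x_{j,t} - 1 \\
& x_{i,s},\; y_{i,s,j,t} \in \{0,1\}
\end{aligned}
```

Keeping only the most relevant pairs, at a coarse granularity, amounts to setting most w_{i,s,j,t} to zero and dropping the corresponding y variables, which is what keeps the program small.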
Allwords WSD in context • Current status • Semantic-class attributes of the context words have already been incorporated as features for capturing “interactions”: gain 1-2 points (but context words are very ambiguous…) • Training/testing the system assuming that we know the actual senses of context words (upper bounds) • (near) Future • Inference on top of classifiers’ output • Learning with global feedback (coming from inference)
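A toy version of “inference on top of classifiers’ output” is sketched below. It uses exhaustive search over joint assignments rather than a linear-programming solver, which is only feasible for a handful of words, and it is not the system described in the talk. The function name, the sense labels, and all scores are hypothetical values chosen for illustration.

```python
from itertools import product
from typing import Dict, Tuple

Sense = Tuple[str, str]  # (word, sense label)

def joint_inference(local_scores: Dict[str, Dict[str, float]],
                    compat: Dict[Tuple[Sense, Sense], float]
                    ) -> Tuple[Dict[str, str], float]:
    """Pick the joint assignment maximizing local + pairwise scores."""
    words = list(local_scores)
    best, best_score = {}, float("-inf")
    # Enumerate every combination of one sense per word (brute force).
    for assignment in product(*(local_scores[w] for w in words)):
        score = sum(local_scores[w][s] for w, s in zip(words, assignment))
        # Add the soft compatibility score of every pair of chosen senses.
        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                a = (words[i], assignment[i])
                b = (words[j], assignment[j])
                score += compat.get((a, b), 0.0) + compat.get((b, a), 0.0)
        if score > best_score:
            best, best_score = dict(zip(words, assignment)), score
    return best, best_score

# Hypothetical classifier confidences for two nouns of the example sentence.
local_scores = {
    "jury": {"body_of_citizens": 0.6, "committee_panel": 0.4},
    "charge": {"electrical_charge": 0.5, "pleading": 0.45},
}
# Hypothetical soft constraints: one likely pair, one incompatible pair.
compat = {
    (("jury", "body_of_citizens"), ("charge", "pleading")): 0.3,
    (("jury", "body_of_citizens"), ("charge", "electrical_charge")): -0.3,
}
best_assignment, best_total = joint_inference(local_scores, compat)
```

With these numbers the classifier for "charge" alone would prefer the electrical sense, but the joint score selects the "pleading" sense together with the "body of citizens" sense of "jury". Learning with global feedback would then use such joint decisions to adjust the individual classifiers.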