A Hybrid Relational Approach for Word Sense Disambiguation in Machine Translation Lucia Specia Mark Stevenson Maria G. V. Nunes
WSD in Machine Translation (MT) • Lexical choice in the case of semantic ambiguity. • Examples (English-Portuguese): • take = tomar (carry out), levar (lead, direct, conduct, guide), aceitar (accept), pegar (choose, pick out), etc.
WSD in Machine Translation (cont.) • One of the main challenges in MT. • Conflicting results on the usefulness of WSD for (statistical) MT: • (Vickrey et al., 2005); • (Carpuat and Wu, 2005). • Particularly for English-Portuguese, studies have shown that the lack of WSD modules is one of the main reasons for the unsatisfactory results of the existent MT systems • We suggested that an effective WSD module, specifically designed for MT, would improve MT performance.
Approaches to WSD • Knowledge-based: linguistic knowledge manually codified or extracted from lexical resources • Corpus-based: knowledge automatically acquired from text using machine learning • Hybrid: merge characteristics of the two other approaches Accurate, but suffer from the knowledge acquisition bottelneck. Wide coverage, but need consistent and significant sample corpus. Explore advantages and minimize limitations of other approaches → wide coverage and accurate results.
Approaches to multilingual WSD • Approaches to WSD as an application-independent task date back to 1960’s. • Most are monolingual, for English disambiguation: • WSD is application-dependent (Wilks and Stevenson, 1998; Kilgarriff, 1997; Resnik and Yarowsky, 1997). • WSD for MT differs from monolingual WSD (Hutchins and Sommers, 1992), particularly with respect to the sense repository (Specia et al., 2006).
Approaches to multilingual WSD • Corpus-based and hybrid approaches use propositional formalisms (attribute-value vectors): • Limited expressiveness; data sparseness: Ex1) John gave Mary a big cake. Ex2) Give me something. • Consequences: Impracticalto represent substantial knowledge and use it during the learning process • Hybrid approaches use knowledge in pre-processing steps, before applying machine learning algorithms.
Proposal – a novel approach • LeAR (Lexical Ambiguity Resolution): • Specific for MT: senses, knowledge, techniques. • Hybrid - corpus and knowledge-based • Several knowledge sources (KSs) automatically acquired from corpus and lexical resources; • Evidence provided by examples of disambiguation extracted from automatically created sense tagged corpora. • Relational formalism • Highly expressive, avoiding data sparseness: each example is represented independently. • Inductive Logic Programming (ILP) Relational symbolic supervised learning approach.
Machine Learning Logic Programming ILP Theory (1st-order clauses) Aleph Inductive Logic Programming Back. Knowledge (1st-order clauses) • Allows the efficient representation of substantial knowledge about the problem, and allows this knowledge to be used during the learning process (Muggleton, 1991). Examples (1st-order clauses)
Inductive Logic Programming (cont.) • Given: • a set of positive and negative examples E = E+ E- • a predicate p specifying the target relation to be learned • knowledge of a certain domain which specifies which predicates qi can be part of the definition of p. • The goal is: to induce a hypothesis (or theory) h forp, with respect to E and , which covers most of the E+,without covering the E-. • Additionally: clauses representing K, E, and h must satisfy a set of syntactic restrictions S (language bias). • h can be used to classify new cases of disambiguation.
Inductive Logic Programming (cont.) • Aleph (Srinivasan, 2000): • Provides a complete relational learning inference engine. • Provides various customization options: • Induction methods; • Search strategies; • Evaluation functions; etc. • We are using: • bottom-up search (generalisation); • non-incremental learning (batch learning); • non-interactive learning (without user intervention); • learning based on positive examples only.
Inductive Logic Programming (cont.) • The default inference engine induces a theory iteratively by means of the following steps: • One example is selected to be generalized. Ex.: sense(sent1,voltar). • A more specific clause (bottom clause), which explains the selected example, is built. It usually consists of the representation of all knowledge about that example. • A clause that is more generic than the bottom clause is searched, by means of different search, evaluation, and generalization strategies. • The best clause found is added to the theory and the examples covered by such clause are removed from the example set. If there are more instances, return to step 1.
KS1 Parser POS tagger Examples Bag-of-words (10) KS2 POS of the Narrow Context (10) Rules to use POS Mode + type + general settings Rules to use Bag-of-words (10) KS3 ILP Inference Engine Rule-based model Subject-object syntactic relations Rules to use syntactic relations KS4 KS7 11 Collocations Rules to use definitions overlapping KS5 Bag-of-words (10) Subject-object syntactic relations Rules to use Collocations Definitions overlapping Verbs selectional restrictions Rules to use selectional restrictions Overlapping counting Nouns semantic features Hierarchical relations KS6 Bag-of-words (10) Verb definitions and examples LDOCE Wordnet Feature types hierarchy Rules to use context, ph. verbs & idioms LDOCE + Password Phrasal verbs and idioms Bilingual MRDs Bag-of-words (200)
Scope • Experiments with: • English-Portuguese MT • No studies have examined English-Portuguese. • 10 highly frequent and ambiguous verbs • Relevant and difficult cases for English-Portuguese MT (Specia, 2005a). • Knowledge from syntactic, semantic and pragmatic sources • Working on knowledge which is specific for translation. • Although especially designed for MT of verbs, the approach can be adapted for WSD of any words and languages.
Sample data • Corpus: fiction books, automatically tagged with the verb translation and manually reviewed (Specia et al., 2005a).
Knowledge sources • Example: sent1, verb “to come”: “If there is such a thing as reincarnation, I would not mind coming back as a squirrel”. • KS1: Bag-of-words – ± 5 words (lemmas) surrounding the verb for every sentence (sent_id) bag(sent_id, list_of_words). Ex.: bag(sent1, [mind,not,will,i,reincarnation,back,as,a,squirrel]) • KS2: Part-of-speech (POS) tags of content words in a ±5 word window surrounding the verb has_pos(sent_id, word_position, pos). Ex.: has_pos(sent1, word_left_1, nn). has_pos(sent1, word_left_2, vbp). …
Knowledge sources • KS3:Subject and object syntactic relations with respect to the verb has_rel(sent_id, subject_word, object_word). Ex.: has_rel(sent1, i, nil). • KS4: Context words represented by 11 collocations with respect to the verb: 1st preposition to the right, 1st and 2nd words to the left and right, 1st noun, 1st adjective, and 1st verb to the left and right has_collocation(sent_id, collocation_type, collocation). Ex.: has_collocation(sent1, word_right_1, back). has_collocation(sent1, word_left_1, mind). …
Knowledge sources • KS5: Selectional restrictions of verbs and semantic features of their arguments from LDOCE rest(verb, subj_restrition, obj_ restriction, translation) Ex.: rest(come, , nil, voltar). rest(come, [animal,human], nil, vir). rest(come, , nil, aparecer). ... feature(noun, sense_id, features). Ex.: feature(reincarnation, 0_1, [abstract]). feature(reincarnation, 0_2, [animate]). feature(squirrel, 0_0, [animal]). …
Knowledge sources • KS5 (cont.): • Hierarchy for LDOCE feature types (Bruce and Guthrie, 1992) relation(feature1, feature2). Ex.: sub(human, animate). … • Ontological relations from WordNet relation(word1, sense_id1, word2, sense_id2). Ex.: hyper(reincarnation, 1, avatar, 1). hyper(reincarnation, 3, religious_doctrine, 2). synon(rebirth, 2, reincarnation, -1). …
Knowledge sources • KS6: Idioms and phrasal verbs exp(verbal_expression, translation) Ex.: exp('come about', acontecer). exp('come about', chegar). exp('come to fruition', amadurecer). … • KS7: A count of the overlapping words in dictionary definitions for the possible translations of the verb and the words surrounding it in the sentence highest_overlap(sent_id, translation, overlapping). Ex.: highest_overlap(sent1, voltar, 0.222222). highest_overlap(sent1, chegar, 0.0857143). …
Additional predicates • Examples: sense(sent_id, translation). Ex.: sense(sent1, voltar). sense(sent2, ir). … • Mode definitions Ex.: :- modeh(1, sense(sent, translation)). :- modeb(11, has_collocation(sent, colloc_id, colloc)). :- modeb(10, has_bag(sent, word)). … • Auxiliary predicates: Ex.: has_bag(Sent, Word) :- bag(Sent, List), member(Word, List). … bag(sent1, [mind,not,will,i,reincarnation,back,as,a,squirrel])
Example of rules produced • Verb “to come”: 1.sense(A, sair) :- has_collocation(A, preposition_right_1, out). 2.sense(A, chegar) :- satisfy_restrictions(A, [animal,human],[concrete]), has_expression(A, 'come at'). 3.sense(A, vir) :- satisfy_restriction(A, [human],[abstract]); has_collocation(A, word_right_1, from); (has_rel(A, subj, B), (has_pos(B,nn);has_pos(B,pron))). 4. sense(A, passar) :- (has_bag(A, to), has_bag(A, propernoun)); highest_overlapping(A,passar). In order classify new cases, rules must be applied in the order they are produced.
Evaluation • Induction methods: • induce: builds one clause each time, removing the examples covered by that clause. Theory to be produced depends on the order of the examples; • induce_max: builds one clause each time, without removing the examples covered by the clause. Builds a bottom clause for all the examples, and not only for the first one. • Search strategies: • bf: enumerates shorter clauses before longer ones; • df: enumerates longer clauses before shorter ones; • heuristic: enumerates clauses in a best-first manner.
Evaluation • Generalization strategy: • Relative least general generalisation (rlgg): lgg of two clauses c1 and c2, which is the minimum upper bound of c1 and c2 in the lattice introduced by -subsumption, with relation to the background knowledge. • Evaluation function: • Only positive examples (Bayesian score). • Knowledge sources: • All 7; • All – 1 = 6 each time; • 1 each time.
Comparison: propositional approaches • Algorithm: C4.5, Naive Bayes, Memory-based, SVM. • Knowledge sources: • Narrow context: 5 surrounding words and/or POS tags; • Broad context: 1-100 surrounding words and/or POS tags; • 11 collocations: 1st preposition to the right, 1st and 2nd words to the left and right, 1st noun, 1st adjective, and 1st verb to the left and right; • Subject-object syntactic relations. • Best combination of these features, along with use of filters and optimization of parameters. • Best algorithm: SVM. Specia et al. (2005b).
Results * 10 fold-cross validation, best experimental setting: induce_max, all KSs, heuristic search, but without filters or other optimizations.
Results - KSs • All KSs together yield better results than subsets of KSs. • Different KSs seem to be more relevant than others for certain verbs. • One KS each time – very low quality results (accuracy and rules).
Next steps • Try different corpora: larger and of different domains/genres (although the verbs are not domain specific). • Other ILP options: • Induction methods; • Manual pruning; • Manual constraints; • Search strategies; etc. • Optimization (time). • Use of the translation context KS specific to MT. • Extrinsic evaluation: transfer rule-based MT system.
Final remarks • Results are promising: hybrid relational approach outperforms propositional approaches, yielding a small set of symbolic rules, which are easy to understand and adapt, if necessary. • All KSs seem to play an important role. • In general, the approach showed to be feasible and we expect the resultant system will be able to improve the quality of English-Portuguese MT systems.
“Which of you shall we say doth love us most? That we our largest bounty may extend Where nature doth with merit challenge.” Lucia Specia L.Specia@open.ac.uk
References • Bruce, R. and Guthrie. L. (1992). Genus disambiguation: A study in weighted performance. In Proceedings of the 14th COLING, Nantes, pp. 1187-1191. Carpuat, M. and Wu, D. (2005). Word sense disambiguation vs. statistical machine translation. 43rd ACL Meeting, Ann Arbor, pp. 387–394. • Kilgarriff, A. (1997). I Don't Believe in Word Senses. Computers and the Humanities, 31 (2), pp. 91-113. • Hutchins, W.J. and Somers, H.L. 1992. An Introduction to Machine Translation. Academic Press, Great Britain. • Muggleton, S. 1991. Inductive Logic Programming. New Generation Computing, 8 (4):295-318. • Resnik, P. and Yarowsky, D. (1997). A Perspective on Word Sense Disambiguation Methods and their Evaluating. ACL-SIGLEX Workshop Tagging Texts with Lexical Semantics: Why, What and How?. Washington. • Specia, L. (2005a). A Hybrid Model for Word Sense Disambiguation in English-Portuguese Machine Translation. In Proceedings of the 8th CLUK, Manchester, pp. 71-78. • Specia, L. (2005b). Knowledge sources for disambiguating highly ambiguous verbs in machine translation. In Proceedings of the Student Session of the 17th ESSLLI, Edinburgh.
References • Specia, L., Nunes, M.G.V., Stevenson, M. (2005). Exploiting Parallel Texts to Produce a Multilingual Sense-tagged Corpus for Word Sense Disambiguation. In Proceedings of RANLP-05, Borovets, pp. 525-531. • Specia, L., Nunes, M.G.V., Stevenson, M. 2006 (to appear). Multilingual versus Monolingual WSD. Proceedings of EACL Workshop Making Sense of Sense, April 4th, Trento. • Srinivasan, A. 2000. The Aleph Manual. Technical Report. Computing Laboratory, Oxford University (http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/). • Vickrey, D., Biewald, L., Teyssier, M., and Koller, D. (2005). Word-Sense Disambiguation for Machine Translation. HLT/EMNLP, Vancouver. • Wilks, Y. and Stevenson. M. 1998. The Grammar of Sense: Using Part-of-speech Tags as a First Step in Semantic Disambiguation. Journal of Natural Language Engineering, 4(1):1-9.