A Hybrid Relational Approach for Word Sense Disambiguation in Machine Translation

Lucia Specia

Mark Stevenson

Maria G. V. Nunes

WSD in Machine Translation (MT)
  • Lexical choice in the case of semantic ambiguity.
  • Examples (English-Portuguese):
    • take =
      • tomar (carry out),
      • levar (lead, direct, conduct, guide),
      • aceitar (accept),
      • pegar (choose, pick out), etc.

WSD in Machine Translation (cont.)
  • One of the main challenges in MT.
  • Conflicting results on the usefulness of WSD for (statistical) MT:
    • (Vickrey et al., 2005);
    • (Carpuat and Wu, 2005).
  • For English-Portuguese in particular, studies have shown that the lack of WSD modules is one of the main reasons for the unsatisfactory results of existing MT systems.
    • We suggest that an effective WSD module, specifically designed for MT, would improve MT performance.
Approaches to WSD
  • Knowledge-based: linguistic knowledge manually codified or extracted from lexical resources.
    • Accurate, but suffer from the knowledge acquisition bottleneck.
  • Corpus-based: knowledge automatically acquired from text using machine learning.
    • Wide coverage, but need a consistent and sufficiently large sample corpus.
  • Hybrid: merge characteristics of the two other approaches.
    • Exploit the advantages and minimize the limitations of the other approaches → wide coverage and accurate results.

Approaches to multilingual WSD
  • Approaches to WSD as an application-independent task date back to the 1960s.
  • Most are monolingual, aimed at English disambiguation. However:
    • WSD is application-dependent (Wilks and Stevenson, 1998; Kilgarriff, 1997; Resnik and Yarowsky, 1997).
    • WSD for MT differs from monolingual WSD (Hutchins and Somers, 1992), particularly with respect to the sense repository (Specia et al., 2006).
Approaches to multilingual WSD
  • Corpus-based and hybrid approaches use propositional formalisms (attribute-value vectors):
    • Limited expressiveness; data sparseness, e.g.:
      • Ex1) John gave Mary a big cake.
      • Ex2) Give me something.
    • Consequence: it is impractical to represent substantial knowledge and use it during the learning process.
      • Hybrid approaches use knowledge in pre-processing steps, before applying machine learning algorithms.
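To make the sparseness point concrete, here is a minimal Python sketch that contrasts the two encodings for Ex1 and Ex2. The attribute names, and the relation names in the generated facts, are hypothetical and chosen only for illustration. A propositional vector must reserve a slot for every attribute that any example might need. A relational encoding lists only what actually holds for each example.

```python
from typing import Dict, List, Optional

# Hypothetical attribute set: the union of every attribute that any example needs.
ATTRIBUTES = ["subject", "indirect_object", "object", "object_modifier"]

def to_vector(features: Dict[str, str]) -> List[Optional[str]]:
    """Propositional encoding: one fixed slot per attribute, None when it is absent."""
    return [features.get(a) for a in ATTRIBUTES]

def to_facts(sent_id: str, features: Dict[str, str]) -> List[str]:
    """Relational encoding: only the relations that actually hold are emitted."""
    return [f"{rel}({sent_id}, {val})." for rel, val in features.items()]

# Ex1) John gave Mary a big cake.   Ex2) Give me something.
ex1 = {"subject": "john", "indirect_object": "mary", "object": "cake", "object_modifier": "big"}
ex2 = {"indirect_object": "me", "object": "something"}

print(to_vector(ex1))        # ['john', 'mary', 'cake', 'big']
print(to_vector(ex2))        # [None, 'me', 'something', None]
print(to_facts("ex2", ex2))  # ['indirect_object(ex2, me).', 'object(ex2, something).']
```

Every new attribute (a second modifier, say) widens every vector, including the vectors of examples that never use it. That is where the empty slots accumulate.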
Proposal – a novel approach
  • LeAR (Lexical Ambiguity Resolution):
    • Specific for MT: senses, knowledge, techniques.
    • Hybrid: corpus- and knowledge-based.
      • Several knowledge sources (KSs) automatically acquired from corpora and lexical resources;
      • Evidence provided by examples of disambiguation extracted from automatically created sense-tagged corpora.
    • Relational formalism:
      • Highly expressive, avoiding data sparseness: each example is represented independently.
    • Inductive Logic Programming (ILP):
      • Relational symbolic supervised learning approach.

Inductive Logic Programming

[Figure: Machine Learning + Logic Programming → Inductive Logic Programming, with background knowledge expressed as 1st-order clauses.]

  • Allows the efficient representation of substantial knowledge about the problem, and allows this knowledge to be used during the learning process (Muggleton, 1991).

Inductive Logic Programming (cont.)
  • Given:
    • a set of positive and negative examples $E = E^{+} \cup E^{-}$;
    • a predicate $p$ specifying the target relation to be learned;
    • knowledge $K$ of a certain domain, which specifies which predicates $q_i$ can be part of the definition of $p$.
  • The goal is to induce a hypothesis (or theory) $h$ for $p$, with respect to $E$ and $K$, which covers most of $E^{+}$ without covering $E^{-}$.
  • Additionally, the clauses representing $K$, $E$, and $h$ must satisfy a set of syntactic restrictions $S$ (language bias).
  • $h$ can then be used to classify new cases of disambiguation.
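The coverage test behind this goal can be sketched in Python under strong simplifying assumptions. First, each body literal is ground apart from the sentence variable, so checking a clause reduces to set membership. Second, negative examples are taken to be wrong translations of a sentence. Aleph itself proves coverage by first-order resolution against the background knowledge, and the data and names below are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# A body literal with the sentence argument A factored out,
# e.g. ("has_collocation", "preposition_right_1", "out").
Literal = Tuple[str, ...]
# A clause sense(A, Translation) :- Body, stored as (Translation, Body).
Clause = Tuple[str, List[Literal]]
# An example sense(Sent, Translation), stored as (Sent, Translation).
Example = Tuple[str, str]

def covers(h: List[Clause], example: Example, K: Dict[str, Set[Literal]]) -> bool:
    """True if some clause of h has the example's translation in its head
    and every body literal is among the background facts for that sentence."""
    sent, translation = example
    return any(head == translation and all(lit in K[sent] for lit in body)
               for head, body in h)

def coverage(h: List[Clause], pos: List[Example], neg: List[Example],
             K: Dict[str, Set[Literal]]) -> Tuple[int, int]:
    """Number of positive and negative examples covered by h."""
    return (sum(covers(h, e, K) for e in pos), sum(covers(h, e, K) for e in neg))

# Toy background knowledge K for two sentences with the verb "to come".
K = {
    "sent1": {("has_collocation", "word_right_1", "back")},
    "sent2": {("has_collocation", "preposition_right_1", "out")},
}
pos = [("sent1", "voltar"), ("sent2", "sair")]   # E+
neg = [("sent1", "sair"), ("sent2", "voltar")]   # E-: wrong translations
h = [("sair", [("has_collocation", "preposition_right_1", "out")])]

print(coverage(h, pos, neg, K))  # (1, 0)
```

A hypothesis that pushes the first count up while keeping the second at zero is what the goal stated in the bullets asks for.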
Inductive Logic Programming (cont.)
  • Aleph (Srinivasan, 2000):
    • Provides a complete relational learning inference engine.
    • Provides various customization options:
      • Induction methods;
      • Search strategies;
      • Evaluation functions; etc.
    • We are using:
      • bottom-up search (generalisation);
      • non-incremental learning (batch learning);
      • non-interactive learning (without user intervention);
      • learning based on positive examples only.
Inductive Logic Programming (cont.)
  • The default inference engine induces a theory iteratively by means of the following steps:
    • One example is selected to be generalized, e.g. sense(sent1, voltar).
    • The most specific clause (the bottom clause) that explains the selected example is built. It usually consists of the representation of all the knowledge about that example.
    • A clause more general than the bottom clause is searched for, using different search, evaluation, and generalization strategies.
    • The best clause found is added to the theory, and the examples it covers are removed from the example set. If examples remain, the process returns to the first step.
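The four steps can be illustrated with a toy Python version of the covering loop. This is a sketch under stated assumptions and not Aleph's implementation. The score (positives covered minus negatives covered) and the exhaustive enumeration of bodies up to `max_len` literals stand in for Aleph's evaluation functions and search strategies. The sketch also uses negative examples, whereas the setting used in this work learns from positive examples only.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Literal = Tuple[str, ...]                # body literal, sentence argument factored out
Clause = Tuple[str, FrozenSet[Literal]]  # sense(A, Translation) :- Body
Example = Tuple[str, str]                # sense(Sent, Translation)

def covered(clause: Clause, ex: Example, K: Dict[str, Set[Literal]]) -> bool:
    translation, body = clause
    return ex[1] == translation and body <= K[ex[0]]

def induce(pos: List[Example], neg: List[Example],
           K: Dict[str, Set[Literal]], max_len: int = 2) -> List[Clause]:
    theory: List[Clause] = []
    remaining = list(pos)
    while remaining:
        seed = remaining[0]                       # step 1: select an example
        bottom = sorted(K[seed[0]])               # step 2: bottom clause = all facts about it
        best: Clause = (seed[1], frozenset(bottom))
        best_score = None
        for n in range(1, max_len + 1):           # step 3: search more general clauses,
            for body in combinations(bottom, n):  #   shorter bodies first
                c = (seed[1], frozenset(body))
                score = (sum(covered(c, e, K) for e in remaining)
                         - sum(covered(c, e, K) for e in neg))
                if best_score is None or score > best_score:
                    best, best_score = c, score
        theory.append(best)                       # step 4: keep it, drop covered positives
        remaining = [e for e in remaining if not covered(best, e, K)]
    return theory

# Toy data for "to come" (hypothetical facts).
K = {
    "s1": {("has_collocation", "word_right_1", "back"), ("has_rel", "subj", "i")},
    "s2": {("has_collocation", "word_right_1", "back"), ("has_rel", "subj", "he")},
    "s3": {("has_collocation", "preposition_right_1", "out"), ("has_rel", "subj", "i")},
}
pos = [("s1", "voltar"), ("s2", "voltar"), ("s3", "sair")]
neg = [("s1", "sair"), ("s2", "sair"), ("s3", "voltar")]

theory = induce(pos, neg, K)
for translation, body in theory:
    print(translation, sorted(body))
# voltar [('has_collocation', 'word_right_1', 'back')]
# sair [('has_collocation', 'preposition_right_1', 'out')]
```

Every candidate body is a subset of the seed's own facts, so the seed is always covered. Each pass therefore removes at least one positive example, and the loop terminates.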



[Figure: LeAR architecture. A POS tagger and lexical resources (LDOCE + Password, bilingual MRDs, verb definitions and examples, a feature-type hierarchy, hierarchical relations, phrasal verbs and idioms) supply the knowledge sources (bag-of-words, POS of the narrow context, subject-object syntactic relations, 11 collocations, verb selectional restrictions, noun semantic features, definition overlap counts). These, together with the rules to use each of them and the mode, type, and general settings, feed ILP inference, which produces a rule-based model.]

  • Experiments with:
    • English-Portuguese MT
      • No previous WSD studies have addressed English-Portuguese.
    • 10 highly frequent and ambiguous verbs
      • Relevant and difficult cases for English-Portuguese MT (Specia, 2005a).
    • Knowledge from syntactic, semantic and pragmatic sources
      • Working on knowledge which is specific for translation.
  • Although especially designed for MT of verbs, the approach can be adapted for WSD of any words and languages.
Sample data
  • Corpus: fiction books, automatically tagged with the verb translation and manually reviewed (Specia et al., 2005a).
Knowledge sources
  • Example: sent1, verb “to come”:

“If there is such a thing as reincarnation, I would not mind coming back as a squirrel”.

    • KS1: Bag-of-words: the ±5 words (lemmas) surrounding the verb in each sentence (sent_id)

bag(sent_id, list_of_words).

Ex.: bag(sent1, [mind,not,will,i,reincarnation,back,as,a,squirrel])

    • KS2: Part-of-speech (POS) tags of content words in a ±5 word window surrounding the verb

has_pos(sent_id, word_position, pos).

Ex.: has_pos(sent1, word_left_1, nn).

has_pos(sent1, word_left_2, vbp). …
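The window extraction for KS1 and KS2 can be sketched in Python as follows. Three assumptions apply. The lemmas and POS tags are supposed to come from an upstream lemmatizer and tagger, with punctuation already removed. The nearest-first ordering on the left is read off the KS1 example for sent1. The restriction of KS2 to content words is omitted here.

```python
from typing import List, Tuple

def window(items: List[str], verb_index: int, size: int = 5) -> Tuple[List[str], List[str]]:
    """Up to `size` items on each side of the verb; the left side is returned nearest-first."""
    left = items[max(0, verb_index - size):verb_index][::-1]
    right = items[verb_index + 1:verb_index + 1 + size]
    return left, right

def bag_fact(sent_id: str, lemmas: List[str], verb_index: int) -> str:
    """KS1: one bag/2 fact with the lemmas in the +/-5 window."""
    left, right = window(lemmas, verb_index)
    return f"bag({sent_id}, [{','.join(left + right)}])."

def pos_facts(sent_id: str, tags: List[str], verb_index: int) -> List[str]:
    """KS2: one has_pos/3 fact per position in the window (content-word filter omitted)."""
    left, right = window(tags, verb_index)
    facts = [f"has_pos({sent_id}, word_left_{i}, {t})." for i, t in enumerate(left, 1)]
    facts += [f"has_pos({sent_id}, word_right_{i}, {t})." for i, t in enumerate(right, 1)]
    return facts

# Lemmatized sent1 with punctuation removed (assumed preprocessing); "come" is at index 12.
lemmas = ["if", "there", "be", "such", "a", "thing", "as", "reincarnation",
          "i", "will", "not", "mind", "come", "back", "as", "a", "squirrel"]
print(bag_fact("sent1", lemmas, 12))
# bag(sent1, [mind,not,will,i,reincarnation,back,as,a,squirrel]).
```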

Knowledge sources
  • KS3: Subject and object syntactic relations with respect to the verb

has_rel(sent_id, subject_word, object_word).

Ex.: has_rel(sent1, i, nil).

  • KS4: Context words represented by 11 collocations with respect to the verb: 1st preposition to the right, 1st and 2nd words to the left and right, 1st noun, 1st adjective, and 1st verb to the left and right

has_collocation(sent_id, collocation_type, collocation).

Ex.: has_collocation(sent1, word_right_1, back).

has_collocation(sent1, word_left_1, mind). …
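The 11 collocation types break down as follows: 1 first preposition to the right, plus 2 words on each side, plus the first noun, adjective, and verb on each side, which gives 1 + 4 + 6 = 11. A minimal Python sketch of their extraction is shown below. The Penn-style tag prefixes are an assumption about the tagger, and so is "nil" as the value for a missing collocation, which mirrors the use of nil in has_rel. Only word_left_1, word_right_1, and preposition_right_1 appear in the slides; the other type names (noun_left_1 and so on) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Penn-style tag prefixes (an assumption about the tagger's tag set).
PREP, NOUN, ADJ, VERB = ("in", "rp", "to"), ("nn",), ("jj",), ("vb",)

def collocations(lemmas: List[str], tags: List[str], verb_index: int) -> Dict[str, str]:
    left = list(zip(lemmas[:verb_index], tags[:verb_index]))[::-1]  # nearest word first
    right = list(zip(lemmas[verb_index + 1:], tags[verb_index + 1:]))

    def first(seq: Sequence[Tuple[str, str]], prefixes: Tuple[str, ...]) -> str:
        return next((w for w, t in seq if t.startswith(prefixes)), "nil")

    def nth(seq: Sequence[Tuple[str, str]], n: int) -> str:
        return seq[n - 1][0] if len(seq) >= n else "nil"

    colls = {"preposition_right_1": first(right, PREP)}
    for side, seq in (("left", left), ("right", right)):
        colls[f"word_{side}_1"] = nth(seq, 1)
        colls[f"word_{side}_2"] = nth(seq, 2)
        colls[f"noun_{side}_1"] = first(seq, NOUN)
        colls[f"adjective_{side}_1"] = first(seq, ADJ)
        colls[f"verb_{side}_1"] = first(seq, VERB)
    return colls  # 11 entries

# Fragment of sent1; the tags are hypothetical tagger output.
lemmas = ["i", "will", "not", "mind", "come", "back", "as", "a", "squirrel"]
tags = ["prp", "md", "rb", "vb", "vbg", "rb", "in", "dt", "nn"]
colls = collocations(lemmas, tags, 4)
print(f"has_collocation(sent1, word_right_1, {colls['word_right_1']}).")
# has_collocation(sent1, word_right_1, back).
```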

Knowledge sources
  • KS5: Selectional restrictions of verbs and semantic features of their arguments from LDOCE

rest(verb, subj_restriction, obj_restriction, translation)

Ex.: rest(come, [], nil, voltar).

rest(come, [animal,human], nil, vir).

rest(come, [], nil, aparecer). ...

feature(noun, sense_id, features).

Ex.: feature(reincarnation, 0_1, [abstract]).

feature(reincarnation, 0_2, [animate]).

feature(squirrel, 0_0, [animal]). …

Knowledge sources
  • KS5 (cont.):
    • Hierarchy for LDOCE feature types (Bruce and Guthrie, 1992)

relation(feature1, feature2).

Ex.: sub(human, animate). …

    • Ontological relations from WordNet

relation(word1, sense_id1, word2, sense_id2).

Ex.: hyper(reincarnation, 1, avatar, 1).

hyper(reincarnation, 3, religious_doctrine, 2).

synon(rebirth, 2, reincarnation, -1). …
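One way the rest/4 facts and the sub/2 hierarchy could combine is sketched below in Python, as an illustration rather than the system's actual check. A noun's features are expanded upward through the hierarchy, and then matched against a verb's subject restriction. Three points are assumptions of this sketch: an empty restriction list ([]) places no constraint, object restrictions are ignored, and the sub(animal, animate) fact is hypothetical.

```python
from typing import Iterable, List, Set

# sub(Feature, Parent) facts: the first mirrors the slide; the second is hypothetical.
SUB = {("human", "animate"), ("animal", "animate")}

# rest(Verb, SubjRestriction, ObjRestriction, Translation) facts from the slides.
REST = [("come", [], "nil", "voltar"),
        ("come", ["animal", "human"], "nil", "vir"),
        ("come", [], "nil", "aparecer")]

def generalizations(feature: str) -> Set[str]:
    """The feature plus every feature above it in the sub/2 hierarchy."""
    seen, frontier = {feature}, [feature]
    while frontier:
        f = frontier.pop()
        for child, parent in SUB:
            if child == f and parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen

def satisfies(features: Iterable[str], restriction: List[str]) -> bool:
    """Assumption: an empty restriction list places no constraint on the argument."""
    if not restriction:
        return True
    expanded = set().union(*(generalizations(f) for f in features))
    return bool(expanded & set(restriction))

def compatible_translations(verb: str, subj_features: List[str]) -> List[str]:
    """Translations whose subject restriction accepts the subject (objects ignored here)."""
    return [t for v, subj_r, _obj_r, t in REST if v == verb and satisfies(subj_features, subj_r)]

print(compatible_translations("come", ["abstract"]))  # ['voltar', 'aparecer']
```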

Knowledge sources
  • KS6: Idioms and phrasal verbs

exp(verbal_expression, translation)

Ex.: exp('come about', acontecer).

exp('come about', chegar).

exp('come to fruition', amadurecer). …

  • KS7: A count of the overlapping words in dictionary definitions for the possible translations of the verb and the words surrounding it in the sentence

highest_overlap(sent_id, translation, overlapping).

Ex.: highest_overlap(sent1, voltar, 0.222222).

highest_overlap(sent1, chegar, 0.0857143). …
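KS6 and KS7 can be sketched in Python as below, under explicit assumptions. The slides do not say how the overlap value is normalized, so this sketch divides the number of shared distinct words by the number of distinct words in the definition. It does no stop-word filtering, and it keeps the best score over each translation's definitions. The definitions in the usage example are hypothetical English glosses, not LDOCE or Password entries, so the numbers will not match the slide's values.

```python
from typing import Dict, List

def contains_expression(lemmas: List[str], expression: str) -> bool:
    """KS6: True if the expression's words occur contiguously in the lemmatized sentence."""
    words = expression.split()
    n = len(words)
    return any(lemmas[i:i + n] == words for i in range(len(lemmas) - n + 1))

def overlap(definition: List[str], context: List[str]) -> float:
    """Assumed normalization: shared distinct words / distinct words in the definition."""
    d = set(definition)
    return len(d & set(context)) / len(d) if d else 0.0

def highest_overlap(definitions: Dict[str, List[List[str]]],
                    context: List[str]) -> Dict[str, float]:
    """KS7: for each candidate translation, the best overlap over its definitions."""
    return {t: max((overlap(d, context) for d in defs), default=0.0)
            for t, defs in definitions.items()}

context = ["mind", "not", "will", "i", "reincarnation", "back", "as", "a", "squirrel"]
defs = {  # hypothetical glosses standing in for dictionary definitions
    "voltar": [["go", "back", "to", "a", "place"], ["return"]],
    "chegar": [["arrive", "at", "a", "place"]],
}
print(highest_overlap(defs, context))  # {'voltar': 0.4, 'chegar': 0.25}
```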

Additional predicates
  • Examples:

sense(sent_id, translation).

Ex.: sense(sent1, voltar).

sense(sent2, ir). …

  • Mode definitions

Ex.: :- modeh(1, sense(sent, translation)).

:- modeb(11, has_collocation(sent, colloc_id, colloc)).

:- modeb(10, has_bag(sent, word)). …

  • Auxiliary predicates:

Ex.: has_bag(Sent, Word) :-

bag(Sent, List), member(Word, List). …

bag(sent1, [mind,not,will,i,reincarnation,back,as,a,squirrel])
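For readers less familiar with Prolog, here is a Python rendering of what the auxiliary predicate has_bag/2 computes. It turns each bag/2 fact into one (Sent, Word) pair per word, so that individual context words can appear as separate literals in clause bodies. This is a sketch only. Modeling an unbound Prolog variable as None is an assumption of this illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# bag/2 facts: sentence id -> window lemmas (sent1 from KS1).
BAG: Dict[str, List[str]] = {
    "sent1": ["mind", "not", "will", "i", "reincarnation", "back", "as", "a", "squirrel"],
}

def has_bag(bags: Dict[str, List[str]], sent: Optional[str] = None,
            word: Optional[str] = None) -> Iterator[Tuple[str, str]]:
    """Mirror of has_bag(Sent, Word) :- bag(Sent, List), member(Word, List).
    None plays the role of an unbound variable."""
    for s, words in bags.items():
        if sent is not None and s != sent:
            continue
        for w in words:
            if word is None or w == word:
                yield s, w

print(list(has_bag(BAG, "sent1", "back")))  # [('sent1', 'back')]
print(len(list(has_bag(BAG, "sent1"))))     # 9
```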

Example of rules produced
  • Verb “to come”:

1. sense(A, sair) :-
       has_collocation(A, preposition_right_1, out).

2. sense(A, chegar) :-
       satisfy_restrictions(A, [animal,human], [concrete]),
       has_expression(A, 'come at').

3.sense(A, vir) :-

satisfy_restriction(A, [human],[abstract]);

has_collocation(A, word_right_1, from);

(has_rel(A, subj, B),


4. sense(A, passar) :-

(has_bag(A, to), has_bag(A, propernoun));


In order to classify new cases, rules must be applied in the order in which they were produced.
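Applying rules in induction order makes the theory behave as a decision list: the first rule whose body holds decides the translation. The Python sketch below mirrors that behavior with simplified stand-ins for the rules shown above. Rule 2 drops its restriction check, and only the has_collocation(A, word_right_1, from) disjunct of rule 3 is kept. The fact set in the usage example is artificial, built only to make two rules fire at once.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

Literal = Tuple[str, ...]
Facts = FrozenSet[Literal]
Rule = Tuple[str, Callable[[Facts], bool]]

# Simplified versions of the induced rules for "to come", kept in induction order.
RULES: List[Rule] = [
    ("sair", lambda f: ("has_collocation", "preposition_right_1", "out") in f),  # rule 1
    ("chegar", lambda f: ("has_expression", "come at") in f),                    # rule 2, restrictions omitted
    ("vir", lambda f: ("has_collocation", "word_right_1", "from") in f),         # one disjunct of rule 3
]

def classify(facts: Facts, rules: List[Rule], default: Optional[str] = None) -> Optional[str]:
    """Return the translation of the first rule whose body holds (decision-list semantics)."""
    for translation, body_holds in rules:
        if body_holds(facts):
            return translation
    return default

# Artificial fact set satisfying both rule 1 and the rule-3 disjunct: rule 1 wins.
example = frozenset({("has_collocation", "preposition_right_1", "out"),
                     ("has_collocation", "word_right_1", "from")})
print(classify(example, RULES))  # sair
```

Reordering RULES would change the output for such overlapping cases, which is why the induction order has to be preserved at classification time.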

  • Induction methods:
    • induce: builds one clause each time, removing the examples covered by that clause. The resulting theory depends on the order of the examples;
    • induce_max: builds one clause each time, without removing the examples covered by the clause. Builds a bottom clause for all the examples, and not only for the first one.
  • Search strategies:
    • bf: enumerates shorter clauses before longer ones;
    • df: enumerates longer clauses before shorter ones;
    • heuristic: enumerates clauses in a best-first manner.
  • Generalization strategy:
    • Relative least general generalisation (rlgg): the lgg of two clauses c1 and c2, i.e. their least upper bound in the lattice induced by θ-subsumption, relative to the background knowledge.
  • Evaluation function:
    • Only positive examples (Bayesian score).
  • Knowledge sources:
    • All 7;
    • All minus one (6 KSs) each time;
    • 1 each time.
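rlgg builds on Plotkin's least general generalization (lgg). The Python sketch below shows only that building block: the lgg of two atoms whose arguments are constants. The lgg of two clauses applies this literal by literal, and the relative version also takes the background knowledge into account; neither step is shown. The variable names V0, V1, ... are arbitrary.

```python
from typing import Dict, Optional, Tuple

Atom = Tuple[str, ...]  # (predicate, arg1, ..., argN); arguments are constants

def lgg_atoms(a1: Atom, a2: Atom) -> Optional[Atom]:
    """Least general generalization of two flat atoms (anti-unification)."""
    if a1[0] != a2[0] or len(a1) != len(a2):
        return None  # different predicates or arities: no common generalization as an atom
    variables: Dict[Tuple[str, str], str] = {}
    args = []
    for x, y in zip(a1[1:], a2[1:]):
        if x == y:
            args.append(x)  # identical arguments are kept as they are
        else:
            # the same pair of differing constants always maps to the same variable
            args.append(variables.setdefault((x, y), f"V{len(variables)}"))
    return (a1[0], *args)

print(lgg_atoms(("sense", "sent1", "voltar"), ("sense", "sent2", "voltar")))
# ('sense', 'V0', 'voltar')
print(lgg_atoms(("has_rel", "sent1", "i", "nil"), ("has_rel", "sent2", "he", "nil")))
# ('has_rel', 'V0', 'V1', 'nil')
```

Reusing one variable for a repeated pair of constants is what keeps the result least general. For example, p(a, a) and p(b, b) generalize to p(V0, V0), not to p(V0, V1).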
Comparison: propositional approaches
  • Algorithm: C4.5, Naive Bayes, Memory-based, SVM.
  • Knowledge sources:
    • Narrow context: 5 surrounding words and/or POS tags;
    • Broad context: 1-100 surrounding words and/or POS tags;
    • 11 collocations: 1st preposition to the right, 1st and 2nd words to the left and right, 1st noun, 1st adjective, and 1st verb to the left and right;
    • Subject-object syntactic relations.
  • Best combination of these features, along with use of filters and optimization of parameters.
  • Best algorithm: SVM.

Specia et al. (2005b).


* 10-fold cross-validation; best experimental setting: induce_max, all KSs, heuristic search, but without filters or other optimizations.

Results - KSs
  • All KSs together yield better results than subsets of KSs.
  • Some KSs seem to be more relevant than others for certain verbs.
  • Using one KS at a time yields very low-quality results (both accuracy and rules).
Next steps
  • Try different corpora: larger and of different domains/genres (although the verbs are not domain specific).
  • Other ILP options:
    • Induction methods;
    • Manual pruning;
    • Manual constraints;
    • Search strategies; etc.
  • Optimization (time).
  • Use of the translation context → a KS specific to MT.
  • Extrinsic evaluation: transfer rule-based MT system.
Final remarks
  • Results are promising: the hybrid relational approach outperforms propositional approaches, yielding a small set of symbolic rules that are easy to understand and adapt, if necessary.
  • All KSs seem to play an important role.
  • Overall, the approach has proved feasible, and we expect the resulting system to improve the quality of English-Portuguese MT.


“Which of you shall we say doth love us most?

That we our largest bounty may extend

Where nature doth with merit challenge.”

Lucia Specia


References
  • Bruce, R. and Guthrie, L. (1992). Genus disambiguation: A study in weighted performance. In Proceedings of the 14th COLING, Nantes, pp. 1187-1191.
  • Carpuat, M. and Wu, D. (2005). Word sense disambiguation vs. statistical machine translation. In Proceedings of the 43rd ACL Meeting, Ann Arbor, pp. 387-394.
  • Hutchins, W.J. and Somers, H.L. (1992). An Introduction to Machine Translation. Academic Press, Great Britain.
  • Kilgarriff, A. (1997). I Don't Believe in Word Senses. Computers and the Humanities, 31(2), pp. 91-113.
  • Muggleton, S. (1991). Inductive Logic Programming. New Generation Computing, 8(4), pp. 295-318.
  • Resnik, P. and Yarowsky, D. (1997). A Perspective on Word Sense Disambiguation Methods and their Evaluation. ACL-SIGLEX Workshop Tagging Texts with Lexical Semantics: Why, What and How?, Washington.
  • Specia, L. (2005a). A Hybrid Model for Word Sense Disambiguation in English-Portuguese Machine Translation. In Proceedings of the 8th CLUK, Manchester, pp. 71-78.
  • Specia, L. (2005b). Knowledge sources for disambiguating highly ambiguous verbs in machine translation. In Proceedings of the Student Session of the 17th ESSLLI, Edinburgh.
  • Specia, L., Nunes, M.G.V. and Stevenson, M. (2005). Exploiting Parallel Texts to Produce a Multilingual Sense-tagged Corpus for Word Sense Disambiguation. In Proceedings of RANLP-05, Borovets, pp. 525-531.
  • Specia, L., Nunes, M.G.V. and Stevenson, M. (2006, to appear). Multilingual versus Monolingual WSD. In Proceedings of the EACL Workshop Making Sense of Sense, April 4th, Trento.
  • Srinivasan, A. (2000). The Aleph Manual. Technical Report. Computing Laboratory, Oxford University (http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/).
  • Vickrey, D., Biewald, L., Teyssier, M. and Koller, D. (2005). Word-Sense Disambiguation for Machine Translation. In Proceedings of HLT/EMNLP, Vancouver.
  • Wilks, Y. and Stevenson, M. (1998). The Grammar of Sense: Using Part-of-speech Tags as a First Step in Semantic Disambiguation. Journal of Natural Language Engineering, 4(1), pp. 1-9.