Improving Translation Selection using Conceptual Vectors

Improving Translation Selection using Conceptual Vectors. LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia. Presentation Overview. Problem Background & Motivation Research Objectives Methodology Advantages & Contributions.

Presentation Transcript


  1. Improving Translation Selection using Conceptual Vectors • LIM Lian Tze • Computer Aided Translation Unit, School of Computer Sciences, Universiti Sains Malaysia

  2. Presentation Overview • Problem Background & Motivation • Research Objectives • Methodology • Advantages & Contributions

  3. Presentation Overview • Problem Background & Motivation • Research Objectives • Methodology • Advantages & Contributions

  4. Natural Language is Ambiguous • Example: “bank” (which meaning?) [illustrations not preserved]

  5. Word Sense Disambiguation • Given: • a list of meanings/senses of words (dictionaries) • input text containing occurrences of ambiguous words • Assign the correct sense to each particular instance of an ambiguous word in context • A.k.a. “sense-tagging” • Senses: bank#1: a financial institution that accepts deposits and channels the money into lending activities; bank#2: sloping land (especially the slope beside a body of water) • Example: …withdraw money from the bank... → bank#1

  6. Disambiguation in Machine Translation (1) • English input: …withdraw money from the bank... • Sense-tag (WSD): …withdraw money from the bank#1... • Senses: bank#1: a financial institution that accepts deposits and channels the money into lending activities; bank#2: sloping land (especially the slope beside a body of water) • Malay translations: bank#1 → bank; bank#2 → tebing • Select translation word • Malay output: …mengeluarkan wang dari bank... • That worked well…

  7. Disambiguation in Machine Translation (2) • English input: …50 ringgit notes in circulation... • Sense-tag (WSD): …50 ringgit notes in circulation#6... • circulation#6: the spread or transmission of something (as news or money) to a wider group or area • Malay translations of circulation#6: edaran (money), penyebaran (berita) • Translate • Malay output: …duit kertas 50 ringgit dalam edaran?? penyebaran?... • That DIDN’T work well…

  8. Optimising WSD for MT • [Diagram: Input word, Sense number and Translation word, linked by “select” arrows; cites (Lee and Kim 2002)]

  9. Presentation Overview • Problem Background & Motivation • Research Objectives • Methodology • Advantages & Contributions

  10. Main Objective • Existing MT system: • Selects fragments (translation units) from previously translated examples • Re-combines selected translation units to produce translation output for new input text • Improve the translation quality of this MT system by adapting a WSD algorithm specifically for MT purposes

  11. Need semantic knowledge about… • Word senses • Use dictionary definitions • Pairs of translation words • From bilingual knowledge bank (BKB) made up of pairs of sentences that are translations of each other • Corresponding words in each translation sentence pair are explicitly marked • Need a model to capture semantic knowledge of lexical items • Conceptual Vectors (Lafourcade 2001) • Using a selection of concepts or themes • Construct mathematical vectors from concepts • Thematic similarity between lexical items ≡ angle between CVs
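
The last bullet of slide 11 (thematic similarity ≡ angle between CVs) can be illustrated with a minimal sketch. The three concept axes and all vector values below are invented for illustration and do not come from the presentation; the function name angular_distance is likewise an assumption.

```python
import math

def angular_distance(u, v):
    """Return the angle in radians between two concept-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angular distance is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)

# Hypothetical concept axes: (FINANCE, GEOGRAPHY, WATER)
bank_1 = [1.0, 0.0, 0.0]    # financial institution
bank_2 = [0.0, 1.0, 1.0]    # sloping land beside water
context = [0.9, 0.1, 0.0]   # e.g. "withdraw money from the ..."

# The smaller angle marks the thematically closer sense.
closest = min([("bank#1", bank_1), ("bank#2", bank_2)],
              key=lambda item: angular_distance(context, item[1]))[0]
```

With these made-up values the context vector lies much closer to bank#1 than to bank#2, which matches the sense-tagging example on slide 5.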

  12. Need to: • Compile CVs for word meanings on 2 levels: • Word sense (from dictionary) • Word/phrase translation unit (from BKB) using data compiled from previous step • Use compiled information during translation runtime to select correct translation units
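
One possible reading of the two-level compilation on slide 12 is sketched below. It assumes that a sense CV is built from concept category labels, that a translation unit's profile is the normalised sum of the sense CVs found in its BKB examples, and that run-time selection picks the profile with the smallest angle to the input context. The slides do not spell out these details, so the concept list, the sense tags, the BKB contexts and every function name here are hypothetical.

```python
import math

CONCEPTS = ["FINANCE", "COMMUNICATION", "MOVEMENT"]  # hypothetical concept axes

def cv_from_labels(labels):
    """Level 1: build a sense CV from its concept category labels."""
    return [1.0 if c in labels else 0.0 for c in CONCEPTS]

def normalised_sum(vectors):
    """Add vectors component-wise and scale the result to unit length."""
    total = [sum(col) for col in zip(*vectors)]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm else total

def angle(u, v):
    """Angle in radians between two CVs; a zero vector counts as orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return math.pi / 2
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

# Level 1: sense CVs for sense-tagged words (invented data).
sense_cv = {
    "money#1": cv_from_labels({"FINANCE"}),
    "note#1": cv_from_labels({"FINANCE"}),
    "news#1": cv_from_labels({"COMMUNICATION"}),
    "rumour#1": cv_from_labels({"COMMUNICATION", "MOVEMENT"}),
}

# Level 2: each translation unit is profiled from the senses seen
# around it in the BKB examples (invented data).
bkb_contexts = {
    ("circulation", "edaran"): ["money#1", "note#1"],
    ("circulation", "penyebaran"): ["news#1", "rumour#1"],
}
profiles = {
    unit: normalised_sum([sense_cv[s] for s in senses])
    for unit, senses in bkb_contexts.items()
}

def select_translation(source_word, context_cv):
    """Run time: pick the translation whose profile is closest to the context."""
    candidates = [unit for unit in profiles if unit[0] == source_word]
    best = min(candidates, key=lambda unit: angle(profiles[unit], context_cv))
    return best[1]
```

In this sketch the input text is never sense-tagged at run time; only the prepared profiles are compared, which is in the spirit of the "bypass intermediate WSD at run time" point made later in the deck.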

  13. Presentation Overview • Problem Background & Motivation • Research Objectives • Methodology • Advantages and Contributions

  14. Brief Outline • [Flow diagram, reconstructed from its labels] • Data Preparation Phase: Dictionary / Lexicon; Word senses; tag “clues” (Concept Category Labels); “word → sense number” level knowledge; BKB (Examples, Translation units); Translation Unit Profile (“word → translation” level knowledge) • EBMT Run-time Phase: Input Text; matching, comparison, selection; selected translation units; Translated Text

  15. During Translation • [Flow diagram, reconstructed from its labels] • Data Preparation Phase: Dictionary / Lexicon; Word senses; tag “clues” (Concept Category Labels); “word → sense number” level knowledge; BKB (Examples, Translation units); Translation Unit Profile (“word → translation” level knowledge) • EBMT Run-time Phase: Input Text; matching, comparison, selection; selected translation units; Translated Text

  16. Some Results • Translating ‘circulation’ to Malay • edaran or penyebaran • TS: proposed translation selection using CVs • BS: baseline strategy, chooses • the translation that co-occurs with the same input words (and same structure) as in the BKB • or the most frequently occurring translation
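
The baseline strategy (BS) described on slide 16 might look roughly like the sketch below. It simplifies the slide's description by matching on the set of context words only and ignoring structure; the example entries and the name baseline_select are invented for illustration.

```python
from collections import Counter

# Hypothetical BKB entries: (context words, Malay translation of "circulation")
bkb_examples = [
    (frozenset({"notes", "ringgit"}), "edaran"),
    (frozenset({"news", "rapid"}), "penyebaran"),
    (frozenset({"coins", "ringgit"}), "edaran"),
]

def baseline_select(context_words, examples):
    """Pick a translation by exact context match, else by overall frequency."""
    context = frozenset(context_words)
    # First choice: a translation seen with the same input words in the BKB.
    for words, translation in examples:
        if words == context:
            return translation
    # Fallback: the most frequently occurring translation.
    counts = Counter(translation for _, translation in examples)
    return counts.most_common(1)[0][0]
```

Because the fallback depends only on frequency, an unseen context always receives the majority translation, which is the behaviour the CV-based selection (TS) is meant to improve on.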

  17. Presentation Overview • Problem Background & Motivation • Research Objectives • Methodology • Advantages & Contributions

  18. Advantages and Weaknesses • Pros: • optimized for EBMT • focuses on translation selection, bypasses intermediate WSD at run time • handles many-to-many mapping of source word → sense → translation words • allows for bi-directional translation with sense-tagging for 1 language • mathematical operations on vectors are easy to implement • avoids combinatorial effect when multiple ambiguous words occur in the input • Cons: • not all ambiguities can be solved using co-occurring concepts • does not handle translation selection of function words • manual work required in data preparation

  19. Research Contributions • Adaptation of a WSD approach for the specific aim of translation selection • Proposal of specific guidelines for assigning related concepts for word meanings from dictionaries • Production of knowledge about word meanings on two levels: • Word senses as in dictionaries • Translations as in parallel text

  20. Summary • WSD can be customized for different NLP applications accordingly • Different requirements • Increase efficiency • WSD and related tasks based on concepts common to co-occurring word senses can be facilitated using conceptual vector model • Requires a concept category hierarchy and word sense list • Concepts related to a word sense modelled as mathematical vector • Conceptual similarity = angular distance between vectors • Future work • Automating data preparation tasks • Investigating suitable weights or normalizing factors during CV manipulation • Integration with other WSD or translation selection strategies

  21. Future Work • Automate tagging tasks that are currently done manually • Investigate different weight values for CVs for different syntactic relations or word classes • Integrate with other WSD/translation selection tasks

  22. Thank You
