270 likes | 280 Views
This paper discusses a machine transliteration model that converts text from one language to phonetically equivalent text in another language. The model uses a grapheme-based approach and dynamically handles source graphemes and phonemes to improve accuracy. The objective is to minimize errors caused by error propagation and produce translations effectively. The proposed method involves a correspondence-based strategy and a hybrid approach incorporating both grapheme and phoneme data. Experiments show promising results in transliterating English text to Korean, Japanese, or Chinese. The researchers aim to apply this model to English-to-Chinese transliteration in the future for applications in machine translation, CLIR, IR, and NLP.
E N D
A machine transliteration model based on correspondence between graphemes and phonemes Presenter:Chien-Hsing Chen Author: Jong-Hoon Oh Key-Sun choi Hitoshi Isahara 2007.TALIP (ACM Transactions on Asian Language Information Processing)
Outline • Motivation • Objective • Previous work • Method • Experiments • Conclusions • Opinion
data (English) G-based Motivation • deiteo (korean); deta (Jap.) • Machine transliteration (MT) • automatically convert in one language into phonetically equivalent ones in another language • Such as from English to Korean, Japanese, or Chinese • a special case of CLIR, it is useful for query translation … • Graphemes-based • Source G target G • Phonemes-based • Source G source P target G • Hybrid • linear interpolation • dynamically handle source graphemes and phonemes data (English) [`det#] P-based • deiteo (korean); deta (Jap.)
Objective data (English) P-based [`det#] • Correspondence-based • correspondence between source G and P • dynamically handle source G and P based on the contexts • an example: neomycin (G + P) • deiteo (korean); deta (Jap.) data (English) C-based G [`det#] P • deiteo (korean); deta (Jap.)
Previous work- Grapheme-Based 1/4 • G-based transliteration modes are classified into: • statistical translation, decision trees, transliteration network, joint source channels • board (/B AO R D/); b, oa, r, d are PUs 依音節切割 • Ei = epui1, … epuin [1998, 1999] • Ki = kpui1, … kpuin • E=b:oar:d, b:oa:r:d, b:o:a:r:d, • K=b:o:deu, b:o:reu:deu, b:o:a:reu:deu • error PUs, incorrect • PUs for each word, time
Previous work- Grapheme-Based 2/4 • Decision trees [2000; 2001] • English grapheme to Korean grapheme conversion • no consider the phonetic aspect of the transliteration
Previous work- Grapheme-Based 3/4 • network [2000] • Each node is composed of more than one English grapheme and the corresponding Korean graphemes. • Each arc represents a possible link between nodes. • The optimal path is the highest total weight, Viterbi and tree-trellis algorithms ca ka ca ki
Previous work- Grapheme-Based 4/4 • Network [2003] • EN: actinium • Jap: a ku chi ni u mu chunking model z4=um e78:u m Translation model =P(ku3 | ku21,e41) c=2, b=1
Previous work- Phoneme-Based 1/3 • source language word pronunciation target language • Weighted finite-state transducers (WFSTs) • sord sequence • word to English sound • English sound to Japanese sound • Japanese sound to katakana • katakana to OCR • A basic framework for Phoneme-based 0.6
Previous work- Phoneme-Based 2/3 • Two-step procedure • English PUs English phoneme, [statistical translation model] • English phoneme Korean PUs, [EKSCRs standard conversion rule] • Two problems: • error propagation: English PU English phoneme usually error • limitation EKSCRs
Previous work- Phoneme-Based 3/3 • decision trees • Phoneme-based English Korean transliteration • depend on a pronunciation dictionary
Previous work- Hybrid Transliteration 1/1 • Combined through linear interpolation • 0.4 G-based + 0.6 P-based • not consider the dependence between the source graphemes and phonemes in the combining process
data (English) Summary C-based G [`det#] P • G-based • source grapheme target grapheme • P-based • source grapheme source phoneme target grapheme • Correspondence-based • minimize error caused by error propagation by using source grapheme corresponding to a source phoneme • use dynamically source graphemes and source phoneme depending on context, produce effectively • deiteo (korean); deta (Jap.)
C-based find relevant • d find relevant
C-based • Producing Pronunciation • The most relevant source phoneme of b, /B/ can be produced by means of the context, fs, fStype, and fp at L1-L3, C0, and R1-R3.
C-based • Producing Target Graphmemes
C-based Memory-Based Learning 3/3 • k-nearest neighborhood algorithm
Experiments 1/2 P-based G-based C-based
Conclusion • The author plans to apply the transliteration model to an English-to-Chinese transliteration.
Conclusion • The author plans to apply the transliteration model to an English-to-Chinese transliteration.
Opinion • Advantage • Combine Grapheme and Phoneme • Drawback • lack dynamic alignment • Application • machine translation, CLIR, IR, NLP applications