Machine Transliteration Model Based on Grapheme-Phoneme Correspondence

A machine transliteration model based on correspondence between graphemes and phonemes Presenter：Chien-Hsing Chen Author: Jong-Hoon Oh Key-Sun choi Hitoshi Isahara 2007.TALIP (ACM Transactions on Asian Language Information Processing)

Outline • Motivation • Objective • Previous work • Method • Experiments • Conclusions • Opinion

data (English) G-based Motivation • deiteo (korean); deta (Jap.) • Machine transliteration (MT) • automatically convert in one language into phonetically equivalent ones in another language • Such as from English to Korean, Japanese, or Chinese • a special case of CLIR, it is useful for query translation … • Graphemes-based • Source G target G • Phonemes-based • Source G source P target G • Hybrid • linear interpolation • dynamically handle source graphemes and phonemes data (English) [`det#] P-based • deiteo (korean); deta (Jap.)

Objective data (English) P-based [`det#] • Correspondence-based • correspondence between source G and P • dynamically handle source G and P based on the contexts • an example: neomycin (G + P) • deiteo (korean); deta (Jap.) data (English) C-based G [`det#] P • deiteo (korean); deta (Jap.)

Previous work- Grapheme-Based 1/4 • G-based transliteration modes are classified into: • statistical translation, decision trees, transliteration network, joint source channels • board (/B AO R D/); b, oa, r, d are PUs 依音節切割 • Ei = epui1, … epuin [1998, 1999] • Ki = kpui1, … kpuin • E=b:oar:d, b:oa:r:d, b:o:a:r:d, • K=b:o:deu, b:o:reu:deu, b:o:a:reu:deu • error PUs, incorrect • PUs for each word, time

Previous work- Grapheme-Based 2/4 • Decision trees [2000; 2001] • English grapheme to Korean grapheme conversion • no consider the phonetic aspect of the transliteration

Previous work- Grapheme-Based 3/4 • network [2000] • Each node is composed of more than one English grapheme and the corresponding Korean graphemes. • Each arc represents a possible link between nodes. • The optimal path is the highest total weight, Viterbi and tree-trellis algorithms ca ka ca ki

Previous work- Grapheme-Based 4/4 • Network [2003] • EN: actinium • Jap: a ku chi ni u mu chunking model z4=um e78:u m Translation model =P(ku3 | ku21,e41) c=2, b=1

Previous work- Phoneme-Based 1/3 • source language word pronunciation target language • Weighted finite-state transducers (WFSTs) • sord sequence • word to English sound • English sound to Japanese sound • Japanese sound to katakana • katakana to OCR • A basic framework for Phoneme-based 0.6

Previous work- Phoneme-Based 2/3 • Two-step procedure • English PUs English phoneme, [statistical translation model] • English phoneme Korean PUs, [EKSCRs standard conversion rule] • Two problems: • error propagation: English PU English phoneme usually error • limitation EKSCRs

Previous work- Phoneme-Based 3/3 • decision trees • Phoneme-based English Korean transliteration • depend on a pronunciation dictionary

Previous work- Hybrid Transliteration 1/1 • Combined through linear interpolation • 0.4 G-based + 0.6 P-based • not consider the dependence between the source graphemes and phonemes in the combining process

data (English) Summary C-based G [`det#] P • G-based • source grapheme target grapheme • P-based • source grapheme source phoneme target grapheme • Correspondence-based • minimize error caused by error propagation by using source grapheme corresponding to a source phoneme • use dynamically source graphemes and source phoneme depending on context, produce effectively • deiteo (korean); deta (Jap.)

C-based find relevant • d find relevant

C-baed

C-based • Producing Pronunciation • The most relevant source phoneme of b, /B/ can be produced by means of the context, fs, fStype, and fp at L1-L3, C0, and R1-R3.

C-based • Producing Target Graphmemes

C-based Maximum Entropy Model 1/2/1/3

C-based Maximum Entropy Model 2/2/1/3

C-based Decision Tree 2/3

C-based Memory-Based Learning 3/3 • k-nearest neighborhood algorithm

Experiments 1/2 P-based G-based C-based

Experiments 2/2

Discuss

Conclusion • The author plans to apply the transliteration model to an English-to-Chinese transliteration.

Opinion • Advantage • Combine Grapheme and Phoneme • Drawback • lack dynamic alignment • Application • machine translation, CLIR, IR, NLP applications

Machine Transliteration Model Based on Grapheme-Phoneme Correspondence

Machine Transliteration Model Based on Grapheme-Phoneme Correspondence

Presentation Transcript

Feature-based Surface Decomposition for Correspondence and Morphing between Polyhedra

Phonemes

Phonemes

Phonemes and allophones

Correspondence between typing methods results: A web-based quantitative analysis tool

Phonology: More on allophones and phonemes

English consonant graphemes

Phonemes

Principal Axis-Based Correspondence between Multiple Cameras for People Tracking

Consonant graphemes

Phonemes

A Joint Source-Channel Model for Machine Transliteration

Ch3 – More on Phonemes

Transliteration

A Path-based Transfer Model for Machine Translation

Transliteration

Backward Machine Transliteration by Learning Phonetic Similarity

Phonemes

Feature-based Surface Decomposition for Correspondence and Morphing between Polyhedra

Backward Machine Transliteration by Learning Phonetic Similarity

Ch3 – More on Phonemes

DIFFERENCES BETWEEN MACHINE LEARNING AND RULE BASED SYSTEMS