Structural phrase alignment based on consistency criteria

[Figure (title slide): example alignments and distance statistics.
Example 1: "my traffic light was green when entering the intersection".
Example 2: 日本 で (in Japan) / 保険 会社 に 対して (insurance, to company) / 保険 請求 の (insurance claim) / 申し立て が (instance) / 可能です よ (you can), aligned to the English phrases "you", "will have to file", "an insurance claim", "with the office", "in Japan". Japanese dependency types: デ格 [case “de”], 連用 [renyou], 文節内 [inside clause], ノ格 [case “no”], ガ格 [case “ga”]; English phrase labels: NP, PP, NN.
Pair 1: (Ds, Dt) = (1, 1), positive score. Pair 2: (Ds, Dt) = (1, 7), negative score.
Plots: log frequency of J-side and E-side distances against a baseline; consistency score as a function of J-side and E-side distance; worked value 1/1 + 1/2 = 1.5.]

Toshiaki Nakazawa, Kun Yu, Sadao Kurohashi

(Graduate School of Informatics, Kyoto University)


Core Steps of Alignment

Flow of Our EBMT System

  • Searching Correspondence Candidates

    • Fine-grained alignment is effective for translation

    • Search for as many candidates as possible using a variety of linguistic information

      • Bilingual dictionaries

      • Transliteration (Katakana words, NEs)

        • ローズワイン → rosuwain ⇔ rose wine (similarity: 0.78)

        • 新宿 → shinjuku ⇔ shinjuku (similarity: 1.0)

      • Numeral normalization

        • 二百十六万 → 2,160,000 ← 2.16 million

      • Japanese flexible matching (Odani et al. 2007)

      • Substring co-occurrence measure (Cromieres 2006)

  • Selecting Correspondence Candidates

    • More candidates lead to more ambiguities and improper alignments

    • A robust alignment method is needed that aligns parallel sentences consistently by selecting an adequate set of candidates
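As a rough illustration of two of the candidate-search heuristics above, the sketch below normalizes kanji and English numerals to integers so that 二百十六万 and 2.16 million can be matched, and scores a romanized katakana word against English. The function names are hypothetical, the katakana-to-romaji step (ローズワイン → rosuwain) is assumed to be done already, and `difflib.SequenceMatcher.ratio()` is a stand-in for the slides' unspecified similarity measure, so it does not reproduce the 0.78 shown for rose wine.

```python
from decimal import Decimal
from difflib import SequenceMatcher

# Hypothetical helpers for candidate search; not the authors' implementation.

KANJI_DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
                "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}
ENGLISH_SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def kanji_to_int(text: str) -> int:
    """Parse a unit-style kanji numeral such as 二百十六万.

    Positional forms such as 二〇〇七 are not handled by this sketch.
    """
    total = 0    # value of completed groups ending in 万, 億, or 兆
    section = 0  # value accumulated below the next large unit
    digit = 0    # last digit seen, not yet multiplied by a unit
    for ch in text:
        if ch in KANJI_DIGITS:
            digit = KANJI_DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as the 十 in 十六 means "one ten".
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += (section + digit) * LARGE_UNITS[ch]
            section = digit = 0
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return total + section + digit


def english_to_int(text: str) -> int:
    """Parse '2.16 million' or '2,160,000' into an integer."""
    number, *scales = text.replace(",", "").split()
    value = Decimal(number)  # Decimal avoids float rounding on 2.16
    for word in scales:
        value *= ENGLISH_SCALES[word.lower()]
    return int(value)


def romaji_similarity(romaji: str, english: str) -> float:
    """Character-level similarity in [0, 1], ignoring spaces and case."""
    a = romaji.replace(" ", "").lower()
    b = english.replace(" ", "").lower()
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    # Both sides normalize to the same integer, so they become a candidate.
    print(kanji_to_int("二百十六万"), english_to_int("2.16 million"))
    print(romaji_similarity("shinjuku", "shinjuku"))
    print(round(romaji_similarity("rosuwain", "rose wine"), 2))
```

Normalizing both sides to one canonical value (an integer) turns numeral matching into a plain equality test, which is why it can sit alongside dictionary lookup as a candidate source.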

[Figure: Input → Translation Examples → Language Models → Output.
Input: 交差点に入る時私の信号は青でした。
Translation examples: 交差点で、突然…飛び出して来たのです。 (… came at me from the side at the intersection); 家に入る時…脱ぐ (to remove … when entering a house); 私のサイン (my signature); 信号は青でした。 (The traffic light was green).
Output: My traffic light was green when entering the intersection.]

Selecting Correspondence Candidates

Using Consistency Score and Dependency Type

[Figure: the insurance-claim example from the title slide. 保険 (insurance) occurs twice (保険 会社 に 対して, 保険 請求 の), so "insurance" has ambiguous correspondence candidates ("Ambiguities!"); callouts mark words as "Near!" or "Far!" from each other on each side and flag the inconsistent choices as "Improper alignments!".]

How to reflect the inconsistency?

Dependency Type Distance

Distribution of distances between alignment pairs in hand-annotated data (40K Mainichi newspaper sentence pairs) [Uchimoto04]

“Near-Near” pair → Positive Score

“Far-Far” pair → 0

“Near-Far” pair → Negative Score
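The Near/Far rules above can be sketched as follows, assuming each sentence is a dependency tree given as a head array and that the distance between two alignment pairs on one side is the number of dependency edges between their nodes. This sketch ignores dependency types, and the threshold of 2, the 1/max(Ds, Dt) reward, and the -1 penalty are hypothetical choices, not the values derived from the hand-annotated data; `tree_distance` and `pair_score` are illustrative names.

```python
from typing import Sequence

NEAR_THRESHOLD = 2  # hypothetical: a distance of at most 2 edges counts as "Near"


def tree_distance(heads: Sequence[int], i: int, j: int) -> int:
    """Number of dependency edges on the path between nodes i and j.

    heads[k] is the index of node k's head, or -1 for the root.
    Assumes a single rooted tree; dependency types are ignored.
    """
    depth_from_i = {}
    node, steps = i, 0
    while node != -1:  # record i and every ancestor of i
        depth_from_i[node] = steps
        node, steps = heads[node], steps + 1
    node, steps = j, 0
    while node not in depth_from_i:  # climb from j to the lowest common ancestor
        if node == -1:
            raise ValueError("i and j are not in the same tree")
        node, steps = heads[node], steps + 1
    return steps + depth_from_i[node]


def pair_score(ds: int, dt: int, near: int = NEAR_THRESHOLD) -> float:
    """Consistency of two alignment pairs from their source-side distance ds
    and target-side distance dt, following the Near/Far rules."""
    src_near, tgt_near = ds <= near, dt <= near
    if src_near and tgt_near:
        return 1.0 / max(ds, dt, 1)  # Near-Near: positive, larger when closer
    if not src_near and not tgt_near:
        return 0.0                   # Far-Far: zero
    return -1.0                      # Near-Far: negative


if __name__ == "__main__":
    # Toy tree: node 1 is the root; nodes 0 and 2 depend on 1; node 3 depends on 2.
    heads = [1, -1, 1, 2]
    print(tree_distance(heads, 0, 3))
    print(pair_score(1, 1), pair_score(1, 7), pair_score(5, 7))
```

With these choices, Pair 1 from the title slide, (Ds, Dt) = (1, 1), scores 1.0, and Pair 2, (1, 7), scores -1.0.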

Consistency Score Function
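One way to turn pairwise scores into a score for a whole set of selected candidates is to sum them over every pair of selected alignments and then search for a high-scoring set. The sketch below does this with a simple greedy search; the sum-over-pairs form, the greedy strategy, seeding with the first candidate, and all names are assumptions for illustration, not the function on the original slide. Distances and the pairwise score are passed in as functions, so a dependency-tree distance can replace the word-position distance used in the example.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[int, int]           # (source node, target node)
Distance = Callable[[int, int], int]
PairScore = Callable[[int, int], float]


def gain(cand: Candidate, selected: Sequence[Candidate],
         src_dist: Distance, tgt_dist: Distance, pair_score: PairScore) -> float:
    """Change in total consistency if cand joins the selected set."""
    return sum(pair_score(src_dist(cand[0], s[0]), tgt_dist(cand[1], s[1]))
               for s in selected)


def consistency_score(selected: Sequence[Candidate], src_dist: Distance,
                      tgt_dist: Distance, pair_score: PairScore) -> float:
    """Sum of pairwise scores over every pair of selected candidates."""
    return sum(pair_score(src_dist(a[0], b[0]), tgt_dist(a[1], b[1]))
               for a, b in combinations(selected, 2))


def greedy_select(candidates: Sequence[Candidate], src_dist: Distance,
                  tgt_dist: Distance, pair_score: PairScore) -> List[Candidate]:
    """Seed with the first candidate (assumed most reliable), then repeatedly
    add the remaining candidate with the largest positive gain."""
    if not candidates:
        return []
    selected = [candidates[0]]
    remaining = list(candidates[1:])
    while remaining:
        gains = [gain(c, selected, src_dist, tgt_dist, pair_score) for c in remaining]
        best = max(range(len(remaining)), key=gains.__getitem__)
        if gains[best] <= 0:
            break  # every remaining candidate would lower consistency
        selected.append(remaining.pop(best))
    return selected


if __name__ == "__main__":
    # Toy setup: word-position distance instead of tree distance, and a
    # constant-reward version of the Near/Far rules with threshold 2.
    def position_distance(i: int, j: int) -> int:
        return abs(i - j)

    def toy_score(ds: int, dt: int) -> float:
        if ds <= 2 and dt <= 2:
            return 1.0
        if ds > 2 and dt > 2:
            return 0.0
        return -1.0

    # Source word 1 has two target candidates (1 and 6), like 保険 ⇔ "insurance".
    cands = [(0, 0), (1, 1), (1, 6)]
    chosen = greedy_select(cands, position_distance, position_distance, toy_score)
    print(chosen, consistency_score(chosen, position_distance, position_distance, toy_score))
```

In the toy run, the candidate (1, 6) is near (0, 0) on the source side but far from it on the target side, so it only lowers the score and is left out.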

Experimental Result

  • 500 test sentences from the Mainichi newspaper parallel corpus

  • Bilingual dictionary: KENKYUSYA J-E/E-J, 500K entries

  • Evaluation criteria: Precision / Recall / F-measure

  • Character-based for Japanese, word-based for English

Quality of Other Language Pairs

[Table: alignment quality of other language pairs (AER)]
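The precision, recall, and F-measure listed above, and the AER used in the comparison with other language pairs, can be computed over sets of alignment links as in the sketch below. Representing a link as a (source index, target index) pair is an assumption; under the evaluation setup above, indices would refer to characters on the Japanese side and words on the English side. AER uses the usual definition with sure (S) and possible (P) links; the function names are illustrative.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source unit index, target unit index)


def precision_recall_f(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float, float]:
    """Precision, recall, and balanced F-measure of predicted links."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def alignment_error_rate(predicted: Set[Link], sure: Set[Link], possible: Set[Link]) -> float:
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), with S a subset of P."""
    possible = possible | sure  # enforce S ⊆ P
    denom = len(predicted) + len(sure)
    if denom == 0:
        return 0.0
    return 1.0 - (len(predicted & sure) + len(predicted & possible)) / denom


if __name__ == "__main__":
    gold = {(0, 0), (1, 1), (2, 2), (3, 3)}
    predicted = {(0, 0), (1, 1), (2, 3)}
    print(precision_recall_f(predicted, gold))
    # With no separate "possible" links (S = P), AER equals 1 - F.
    print(alignment_error_rate(predicted, gold, gold))
```

When the annotation distinguishes sure and possible links, AER rewards predicted links that hit either set while measuring recall only against the sure links.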

Conclusion

  • Proposed a new phrase alignment method using consistency criteria.

  • Achieved sufficient alignment accuracy compared with other language pairs.

  • We need to acquire the parameters automatically through machine learning.

  • We plan to extend the framework so that it can also revise the parse results.

* Using 300K newspaper-domain sentence pairs for training

(A translation demo by NICT that uses our system is on display in the exhibition corner!)

