Structural Phrase Alignment Based on Consistency Criteria


Toshiaki Nakazawa, Kun Yu, Sadao Kurohashi

(Graduate School of Informatics, Kyoto University)

{nakazawa, kunyu}@nlp.kuee.kyoto-u.ac.jp

kuro@i.kyoto-u.ac.jp

[Figure: plots with the labels "Frequency (log)", "Dist of E-Side", "Dist of J-Side", "J-Side Distance", "E-Side Distance", "Consistency Score", and "baseline", plus the worked value 1/1+1/2=1.5.]

[Figure: example dependency trees annotated with dependency types. Japanese side: 日本 で (in Japan), デ格 [case “de”]; 保険 (insurance), 文節内 [inside clause]; 会社 に 対して (to company), 連用 [renyou]; 保険 (insurance), 文節内 [inside clause]; 請求 の (claim), ノ格 [case “no”]; 申し立て が (instance), ガ格 [case “ga”]; 可能です よ (you can). English side: you / will have to file / insurance / an claim / insurance / with the office / in Japan, with the dependency labels NP, NN, and PP. Pair 1: (Ds, Dt) = (1, 1), Positive Score. Pair 2: (Ds, Dt) = (1, 7), Negative Score.]

Core Steps of Alignment

Flow of Our EBMT System

  • Searching Correspondence Candidates
    • Fine-grained alignment is effective for translation
    • Search for as many candidates as possible using a variety of linguistic information
      • Bilingual dictionaries
      • Transliteration (Katakana words, NEs)
        • ローズワイン → rosuwain ⇔ rose wine (similarity: 0.78)
        • 新宿 → shinjuku ⇔ shinjuku (similarity: 1.0)
      • Numeral normalization
        • 二百十六万 → 2,160,000 ← 2.16 million
      • Japanese flexible matching (Odani et al. 2007)
      • Substring co-occurrence measure (Cromieres 2006)
  • Selecting Correspondence Candidates
    • More candidates lead to more ambiguities and improper alignments
    • A robust alignment method is needed, one that aligns parallel sentences consistently by selecting an adequate set of candidates
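The numeral normalization step listed above can be sketched as follows. This is a minimal illustration, not the system's implementation: the function names `kanji_to_int` and `english_to_int` are hypothetical, and the sketch assumes unit-style kanji numerals (such as 二百十六万) and English numbers written as digits with an optional scale word. Both sides are mapped to the same integer so that they can be proposed as a correspondence candidate.

```python
import re
from decimal import Decimal

# Assumed lookup tables for unit-style kanji numerals (positional forms
# such as 二一六 are not handled by this sketch).
DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}

def kanji_to_int(text: str) -> int:
    """Convert a unit-style kanji numeral to an integer."""
    total = 0    # sum of completed 万/億/兆 groups
    section = 0  # value accumulated inside the current group (< 10,000)
    digit = 0    # digit waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            section += (digit or 1) * SMALL_UNITS[ch]  # bare 十 means 10
            digit = 0
        elif ch in LARGE_UNITS:
            total += ((section + digit) or 1) * LARGE_UNITS[ch]
            section = digit = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + section + digit

SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}
_EN_NUMBER = re.compile(
    r"^\s*([\d,]+(?:\.\d+)?)\s*(thousand|million|billion)?\s*$",
    re.IGNORECASE,
)

def english_to_int(text: str) -> int:
    """Convert strings like '2.16 million' or '2,160,000' to an integer."""
    m = _EN_NUMBER.match(text)
    if m is None:
        raise ValueError(f"unsupported number: {text!r}")
    value = Decimal(m.group(1).replace(",", ""))  # Decimal avoids float rounding
    scale = SCALES[m.group(2).lower()] if m.group(2) else 1
    return int(value * scale)

if __name__ == "__main__":
    j = kanji_to_int("二百十六万")
    e = english_to_int("2.16 million")
    print(f"{j:,}", j == e)
```

With the poster's example, both 二百十六万 and "2.16 million" normalize to 2,160,000, so the two expressions match.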

Translation Examples

[Figure: translation example. Input: 交差点に入る時私の信号は青でした。 The input dependency tree (交差 / 点 に / 入る / 私 の / 信号 は / でした 。) is matched against aligned translation examples whose English fragments include "came at me from the side at the intersection", "to remove when entering a house", "my signature", and "The light was green". The figure also carries the label "Language Models". Output: My traffic light was green when entering the intersection.]

Selecting Correspondence Candidates

Using Consistency Score and Dependency Type

[Figure: correspondence candidates between the Japanese tree 日本 で (in Japan) / 保険 (insurance) / 会社 に 対して (to company) / 保険 (insurance) / 請求 の (claim) / 申し立て が (instance) / 可能ですよ (you can) and the English tree you / will have to file / insurance / an claim / insurance / with the office / in Japan. Annotations: "Ambiguities!", "Improper alignments!", and "Near!" / "Far!" labels on candidate pairs.]

How to reflect the inconsistency?

Dependency Type Distance

Distribution of the distance of alignment pairs in hand-annotated data (Mainichi newspaper 40K sentence pairs) [Uchimoto04]

“Near-Near” pair → Positive Score

“Far-Far” pair → 0

“Near-Far” pair → Negative Score
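The three cases above can be sketched as a small function. This is only an illustration under stated assumptions: the poster's actual score function is not reproduced in this transcript, so the hard threshold `near` and the magnitudes `reward` and `penalty` are hypothetical parameters. `ds` and `dt` stand for the source-side and target-side distances (Ds, Dt) of a pair of correspondences.

```python
def consistency_score(ds: float, dt: float, near: float = 2.0,
                      reward: float = 1.0, penalty: float = 1.0) -> float:
    """Score one pair of correspondences from its two tree distances.

    Hypothetical rule: a distance counts as "near" when it is <= `near`.
    """
    src_near = ds <= near
    tgt_near = dt <= near
    if src_near and tgt_near:
        return reward        # Near-Near: the two alignments support each other
    if src_near != tgt_near:
        return -penalty      # Near-Far: the two alignments are inconsistent
    return 0.0               # Far-Far: no evidence either way

if __name__ == "__main__":
    # The two (Ds, Dt) pairs shown on the poster.
    print(consistency_score(1, 1))  # Pair 1: positive
    print(consistency_score(1, 7))  # Pair 2: negative
```

A score learned from the distance distribution in hand-annotated data would presumably be smoother than this step function; the sketch only fixes the sign of each case.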

Consistency Score Function

Experimental Result

Quality of Other Language Pairs

  • 500 test sentences from Mainichi newspaper parallel corpus
  • Bilingual dictionary: KENKYUSYA J-E/J-E 500K entries
  • Evaluation criteria: Precision / Recall / F-measure
  • Character-based for Japanese, word-based for English
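The evaluation setup above can be illustrated with a short sketch. It assumes, as a simplification not confirmed by the poster, that each aligned phrase pair is expanded into links between every Japanese character and every English word it covers, and that precision, recall, and F-measure are computed over those links. The names `expand_links` and `precision_recall_f` are hypothetical.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int]  # half-open [start, end) index range
Link = Tuple[int, int]  # (Japanese character index, English word index)

def expand_links(phrase_pairs: Iterable[Tuple[Span, Span]]) -> Set[Link]:
    """Expand phrase-level alignments into character-to-word links."""
    links: Set[Link] = set()
    for (j_start, j_end), (e_start, e_end) in phrase_pairs:
        for j in range(j_start, j_end):
            for e in range(e_start, e_end):
                links.add((j, e))
    return links

def precision_recall_f(system: Set[Link], gold: Set[Link]):
    """Return (precision, recall, F-measure) of system links against gold links."""
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

if __name__ == "__main__":
    # Toy data: spans are made up purely for illustration.
    gold = expand_links([((0, 2), (0, 1)), ((2, 4), (1, 2))])
    system = expand_links([((0, 2), (0, 1)), ((2, 3), (2, 3))])
    print(precision_recall_f(system, gold))
```

In the toy data, the system output shares two of its three links with the four gold links, giving a precision of 2/3 and a recall of 1/2.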

(AER)
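For reference, AER (alignment error rate) is conventionally defined as below. The poster does not spell out the formula, so this assumes the standard definition, where A is the set of system links, S the set of sure gold links, and P the set of possible gold links with S ⊆ P.

```latex
\mathrm{AER}(A; S, P) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}
```

Lower values are better; when no possible-only links are annotated (P = S), AER equals one minus the F-measure.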

Conclusion

  • Proposed a new phrase alignment method using consistency criteria.
  • Alignment accuracy is sufficient compared with that of other language pairs.
  • The parameters need to be acquired automatically by machine learning.
  • We plan to develop the framework further so that it revises the parse result.

* Using 300K newspaper-domain sentence pairs for training

(A translation demo by NICT in the exhibition corner uses our system!)