Example-Based Machine Translation Based on the Synchronous SSTC Annotation Schema




The construction of bilingual knowledge bank based on a bitext synchronous parsing technique

Example-Based Machine Translation Based on the Synchronous SSTC Annotation Schema

The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Computer Aided Translation Unit

School of Computer Sciences

University Science Malaysia



Presentation Outline

  • Introduction

  • Structured String-Tree Correspondence (SSTC)

  • Synchronous Structured String-Tree Correspondence (synchronous SSTC)

  • EBMT based on synchronous SSTC

  • The Construction of a BKB Based on the Synchronous SSTC

  • Bitext Word-level Mapping (Word Alignment)

  • Bitext Synchronous Parsing Technique



The Structured String-Tree Correspondence (SSTC)

SSTC = string + arbitrary tree structure + correspondence

Correspondence = node(X/Y)

X:SNODE = interval of the substring that corresponds to the node.

Y:STREE = interval of the substring that corresponds to the subtree having the node as root.

String

0 all 1 cats 2 eat 3 mice 4

Tree

eat (2-3/0-4)
    cats (1-2/0-2)
        all (0-1/0-1)
    mice (3-4/3-4)

For the node "eat": X:SNODE = 2-3 (substring "2 eat 3"), Y:STREE = 0-4 (substring "0 all 1 cats 2 eat 3 mice 4").
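
The node(X/Y) notation can be illustrated with a small sketch. This is only an illustration under simplifying assumptions, not the authors' implementation: the class name `Node` and its methods are hypothetical, and every interval is assumed to be contiguous (real SSTC intervals may be discontinuous, such as 1-2+4-5).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One SSTC tree node; `snode` is the interval of the word itself."""
    word: str
    snode: tuple
    children: list = field(default_factory=list)

    def stree(self):
        """Interval covered by the subtree rooted here (assumed contiguous)."""
        starts = [self.snode[0]] + [c.stree()[0] for c in self.children]
        ends = [self.snode[1]] + [c.stree()[1] for c in self.children]
        return (min(starts), max(ends))

    def label(self):
        """Render the node in the slide notation word(SNODE/STREE)."""
        (a, b), (c, d) = self.snode, self.stree()
        return f"{self.word}({a}-{b}/{c}-{d})"


# String: 0 all 1 cats 2 eat 3 mice 4
tree = Node("eat", (2, 3), [
    Node("cats", (1, 2), [Node("all", (0, 1))]),
    Node("mice", (3, 4)),
])

print(tree.label())              # eat(2-3/0-4)
print(tree.children[0].label())  # cats(1-2/0-2)
```

Only SNODE is stored here; STREE is derived from the subtree, which keeps the two intervals consistent by construction.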



Tree

eat (2-3/0-4)
    cats (1-2/0-2)
        all (0-1/0-1)
    mice (3-4/3-4)

String

0 all 1 cats 2 eat 3 mice 4

For the node "cats": X:SNODE = 1-2 (substring "1 cats 2"), Y:STREE = 0-2 (substring "0 all 1 cats 2").


English source sentence: "he picks the ball up"

Malay target sentence: "dia kutip bola itu"

ENGLISH (E)

pick[v] up[p] (1-2+4-5/0-5)
    he[n] (0-1/0-1)
    ball[n] (3-4/2-4)
        the[det] (2-3/2-3)

0he1pick2the3ball4up5

MALAY (M)

kutip[v] (1-2/0-4)
    dia[n] (0-1/0-1)
    bola[n] (2-3/2-4)
        itu[det] (3-4/3-4)

0dia1kutip2bola3itu4

Translation units

IndexStree: (0-5,0-4), (0-1,0-1), (2-4,2-4), (2-3,3-4)

IndexSnode: (1-2+4-5,1-2), (0-1,0-1), (3-4,2-3), (2-3,3-4)
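
The two index tables can be read as the SNODE and STREE intervals of linked node pairs. The sketch below illustrates that reading for the "he picks the ball up" example; the dictionary layout, the `links` list and the helper `fmt` are assumptions made for illustration, not a format defined in the presentation.

```python
def fmt(intervals):
    """Format a list of (start, end) intervals in the slide notation, e.g. 1-2+4-5."""
    return "+".join(f"{a}-{b}" for a, b in intervals) if intervals else "-"


# Each node stores its SNODE and STREE as lists of intervals,
# so that discontinuous nodes such as "pick ... up" can be expressed.
english = {
    "pick up": {"snode": [(1, 2), (4, 5)], "stree": [(0, 5)]},
    "he":      {"snode": [(0, 1)], "stree": [(0, 1)]},
    "ball":    {"snode": [(3, 4)], "stree": [(2, 4)]},
    "the":     {"snode": [(2, 3)], "stree": [(2, 3)]},
}
malay = {
    "kutip": {"snode": [(1, 2)], "stree": [(0, 4)]},
    "dia":   {"snode": [(0, 1)], "stree": [(0, 1)]},
    "bola":  {"snode": [(2, 3)], "stree": [(2, 4)]},
    "itu":   {"snode": [(3, 4)], "stree": [(3, 4)]},
}

# Hypothetical list of linked (English node, Malay node) pairs.
links = [("pick up", "kutip"), ("he", "dia"), ("ball", "bola"), ("the", "itu")]

index_snode = [f"({fmt(english[e]['snode'])},{fmt(malay[m]['snode'])})" for e, m in links]
index_stree = [f"({fmt(english[e]['stree'])},{fmt(malay[m]['stree'])})" for e, m in links]

print(index_snode)  # ['(1-2+4-5,1-2)', '(0-1,0-1)', '(3-4,2-3)', '(2-3,3-4)']
print(index_stree)  # ['(0-5,0-4)', '(0-1,0-1)', '(2-4,2-4)', '(2-3,3-4)']
```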



English source sentence: "I did not give it to him"

French target sentence: "Je ne le lui ai pas donné"

ENGLISH (E)

not[neg] (2-3/0-7)
    did[v] give[v] (1-2+3-4/0-2+3-7)
        I[n] (0-1/0-1)
        it[n] (4-5/4-5)
        to[p] (5-6/5-7)
            him[n] (6-7/6-7)

0I1did2not3give4it5to6him7

FRENCH (F)

ne[neg] pas[neg] (1-2+5-6/0-7)
    ai[v] donné[v] (4-5+6-7/0-1+2-5+6-7)
        Je[n] (0-1/0-1)
        le[n] (2-3/2-3)
        lui[n] (3-4/3-4)

0Je1ne2le3lui4ai5pas6donné7

Translation units

IndexStree: (0-7,0-7), (0-2+3-7,0-1+2-5+6-7), (0-1,0-1), ...

IndexSnode: (2-3,1-2+5-6), (1-2+3-4,4-5+6-7), (0-1,0-1), (4-5,2-3), (5-6,-), (6-7,3-4)



English source sentence: "hopefully Kim misses Dale"

French target sentence: "on espère que Dale manque à Kim"

ENGLISH (E)

miss[v] (2-3/0-4)
    hopefully[adv] (0-1/0-1)
    Kim[n] (1-2/1-2)
    Dale[n] (3-4/3-4)

0hopefully1Kim2miss3Dale4

FRENCH (F)

manque[v] à[p] (4-5+5-6/0-7)
    on[n] espère[v] que[c] (0-1+1-2+2-3/0-3)
    Dale[n] (3-4/3-4)
    Kim[n] (6-7/6-7)

0on1espère2que3Dale4manque5à6Kim7

Translation units

IndexStree: (0-4,0-7), (0-1,0-3), (1-2,6-7), (3-4,3-4)

IndexSnode: (2-3,4-5+5-6), (0-1,0-1+1-2+2-3), (1-2,6-7), (3-4,3-4)



Example-Based Machine Translation (EBMT)

EBMT is the case-based reasoning approach to MT

EBMT uses translated examples of similar sentences to translate a given source sentence into the target language.



The general architecture for EBMT

Source sentence -> Find closest related SL examples -> Retrieve corresponding TL examples -> Combination -> Target sentence

BKB: examples for the source language, examples for the target language, and the correspondence between them



EBMT based on synchronous SSTC.

  • source sentence -> tagger -> tagged source sentence

  • tagged source sentence -> list of sub-synchronous SSTCs generated based on the source sentence

  • BKB -> a chosen closest synchronous SSTC example -> list of sub-synchronous SSTCs constructed from the chosen example

  • Replacement & Combination -> the resultant synchronous SSTC -> target sentence

Different senses for the word "bank":

bank 1: land beside a river.

bank 2: a place to keep money.

E.g.: The1 man2 keep1 his1 money1 in1 the1 bank2.



1. English sentence: He picks the ball up.
   Malay translation: Dia kutip bola itu.

2. English sentence: The lamp is off.
   Malay translation: Lampu itu padam.

3. English sentence: The green signal turns on.
   Malay translation: Isyarat hijau itu bertukar.

4. English sentence: The old man drinks tea.
   Malay translation: Lelaki tua itu minum teh.

Source sentence: The old man picks the green lamp up



The set of synchronous SSTCs represents the example-base.

Example 1

English sentence: He picks the ball up.

Malay translation: Dia kutip bola itu.

1E

pick(1)[v] up(1)[p] (1-2+4-5/0-5)
    he(1)[n] (0-1/0-1)
    ball(1)[n] (3-4/2-4)
        the(1)[det] (2-3/2-3)

0he1pick2the3ball4up5

1M

kutip(1)[v] (1-2/0-4)
    dia(1)[n] (0-1/0-1)
    bola(1)[n] (2-3/2-4)
        itu(1)[det] (3-4/3-4)

0dia1kutip2bola3itu4

IndexStree: (0-5,0-4), (0-1,0-1), (2-4,2-4), (2-3,3-4)

IndexSnode: (1-2+4-5,1-2), (0-1,0-1), (3-4,2-3), (2-3,3-4)

Example 2

English sentence: The lamp is off.

Malay translation: Lampu itu padam.

2E

is(2)[v] off(1)[adv] (2-3+3-4/0-4)
    lamp(1)[n] (1-2/0-2)
        the(1)[det] (0-1/0-1)

0the1lamp2is3off4

2M

padam(1)[v] (2-3/0-3)
    lampu(1)[n] (0-1/0-2)
        itu(1)[det] (1-2/1-2)

0lampu1itu2padam3

IndexStree: (0-4,0-3), (0-2,0-2), (0-1,1-2)

IndexSnode: (2-3+3-4,2-3), (1-2,0-1), (0-1,1-2)



Example 3

English sentence: The green signal turns on.

Malay translation: Isyarat hijau itu bertukar.

3E

turn(1)[v] on(1)[adv] (3-4+4-5/0-5)
    signal(2)[n] (2-3/0-3)
        the(1)[det] (0-1/0-1)
        green(1)[adj] (1-2/1-2)

0the1green2signal3turn4on5

3M

bertukar(2)[v] (3-4/0-4)
    isyarat(1)[n] (0-1/0-3)
        hijau(1)[adj] (1-2/1-2)
        itu(1)[det] (2-3/2-3)

0Isyarat1hijau2itu3bertukar4

IndexStree: (0-5,0-4), (0-3,0-3), (0-1,2-3), (1-2,1-2)

IndexSnode: (3-4+4-5,3-4), (2-3,0-1), (0-1,2-3), (1-2,1-2)

Example 4

English sentence: The old man drinks tea.

Malay translation: Lelaki tua itu minum teh.

4E

drink(1)[v] (3-4/0-5)
    man(1)[n] (2-3/0-3)
        the(1)[det] (0-1/0-1)
        old(1)[adj] (1-2/1-2)
    tea(1)[n] (4-5/4-5)

0the1old2man3drink4tea5

4M

minum(1)[v] (3-4/0-5)
    lelaki(1)[n] (0-1/0-3)
        tua(1)[adj] (1-2/1-2)
        itu(1)[det] (2-3/2-3)
    teh(1)[n] (4-5/4-5)

0lelaki1tua2itu3minum4teh5

IndexStree: (0-5,0-5), (0-3,0-3), (0-1,2-3), (1-2,1-2), (4-5,4-5)

IndexSnode: (3-4,3-4), (2-3,0-1), (0-1,2-3), (1-2,1-2), (4-5,4-5)



Source: the old man picks the green lamp up

pick[v] (3-4/0-8)
    man[n] (2-3/0-3)
        the[det] (0-1/0-1)
        old[adj] (1-2/1-2)
    lamp[n] (6-7/4-7)
        the[det] (4-5/4-5)
        green[adj] (5-6/5-6)
    up[p] (7-8/-)

Example trees from the BKB matched against parts of the source tree:

0the1green2signal3turn4on5

turn[v] on[adv] (3-4+4-5/0-5)
    signal[n] (2-3/0-3)
        the[det] (0-1/0-1)
        green[adj] (1-2/1-2)

0the1boy2pick3the4ball5up6

pick[v] up[p] (2-3+5-6/0-6)
    boy[n] (1-2/0-2)
        the[det] (0-1/0-1)
    ball[n] (4-5/3-5)
        the[det] (3-4/3-4)

0the1old2man3drink4tea5

drink[v] (3-4/0-5)
    man[n] (2-3/0-3)
        the[det] (0-1/0-1)
        old[adj] (1-2/1-2)
    tea[n] (4-5/4-5)

0the1lamp2is3off4

is[v] off[adv] (2-3+3-4/0-4)
    lamp[n] (1-2/0-2)
        the[det] (0-1/0-1)



Sub-synchronous SSTCs for the source sentence

(1) 0the1old2man3 / 0lelaki1tua2itu3

man(1)[n] (2-3/0-3)
    the(1)[det] (0-1/0-1)
    old(1)[adj] (1-2/1-2)

lelaki(1)[n] (0-1/0-3)
    tua(1)[adj] (1-2/1-2)
    itu(1)[det] (2-3/2-3)

IndexStree: (0-3,0-3), (0-1,2-3), (1-2,1-2)

IndexSnode: (2-3,0-1), (0-1,2-3), (1-2,1-2)

(2) 3pick4 / 3kutip4

pick(1)[v] (3-4/3-4)

kutip(1)[v] (3-4/3-4)

IndexStree: (3-4,3-4)

IndexSnode: (3-4,3-4)

(3) 4the5green6lamp7 / 4lampu5hijau6itu7

lamp(1)[n] (6-7/4-7)
    the(1)[det] (4-5/4-5)
    green(1)[adj] (5-6/5-6)

lampu(1)[n] (4-5/4-7)
    hijau(1)[adj] (5-6/5-6)
    itu(1)[det] (6-7/6-7)

IndexStree: (4-7,4-7), (4-5,6-7), (5-6,5-6)

IndexSnode: (6-7,4-5), (4-5,6-7), (5-6,5-6)

(4) 7up8 / -

up(1)[p] (7-8/7-8)

IndexStree: (7-8,-)

IndexSnode: (7-8,-)



Selected closest example

English sentence: He picks the ball up.

Malay translation: Dia kutip bola itu.

1E

pick(1)[v] up(1)[p] (1-2+4-5/0-5)
    he(1)[n] (0-1/0-1)
    ball(1)[n] (3-4/2-4)
        the(1)[det] (2-3/2-3)

0he1pick2the3ball4up5

1M

kutip(1)[v] (1-2/0-4)
    dia(1)[n] (0-1/0-1)
    bola(1)[n] (2-3/2-4)
        itu(1)[det] (3-4/3-4)

0dia1kutip2bola3itu4

IndexStree: (0-5,0-4), (0-1,0-1), (2-4,2-4), (2-3,3-4)

IndexSnode: (1-2+4-5,1-2), (0-1,0-1), (3-4,2-3), (2-3,3-4)

Sub-synchronous SSTCs derived from the example

(1) 0he1 / 0dia1

he(1)[n] (0-1/0-1)

dia(1)[n] (0-1/0-1)

IndexStree: (0-1,0-1)

IndexSnode: (0-1,0-1)

(2) 1pick2 / 1kutip2

pick(1)[v] (1-2/0-5)

kutip(1)[v] (1-2/0-4)

IndexStree: (0-5,0-4)

IndexSnode: (1-2,1-2)

(3) 2the3ball4 / 2bola3itu4

ball(1)[n] (3-4/2-4)
    the(1)[det] (2-3/2-3)

bola(1)[n] (2-3/2-4)
    itu(1)[det] (3-4/3-4)

IndexStree: (2-4,2-4), (2-3,3-4)

IndexSnode: (3-4,2-3), (2-3,3-4)

(4) 4up5 / -

up(1)[p] (4-5/-)

IndexStree: (-,-)

IndexSnode: (4-5,-)



Sub-synchronous SSTCs.

The sub-synchronous SSTCs of the source sentence are paired with those of the example sentence:

       Source sentence                          Example sentence

(1)    man(1)[n] (2-3/0-3)                      he(1)[n] (0-1/0-1)
       lelaki(1)[n] (0-1/0-3)                   dia(1)[n] (0-1/0-1)
       0the1old2man3 / 0lelaki1tua2itu3         0he1 / 0dia1

(2)    pick(1)[v] (3-4/3-4)                     pick(1)[v] (1-2/0-5)
       kutip(1)[v] (3-4/3-4)                    kutip(1)[v] (1-2/0-4)
       3pick4 / 3kutip4                         1pick2 / 1kutip2

(3)    lamp(1)[n] (6-7/4-7)                     ball(1)[n] (3-4/2-4)
       lampu(1)[n] (4-5/4-7)                    bola(1)[n] (2-3/2-4)
       4the5green6lamp7 / 4lampu5hijau6itu7     2the3ball4 / 2bola3itu4

(4)    up(1)[p] (7-8/7-8)                       up(1)[p] (4-5/-)
       7up8 / -                                 4up5 / -



Replacement of part (2)

Source part (2): 3pick4 / 3kutip4

pick(1)[v] (3-4/3-4)

kutip(1)[v] (3-4/3-4)

IndexStree: (3-4,3-4)

IndexSnode: (3-4,3-4)

Example part (2): 1pick2 / 1kutip2

pick(1)[v] (1-2/0-5)

kutip(1)[v] (1-2/0-4)

IndexStree: (0-5,0-4)

IndexSnode: (1-2,1-2)

In example 1 (1E/1M), the example part is replaced by the source part.

Root nodes before the replacement: pick(1)[v] up(1)[p] (1-2+4-5/0-5) and kutip(1)[v] (1-2/0-4)

Root nodes after the replacement: pick(1)[v] up(1)[p] (3-4+4-5/3-4) and kutip(1)[v] (3-4/3-4)

The other nodes of example 1 are unchanged at this step:

he(1)[n] (0-1/0-1) and dia(1)[n] (0-1/0-1)

ball(1)[n] (3-4/2-4) and bola(1)[n] (2-3/2-4)

the(1)[det] (2-3/2-3) and itu(1)[det] (3-4/3-4)

0he1pick2the3ball4up5

0dia1kutip2bola3itu4


Replacement of part (1)

Source part (1): 0the1old2man3 / 0lelaki1tua2itu3

man(1)[n] (2-3/0-3)
    the(1)[det] (0-1/0-1)
    old(1)[adj] (1-2/1-2)

lelaki(1)[n] (0-1/0-3)
    tua(1)[adj] (1-2/1-2)
    itu(1)[det] (2-3/2-3)

IndexStree: (0-3,0-3), (0-1,2-3), (1-2,1-2)

IndexSnode: (2-3,0-1), (0-1,2-3), (1-2,1-2)

Example part (1): 0he1 / 0dia1

he(1)[n] (0-1/0-1)

dia(1)[n] (0-1/0-1)

IndexStree: (0-1,0-1)

IndexSnode: (0-1,0-1)

In example 1 (1E/1M), the subtree he/dia is replaced by the subtree man/lelaki:

1E

pick(1)[v] up(1)[p] (3-4+7-8/3-4)
    man(1)[n] (2-3/0-3)
        the(1)[det] (0-1/0-1)
        old(1)[adj] (1-2/1-2)
    ball(1)[n] (3-4/2-4)
        the(1)[det] (2-3/2-3)

1M

kutip(1)[v] (3-4/3-4)
    lelaki(1)[n] (0-1/0-3)
        tua(1)[adj] (1-2/1-2)
        itu(1)[det] (2-3/2-3)
    bola(1)[n] (2-3/2-4)
        itu(1)[det] (3-4/3-4)

Strings before the replacement:

0he1pick2the3ball4up5

0dia1kutip2bola3itu4

Strings after the replacement:

0the1old2man3pick4the5ball6up7

0lelaki1tua2itu3kutip4bola5itu6
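
The effect of the replacement steps on the two strings can be sketched at the string level. This is a deliberate simplification and an assumption of this sketch: the presentation replaces subtrees of the synchronous SSTC, whereas the hypothetical helper `replace_spans` below only swaps the token spans named by the IndexStree pairs (0-1,0-1) and (2-4,2-4) of example 1 with the strings of the matching source sub-SSTCs.

```python
def replace_spans(tokens, replacements):
    """Replace (start, end) token spans; going right to left keeps earlier offsets valid."""
    result = list(tokens)
    for (start, end), new_tokens in sorted(replacements, key=lambda r: r[0][0], reverse=True):
        result[start:end] = new_tokens
    return result


english = "he pick the ball up".split()
malay = "dia kutip bola itu".split()

# ((English span, Malay span), (English replacement, Malay replacement))
substitutions = [
    (((0, 1), (0, 1)), ("the old man", "lelaki tua itu")),     # he / dia
    (((2, 4), (2, 4)), ("the green lamp", "lampu hijau itu")),  # the ball / bola itu
]

new_english = replace_spans(english, [(e, s.split()) for (e, m), (s, t) in substitutions])
new_malay = replace_spans(malay, [(m, t.split()) for (e, m), (s, t) in substitutions])

print(" ".join(new_english))  # the old man pick the green lamp up
print(" ".join(new_malay))    # lelaki tua itu kutip lampu hijau itu
```

Because the spans come in linked pairs, each substitution is applied to both languages at once, which is what keeps the two sides synchronized.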



The resultant synchronous SSTC

English part

pick(1)[v] up(1)[p] (3-4+7-8/0-8)
    man(1)[n] (2-3/0-3)
        the(1)[det] (0-1/0-1)
        old(1)[adj] (1-2/1-2)
    lamp(1)[n] (6-7/4-7)
        the(1)[det] (4-5/4-5)
        green(1)[adj] (5-6/5-6)

the old man pick the green lamp up

0-1 1-2 2-3 3-4 4-5 5-6 6-7 7-8

Malay part

kutip(1)[v] (3-4/0-7)
    lelaki(1)[n] (0-1/0-3)
        tua(1)[adj] (1-2/1-2)
        itu(1)[det] (2-3/2-3)
    lampu(1)[n] (4-5/4-7)
        hijau(1)[adj] (5-6/5-6)
        itu(1)[det] (6-7/6-7)

lelaki tua itu kutip lampu hijau itu

0-1 1-2 2-3 3-4 4-5 5-6 6-7

Generation

The translation of the source sentence is generated from the Malay part of the resultant synchronous SSTC: it is the string of the Malay SSTC.

The translation: lelaki tua itu kutip lampu hijau itu
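
As a minimal sketch of this generation step, the target string can be recovered by ordering the words of the Malay nodes by the start of their SNODE intervals. The node list below uses the intervals of the Malay string 0lelaki1tua2itu3kutip4lampu5hijau6itu7; the flat list layout and the function name `generate_string` are assumptions for illustration.

```python
def generate_string(nodes):
    """nodes: list of (words, snode_intervals); one interval per word of the node."""
    pieces = []
    for words, intervals in nodes:
        for word, (start, _end) in zip(words, intervals):
            pieces.append((start, word))
    # Sorting by interval start restores the surface word order.
    return " ".join(word for _start, word in sorted(pieces))


# Nodes are listed in tree order (root first), not in string order.
malay_nodes = [
    (["kutip"], [(3, 4)]),
    (["lelaki"], [(0, 1)]),
    (["tua"], [(1, 2)]),
    (["itu"], [(2, 3)]),
    (["lampu"], [(4, 5)]),
    (["hijau"], [(5, 6)]),
    (["itu"], [(6, 7)]),
]

print(generate_string(malay_nodes))  # lelaki tua itu kutip lampu hijau itu
```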



EBMT General Problems

  • How to utilize more than one example to translate one source sentence

The construction of well-formed target language sentences from extracted fragments of a BKB.

  • Lack of flexibility in representing translation relations between source and target substrings

The treatment of wild linguistic phenomena, which are non-standard, e.g. crossed dependencies.

Our approach overcomes these problems



Transfer Approach to MT

Source -> Analysis -> Transfer -> Synthesis -> Target




How to Construct the Bilingual Knowledge Bank (BKB) or (Example-Base)

Substantial Reservation !!!



  • The Construction of a BKB Based on the Synchronous SSTC

Based on Bitext Synchronous Parsing Technique

  • BiText: Text that is available in two languages.

S: English

The basic idea of example-based parsing is very simple: it is to find the corresponding representation for an input sentence based on the representations of similar sentences in the example-base.

T: Malay

Idea asas bagi penghuraian berasaskan-contoh adalah mudah: iaitu untuk mencari perwakilan yang sepadan bagi suatu ayat input berdasarkan perwakilan ayat yang serupa dalam pengkalan-contoh.



  • Schema

Bi-text -> Alignment Process (sentence level, phrase level, word level; uses the bilingual dictionary) -> aligned English source / Malay target

English source -> Apple Pie Parser -> ( S ( NP . ( ..(..))) ( S ( VP …( ..(..)))

Parsed English source + aligned Malay target -> SSTC Editor -> Synchronous SSTC -> BKB

Steps:

  • Parsing & POS tagging for the English source text

  • Compile the APP output into SSTC for the English source text

  • Build the SSTC for the Malay target text based on the SSTC for the English source text, using the word alignment




Bitext Word-level Mapping (Word Alignment)

Real texts are noisy:

- Fertility = a single word in the source sentence may correspond to zero, one, two or more words in the target sentence, and vice versa.

- Crossed dependencies (distortion) = human translators change and rearrange material, so the target text does not follow the order of the source text.



±n Context Window Word Alignment

The correspondence between the source and the target is denoted by an interval attached to each subtext according to its offset in the text.

S: English

0 The 1 basic 2 idea 3 of 4 example 5 - 6 based 7 parsing 8 is 9 very 10 simple 11 : 12 It 13 is 14 to 15 find 16 the 17 corresponding 18 representation 19 for 20 an 21 input 22 sentence 23 based 24 on 25 the 26 representations 27 of 28 similar 29 sentences 30 in 31 the 32 example 33 - 34 base 35 . 36

T: Malay

0 Idea 1 asas 2 bagi 3 penghuraian 4 berasaskan 5 - 6 contoh 7 adalah 8 mudah 9 : 10 Iaitu 11 untuk 12 mencari 13 perwakilan 14 yang 15 sepadan 16 bagi 17 suatu 18 ayat 19 input 20 berdasarkan 21 perwakilan 22 ayat 23 yang 24 serupa 25 dalam 26 pengkalan 27 - 28 contoh 29 . 30
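
The offset intervals can be produced mechanically from a tokenized sentence. The sketch below assumes a simple tokenizer in which punctuation marks and hyphens count as separate tokens, as they do in the numbered strings; the function names `with_offsets` and `numbered` are hypothetical.

```python
import re


def with_offsets(sentence):
    """Tokenize and attach the interval (i, i+1) to the i-th token."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    return [(token, (i, i + 1)) for i, token in enumerate(tokens)]


def numbered(sentence):
    """Render the sentence in the '0 all 1 cats 2 ...' notation."""
    pairs = with_offsets(sentence)
    return " ".join(f"{start} {token}" for token, (start, _end) in pairs) + f" {len(pairs)}"


english = "The basic idea of example-based parsing is very simple"

print(with_offsets(english)[4])       # ('example', (4, 5))
print(numbered("all cats eat mice"))  # 0 all 1 cats 2 eat 3 mice 4
```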



±n Context Window Word Alignment

Find the TPCs between the source and the target, using:

  • Bilingual dictionary

  • Cognate words, e.g. Computer / Komputer

  • Dice coefficient

Dice = 2prob(S,T) / [prob(S) + prob(T)]

  • prob(S), prob(T): the probabilities of S and T occurring in the text.

  • prob(S,T): the probability of both co-occurring in the same bitext segment.
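
The Dice coefficient can be estimated from an aligned bitext as sketched below. The probabilities are estimated here as relative frequencies over bitext segments, which is an assumption of this sketch, and the toy segments reuse the four example sentence pairs of this presentation in their uninflected string form; the function name `dice` is hypothetical.

```python
def dice(source_word, target_word, segments):
    """Dice = 2*prob(S,T) / (prob(S) + prob(T)), estimated over aligned segments."""
    n = len(segments)
    p_s = sum(source_word in src for src, _tgt in segments) / n
    p_t = sum(target_word in tgt for _src, tgt in segments) / n
    p_st = sum(source_word in src and target_word in tgt for src, tgt in segments) / n
    return 0.0 if p_s + p_t == 0 else 2 * p_st / (p_s + p_t)


segments = [
    ("he pick the ball up".split(), "dia kutip bola itu".split()),
    ("the lamp is off".split(), "lampu itu padam".split()),
    ("the green signal turn on".split(), "isyarat hijau itu bertukar".split()),
    ("the old man drink tea".split(), "lelaki tua itu minum teh".split()),
]

print(dice("lamp", "lampu", segments))  # 1.0: the two words always co-occur
print(dice("the", "lampu", segments))   # lower: "the" occurs in every segment
```

On such a tiny corpus the scores are only indicative; "the" and "itu" also reach 1.0 simply because both occur in every segment.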



±n Context Window Word Alignment

Find out the chains for all possible TPCs for a source word.

Source word: Example(4-5). Possible TPCs: contoh(6-7), contoh(28-29).

Chain for contoh(6-7):

basic(1-2) idea(2-3) of(3-4) example(4-5) -(5-6) based(6-7) parsing(7-8)

bagi(2-3) penghuraian(3-4) berasaskan(4-5) -(5-6) contoh(6-7)

Chain for contoh(28-29):

basic(1-2) idea(2-3) of(3-4) example(4-5) -(5-6) based(6-7) parsing(7-8)

-(27-28) contoh(28-29)



±n Context Window Word Alignment

For every chain, calculate the weight W:

len(seq): length of continuous sequence of words.

len(gap): length of the gaps between the words in the chain.

len(chain): length of the chain.

Example(4-5) with contoh(6-7): W=1.39

Example(4-5) with contoh(28-29): W=0.60



  • Bitext Synchronous Parsing Technique

S: English

The basic idea of example-based parsing is very simple

T: Malay

Idea asas bagi penghuraian berasaskan-contoh adalah mudah




Apple Pie Parser (APP)

  • It is a bottom-up probabilistic chart parser to find the parse tree for an input text (English).

  • It was developed at New York University.

  • The parser generates a syntactic tree in PennTreeBank bracketing.

  • It is free and available for download with its source code.

  • http://cs.nyu.edu/cs/projects/proteus/sekine



Apple Pie Parser (APP)

Input: The basic idea of example-based parsing is very simple

APP output: (S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple)))

The representation structure and the POS tags for the English source are obtained.




Compile the APP output to SSTC structure

(S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple)))

Tree

S (Ø/0-11)
    NP (Ø/0-8)
        NPL(1) (Ø/0-3)
            The basic idea (0-3/0-3)
        PP(1) (Ø/3-8)
            of (3-4/3-4)
            NPL(1) (Ø/4-8)
                Example-based parsing (4-8/4-8)
    VP (Ø/8-11)
        is (8-9/8-9)
        ADJP(1) (Ø/9-11)
            Very simple (9-11/9-11)

String

0the1basic2idea3of4example5-6based7parsing8is9very10simple11
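
The compilation of a bracketed parse into STREE intervals can be sketched as follows. This is not the authors' compiler, only an illustration under two assumptions: a hyphenated word such as "example-based" counts as three tokens, as in the numbered string, and only the phrase nodes (whose SNODE is Ø) are listed. The function names are hypothetical.

```python
import re


def parse_brackets(text):
    """Parse a PennTreeBank-style bracketing into (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a plain word
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return node()


def split_word(word):
    """Split on hyphens, keeping the hyphen as its own token."""
    return [t for t in re.split(r"(-)", word) if t]


def stree_intervals(tree, start=0, out=None):
    """Collect (label, start, end) for every phrase, in pre-order."""
    if out is None:
        out = []
    label, children = tree
    slot = len(out)
    out.append(None)                  # reserve the pre-order position
    offset = start
    for child in children:
        if isinstance(child, tuple):
            offset, _ = stree_intervals(child, offset, out)
        else:
            offset += len(split_word(child))
    out[slot] = (label, start, offset)
    return offset, out


app_output = ("(S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) "
              "(VP is (ADJP very simple)))")
_, phrases = stree_intervals(parse_brackets(app_output))

for label, start, end in phrases:
    print(f"{label} (Ø/{start}-{end})")
```

The printed intervals can be compared directly with the tree of this slide, from S (Ø/0-11) down to ADJP (Ø/9-11).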



Lexical Transfer

English: The basic idea of example-based parsing is very simple

Tree

S (Ø/0-11)
    NP (Ø/0-8)
        NPL(1) (Ø/0-3)
            The basic idea (0-3/0-3)
        PP(1) (Ø/3-8)
            of (3-4/3-4)
            NPL(1) (Ø/4-8)
                Example-based parsing (4-8/4-8)
    VP (Ø/8-11)
        is (8-9/8-9)
        ADJP(1) (Ø/9-11)
            Very simple (9-11/9-11)

String

0the1basic2idea3of4example5-6based7parsing8is9very10simple11

Malay: Idea asas bagi penghuraian berasaskan-contoh adalah mudah

Tree

S (Ø/0-9)
    NP (Ø/0-7)
        NPL(1) (Ø/0-2)
            Idea asas (0-2/0-2)
        PP(1) (Ø/2-7)
            bagi (2-3/2-3)
            NPL(1) (Ø/3-7)
                Penghuraian berasaskan-contoh (3-7/3-7)
    VP (Ø/7-9)
        adalah (7-8/7-8)
        ADJP(1) (Ø/8-9)
            mudah (8-9/8-9)

String

0idea1asas2bagi3penghuraian4berasaskan5-6contoh7adalah8mudah9
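
One way to picture how the Malay intervals follow from the English SSTC and the word alignment is to project each English STREE interval onto the span of its aligned Malay words. The sketch below is a guess at such a projection, not the procedure used by the authors: the word alignment table is assumed for this example (unaligned words map to None), the projected span is assumed to be contiguous, and the function name `project` is hypothetical.

```python
def project(interval, alignment):
    """Map an English (start, end) interval to the span of its aligned Malay tokens."""
    start, end = interval
    targets = [alignment[i] for i in range(start, end) if alignment.get(i) is not None]
    if not targets:
        return None
    return (min(targets), max(targets) + 1)


# Assumed word alignment: English token index -> Malay token index.
# English: 0the1basic2idea3of4example5-6based7parsing8is9very10simple11
# Malay:   0idea1asas2bagi3penghuraian4berasaskan5-6contoh7adalah8mudah9
alignment = {0: None, 1: 1, 2: 0, 3: 2, 4: 6, 5: 5, 6: 4, 7: 3, 8: 7, 9: None, 10: 8}

english_phrases = [
    ("S", (0, 11)), ("NP", (0, 8)), ("NPL", (0, 3)), ("PP", (3, 8)),
    ("NPL", (4, 8)), ("VP", (8, 11)), ("ADJP", (9, 11)),
]
malay_phrases = [(label, project(iv, alignment)) for label, iv in english_phrases]

for label, (start, end) in malay_phrases:
    print(f"{label} (Ø/{start}-{end})")
```

With this assumed alignment the projected intervals agree with the Malay tree shown for this example, for instance PP (Ø/2-7) and VP (Ø/7-9).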




The synchronous SSTC editor.

[Screenshot: the editor window, with the menus File, Edit, Correspondences and Windows, showing the English SSTC and the Malay SSTC side by side above the two strings below.]

0the1basic2idea3of4example5-6based7parsing8is9very10simple11

0Idea1asas2bagi3penghuraian4berasaskan5-6contoh7adalah8mudah9



Discussion

Thank you…..

