Example-Based Machine Translation Based on the Synchronous SSTC Annotation Schema

The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Computer Aided Translation Unit

School of Computer Sciences

University Science Malaysia


Presentation Outline

  • Introduction

  • Structured String-Tree Correspondence (SSTC)

  • Synchronous Structured String-Tree Correspondence (synchronous SSTC)

  • EBMT based on synchronous SSTC

  • The Construction of a BKB Based on the Synchronous SSTC

  • Bitext Word-level Mapping (Word Alignment)

  • Bitext Synchronous Parsing Technique


The Structured String-Tree Correspondence (SSTC)

SSTC = string + arbitrary tree structure + correspondence

Correspondence = node(X/Y)

  • X: SNODE = interval of the substring that corresponds to the node.

  • Y: STREE = interval of the substring that corresponds to the subtree having the node as root.

String: 0 all 1 cats 2 eat 3 mice 4

Tree:

  • eat (2-3/0-4)
    • cats (1-2/0-2)
      • all (0-1/0-1)
    • mice (3-4/3-4)

Example: for the node eat, SNODE = 2-3 (the substring "2 eat 3") and STREE = 0-4 (the substring "0 all 1 cats 2 eat 3 mice 4").
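The node(X/Y) notation can be illustrated with a small data structure. The following is a minimal sketch, not the authors' implementation: the `Node` class and the `span` helper are hypothetical names, and intervals are assumed to be offsets between whitespace-separated words, as in the "all cats eat mice" example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    snode: tuple  # X: interval of the substring for the node itself
    stree: tuple  # Y: interval of the substring for the subtree rooted here
    children: list = field(default_factory=list)

def span(tokens, interval):
    """Return the substring covered by a (start, end) interval of word offsets."""
    start, end = interval
    return " ".join(tokens[start:end])

# String: 0 all 1 cats 2 eat 3 mice 4
tokens = "all cats eat mice".split()

# Tree: eat(2-3/0-4) with children cats(1-2/0-2) and mice(3-4/3-4);
# all(0-1/0-1) is a child of cats.
all_node = Node("all", (0, 1), (0, 1))
cats = Node("cats", (1, 2), (0, 2), [all_node])
mice = Node("mice", (3, 4), (3, 4))
eat = Node("eat", (2, 3), (0, 4), [cats, mice])
```

With this sketch, `span(tokens, cats.snode)` gives the word for the node alone, while `span(tokens, cats.stree)` gives the whole substring dominated by that node.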


Tree: eat (2-3/0-4), with children cats (1-2/0-2) and mice (3-4/3-4); all (0-1/0-1) is a child of cats.

String: 0 all 1 cats 2 eat 3 mice 4

For the node cats:

  • X: SNODE = 1-2, the substring "1 cats 2".

  • Y: STREE = 0-2, the substring "0 all 1 cats 2".


English source sentence: "he picks the ball up"

Malay target sentence: "dia kutip bola itu"

English SSTC (string: 0he1pick2the3ball4up5):

  • pick[v] up[p] (1-2+4-5/0-5)
  • he[n] (0-1/0-1)
  • ball[n] (3-4/2-4)
  • the[det] (2-3/2-3)

Malay SSTC (string: 0dia1kutip2bola3itu4):

  • kutip[v] (1-2/0-4)
  • dia[n] (0-1/0-1)
  • bola[n] (2-3/2-4)
  • itu[det] (3-4/3-4)

Translation units (English interval, Malay interval):

  • IndexStree: (0-5,0-4), (0-1,0-1), (2-4,2-4), (2-3,3-4)
  • IndexSnode: (1-2+4-5,1-2), (0-1,0-1), (3-4,2-3), (2-3,3-4)
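The translation units of the "he picks the ball up" example can be read as two lookup tables from English intervals to Malay intervals. The sketch below is only an illustration under assumptions: the helper names `parse`, `text` and `translate_unit` are hypothetical, intervals are kept in the slide's textual notation ("1-2+4-5" for a discontiguous interval, "-" for no counterpart), and the two dictionaries copy the IndexSnode and IndexStree pairs shown for this example.

```python
def parse(interval):
    """Parse '1-2+4-5' into [(1, 2), (4, 5)]; '-' means no counterpart."""
    if interval.strip() == "-":
        return []
    return [tuple(int(n) for n in part.split("-")) for part in interval.split("+")]

def text(tokens, interval):
    """Join the words covered by a (possibly discontiguous) interval."""
    return " ".join(" ".join(tokens[a:b]) for a, b in parse(interval))

english = "he pick the ball up".split()  # 0he1pick2the3ball4up5
malay = "dia kutip bola itu".split()     # 0dia1kutip2bola3itu4

# Node-level and subtree-level correspondences (English -> Malay).
index_snode = {"1-2+4-5": "1-2", "0-1": "0-1", "3-4": "2-3", "2-3": "3-4"}
index_stree = {"0-5": "0-4", "0-1": "0-1", "2-4": "2-4", "2-3": "3-4"}

def translate_unit(index, source_interval):
    """Return the Malay substring recorded for an English interval."""
    return text(malay, index[source_interval])
```

For instance, looking up "1-2+4-5" in `index_snode` relates the discontiguous "pick ... up" to the single word "kutip", and looking up "2-4" in `index_stree` relates the phrase "the ball" to "bola itu".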


English source sentence: "I did not give it to him"

French target sentence: "Je ne le lui ai pas donné"

English SSTC (string: 0I1did2not3give4it5to6him7):

  • not[neg] (2-3/0-7)
  • did[v] give[v] (1-2+3-4/3-7)
  • I[n] (0-1/0-1)
  • it[n] (4-5/4-5)
  • to[p] (5-6/5-7)
  • him[n] (6-7/6-7)

French SSTC (string: 0Je1ne2le3lui4ai5pas6donné7):

  • ne[neg] pas[neg] (1-2+5-6/0-7)
  • ai[v] donné[v] (4-5+6-7/0-1+2-5+6-7)
  • Je[n] (0-1/0-1)
  • le[n] (2-3/2-3)
  • lui[n] (3-4/3-4)

Translation units (English interval, French interval):

  • IndexStree: (0-7,0-7), (0-2+3-7, 0-1+2-5+6-7), (0-1,0-1), ...
  • IndexSnode: (2-3, 1-2+5-6), (1-2+3-4, 4-5+6-7), (0-1,0-1), (4-5,2-3), (5-6, - ), (6-7,3-4)


English source sentence: "hopefully Kim misses Dale"

French target sentence: "on espère que Dale manque à Kim"

English SSTC (string: 0hopefully1Kim2miss3Dale4):

  • miss[v] (2-3/0-4)
  • hopefully[adv] (0-1/0-1)
  • Kim[n] (1-2/1-2)
  • Dale[n] (3-4/3-4)

French SSTC (string: 0on1espère2que3Dale4manque5à6Kim7):

  • manque[v] à[p] (4-5+5-6/0-7)
  • on[n] espère[v] que[c] (0-1+1-2+2-3/0-3)
  • Dale[n] (3-4/3-4)
  • Kim[n] (6-7/6-7)

Translation units (English interval, French interval):

  • IndexStree: (0-4,0-7), (0-1,0-3), (1-2,6-7), (3-4,3-4)
  • IndexSnode: (2-3,4-5+5-6), (0-1,0-1+1-2+2-3), (1-2,6-7), (3-4,3-4)


Example-Based Machine Translation (EBMT)

EBMT is the case-based reasoning approach to MT.

EBMT translates a given source sentence into the target language by reusing previously translated examples of similar sentences.


The general architecture for EBMT

Source sentence → Find closest related SL examples → Retrieve corresponding TL examples → Combination → Target sentence

The BKB holds the examples for the source language, the examples for the target language, and the correspondence between them.


EBMT based on synchronous SSTC

  • Source sentence → tagger → tagged source sentence.

  • A closest synchronous SSTC example is chosen from the BKB.

  • A list of sub-synchronous SSTCs is generated based on the source sentence.

  • A list of sub-synchronous SSTCs is constructed from the chosen example.

  • Replacement & Combination → the resultant synchronous SSTC → target sentence.

Different senses for the word "bank":

  • bank 1: land beside a river.

  • bank 2: a place to keep money.

E.g.: The1 man2 keep1 his1 money1 in1 the1 bank2.


Examples in the example-base:

  • 1. English sentence: He picks the ball up. Malay translation: Dia kutip bola itu.

  • 2. English sentence: The lamp is off. Malay translation: Lampu itu padam.

  • 3. English sentence: The green signal turns on. Malay translation: Isyarat hijau itu bertukar.

  • 4. English sentence: The old man drinks tea. Malay translation: Lelaki tua itu minum teh.

Source sentence: The old man picks the green lamp up


A set of synchronous SSTCs represents the example-base.

Example 1. English sentence: He picks the ball up. Malay translation: Dia kutip bola itu.

  • 1E (string: 0he1pick2the3ball4up5): pick(1)[v] up(1)[p] (1-2+4-5/0-5), he(1)[n] (0-1/0-1), ball(1)[n] (3-4/2-4), the(1)[det] (2-3/2-3)
  • 1M (string: 0dia1kutip2bola3itu4): kutip(1)[v] (1-2/0-4), dia(1)[n] (0-1/0-1), bola(1)[n] (2-3/2-4), itu(1)[det] (3-4/3-4)
  • IndexStree: (0-5,0-4), (0-1,0-1), (2-4,2-4), (2-3,3-4)
  • IndexSnode: (1-2+4-5,1-2), (0-1,0-1), (3-4,2-3), (2-3,3-4)

Example 2. English sentence: The lamp is off. Malay translation: Lampu itu padam.

  • 2E (string: 0the1lamp2is3off4): is[v](2) off(1)[adv] (2-3+3-4/0-4), lamp(1)[n] (1-2/0-2), the(1)[det] (0-1/0-1)
  • 2M (string: 0lampu1itu2padam3): padam(1)[v] (2-3/0-3), lampu(1)[n] (0-1/0-2), itu(1)[det] (1-2/1-2)
  • IndexStree: (0-4,0-4), (0-2,0-2), (0-1,1-2)
  • IndexSnode: (2-3+3-4,2-3), (1-2,0-1), (0-1,1-2)


Example 3. English sentence: The green signal turns on. Malay translation: Isyarat hijau itu bertukar.

  • 3E (string: 0the1green2signal3turn4on5): turn(1)[v] on(1)[adv] (3-4+4-5/0-5), signal(2)[n] (2-3/0-3), green(1)[adj] (1-2/1-2), the(1)[det] (0-1/0-1)
  • 3M (string: 0Isyarat1hijau2itu3bertukar4): bertukar(2)[v] (3-4/0-4), isyarat(1)[n] (0-1/0-3), hijau(1)[adj] (1-2/1-2), itu(1)[det] (2-3/2-3)
  • IndexStree: (0-5,0-4), (0-3,0-3), (0-1,2-3), (1-2,1-2)
  • IndexSnode: (3-4+4-5,3-4), (2-3,0-1), (0-1,2-3), (1-2,1-2)

Example 4. English sentence: The old man drinks tea. Malay translation: Lelaki tua itu minum teh.

  • 4E (string: 0the1old2man3drink4tea5): drink(1)[v] (3-4/0-5), man(1)[n] (2-3/0-3), tea(1)[n] (4-5/4-5), the(1)[det] (0-1/0-1), old(1)[adj] (1-2/1-2)
  • 4M (string: 0lelaki1tua2itu3minum4teh5): minum(1)[v] (3-4/0-5), lelaki(1)[n] (0-1/0-3), teh(1)[n] (4-5/4-5), itu(1)[det] (2-3/2-3), tua(1)[adj] (1-2/1-2)
  • IndexStree: (0-5,0-5), (0-3,0-3), (0-1,2-3), (1-2,1-2), (4-5,4-5)
  • IndexSnode: (3-4,3-4), (2-3,0-1), (0-1,2-3), (1-2,1-2), (4-5,4-5)


Source: the old man picks the green lamp up

Nodes of the source sentence:

  • pick[v] (3-4/0-8), up[p] (7-8/-)
  • man[n] (2-3/0-3), the[det] (0-1/0-1), old[adj] (1-2/1-2)
  • lamp[n] (6-7/4-7), the[det] (4-5/4-5), green[adj] (5-6/5-6)

English SSTCs of the four examples shown alongside the source sentence:

  • 0the1green2signal3turn4on5: turn[v] on[adv] (3-4+4-5/0-5), signal[n] (2-3/0-3), the[det] (0-1/0-1), green[adj] (1-2/1-2)
  • 0the1boy2pick3the4ball5up6: pick[v] up[p] (2-3+5-6/0-6), boy[n] (1-2/0-2), ball[n] (4-5/3-5), the[det] (0-1/0-1), the[det] (3-4/3-4)
  • 0the1old2man3drink4tea5: drink[v] (3-4/0-5), man[n] (2-3/0-3), tea[n] (4-5/4-5), the[det] (0-1/0-1), old[adj] (1-2/1-2)
  • 0the1lamp2is3off4: is[v] off[adv] (2-3+3-4/0-4), lamp[n] (1-2/0-2), the[det] (0-1/0-1)


Sub-synchronous SSTCs for the source sentence

  • (1) 0the1old2man3 / 0lelaki1tua2itu3: man(1)[n] (2-3/0-3), the(1)[det] (0-1/0-1), old(1)[adj] (1-2/1-2); lelaki(1)[n] (0-1/0-3), tua(1)[adj] (1-2/1-2), itu(1)[det] (2-3/2-3). IndexStree: (0-3,0-3), (0-1,2-3), (1-2,1-2). IndexSnode: (2-3,0-1), (0-1,2-3), (1-2,1-2).

  • (2) 3pick4 / 3kutip4: pick(1)[v] (3-4/3-4); kutip(1)[v] (3-4/3-4). IndexStree: (3-4,3-4). IndexSnode: (3-4,3-4).

  • (3) 4the5green6lamp7 / 4lampu5hijau6itu7: lamp(1)[n] (6-7/4-7), the(1)[det] (4-5/4-5), green(1)[adj] (5-6/5-6); lampu(1)[n] (4-5/4-7), hijau(1)[adj] (5-6/5-6), itu(1)[det] (6-7/6-7). IndexStree: (4-7,4-7), (4-5,6-7), (5-6,5-6). IndexSnode: (6-7,4-5), (4-5,6-7), (5-6,5-6).

  • (4) 7up8: up(1)[p] (7-8/7-8), with no Malay counterpart. IndexStree: (7-8,-). IndexSnode: (7-8,-).

Each sub-synchronous SSTC corresponds to a node of the source sentence: man[n] (2-3/0-3), pick[v] (3-4/0-8), lamp[n] (6-7/4-7) and up[p] (7-8/-).


Selected closest example

English sentence: He picks the ball up. Malay translation: Dia kutip bola itu.

  • 1E (string: 0he1pick2the3ball4up5): pick(1)[v] up(1)[p] (1-2+4-5/0-5), he(1)[n] (0-1/0-1), ball(1)[n] (3-4/2-4), the(1)[det] (2-3/2-3)
  • 1M (string: 0dia1kutip2bola3itu4): kutip(1)[v] (1-2/0-4), dia(1)[n] (0-1/0-1), bola(1)[n] (2-3/2-4), itu(1)[det] (3-4/3-4)
  • IndexStree: (0-5,0-4), (0-1,0-1), (2-4,2-4), (2-3,3-4)
  • IndexSnode: (1-2+4-5,1-2), (0-1,0-1), (3-4,2-3), (2-3,3-4)

Sub-synchronous SSTCs derived from the example

  • (1) 0he1 / 0dia1: he(1)[n] (0-1/0-1); dia(1)[n] (0-1/0-1). IndexStree: (0-1,0-1). IndexSnode: (0-1,0-1).

  • (2) 1pick2 / 1kutip2: pick(1)[v] (1-2/0-5); kutip(1)[v] (1-2/0-4). IndexStree: (0-5,0-4). IndexSnode: (1-2,1-2).

  • (3) 2the3ball4 / 2bola3itu4: ball(1)[n] (3-4/2-4), the(1)[det] (2-3/2-3); bola(1)[n] (2-3/2-4), itu(1)[det] (3-4/3-4). IndexStree: (2-4,2-4), (2-3,3-4). IndexSnode: (2-3,0-1), (3-4,2-3).

  • (4) 4up5: up(1)[p] (4-5/ -), with no Malay counterpart. IndexStree: (- , -). IndexSnode: (4-5, -).


Sub-synchronous SSTCs.

The sub-synchronous SSTCs of the source sentence and of the example sentence, paired by number:

  • (1) Source sentence: 0the1old2man3 / 0lelaki1tua2itu3, with man(1)[n] (2-3/0-3) and lelaki(1)[n] (0-1/0-3). Example sentence: 0he1 / 0dia1, with he(1)[n] (0-1/0-1) and dia(1)[n] (0-1/0-1).

  • (2) Source sentence: 3pick4 / 3kutip4, with pick(1)[v] (3-4/3-4) and kutip(1)[v] (3-4/3-4). Example sentence: 1pick2 / 1kutip2, with pick(1)[v] (1-2/0-5) and kutip(1)[v] (1-2/0-4).

  • (3) Source sentence: 4the5green6lamp7 / 4lampu5hijau6itu7, with lamp(1)[n] (6-7/4-7) and lampu(1)[n] (4-5/4-7). Example sentence: 2the3ball4 / 2bola3itu4, with ball(1)[n] (3-4/2-4) and bola(1)[n] (2-3/2-4).

  • (4) Source sentence: 7up8, with up(1)[p] (7-8/7-8) and IndexSnode (7-8,-). Example sentence: 4up5, with up(1)[p] (4-5/ -) and IndexSnode (4-5, -).


Replacement

Source part (2): pick(1)[v] (3-4/3-4) and kutip(1)[v] (3-4/3-4); strings 3pick4 and 3kutip4; IndexStree (3-4,3-4); IndexSnode (3-4,3-4).

Example part (2): pick(1)[v] (1-2/0-5) and kutip(1)[v] (1-2/0-4); strings 1pick2 and 1kutip2; IndexStree (0-5,0-4); IndexSnode (1-2,1-2).

The source part replaces the example part in the chosen example 1E/1M (0he1pick2the3ball4up5, 0dia1kutip2bola3itu4). The other nodes of the example are unchanged at this step: he(1)[n] (0-1/0-1), ball(1)[n] (3-4/2-4), the(1)[det] (2-3/2-3); dia(1)[n] (0-1/0-1), bola(1)[n] (2-3/2-4), itu(1)[det] (3-4/3-4).


Replacement

Source part (1): man(1)[n] (2-3/0-3), the(1)[det] (0-1/0-1), old(1)[adj] (1-2/1-2); lelaki(1)[n] (0-1/0-3), tua(1)[adj] (1-2/1-2), itu(1)[det] (2-3/2-3); strings 0the1old2man3 and 0lelaki1tua2itu3; IndexStree (0-3,0-3), (0-1,2-3), (1-2,1-2); IndexSnode (2-3,0-1), (0-1,2-3), (1-2,1-2).

Example part (1): he(1)[n] (0-1/0-1) and dia(1)[n] (0-1/0-1); strings 0he1 and 0dia1; IndexStree (0-1,0-1); IndexSnode (0-1,0-1).

The source part replaces the example part in 1E/1M, whose verb nodes at this step read pick(1)[v] up(1)[p] (3-4+7-8/3-4) and kutip(1)[v] (3-4/3-4):

  • Before: 0he1pick2the3ball4up5 / 0dia1kutip2bola3itu4

  • After: 0the1old2man3pick4the5ball6up7 / 0lelaki1tua2itu3kutip4bola5itu6


The resultant synchronous SSTC (1E, 1M)

  • 1E (the old man pick the green lamp up; 0-1 1-2 2-3 3-4 4-5 5-6 6-7 7-8): pick(1)[v] up(1)[p] (3-4+7-8/0-8), man(1)[n] (2-3/0-3), the(1)[det] (0-1/0-1), old(1)[adj] (1-2/1-2), lamp(1)[n] (2-3/0-3), the(1)[det] (0-1/0-1), green(1)[adj] (1-2/1-2)

  • 1M (lelaki tua itu kutip lampu hijau itu; 0-1 1-2 2-3 3-4 4-5 5-6 6-7): kutip(1)[v] (3-4/3-4), lelaki(1)[n] (0-1/0-3), tua(1)[adj] (1-2/1-2), itu(1)[det] (2-3/2-3), lampu(1)[n] (0-1/0-3), hijau(1)[adj] (1-2/1-2), itu(1)[det] (2-3/2-3)

The translation: lelaki tua itu kutip lampu hijau itu

Generation

The translation for the source sentence is generated from the Malay part of the synchronous SSTC: it is the string of that SSTC.
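Since the translation is simply the string of the Malay SSTC, generation can be pictured as reading the target words in the order of their SNODE intervals. This is a minimal sketch under that assumption; the `generate` function and the `malay_nodes` list are hypothetical, with the intervals taken from the interval line shown under the Malay string (0-1 1-2 2-3 3-4 4-5 5-6 6-7).

```python
def generate(nodes):
    """Order target-side (word, (start, end)) pairs by interval start and join them."""
    ordered = sorted(nodes, key=lambda node: node[1][0])
    return " ".join(word for word, _ in ordered)

# Target-side nodes in arbitrary (tree) order, each with its SNODE interval.
malay_nodes = [
    ("kutip", (3, 4)),
    ("lelaki", (0, 1)),
    ("itu", (2, 3)),
    ("tua", (1, 2)),
    ("lampu", (4, 5)),
    ("itu", (6, 7)),
    ("hijau", (5, 6)),
]

translation = generate(malay_nodes)
```

The tree order of the nodes does not matter here, because the intervals alone fix the position of every word in the output string.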


EBMT General Problems

  • How to utilize more than one example to translate one source sentence.

  • The construction of well-formed target language sentences from extracted fragments of a BKB.

  • Lack of flexibility in representing translation relations between source and target substrings.

  • The treatment of wild linguistic phenomena, which are non-standard, e.g. crossed dependencies.

Our approach overcomes these problems.


Transfer Approach to MT

Source → Analysis → Transfer → Synthesis → Target



How to Construct the Bilingual Knowledge Bank (BKB) or (Example-Base)

Substantial Reservation !!!


S: English

T: Malay

Idea asas bagi penghuraian berasaskan-contoh adalah mudah: iaitu untuk mencari perwakilan yang sepadan bagi suatu ayat input berdasarkan perwakilan ayat yang serupa dalam pengkalan-contoh.

The basic idea of example-based parsing is very simple: it is to find the corresponding representation for an input sentence based on the representations of similar sentences in the example-base.

  • The Construction of a BKB Based on the Synchronous SSTC

Based on Bitext Synchronous Parsing Technique

  • BiText: Text that is available in two languages.


Schema

  • Alignment process: the bi-text (English source, Malay target) is aligned at sentence level, phrase level and word level, using a bilingual dictionary.

  • Parsing & POS tagging for the English source text with the Apple Pie Parser, giving bracketed trees of the form ( S ( NP . ( ..(..))) and ( S ( VP …( ..(..))).

  • Compile the APP output into SSTC for the English source text.

  • Build the SSTC for the Malay target text based on the SSTC for the English source text using the word alignment.

  • The synchronous SSTC (English source, Malay target) passes through the SSTC Editor into the BKB.



Bitext Word-level Mapping (Word Alignment)

Real texts are noisy:

- Fertility = a single word in the source sentence may correspond to zero, one, two or more words in the target sentence, and vice versa.

- Crossed dependencies (distortion) = human translators change and rearrange material, so the target text does not follow the word order of the source text.


S: English

T: Malay

0Idea1asas2bagi3penghuraian4berasaskan5-6contoh7adalah 8mudah9:10Iaitu11untuk12 mencari13perwakilan14yang15 sepadan16 bagi17suatu18ayat 19input20berdasarkan21 perwakilan22ayat23yang24 serupa25dalam26pengkalan27- 28contoh29.30

0The1basic2idea3of4example5-6based7parsing8is9very10simple11: 12It13is14to15find16the17corresponding18representation19for20an21input22sentence23based24on25the26representations27of28similar29sentences 30in31the32example33-34base35 .36

±n Context Window Word Alignment

The correspondence between the source and the target is denoted by an interval attached to each subtext according to its offset in the text.


±n Context Window Word Alignment

Find the TPCs between the source and the target, using:

  • Bilingual dictionary

  • Cognate words, e.g. Computer / Komputer

  • Dice coefficient: Dice = 2 prob(S,T) / [prob(S) + prob(T)], where prob(S) and prob(T) are the probabilities of S and T occurring in the text, and prob(S,T) is the probability of both co-occurring in the same bitext segment.
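As a rough illustration of the Dice coefficient, the sketch below estimates each probability as the fraction of bitext segments in which a word occurs. That estimate, the `dice` function name and the four-segment toy bitext (built from the example sentences used earlier) are assumptions made for this example only.

```python
def dice(segments, s, t):
    """Dice = 2 * prob(S,T) / (prob(S) + prob(T)) over aligned bitext segments."""
    n = len(segments)
    p_s = sum(1 for src, tgt in segments if s in src) / n
    p_t = sum(1 for src, tgt in segments if t in tgt) / n
    p_st = sum(1 for src, tgt in segments if s in src and t in tgt) / n
    if p_s + p_t == 0:
        return 0.0  # neither word occurs anywhere
    return 2 * p_st / (p_s + p_t)

# Each segment is (set of English words, set of Malay words).
segments = [
    ({"the", "lamp", "is", "off"}, {"lampu", "itu", "padam"}),
    ({"he", "pick", "the", "ball", "up"}, {"dia", "kutip", "bola", "itu"}),
    ({"the", "old", "man", "drink", "tea"}, {"lelaki", "tua", "itu", "minum", "teh"}),
    ({"the", "green", "signal", "turn", "on"}, {"isyarat", "hijau", "itu", "bertukar"}),
]
```

A pair that always co-occurs, such as "lamp" and "lampu" here, scores 1, while a pair that never shares a segment scores 0. On such a tiny bitext the score cannot separate "the"/"itu" from other frequent pairs, which is one reason the dictionary and cognate evidence are used as well.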


±n Context Window Word Alignment

Find out the chains for all possible TPCs for a source word.

Source word: example(4-5). Possible TPCs: contoh(6-7) and contoh(28-29).

  • Source context: basic(1-2) idea(2-3) of(3-4) example(4-5) – (5-6) based(6-7) parsing(7-8)

  • Chain for contoh(6-7): bagi(2-3) penghuraian(3-4) berasaskan(4-5) – (5-6) contoh(6-7)

  • Chain for contoh(28-29): – (27-28) contoh(28-29)


±n Context Window Word Alignment

For every chain, calculate the weight W from:

  • len(seq): length of continuous sequence of words.

  • len(gap): length of the gaps between the words in the chain.

  • len(chain): length of the chain.

For the source word example(4-5):

  • contoh(6-7): W=1.39

  • contoh(28-29): W=0.60


S: English

T: Malay

The basic idea of example-based parsing is very simple

0Idea1asas2bagi3penghuraian4berasaskan5-6contoh7adalah 8mudah9:10Iaitu11untuk12 mencari13perwakilan14yang15 sepadan16 bagi17suatu18ayat 19input20berdasarkan21 perwakilan22ayat23yang24 serupa25dalam26pengkalan27- 28contoh29.30

0The1basic2idea3of4example5-6based7parsing8is9very10simple11: 12It13is14to15find16the17corresponding18representation19for20an21input22sentence23based24on25the26representations27of28similar29sentences 30in31the32example33-34base35 .36

Idea asas bagi penghuraian berasaskan–contoh adalah mudah

  • Bitext Synchronous Parsing Technique



Apple Pie Parser (APP)

  • It is a bottom-up probabilistic chart parser that finds the parse tree for an input text (English).

  • It was developed at New York University.

  • The parser generates a syntactic tree in Penn Treebank bracketing.

  • It is free and available to download with the source code.

  • http://cs.nyu.edu/cs/projects/proteus/sekine


Apple Pie Parser (APP)

Input: The basic idea of example-based parsing is very simple

APP output: (S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple)))

The representation structure and the POS for the English source are obtained.



Compile the APP output to SSTC structure

APP output: (S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple)))

String: 0the1basic2idea3of4example5-6based7parsing8is9very10simple11

Tree:

  • S (Ø/0-11)
    • NP (Ø/0-8)
      • NPL(1) (Ø/0-3)
        • The basic idea (0-3/0-3)
      • PP(1) (Ø/3-8)
        • of (3-4/3-4)
        • NPL(1) (Ø/4-8)
          • Example-based parsing (4-8/4-8)
    • VP (Ø/8-11)
      • is (8-9/8-9)
      • ADJP(1) (Ø/9-11)
        • Very simple (9-11/9-11)
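The compilation step can be sketched as a single pass over the bracketed string that records, for every phrase label, the interval of words it covers (its STREE). This is an illustrative sketch rather than the tool used by the authors: the function name `compile_to_sstc` is hypothetical, and hyphenated words are assumed to count as three positions ("example", "-", "based"), which is what the offsets in the string above suggest.

```python
import re

def compile_to_sstc(bracketed):
    """Return (tokens, nodes); nodes lists (label, (start, end)) in pre-order."""
    symbols = re.findall(r"\(|\)|[^\s()]+", bracketed)
    tokens, nodes, stack = [], [], []
    i = 0
    while i < len(symbols):
        sym = symbols[i]
        if sym == "(":
            # The symbol right after "(" is the phrase label.
            stack.append((symbols[i + 1], len(tokens), len(nodes)))
            nodes.append(None)  # reserve a slot so nodes stay in pre-order
            i += 2
            continue
        if sym == ")":
            # Closing a phrase: its STREE runs from its start to the current offset.
            label, start, slot = stack.pop()
            nodes[slot] = (label, (start, len(tokens)))
        else:
            # A word; split on hyphens so "example-based" yields three tokens.
            tokens.extend(part for part in re.split(r"(-)", sym) if part)
        i += 1
    return tokens, nodes

app_output = ("(S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) "
              "(VP is (ADJP very simple)))")
tokens, nodes = compile_to_sstc(app_output)
```

Phrase nodes such as S, NP and VP carry no word of their own, which matches the Ø in the SNODE position of the tree above; only their STREE interval is computed here.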


Lexical Transfer

English: The basic idea of example-based parsing is very simple

String: 0the1basic2idea3of4example5-6based7parsing8is9very10simple11

Malay: Idea asas bagi penghuraian berasaskan–contoh adalah mudah

String: 0idea1asas2bagi3penghuraian4berasaskan5-6contoh7adalah8mudah9

Corresponding nodes of the two trees (English, Malay):

  • S (Ø/0-11), S (Ø/0-9)
  • NP (Ø/0-8), NP (Ø/0-7)
  • NPL(1) (Ø/0-3), NPL(1) (Ø/0-2)
  • The basic idea (0-3/0-3), Idea asas (0-2/0-2)
  • PP(1) (Ø/3-8), PP(1) (Ø/2-7)
  • of (3-4/3-4), bagi (2-3/2-3)
  • NPL(1) (Ø/4-8), NPL(1) (Ø/3-7)
  • Example-based parsing (4-8/4-8), Penghuraian berasaskan-contoh (3-7/3-7)
  • VP (Ø/8-11), VP (Ø/7-9)
  • is (8-9/8-9), adalah (7-8/7-8)
  • ADJP(1) (Ø/9-11), ADJP(1) (Ø/8-9)
  • Very simple (9-11/9-11), mudah (8-9/8-9)
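One way to picture how the Malay intervals follow from the English ones is to project each English interval through a word alignment. The sketch below is an assumption-laden illustration: the `project` function is hypothetical, the word-level `alignment` table is my own reading of the sentence pair (the slides give only phrase-level correspondences), and the projected words are assumed to form one contiguous interval.

```python
def project(interval, alignment):
    """Map an English (start, end) interval to the Malay interval of its aligned words."""
    start, end = interval
    targets = [alignment[i] for i in range(start, end) if i in alignment]
    if not targets:
        return None  # nothing in this interval is aligned
    return (min(targets), max(targets) + 1)

# English word position -> Malay word position (assumed alignment).
# 0the1basic2idea3of4example5-6based7parsing8is9very10simple11
# 0idea1asas2bagi3penghuraian4berasaskan5-6contoh7adalah8mudah9
alignment = {
    1: 1,   # basic -> asas
    2: 0,   # idea -> idea
    3: 2,   # of -> bagi
    4: 6,   # example -> contoh
    5: 5,   # - -> -
    6: 4,   # based -> berasaskan
    7: 3,   # parsing -> penghuraian
    8: 7,   # is -> adalah
    10: 8,  # simple -> mudah
}           # "the" and "very" are left unaligned
```

Under this alignment, projecting the English STREE intervals 0-11, 0-8, 3-8 and 8-11 gives 0-9, 0-7, 2-7 and 7-9, the Malay intervals shown for S, NP, PP and VP.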



The synchronous SSTC editor.

[Screenshot: editor window with the menus File, Edit, Correspondences and Windows, displaying the English SSTC for "0the1 basic2 idea3 of4 example5 –6 based7 parsing8 is9 very10 simple11" (root S(Ø/0-11)) and the Malay SSTC for "0Idea1 asas2 bagi3 penghuraian4 berasaskan5 –6 contoh7 adalah8 mudah9" (root S(Ø/0-9)).]


Discussion

Thank you…..

