# The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique


Example-Based Machine Translation Based on the Synchronous SSTC Annotation Schema


Computer Aided Translation Unit

School of Computer Sciences

University Science Malaysia

Presentation Outline

• Introduction

• Structured String-Tree Correspondence (SSTC)

• Synchronous Structured String-Tree Correspondence (SSTC)

• EBMT based on synchronous SSTC

• The Construction of a BKB Based on the Synchronous SSTC

• Bitext Word-level Mapping (Word Alignment)

• Bitext Synchronous Parsing Technique

The Structured String-Tree Correspondence (SSTC)

SSTC = string + arbitrary tree structure + correspondence

Correspondence = node(X/Y)

• X:SNODE = interval of the substring that corresponds to the node.

• Y:STREE = interval of the substring that corresponds to the subtree having the node as root.

String: 0 all 1 cats 2 eat 3 mice 4

Tree: eat(2-3/0-4) is the root, with the children cats(1-2/0-2) and mice(3-4/3-4); all(0-1/0-1) is the child of cats.

For the node eat, X:SNODE = 2-3 (substring "2 eat 3") and Y:STREE = 0-4 (substring "0 all 1 cats 2 eat 3 mice 4").

For the node cats, X:SNODE = 1-2 (substring "1 cats 2") and Y:STREE = 0-2 (substring "0 all 1 cats 2").
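The SNODE/STREE annotation of "all cats eat mice" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, its `stree` and `show` methods, and the `substring` helper are hypothetical names, and STREE is simplified to a single contiguous interval (the schema also allows discontiguous intervals such as 1-2+4-5).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Interval = Tuple[int, int]


@dataclass
class Node:
    """One SSTC tree node: a label, its SNODE interval, and its children."""
    label: str
    snode: Interval
    children: List["Node"] = field(default_factory=list)

    def stree(self) -> Interval:
        """STREE: span covered by this node together with its whole subtree.

        Assumption: the subtree covers one contiguous interval.
        """
        spans = [self.snode] + [child.stree() for child in self.children]
        return (min(s for s, _ in spans), max(e for _, e in spans))

    def show(self) -> str:
        """Render the node in the node(X/Y) notation."""
        (a, b), (c, d) = self.snode, self.stree()
        return f"{self.label}({a}-{b}/{c}-{d})"


def substring(tokens: List[str], interval: Interval) -> str:
    """Words of the string that fall inside the interval."""
    start, end = interval
    return " ".join(tokens[start:end])


tokens = "all cats eat mice".split()
all_ = Node("all", (0, 1))
cats = Node("cats", (1, 2), [all_])
mice = Node("mice", (3, 4))
eat = Node("eat", (2, 3), [cats, mice])

for node in (eat, cats, mice, all_):
    print(node.show(), "->", substring(tokens, node.stree()))
```

Only the SNODE intervals are stored; the STREE intervals are derived from the tree, which is one possible design choice, not necessarily how the SSTC tools store them.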

English source sentence: “he picks the ball up”

Malay target sentence: “dia kutip bola itu”

English string: 0he1pick2the3ball4up5

Malay string: 0dia1kutip2bola3itu4

English SSTC (E): root pick[v] up[p] (1-2+4-5/0-5); he[n] (0-1/0-1); ball[n] (3-4/2-4); the[det] (2-3/2-3)

Malay SSTC (M): root kutip[v] (1-2/0-4); dia[n] (0-1/0-1); bola[n] (2-3/2-4); itu[det] (3-4/3-4)

Translation units:

• IndexStree: (0-5,0-4), (0-1,0-1), (2-4,2-4), (2-3,3-4)

• IndexSnode: (1-2+4-5,1-2), (0-1,0-1), (3-4,2-3), (2-3,3-4)
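To make the translation units concrete, the following sketch resolves the IndexSnode and IndexStree pairs of the English-Malay example into the word sequences they cover. The helper names (`parse_span`, `words`, `translation_units`) and the string encoding of the pairs are assumptions made for this illustration; they are not part of the SSTC schema itself.

```python
from typing import List, Tuple

Span = List[Tuple[int, int]]


def parse_span(text: str) -> Span:
    """Parse '1-2+4-5' into [(1, 2), (4, 5)]; a lone '-' means no counterpart."""
    text = text.strip()
    if text == "-":
        return []
    spans = []
    for part in text.split("+"):
        start, end = part.split("-")
        spans.append((int(start), int(end)))
    return spans


def words(tokens: List[str], span: Span) -> str:
    """Collect the words covered by a (possibly discontiguous) span."""
    return " ".join(" ".join(tokens[a:b]) for a, b in span)


def translation_units(pairs: List[str], src: List[str], tgt: List[str]):
    """Turn 'source,target' interval pairs into (source words, target words)."""
    units = []
    for pair in pairs:
        s, t = pair.split(",")
        units.append((words(src, parse_span(s)), words(tgt, parse_span(t))))
    return units


english = "he pick the ball up".split()
malay = "dia kutip bola itu".split()

index_snode = ["1-2+4-5,1-2", "0-1,0-1", "3-4,2-3", "2-3,3-4"]
index_stree = ["0-5,0-4", "0-1,0-1", "2-4,2-4", "2-3,3-4"]

print(translation_units(index_snode, english, malay))
print(translation_units(index_stree, english, malay))
```

The IndexSnode pairs yield word-level units such as ("pick up", "kutip"), while the IndexStree pairs yield phrase-level units such as ("the ball", "bola itu").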

English source sentence: “I did not give it to him”

French target sentence: “Je ne le lui ai pas donné”

English string: 0I1did2not3give4it5to6him7

French string: 0Je1ne2le3lui4ai5pas6donné7

English SSTC (E): root not[neg] (2-3/0-7); did[v] give[v] (1-2+3-4/3-7); I[n] (0-1/0-1); it[n] (4-5/4-5); to[p] (5-6/5-7); him[n] (6-7/6-7)

French SSTC (F): root ne[neg] pas[neg] (1-2+5-6/0-7); ai[v] donné[v] (4-5+6-7/0-1+2-5+6-7); Je[n] (0-1/0-1); le[n] (2-3/2-3); lui[n] (3-4/3-4)

Translation units:

• IndexStree: (0-7,0-7), (0-2+3-7, 0-1+2-5+6-7), (0-1,0-1), ...

• IndexSnode: (2-3, 1-2+5-6), (1-2+3-4, 4-5+6-7), (0-1,0-1), (4-5,2-3), (5-6, - ), (6-7,3-4)

English source sentence: “hopefully Kim miss Dale”

French target sentence: “on espère que Dale manque à Kim”

English string: 0hopefully1Kim2miss3Dale4

French string: 0on1espère2que3Dale4manque5à6Kim7

English SSTC (E): root miss[v] (2-3/0-4); hopefully (0-1/0-1); Kim[n] (1-2/1-2); Dale[n] (3-4/3-4)

French SSTC (F): root manque[v] à[p] (4-5+5-6/0-7); on[n] espère[v] que[c] (0-1+1-2+2-3/0-3); Dale[n] (3-4/3-4); Kim[n] (6-7/6-7)

Translation units:

• IndexStree: (0-4,0-7), (0-1,0-3), (1-2,6-7), (3-4,3-4)

• IndexSnode: (2-3,4-5+5-6), (0-1,0-1+1-2+2-3), (1-2,6-7), (3-4,3-4)

Example-Based Machine Translation (EBMT)

EBMT is the case-based reasoning approach to MT

EBMT uses translated examples of similar sentences to translate a given source sentence into the target language.

The general architecture for EBMT

• Source sentence: find the closest related SL examples.

• Retrieve the corresponding TL examples.

• Combination: produce the target sentence.

The BKB holds the examples for the source language and for the target language, together with the correspondence between them.

EBMT based on synchronous SSTC

• The source sentence passes through the tagger, giving the tagged source sentence.

• A closest synchronous SSTC example is chosen from the BKB.

• A list of sub-synchronous SSTCs is constructed from the chosen example.

• A list of sub-synchronous SSTCs is generated based on the source sentence.

• The resultant synchronous SSTC gives the target sentence.

Different senses for the word “bank”:

bank 1: land beside a river.

bank 2: a place to keep money.

E.g.: The1 man2 keep1 his1 money1 in1 the1 bank2.

Replacement & Combination

Source sentence: The old man picks the green lamp up

Example 1. English sentence: He picks the ball up. Malay translation: Dia kutip bola itu.

Example 2. English sentence: The lamp is off. Malay translation:

Example 3. English sentence: The green signal turns on. Malay translation: Isyarat hijau itu bertukar.

Example 4. English sentence: The old man drinks tea. Malay translation: Lelaki tua itu minum teh.

Example 1 (1E, 1M)

English sentence: He picks the ball up. Malay translation: Dia kutip bola itu.

English string: 0he1pick2the3ball4up5

Malay string: 0dia1kutip2bola3itu4

1E: root pick(1)[v] up(1)[p] (1-2+4-5/0-5); he(1)[n] (0-1/0-1); ball(1)[n] (3-4/2-4); the(1)[det] (2-3/2-3)

1M: root kutip(1)[v] (1-2/0-4); dia(1)[n] (0-1/0-1); bola(1)[n] (2-3/2-4); itu(1)[det] (3-4/3-4)

• IndexStree: (0-5,0-4), (0-1,0-1), (2-4,2-4), (2-3,3-4)

• IndexSnode: (1-2+4-5,1-2), (0-1,0-1), (3-4,2-3), (2-3,3-4)

Example 2 (2E, 2M)

English sentence: The lamp is off. Malay translation:

English string: 0the1lamp2is3off4

2E: root (2-3+3-4/0-4), covering "is off"; lamp(1)[n] (1-2/0-2); the(1)[det] (0-1/0-1)

2M: root (2-3/0-3); lampu(1)[n] (0-1/0-2); itu(1)[det] (1-2/1-2)

• IndexStree: (0-4,0-4), (0-2,0-2), (0-1,1-2)

• IndexSnode: (2-3+3-4,2-3), (1-2,0-1), (0-1,1-2)

The set of synchronous SSTCs represents the example-base.

Example 3 (3E, 3M)

English sentence: The green signal turns on. Malay translation: Isyarat hijau itu bertukar.

English string: 0the1green2signal3turn4on5

Malay string: 0Isyarat1hijau2itu3bertukar4

3E: root (3-4+4-5/0-5), covering "turn on"; signal(2)[n] (2-3/0-3); the(1)[det] (0-1/0-1); (1-2/1-2)

3M: root bertukar(2)[v] (3-4/0-4); isyarat(1)[n] (0-1/0-3); itu(1)[det] (2-3/2-3); (1-2/1-2)

• IndexStree: (0-5,0-4), (0-3,0-3), (0-1,2-3), (1-2,1-2)

• IndexSnode: (3-4+4-5,3-4), (2-3,0-1), (0-1,2-3), (1-2,1-2)

Example 4 (4E, 4M)

English sentence: The old man drinks tea. Malay translation: Lelaki tua itu minum teh.

English string: 0the1old2man3drink4tea5

Malay string: 0lelaki1tua2itu3minum4teh5

4E: root drink(1)[v] (3-4/0-5); man(1)[n] (2-3/0-3); the(1)[det] (0-1/0-1); (1-2/1-2); tea(1)[n] (4-5/4-5)

4M: root minum(1)[v] (3-4/0-5); lelaki(1)[n] (0-1/0-3); itu(1)[det] (2-3/2-3); (1-2/1-2); teh(1)[n] (4-5/4-5)

• IndexStree: (0-5,0-5), (0-3,0-3), (0-1,2-3), (1-2,1-2), (4-5,4-5)

• IndexSnode: (3-4,3-4), (2-3,0-1), (0-1,2-3), (1-2,1-2), (4-5,4-5)

Source: the old man picks the green lamp up

Source sentence tree: pick[v] (3-4/0-8) is the root; man[n] (2-3/0-3) has the children the[det] (0-1/0-1) and (1-2/1-2); lamp[n] (6-7/4-7) has the children the[det] (4-5/4-5) and (5-6/5-6); up[p] (7-8/-).

English trees of the examples:

• 0the1boy2pick3the4ball5up6: pick[v] up[p] (2-3+5-6/0-6); boy[n] (1-2/0-2); the[det] (0-1/0-1); ball[n] (4-5/3-5); the[det] (3-4/3-4)

• 0the1green2signal3turn4on5: signal[n] (2-3/0-3); the[det] (0-1/0-1); (1-2/1-2)

• 0the1old2man3drink4tea5: drink[v] (3-4/0-5); man[n] (2-3/0-3); the[det] (0-1/0-1); (1-2/1-2); tea[n] (4-5/4-5)

• 0the1lamp2is3off4: (2-3+3-4/0-4); lamp[n] (1-2/0-2); the[det] (0-1/0-1)

Sub-synchronous SSTCs for the source sentence, paired with the sub-synchronous SSTCs derived from the examples

(1) Source subtree: man[n] (2-3/0-3); the[det] (0-1/0-1); (1-2/1-2)

• English: man(1)[n] (2-3/0-3); the(1)[det] (0-1/0-1); (1-2/1-2); string 0the1old2man3

• Malay: lelaki(1)[n] (0-1/0-3); itu(1)[det] (2-3/2-3); (1-2/1-2); string 0lelaki1tua2itu3

• IndexStree: (0-3,0-3), (0-1,2-3), (1-2,1-2)

• IndexSnode: (2-3,0-1), (0-1,2-3), (1-2,1-2)

(2) Source node: pick[v] (3-4/0-8)

• English: pick(1)[v] (3-4/3-4); string 3pick4

• Malay: kutip(1)[v] (3-4/3-4); string 3kutip4

• IndexStree: (3-4,3-4)

• IndexSnode: (3-4,3-4)

(3) Source subtree: lamp[n] (6-7/4-7); the[det] (4-5/4-5); (5-6/5-6)

• English: lamp(1)[n] (6-7/4-7); the(1)[det] (4-5/4-5); (5-6/5-6); string 4the5green6lamp7

• Malay: lampu(1)[n] (4-5/4-7); itu(1)[det] (6-7/6-7); (5-6/5-6); string 4lampu5hijau6itu7

• IndexStree: (4-7,4-7), (4-5,6-7), (5-6,5-6)

• IndexSnode: (6-7,4-5), (4-5,6-7), (5-6,5-6)

(4) Source node: up[p] (7-8/-)

• English: up(1)[p] (7-8/7-8); string 7up8

• Malay: no counterpart

• IndexStree: (7-8,-)

• IndexSnode: (7-8,-)

Selected closest example: 0dia1kutip2bola3itu4
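The combination step can be illustrated with a minimal sketch that orders the Malay sides of the four sub-synchronous SSTCs by their target intervals and concatenates them. The `fragments` list is transcribed from the sub-synchronous SSTCs of the source sentence "the old man picks the green lamp up"; the `combine` function and the tuple layout are hypothetical, and a real system would combine trees rather than flat strings.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]
# (source words, source interval, target words or None, target interval or None)
Fragment = Tuple[str, Interval, Optional[str], Optional[Interval]]

fragments: List[Fragment] = [
    ("the old man", (0, 3), "lelaki tua itu", (0, 3)),
    ("pick", (3, 4), "kutip", (3, 4)),
    ("the green lamp", (4, 7), "lampu hijau itu", (4, 7)),
    ("up", (7, 8), None, None),  # (7-8,-): no counterpart on the Malay side
]


def combine(parts: List[Fragment]) -> str:
    """Concatenate the target sides in the order of their target intervals."""
    aligned = [p for p in parts if p[3] is not None]
    aligned.sort(key=lambda p: p[3][0])
    return " ".join(p[2] for p in aligned)


print(combine(fragments))
```

Fragments whose target side is empty, such as up (7-8,-), contribute nothing to the output string in this simplified view.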

EBMT General Problems

• How to utilize more than one example to translate one source sentence.

• The construction of well-formed target language sentences from extracted fragments of a BKB.

• Lack of flexibility in representing translation relations between source and target substrings.

• The treatment of wild linguistic phenomena, which are non-standard, e.g. crossed dependencies.

Our approach overcomes these problems.

Transfer Approach to MT

Source → Analysis → Transfer → Synthesis → Target


How to Construct the Bilingual Knowledge Bank (BKB), or Example-Base

Substantial Reservation !!!

S: English

The basic idea of example-based parsing is very simple: it is to find the corresponding representation for an input sentence based on the representations of similar sentences in the example-base.

T: Malay

Idea asas bagi penghuraian berasaskan-contoh adalah mudah: iaitu untuk mencari perwakilan yang sepadan bagi suatu ayat input berdasarkan perwakilan ayat yang serupa dalam pengkalan-contoh.

• The Construction of a BKB Based on the Synchronous SSTC

Based on Bitext Synchronous Parsing Technique

• Bitext: text that is available in two languages.

Schema

• Alignment process: the bitext (English source, Malay target) is aligned at the sentence level, the phrase level and the word level, using a bilingual dictionary.

• Parsing & POS tagging for the English source text with the Apple Pie Parser, which outputs bracketed trees such as ( S ( NP . ( ..(..))) and ( S ( VP …( ..(..))).

• Compile the APP output into SSTC for the English source text.

• Build the SSTC for the Malay target text based on the SSTC for the English source text, using the word alignment.

• The resulting synchronous SSTC goes through the SSTC Editor into the BKB.


Bitext Word-level Mapping (Word Alignment)

Real texts are noisy:

- Fertility = a single word in the source sentence may correspond to zero, one, two or more words in the target sentence, and vice versa.

- Crossed dependencies (distortion) = human translators change and rearrange material, so the target text does not follow the word order of the source text.

S: English

T: Malay

0The1basic2idea3of4example5-6based7parsing8is9very10simple11: 12It13is14to15find16the17corresponding18representation19for20an21input22sentence23based24on25the26representations27of28similar29sentences 30in31the32example33-34base35 .36

±n Context Window Word Alignment

The correspondence between the source and the target is denoted by an interval attached to each subtext according to its offset in the text.
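The offset notation used throughout the slides (for example 0he1pick2the3ball4up5) can be produced with a short sketch like the one below. The tokenizer is an assumption: it treats every run of word characters and every punctuation mark, including the hyphen in "example-based", as one unit, which matches the numbering shown for the sample text but may differ from the tokenizer actually used.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Split into words and single punctuation marks (assumed tokenization)."""
    return re.findall(r"\w+|[^\w\s]", text)


def annotate(text: str) -> str:
    """Interleave offsets with tokens: 0 tok 1 tok 2 ..., written without spaces."""
    out = "0"
    for offset, token in enumerate(tokenize(text), start=1):
        out += f"{token}{offset}"
    return out


def intervals(text: str) -> List[Tuple[str, Tuple[int, int]]]:
    """Pair every token with the interval (i, i+1) it occupies."""
    return [(tok, (i, i + 1)) for i, tok in enumerate(tokenize(text))]


print(annotate("he pick the ball up"))
print(intervals("example-based parsing"))
```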

Cognate words

Computer ↔ Komputer

Dice coefficient

Dice = 2prob(S,T) / [prob(S) + prob(T)]

• prob(S), prob(T): the probabilities that S and T occur in the text.

• prob(S,T): the probability that both co-occur in the same bitext segment.
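As a minimal sketch of the Dice coefficient, the probabilities can be estimated as relative frequencies over aligned bitext segments. That estimate, the `dice` function name, and the three-segment toy bitext (reusing example sentences from this presentation) are assumptions for illustration only.

```python
from typing import List, Tuple

Segment = Tuple[List[str], List[str]]  # (source tokens, target tokens)


def dice(s: str, t: str, segments: List[Segment]) -> float:
    """Dice = 2*prob(S,T) / (prob(S) + prob(T)), estimated per bitext segment."""
    n = len(segments)
    if n == 0:
        return 0.0
    prob_s = sum(1 for src, _ in segments if s in src) / n
    prob_t = sum(1 for _, tgt in segments if t in tgt) / n
    prob_st = sum(1 for src, tgt in segments if s in src and t in tgt) / n
    if prob_s + prob_t == 0:
        return 0.0
    return 2 * prob_st / (prob_s + prob_t)


# Toy bitext, one (English, Malay) pair per segment.
segments: List[Segment] = [
    ("he pick the ball up".split(), "dia kutip bola itu".split()),
    ("the green signal turn on".split(), "isyarat hijau itu bertukar".split()),
    ("the old man drink tea".split(), "lelaki tua itu minum teh".split()),
]

print(dice("ball", "bola", segments))  # always co-occur in this toy bitext
print(dice("he", "itu", segments))     # co-occur in only one of three segments
```

On such a small sample the scores are not meaningful in themselves; the point is only that pairs which always co-occur score higher than pairs which co-occur by accident.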

Find the TPCs between the source and the target, using the bilingual dictionary.

For the source word Example(4-5), the dictionary gives two TPCs in the target: contoh(6-7) and contoh(28-29).

• Source context: basic(1-2) idea(2-3) of(3-4) example(4-5) – (5-6) based(6-7) parsing(7-8)

• Target context of contoh(6-7): bagi(2-3) penghuraian(3-4) berasaskan(4-5) – (5-6) contoh(6-7)

• Target context of contoh(28-29): – (27-28) contoh(28-29)

Find out the chains for all possible TPCs for a source word.

For every chain, calculate the weight W, defined in terms of:

• len(seq): length of the continuous sequence of words.

• len(gap): length of the gaps between the words in the chain.

• len(chain): length of the chain.

For Example(4-5), the chain through contoh(6-7) has W=1.39 and the chain through contoh(28-29) has W=0.60.


• Bitext Synchronous Parsing Technique


Apple Pie Parser (APP)

• It is a bottom-up probabilistic chart parser that finds the parse tree for an input text (English).

• It was developed at New York University.

• The parser generates a syntactic tree in Penn Treebank bracketing.

• It is free and available for download with its source code.

• http://cs.nyu.edu/cs/projects/proteus/sekine

Input to APP: The basic idea of example-based parsing is very simple

APP output: (S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple)))

The representation structure and the POS for the English source text are obtained.


Compile the APP output to SSTC structure

APP output: (S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple)))

String: 0the1basic2idea3of4example5-6based7parsing8is9very10simple11

Tree:

| Node | X/Y | Parent |
|---|---|---|
| S | Ø/0-11 | (root) |
| NP | Ø/0-8 | S |
| NPL(1) | Ø/0-3 | NP |
| The basic idea | 0-3/0-3 | NPL(1) |
| PP(1) | Ø/3-8 | NP |
| of | 3-4/3-4 | PP(1) |
| NPL(1) | Ø/4-8 | PP(1) |
| Example-based parsing | 4-8/4-8 | NPL(1) |
| VP | Ø/8-11 | S |
| is | 8-9/8-9 | VP |
| ADJP | Ø/9-11 | VP |
| Very simple | 9-11/9-11 | ADJP |
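Compiling the bracketed parser output into STREE intervals can be sketched as a single pass over the bracket tokens. This is an assumption-laden illustration rather than the actual compiler: the function names are hypothetical, hyphens are split off as separate tokens so that the offsets match the slide (example 4-5, - 5-6, based 6-7), and constituent labels are assumed not to contain hyphens.

```python
import re
from typing import List, Tuple


def tokenize_brackets(text: str) -> List[str]:
    """Brackets, hyphens and words each become one token."""
    return re.findall(r"[()]|-|[^\s()\-]+", text)


def compile_sstc(bracketed: str) -> Tuple[List[str], List[Tuple[str, int, int, int]]]:
    """Return (words, nodes); each node is (label, start, end, depth)."""
    tokens = tokenize_brackets(bracketed)
    words: List[str] = []
    nodes: List[list] = []
    stack: List[int] = []  # indices of constituents that are still open
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            # The token right after "(" is the constituent label.
            nodes.append([tokens[i + 1], len(words), None, len(stack)])
            stack.append(len(nodes) - 1)
            i += 2
        elif tok == ")":
            # Closing a constituent fixes the end of its STREE interval.
            nodes[stack.pop()][2] = len(words)
            i += 1
        else:
            words.append(tok)
            i += 1
    return words, [tuple(n) for n in nodes]


app_output = ("(S (NP (NPL The basic idea) (PP of (NPL example-based parsing)))"
              " (VP is (ADJP very simple)))")
words, nodes = compile_sstc(app_output)
for label, start, end, depth in nodes:
    print("  " * depth + f"{label}(Ø/{start}-{end})")
```

Non-terminal nodes carry no SNODE interval of their own, which is why the sketch prints Ø for X and only computes the STREE interval Y.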

Lexical Transfer

English: The basic idea of example-based parsing is very simple

English string: 0the1basic2idea3of4example5-6based7parsing8is9very10simple11

| Node | English (X/Y) | Malay (X/Y) |
|---|---|---|
| S | Ø/0-11 | Ø/0-9 |
| NP | Ø/0-8 | Ø/0-7 |
| NPL(1) | Ø/0-3 | Ø/0-2 |
| leaf | The basic idea (0-3/0-3) | Idea asas (0-2/0-2) |
| PP(1) | Ø/3-8 | Ø/2-7 |
| leaf | of (3-4/3-4) | bagi (2-3/2-3) |
| NPL(1) | Ø/4-8 | Ø/3-7 |
| leaf | Example-based parsing (4-8/4-8) | penghuraian berasaskan-contoh (3-7/3-7) |
| VP | Ø/8-11 | Ø/7-9 |
| leaf | is (8-9/8-9) | adalah (7-8/7-8) |
| ADJP | Ø/9-11 | Ø/8-9 |
| leaf | Very simple (9-11/9-11) | mudah (8-9/8-9) |
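One way to picture lexical transfer is as a projection: the English tree shape is kept, each English leaf interval is replaced by its aligned Malay interval, and the Malay STREE intervals are recomputed bottom-up. The sketch below makes several assumptions: the nested-tuple tree encoding, the `project` function, and the `alignment` table (read off the leaf correspondences of the example) are hypothetical, and every projected subtree is assumed to be contiguous on the Malay side.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]

# English SSTC shape: (label, children); a child is a subtree or a leaf interval.
english_tree = ("S", [
    ("NP", [
        ("NPL", [(0, 3)]),                        # The basic idea
        ("PP", [(3, 4), ("NPL", [(4, 8)])]),      # of / example-based parsing
    ]),
    ("VP", [(8, 9), ("ADJP", [(9, 11)])]),        # is / very simple
])

# Word alignment: English leaf interval -> Malay leaf interval.
alignment: Dict[Interval, Interval] = {
    (0, 3): (0, 2),   # Idea asas
    (3, 4): (2, 3),   # bagi
    (4, 8): (3, 7),   # penghuraian berasaskan-contoh
    (8, 9): (7, 8),   # adalah
    (9, 11): (8, 9),  # mudah
}


def project(tree, align: Dict[Interval, Interval]):
    """Return (label, Malay STREE interval, projected children)."""
    label, children = tree
    projected, spans = [], []
    for child in children:
        if isinstance(child[0], int):      # leaf: an English interval
            target = align[child]
            projected.append(target)
            spans.append(target)
        else:                              # subtree: recurse first
            sub = project(child, align)
            projected.append(sub)
            spans.append(sub[1])
    stree = (min(s for s, _ in spans), max(e for _, e in spans))
    return (label, stree, projected)


malay_tree = project(english_tree, alignment)
print(malay_tree[0], malay_tree[1])
```

With this alignment the projected root covers 0-9, NP covers 0-7 and VP covers 7-9, which agrees with the Malay intervals in the example.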


SSTC Editor window (menus: File, Edit, Correspondences, Windows) displaying the English SSTC rooted at S(Ø/0-11) beside the Malay SSTC rooted at S(Ø/0-9).