Human judgements in parallel treebank alignment


Martin Volk, Torsten Marek, Yvonne Samuelsson

University of Zurich and Stockholm University

[email protected]

English Syntax Tree

SMULTRON

  • Stockholm MULtilingual TReebank

  • 1000 sentences in 3 languages (DE-EN-SV)

    • 500 from Jostein Gaarder’s Sophie’s World (~7 500 tokens, 14 tokens/sentence) and

    • 500 from Economy texts (~11 000 tokens, 22 tokens/sentence)

      • ABB Quarterly report

      • Rainforest Alliance: Banana Certification Program

      • SEB Annual report

    • Released: January 2008

German Annotation

German sentence: flat annotation

German sentence: deepened

English Annotation

English Syntax Tree

English annotation

  • Follows the Penn Treebank guidelines

  • Slower annotation because of

    • insertion of traces

    • secondary edges

    • deeper trees

Tree Alignment

  • Sentence alignment

  • Word alignment

    • input for Statistical MT

  • Phrase alignment

    • linguistically motivated phrases

    • input for Example-based MT

Alignment Example

Tools for Parallel Treebanks

  • creating and editing trees

    • from mono-lingual treebanks

    • PoS-taggers, chunkers, editor, ’tree-enricher’

  • aligning phrases

    • use of word alignment tools

    • tree alignment editor → Stockholm TreeAligner

  • searching across languages

    • TIGER-Search for parallel treebanks → Stockholm TreeAligner

Guidelines for Alignment

  • Align words and phrases that represent the same meaning and could serve as translation units in an MT system.

  • Align as many words and phrases as possible.

  • Distinguish between exact and approximate alignments.

  • 1:n word / phrase alignments are allowed, but not m:n word / phrase alignments.

  • m:n sentence alignments are allowed.
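
The 1:n restriction above lends itself to an automatic check. The following is a minimal sketch, not the authors' tooling: it assumes each word or phrase alignment is stored as a pair of hypothetical node identifiers, and it flags every link whose source node has several targets while its target node also has several sources, which is exactly the m:n case the guidelines rule out.

```python
from collections import defaultdict


def find_many_to_many(links):
    """Return the links that take part in an m:n alignment (m > 1 and n > 1).

    `links` is an iterable of (source_node, target_node) pairs; the node
    identifiers are hypothetical and only need to be hashable.
    """
    links = list(links)  # the links are traversed twice
    targets_of = defaultdict(set)
    sources_of = defaultdict(set)
    for src, tgt in links:
        targets_of[src].add(tgt)
        sources_of[tgt].add(src)
    # A link breaks the rule when its source has several targets
    # and, at the same time, its target has several sources.
    return [
        (src, tgt)
        for src, tgt in links
        if len(targets_of[src]) > 1 and len(sources_of[tgt]) > 1
    ]


# 1:n is fine, so only the link that closes the m:n pattern is reported.
example = [("de1", "en1"), ("de1", "en2"), ("de2", "en2")]
violations = find_many_to_many(example)
```

Sentence alignments would be excluded from this check, since m:n is allowed at that level.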



  • Do not align:

    • die Verwunderung über das Leben

    • their astonishment at the world

  • Do align:

    • was für eine seltsame Welt

    • what an extraordinary world

Specific rules

  • a pronoun in one language shall never be aligned with a full noun in the other

  • names are aligned regardless of spelling, unless the name is changed (fiction)

  • ignore number/case but not voice
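
The pronoun rule is one of the few that can be tested mechanically from part-of-speech tags. The sketch below is an illustration under stated assumptions: German tokens are taken to carry STTS tags and English tokens Penn Treebank tags, the tag sets are deliberately reduced to personal pronouns and nouns, and the input format (word and tag pairs) is hypothetical.

```python
# Simplified tag sets (assumption): STTS tags on the German side,
# Penn Treebank tags on the English side.
PRONOUN_TAGS = {"PPER", "PRP"}                   # personal pronouns
NOUN_TAGS = {"NN", "NE", "NNS", "NNP", "NNPS"}   # common and proper nouns


def pronoun_noun_links(word_links):
    """Return word pairs that align a pronoun with a full noun.

    `word_links` is an iterable of ((de_word, de_tag), (en_word, en_tag)).
    """
    flagged = []
    for (de_word, de_tag), (en_word, en_tag) in word_links:
        de_pron_en_noun = de_tag in PRONOUN_TAGS and en_tag in NOUN_TAGS
        de_noun_en_pron = de_tag in NOUN_TAGS and en_tag in PRONOUN_TAGS
        if de_pron_en_noun or de_noun_en_pron:
            flagged.append((de_word, en_word))
    return flagged


example = [
    (("sie", "PPER"), ("Sophie", "NNP")),  # pronoun vs. name: flagged
    (("Welt", "NN"), ("world", "NN")),     # noun vs. noun: accepted
]
flagged = pronoun_noun_links(example)
```

A fuller check would also cover other pronoun classes, such as possessive and relative pronouns.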

Exact vs approximate alignment

  • best vs. “second-best” translation

  • an acronym in one language shall be aligned as approximate (fuzzy) with a spelled-out term in the other

    • PT – Power Technologies

  • difficult distinctions

    • einer der ersten Tage im Mai – early May
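
The acronym rule can be supported by a small heuristic. The sketch below assumes that an acronym consists of the initial letters of a multi-word term, as in the PT example above, and that the helper names and the "fuzzy" label are hypothetical. It only suggests a label and leaves every other case, including the difficult distinctions, to the annotator.

```python
def is_acronym_of(short, long):
    """True if `short` equals the initials of the multi-word term `long`."""
    words = long.split()
    if len(words) < 2:
        return False
    initials = "".join(word[0] for word in words)
    return short.upper() == initials.upper()


def suggest_type(source, target):
    """Suggest "fuzzy" for acronym vs. spelled-out term, else no suggestion."""
    if is_acronym_of(source, target) or is_acronym_of(target, source):
        return "fuzzy"
    return None  # leave the decision to the human annotator


suggestion = suggest_type("PT", "Power Technologies")
```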

Related Research

  • Blinker project (Melamed)

  • Prague Czech-English Treebank

  • Example-based MT in Dublin

  • Linköping English-Swedish Treebank



  • 12 students to align 20 tree pairs DE-EN

    • 10 tree pairs from Sophie’s world

    • 10 tree pairs from Economy text

  • advanced CL students

  • received

    • short introduction

    • the written guidelines

Gold Standard Alignment (DE-EN)

Experiment: Results

The students created a widely varying number of alignments:

  • Sophie part: from 47 to 125 (ø = 94.3)

  • Econ part: from 62 to 259 (ø = 186.9)

  • The 3 students with the lowest numbers were non-native speakers of German.

  • 1 student had misunderstood the task.

Experiment: Results

  • The remaining 8 students had a high overlap with the gold standard (Recall):

    • Sophie part: from 48% to 81% (ø = 68.7%)

    • Econ part: from 66% to 89% (ø = 75.5%)

  • Precision

    • Sophie part: from 81% to 97% (ø = 89.1%)

    • Econ part: from 78% to 94% (ø = 88.2%)
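
For reference, recall and precision against a gold standard can be computed as below. This is a sketch under assumptions: the slides do not state how the figures were calculated, so the code treats an alignment as a plain pair of hypothetical node identifiers and ignores the exact/approximate distinction.

```python
def precision_recall(student_links, gold_links):
    """Compare one annotator's alignment links with the gold standard.

    Both arguments are iterables of (source_node, target_node) pairs.
    Returns (precision, recall) as fractions between 0 and 1.
    """
    student, gold = set(student_links), set(gold_links)
    correct = student & gold
    precision = len(correct) / len(student) if student else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


gold = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
student = {("a", "x"), ("b", "y"), ("c", "q")}
precision, recall = precision_recall(student, gold)
# 2 of the 3 student links are in the gold standard, and
# 2 of the 4 gold links were found by the student.
```

A student who aligns cautiously creates few links, which tends to keep precision high and recall low. That pattern matches the reported averages, where precision is higher than recall in both parts.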



  • students sometimes aligned a word (or some words) with a node

    • e.g. the word natürlich to the phrase of course

  • students sometimes aligned a German verb group with a single verb form in English

    • e.g. ist zurückzuführen vs. reflecting



based on different grammatical forms:

  • a definite single NP in German with an indefinite plural NP in English

    • der Umsatz vs. revenues

  • a German genitive NP with a PP in English

    • der beiden Divisionen vs. of the two divisions

Missed by all students

  • alignment of a German word to an empty token in English

    • wenn sie die Hand ausstreckte vs. herself shaking hands



  • Our alignment guidelines are sufficient for a core of clear alignment decisions.

  • Needed:

    • Better alignment rules with concrete examples.

    • Better support tools (consistency checking).

  • The distinction between exact alignment and approximate alignment is very tricky.

Thank You for Your Attention!

  • Questions?

Applications of Parallel Treebanks

For the Translator

  • corpus for translation studies

    • search tools needed

For the Computational Linguist

  • input for Example-based Machine Translation

  • evaluation corpus for word, phrase or clause alignment

  • training corpus for transfer rules

Alignment Example

Parallel Treebanking

[Figure: annotation workflow]

  • DE sentence → PoS tagger (STTS) → Chunker (TIGER) → flat DE tree

  • SV sentence → PoS tagger (SUC) → STTS conversion → Chunker (SWE-TIGER) → flat SV tree

  • flat DE tree, flat SV tree → Deepening + Back conv. → DE tree, SV tree

  • DE tree, SV tree → phrase alignment
