Human Judgements in Parallel Treebank Alignment

Martin Volk, Torsten Marek, Yvonne Samuelsson

University of Zurich and Stockholm University

English Syntax Tree

SMULTRON

  • Stockholm MULtilingual TReebank

  • 1000 sentences in 3 languages (DE-EN-SV)

    • 500 from Jostein Gaarder’s Sophie’s World (~ 7 500 tokens, 14 tokens/sentence) and

    • 500 from Economy texts (~ 11 000 tokens, 22 tokens/sentence)

      • ABB Quarterly report

      • Rainforest Alliance: Banana Certification Program

      • SEB Annual report

    • Released: January 2008

German Annotation

German sentence: flat annotation

German sentence: deepened

English Annotation

English Syntax Tree

English annotation

  • Follows the Penn Treebank guidelines

  • Slower annotation because of

    • insertion of traces

    • secondary edges

    • deeper trees

Tree Alignment

  • Sentence alignment

  • Word alignment

    • input for Statistical MT

  • Phrase alignment

    • linguistically motivated phrases

    • input for Example-based MT

Alignment Example

Tools for Parallel Treebanks

  • creating and editing trees

    • from mono-lingual treebanks

    • PoS-taggers, chunkers, editor, ’tree-enricher’

  • aligning phrases

    • use of word alignment tools

    • tree alignment editor → Stockholm TreeAligner

  • searching across languages

    • TIGER-Search for parallel treebanks → Stockholm TreeAligner

Guidelines for Alignment

  • Align words and phrases that represent the same meaning and could serve as translation units in an MT system.

  • Align as many words and phrases as possible.

  • Distinguish between exact and approximate alignments.

  • 1:n word / phrase alignments are allowed, but not m:n word / phrase alignments.

  • m:n sentence alignments are allowed.
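
The 1:n restriction on word / phrase alignments can be checked mechanically. The following is a minimal sketch, assuming alignments are stored as links between node ids. The `Link` class, its fields and `find_m_to_n` are hypothetical names chosen for illustration; they are not part of the Stockholm TreeAligner.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One alignment link between a German and an English node (word or phrase)."""
    de: str      # hypothetical node id in the German tree
    en: str      # hypothetical node id in the English tree
    exact: bool  # True for exact, False for approximate (fuzzy)


def find_m_to_n(links):
    """Return the links whose two endpoints both take part in another link.

    Any such link signals an m:n configuration (m > 1 and n > 1),
    which the guidelines do not allow for words and phrases.
    """
    de_partners = defaultdict(set)
    en_partners = defaultdict(set)
    for link in links:
        de_partners[link.de].add(link.en)
        en_partners[link.en].add(link.de)
    return [
        link for link in links
        if len(de_partners[link.de]) > 1 and len(en_partners[link.en]) > 1
    ]


# A 1:2 alignment is fine; adding a second German node on "e2" makes it 2:2.
allowed = [Link("d1", "e1", True), Link("d1", "e2", False)]
violating = allowed + [Link("d2", "e2", True)]
print(find_m_to_n(allowed))
print(find_m_to_n(violating))
```

A check of this kind would only cover the 1:n rule; the semantic guidelines (same meaning, usable as a translation unit) still need a human judgement.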



  • Do not align:

    • die Verwunderung über das Leben

    • their astonishment at the world

  • Do align:

    • was für eine seltsame Welt

    • what an extraordinary world

Specific rules

  • a pronoun in one language shall never be aligned with a full noun in the other

  • names are aligned regardless of spelling, unless the name is changed (fiction)

  • ignore number/case but not voice

Exact vs approximate alignment

  • best vs. “second-best” translation

  • an acronym in one language shall be aligned as approximate (fuzzy) with a spelled-out term in the other

    • PT – Power Technologies

  • difficult distinctions

    • einer der ersten Tage im Mai – early May

Related Research

  • Blinker project (Melamed)

  • Prague Czech-English Treebank

  • Example-based MT in Dublin

  • Linköping English-Swedish Treebank



  • 12 students were asked to align 20 DE-EN tree pairs

    • 10 tree pairs from Sophie’s World

    • 10 tree pairs from Economy texts

  • advanced CL students

  • received

    • short introduction

    • the written guidelines

Gold Standard Alignment (DE-EN)

Experiment: Results

The students created

  • a huge variety in number of alignments

    • Sophie part: from 47 to 125 (ø = 94.3)

    • Econ part: from 62 to 259 (ø = 186.9)

  → the 3 students with the lowest numbers were non-native speakers of German

  → 1 student had misunderstood the task

Experiment: Results

  • The remaining 8 students had a high overlap with the gold standard (Recall):

    • Sophie part: from 48% to 81% (ø = 68.7%)

    • Econ part: from 66% to 89% (ø = 75.5%)

  • Precision

    • Sophie part: from 81% to 97% (ø = 89.1%)

    • Econ part: from 78% to 94% (ø = 88.2%)
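
Recall and precision against the gold standard can be computed as set overlap. The following is a minimal sketch, assuming each alignment is a (DE node, EN node) pair and that the exact / approximate label is ignored. The slides do not state whether the reported figures took the label into account, and the node ids below are made up for illustration.

```python
def precision_recall(student, gold):
    """Compare a student's alignment links with the gold standard links.

    precision = correct student links / all student links
    recall    = correct student links / all gold standard links
    """
    student, gold = set(student), set(gold)
    correct = student & gold
    precision = len(correct) / len(student) if student else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical links: the student found 2 of 4 gold links and added 1 wrong one.
gold = {("d1", "e1"), ("d2", "e2"), ("d3", "e3"), ("d4", "e4")}
student = {("d1", "e1"), ("d2", "e2"), ("d9", "e9")}
precision, recall = precision_recall(student, gold)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
```

Under this definition, a student who aligns few nodes but aligns them correctly gets high precision and low recall, which matches the pattern reported above.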



  • students sometimes aligned a word (or some words) with a node.

    • e.g. the word natürlich to the phrase of course

  • students sometimes aligned a German verb group with a single verb form in English

    • e.g. ist zurückzuführen vs. reflecting



based on different grammatical forms:

  • a definite single NP in German with an indefinite plural NP in English

    • der Umsatz vs. revenues

  • a German genitive NP with a PP in English

    • der beiden Divisionen vs. of the two divisions

Missed by all students

  • alignment of German word to empty token in English

    • wenn sie die Hand ausstreckte vs.

    • herself shaking hands



  • Our alignment guidelines are sufficient for a core of clear alignment decisions.

  • Needed:

    • Better alignment rules with concrete examples.

    • Better support tools (consistency checking).

  • The distinction between exact alignment and approximate alignment is very tricky.

Thank You for Your Attention!

  • Questions???

Applications of Parallel Treebanks

For the Translator

  • corpus for translation studies

    • search tools needed

For the Computational Linguist

  • input for Example-based Machine Translation

  • evaluation corpus for word, phrase or clause alignment

  • training corpus for transfer rules

Alignment Example

Parallel Treebanking

  • DE sentence: PoS tagger (STTS), Chunker (TIGER) → flat DE tree

  • SV sentence: PoS tagger (SUC), STTS conversion, Chunker (SWE-TIGER) → flat SV tree

  • Deepening + Back conv. → DE tree, SV tree

  • phrase alignment
