Human judgements in parallel treebank alignment



Martin Volk, Torsten Marek, Yvonne Samuelsson

University of Zurich and Stockholm University

[email protected]


English Syntax Tree


DE – EN Alignment


SMULTRON

  • Stockholm MULtilingual TReebank

  • 1000 sentences in 3 languages (DE-EN-SV)

    • 500 from Jostein Gaarder’s Sophie’s World (~ 7 500 tokens, 14 tokens/sentence) and

    • 500 from Economy texts (~ 11 000 tokens, 22 tokens/sentence)

      • ABB Quarterly report

      • Rainforest Alliance: Banana Certification Program

      • SEB Annual report

    • Released: January 2008 www.ling.su.se/dali/research/smultron/index.htm


German Annotation


German sentence: flat annotation


German sentence: deepened


English Annotation


English Syntax Tree


English annotation

  • Follows the Penn Treebank guidelines

  • Slower annotation because of

    • insertion of traces

    • secondary edges

    • deeper trees


Tree Alignment


  • Sentence alignment

  • Word alignment

    • input for Statistical MT

  • Phrase alignment

    • linguistically motivated phrases

    • input for Example-based MT


Alignment Example


Tools for Parallel Treebanks

  • creating and editing trees

    • from mono-lingual treebanks

    • PoS-taggers, chunkers, editor, ’tree-enricher’

  • aligning phrases

    • use of word alignment tools

    • tree alignment editor → Stockholm TreeAligner

  • searching across languages

    • TIGER-Search for parallel treebanks → Stockholm TreeAligner


Guidelines for Alignment

  • Align words and phrases that represent the same meaning and could serve as translation units in an MT system.

  • Align as many words and phrases as possible.

  • Distinguish between exact and approximate alignments.

  • 1:n word / phrase alignments are allowed, but not m:n word / phrase alignments.

  • m:n sentence alignments are allowed.
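
The 1:n restriction on word and phrase links can be checked mechanically. The following is a minimal sketch, not part of the original guidelines or the Stockholm TreeAligner. It assumes links are stored as (source node, target node) pairs and reads "m:n" as any link whose two end points both take part in more than one link. The node IDs and the function name are hypothetical.

```python
from collections import Counter

def find_mn_violations(links):
    """Return links whose source AND target each occur in more than one link.

    Assumption: a 1:n alignment may fan out from one side only, so a link
    with fan-out on both sides is treated as part of an m:n alignment.
    """
    src_degree = Counter(src for src, _ in links)
    tgt_degree = Counter(tgt for _, tgt in links)
    return [(s, t) for s, t in links if src_degree[s] > 1 and tgt_degree[t] > 1]

# Hypothetical node IDs: de_1 aligned to two English nodes is a legal 1:n link.
links_ok = [("de_1", "en_1"), ("de_1", "en_2"), ("de_3", "en_4")]
# Adding de_2 -> en_2 makes en_2 and de_1 both multiply linked (m:n).
links_bad = [("de_1", "en_1"), ("de_1", "en_2"), ("de_2", "en_2")]

print(find_mn_violations(links_ok))
print(find_mn_violations(links_bad))
```

The check only flags the link that sits at the crossing point; an annotator would still have to decide which of the competing links to drop.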


Examples

  • Do not align:

    • die Verwunderung über das Leben

    • their astonishment at the world

  • Do align:

    • was für eine seltsame Welt

    • what an extraordinary world


Specific rules

  • a pronoun in one language shall never be aligned with a full noun in the other

  • names are aligned regardless of spelling, unless the name is changed (fiction)

  • ignore number/case but not voice
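
The pronoun rule lends itself to an automatic consistency check. The sketch below is only an illustration under stated assumptions: word links carry PoS tags, German uses STTS tags and English uses Penn Treebank tags, and the two tag sets shown are a small illustrative subset rather than a complete list. The function name and the sample word pairs are made up for the example.

```python
# Illustrative subsets only (assumption): STTS PPER and Penn PRP for personal
# pronouns; STTS NN/NE and Penn NN/NNS/NNP/NNPS for nouns and names.
PRONOUN_TAGS = {"PPER", "PRP"}
NOUN_TAGS = {"NN", "NE", "NNS", "NNP", "NNPS"}

def pronoun_noun_links(word_links):
    """Return (German word, English word) pairs linking a pronoun to a noun."""
    flagged = []
    for (de_word, de_tag), (en_word, en_tag) in word_links:
        de_pron_en_noun = de_tag in PRONOUN_TAGS and en_tag in NOUN_TAGS
        de_noun_en_pron = de_tag in NOUN_TAGS and en_tag in PRONOUN_TAGS
        if de_pron_en_noun or de_noun_en_pron:
            flagged.append((de_word, en_word))
    return flagged

# Made-up word links: the first one breaks the pronoun rule, the second is fine.
sample_links = [
    (("sie", "PPER"), ("Sophie", "NNP")),
    (("Welt", "NN"), ("world", "NN")),
]
print(pronoun_noun_links(sample_links))
```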


Exact vs approximate alignment

  • best vs. “second-best” translation

  • an acronym in one language shall be aligned as approximate (fuzzy) with a spelled-out term in the other

    • PT – Power Technologies

  • difficult distinctions

    • einer der ersten Tage im Mai – early May


Related Research

  • Blinker project (Melamed)

  • Prague Czech-English Treebank

  • Example-based MT in Dublin

  • Linköping English-Swedish Treebank


Experiment

  • 12 students were asked to align 20 DE-EN tree pairs

    • 10 tree pairs from Sophie’s world

    • 10 tree pairs from Economy text

  • advanced CL students

  • received

    • short introduction

    • the written guidelines


Gold Standard Alignment (DE-EN)


Experiment: Results

The students created

  • a huge variety in number of alignments

    • Sophie part: from 47 to 125 (ø = 94.3)

    • Econ part: from 62 to 259 (ø = 186.9)

  → the 3 students with the lowest numbers were non-native speakers of German

  → 1 student had misunderstood the task


Experiment: Results

  • The remaining 8 students had a high overlap with the gold standard (Recall):

    • Sophie part: from 48% to 81% (ø = 68.7%)

    • Econ part: from 66% to 89% (ø = 75.5%)

  • Precision

    • Sophie part: from 81% to 97% (ø = 89.1%)

    • Econ part: from 78% to 94% (ø = 88.2%)
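
Recall and precision against the gold standard can be computed from two sets of links. This is a minimal sketch of the standard definitions, not the authors' evaluation script. It assumes a link counts as correct only if the same node pair appears in the gold standard, and it ignores the exact/approximate distinction. The link IDs are hypothetical.

```python
def precision_recall(student_links, gold_links):
    """Return (precision, recall) of a student's links against the gold links."""
    student, gold = set(student_links), set(gold_links)
    correct = len(student & gold)          # links present in both sets
    precision = correct / len(student) if student else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical example: 4 gold links, 3 student links, 2 of them correct.
gold = {("de_1", "en_1"), ("de_2", "en_2"), ("de_3", "en_3"), ("de_4", "en_4")}
student = {("de_1", "en_1"), ("de_2", "en_2"), ("de_3", "en_5")}

p, r = precision_recall(student, gold)
print(f"precision = {p:.3f}, recall = {r:.3f}")
```

A student who aligns few but safe links gets high precision and low recall, which matches the pattern on the slide where precision is consistently higher than recall.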


Discrepancies

  • students sometimes aligned a word (or some words) with a node.

    • e.g. the word natürlich to the phrase of course

  • students sometimes aligned a German verb group with a single verb form in English

    • e.g. ist zurückzuführen vs. reflecting


Discrepancies

based on different grammatical forms:

  • a definite singular NP in German with an indefinite plural NP in English

    • der Umsatz vs. revenues

  • a German genitive NP with a PP in English

    • der beiden Divisionen vs. of the two divisions


Missed by all students

  • alignment of German word to empty token in English

    • wenn sie die Hand ausstreckte vs.

    • herself shaking hands


Conclusions

  • Our alignment guidelines are sufficient for a core of clear alignment decisions.

  • Needed:

    • Better alignment rules with concrete examples.

    • Better support tools (consistency checking).

  • The distinction between exact alignment and approximate alignment is very tricky.


Thank You for Your Attention!

  • Questions???


Applications of Parallel Treebanks

For the Translator

  • corpus for translation studies

    • search tools needed

For the Computational Linguist

  • input for Example-based Machine Translation

  • evaluation corpus for word, phrase or clause alignment

  • training corpus for transfer rules


Alignment Example


Parallel Treebanking

  • DE: DE sentence → ANNOTATE: PoS tagger (STTS), Chunker (TIGER) → flat DE tree → Deepening → DE tree

  • SV: SV sentence → PoS tagger (SUC) → STTS conversion → ANNOTATE: Chunker (SWE-TIGER) → flat SV tree → Deepening + Back conv. → SV tree

  • DE tree + SV tree → phrase alignment

