Phrase alignment of Estonian-German parallel treebanks

Heli Uibo and Krista Liin, University of Tartu

Martin Volk, Stockholm University

Aim and motivation
  • Aim – to align the phrases of two corpora that are translations of each other
  • Motivation:
    • Example-Based Machine Translation (EBMT)
    • Cross-language and translation studies
Existing resource – The Sofie Parallel Treebank
  • http://omilia.uio.no/sofie/ (password protected)
  • 9 European languages, including German and Estonian
  • initiated by the Nordic Treebank Network
  • chapters 1-2 of Jostein Gaarder’s novel “Sophie’s World”
  • sentences aligned
  • syntactic structure and functions annotated, but different annotation schemes used:
    • German – TIGER (http://www.ims.uni-stuttgart.de/projekte/TIGER/ )
    • Estonian – VISL (http://beta.visl.sdu.dk)
Automatic alignment of Estonian-German NPs
  • This is the first automatic alignment of Estonian-X parallel corpora below the sentence level.
  • We started from the automatic alignment of NPs, because
    • an important part of the sentence's meaning is represented by noun phrases;
    • NPs are the most frequent phrase types in these languages.
The NP alignment method

1. Find all noun phrases in the parallel sentences.

Sofie legte dann immer einen dicken Stapel Post auf den Küchentisch, ehe sie auf ihr Zimmer ging, um ihre Aufgaben zu machen.

Tavaliselt pani ta paksu pataka posti köögilauale, enne kui läks üles oma tuppa koolitöid tegema.
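Step 1 can be sketched as a recursive walk over a constituency tree that collects every node labelled NP. This is only an illustration: the nested-tuple tree format, the function names and the toy bracketing are assumptions and do not reproduce the TIGER or VISL annotation of the sentence.

```python
from typing import List, Tuple, Union

# A tree node is (label, [children]); a leaf is a plain token string.
Tree = Union[str, Tuple[str, list]]

def leaves(node: Tree) -> List[str]:
    """Return the tokens covered by a node, left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [tok for child in children for tok in leaves(child)]

def find_nps(node: Tree) -> List[List[str]]:
    """Collect the token span of every node labelled NP, nested ones included."""
    if isinstance(node, str):
        return []
    label, children = node
    found = [leaves(node)] if label == "NP" else []
    for child in children:
        found.extend(find_nps(child))
    return found

# Hypothetical, simplified bracketing of part of the German example sentence.
toy_tree = ("S", [
    ("NP", ["Sofie"]),
    "legte",
    ("NP", ["einen", "dicken", "Stapel", ("NP", ["Post"])]),
    ("PP", ["auf", "den", "Küchentisch"]),
])

nps = find_nps(toy_tree)
print(nps)
```

In this toy bracketing the phrase "auf den Küchentisch" is a PP, so it is not collected as an NP.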

The NP alignment method

2. Find all correspondences between the noun phrases.

Sofie legte dann immer einen dicken Stapel Post auf den Küchentisch, ehe sie auf ihr Zimmer ging, um ihre Aufgaben zu machen.

Tavaliselt pani ta paksu pataka posti köögilauale, enne kui läks üles oma tuppa koolitöid tegema.

3. Remove overlapping correspondences.

The NP alignment method

To accomplish steps 2 and 3, we used online dictionaries (ET-EN and DE-EN) and annotation information:

2. To set the correspondences between Estonian and German NPs

  • Translate all NP heads to English;
  • Find the intersections of translations;
  • If a pair of NPs are related by translation, then set a correspondence between them.

3. To remove overlapping correspondences

  • Use proper names as milestones;
  • Look at the locations of the NPs in the sentence.
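Steps 2 and 3 can be sketched as follows. The dictionary entries, the lemmatised heads, the token positions and the position-based tie-breaking rule are all assumptions made for illustration. The slides do not say how NP locations are compared, and the proper-name milestones are not modelled here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical dictionary entries, keyed by lemmatised NP head.
DE_EN: Dict[str, Set[str]] = {
    "Stapel": {"stack", "pile"},
    "Post": {"mail", "post"},
    "Zimmer": {"room"},
    "Aufgabe": {"task", "homework"},
}
ET_EN: Dict[str, Set[str]] = {
    "patakas": {"stack", "pile"},
    "post": {"mail", "post", "pile"},  # ambiguous via English on purpose
    "tuba": {"room"},
    "koolitöö": {"homework"},
}

# An NP is (head lemma, position of the head token in its sentence).
NP = Tuple[str, int]
Pair = Tuple[int, int]

def candidate_pairs(de_nps: List[NP], et_nps: List[NP]) -> List[Pair]:
    """Step 2: pair NPs whose heads share at least one English translation."""
    pairs = []
    for i, (de_head, _) in enumerate(de_nps):
        for j, (et_head, _) in enumerate(et_nps):
            if DE_EN.get(de_head, set()) & ET_EN.get(et_head, set()):
                pairs.append((i, j))
    return pairs

def remove_overlaps(pairs: List[Pair], de_nps: List[NP], et_nps: List[NP],
                    de_len: int, et_len: int) -> List[Pair]:
    """Step 3 (assumed heuristic): use each NP at most once, preferring
    pairs whose relative positions in the two sentences are closest."""
    def distance(pair: Pair) -> float:
        i, j = pair
        return abs(de_nps[i][1] / de_len - et_nps[j][1] / et_len)

    used_de, used_et, kept = set(), set(), []
    for i, j in sorted(pairs, key=distance):
        if i not in used_de and j not in used_et:
            kept.append((i, j))
            used_de.add(i)
            used_et.add(j)
    return sorted(kept)

# Heads and token positions taken from the example sentence pair
# (25 German tokens, 17 Estonian tokens, punctuation included).
de_nps = [("Stapel", 6), ("Post", 7), ("Zimmer", 16), ("Aufgabe", 21)]
et_nps = [("patakas", 4), ("post", 5), ("tuba", 13), ("koolitöö", 14)]

pairs = candidate_pairs(de_nps, et_nps)
aligned = remove_overlaps(pairs, de_nps, et_nps, de_len=25, et_len=17)
print(pairs)    # five candidates, including the spurious (0, 1)
print(aligned)  # four one-to-one correspondences
```

The spurious Stapel/post candidate arises only because both heads share an English translation in the toy dictionaries. It is dropped because each of its NPs has a closer partner.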
Results
  • 53 sentence pairs
  • 134 possible NP matches were found, out of which 75 matches were selected.
  • precision 84%
  • recall 53%
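The slides do not define the two measures. Assuming the standard definitions, precision is the share of selected matches that are correct and recall is the share of gold-standard matches that were selected. The sketch below computes both on made-up alignment sets, not on the data behind the figures above.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]

def precision_recall(selected: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Standard precision and recall of selected alignments against a gold set."""
    correct = len(selected & gold)
    precision = correct / len(selected) if selected else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical toy data: three selected matches, four gold matches, two shared.
selected = {(0, 0), (1, 1), (2, 3)}
gold = {(0, 0), (1, 1), (2, 2), (3, 3)}

p, r = precision_recall(selected, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```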
Sources of errors
  • Different tree structures (German – deeper)
  • Translation problems. We used English as an intermediary language to find German-Estonian word correspondences (there is no free German-Estonian electronic dictionary).
  • An NP in one language may correspond to a different phrase type or to a part of an NP in the other language.
  • A PP in German often corresponds to an NP in Estonian
    • A lot of grammatical information that is expressed by prepositions in German or English is expressed by grammatical cases in Estonian.
Alternative approach – statistical
  • An alternative to using bilingual electronic dictionaries is the use of statistical word alignment methods.
  • This method has been evaluated by Samuelsson (2004) for the phrase alignment of a German-Swedish parallel treebank.
  • We also intend to test this method on a German-Estonian treebank, although we are aware that the structural differences between German and Estonian make automatic word alignment more difficult.
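The slides do not say which statistical method is meant. As a stand-in, the sketch below scores word pairs with the Dice coefficient over sentence-level co-occurrence counts, a simple association measure that needs no dictionary. The tiny corpus and the function name are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

SentencePair = Tuple[List[str], List[str]]

def dice_scores(sentence_pairs: List[SentencePair]) -> Dict[Tuple[str, str], float]:
    """Dice coefficient 2*c(d,e) / (c(d) + c(e)) over aligned sentence pairs,
    where counts are numbers of sentence pairs containing the word(s)."""
    de_count, et_count, joint = Counter(), Counter(), Counter()
    for de_tokens, et_tokens in sentence_pairs:
        de_set, et_set = set(de_tokens), set(et_tokens)
        de_count.update(de_set)
        et_count.update(et_set)
        joint.update(product(de_set, et_set))
    return {
        (d, e): 2 * n / (de_count[d] + et_count[e])
        for (d, e), n in joint.items()
    }

# Hypothetical three-sentence parallel corpus of lemmatised content words.
corpus = [
    (["Zimmer", "Post"], ["tuba", "post"]),
    (["Zimmer", "Aufgabe"], ["tuba", "koolitöö"]),
    (["Post"], ["post"]),
]

scores = dice_scores(corpus)
print(scores[("Zimmer", "tuba")], scores[("Zimmer", "post")])
```

Word pairs that always co-occur, such as Zimmer/tuba here, score higher than pairs that co-occur only by chance.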
Treebank tools
  • There exist tools for monolingual treebanks:
    • editors, e.g. Annotate
    • treebank query tools (tgrep, TIGERSearch)
  • Special software tools are needed for building and using parallel treebanks.
  • We have developed an alignment viewer based on SVG (Scalable Vector Graphics).
  • Need to implement:
    • alignment editor (currently being developed at Stockholm University)
    • phrase alignment test tool
Conclusion and perspectives
  • Our first attempt to align the noun phrases in the Estonian-German parallel treebank led to satisfactory results.
  • The results could be improved if
    • other phrase types were taken into consideration;
    • a more precise dictionary were used;
    • the Estonian syntactic trees were deepened, making their annotation depth more similar to that of the German trees.