
Combining Word-Alignment Symmetrizations in Dependency Tree Projection

David Mareček (marecek@ufal.mff.cuni.cz), Charles University in Prague, Institute of Formal and Applied Linguistics. CICLING conference, Tokyo, Japan, February 21, 2011.

Presentation Transcript


  1. Combining Word-Alignment Symmetrizations in Dependency Tree Projection David Mareček marecek@ufal.mff.cuni.cz Charles University in Prague Institute of Formal and Applied Linguistics CICLING conference Tokyo, Japan, February 21, 2011

  2. Motivation • Suppose we have a text in a language that is not very common... • We would like to parse it, but we have no parser for it • and no manually annotated treebank • But we do have a parallel corpus pairing it with another language • English

  3. Our goal: to create a parser • Take the parallel corpus with English • Word-align it • GIZA++ • Parse the English side of the corpus • MST dependency parser • Transfer the dependencies from English to the target language using the word alignment • Train the parser on the resulting trees

  4. Previous work • Rebecca Hwa (2002, 2005) • Simple algorithm for projecting trees from English to Spanish and Chinese • Only one type of alignment used, and which one is not specified • K. Ganchev, J. Gillenwater, B. Taskar (2009) • Unsupervised parser with posterior regularization, in which the inferred dependencies should correspond to the projected ones • English to Bulgarian

  5. Our contribution • Showing that using several types of alignment improves the quality of dependency projection • GIZA++ [Och and Ney, 2003] • two unidirectional asymmetric alignments • symmetrization methods • A simple algorithm for projecting dependencies using different types of alignment links • Training and evaluating the MST parser

  6. Word alignment • The GIZA++ toolkit has asymmetric output • For each word in one language, at most one counterpart in the other language is found [Figure: the example sentence pair "Coordination of fiscal policies indeed , can be counterproductive ." / "Eine Koordination finanzpolitischer Maßnahmen kann in der Tat kontraproduktiv sein ." aligned in two panels: ENGLISH-to-X and X-to-ENGLISH]
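
A directional alignment behaves like a function from one side to the other. The Python sketch below assumes 0-based token indices and one plain dict per direction; this is not GIZA++'s own output format, and all names are hypothetical. It normalizes both directions to sets of (x, en) index pairs and checks on which side the links are functional:

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: a link is (index of X token, index of English token).
Link = Tuple[int, int]


def x_to_en_links(x_to_en: Dict[int, Optional[int]]) -> Set[Link]:
    """X-to-ENGLISH: each X token maps to at most one English token (None = unaligned)."""
    return {(x, e) for x, e in x_to_en.items() if e is not None}


def en_to_x_links(en_to_x: Dict[int, Optional[int]]) -> Set[Link]:
    """ENGLISH-to-X: each English token maps to at most one X token; flip to (x, en)."""
    return {(x, e) for e, x in en_to_x.items() if x is not None}


def is_functional_on_x(links: Set[Link]) -> bool:
    """True if no X token is linked to more than one English token."""
    xs = [x for x, _ in links]
    return len(xs) == len(set(xs))


# Toy links (hypothetical, not read off the slides):
# X-to-ENGLISH may send "in", "der", "Tat" (X 5, 6, 7) all to "indeed" (EN 4) ...
print(is_functional_on_x(x_to_en_links({5: 4, 6: 4, 7: 4})))  # True
# ... while ENGLISH-to-X may send "of" and "fiscal" (EN 1, 2) both to X 2.
print(is_functional_on_x(en_to_x_links({1: 2, 2: 2})))  # False
```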

  7. Symmetrization methods • Combinations of the two unidirectional alignments above [Figure: the English/German example sentence pair from slide 6 aligned in two panels: INTERSECTION and GROW-DIAG-FINAL]
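
The two symmetrizations can be sketched in Python. INTERSECTION is plain set intersection. For GROW-DIAG-FINAL the sketch follows the heuristic as described in the Moses SMT toolkit documentation: start from the intersection, repeatedly add neighbouring (including diagonal) links from the union that cover a still-unaligned word, then add the remaining links from each direction that cover an unaligned word. The (x, en) pair representation and the iteration order, which can change the result in edge cases, are assumptions of this sketch rather than details from the talk:

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (index of X token, index of English token)

# Horizontal, vertical and diagonal neighbours of a link in the alignment matrix.
NEIGHBORS = ((-1, 0), (0, -1), (1, 0), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1))


def intersection(x_to_en: Set[Link], en_to_x: Set[Link]) -> Set[Link]:
    """Links found in both directional alignments (high precision)."""
    return x_to_en & en_to_x


def grow_diag_final(x_to_en: Set[Link], en_to_x: Set[Link]) -> Set[Link]:
    union = x_to_en | en_to_x
    alignment = set(x_to_en & en_to_x)
    aligned_x = {x for x, _ in alignment}
    aligned_en = {e for _, e in alignment}

    def covers_unaligned(link: Link) -> bool:
        return link[0] not in aligned_x or link[1] not in aligned_en

    def add(link: Link) -> None:
        alignment.add(link)
        aligned_x.add(link[0])
        aligned_en.add(link[1])

    # GROW-DIAG: extend existing links into neighbouring union links until nothing changes.
    changed = True
    while changed:
        changed = False
        for x, e in sorted(alignment):  # iterate over a snapshot
            for dx, de in NEIGHBORS:
                cand = (x + dx, e + de)
                if cand in union and cand not in alignment and covers_unaligned(cand):
                    add(cand)
                    changed = True

    # FINAL: add any remaining directional link that still covers an unaligned word.
    for direction in (x_to_en, en_to_x):
        for cand in sorted(direction):
            if cand not in alignment and covers_unaligned(cand):
                add(cand)
    return alignment


x2e = {(0, 0), (1, 1), (2, 1)}
e2x = {(0, 0), (1, 1), (2, 2)}
print(sorted(intersection(x2e, e2x)))     # [(0, 0), (1, 1)]
print(sorted(grow_diag_final(x2e, e2x)))  # [(0, 0), (1, 1), (2, 1), (2, 2)]
```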

  8. Which alignment to use for the projection? • We have presented four different types of alignment • ENGLISH-to-X, X-to-ENGLISH, INTERSECTION, GROW-DIAG-FINAL • We prefer the X-to-ENGLISH alignment • we need to find a parent for each token in language X • unaligned English words do not matter • We distinguish three types of links • A: links that appear in the INTERSECTION alignment (red) • B: links that appear in GROW-DIAG-FINAL and also in the X-to-ENGLISH alignment (orange) • C: links that appear only in the X-to-ENGLISH alignment (blue) [Figure: the example sentence pair from slide 6 with its links colored by type]
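
The three link types reduce to set membership tests. A minimal sketch, assuming each alignment is a set of (x, en) index pairs; giving A precedence over B (every INTERSECTION link is also in GROW-DIAG-FINAL and in X-to-ENGLISH) is our reading of the slide, not something it states:

```python
from typing import Dict, Set, Tuple

Link = Tuple[int, int]  # (index of X token, index of English token)


def classify_links(x_to_en: Set[Link], intersection: Set[Link], gdf: Set[Link]) -> Dict[Link, str]:
    """Label every X-to-ENGLISH link as type A, B or C."""
    types: Dict[Link, str] = {}
    for link in x_to_en:
        if link in intersection:
            types[link] = "A"  # in INTERSECTION (red)
        elif link in gdf:
            types[link] = "B"  # in GROW-DIAG-FINAL and X-to-ENGLISH (orange)
        else:
            types[link] = "C"  # only in X-to-ENGLISH (blue)
    return types


print(sorted(classify_links({(0, 0), (1, 1), (2, 1)}, {(0, 0)}, {(0, 0), (1, 1)}).items()))
# [((0, 0), 'A'), ((1, 1), 'B'), ((2, 1), 'C')]
```

Links that occur in GROW-DIAG-FINAL but not in X-to-ENGLISH get no label here, since all three types are defined relative to the X-to-ENGLISH alignment.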

  9. Algorithm: example [Figure: the projection algorithm illustrated on the example sentence pair from slide 6]
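
The transcript shows the projection algorithm only as a figure. What follows is therefore a hypothetical sketch of one simple way such a projection could work, not the author's algorithm. All of the following are our assumptions: trees are head lists with 0-based indices and -1 for the root; the input is a dict of X-to-ENGLISH links labeled A/B/C; each English word gets a representative X token (best link type, then leftmost); a representative attaches to the representative of its nearest English ancestor that has one; other X tokens aligned to the same English word attach to that word's representative; and unaligned X tokens go to the root.

```python
from typing import Dict, List, Tuple

Link = Tuple[int, int]  # (index of X token, index of English token)
PRIORITY = {"A": 0, "B": 1, "C": 2}  # hypothetical preference order of link types


def project_tree(n_x: int, en_heads: List[int], link_types: Dict[Link, str]) -> List[int]:
    """Return a head index for each X token (-1 = root). Hypothetical scheme."""
    # X-to-ENGLISH links are functional on X, so each X token has one English word.
    x_to_en = {x: e for (x, e) in link_types}
    # Representative X token of each English word: best link type, then leftmost.
    rep: Dict[int, int] = {}
    for (x, e), t in link_types.items():
        key = (PRIORITY[t], x)
        if e not in rep or key < (PRIORITY[link_types[(rep[e], e)]], rep[e]):
            rep[e] = x
    heads: List[int] = []
    for x in range(n_x):
        if x not in x_to_en:
            heads.append(-1)  # unaligned X token: attach to root (simplification)
            continue
        e = x_to_en[x]
        if rep[e] != x:
            heads.append(rep[e])  # share an English word: attach to its representative
            continue
        a = en_heads[e]
        while a != -1 and a not in rep:  # skip English ancestors with no X counterpart
            a = en_heads[a]
        heads.append(rep[a] if a != -1 else -1)
    return heads


# Toy example with hypothetical heads: English "can be counterproductive",
# where the head of "be" is "can" and the head of "counterproductive" is "be";
# X side "kann kontraproduktiv sein".
links = {(0, 0): "A", (1, 2): "A", (2, 1): "B"}
print(project_tree(3, [-1, 0, 1], links))  # [-1, 2, 0]
```

A representative always attaches to the representative of a strictly higher English node, and every other token attaches to a representative, so the projected structure cannot contain a cycle (it may, however, have several root attachments).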

  10. Results • The best results for each of the test languages [Table: best result per test language] • The English parser was trained on CoNLL-X data • The projection was made on the first 100,000 sentence pairs of the News-commentaries (or Acquis-communautaire) parallel corpus • We used McDonald's maximum spanning tree parser • Why is the accuracy so low? • The CoNLL treebanks follow different annotation guidelines • e.g. different handling of coordination structures, auxiliary verbs, noun phrases, ...
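
The slides report accuracy without defining it. Assuming it is the usual unlabeled attachment score (the share of tokens whose predicted head equals the gold head), a minimal Python sketch follows. The optional length filter mirrors the "sentences up to 10 words" setting of slide 11. Whether punctuation was excluded is not stated, so it is counted here:

```python
from typing import Optional, Sequence


def unlabeled_attachment_score(
    gold: Sequence[Sequence[int]],
    pred: Sequence[Sequence[int]],
    max_len: Optional[int] = None,
) -> float:
    """Share of tokens whose predicted head equals the gold head."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in size")
    correct = total = 0
    for g, p in zip(gold, pred):
        if len(g) != len(p):
            raise ValueError("gold and predicted sentences differ in length")
        if max_len is not None and len(g) > max_len:
            continue  # e.g. keep only sentences of up to 10 words
        correct += sum(1 for gh, ph in zip(g, p) if gh == ph)
        total += len(g)
    return correct / total if total else 0.0


gold = [[-1, 0, 1], [-1, 0]]
pred = [[-1, 0, 0], [-1, 1]]
print(unlabeled_attachment_score(gold, pred))             # 0.6
print(unlabeled_attachment_score(gold, pred, max_len=2))  # 0.5
```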

  11. Comparison with previous work • We ran our projection method on the same datasets as the previous work by Ganchev et al. (2009) • Bulgarian, OpenSubtitles parallel corpus • English parser trained on the Penn Treebank • Tested on Bulgarian CoNLL-X training sentences of up to 10 words • Our results are slightly better • we did NOT use any unsupervised inference of dependency edges • we made better use of the word alignment

  12. Conclusions • We showed that combining different types of word alignment improves dependency tree projection • We outperform the previous state-of-the-art results (Ganchev et al., 2009, on Bulgarian) • Testing is hampered by the different annotation guidelines of each treebank

  13. Thank you for your attention
