
Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance

Shay B. Cohen, Dipanjan Das, Noah A. Smith. Carnegie Mellon University. EMNLP 2011, July 27. Goal: learn linguistic structure for a language without any labeled data in that language.


Presentation Transcript


  1. Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance. Shay B. Cohen, Dipanjan Das, Noah A. Smith. Carnegie Mellon University. July 27, EMNLP 2011.

  2. Goal: Learn linguistic structure for a language without any labeled data in that language. Example sentence: "The Skibo Castle is close by ." with tags DET NOUN NOUN VERB ADJ ADP . Tasks: Part-of-Speech Tagging and Dependency Parsing.

  3. Multilingual Unsupervised Learning. Approaches are grouped by data (no parallel data (hard) vs. using parallel data) and by setting (supervision in source language(s) vs. joint learning for multiple languages). Work cited: Yarowsky and Ngai (2001); Cohen and Smith (2009); Snyder et al. (2009); Xi and Hwa (2005); Berg-Kirkpatrick and Klein (2010); Naseem et al. (2010); Smith and Eisner (2009); Das and Petrov (2011); McDonald et al. (2011); and this work.

  4. In a Nutshell. Annotated data in Spanish and Italian gives coarse, universal parameters for each helper language. Interpolation (unsupervised training on unlabeled data in Portuguese) gives the coarse parameters of Portuguese. Coarse-to-fine expansion and initialization, followed by monolingual unsupervised training in Portuguese, gives the Portuguese parameters.

  5. Assumptions for a given problem: 1. Underlying model is generative. Example: an HMM (Merialdo, 1994) over "The Skibo Castle is close by".

  6. Assumptions for a given problem: 1. Underlying model is generative. Example: the DMV (Klein and Manning, 2004), with ROOT and the tag sequence DET NOUN NOUN VERB ADJ ADP.

  7. Assumptions for a given problem: 1. Underlying model is generative, composed of multinomial distributions. Example: an HMM (Merialdo, 1994) over "The Skibo Castle is close by".

  8. Assumptions for a given problem: 1. Underlying model is generative, composed of multinomial distributions. Example: the DMV (Klein and Manning, 2004), with ROOT and the tag sequence DET NOUN NOUN VERB ADJ ADP.

  9. Assumptions for a given problem: 1. Underlying model is generative. In general, unlexicalized parameters are indexed by the kth multinomial in the model and the ith event in that multinomial, e.g. the transition from ADJ (multinomial k) to NOUN (event i).

  10. Assumptions for a given problem: 1. Underlying model is generative. The lexicalized parameters take a similar form. (No lexicalized parameters for the DMV.)

  11. Assumptions for a given problem: 1. Underlying model is generative. The probability of a derivation is a product of unlexicalized and lexicalized parameters, each raised to the number of times event i of multinomial k fires in the derivation.
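The product-of-multinomials form described on slides 9 to 11 can be sketched in a few lines. This is a minimal illustration, not the authors' code: the names `theta`, `counts`, and `derivation_log_prob` are hypothetical, and the probabilities are made up.

```python
import math

def derivation_log_prob(theta, counts):
    """Log-probability of one derivation under a product-of-multinomials model.

    theta[k][i]  : probability of event i in multinomial k
    counts[k][i] : number of times event i of multinomial k fires
    """
    total = 0.0
    for k, events in counts.items():
        for i, n in events.items():
            # Each parameter contributes count * log(probability).
            total += n * math.log(theta[k][i])
    return total

# Hypothetical HMM transition multinomials (numbers are invented).
theta = {
    "DET": {"NOUN": 0.8, "ADJ": 0.2},
    "ADJ": {"NOUN": 0.9, "ADJ": 0.1},
}
# A derivation that uses DET -> ADJ once and ADJ -> NOUN once.
counts = {"DET": {"ADJ": 1}, "ADJ": {"NOUN": 1}}
log_p = derivation_log_prob(theta, counts)
```

Working in log space keeps the computation stable for long derivations; exponentiating `log_p` here gives the product 0.2 × 0.9.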

  12. Assumptions for a given problem: 2. Coarse, universal part-of-speech tags.

  13. Assumptions for a given problem: 2. Coarse, universal part-of-speech tags. For each language, there is a mapping from its treebank tagset to the coarse tags.
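As an illustration of such a mapping, the sketch below converts fine treebank tags to coarse tags with a lookup table. The table is a small hypothetical fragment for English Penn Treebank tags, and the fine tags assigned to the example sentence are assumptions chosen to match the coarse tags on slide 2.

```python
# Hypothetical fragment of a treebank-to-coarse tag mapping for English.
PTB_TO_COARSE = {
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "DT": "DET", "IN": "ADP", "VBZ": "VERB",
}

def to_coarse(tags, mapping):
    """Map a sequence of fine treebank tags to coarse, universal tags."""
    return [mapping[t] for t in tags]

# "The Skibo Castle is close by" with assumed fine tags.
fine = ["DT", "NNP", "NNP", "VBZ", "JJ", "IN"]
coarse = to_coarse(fine, PTB_TO_COARSE)
```

Because several fine tags share one coarse tag, the mapping is many-to-one, which is what later makes a coarse-to-fine expansion step necessary.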

  14. Assumptions for a given problem: 3. Helper languages. For each: treebank → (coarse conversion) → coarse treebank → (MLE) → unlexicalized parameters.
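The MLE step for a helper language amounts to counting and normalizing. The sketch below does this for HMM-style tag transitions only; the function name `mle_transitions` and the three-sentence toy treebank are hypothetical.

```python
from collections import Counter, defaultdict

def mle_transitions(tagged_sentences):
    """Relative-frequency estimate of tag-to-tag transition multinomials."""
    counts = defaultdict(Counter)
    for tags in tagged_sentences:
        # Count every adjacent (previous tag, next tag) pair.
        for prev, nxt in zip(tags, tags[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for prev, ctr in counts.items()
    }

# Toy coarse treebank for one helper language (invented data).
toy_treebank = [
    ["DET", "ADJ", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB", "ADJ"],
    ["ADJ", "ADJ", "NOUN"],
]
theta_helper = mle_transitions(toy_treebank)
```

Each row of `theta_helper` is one multinomial, for example the transitions out of ADJ.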

  15. Multilingual Modeling

  16. Multilingual Modeling. For a target language, each unlexicalized multinomial is a mixture of the corresponding helper-language multinomials: there is a mixture weight for the kth multinomial for each helper language, where the kth multinomial in the model is, say, the transitions from the ADJ tag in an HMM.

  17. Multilingual Modeling. E.g., two helper languages, Spanish and Italian: the target distribution ADJ → · is a mixture of the Spanish and Italian ADJ → · distributions, with mixture weights 0.7 and 0.3.

  18. Multilingual Modeling. E.g., two helper languages, Spanish and Italian: the target distribution ADJ → · is a mixture of the Spanish and Italian ADJ → · distributions, with unknown mixture weights (? and ?).
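A minimal sketch of the mixture on slides 16 to 18 follows. The helper distributions are invented, and assigning 0.7 to Spanish and 0.3 to Italian is an assumption, since the transcript does not say which weight belongs to which language.

```python
def mix(helper_dists, weights):
    """Convex combination of helper-language multinomials for one k."""
    events = set().union(*(d.keys() for d in helper_dists.values()))
    return {
        e: sum(weights[lang] * helper_dists[lang].get(e, 0.0)
               for lang in helper_dists)
        for e in events
    }

# Hypothetical ADJ -> . transition distributions for the two helpers.
helpers = {
    "Spanish": {"NOUN": 0.6, "ADJ": 0.1, "VERB": 0.3},
    "Italian": {"NOUN": 0.5, "ADJ": 0.3, "VERB": 0.2},
}
weights = {"Spanish": 0.7, "Italian": 0.3}  # assumed assignment
target = mix(helpers, weights)
```

Since the weights are non-negative and sum to one, `target` is again a valid distribution; here the NOUN entry comes out at about 0.57.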

  19. Learning and Inference

  20. Learning and Inference: normal learning.

  21. Learning and Inference: multilingual learning. The helper-language parameters are fixed!

  22. Learning and Inference: multilingual learning with EM. M-step: uses the number of times each mixture component is used in a derivation.
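The sketch below shows EM for mixture weights when the component multinomials are held fixed. It is a simplification under stated assumptions: the event counts are treated as observed, whereas in the setting of the talk they would be expected counts computed over derivations. All names and numbers are hypothetical.

```python
def em_mixture_weights(helper_dists, event_counts, iterations=50):
    """EM for the weights of a mixture of fixed multinomials."""
    langs = list(helper_dists)
    beta = {lang: 1.0 / len(langs) for lang in langs}  # uniform start
    total = sum(event_counts.values())
    for _ in range(iterations):
        # E-step: how often each helper component is "used", in expectation.
        used = {lang: 0.0 for lang in langs}
        for event, count in event_counts.items():
            denom = sum(beta[l] * helper_dists[l][event] for l in langs)
            for l in langs:
                used[l] += count * beta[l] * helper_dists[l][event] / denom
        # M-step: renormalize the usage counts into new mixture weights.
        beta = {l: used[l] / total for l in langs}
    return beta

helpers = {
    "Spanish": {"NOUN": 0.6, "ADJ": 0.1, "VERB": 0.3},
    "Italian": {"NOUN": 0.5, "ADJ": 0.3, "VERB": 0.2},
}
# Invented counts that resemble the Spanish distribution.
event_counts = {"NOUN": 60, "ADJ": 10, "VERB": 30}
beta = em_mixture_weights(helpers, event_counts)
```

Only the weights move during training; the helper distributions never change, which mirrors the "are fixed" remark on slide 21.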

  23. Learning and Inference: multilingual learning. What about feature-rich generative models? Locally normalized log-linear model (Berg-Kirkpatrick et al., 2010).
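For readers unfamiliar with the term, a locally normalized log-linear model defines each multinomial as a softmax over feature scores. The sketch below shows only that definition; the feature names and weights are hypothetical, and how the mixture idea is combined with such a model is not shown in the transcript.

```python
import math

def log_linear_multinomial(weights, features):
    """Softmax over events: p(event) is proportional to exp(w . f(event))."""
    scores = {
        event: sum(weights.get(name, 0.0) * value for name, value in fs.items())
        for event, fs in features.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {event: math.exp(s - top) / z for event, s in scores.items()}

# Hypothetical features for emitting three words from one tag.
features = {
    "castle": {"word=castle": 1.0, "suffix=le": 1.0},
    "Castle": {"word=Castle": 1.0, "suffix=le": 1.0, "capitalized": 1.0},
    "close":  {"word=close": 1.0},
}
weights = {"suffix=le": 0.5, "capitalized": 1.0}
probs = log_linear_multinomial(weights, features)
```

Normalization happens within each multinomial rather than over whole structures, so the model stays generative and EM-style training still applies.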

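  24. Multilingual Modeling. E.g., two helper languages, Spanish and Italian: the target distribution ADJ → · is a mixture of the Spanish and Italian ADJ → · distributions, with unknown mixture weights (? and ?).

  25. Multilingual Modeling. E.g., two helper languages, Spanish and Italian: the target distribution ADJ → · is a mixture of the Spanish and Italian ADJ → · distributions, with learned mixture weights 0.6237 and 0.3763.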

  26. Learning and Inference: coarse-to-fine expansion (for English). Step 1: the coarse distribution ADJ → · is given as identical copies to each fine tag: JJ → ·, JJR → ·, JJS → ·.

  27. Learning and Inference: coarse-to-fine expansion (for English). Consider JJ → ·.

  28. Learning and Inference: coarse-to-fine expansion (for English). Step 2: in JJ → ·, the probability of each coarse event is divided equally among the new, fine events. The result is the initializer for monolingual unsupervised training.
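The two expansion steps can be sketched as follows. This is an illustration under assumptions: the function name `expand`, the tag mapping fragment, and the coarse probabilities are hypothetical, and every coarse event is assumed to have at least one fine tag.

```python
def expand(coarse_theta, fine_to_coarse):
    """Turn coarse transition multinomials into fine-tag initializers."""
    coarse_to_fine = {}
    for fine, coarse in fine_to_coarse.items():
        coarse_to_fine.setdefault(coarse, []).append(fine)

    fine_theta = {}
    for fine, coarse in fine_to_coarse.items():
        if coarse not in coarse_theta:
            continue
        row = {}
        # Step 1: every fine tag starts from a copy of its coarse tag's row.
        for coarse_event, p in coarse_theta[coarse].items():
            # Step 2: split each coarse event's mass equally over fine tags.
            fines = coarse_to_fine[coarse_event]
            for f in fines:
                row[f] = p / len(fines)
        fine_theta[fine] = row
    return fine_theta

fine_to_coarse = {"JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
                  "NN": "NOUN", "NNS": "NOUN"}
coarse_theta = {"ADJ": {"NOUN": 0.8, "ADJ": 0.2},
                "NOUN": {"NOUN": 0.5, "ADJ": 0.5}}
fine_theta = expand(coarse_theta, fine_to_coarse)
```

The expanded rows are only a starting point: JJ, JJR, and JJS begin identical and are differentiated by the monolingual training that follows.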

  29. Experiments

  30. Two Problems. Unsupervised part-of-speech tagging: model: feature-based HMM (Berg-Kirkpatrick et al., 2010); learning: L-BFGS. Unsupervised dependency parsing: model: DMV (Klein and Manning, 2004); learning: EM.

  31. Languages. Target languages: Bulgarian, Danish, Dutch, Greek, Japanese, Portuguese, Slovene, Spanish, Swedish, and Turkish. Helper languages: English, German, Italian, and Czech. (CoNLL treebanks from 2006 and 2007.)

  32. Results: POS Tagging (without tag dictionary). Systems compared: full model; uniform mixture parameters (no learning); monolingual baseline (Berg-Kirkpatrick et al., 2010).

  33. Results: POS Tagging (without tag dictionary)

  34. Results: Dependency Parsing. Baselines: Phylogenetic Grammar Induction (Berg-Kirkpatrick and Klein, 2010); Posterior Regularization (Gillenwater et al., 2010); Monolingual EM (Klein and Manning, 2004).

  35. Results: Dependency Parsing. Variants compared: uniform mixture parameters, coarse-to-fine expansion → monolingual learning; learned mixture parameters, coarse-to-fine expansion → monolingual learning; learned mixture parameters, no coarse-to-fine expansion; uniform mixture parameters, no coarse-to-fine expansion (no learning).

  36. Results: Dependency Parsing

  37. Results: Dependency Parsing

  38. Analyzing with Principal Component Analysis: two principal components.

  39. From Words to Dependencies

  40. From Words to Dependencies. Use induced tags to induce dependencies: in a pipeline, or using the posteriors over tags in a sausage lattice (Cohen and Smith, 2007).

  41. From Words to Dependencies. Joint decoding: parsing a lattice (states 1 to 4) with the DMV. Tag posteriors per word: The: DET 0.95, ADJ 0.03, NOUN 0.02; Skibo: DET 0.0, ADJ 0.3, NOUN 0.7; Castle: DET 0.01, ADJ 0.1, NOUN 0.89.
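The difference between the pipeline and the lattice input can be sketched with the posteriors from slide 41. The code below only builds the two inputs; the lattice parser itself is not shown, and the function names and the arc representation are hypothetical.

```python
# Tag posteriors per word, taken from slide 41.
posteriors = [
    ("The",    {"DET": 0.95, "ADJ": 0.03, "NOUN": 0.02}),
    ("Skibo",  {"DET": 0.0,  "ADJ": 0.3,  "NOUN": 0.7}),
    ("Castle", {"DET": 0.01, "ADJ": 0.1,  "NOUN": 0.89}),
]

def pipeline_tags(posteriors):
    """Pipeline input: commit to the single most probable tag per word."""
    return [max(dist, key=dist.get) for _, dist in posteriors]

def sausage_lattice(posteriors, min_prob=0.0):
    """Lattice input: arcs (start_state, end_state, tag, prob) per word."""
    arcs = []
    for state, (_, dist) in enumerate(posteriors, start=1):
        for tag, p in dist.items():
            if p > min_prob:  # drop tags with no posterior mass
                arcs.append((state, state + 1, tag, p))
    return arcs

one_best = pipeline_tags(posteriors)
arcs = sausage_lattice(posteriors)
```

The pipeline discards the 0.3 posterior for ADJ on "Skibo", while the lattice keeps it as a weighted arc, so a parser running over the lattice can still choose that tag if the syntax favors it.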

  42. Results: Words to Dependencies

  43. Results: Words to Dependencies

  44. Results: Words to Dependencies. Best average result with gold tags: 62.2. Interesting result: auto tags perform better for Turkish and Slovene.

  45. Conclusions

  46. Conclusions • Improvements for two major tasks using non-parallel multilingual guidance • In general, grammar induction results are better than POS tagging results • Joint POS and dependency parsing performs surprisingly well • For a few languages, results are better than using gold tags • Joint decoding performs better than a pipeline

  47. Questions?

  48. Results: POS Tagging (without tag dictionary)

  49. Results: POS Tagging (without tag dictionary)

  50. Results: POS Tagging (with tag dictionary)
