
Improving Vector Space Word Representations Using Multilingual Correlation

Improving Vector Space Word Representations Using Multilingual Correlation. Manaal Faruqui and Chris Dyer Language Technologies Institute Carnegie Mellon University. Distributional Semantics. “You shall know a word by the company it keeps”. (Harris 1954; Firth, 1957).


Presentation Transcript


  1. Improving Vector Space Word Representations Using Multilingual Correlation Manaal Faruqui and Chris Dyer Language Technologies Institute Carnegie Mellon University

  2. Distributional Semantics “You shall know a word by the company it keeps” (Harris 1954; Firth, 1957) …I will take what is mine with fire and blood… …the end battle would be between fire and ice… …My dragons are large and can breathe fire now… …flame is the visible portion of a fire… …take place whereby fires can sustain their own heat…

  3. Translational Semantics What other Information? That plane can seat more than 300 people तीन सौ से अधिक लोगों को बैठाने वाला वायुयान … रूसी वायुयान बहुत बड़े हैं Russian airplanes are huge plane ≅ airplane Multilingual Information! (Bannard & Callison-Burch, 2005)

  4. Outline • Distributional Semantics • Monolingual context • Translational Semantics • Multilingual context • Better Semantic Representations • Using Distributional + Translational semantics

  5. Word Vector Representations How to encode such co-occurrences? contexts words

  6. Word Vector Representation Latent Semantic Analysis (Deerwester et al., 1990) context words words Singular Value Decomposition
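A minimal sketch of the LSA step named on this slide: count word/context co-occurrences, then reduce the matrix with a singular value decomposition. The toy sentences, the symmetric window of 2, the raw (unweighted) counts, and the helper names `cooccurrence_matrix` and `lsa_vectors` are all assumptions for illustration, not details taken from the talk.

```python
import numpy as np

def cooccurrence_matrix(sentences, window=2):
    """Count how often each word appears within `window` tokens of another."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[s[j]]] += 1
    return vocab, counts

def lsa_vectors(counts, dim):
    """Keep the top `dim` singular directions as word vectors."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :dim] * s[:dim]

# Toy corpus (assumption): a few of the example sentences from slide 2.
sentences = [
    "the end battle would be between fire and ice".split(),
    "my dragons are large and can breathe fire now".split(),
    "flame is the visible portion of a fire".split(),
]
vocab, counts = cooccurrence_matrix(sentences)
vectors = lsa_vectors(counts, dim=3)
```

Each row of `vectors` is the reduced representation of the word at the same position in `vocab`.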

  7. Multilingual Information English German French Spanish dragon Drache dragon dragón = Append Problem ?

  8. Multilingual Information Disadvantages of Vector Concatenation • Vector Size Increases • Idiosyncratic Info. • What if word is OOV ? ✗ ?

  9. Multilingual Information So, what can we do? …I will take what is mine with fire and blood… …the end battle would be between fire and ice… …My dragons are large and can breathe fire now… ... Das Ende der Schlacht würde zwischen Feuer und Eis ... ... gesehen ist Feuer eine Oxidationsreaktion mit... ... Das Licht des Feuers ist eine physikalische Erscheinung… Two Views: Canonical Correlation Analysis !

  10. Canonical Correlation Analysis (CCA) Project two sets of vectors Ω and Θ (equal cardinality) into a space where they are maximally correlated. CCA ≅ Convex Optimization Problem with Exact Solution!

  11. Canonical Correlation Analysis (CCA) W, V = CCA(Ω, Θ); X″ = X W, where X is n1 × d1, W is d1 × k, and X″ is n1 × k; Y″ = Y V, where Y is n2 × d2, V is d2 × k, and Y″ is n2 × k; k = min(r(Ω), r(Θ)). X″ and Y″ are now maximally correlated!
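A minimal NumPy sketch of the projection on this slide, assuming the standard whitening-plus-SVD formulation of CCA. The function names `cca` and `inv_sqrt`, the small ridge term `reg`, and the synthetic data are assumptions for illustration; this is not the MATLAB tool mentioned in the conclusion.

```python
import numpy as np

def inv_sqrt(mat):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(omega, theta, k, reg=1e-8):
    """Return projections W (d1 x k), V (d2 x k) and the top-k correlations.

    omega and theta hold row-aligned vectors (same number of rows).
    `reg` is a small ridge term added for numerical stability (assumption).
    """
    n = omega.shape[0]
    a = omega - omega.mean(axis=0)
    b = theta - theta.mean(axis=0)
    cxx = a.T @ a / (n - 1) + reg * np.eye(a.shape[1])
    cyy = b.T @ b / (n - 1) + reg * np.eye(b.shape[1])
    cxy = a.T @ b / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the correlations.
    u, s, vt = np.linalg.svd(wx @ cxy @ wy, full_matrices=False)
    return wx @ u[:, :k], wy @ vt.T[:, :k], s[:k]

# Synthetic two-view data sharing a 2-dimensional latent signal (assumption).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
omega = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
theta = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))

W, V, corrs = cca(omega, theta, k=2)
x_proj = (omega - omega.mean(axis=0)) @ W  # n x k
y_proj = (theta - theta.mean(axis=0)) @ V  # n x k
```

In the setting of the talk, W and V would be learned on the aligned subsets Ω and Θ and then applied to the full matrices X and Y, so every word in either vocabulary receives a projected vector.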

  12. Canonical Correlation Analysis (CCA) Problems Addressed? • Vector Size Increases: Doesn’t increase • Idiosyncratic Information: Lets you choose! • What if word is OOV?: Projection vectors for everyone!

  13. Canonical Correlation Analysis (CCA) Ok, but equal cardinality sets Ω & Θ? • The vocabularies can’t be of equal size! • Get word alignments from a parallel corpus • Preserve only words in the original vocabulary • For every word in English, select the best foreign word
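The pairing steps above can be sketched as follows, assuming that the "best" foreign word is the one most frequently aligned to the English word. That criterion, the function name `best_translations`, and the toy alignment pairs are assumptions for illustration.

```python
from collections import Counter, defaultdict

def best_translations(aligned_pairs, en_vocab, foreign_vocab):
    """Map each English word to its most frequently aligned foreign word.

    Pairs with a word outside either vocabulary are dropped, which mirrors
    "preserve only words in the original vocabulary".
    """
    counts = defaultdict(Counter)
    for en, fo in aligned_pairs:
        if en in en_vocab and fo in foreign_vocab:
            counts[en][fo] += 1
    return {en: c.most_common(1)[0][0] for en, c in counts.items()}

# Toy alignment links (assumption), as a word aligner might emit them.
aligned_pairs = [
    ("fire", "Feuer"), ("fire", "Feuer"), ("fire", "Brand"),
    ("dragon", "Drache"), ("xyz", "Feuer"),
]
en_vocab = {"fire", "dragon"}
foreign_vocab = {"Feuer", "Brand", "Drache"}
pairs = best_translations(aligned_pairs, en_vocab, foreign_vocab)
```

The resulting one-to-one mapping gives two row-aligned sets of vectors of equal size, which is what CCA requires.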

  14. Experimental Setup LSA Word Vector Learning Tokenizer and Lowercasing: WMT scripts

  15. Experimental Setup LSA Word Vector Learning Word Alignment Tool: fast_align (Dyer et al, 2013)

  16. Experimental Setup LSA Word Vector Learning Corpus Preprocessing Context: …hello… • 23.45, 21st, 10-20-2014, 0.5e10 → NUM • anchfgugsjh, wekjfbg, bhguyq → UNK
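A rough sketch of this normalization, under two assumptions the slide does not state: a token counts as numeric if it begins with a digit, and a token becomes UNK if it occurs fewer than `min_count` times. The function name `normalize` and the threshold value are hypothetical.

```python
from collections import Counter

def normalize(tokens, min_count=2):
    """Lowercase tokens, map numeric tokens to NUM and rare tokens to UNK."""
    # Assumption: any token starting with a digit is treated as a number.
    mapped = ["NUM" if t[:1].isdigit() else t.lower() for t in tokens]
    freq = Counter(mapped)
    # Assumption: tokens seen fewer than min_count times are unknown words.
    return [t if t == "NUM" or freq[t] >= min_count else "UNK" for t in mapped]

tokens = "hello 23.45 hello 21st anchfgugsjh hello 0.5e10".split()
cleaned = normalize(tokens)
```

Collapsing numbers and rare strings this way keeps the context vocabulary small before the co-occurrence counts are collected.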

  17. Experimental Setup Evaluation Benchmarks Word Similarity Evaluation • WS-353 (Finkelstein et al, 2001) • WS-353-SIM (Agirre et al, 2009) • WS-353-REL (Agirre et al, 2009) • RG-65 (Rubenstein and Goodenough, 1965) • MC-30 (Miller and Charles, 1991) • MTurk-287 (Radinsky et al, 2011) Word Relation Evaluation • Semantic Relations (Mikolov et al, 2013) • Syntactic Relations (Mikolov et al, 2013)
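Word similarity benchmarks of this kind are typically scored by ranking word pairs by cosine similarity and comparing that ranking with human judgments using Spearman's correlation. The sketch below assumes that protocol; the vectors and gold scores are made-up toy values, not entries from any of the listed datasets, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            result[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def evaluate(vectors, gold_pairs):
    """Score only the pairs whose two words both have vectors."""
    scored = [(cosine(vectors[a], vectors[b]), g)
              for a, b, g in gold_pairs if a in vectors and b in vectors]
    model, gold = zip(*scored)
    return spearman(model, gold)

# Toy vectors and human scores (assumptions, for illustration only).
vectors = {"fire": [1.0, 0.0], "flame": [0.9, 0.1],
           "ice": [0.0, 1.0], "water": [0.3, 0.9]}
gold_pairs = [("fire", "flame", 9.0), ("fire", "ice", 2.0), ("ice", "water", 8.0)]
rho = evaluate(vectors, gold_pairs)
```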

  18. Experimental Setup Multilingual Vector Learning Monolingual Vector Length: 80 Multilingual Vector Length: ? • The length in projected space can be chosen: ‘k’ • Choose the best value of ‘k’ for WS-353 k ∈ [0.1, 0.2, …, 1.0]

  19. Experimental Setup Multilingual Vector Learning [Plot: Spearman’s correlation vs. dimensions; performance on WS-353; k = 0.6]

  20. Experimental Setup Multilingual Vector Learning Spearman’s correlation

  21. Experimental Setup Multilingual Vector Learning Accuracy

  22. Experimental Setup Multilingual Vectors: Neural Networks RNNLM (Mikolov et al, 2011) • Predict next word given the history • Neural language model • Recurrent hidden layer connections Skip-Gram, word2vec (Mikolov et al, 2013) • Predict context given the word • Removes hidden layer • Vocabulary represented in Huffman coding
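To make "predict context given the word" concrete, the sketch below lists the (word, context) training pairs that a skip-gram style model would see. The window size and the function name `skipgram_pairs` are assumptions; the sketch covers only pair extraction, not model training or the Huffman-coded output layer.

```python
def skipgram_pairs(tokens, window=2):
    """List (center, context) pairs within `window` tokens of each word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("my dragons breathe fire".split(), window=1)
```

An RNNLM, by contrast, would condition on the whole preceding history of a word rather than on a fixed window around it.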

  23. Experimental Setup Multilingual Vector Learning RNNLM Skip-Gram

  24. Experimental Setup Multilingual Vectors: Scaling Spearman’s correlation on WS-353

  25. Experimental Setup Multilingual Vectors: Qualitative Analysis Antonyms and Synonyms of “Beautiful”: Monolingual Setting t-SNE tool (van der Maaten and Hinton, 2008)

  26. Experimental Setup Multilingual Vectors: Qualitative Analysis Antonyms and Synonyms of “Beautiful”: Multilingual Setting t-SNE tool (van der Maaten and Hinton, 2008)

  27. Conclusion • CCA: Easy to use tool in MATLAB • Take vectors from two languages and improve them. • Multilingual Information is Important • Even if the problems are inherently monolingual. • More Effective for Distributional Vectors • Semantics generalizes better than Syntax. • Vectors available at: http://cs.cmu.edu/~mfaruqui

  28. Related Work • Bilingual word vectors • Klementiev et al, 2012 • Zou et al, 2013 • Translation Models • Kalchbrenner & Blunsom, 2013 • Compositional Semantics • Hermann & Blunsom, 2014 • Document representation • Vinokourov et al, 2002 • Platt et al, 2010 • Synonymy and Paraphrasing • Bannard and Callison-Burch, 2005 • Ganitkevitch et al, 2013 • Bilingual lexicon induction • Haghighi et al, 2008 • Vulic and Moens, 2013

  29. Thanks! Visit us at ACL-demo: wordvectors.org
