
Word2Vec Explained



Presentation Transcript


  1. Word2Vec Explained Jun Xu Harbin Institute of Technology China

  2. Word Similarity & Relatedness • How similar is pizza to pasta? • How related is pizza to Italy? • Representing words as vectors allows easy computation of similarity (sketched below) • Measure the semantic similarity between words • Use the vectors as features for various supervised NLP tasks such as document classification, named entity recognition, and sentiment analysis
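A minimal Python sketch (not from the slides) of the similarity computation this refers to: cosine similarity between word vectors. The vectors below are made-up toy values, not trained embeddings.

    import numpy as np

    def cosine_similarity(u, v):
        # Cosine of the angle between two word vectors; values near 1.0 mean "very similar".
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Toy 4-dimensional vectors, illustrative values only.
    pizza = np.array([0.9, 0.1, 0.3, 0.0])
    pasta = np.array([0.8, 0.2, 0.4, 0.1])
    italy = np.array([0.4, 0.7, 0.1, 0.5])

    print(cosine_similarity(pizza, pasta))  # high: pizza and pasta are similar
    print(cosine_similarity(pizza, italy))  # lower: related, but less similar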

  3. What is word2vec?

  4. What is word2vec? • word2vec is not a single algorithm • word2vec is not deep learning • It is a software package for representing words as vectors (usage sketch below), containing: • Two distinct models • Continuous Bag-of-Words (CBoW) • Skip-Gram (SG) • Two training methods • Negative Sampling (NS) • Hierarchical Softmax (HS)
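As a usage sketch, and assuming the gensim library (4.x parameter names) plus a toy two-sentence corpus, the two models and two training methods map onto the sg, hs and negative parameters:

    from gensim.models import Word2Vec

    # Toy corpus; a real run would use millions of tokenized sentences.
    sentences = [
        ["pizza", "is", "similar", "to", "pasta"],
        ["pizza", "is", "related", "to", "italy"],
    ]

    # Skip-Gram with Negative Sampling (SGNS): sg=1, hs=0, negative > 0.
    sgns = Word2Vec(sentences, sg=1, hs=0, negative=5,
                    vector_size=100, window=5, min_count=1)

    # CBoW with Hierarchical Softmax: sg=0, hs=1, negative=0.
    cbow_hs = Word2Vec(sentences, sg=0, hs=1, negative=0,
                       vector_size=100, window=5, min_count=1)

    print(sgns.wv.similarity("pizza", "pasta"))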

  5. Why Hierarchical Softmax?

  6. Why Hierarchical Softmax? • Turns one multinomial classification problem (a softmax over the whole vocabulary) into multiple binary classification problems, as sketched below
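In code, a minimal sketch (toy vectors, notation mine): a word’s probability becomes a product of sigmoid decisions at the inner nodes on its path through the tree, so one prediction costs O(log |V|) instead of O(|V|).

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hs_word_probability(hidden, path_nodes, path_code):
        # P(word) as a product of binary decisions along the word's path in the tree.
        # hidden     : hidden-layer vector (context average in CBoW, input word vector in SG)
        # path_nodes : parameter vectors of the inner nodes from the root to the word's leaf
        # path_code  : the word's binary code; 0 = take one branch, 1 = take the other
        prob = 1.0
        for node_vec, bit in zip(path_nodes, path_code):
            p_branch = sigmoid(np.dot(hidden, node_vec))
            prob *= p_branch if bit == 0 else (1.0 - p_branch)
        return prob

    # Toy example: a path of 3 inner nodes for a word whose code is [0, 1, 1].
    rng = np.random.default_rng(0)
    h = rng.normal(size=8)
    nodes = [rng.normal(size=8) for _ in range(3)]
    print(hs_word_probability(h, nodes, [0, 1, 1]))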

  7. Why Negative Sampling?

  8. Why Negative Sampling? • Increases the probability of positive (observed) samples while decreasing the probability of sampled negative samples • Hidden assumption: decreasing the negative samples’ probability amounts to increasing the positive samples’ probability • Right? • Maybe not! • The objective function has already changed! (the per-pair objective is sketched below) • The vectors involved are the word vectors and the parameter vectors, not simply w_in and w_out
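A minimal sketch (toy random vectors, notation mine) of the per-pair quantity SGNS maximizes: push up sigma(w·c) for the observed pair, push down sigma(w·c_neg) for k sampled negatives. This is the changed objective the slide points to; it is no longer the full softmax likelihood.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgns_pair_objective(w_vec, c_pos, c_negs):
        # log sigma(w.c+) + sum_k log sigma(-w.c_k), maximized for each observed (word, context) pair.
        pos_term = np.log(sigmoid(np.dot(w_vec, c_pos)))
        neg_term = sum(np.log(sigmoid(-np.dot(w_vec, c_neg))) for c_neg in c_negs)
        return pos_term + neg_term

    rng = np.random.default_rng(1)
    w = rng.normal(size=50)                          # word vector
    c = rng.normal(size=50)                          # observed context (parameter) vector
    negs = [rng.normal(size=50) for _ in range(5)]   # k = 5 sampled negative contexts
    print(sgns_pair_objective(w, c, negs))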

  9. Put it all together • Goal: learn word vectors • Similar semantics means similar word vectors • Maximum likelihood estimation: • MLE on words: multinomial classification -> multiple binary classifications -> hierarchical softmax • MLE on word-context pairs: negative sampling • (both objectives are written out below)
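The two formulations side by side, in notation of my own (v_w for word vectors, u_c for context/parameter vectors; the slides do not spell out the formulas):

    % MLE on words: a softmax over the whole vocabulary V (the multinomial problem
    % that hierarchical softmax decomposes into binary decisions along a tree path).
    \log p(c \mid w) = u_c^{\top} v_w - \log \sum_{c' \in V} \exp\left( u_{c'}^{\top} v_w \right)

    % MLE on word-context pairs: negative sampling keeps the observed pair and
    % k sampled negatives, i.e. a handful of binary classification problems.
    \ell(w, c) = \log \sigma\left( u_c^{\top} v_w \right)
               + \sum_{i=1}^{k} \mathbb{E}_{c_i \sim P_n}\left[ \log \sigma\left( -u_{c_i}^{\top} v_w \right) \right]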

  10. Hierarchical Softmax Rethink

  11. Hierarchical Softmax Rethink • The Huffman tree and the hidden layers • Could a tree structure other than the Huffman tree be used? • Could the way the Huffman tree is built be changed? • What is “frequency”? (a construction sketch follows)
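To make “frequency” concrete, a minimal sketch (standard heapq construction, not word2vec’s actual C implementation) of building Huffman codes from raw word counts: frequent words get shorter codes and therefore shorter paths in hierarchical softmax, and changing either the counts or the tree-building rule changes every path.

    import heapq
    from itertools import count

    def huffman_codes(freqs):
        # Build a Huffman tree from word frequencies and return each word's binary code.
        tie = count()  # tie-breaker so the heap never compares the word tuples
        heap = [(f, next(tie), (word,)) for word, f in freqs.items()]
        heapq.heapify(heap)
        codes = {word: "" for word in freqs}
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)    # two least frequent subtrees...
            f2, _, right = heapq.heappop(heap)
            for word in left:
                codes[word] = "0" + codes[word]  # ...get merged, prepending one more bit
            for word in right:
                codes[word] = "1" + codes[word]
            heapq.heappush(heap, (f1 + f2, next(tie), left + right))
        return codes

    # Toy counts: "the" is frequent, so it receives the shortest code / path.
    print(huffman_codes({"the": 1000, "pizza": 50, "pasta": 40, "italy": 10}))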

  12. Discuss

  13. What is SGNS learning?

  14.–19. What is SGNS learning? “Neural Word Embeddings as Implicit Matrix Factorization”, Levy & Goldberg, NIPS 2014 (slides 14–19 step through the paper’s derivation; the equations themselves are not captured in this transcript)
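The paper’s headline result, in notation of my own (W stacks the word vectors, C the context/parameter vectors, k is the number of negative samples, #(·) counts corpus occurrences, and |D| is the number of observed word-context pairs): at the optimum, SGNS implicitly factorizes a shifted PMI matrix.

    % Levy & Goldberg (NIPS 2014): for SGNS with k negative samples, the optimal
    % vectors satisfy W C^{\top} = M, where each cell is a shifted pointwise mutual information:
    M_{ij} = \mathrm{PMI}(w_i, c_j) - \log k
           = \log \frac{\#(w_i, c_j) \cdot |D|}{\#(w_i)\,\#(c_j)} - \log k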

  20. What is SGNS learning? • SGNS is doing something very similar to the older count-based approaches • SGNS is implicitly factorizing the traditional word-context PMI matrix • So does SVD! • GloVe factorizes a similar word-context matrix • (a count-based sketch follows)
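A minimal sketch of that count-based route (toy co-occurrence counts, helper names mine): build a shifted positive PMI word-context matrix and factorize it with SVD to obtain dense word vectors, in the spirit of the SVD-based methods the slide compares SGNS to.

    import numpy as np

    def shifted_ppmi(counts, k=5):
        # Shifted positive PMI: max(PMI(w, c) - log k, 0) from a word-context count matrix.
        total = counts.sum()
        p_w = counts.sum(axis=1, keepdims=True) / total
        p_c = counts.sum(axis=0, keepdims=True) / total
        p_wc = counts / total
        with np.errstate(divide="ignore", invalid="ignore"):
            pmi = np.log(p_wc / (p_w * p_c))
        pmi[~np.isfinite(pmi)] = 0.0               # unseen pairs contribute nothing
        return np.maximum(pmi - np.log(k), 0.0)

    # Toy 4x4 word-context co-occurrence counts, illustrative values only.
    counts = np.array([[10.0, 2.0, 0.0, 1.0],
                       [3.0, 8.0, 1.0, 0.0],
                       [0.0, 1.0, 9.0, 4.0],
                       [1.0, 0.0, 5.0, 7.0]])

    m = shifted_ppmi(counts, k=2)
    u, s, vt = np.linalg.svd(m)                    # factorize the PMI-style matrix
    word_vectors = u[:, :2] * np.sqrt(s[:2])       # rank-2 embeddings (one common convention)
    print(word_vectors)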

  21. That’s all. Thanks for coming.
