
Morphological Smoothing and Extrapolation of Word Embeddings

Ryan Cotterell, Hinrich Schütze, Jason Eisner



Presentation Transcript


  1. Morphological Smoothing and Extrapolation of Word Embeddings Ryan Cotterell, Hinrich Schütze, Jason Eisner

  2. Words in the Lexicon are Related! Word embeddings are already good at this! [Diagram: “running” is linked to “sprinting” (similarity), “shoes” (relatedness), and “ran” (morphology: PAST TENSE).] Goal: Put morphology into word embeddings

  3. Inflectional Morphology is Highly Structured [Paradigm for lemma RUN: run (PRES), runs (3RD PRES SG), running (GERUND), ran (PAST TENSE).]

  4. [Paradigm for lemma SPRINT: sprint (PRES), sprints (3RD PRES SG), sprinting (GERUND), sprinted (PAST TENSE).]

  5. Same Structure Across Paradigms! [Paradigm for the novel lemma WUG: wug (PRES), wugs (3RD PRES SG), wugging (GERUND), wugged (PAST TENSE).]

  6. Research Question How do we exploit structured morphological knowledge in models of word embedding?

  7. A Morphological Paradigm – Strings
     Suffixes:            /Ø/       /s/          /ing/        /ed/
     Tenses:              Pres      3P Pres Sg   Gerund       Past
     RUN    /run/         [run]     [runs]       [running]    [ran]
     SPRINT /sprint/      [sprint]  [sprints]    [sprinting]  [sprinted]
     WUG    /wug/         [wug]     [wugs]       [wugging]    [wugged]
     CODE   /code/                  [codes]      [coding]
     LOVE   /love/        [love]                              [loved]
     BAT    /bat/                   [bats]       [bating]
     PLAY   /play/                                            [played]
     (Stems run down the rows; the lower verbs are only partially observed.)

  8. A Morphological Paradigm – Strings [Same table as the previous slide, without the Tenses/Stems labels.]

  9. Why is “running” written like that? /run/ + /ing/ → concatenate → run#ing → orthography (stochastic) → running (Orthographic Change: Doubled n!) Modeling word forms using latent underlying morphs and phonology, Cotterell et al., TACL 2015.
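The orthography step above can be sketched with a toy, deterministic doubling rule (the cited paper models this change stochastically with latent morphs; the CVC condition below is a simplification I am assuming for illustration, and it does not cover e-deletion cases like “coding”):

```python
VOWELS = set("aeiou")

def gerund(stem):
    """Concatenate stem + "ing", doubling a final consonant that
    follows a short vowel (a rough CVC heuristic)."""
    if (len(stem) >= 3 and stem[-1] not in VOWELS
            and stem[-2] in VOWELS and stem[-3] not in VOWELS):
        return stem + stem[-1] + "ing"   # run -> run#n#ing -> running
    return stem + "ing"                  # sprint -> sprinting

print(gerund("run"))     # running
print(gerund("sprint"))  # sprinting
print(gerund("wug"))     # wugging
```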

  10. A Morphological Paradigm – Strings
      Suffixes:            /Ø/       /s/          /ing/        /ed/
      RUN    /run/         [run]     [runs]       [running]    [ran]
      SPRINT /sprint/      [sprint]  [sprints]    [sprinting]  [sprinted]
      WUG    /wug/         [wug]     [wugs]       [wugging]    [wugged]
      CODE   /code/        *[code]   [codes]      [coding]     *[coded]
      LOVE   /love/        [love]    *[loves]     *[loving]    [loved]
      BAT    /bat/         *[bat]    [bats]       [bating]     *[bated]
      PLAY   /play/        *[play]   *[plays]     *[playing]   [played]
      Prediction! (* = predicted form)

  11. Matrix Completion: Collaborative Filtering
                  Movies
              -37   29   29   19
              -36   67   22   77
      Users   -24   61   12   74
                ?  -79  -41    ?
              -52    ?    ?  -39
      (? = unobserved rating)

  12. Matrix Completion: Collaborative Filtering
      Movie vectors (one per column): (-6,-3,2), (9,-2,1), (4,3,-2), (9,-7,2)
      User vectors (one per row): (4,1,-5), (7,-2,0), (6,-2,3), (-9,1,4), (3,8,-5)
              -37   29   29   19
              -36   67   22   77
      Users   -24   61   12   74
                ?  -79  -41    ?
              -52    ?    ?  -39

  13. Matrix Completion: Collaborative Filtering [1,-4,3] · [-5,2,1] = Dot Product = -10 → + Gaussian Noise → -11
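The rating model on this slide is a dot product plus Gaussian noise; a minimal sketch (the `sigma` value and seed are assumptions for illustration):

```python
import random

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def noisy_rating(user_vec, movie_vec, sigma=1.0, seed=0):
    """Observed rating = user . movie + Gaussian noise, as on the slide."""
    rng = random.Random(seed)
    return dot(user_vec, movie_vec) + rng.gauss(0.0, sigma)

# The slide's example: [1,-4,3] . [-5,2,1] = -5 - 8 + 3 = -10;
# noise nudges the observed rating to something nearby, e.g. -11.
print(dot([1, -4, 3], [-5, 2, 1]))  # -10
```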

  14. Matrix Completion: Collaborative Filtering
      Movie vectors (columns): (-6,-3,2), (9,-2,1), (4,3,-2), (9,-7,2)
      User vectors (rows): (4,1,-5), (7,-2,0), (6,-2,3), (-9,1,4), (3,8,-5)
              -37   29   29   19
              -36   67   22   77
      Users   -24   61   12   74
              *59  -79  -41  *-80
              -52   *6  *46  -39
      Prediction! (* = missing entry filled in by the dot product)
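The predictions on this slide follow directly from the factor vectors: a missing entry is just the dot product of the corresponding user and movie vectors. The values below are transcribed from the slide (the column assignment was reconstructed so that every observed entry matches exactly):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Movie factor vectors (one per column) and user factor vectors (one per row).
movies = [(-6, -3, 2), (9, -2, 1), (4, 3, -2), (9, -7, 2)]
users  = [(4, 1, -5), (7, -2, 0), (6, -2, 3), (-9, 1, 4), (3, 8, -5)]

def predict(i, j):
    """Fill in the (user i, movie j) entry of the rating matrix."""
    return dot(users[i], movies[j])

# The four missing entries from the slide:
print(predict(3, 0), predict(3, 3))  # 59 -80
print(predict(4, 1), predict(4, 2))  # 6 46
```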

  15. Morphological Paradigm – Vectors (New: This Work) [The same stems × tenses grid of verbs (RUN, SPRINT, WUG, CODE, LOVE, BAT, PLAY × Pres, 3P Pres Sg, Gerund, Past), but each cell now holds a word2vec embedding instead of a string.]

  16. Things word2vec doesn’t know… • Words with the same stem, like “running” and “ran”, are related • Words with the same inflection, like “running” and “sprinting”, are related • Our goal: put this information into the word embeddings to improve them!

  17. Morphological Paradigm – Vectors [The same grid: stems RUN, SPRINT, WUG, CODE, LOVE, BAT, PLAY × suffixes Pres, 3P Pres Sg, Gerund, Past, with a vector in each cell.]

  18. Morphological Paradigm – Vectors [Zoomed-in 2×2 fragment of the grid: stems RUN, LOVE × tenses Pres, Past.]

  19. Morphological Paradigm – Vectors [Same fragment; animation step.]

  20. Morphological Paradigm – Vectors [Same fragment; animation step.]

  21. Why does “running” mean that? [Diagram: the observed embedding of “running” is its underlying vector plus Gaussian noise.]

  22. Morphological Paradigm – Vectors [The 2×2 fragment again: the Pres→Past offset is the same in the RUN row and the LOVE row. Same Offset!]

  23. Additive Model running = RUN + GERUND; ran = RUN + PAST

  24. Relation to Vector Offset Method ran = RUN + PAST
      1) running − sprinting = (RUN + GERUND) − (SPRINT + GERUND) = RUN − SPRINT
      2) ran − (RUN − SPRINT) = SPRINT + PAST
      3) SPRINT + PAST = sprinted
      So the analogy ran − running + sprinting predicts “sprinted”.
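Under the additive model the lemma and tag vectors cancel exactly in analogy arithmetic, which is why the vector-offset method recovers unseen forms. A sketch with made-up 2-d morpheme vectors (the numeric values are assumptions for illustration; noise is omitted):

```python
# Toy 2-d morpheme vectors (invented for the demo).
RUN, SPRINT = (1.0, 2.0), (4.0, -1.0)
PAST, GERUND = (0.5, 0.5), (-2.0, 3.0)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

# Additive model: each word vector = lemma vector + tag vector.
ran, running = add(RUN, PAST), add(RUN, GERUND)
sprinting = add(SPRINT, GERUND)

# Vector-offset prediction of the unseen form "sprinted":
# ran - running + sprinting = SPRINT + PAST.
predicted = add(sub(sprinting, running), ran)
print(predicted == add(SPRINT, PAST))  # True
```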

  25. Generating a Type Vector in the Lexicon
      Step 1: Sample morpheme vectors (e.g. RUN, GERUND) from the prior
      Step 2: Sample the true word vector (running)
      Step 3: Sample the observed vector (running)
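The three generative steps above can be sketched with diagonal Gaussians (the variances `tau` and `sigma` and the use of a diagonal covariance are simplifying assumptions; the paper's model is a general Gaussian graphical model):

```python
import random

def sample_type_vector(lemma_mean, tag_mean, tau=1.0, sigma=0.1, seed=0):
    """Step 1: sample morpheme vectors from a Gaussian prior.
    Step 2: the true word vector is their sum.
    Step 3: the observed embedding is the true vector plus Gaussian noise."""
    rng = random.Random(seed)
    lemma = [rng.gauss(m, tau) for m in lemma_mean]    # e.g. RUN
    tag = [rng.gauss(m, tau) for m in tag_mean]        # e.g. GERUND
    true_vec = [a + b for a, b in zip(lemma, tag)]     # "running" (latent)
    observed = [v + rng.gauss(0.0, sigma) for v in true_vec]
    return true_vec, observed

true_vec, observed = sample_type_vector([1.0, 2.0, 0.0], [-2.0, 3.0, 1.0])
```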

  26. Generating a Type Vector in the Lexicon [Same picture: RUN and GERUND combine into the true vector for “running”, which emits the observed vector.]

  27. Directed Graphical Model [Graph: lemma nodes RUN, SPRINT and tag nodes GERUND, PAST are parents of the latent word vectors for running, ran, sprinting, sprinted; each latent vector emits its observed embedding.]

  28. Smoothing and Extrapolation • All word embeddings are noisy! • Optimization during training incomplete • Only observed a few tokens • Our model smooths all of the word embeddings jointly based on morphological information • Note: Extrapolation is extreme smoothing: when you’ve never seen the word!

  29. Gaussian Graphical Model • All conditionals (probability of child given parents) are Gaussian distributed • Exact inference is always tractable (through matrix inversion) • General framework for reasoning over word embeddings! (Not limited to morphology) • Post-processing model = lightning fast (10 seconds for 1-best joint inference of embeddings)
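Why exact inference is tractable: conditioning a Gaussian on Gaussian observations has a closed form. The full model inverts a covariance matrix; the one-dimensional special case below shows the shape of the computation (the specific numbers are assumptions for illustration):

```python
def posterior_mean(prior_mean, prior_var, obs, obs_var):
    """Posterior mean of a latent Gaussian variable given one noisy
    observation -- the scalar case of the matrix inversion above."""
    w = prior_var / (prior_var + obs_var)  # how much to trust the observation
    return prior_mean + w * (obs - prior_mean)

# A noisy embedding coordinate observed at 4.0 with unit noise, prior N(0, 1):
# the smoothed value is pulled halfway back toward the prior mean.
print(posterior_mean(0.0, 1.0, 4.0, 1.0))  # 2.0
```

With more observed relatives (shared lemma or tag), the posterior tightens further, which is exactly the smoothing effect the talk describes.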

  30. Where did we get the graph structure? Answer: from morphological lexicons

  31. Where did we get the graph structure? [The same graph: each word node (running, ran, sprinting, sprinted) is connected to its lemma (RUN, SPRINT) and its inflection (GERUND, PAST), as listed in the lexicon.]
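Building that graph from a lexicon is mechanical: each entry maps a surface form to its lemma and tag, and those become the word node's two parents. A sketch with a toy lexicon mirroring the slide (a real morphological lexicon would supply far more entries):

```python
from collections import defaultdict

# Toy morphological lexicon: surface form -> (lemma, inflection tag).
lexicon = {
    "running": ("RUN", "GERUND"),
    "ran": ("RUN", "PAST"),
    "sprinting": ("SPRINT", "GERUND"),
    "sprinted": ("SPRINT", "PAST"),
}

# Each word node gets two parents: its lemma node and its tag node.
parents = dict(lexicon)
children = defaultdict(list)
for word, (lemma, tag) in lexicon.items():
    children[lemma].append(word)
    children[tag].append(word)

print(children["GERUND"])  # ['running', 'sprinting']
```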

  32. Why you should care! • You aren’t going to see all the words! • Too many, thanks to Zipf’s law • But we know some words must exist: • Every English verb has a gerund, even if you didn’t see it in a corpus • Can we guess its meaning? • Open vocabulary word embeddings • Simple to implement, to train and to extend!

  33. How do we learn the embeddings? • Learn the model parameters with Viterbi EM • E-step: simple coordinate descent (10 sec.) • M-step: update covariance matrix See paper for more details!
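The E-step's coordinate descent can be sketched in one dimension. The observed values below are made up, and assuming equal unit variances each update has a closed form: set the morpheme value to the average residual over the words it participates in (the real model does this per coordinate of 200-dimensional vectors):

```python
# Made-up scalar "embeddings" and their parents, mirroring the slides.
obs = {"running": 1.1, "ran": -0.9, "sprinting": 2.1, "sprinted": 0.1}
parents = {"running": ("RUN", "GERUND"), "ran": ("RUN", "PAST"),
           "sprinting": ("SPRINT", "GERUND"), "sprinted": ("SPRINT", "PAST")}

morph = {m: 0.0 for m in ("RUN", "SPRINT", "GERUND", "PAST")}

for _ in range(50):  # coordinate descent on the squared-error objective
    for m in morph:
        # Residuals of every word this morpheme participates in,
        # holding the other parent fixed.
        res = [obs[w] - morph[other]
               for w, (a, b) in parents.items()
               for other in ([b] if a == m else [a] if b == m else [])]
        morph[m] = sum(res) / len(res)  # closed-form coordinate update

# The fitted sums reconstruct the observations: RUN + GERUND ~ 1.1, etc.
```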

  34. Training Set-up • Experimented on 5 languages: Czech, English, German, Spanish and Turkish • Varying degrees of morphology: English → German → Spanish → Czech → Turkish • Initial Embeddings are trained on Wikipedia • tokenized text • skip-gram with negative sampling • 200 dimensions

  35. Experiment 1: Vector Prediction [Same graphical model (RUN, SPRINT, GERUND, PAST over running, ran, sprinting, sprinted); the observed vector of one form is held out.] Task: predict this vector!

  36. Experiment 1: Vector Prediction • Choose the closest vector in the space under cosine distance • Baseline: standard analogies — analogies also predict forms! (e.g. predict “ran” from “running”, “sprinting” and “sprinted”) • All details in the paper!
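The evaluation step above is a nearest-neighbor lookup: minimizing cosine distance is the same as maximizing cosine similarity over the vocabulary. A sketch with a made-up toy vocabulary (the vectors are assumptions for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors."""
    num = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return num / (nu * nv)

def nearest(query, vocab):
    """Predicted form = vocabulary entry closest under cosine distance."""
    return max(vocab, key=lambda w: cosine(query, vocab[w]))

# Toy 2-d vocabulary:
vocab = {"ran": (1.0, 0.1), "shoes": (0.0, 1.0), "sprinting": (-1.0, 0.5)}
print(nearest((0.9, 0.2), vocab))  # ran
```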

  37. Experiment 2: Perplexity • How perplexed is skip-gram on held-out data? • Just like standard language model evaluation • Question: Do our smoothed and extrapolated word embeddings improve prediction?

  38. Experiment 2: Perplexity [Plot: perplexity (bits) vs. # observed tokens, unsmoothed vs. smoothed.] Take-away: Smoothing helps! (More with fewer tokens.) See paper for more details

  39. Experiment 3: Word Similarity • Task: Spearman’s ρ between human judgements and cosine between vectors • Similarity is about lemmata, not inflected forms • Use the latent lemma embedding!
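The evaluation metric in Experiment 3 can be sketched from scratch; this is the standard rank-correlation formula, assuming no tied values (real similarity benchmarks can have ties, which need the tie-corrected variant):

```python
def spearman_rho(xs, ys):
    """Spearman's rank correlation between two score lists (no ties)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Perfectly concordant rankings give rho = 1; reversed rankings give -1.
print(spearman_rho([0.1, 0.5, 0.9], [1, 2, 3]))  # 1.0
```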

  40. Directed Graphical Model [The same graph, highlighting the latent lemma nodes (“Lemmata”: RUN, SPRINT) versus the observed “Word Embeddings” (running, ran, sprinting, sprinted).]

  41. Experiment 3: Word Similarity • Task: Spearman’s ρ between human judgements and cosine between vectors • Similarity is about lemmata, not inflected forms • Use the latent lemma embedding!

  42. Future Work • Integrate morphological information with character-level models! • Research Questions: • Are character-level models enough or do we need structured morphological information? • Can morphology help character-level neural networks?

  43. Fin Thanks for your attention!
