
Neural Machine Translation by Jointly Learning to Align and Translate



  1. Neural Machine Translation by Jointly Learning to Align and Translate Presented by: Minhao Cheng, Pan Xu, Md Rizwan Parvez

  2. Outline • Problem setting • Seq2seq model • RNN/LSTM/GRU • Autoencoder • Attention mechanism • Pipeline and model architecture • Experiment results • Extended method • Self-attention (Transformer)

  3. Problem Setting • Input: sentence (word sequence) in the source language • Output: sentence (word sequence) in the target language • Model: seq2seq

  4. History of Machine Translation

  5. History of Machine Translation • Rule-based: used mostly in the creation of dictionaries and grammar programs • Example-based: built on the idea of translation by analogy • Statistical: uses statistical methods based on bilingual text corpora • Neural: uses deep learning

  6. Neural machine translation (NMT) • RNN/LSTM/GRU • Why? • Input and output lengths vary from sentence to sentence • Words are order dependent • Seq2Seq model

  7. Recurrent Neural Network • RNN unit structure: the same weights are applied at every time step, updating the hidden state as ht = tanh(Wx xt + Wh ht−1 + b)
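The unit diagram on this slide is an image; as a concrete stand-in, here is a minimal NumPy sketch of one vanilla RNN step. The weight names Wx, Wh, and b are illustrative, not from the paper.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """One vanilla RNN step: h_t = tanh(Wx @ x_t + Wh @ h_prev + b)."""
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

# Toy usage: input dim 4, hidden dim 3, a length-5 input sequence.
rng = np.random.default_rng(0)
Wx, Wh, b = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), np.zeros(3)
h = np.zeros(3)
for x_t in rng.normal(size=(5, 4)):
    h = rnn_step(x_t, h, Wx, Wh, b)  # same weights reused at every step
```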

  8. Vanilla RNN • Problem: vanishing gradients • Backpropagation through time multiplies gradients by the recurrent weights at every step, so they shrink (or explode) exponentially and long-range dependencies are hard to learn

  9. Long Short Term Memory (LSTM) • Forget gate: ft = σ(Wf · [ht−1, xt] + bf) • Input gate: it = σ(Wi · [ht−1, xt] + bi), with candidate values c̃t = tanh(Wc · [ht−1, xt] + bc) • Updating cell state: ct = ft ⊙ ct−1 + it ⊙ c̃t • Updating hidden state: ht = ot ⊙ tanh(ct), with output gate ot = σ(Wo · [ht−1, xt] + bo)
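The gate diagrams on these slides are images (see the colah.github.io reference); the following NumPy sketch implements the standard LSTM step written above, with illustrative weight names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x_t] to the four stacked gates."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f = sigmoid(z[0:H])             # forget gate
    i = sigmoid(z[H:2*H])           # input gate
    c_tilde = np.tanh(z[2*H:3*H])   # candidate cell state
    o = sigmoid(z[3*H:4*H])         # output gate
    c = f * c_prev + i * c_tilde    # update cell state
    h = o * np.tanh(c)              # update hidden state
    return h, c
```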


  13. Gated Recurrent Unit (GRU) • Update gate: zt = σ(Wz · [ht−1, xt]) • Reset gate: rt = σ(Wr · [ht−1, xt]) • Candidate state: h̃t = tanh(W · [rt ⊙ ht−1, xt]) • Updating hidden state: ht = (1 − zt) ⊙ ht−1 + zt ⊙ h̃t
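As with the LSTM, this slide's diagram is an image; a matching NumPy sketch of one GRU step (weight names illustrative):

```python
import numpy as np

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU step with update gate z and reset gate r."""
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx)                                       # update gate
    r = sigmoid(Wr @ hx)                                       # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))  # candidate state
    return (1 - z) * h_prev + z * h_tilde                      # new hidden state
```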

  14. Using RNN • Text classification: output dim = 1 (a single label) • Machine translation: output dim ≠ 1, a variable-length sequence, which calls for an autoencoder-style (encoder–decoder) architecture

  15. Autoencoder

  16. Seq2Seq model [Diagram: encoder hidden states h1 … h3 consume the input sequence x1 … x4 and are summarized into a single context vector c; decoder states s1 … s4 emit the target sequence y1 … y4, with each step fed the previously generated word.]

  17. Pipeline: Seq2Seq • Encoder → fixed-length representation → nonlinear function → Decoder • The fixed-length representation is a potential bottleneck for long sentences Image: https://courses.engr.illinois.edu/cs546/sp2018/Slides/Mar15_Bahdanau.pdf
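To make the bottleneck concrete, here is a minimal sketch of the attention-free seq2seq pipeline: the decoder sees the whole source sentence only through the single vector c, no matter how long the input is. Weight names are illustrative, and the encoder reuses the rnn_step defined earlier.

```python
import numpy as np

def encode(xs, h0, Wx, Wh, b):
    """Run the encoder RNN; the final hidden state serves as the context c."""
    h = h0
    for x_t in xs:
        h = np.tanh(Wx @ x_t + Wh @ h + b)
    return h  # c: a fixed-length vector, regardless of len(xs)

def decode_step(y_prev, s_prev, c, Wy, Ws, Wc, b):
    """One decoder step: s_i depends on y_{i-1}, s_{i-1}, and the fixed c."""
    return np.tanh(Wy @ y_prev + Ws @ s_prev + Wc @ c + b)
```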

  18. Attention: Decoder with Alignment • While generating yi, the decoder searches in x = (x1, …, xT) for the positions where the most relevant information is concentrated • An alignment model weights the encoder annotations hj into a context vector ci = ∑j αij hj • The output distribution is p(yi | y1, …, yi−1, x) = g(yi−1, si, ci)

  19. Decoder Alignment • αij: how well the inputs around position j match the output at position i • si = f(yi−1, si−1, ci) • A simple feedforward NN computes the alignment score eij = a(si−1, hj) • The relative alignment weights are αij = exp(eij) / ∑k exp(eik), giving the context vector ci = ∑j αij hj Image: https://courses.engr.illinois.edu/cs546/sp2018/Slides/Mar15_Bahdanau.pdf
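A minimal NumPy sketch of this alignment/attention computation for one decoder step. The scoring network mirrors the paper's a(si−1, hj) = vaᵀ tanh(Wa si−1 + Ua hj); the weight names Wa, Ua, va are illustrative.

```python
import numpy as np

def attention_context(s_prev, H, Wa, Ua, va):
    """Compute c_i = sum_j alpha_ij * h_j for one decoder step.

    s_prev: previous decoder state s_{i-1}, shape (n,)
    H:      encoder annotations h_1..h_T, shape (T, n)
    """
    # e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)  -- the alignment model
    e = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h_j) for h_j in H])
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()        # softmax over source positions j
    return alpha @ H, alpha     # context vector c_i and weights alpha_i
```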

  20. The Full Pipeline • Word embedding of the example input: "I love Sundays"

  21. The Full Pipeline • From bottom to top: input ("I love Sundays"), annotation hidden states, alignment, decoder hidden states, output • Annotations: with a unidirectional encoder, each word's annotation only summarizes the information of its preceding words

  22. The Full Pipeline • Bidirectional RNN for the annotation hidden states • The true annotations are obtained by concatenating the forward and backward annotations: hj = [→hj ; ←hj]
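A short sketch of these bidirectional annotations, reusing the rnn_step defined earlier (the parameter packing is illustrative): each hj concatenates a forward pass over x1 … xj with a backward pass over xT … xj, so it summarizes both the preceding and the following words.

```python
import numpy as np

def bidirectional_annotations(xs, rnn_step, params_fwd, params_bwd, h0):
    """Annotations h_j = [fwd_j ; bwd_j] from forward and backward RNN passes."""
    fwd, h = [], h0
    for x_t in xs:                      # left-to-right pass
        h = rnn_step(x_t, h, *params_fwd)
        fwd.append(h)
    bwd, h = [], h0
    for x_t in reversed(xs):            # right-to-left pass
        h = rnn_step(x_t, h, *params_bwd)
        bwd.append(h)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```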

  23. Experiments Dataset: ACL WMT '14 English–French (850M words; tokenized with Moses; shortlist of the 30,000 most frequent words kept, the rest mapped to [UNK]) Models: • RNNencdec-30, RNNencdec-50 (the RNN Encoder–Decoder of Cho et al., 2014a, as baselines) • RNNsearch-30, RNNsearch-50 (the proposed model in this paper), where the suffix is the maximum sentence length used in training • RNNsearch-50* (trained until the performance on the development set stopped improving) Training: • Random initialization (orthogonal matrices for recurrent weights) • Stochastic gradient descent (SGD) with adaptive learning rates (Adadelta) • Minibatch size 80 • Objective: log-likelihood, which by the chain rule is the sum of per-step log-likelihoods, log p(y | x) = ∑i log p(yi | y1, …, yi−1, x)

  24. Inference: Beam Search (size = 2) • Keep the 2 best partial hypotheses, e.g. "I" and "My" • Expand each hypothesis by one word: "I decided", "I thought", "I tried", "My decision", "My thinking", "My direction" • Sort the expansions and prune back to the 2 new partial hypotheses, "I decided" and "My decision" • Repeat: expand and sort, prune, …
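A minimal, illustrative beam-search sketch. Here next_word_scores is a hypothetical stand-in for the decoder's conditional distribution p(yi | y1, …, yi−1, x); it should return (word, log-probability) pairs for a given prefix.

```python
import heapq

def beam_search(next_word_scores, beam_size=2, max_len=10, eos="</s>"):
    """Keep the beam_size best partial hypotheses; expand, sort, prune."""
    beams = [(0.0, [])]                      # (log-prob, word list)
    for _ in range(max_len):
        candidates = []
        for logp, words in beams:
            if words and words[-1] == eos:   # finished hypothesis: keep as-is
                candidates.append((logp, words))
                continue
            for word, word_logp in next_word_scores(words):      # expand
                candidates.append((logp + word_logp, words + [word]))
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])  # prune
    return beams
```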

  25. Experiment Results • Sample alignments computed with RNNsearch-50

  26. Experiment Results • BLEU score: how close the candidate translation is to the reference translations • BLEU = BP · exp(∑n wn log pn), with brevity penalty BP = 1 if c > r and e^(1 − r/c) otherwise • c: length of candidate translation • r: length of reference translation • pn: n-gram precision • wn: weight parameters (typically uniform, wn = 1/N) • Table: BLEU scores computed on the test set. (◦) Only sentences without [UNK] tokens
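A compact sketch of BLEU under these definitions (sentence-level, uniform weights, single reference); real evaluations use corpus-level statistics and smoothing, so this is illustrative only.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """BLEU = BP * exp(sum_n w_n log p_n) for one tokenized sentence pair."""
    log_p_sum = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i+n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i+n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        total = max(sum(cand.values()), 1)
        if overlap == 0:
            return 0.0                       # any zero precision gives BLEU = 0
        log_p_sum += (1.0 / max_n) * math.log(overlap / total)  # w_n = 1/N
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))          # brevity penalty
    return bp * math.exp(log_p_sum)
```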

  27. References
• https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
• http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Bahdanau et al., 2015: https://arxiv.org/pdf/1409.0473.pdf
• Vaswani et al., 2017: https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
• https://docs.google.com/presentation/d/1quIMxEEPEf5EkRHc2USQaoJRC4QNX6_KomdZTBMBWjk/edit#slide=id.g1f9e4ca2dd_0_23
• https://pdfs.semanticscholar.org/8873/7a86b0ddcc54389bf3aa1aaf62030deec9e6.pdf
• https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/
• https://courses.engr.illinois.edu/cs546/sp2018/Slides/Mar15_Bahdanau.pdf

  28. Self Attention [Vaswani et al. 2017] • Slides adapted from https://people.cs.umass.edu/~strubell/doc/lisa-final.key

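The self-attention slides themselves are images; the core operation they illustrate is scaled dot-product attention from Vaswani et al. (2017), Attention(Q, K, V) = softmax(QKᵀ/√dk) V, where every position attends to every other position of the same sequence. A minimal NumPy sketch:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one sequence X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv    # project tokens to queries/keys/values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (T, T) pairwise compatibility
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                  # each output mixes all positions
```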

  33. Thank You!!!
