
Recurrent Neural Networks & LSTM



Presentation Transcript


  1. Recurrent Neural Networks & LSTM Advisor: S. J. Wang, F. T. Chien Student: M. C. Sun 20150226

  2. Outline • Neural Network • Recurrent Neural Network (RNN) • Introduction • Training • Long Short-Term Memory (LSTM) • Evolution • Architecture • Connectionist Temporal Classification (CTC) • Application • Speech recognition • Architecture

  3. Neural Network • Inspired by the human neural system • A complicated architecture • With some specific limitations • e.g. DBN, CNN

  4. Feedforward Neural Network • Define the width & depth of the neural net • In the training step • Given input data and their targets • Learn the weights • [Figure: layered network with an input layer, hidden layers, and an output layer] • h: activation function • σ: sigmoid function / softmax function
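The slide's formulas did not survive extraction; a minimal reconstruction of the forward computation under the usual conventions (the weight symbols are assumed, not from the slide):

$$a_j = h\Big(\sum_i w_{ji}^{(1)} x_i + b_j^{(1)}\Big), \qquad y_k = \sigma\Big(\sum_j w_{kj}^{(2)} a_j + b_k^{(2)}\Big)$$

Here h is applied in the hidden layer and σ (sigmoid or softmax) at the output, as named on the slide.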

  5. Training in Neural Network • Output unit k is compared with its target at time t through an error function • Forward propagation • [Figure: signal flowing from the input layer through the hidden layers to the output]

  6. Training in Neural Network • Output unit k is compared with its target at time t through an error function • Forward propagation, then backpropagation of the error • [Figure: the same network with the error flowing backward]
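The slide's equations were lost in extraction; a standard reconstruction, assuming a sum-of-squares error (the slides do not state which error function is used):

$$E = \tfrac{1}{2}\sum_k (y_k - t_k)^2, \qquad \delta_k = y_k - t_k, \qquad \delta_j = h'(a_j)\sum_k w_{kj}\,\delta_k, \qquad \frac{\partial E}{\partial w_{ji}} = \delta_j\, x_i$$

Forward propagation computes the activations; backpropagation passes the δ terms from the output back through the hidden layers to obtain the weight gradients.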

  7. RNN The deepest neural network: once unrolled, its depth grows with the length of the input sequence

  8. Recurrent Neural Network • A network of neurons with feedback connections • For time-varying input • Good at temporal processing and sequence learning • [Figure: inputs arriving along the time axis]
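A minimal statement of the recurrence these feedback connections implement (notation assumed, not from the slides):

$$h_t = f(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h), \qquad y_t = g(W_{hy}\, h_t + b_y)$$

The W_{hh} term is the feedback connection: the hidden state at time t depends on the hidden state at time t-1.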

  9. Recurrent Neural Network • For supervised learning • Training: backpropagation through time • [Figure: the network unfolded over time, t = 1 … 5, with output, hidden, and input layers at each step]

  10. Unidirectional RNN • [Figure: the hidden layer unrolled over t-1, t, t+1 between input and output] • The state of a hidden node can represent short-term memory by connecting to itself at the previous time step
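A minimal numpy sketch of the unrolled forward pass the figure depicts; all names (rnn_forward, Wxh, Whh, Why) are illustrative, not from the slides:

```python
import numpy as np

def rnn_forward(xs, Wxh, Whh, Why, bh, by):
    # xs: list of input vectors, one per time step.
    # The hidden state h feeds back into itself, carrying
    # short-term memory from one step to the next.
    h = np.zeros(Whh.shape[0])
    ys = []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h + bh)  # self-connection: h depends on its previous state
        ys.append(Why @ h + by)              # output at this time step
    return ys, h
```

The `Whh @ h` term is the self-connection from the slide: it carries the previous hidden state, i.e. the short-term memory, into the current step.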

  11. Bidirectional RNN

  12. Bidirectional RNN • [Figure: forward-state and backward-state chains between the input and output layers, t-1, t, t+1]
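A common formulation of what the figure shows (symbols assumed): the forward states read the sequence left to right, the backward states right to left, and each output combines both:

$$\overrightarrow{h}_t = f(W_{x\overrightarrow{h}}\, x_t + W_{\overrightarrow{h}\overrightarrow{h}}\, \overrightarrow{h}_{t-1}), \qquad \overleftarrow{h}_t = f(W_{x\overleftarrow{h}}\, x_t + W_{\overleftarrow{h}\overleftarrow{h}}\, \overleftarrow{h}_{t+1}), \qquad y_t = g(W_{\overrightarrow{h}y}\, \overrightarrow{h}_t + W_{\overleftarrow{h}y}\, \overleftarrow{h}_t)$$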

  13. Training in RNN • Output unit k is compared with its target at time t through an error function • Feedforward, with f(x) the activation function • Backpropagation through time • [Figure: unrolled network with output unit k and hidden unit j, t = 1 … 5]

  14. Training in RNN • Output unit k is compared with its target at time t through an error function • Feedforward, with f(x) the logistic function • Backpropagation through time • [Figure: unrolled network with output unit k and hidden unit j, t = 1 … 5]

  15. Training in RNN • Output unit k is compared with its target at time t through an error function • Feedforward, with f(x) the logistic function • Backpropagation through time • The logistic derivative has maximal value f' = 0.25, so the backpropagated error shrinks at every step back in time: the vanishing gradient problem • [Figure: unrolled network with output unit k and hidden unit j, t = 1 … 5]
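A one-line reconstruction of the argument (notation assumed): propagating the error back k steps multiplies one Jacobian per step,

$$\frac{\partial h_t}{\partial h_{t-k}} = \prod_{i=1}^{k} \mathrm{diag}\big(f'(a_{t-i+1})\big)\, W_{hh}$$

and since the logistic function satisfies f'(x) = f(x)(1 - f(x)) ≤ 0.25, the product, and with it the gradient, decays roughly like 0.25^k unless the recurrent weights compensate.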

  16. LSTM One solution to the vanishing gradient problem when training RNNs

  17. Long Short-Term Memory (LSTM) • For a conventional RNN, the hidden state with a self-cycle can only represent short-term memory, due to the vanishing gradient problem • Invented by Hochreiter & Schmidhuber (1997) • Long short-term memory is designed to allow the hidden state to retain important information over a longer period of time • It replaces the neuron structure of the RNN's hidden layer • Reference: Sepp Hochreiter, Jürgen Schmidhuber. Long Short-Term Memory. Neural Computation 9(8), 1997.

  18. Long Short-Term Memory (LSTM)

  19. Constant Error Carrousel (CEC) • Lets the error flow pass through the CEC • Enforces non-decaying error flow back through time • The unit's connection to itself has weight 1 • Introduces the cell concept • [Figure: cell unrolled over t = 1 … 5, self-cycle weight = 1]
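In symbols (notation assumed): with the self-connection fixed at weight 1, the cell state evolves additively,

$$s_t = s_{t-1} + \tilde{s}_t \quad\Rightarrow\quad \frac{\partial s_t}{\partial s_{t-1}} = 1$$

so the error backpropagated through the self-connection neither decays nor explodes, however many steps it travels.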

  20. Gate • Weight conflict: everything arriving at a node, important or irrelevant, passes through the same weights • Introduce the gate concept • From input to hidden: protect the cell from irrelevant input and control when a value is stored in the cell • From hidden to output: protect other units from irrelevant cell content and control when the cell's value is output
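Sketched with assumed notation: each gate is a sigmoid unit that multiplicatively scales a signal,

$$i_t = \sigma(W_i x_t + U_i h_{t-1}), \qquad o_t = \sigma(W_o x_t + U_o h_{t-1})$$

so the cell input is scaled by the input gate i_t and the cell output by the output gate o_t, closing (≈ 0) to block irrelevant information and opening (≈ 1) to pass it.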

  21. Long Short-Term Memory Structure • LSTM = CEC + gates • Cell: stores the memory • Input gate: protects the cell from irrelevant input • Output gate: protects other units from irrelevant cell content • Forget gate: resets the memory stored in the cell • Peephole connections: improve precise timing
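A minimal numpy sketch of one step through the structure the slide lists; the stacked parameter layout and all names (lstm_step, W, U, b) are assumptions, and the peephole connections are omitted for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W: (4n, d), U: (4n, n), b: (4n,) hold the stacked parameters
    # for the input gate, forget gate, cell input, and output gate.
    n = h_prev.size
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate: shields the cell from irrelevant input
    f = sigmoid(z[n:2*n])        # forget gate: resets the memory stored in the cell
    g = np.tanh(z[2*n:3*n])      # candidate value to store
    o = sigmoid(z[3*n:4*n])      # output gate: shields other units from the cell
    c = f * c_prev + i * g       # CEC: additive update keeps the error flow constant
    h = o * np.tanh(c)           # gated cell output
    return h, c
```

Training fits the gate weights W, U, b, which is exactly the "learning the gates' weights" step on the next slide.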

  22. Long Short-Term Memory Structure • Uses the CEC to achieve long short-term memory • Learns the gates' weights to obtain the model
