
TrellisNet: Sequence Modelling with RNN and Convolutional Models

This presentation introduces TrellisNet, a network architecture that combines the strengths of Recurrent Neural Networks (RNNs) and Temporal Convolution Networks (TCNs) for sequence modelling. TrellisNet achieves state-of-the-art results on various sequence tasks, including language modelling and modelling long-range dependencies.


Presentation Transcript


  1. OTHER NETWORK ARCHITECTURES Yuguang Lin, Aashi Jain, Siddarth Ravichandran Presenters CSE 291 G00 Deep Learning for Sequences

  2. TRELLIS NETWORK FOR SEQUENCE MODELLING Yuguang Lin, presenter

  3. Sequence Model

  4. Background • Three popular approaches to sequence modelling • Temporal Convolution Networks (TCN) • Recurrent networks (LSTM, GRU, Tree-structured LSTM) • Self-attention (Transformer)

  5. Motivation • TCNs can give good empirical results • RNNs with many tricks can give state-of-the-art results on individual tasks, but no single model seems to dominate across multiple tasks • Can we combine TCNs with RNNs so that we can use techniques from both sides?

  6. Previous Work • Convolutional LSTMs combine convolutional and recurrent units (Donahue et al., 2015) • Quasi-recurrent neural networks interleave convolutional and recurrent layers (Bradbury et al., 2017) • Dilation applied to RNNs (Chang et al., 2017) • This paper follows the direction of the authors' previous paper: An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

  7. This work – Main Contribution • TrellisNet, the proposed network architecture, serves as a bridge between recurrent and convolutional models • Performed very well and achieved state-of-the-art results on different sequence tasks including language modelling and modelling long-range dependencies

  8. This presentation • TCN • TrellisNet • TrellisNet and TCN • RNN • TrellisNet and RNN • TrellisNet as a bridge between recurrent and convolutional models • An LSTM as a TrellisNet • Experiments • Results

  9. TCN – A Quick Overview • A special kind of CNN with • 1) causality: no information from the future is used in the computation • 2) the ability to take a variable-length input sequence and map it to an output sequence of the same length – just like an RNN • In summary, a TCN = a fully-convolutional network with causal convolutions
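
To make the causal-convolution idea concrete, here is a minimal PyTorch sketch (illustrative names and shapes, not code from the paper): the input is padded only on the left, so the output at time t depends only on inputs up to time t, and the output sequence has the same length as the input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """A 1D convolution whose output at time t never sees inputs after t."""
    def __init__(self, channels_in, channels_out, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation     # pad on the left only
        self.conv = nn.Conv1d(channels_in, channels_out, kernel_size, dilation=dilation)

    def forward(self, x):                           # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))   # output length equals input length

y = CausalConv1d(16, 32)(torch.randn(4, 16, 100))   # -> shape (4, 32, 100)
```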

  10. TrellisNet

  11. TrellisNet

  12. TrellisNet

  13. TrellisNet

  14. Trellis Network at an atomic level

  15. Trellis Network on a sequence of units
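
As a rough illustration of the structure in these diagrams, here is a minimal PyTorch sketch of the TrellisNet computation under the assumptions described later in the deck: a single kernel-size-2 causal convolution whose weights are shared by every layer, with the original input re-injected at each layer. A plain tanh stands in for the gated (LSTM-style) activation used in the paper, and all names and shapes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrellisLayerSketch(nn.Module):
    """One TrellisNet update: kernel-size-2 causal conv over [injected input; hidden]."""
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.conv = nn.Conv1d(d_in + d_hidden, d_hidden, kernel_size=2)

    def forward(self, x, z):
        # x: injected input (batch, d_in, time); z: hidden states from the layer below
        u = F.pad(torch.cat([x, z], dim=1), (1, 0))   # left-pad by 1 for causality
        return torch.tanh(self.conv(u))               # tanh in place of LSTM-style gates

def trellisnet_forward(layer, x, num_layers):
    """The SAME layer (shared weights) is applied at every level; x is re-injected each time."""
    z = torch.zeros(x.size(0), layer.conv.out_channels, x.size(2))
    for _ in range(num_layers):
        z = layer(x, z)
    return z

out = trellisnet_forward(TrellisLayerSketch(16, 32), torch.randn(4, 16, 100), num_layers=6)
```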

  16. TrellisNet and TCN

  17. RNN

  18. RNN • Processes one input element at a time, unrolling along the time dimension • Operation is inherently sequential, unlike CNNs and TCNs, which process the elements of each layer in parallel
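
A tiny sketch of that sequential dependence (illustrative, not from the slides): each time step needs the hidden state produced by the previous one, so the loop over time cannot be parallelized the way a convolution over the whole sequence can.

```python
import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=16, hidden_size=32)
x = torch.randn(8, 100, 16)            # (batch, time, features)
h = torch.zeros(8, 32)
for t in range(x.size(1)):             # strictly sequential: step t needs h from step t-1
    h = cell(x[:, t], h)
```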

  19. TrellisNet and RNN

  20. TrellisNet and RNN

  21. TrellisNet and RNN

  22. TrellisNet and RNN

  23. TrellisNet and RNN

  24. TrellisNet and RNN • Induction step: • Suppose equation (6) holds at level j; we need to show that it also holds at level j + 1 • At level j + 1, W1 and W2 are now sparse matrices • The TrellisNet activation f is chosen as f(a, b) = g(a), i.e. only the first input is used • We then have the following equations (a reconstruction is sketched below)
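
A hedged reconstruction of the update being used at this step, written in the generic TrellisNet form with input injection (the precise sparse block structure of W1 and W2, and equation (6) itself, are given in the paper):

$$
\tilde z^{(j+1)}_t \;=\; W_1 \begin{bmatrix} x_{t-1} \\ z^{(j)}_{t-1} \end{bmatrix} + W_2 \begin{bmatrix} x_t \\ z^{(j)}_t \end{bmatrix},
\qquad
z^{(j+1)}_t \;=\; f\!\left(\tilde z^{(j+1)}_t,\; z^{(j)}_{t-1}\right) \;=\; g\!\left(\tilde z^{(j+1)}_t\right).
$$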

  25. TrellisNet and RNN

  26. TrellisNet and RNN

  27. TrellisNet and RNN

  28. TrellisNet and RNN • Injected input at each layer • Hidden units from the previous layer at the current step and from the previous layer at previous steps • Causality condition is met • Weights are shared across layers • Note: the non-linearity after the linear combination is omitted for clarity

  29. TrellisNet and RNN • Each hidden unit at time t in layer i is computed using a hidden unit from the previous layer i − j whose history starts at t − j and a hidden unit from the previous layer i − 1 whose history starts at t • This is a mixed group convolution, and can be represented with L = 2 in equation 5 • Notice that here we have 4 layers of hidden units; this will become clear when we move to an LSTM as a TrellisNet (no, that is just the activation!)

  30. TrellisNet as a bridge between recurrent and convolutional models • TrellisNet is a special kind of TCN • TrellisNet is a generalization of truncated RNNs, and Theorem 1 allows it to benefit significantly from techniques developed for RNNs • From recurrent networks: • Structured nonlinear activations (e.g. LSTM and GRU gates) • Variational RNN dropout • Recurrent DropConnect • History compression and repackaging • From convolutional networks: • Large kernels and dilated convolutions • Auxiliary losses at intermediate layers • Weight normalization • Parallel convolutional processing
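
As an illustration of how two of these borrowed techniques could be attached to the shared TrellisNet weights and activations (a minimal sketch with assumed names and shapes, not the paper's implementation): weight normalization on the single shared convolution, and a variational ("locked") dropout mask that is sampled once per sequence and reused at every time step.

```python
import torch
import torch.nn as nn

# Convolutional side: weight normalization on the shared TrellisNet convolution.
shared_conv = nn.utils.weight_norm(nn.Conv1d(48, 32, kernel_size=2))

# Recurrent side: variational dropout -- one mask per sequence, broadcast over time.
def locked_dropout(z, p=0.5, training=True):
    if not training or p == 0.0:
        return z
    mask = z.new_empty(z.size(0), z.size(1), 1).bernoulli_(1 - p) / (1 - p)
    return z * mask                     # the same units are dropped at every time step
```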

  31. Questions for Discussion • Can you think of some ways to establish a connection between the Trellis network and the self-attention architecture? • What are some drawbacks of this model? • Do you think this architecture has potential, and would you like to try it in your research / project? Why or why not?

  32. Expressing a TrellisNet as an LSTM

  33. Expressing a TrellisNet as an LSTM

  34. Benchmark Tasks • Word-level language modelling • Penn Treebank (PTB), WikiText-103 (about 110 times larger than PTB) • Character-level language modelling • PTB • Long-range modelling • Sequential MNIST, PMNIST, and CIFAR-10

  35. Results

  36. Results

  37. Questions?

  38. Thank you!

  39. References • Bai, S., Kolter, J. Z., & Koltun, V. (2018). An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. • Bai, S., Kolter, J. Z., & Koltun, V. (2018). Trellis Networks for Sequence Modeling.

  40. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, by Kai Sheng Tai, Richard Socher, Christopher D. Manning, 2015. Aashi Jain, presenter. Feb 13, 2019

  41. LSTMs • A type of RNN • Preserves sequence information over time • Addresses the problem of exploding or vanishing gradients by introducing a memory cell that can preserve its state over long periods of time.

  42. LSTMs so far… • Have been explored over a linear chain. • However, language is not really sequential, so linear LSTMs are a poor fit (e.g., “My dog, who I rescued in the past, eats rawhide”) • So, we turn to tree-structured models!

  43. Standard LSTM transition equations
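
For reference, the standard LSTM transition equations (in the notation of Tai et al., 2015):

$$
\begin{aligned}
i_t &= \sigma\!\left(W^{(i)} x_t + U^{(i)} h_{t-1} + b^{(i)}\right), &
f_t &= \sigma\!\left(W^{(f)} x_t + U^{(f)} h_{t-1} + b^{(f)}\right), \\
o_t &= \sigma\!\left(W^{(o)} x_t + U^{(o)} h_{t-1} + b^{(o)}\right), &
u_t &= \tanh\!\left(W^{(u)} x_t + U^{(u)} h_{t-1} + b^{(u)}\right), \\
c_t &= i_t \odot u_t + f_t \odot c_{t-1}, &
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
$$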

  44. Limitation of the Standard LSTM • Only allows strictly sequential information propagation

  45. Why Tree-Structured? • Linguistically attractive for syntactic interpretations of sentence structure. • To model word/phrase dependencies in a tree-like structure instead of a linear fashion.

  46. That’s exactly what this paper has done!

  47. Differences? • Standard LSTM: computes its hidden state from the input at the current time step and the hidden state of the LSTM unit at the previous time step. • Tree-LSTM: computes its hidden state from an input vector and the hidden states of arbitrarily many child units.

  48. In more detail, in Tree-LSTMs: • Gating vectors and memory cell updates depend on the states of possibly many child units. • There is one forget gate for each child k, which allows selective incorporation of information from each child.

  49. Types of Tree-LSTMs • Two variants: • Child-Sum Tree-LSTM (dependency-based; the number of dependents is highly variable) • N-ary Tree-LSTM (constituency-based; distinguishes left vs. right dependents)
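
For reference, the Child-Sum Tree-LSTM transitions from Tai et al. (2015), where C(j) denotes the children of node j; note the per-child forget gate f_{jk} mentioned on the previous slide:

$$
\begin{aligned}
\tilde h_j &= \sum_{k \in C(j)} h_k, \\
i_j &= \sigma\!\left(W^{(i)} x_j + U^{(i)} \tilde h_j + b^{(i)}\right), \qquad
f_{jk} = \sigma\!\left(W^{(f)} x_j + U^{(f)} h_k + b^{(f)}\right), \\
o_j &= \sigma\!\left(W^{(o)} x_j + U^{(o)} \tilde h_j + b^{(o)}\right), \qquad
u_j = \tanh\!\left(W^{(u)} x_j + U^{(u)} \tilde h_j + b^{(u)}\right), \\
c_j &= i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k, \qquad
h_j = o_j \odot \tanh(c_j).
\end{aligned}
$$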
