Presentation Transcript


  1. ECE 517: Reinforcement Learning in Artificial Intelligence Lecture 17: TRTRL, Implementation Considerations, Apprenticeship Learning November 3, 2010 Dr. Itamar Arel College of Engineering Department of Electrical Engineering and Computer Science The University of Tennessee Fall 2010

  2. Outline • Recap on RNNs • Implementation and usage issues with RTRL • Computational complexity and resources required • Vanishing gradient problem • Apprenticeship learning

  3. Recap on RNNs • RNNs are potentially much more powerful than FFNNs • Can capture temporal dependencies • Embed complex state representation (i.e. memory) • Models of discrete-time dynamic systems • They are (very) complex to train • TDNN – performance limited by its fixed input window • RTRL – calculates a dynamic gradient on-line

  4. RTRL reviewed • RTRL is a gradient-descent-based method • It relies on sensitivities expressing the impact of any weight w_ij on the activation of any neuron k • The algorithm then consists of computing weight changes from these sensitivities and the output errors • Let’s look at the resources involved …
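
The equations on this slide did not survive the transcript. For reference, the standard Williams–Zipser form of the RTRL recursion (a reconstruction from the usual formulation, not copied from the slide; w_ij denotes the weight from unit j to unit i, y_k the activations, e_k the output errors, and \eta the learning rate):

  p^{k}_{ij}(t) = \partial y_k(t) / \partial w_{ij}
  p^{k}_{ij}(t+1) = f'(net_k(t)) \Big[ \sum_{l} w_{kl} \, p^{l}_{ij}(t) + \delta_{ki} \, y_j(t) \Big], \qquad p^{k}_{ij}(0) = 0
  \Delta w_{ij}(t) = \eta \sum_{k} e_k(t) \, p^{k}_{ij}(t)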

  5. Implementing RTRL – computations involved • The key component in RTRL is the sensitivity matrix • It must be calculated for each neuron • The N^3 sensitivity elements each take O(N) multiply-adds to update, i.e. O(N^4) operations per time step • RTRL, however, is NOT local … • Can the calculations be efficiently distributed?
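
To make the cost concrete, here is a minimal NumPy sketch of one RTRL step for a fully connected network of tanh units; the toy sizes and the names W, P, y, rtrl_step are illustrative assumptions, not code from the lecture:

  import numpy as np

  N = 50                                    # number of neurons (toy size)
  rng = np.random.default_rng(0)
  W = rng.normal(scale=0.1, size=(N, N))    # recurrent weights: W[i, j] feeds unit j -> unit i
  y = np.zeros(N)                           # activations y_k
  P = np.zeros((N, N, N))                   # sensitivities P[k, i, j] = dy_k/dw_ij  (O(N^3) storage)

  def rtrl_step(x, err, W, y, P, lr=0.01):
      """One forward pass plus RTRL sensitivity and weight update (modifies W in place)."""
      net = W @ y + x                       # x: external input, broadcast to every unit here
      y_new = np.tanh(net)
      d = 1.0 - y_new ** 2                  # f'(net_k)

      # P_new[k, i, j] = f'(net_k) * ( sum_l W[k, l] * P[l, i, j] + delta_ki * y[j] )
      P_new = np.einsum('kl,lij->kij', W, P)        # O(N^4) multiply-adds per time step
      P_new[np.arange(N), np.arange(N), :] += y     # the delta_ki * y_j injection term
      P_new *= d[:, None, None]

      # dw_ij = lr * sum_k e_k * P[k, i, j]  (err is zero for neurons without a target)
      W += lr * np.einsum('k,kij->ij', err, P_new)
      return y_new, P_new

  err = np.zeros(N); err[-1] = 0.5          # toy error signal on a single output neuron
  y, P = rtrl_step(x=0.1, err=err, W=W, y=y, P=P)

Note how each neuron's sensitivity update draws on the sensitivities of every other neuron (the sum over l), which is exactly the non-locality the slide points out.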

  6. Implementing RTRL – storage requirements • Let’s assume a fully-connected network of N neurons • Memory resources • Weights matrix, w_ij: N^2 • Activations, y_k: N • Sensitivity matrix: N^3 • Total memory requirements: O(N^3) • Let’s go over an example: • Let’s assume we have 1000 neurons in the system • Each value requires 20 bits to represent • → ~20 Gb of storage!!
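
The arithmetic behind the example, spelled out (the 20-bit word size is the slide's own assumption):

  N = 1000  →  N^3 = 10^9 sensitivity values
  10^9 values × 20 bits/value = 2 × 10^10 bits = 20 Gb (roughly 2.5 GB)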

  7. Possible solutions – static subgrouping • Zipser et al. (1989) suggested static grouping of neurons • Relaxing the “fully-connected” requirement • Has backing in neuroscience • Average “branching factor” in the brain ~ 1000 • Reduced the complexity by simply leaving out elements of the sensitivity matrix based upon subgrouping of neurons • Neurons are subgrouped arbitrarily • Sensitivities between groups are ignored • All connections still exist in the forward path • If g is the number of subgroups then … • Storage is O(N^3/g^2) • Computational speedup is g^3 • Communications → each node communicates with N/g nodes
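
A short sketch of the bookkeeping behind these figures (the group assignment and array names are illustrative; only the counting matters):

  import numpy as np

  N, g = 1000, 10                   # N neurons split into g static subgroups (toy sizes)
  group = np.arange(N) % g          # arbitrary static assignment of neurons to subgroups

  # A sensitivity p[k, i, j] is retained only when neurons k, i and j all share a
  # subgroup; every cross-group entry of the sensitivity "cube" is simply dropped.
  same = (group[:, None] == group[None, :]).astype(np.int64)
  kept = int(same.sum(axis=0) @ same.sum(axis=1))   # counts same-group (k, i, j) triples

  full = N ** 3
  print(full, kept, full // kept)   # 10^9, 10^7, 100 -> storage shrinks by g^2, as claimed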

  8. Possible solutions – static subgrouping (cont.) • Zipser’s empirical tests indicate that these networks can solve many of the problems full RTRL solves • One caveat of the subgrouped RTRL training is that each subnet must have at least one unit for which a target exists (since gradient information is not exchanged between groups) • Others have proposed dynamic subgrouping • Subgrouping based on maximal gradient information • Not realistic for hardware realization • Open research question: how to calculate the gradient without the O(N^3) storage requirement?

  9. Truncated Real Time Recurrent Learning (TRTRL) Motivation: To obtain a scalable version of the RTRL algorithm while minimizing performance degradation How? Limit the sensitivities of each neuron to its ingress (incoming) and egress (outgoing) links

  10. Performing Sensitivity Calculations in TRTRL • For all nodes that are not in the output set, the egress sensitivity values for node i are calculated by imposing k = j in the original RTRL sensitivity equation, such that • Similarly, the ingress sensitivity values for node j are given by • For output neurons, a nonzero sensitivity element must exist in order to update the weights
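
The truncated sensitivity equations themselves did not survive the transcript. The following is a hedged sketch of the idea, obtained by imposing k = j in the RTRL recursion given earlier and keeping only the self-recurrent term on the right-hand side; it illustrates the truncation but may differ in detail from the exact TRTRL expressions. Here w_ij denotes the weight on the link from neuron i to neuron j:

  p^{j}_{ij}(t+1) \approx f'(net_j(t)) \big[ w_{jj} \, p^{j}_{ij}(t) + y_i(t) \big]        (egress sensitivity, k = j)

with the ingress sensitivities obtained analogously by imposing k = i.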

  11. Resource Requirements of TRTRL • The network structure remains the same with TRTRL; only the calculation of sensitivities is reduced • Significant reduction in resource requirements … • Computational load for each neuron drops from O(N^3) to O(2KN), where K denotes the number of output neurons • Total computational complexity is now O(2KN^2) • Storage requirements drop from O(N^3) to O(N^2) • Example revisited: For N=100, 10 outputs → 100k multiplications and only 20kB of storage!

  12. Further TRTRL Improvements – Clustering of Neurons • TRTRL introduced localization and memory improvement • Clustered TRTRL adds scalability by reducing the number of long connection lines between processing elements • [Figure: clustered network topology with Input and Output nodes]

  13. Test case #1: Frequency Doubler • Input: sin(x), target output sin(2x) • Both networks had 12 neurons
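
A minimal sketch of how the training signal for this test case could be generated (the sampling step, duration, and array names are assumptions, not taken from the lecture):

  import numpy as np

  # Frequency-doubling task: the network sees sin(t) one sample per step and is
  # trained to output sin(2t), i.e. a signal at twice the input frequency.
  dt = 0.05                        # sampling step (assumed)
  t = np.arange(0.0, 200.0, dt)
  inputs = np.sin(t)               # x(t) fed to the RNN, one scalar per time step
  targets = np.sin(2.0 * t)        # desired output y(t)

  # each (inputs[k], targets[k]) pair drives one on-line RTRL/TRTRL update step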

  14. Vanishing Gradient Problem • Recap on goals: • Find temporal dependencies in data with an RNN • The idea behind RTRL: when an error value is found, apply it to inputs seen an indefinite number of epochs ago • In 1994, Bengio et al. showed that both BPTT and RTRL suffer from the problem of vanishing gradient information • When using gradient-based training rules, the “error signal” that is applied to previous inputs tends to vanish • Because of this, long-term dependencies in the data are often overlooked • Short-term memory is ok; long-term dependencies (>10 epochs) are lost

  15. Vanishing Gradient Problem (cont.) • A learning error yields gradients on the outputs, and therefore on the state variables s_t • Since the weights (parameters) are shared across time, the gradient reaching an earlier state is a product of one factor per time step, and this product tends to shrink rapidly as the time lag grows • [Figure: RNN with input x_t, state s_t, and output y_t]
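
Written out, the argument on these two slides is the standard one. Assuming a generic state recursion s_t = f(W s_{t-1} + U x_t) (W and U are generic recurrent/input weight matrices, not symbols taken from the slides), the gradient of the loss at time t with respect to an earlier state s_k factors as

  \frac{\partial L_t}{\partial s_k} = \frac{\partial L_t}{\partial s_t} \prod_{m=k+1}^{t} \frac{\partial s_m}{\partial s_{m-1}},
  \qquad \frac{\partial s_m}{\partial s_{m-1}} = \mathrm{diag}\big(f'(\cdot)\big)\, W,
  \qquad \Big\| \prod_{m=k+1}^{t} \frac{\partial s_m}{\partial s_{m-1}} \Big\| \le \big( \|\mathrm{diag}(f'(\cdot))\| \, \|W\| \big)^{t-k}

so the error signal reaching inputs seen t - k steps earlier shrinks exponentially when this per-step factor is below one (vanishing gradient) and blows up when it is above one (exploding gradient).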

  16. What is Apprenticeship Learning • Many times we want to train an agent based on a reference controller • Riding a bicycle • Flying a plane • Starting from scratch may take a very long time • Particularly for large state/action spaces • May cost a lot (e.g. crashing a helicopter) • Process: • Train agent on reference controller • Evaluate trained agent • Improve trained agent • Note: the reference controller can be anything (e.g. a heuristic controller for the Car Race problem)

  17. Formalizing Apprenticeship Learning • Let’s assume we have a reference policy π from which we want our agent to learn • We would first like to learn the (approximate) value function, V^π • Once we have V^π, we can try to improve it based on the policy improvement theorem, i.e. (see below) • By acting greedily with respect to V^π we obtain a policy at least as good as the original! • In practice, many issues should be considered, such as state-space coverage and exploration/exploitation • Train on zero exploration, then explore gradually …
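
The improvement step itself did not survive the transcript; in standard notation (P the transition model, R the reward, γ the discount factor) it reads

  \pi'(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a) \big[ R(s, a, s') + \gamma V^{\pi}(s') \big]

and the policy improvement theorem guarantees V^{\pi'}(s) \ge V^{\pi}(s) for every state s, i.e. the greedy policy is at least as good as the reference policy π.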
