Model-Free Stochastic Perturbative Adaptation and Optimization


  1. Model-Free Stochastic Perturbative Adaptation and Optimization
  Gert Cauwenberghs, Johns Hopkins University
  gert@jhu.edu
  520.776 Learning on Silicon
  http://bach.ece.jhu.edu/gert/courses/776

  2. Model-Free Stochastic Perturbative Adaptation and Optimization: OUTLINE
  • Model-Free Learning
    • Model Complexity
    • Compensation of Analog VLSI Mismatch
  • Stochastic Parallel Gradient Descent
    • Algorithmic Properties
    • Mixed-Signal Architecture
    • VLSI Implementation
  • Extensions
    • Learning of Continuous-Time Dynamics
    • Reinforcement Learning
  • Model-Free Adaptive Optics
    • AdOpt VLSI Controller
    • Adaptive Optics “Quality” Metrics
    • Applications to Laser Communication and Imaging

  3. The Analog Computing Paradigm
  • Local functions are efficiently implemented with minimal circuitry, exploiting the physics of the devices.
  • Excessive global interconnects are avoided:
    • currents or charges are accumulated along a single wire;
    • a voltage is distributed along a single wire.
  Pros:
  • Massive Parallelism
  • Low Power Dissipation
  • Real-Time, Real-World Interface
  • Continuous-Time Dynamics
  Cons:
  • Limited Dynamic Range
  • Mismatches and Nonlinearities (WYDINWYG: “what you design is not what you get”)

  4. Effect of Implementation Mismatches
  [Figure: a system with inputs, outputs, and parameters {pi}; a reference produces the error e(p)]
  Associative Element:
  • Mismatches can be properly compensated by adjusting the parameters pi accordingly, provided sufficient degrees of freedom are available to do so.
  Adaptive Element:
  • Requires precise implementation: the accuracy of the implemented polarity (rather than amplitude) of the parameter update increments Δpi is the performance-limiting factor.

  5. Example: LMS Rule
  A linear perceptron under supervised learning:
    yi(k) = Σj pij xj(k),   ε(p) = ½ Σk Σi (yi(k) − yi,target(k))²
  with gradient descent:
    Δpij = −η ∂ε/∂pij = η Σk (yi,target(k) − yi(k)) xj(k)
  reduces to an incremental outer-product update rule, with scalable, modular implementation in analog VLSI.
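A minimal NumPy sketch of this LMS outer-product update may help make the rule concrete; the network size, learning rate, and synthetic data below are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(3, 5))   # weights p_ij of the linear perceptron
eta = 0.05                               # learning rate (assumed)

X = rng.normal(size=(100, 5))            # training inputs x^(k)
T = X @ rng.normal(size=(5, 3))          # linearly generated targets y_target^(k)

for x, t in zip(X, T):
    y = P @ x                            # y_i = sum_j p_ij x_j
    e = t - y                            # error e_i = y_target_i - y_i
    P += eta * np.outer(e, x)            # incremental outer-product update: dp_ij = eta e_i x_j
```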

  6. Incremental Outer-Product Learning in Neural Nets
  Multi-Layer Perceptron:
    xi = f(Σj pij xj)
  Outer-Product Learning Update:
    Δpij = η ei xj
  • Hebbian (Hebb, 1949)
  • LMS Rule (Widrow-Hoff, 1960)
  • Backpropagation (Werbos, Rumelhart, LeCun), with backpropagated errors:
    ej = f′j · Σi pij ei
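As a concrete illustration of the shared outer-product form, here is a small NumPy sketch of a two-layer perceptron trained with backpropagated errors; the sizes, the tanh nonlinearity, and the data are assumptions made for the example only.

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.tanh                              # neuron nonlinearity (assumed)
df = lambda a: 1.0 - np.tanh(a) ** 2     # its derivative f'

W1 = rng.normal(scale=0.5, size=(4, 3))  # input-to-hidden weights
W2 = rng.normal(scale=0.5, size=(2, 4))  # hidden-to-output weights
eta = 0.1

x = rng.normal(size=3)                   # one input pattern
t = rng.normal(size=2)                   # its target

a1 = W1 @ x; h = f(a1)                   # forward pass: x_i = f(sum_j p_ij x_j)
a2 = W2 @ h; y = f(a2)

e2 = (t - y) * df(a2)                    # output-layer error
e1 = df(a1) * (W2.T @ e2)                # backpropagated error: e_j = f'_j sum_i p_ij e_i

W2 += eta * np.outer(e2, h)              # both layers use the same outer-product form
W1 += eta * np.outer(e1, x)
```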

  7. Gradient Descent Learning
  Minimize ε(p) by iterating:
    pi(k+1) = pi(k) − η ∂ε/∂pi
  from calculation of the gradient:
    ∂ε/∂pi = Σl Σm (∂ε/∂yl) · (∂yl/∂xm) · (∂xm/∂pi)
  Implementation Problems:
  • Requires an explicit model of the internal network dynamics.
  • Sensitive to model mismatches and noise in the implemented network and learning system.
  • The amount of computation typically scales strongly with the number of parameters.

  8. Gradient-Free Approach to Error-Descent Learning
  • Avoid the model sensitivity of gradient descent by directly observing the dependence of the performance error on the parameters of the physical network, rather than calculating gradient information from a pre-assumed model of the network.
  Stochastic Approximation:
  • Multi-dimensional Kiefer-Wolfowitz (Kushner & Clark, 1978)
  • Function Smoothing Global Optimization (Styblinski & Tang, 1990)
  • Simultaneous Perturbation Stochastic Approximation (Spall, 1992)
  Hardware-Related Variants:
  • Model-Free Distributed Learning (Dembo & Kailath, 1990)
  • Noise Injection and Correlation (Anderson & Kerns; Kirk et al., 1992–93)
  • Stochastic Error Descent (Cauwenberghs, 1993)
  • Constant Perturbation, Random Sign (Alspector et al., 1993)
  • Summed Weight Neuron Perturbation (Flower & Jabri, 1993)

  9. Stochastic Error-Descent Learning
  Minimize ε(p) by iterating:
    p(k+1) = p(k) − µ ε̂(k) π(k)
  from observation of the gradient in the direction of the perturbation π(k):
    ε̂(k) = ε(p(k) + π(k)) − ε(p(k) − π(k))
  with random uncorrelated binary components of the perturbation vector π(k):
    πi(k) = ±σ,   ⟨πi(k) πj(k)⟩ = σ² δij
  Advantages:
  • No explicit model knowledge is required.
  • Robust in the presence of noise and model mismatches.
  • The computational load is significantly reduced.
  • Allows simple, modular, and scalable implementation.
  • Convergence properties are similar to those of exact gradient descent.
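The following NumPy sketch illustrates stochastic error-descent on a toy quadratic cost; the cost function, perturbation size, and learning rate are illustrative assumptions, and on chip ε(p) would be a measured quantity rather than a Python function.

```python
import numpy as np

rng = np.random.default_rng(2)

def error(p):
    # stands in for the observed performance error e(p); no model of it is used
    return np.sum((p - 1.0) ** 2)

p = np.zeros(10)            # parameter vector
sigma, mu = 0.1, 2.0        # perturbation amplitude and learning rate (assumed)

for k in range(2000):
    pi = sigma * rng.choice([-1.0, 1.0], size=p.shape)  # uncorrelated binary perturbation
    e_hat = error(p + pi) - error(p - pi)               # observed difference ~ 2 pi . grad(e)
    p -= mu * e_hat * pi                                # descend along the perturbation

print(error(p))             # small residual error near the optimum p = 1
```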

  10. Stochastic Perturbative Learning: Cell Architecture

  11. Stochastic Perturbative Learning: Circuit Cell

  12. Charge Pump Characteristics
  [Figure: measured characteristics, panels (a) and (b)]

  13. Supervised Learning of Recurrent Neural Dynamics

  14. The Credit Assignment Problem, or How to Learn from Delayed Rewards
  • External, discontinuous reinforcement signal r(t).
  • Adaptive Critics:
    • Discrimination Learning (Grossberg, 1975)
    • Heuristic Dynamic Programming (Werbos, 1977)
    • Reinforcement Learning (Sutton and Barto, 1983)
    • TD(λ) (Sutton, 1988)
    • Q-Learning (Watkins, 1989)
  [Figure: a system with inputs, outputs, and parameters {pi}; an adaptive critic converts the external reinforcement r(t) into an internal reinforcement r*(t)]

  15. Reinforcement Learning (Barto and Sutton, 1983)
  Locally tuned, address-encoded neurons.
  Adaptation of the classifier and the adaptive critic:
  • eligibilities
  • internal reinforcement
  (A generic sketch of these actor-critic mechanics is given below.)
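The slide's eligibility and internal-reinforcement equations are figures in the original, so the sketch below uses standard textbook actor-critic mechanics (TD-style internal reinforcement, decaying eligibility traces) with a placeholder environment; every constant in it is an assumption rather than a value from the chip.

```python
import numpy as np

rng = np.random.default_rng(3)
n_states = 64                       # one locally tuned, address-encoded neuron per state
v = np.zeros(n_states)              # adaptive critic (state values)
w = np.zeros(n_states)              # classifier weights for the binary action
e_w = np.zeros(n_states)            # eligibility traces
gamma, lam, alpha, beta = 0.95, 0.8, 0.1, 0.05   # assumed constants

def step(s, a):
    # placeholder environment: random next state, reward depends on action
    return int(rng.integers(n_states)), (1.0 if a == 1 else -1.0)

s = 0
for t in range(1000):
    a = 1 if w[s] + 0.1 * rng.normal() > 0 else 0    # noisy binary policy
    s_next, r = step(s, a)
    r_hat = r + gamma * v[s_next] - v[s]             # internal reinforcement (TD error)
    e_w *= lam                                       # eligibilities decay...
    e_w[s] += 2 * a - 1                              # ...and mark the visited state/action
    w += alpha * r_hat * e_w                         # classifier update
    v[s] += beta * r_hat                             # critic update
    s = s_next
```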

  16. Reinforcement Learning Classifier for Binary Control
  [Block diagram: the quantized state addresses one of 64 reinforcement learning neurons, each combining state eligibility, neuron select (state), and the adaptive critic; an action network produces the binary action]

  17. A Biological Adaptive Optics System
  [Figure: the eye and brain, with labeled parts: cornea, iris, lens, zonule fibers, retina, optic nerve]

  18. Wavefront Distortion and Adaptive Optics
  • Imaging: defocus, motion
  • Laser beam: beam wander/spread, intensity fluctuations

  19. Adaptive Optics: Conventional Approach
  • Performs phase conjugation
    • assumes intensity is unaffected
  • Complex
    • requires an accurate wavefront phase sensor (Shack-Hartmann; Zernike nonlinear filter; etc.)
    • computationally intensive control system

  20. Adaptive Optics: Model-Free Integrated Approach
  [Figure: incoming wavefront onto a wavefront corrector with N elements u1, …, un, …, uN]
  • Optimizes a direct measure J of optical performance (“quality metric”)
  • No (explicit) model information is required
    • any type of quality metric J and wavefront corrector (MEMS, LC, …)
    • no need for a wavefront phase sensor
  • Tolerates imprecision in the implementation of the updates
    • system-level precision is limited only by the accuracy of the measured J

  21. Adaptive Optics Controller Chip: Optimization by Parallel Perturbative Stochastic Gradient Descent
  [Figure, step 1: the wavefront corrector set to u forms the image F(u); the performance metric sensor measures J(u) and feeds it to the AdOpt VLSI wavefront controller]

  22. Adaptive Optics Controller Chip: Optimization by Parallel Perturbative Stochastic Gradient Descent
  [Figure, step 2: the controller applies the perturbed settings u + δu, forming F(u + δu); the sensor measures J(u + δu)]

  23. Adaptive Optics Controller Chip: Optimization by Parallel Perturbative Stochastic Gradient Descent
  [Figure, step 3: the controller applies u − δu and measures J(u − δu); from the difference δJ = J(u + δu) − J(u − δu) it updates each channel:]
    uj(k+1) = uj(k) + γ δJ(k) δuj(k)
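A compact NumPy sketch of this perturb/measure/update cycle follows; a toy quadratic quality metric J(u) stands in for the optical plant and metric sensor, and the gain and perturbation amplitude are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 19                                    # corrector channels, as on the AdOpt chip
u = np.zeros(N)
u_opt = rng.normal(size=N)                # unknown optimum of the toy metric

def J(u):
    # stand-in for the measured quality metric (to be maximized)
    return -np.sum((u - u_opt) ** 2)

gamma, sigma = 0.5, 0.1                   # gain and perturbation amplitude (assumed)
for k in range(2000):
    du = sigma * rng.choice([-1.0, 1.0], size=N)   # parallel Bernoulli perturbation
    dJ = J(u + du) - J(u - du)                     # differential metric measurement
    u += gamma * dJ * du                           # u_j(k+1) = u_j(k) + gamma dJ(k) du_j(k)

print(J(u))                               # approaches 0, the maximum of the toy metric
```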

  24. Parallel Perturbative Stochastic Gradient Descent: Architecture
  [Block diagram: each channel draws a perturbation δuj; the metric J(u) passes through a unit delay z⁻¹ to form the difference δJ, which is scaled by the gain γ and correlated with δuj to produce ternary {−1, 0, +1} increments accumulated onto uj]
  Uncorrelated perturbations: ⟨δui δuj⟩ = σ² δij
  Update rule: uj(k+1) = uj(k) + γ δJ(k) δuj(k)

  25. Parallel Perturbative Stochastic Gradient Descent: Mixed-Signal Architecture
  [Block diagram: Bernoulli perturbations δuj = ±1; the polarity sgn(δJ) δuj ∈ {−1, 0, +1} is computed digitally, while the amplitude γ |δJ| is computed in analog; J(u) enters through a unit delay z⁻¹]
  Update rule (amplitude × polarity):
    uj(k+1) = uj(k) + γ |δJ(k)| · sgn(δJ(k)) · δuj(k)
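To show why this factorization suits a mixed-signal chip, here is a sketch of one update step split into its digital polarity and analog amplitude parts; the injected gain error is an assumption used only to illustrate the robustness argument.

```python
import numpy as np

def mixed_signal_update(u, du, dJ, gamma, gain_error=0.0):
    """One update step; du has Bernoulli entries +/-1."""
    polarity = np.sign(dJ) * du                       # digital: exact {-1, 0, +1} per channel
    amplitude = gamma * abs(dJ) * (1.0 + gain_error)  # analog: global magnitude, may be imprecise
    return u + amplitude * polarity
```

Because the polarity is computed digitally and exactly, an error in the analog amplitude only rescales the step size; the descent direction, and hence convergence, is preserved, which is the sense in which system-level precision is limited by the measured J rather than by the analog circuitry.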

  26. Wavefront Controller: VLSI Implementation
  Edwards, Cohen, Cauwenberghs, Vorontsov & Carhart (1999)
  AdOpt mixed-mode chip: 2.2 mm² in 1.2 µm CMOS
  • Controls 19 channels
  • Interfaces with LC SLM or MEMS mirrors

  27. Wavefront Controller: Chip-Level Characterization

  28. Wavefront Controller: System-Level Characterization (LC SLM)
  [Figure: experimental setup with the HEX-127 LC SLM]

  29. Wavefront Controller: System-Level Characterization (SLM)
  [Figure: J(u) maximized and minimized 100 times; max and min trajectories]

  30. Wavefront Controller: System-Level Characterization (MEMS)
  Weyrauch, Vorontsov, Bifano, Hammer, Cohen & Cauwenberghs
  [Figure: response function of the MEMS mirror (OKO)]

  31. Wavefront Controller: System-Level Characterization (over 500 trials)

  32. Quality Metrics
  • imaging: Horn (1968), Delbruck (1999)
  • laser beam focusing: Muller et al. (1974)
  • laser comm: Vorontsov et al. (1996)

  33. Image Quality Metric Chip
  Cohen, Cauwenberghs, Vorontsov & Carhart (2001)
  [Chip micrograph: 22 × 22 pixel array on a 2 mm die; 120 µm × 120 µm pixels]

  34. Architecture: (3 × 3) Edge and Corner Kernels
  Edge kernel (center +4, four −1 nearest neighbors):
     0  −1   0
    −1  +4  −1
     0  −1   0
  (the corner kernel places the four −1 taps on the diagonals instead)
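A sketch of how such a kernel yields an image quality metric: convolve the image with the center-+4 edge kernel (a discrete Laplacian) and accumulate the squared response, which grows with high-spatial-frequency content. Summing squared responses, the boundary mode, and the test pattern are assumptions for the example.

```python
import numpy as np
from scipy.ndimage import convolve

K = np.array([[ 0, -1,  0],
              [-1,  4, -1],
              [ 0, -1,  0]], dtype=float)   # edge kernel: center +4, four -1 neighbors

def image_quality(img):
    r = convolve(img.astype(float), K, mode="nearest")
    return float(np.sum(r * r))             # high when the image is sharp/focused

img = np.zeros((22, 22)); img[8:14, 8:14] = 1.0          # sharp square on a 22 x 22 array
blur = convolve(img, np.full((3, 3), 1.0 / 9.0), mode="nearest")
assert image_quality(img) > image_quality(blur)          # sharp image scores higher
```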

  35. Image Quality Metric Chip Characterization: Experimental Setup

  36. Image Quality Metric Chip Characterization: Experimental Results
  [Figure: metric response for focused versus uniformly illuminated input; dynamic range ≈ 20]

  37. Image Quality Metric Chip Characterization: Experimental Results

  38. Beam Variance Metric Chip
  Cohen, Cauwenberghs, Vorontsov & Carhart (2001)
  [Chip micrograph: 22 × 22 pixel array (20 × 20 active pixels plus a perimeter of dummy pixels); 70 µm pixels]
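The following sketch computes a beam variance metric in the spirit of this chip: the spatial variance of intensity over the pixel array, which is large for a tightly focused spot and small for a spread beam of equal power. The exact on-chip normalization is not given on the slide, so plain variance is an assumption.

```python
import numpy as np

def beam_variance(intensity):
    i = np.asarray(intensity, dtype=float).ravel()
    return float(np.mean(i ** 2) - np.mean(i) ** 2)   # variance over the active array

focused = np.zeros((20, 20)); focused[9:11, 9:11] = 100.0     # tight spot
spread = np.full((20, 20), focused.sum() / 400.0)             # same power, spread out
assert beam_variance(focused) > beam_variance(spread)
```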

  39. Beam Variance Metric Chip Characterization: Experimental Setup

  40. Beam Variance Metric Chip Characterization: Experimental Results
  [Figure: dynamic range ≈ 5]

  41. Beam Variance Metric Chip Characterization: Experimental Results

  42. Beam Variance Metric Sensor in the Loop: Laser Receiver Setup

  43. Beam Variance Metric Sensor in the Loop

  44. Conclusions
  • Computational primitives of adaptation and learning are naturally implemented in analog VLSI, and make it possible to compensate for inaccuracies in the physical implementation of the system under adaptation.
  • Care must still be taken to avoid inaccuracies in the implementation of the adaptive element itself; in practice this reduces to ensuring the correct polarity, rather than the exact amplitude, of the parameter update increments.
  • Adaptation algorithms based on physical observation of the “performance” gradient in parameter space are better suited to analog VLSI implementation than algorithms based on a calculated gradient.
  • Among the most generally applicable learning architectures are those that operate on reinforcement signals, and those that blindly extract and classify signals.
  • Model-free adaptive optics leads to an efficient and robust analog implementation of the control algorithm, using a criterion that can be freely chosen to accommodate different wavefront correctors and different imaging or laser communication applications.
