
Reinforcement Learning Control with Robust Stability

Poster sections: Trajectory of Weights and Bounds on Regions of Stability · Reinforcement Learning Agent in Parallel with Controller · Incorporating Time-Varying IQCs in Reinforcement Learning · Neural Net and Robust Control with IQCs · Conclusions.


Presentation Transcript


Reinforcement Learning Control with Robust Stability
Chuck Anderson and Matt Kretchmar, Department of Computer Science; Peter Young, Department of Electrical and Computer Engineering; Douglas Hittle, Department of Mechanical Engineering; Colorado State University, Fort Collins, CO

Motivation
• Robust control theory guarantees stability, but results in less aggressive controllers.
• Reinforcement learning optimizes the performance of a controller, but gives no guarantee of stability while learning.
[Figure: step responses of the nominal and perturbed plants (a good response versus a terrible response), with block diagrams of the plant structured as diag(P1, P2) and the uncertainty structured as diag(Δ1, Δ2) in feedback with M.]

Robust Control Based on IQCs
An Integral Quadratic Constraint (IQC) describes the relationship between the signals v and w as

  \int_{-\infty}^{\infty} \begin{bmatrix} \hat{v}(j\omega) \\ \hat{w}(j\omega) \end{bmatrix}^* \Pi(j\omega) \begin{bmatrix} \hat{v}(j\omega) \\ \hat{w}(j\omega) \end{bmatrix} \, d\omega \;\ge\; 0.

Stability of the closed-loop system is guaranteed if

  \begin{bmatrix} M(j\omega) \\ I \end{bmatrix}^* \Pi(j\omega) \begin{bmatrix} M(j\omega) \\ I \end{bmatrix} \;\le\; -\epsilon I

for all ω and for some ε > 0. Given specific IQCs for a particular system, this inequality problem becomes a linear matrix inequality (LMI) problem.
[Figure: feedback interconnection of the controller/plant block M and the uncertainty block Δ, connected by the signals v and w, with the reference input.]

Reinforcement Learning
[Figure: reinforcement learning agent with state, action, reinforcement (|error|), discount factor, value function, and policy function.]

Robust Reinforcement Learning
Start from the equation defining the value function Q, subtract the right side from the left, and replace the expectation with a sample (Monte Carlo approach) to obtain the algorithm for updating Q.

Neural Net and Robust Control with IQCs
The reinforcement learning algorithm guides the adjustment of the actor's weights via the temporal-difference error. The IQC analysis places a bounding box in the high-dimensional weight space of the actor network, beyond which stability has not been verified.

Reinforcement Learning Agent in Parallel with Controller
[Figure: neural network actor (reinforcement learning agent) in blue, robust controller and plant in red, and bounds on the neural network weight adjustment in green.]

Trajectory of Weights and Bounds on Regions of Stability
Learning begins with an initial weight vector inside an initial guaranteed-stable region. The trajectory of weights while learning stays within this bounding box; when the edge of the box is encountered, a new guaranteed-stable region must be found, and learning can then continue until the edge of the new bounding box is encountered, and so on, until the final weight vector is reached. Without robust constraints, the weight trajectory can enter the unstable region and become unstable before learning the final stable solution.
[Figure: weight trajectories with and without robust constraints in the high-dimensional weight space, annotated Step 0 through Step 5, showing the initial and subsequent guaranteed-stable regions and the unstable region.]

Examples
[Figure panels: First Example, Second Example, Third Example, and Fourth Example — among them a 1st-order system, a 2nd-order system, and a distillation column.]
Without robust constraints, the controller becomes unstable before learning the final stable solution.

Experimental HVAC System
[Figure: responses for the perturbed case with no learning and the perturbed case with learning.]
Through learning, the controller has been fine-tuned to the actual dynamics of the real plant without losing the guarantee of stability.

  Sum Squared Error
    Nominal Controller      0.646
    Robust Controller       0.286
    Robust RL Controller    0.243

Conclusions
• IQC bounds on the parameters of tanh and sigmoid networks exist for which the combination of a reinforcement learning agent and a feedback control system satisfies the requirements of robust stability theorems (static and dynamic stability).
• The robust reinforcement learning algorithm improves control performance while avoiding instability on several simulated problems.
• Reinforcement learning is now more acceptable in practical applications as an adaptive controller that modifies its behavior over time, due to the guarantees of stability.
• The initial, conservative robust controller becomes more aggressive through adaptation to the actual physical system.
• See http://www.cs.colostate.edu/~anderson/res/rl/
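The "Robust Control Based on IQCs" panel notes that, once specific IQCs are chosen, the stability test reduces to an LMI feasibility problem. As a minimal illustration of what such a feasibility check looks like, and not the poster's actual analysis, the sketch below certifies stability of a made-up linear system via a Lyapunov-type LMI, assuming CVXPY with an SDP-capable solver such as SCS is installed.

```python
# Minimal sketch: an LMI feasibility check with CVXPY. This is NOT the poster's IQC
# analysis -- only an illustration of the kind of LMI problem the stability condition
# reduces to. The system matrix A is a made-up placeholder.
import numpy as np
import cvxpy as cp

A = np.array([[-1.0, 2.0],
              [ 0.0, -3.0]])
n = A.shape[0]

P = cp.Variable((n, n), symmetric=True)      # Lyapunov matrix to be found
eps = 1e-6
constraints = [
    P >> eps * np.eye(n),                    # P positive definite
    A.T @ P + P @ A << -eps * np.eye(n),     # Lyapunov inequality (an LMI in P)
]
problem = cp.Problem(cp.Minimize(0), constraints)
problem.solve()

# Feasibility of the LMI certifies stability of dx/dt = A x.
print("LMI feasible (stable)?", problem.status == cp.OPTIMAL)
```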
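The "Robust Reinforcement Learning" panel derives the update by subtracting the right side of the value-function equation from the left and replacing the expectation with a sampled transition. Below is a minimal sketch of that temporal-difference step for a linear Q-function; the names (w, phi_sa, alpha, gamma) are illustrative placeholders, not the poster's notation.

```python
# Minimal sketch of a sampled temporal-difference update for a linear Q-function
# Q(s, a) = w . phi(s, a): the difference between the two sides of the value-function
# equation, evaluated on a single sampled transition, drives the weight change.
import numpy as np

def q_update(w, phi_sa, phi_next_sa, reward, alpha=0.1, gamma=0.95):
    """One sampled TD step; returns the updated weights and the TD error."""
    td_error = reward + gamma * (w @ phi_next_sa) - (w @ phi_sa)
    return w + alpha * td_error * phi_sa, td_error

# Tiny usage example with made-up feature vectors.
w = np.zeros(3)
w, delta = q_update(w, np.array([1.0, 0.0, 0.5]), np.array([0.0, 1.0, 0.5]), reward=-0.2)
```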

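The "Reinforcement Learning Agent in Parallel with Controller" panel shows the neural-network actor alongside the fixed robust controller. A minimal sketch of that parallel connection follows, assuming "parallel" means the two control outputs are summed; actor and robust_controller are hypothetical placeholders, not the poster's implementation.

```python
# Minimal sketch of the actor-in-parallel-with-controller architecture: the learned
# actor's output is added to the fixed robust controller's output (an assumption about
# what the parallel connection means; both callables are hypothetical placeholders).
def control_signal(error, actor, robust_controller):
    u_nominal = robust_controller(error)   # fixed, conservatively designed controller
    u_learned = actor(error)               # neural-net actor, adapted by RL
    return u_nominal + u_learned
```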
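The "Trajectory of Weights and Bounds on Regions of Stability" panel describes learning inside a guaranteed-stable bounding box that is recomputed whenever its edge is reached. The sketch below outlines that loop under those assumptions; compute_stable_box and learning_step are hypothetical stand-ins for the IQC/LMI analysis and the reinforcement learning weight update.

```python
# Minimal sketch of learning constrained to a guaranteed-stable bounding box in weight
# space. compute_stable_box(w) -> (low, high) and learning_step(w) -> new weights are
# hypothetical placeholders for the IQC/LMI analysis and the RL update, respectively.
import numpy as np

def robust_learning(w, compute_stable_box, learning_step, n_steps=10_000):
    low, high = compute_stable_box(w)            # initial guaranteed-stable region
    for _ in range(n_steps):
        w_new = learning_step(w)                 # ordinary RL weight update
        if np.any(w_new < low) or np.any(w_new > high):
            # Edge of the bounding box reached: stay inside the verified region,
            # then ask the stability analysis for a new guaranteed-stable box.
            w_new = np.clip(w_new, low, high)
            low, high = compute_stable_box(w_new)
        w = w_new
    return w                                     # final weight vector
```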