
Fusing Machine Learning & Control Theory With Applications to Smart Buildings & ActionWebs

UC Berkeley ActionWebs Meeting, November 03, 2010. By Jeremy Gillula. [Some rights reserved unless otherwise noted; see http://tinyurl.com/2qn665]



Presentation Transcript


  1. Fusing Machine Learning & Control Theory With Applications to Smart Buildings & ActionWebs • UC Berkeley ActionWebs Meeting, November 03, 2010 • By Jeremy Gillula [Some rights reserved unless otherwise noted; see http://tinyurl.com/2qn665]

  2. Talk Outline • Current “State of the Art” • Reinforcement learning and apprenticeship learning • Reachability for guaranteed safe mode switching • Motivation • Goals – Combining Machine Learning and Control Theory • Existing Approaches • Current Research • Extensions • Conclusions • Questions ActionWebs Talk (J. Gillula)

  3. “An Application of Reinforcement Learning to Aerobatic Helicopter Flight” (Abbeel et al., 2007) • Use linear regression to learn parameters of given model • Use differential dynamic programming to solve the MDP • Generate trajectory using current policy and nonlinear dynamics • Compute new policy using LQR and linearized dynamics around that trajectory • Reward function generated using apprenticeship learning • Analysis: • Great performance • No formal safety analysis • Required some hand-tweaking for stability (e.g. hand-chosen reward weights) • Easily generalizable [Video from Abbeel et al. 2007]

  4. “Design of Guaranteed Safe Maneuvers Using Reachable Sets…” (Gillula et al., 2010) • Used reachability analysis via level-set methods to design and perform a safe backflip • Safe given accuracy of model and worst-case disturbances

  5. “A…Hamilton–Jacobi Formulation of Reachable Sets for Continuous Dynamic Games” (Mitchell et al., 2005) • Create a level set function such that: • Boundary of keep-out set K is defined implicitly by the function's zero level set • The function is negative inside the region and positive outside • Reachability as game: disturbance attempts to force system into unsafe region, control attempts to stay safe • Solution can be found via Hamilton-Jacobi-Bellman PDE: [Figure from Tomlin 2009]
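For reference, the time-dependent Hamilton-Jacobi formulation in Mitchell et al. (2005) is usually written in the form below. This is a sketch rather than a copy of the slide, whose own equation is not in the transcript. The symbols \phi (level set function), \phi_0 (implicit description of K), f (system dynamics), u (control), d (disturbance), and the input sets \mathcal{U}, \mathcal{D} are notation assumed here.

```latex
\begin{align}
\frac{\partial \phi}{\partial t}(x,t)
  + \min\bigl[\,0,\; H\bigl(x, \nabla_x \phi(x,t)\bigr)\bigr] &= 0,
  \qquad \phi(x,0) = \phi_0(x), \\
H(x,p) &= \max_{u \in \mathcal{U}} \; \min_{d \in \mathcal{D}} \; p^{\top} f(x,u,d).
\end{align}
```

With \phi_0 negative inside K, the set \{x : \phi(x,t) \le 0\} for t \le 0 collects the states from which the disturbance can force the system into K within time |t| despite the best control. The \min[0,\cdot] term keeps that set from shrinking as the computation runs backward in time.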

  6. “Design of Guaranteed Safe Maneuvers Using Reachable Sets…” (Gillula et al., 2010) • Analysis: • Decent performance • Formal safety analysis • Required human input for choosing design parameters • Difficult to generalize [Figure labels: Recovery, Drift, Impulse]

  7. Motivation: “Machine Learning” Techniques vs. “Control Theory” Techniques

  8. Goals/Research Statement • How can we get high performance on complicated systems while still guaranteeing safety? • Take advantage of “Machine Learning” techniques for performance • Data-driven models (potentially nonparametric) • Data-driven, sampling-based techniques for estimation and control • While getting “Control Theory”-style safety guarantees • Formal, principled analyses of safety • Several Possible Approaches • Adapt data-driven methods to existing safety-analysis techniques • Closely couple data-driven methods with techniques for generating safety guarantees • Use data-driven techniques in the context of existing safety-analysis techniques • Other alternatives

  9. Talk Outline • Current “State of the Art” • Reinforcement learning and apprenticeship learning • Reachability for guaranteed safe mode switching • Motivation • Goals – Combining Machine Learning and Control Theory • Existing Approaches • Current Research • Extensions • Conclusions • Questions

  10. “System Identification of Post Stall Aerodynamics for UAV Perching” (Hoburg and Tedrake, 2009) [Approach: Adapt data-driven methods to existing safety-analysis techniques] • Nonlinear and transient aerodynamics in perching • Need to learn model from data • Use physically-inspired basis functions • Nonlinear functions of state x, z, µ, etc. • Compute least-squares fit for every combination of n basis functions: [Figures from Hoburg and Tedrake 2009]

  11. “System Identification of Post Stall Aerodynamics for UAV Perching” (Hoburg and Tedrake, 2009) [Approach: Adapt data-driven methods to existing safety-analysis techniques] • Nonlinear and transient aerodynamics in perching • Need to learn model from data • Use physically-inspired basis functions • Nonlinear functions of state x, z, µ, etc. • Compute least-squares fit for every combination of n basis functions: • Analysis/Extensions: • Use standard control theory techniques to generate safety guarantees • Use lasso or other regularization to choose basis functions [Figures from Hoburg and Tedrake 2009]
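The "least-squares fit for every combination of n basis functions" step can be sketched as follows. This is a minimal illustration, not the procedure from Hoburg and Tedrake: the function name `best_basis_subset`, the candidate basis functions, and the synthetic data are all assumptions made for the example.

```python
from itertools import combinations

import numpy as np


def best_basis_subset(features, y, n):
    """Fit least squares on every n-column subset of `features`.

    features: (samples, num_basis) matrix of candidate basis functions
              evaluated on the data (hypothetical input layout).
    Returns (squared_error, column_indices, coefficients) of the best subset.
    """
    best = None
    for cols in combinations(range(features.shape[1]), n):
        A = features[:, list(cols)]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        err = float(np.sum((A @ coef - y) ** 2))
        if best is None or err < best[0]:
            best = (err, cols, coef)
    return best


# Synthetic stand-in data: a force coefficient as a function of one angle.
rng = np.random.default_rng(0)
alpha = rng.uniform(-1.5, 1.5, size=200)
candidates = np.column_stack(
    [np.sin(alpha), np.cos(alpha), np.sin(2 * alpha), alpha, alpha**2]
)
y = 0.8 * np.sin(2 * alpha) + 0.3 * np.cos(alpha) + 0.01 * rng.standard_normal(200)

err, cols, coef = best_basis_subset(candidates, y, n=2)
```

The exhaustive search grows combinatorially with the number of candidates, which is one reason the slide lists lasso or other regularization as an extension.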

  12. “Predictive Guidance Intercept Using The Neural EKF Tracker” (Stubberud and Kramer, 2007) [Approach: Closely couple data-driven methods with techniques for generating safety guarantees] • Augmented process model is: • Use an adaptive EKF to learn the error: • Let augmented state be: • Then: [Equation label: NN weights]

  13. “Predictive Guidance Intercept Using The Neural EKF Tracker” (Stubberud and Kramer, 2007) [Approach: Closely couple data-driven methods with techniques for generating safety guarantees] • Then associated Jacobian is: so state estimation and NN training are coupled • Normal EKF analysis follows • Analysis: • Learns model error • Learning done online • But combining ML and control theory tools can be tricky • E.g. augmented system is not observable
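The slide equations are not in the transcript, so the following is only a sketch of the structure a neural EKF of this kind typically has. The notation is assumed: x_k is the tracked state, w_k the neural network weights, f the nominal process model, g the network's correction term, and h the measurement function.

```latex
\begin{align}
x_{k+1} &= f(x_k) + g(x_k, w_k), \qquad w_{k+1} = w_k, \qquad
z_k = \begin{bmatrix} x_k \\ w_k \end{bmatrix}, \\
F_k &= \frac{\partial z_{k+1}}{\partial z_k}
     = \begin{bmatrix}
         \dfrac{\partial f}{\partial x} + \dfrac{\partial g}{\partial x} &
         \dfrac{\partial g}{\partial w} \\[1.5ex]
         0 & I
       \end{bmatrix}, \qquad
H_k = \begin{bmatrix} \dfrac{\partial h}{\partial x} & 0 \end{bmatrix}.
\end{align}
```

The off-diagonal block \partial g / \partial w is what couples state estimation to weight training. The zero block in H_k shows that the weights are never measured directly, only through their effect on x, which is consistent with the observability concern raised on the slide.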

  14. Talk Outline • Current “State of the Art” • Reinforcement learning and apprenticeship learning • Reachability for guaranteed safe mode switching • Motivation • Goals – Combining Machine Learning and Control Theory • Existing Approaches • Current Research • Extensions • Conclusions • Questions

  15. Safely Learning A Bounded System [Approach: Use data-driven techniques in the context of existing safety-analysis techniques] • Learning unknown dynamics of a target vehicle via observation • Limited field of view • Safety = always keeping target in view, i.e. • Bounded system • Assume target dynamics are autonomous and bounded, i.e. • Measurement model given by: [Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]

  16. Safely Learning A Bounded System [Approach: Use data-driven techniques in the context of existing safety-analysis techniques] • Problem statement: (1) Learn target dynamics (2) Minimize error: (3) Maintain target in view: • For (1) use machine learning: • Fixed model w/linear regression • Physically inspired basis functions • Neural network • (1) leads to (2) via EKF, UKF, or PF • (3) requires controlling our vehicle’s position and height [Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]

  17. Safely Learning A Bounded System [Approach: Use data-driven techniques in the context of existing safety-analysis techniques] • For (3) use reachability: • Unsafe set • Treat target motion as adversarial disturbance • Augmented system dynamics: • Result: • Can use any learning/tracking algorithm • Reachability only kicks in on border of unsafe sets [Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]
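The idea that any learning or tracking policy can run freely while reachability only intervenes at the border can be sketched with a one-dimensional toy problem. Everything here is an assumption for illustration: the relative position `r` (target minus our vehicle), the dynamics `r' = d - u`, the constants, and the closed-form safety margin, which stands in for a level-set computation that the real problem would need.

```python
import numpy as np

R = 1.0        # half-width of the field of view (assumed)
U_MAX = 1.0    # our vehicle's maximum speed (assumed)
D_MAX = 0.5    # bound on the target's speed, treated as a disturbance (assumed)
DT = 0.05      # time step
MARGIN = DT * (U_MAX + D_MAX)  # worst-case change in r over one step


def safe_control(r, u_learn):
    """Pass the learner's control through unless r is near the border."""
    if abs(r) >= R - MARGIN:
        # Safety controller: chase the target at full speed.
        return float(U_MAX * np.sign(r))
    return float(np.clip(u_learn, -U_MAX, U_MAX))


def simulate(steps=2000, seed=0):
    """Run a random exploration policy against an adversarial target."""
    rng = np.random.default_rng(seed)
    r, history = 0.0, []
    for _ in range(steps):
        u_learn = rng.uniform(-U_MAX, U_MAX)   # stand-in for any learning policy
        d = D_MAX if r >= 0 else -D_MAX        # target always pushes r outward
        u = safe_control(r, u_learn)
        r = r + DT * (d - u)
        history.append(r)
    return np.array(history)


trace = simulate()
```

Because `U_MAX` exceeds `D_MAX`, the safety controller can always pull `r` back from the border, and the margin guarantees that one step of arbitrary exploration cannot cross it. In higher dimensions neither property has a closed form, which is where the reachable-set computation comes in.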

  18. Caveat • What follows is pure brainstorming • Feedback and suggestions are welcome

  19. Safely Learning A Bounded System [Approach: Use data-driven techniques in the context of existing safety-analysis techniques] • Possible extension: safe autonomous data collection/learning • Attempt to learn/modify building model (or control policy) online • Start w/basic physics model (or control policy) • Assume bounded errors as disturbance • Reachability enables following any exploration policies when safe [Image courtesy Jorge Ortiz, http://tinyurl.com/2dnz5jl]

  20. Safely Learning A Bounded System [Approach: Use data-driven techniques in the context of existing safety-analysis techniques] • Limited acceptable range • Safety = always keeping target states within acceptable tolerances, i.e. • Bounded system • Assume target dynamics are bounded, i.e. • Problem statement: (1) Learn system dynamics (2) Minimize error: (3) Maintain target states in safe region: • Proposed Approach: • For (1) use machine learning • For (2) use the results of (1) with optimal control • For (3) use reachability [Image courtesy Jorge Ortiz, http://tinyurl.com/2dnz5jl]

  21. Safely Learning A Bounded System ActionWeb [Approach: Use data-driven techniques in the context of existing safety-analysis techniques] • Difficulties: • Reachable set calculations for high dimensions • And they need to be online [Image courtesy David Culler, http://tinyurl.com/2bcaqnh]

  22. Safely Learning A Bounded System ActionWeb [Approach: Use data-driven techniques in the context of existing safety-analysis techniques] • Solution: Building decomposition • Decompose building into separate rooms • Model each room in parallel • Treat interactions between rooms as bounded adversarial inputs • Still fits in machine learning framework (can still model interactions) • Still fits in reachability framework (can still calculate safe sets) [Image courtesy Claire Tomlin, http://tinyurl.com/26bpcl8]
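One way to picture the decomposition is to bound each room's temperature independently while treating heat exchange with neighbouring rooms as a bounded adversarial input. The sketch below uses interval propagation, a simple over-approximation, in place of a full reachable-set computation. The first-order thermal model, the function name `room_reach_interval`, and all parameter values are assumptions for illustration, not taken from the talk.

```python
def room_reach_interval(t_lo, t_hi, u, steps, a=0.1, t_out=10.0, w_max=0.5, dt=1.0):
    """Propagate a temperature interval [t_lo, t_hi] for a single room.

    Assumed model: T_next = T + dt * (a * (t_out - T) + u + w), where u is the
    heating input and w in [-w_max, w_max] is the unknown coupling from
    neighbouring rooms. The update is monotone in T when dt * a < 1, so
    pushing the endpoints with the extreme values of w bounds every trajectory.
    """
    for _ in range(steps):
        t_lo = t_lo + dt * (a * (t_out - t_lo) + u - w_max)
        t_hi = t_hi + dt * (a * (t_out - t_hi) + u + w_max)
    return t_lo, t_hi


# Hypothetical rooms: (initial low, initial high, heating input).
rooms = {"room_a": (20.0, 21.0, 1.0), "room_b": (19.0, 22.0, 1.2)}

# Each room is handled on its own, so the loop could run in parallel.
bounds = {
    name: room_reach_interval(lo, hi, u, steps=10)
    for name, (lo, hi, u) in rooms.items()
}
```

Each room is a one-dimensional problem regardless of building size, which is the point of the decomposition. The cost is conservatism, since the bound assumes the neighbours always act in the worst possible way.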

  23. Conclusions • Combining Machine Learning and Control Theory • Achieving high performance on complicated systems while still guaranteeing safety • Possible Approaches: • Adapt data-driven methods to existing safety-analysis techniques • Closely couple data-driven methods with techniques for generating safety guarantees • Use data-driven techniques in the context of existing safety-analysis techniques • Extension to smart buildings and ActionWebs

  24. Questions?
