
Biological Arm Motion through Reinforcement Learning


Presentation Transcript


  1. Biological Arm Motion through Reinforcement Learning by Jun Izawa, Toshiyuki Kondo, Koji Ito Presented by Helmut Hauser

  2. Overview • biological motivation and basic idea • biological muscle force model • mathematical formulations • reaching task • results and conclusions

  3. Biological Motivation (1) Reinforcement learning exists in biology (dopamine, …), but in this framework we face a large state and action space (curse of dimensionality). (2) Multiple muscles produce the joint torques • high redundancy • enables the system to maintain robustness and flexibility • but also enlarges the search space. Humans can deal with this, but how?

  4. Basic Idea How do humans learn a new motion? • We coactivate muscles and stiffen our joints • Stiffness decreases while learning (we feel "safer") • Our motions get smoother Maybe there exists some preferred domain in the action space that gets higher priority in the learning process. Idea: restrict the learning domain of the action space at first, then soften the restriction as performance improves.

  5. Muscle force model [slide figure: the muscle force expressed through its elasticity (the "stiffness"), its viscosity, and its equilibrium length lr]
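
A common way to write such an activation-dependent muscle force model (a hedged reconstruction, since the exact expression on the slide is not legible in the transcript; the linear elasticity law below is an assumed standard form, not taken from the paper) is a spring-damper law:

```latex
% Hedged reconstruction of the muscle force model on slide 5; the linear
% activation dependence k_i(u_i) = k_0 + k_1 u_i is an assumed standard form.
T_i = k_i(u_i)\,\bigl(l_{r,i}(u_i) - l_i\bigr) - b_i\,\dot{l}_i ,
\qquad
k_i(u_i) = k_0 + k_1 u_i ,
```

where Ti is the force of muscle i, li its current length, lr,i its equilibrium length, ki its elasticity (the "stiffness"), and bi its viscosity.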

  6. Biological Model [slide figure: two-link arm (upper arm and lower arm) with joint angles θ1 and θ2, actuated by six muscles numbered 1-6]

  7. Merging two worlds: the muscle force model and the dynamic 2-link model are linked by the transformations R = GᵀKG (joint elasticity), D = GᵀBG (joint viscosity), and θv = R⁻¹GᵀKλ (equilibrium posture).

  8. Mathematical Formulation Remember: G is constant; K = diag(k0 + ki·ui); R = GᵀKG; θv = R⁻¹GᵀKλ; D = GᵀBG (with B constant).
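
As a numerical illustration of these quantities (everything here is a placeholder: the moment-arm matrix G, the constants, the rest-length vector lam, and the linear law k0 + k1·ui standing in for the slide's k0 + ki·ui), a sketch in NumPy:

```python
import numpy as np

# Placeholder 6x2 moment-arm matrix G (6 muscles, 2 joints) and constants;
# the actual values used in the paper are not given in the transcript.
G = np.array([[ 0.040,  0.000],
              [-0.040,  0.000],
              [ 0.000,  0.025],
              [ 0.000, -0.025],
              [ 0.028,  0.028],
              [-0.035, -0.035]])
k0, k1 = 1000.0, 3000.0          # assumed linear elasticity law k_i = k0 + k1*u_i
B = np.diag(np.full(6, 50.0))    # muscle viscosity, treated as constant as on the slide
lam = np.full(6, 0.10)           # assumed muscle equilibrium (rest) lengths

def joint_space_quantities(u):
    """Joint stiffness R = G^T K G, viscosity D = G^T B G, equilibrium posture theta_v."""
    K = np.diag(k0 + k1 * u)                      # K = diag(k0 + k1*u_i)
    R = G.T @ K @ G
    D = G.T @ B @ G
    theta_v = np.linalg.solve(R, G.T @ K @ lam)   # theta_v = R^-1 G^T K lambda
    return R, D, theta_v

R, D, theta_v = joint_space_quantities(np.full(6, 0.2))
```

Note how the joint stiffness R grows with coactivation: raising all ui together increases K and therefore R without necessarily moving θv.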

  9. Mathematical Formulation (pseudoinverse and orthogonal decomposition): the motor command and the exploration noise are decomposed orthogonally with respect to the Jacobian J, u = u1′ + u2′ and n = n1′ + n2′; the component n2′ is scaled by a factor c with 0 ≤ c ≤ 1, giving the restricted noise ň = n1′ + c·n2′.
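
A minimal NumPy sketch of this restriction (the Jacobian J below is a placeholder, and identifying n2′ with the null-space component N(J) is an assumption suggested by slides 10-11, not stated explicitly in the transcript):

```python
import numpy as np

def restrict_noise(n, J, c):
    """Split exploration noise n orthogonally w.r.t. J and scale one part by c,
    i.e. n_restricted = n1' + c * n2' with 0 <= c <= 1.

    Assumption: n1' is the component in the row space of J and n2' the
    component in the null space N(J), as suggested by slides 10-11."""
    P_row = np.linalg.pinv(J) @ J        # projector onto the row space of J
    n1 = P_row @ n                       # part that changes the output of J
    n2 = n - n1                          # null-space part (J @ n2 == 0)
    return n1 + c * n2

# Toy example: 6-dimensional motor command, 2-dimensional output.
rng = np.random.default_rng(0)
J = rng.standard_normal((2, 6))          # placeholder Jacobian
n = 0.1 * rng.standard_normal(6)         # exploration noise
n_restricted = restrict_noise(n, J, c=0.2)
```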

  10. [slide figure: the action space of motor commands u decomposed into the subspaces R(J) and N(J), with ρ and θv marked]

  11. [slide figure: the same action-space decomposition, with the scaling factor c shown along the N(J) direction]

  12. Architecture [slide figure: actor-critic block diagram with a critic network computing the TD error from the reward and the input qt-1, and an actor network plus noise generator producing the motor command ut]
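
A self-contained toy sketch of such an actor-critic loop (the linear critic and actor, the state transition, the reward, and all constants are placeholders; only the TD error and the restricted exploration noise, reusing restrict_noise from the sketch above, follow the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy loop; networks, dynamics, reward and constants are placeholders.
n_state, n_act = 4, 6
w_critic = np.zeros(n_state)              # linear value function V(q) = w_critic @ q
W_actor = np.zeros((n_act, n_state))      # linear policy u(q) = W_actor @ q
J = rng.standard_normal((2, n_act))       # placeholder Jacobian for the noise restriction
alpha_c, alpha_a, gamma, c = 0.1, 0.05, 0.99, 0.2

q = rng.standard_normal(n_state)
for t in range(1000):
    noise = restrict_noise(0.1 * rng.standard_normal(n_act), J, c)    # restricted noise (slide 9)
    u = W_actor @ q + noise                                           # executed motor command
    q_next = q + 0.01 * rng.standard_normal(n_state)                  # toy state transition
    reward = -np.sum(u ** 2)                                          # toy energy-style reward

    td_error = reward + gamma * (w_critic @ q_next) - (w_critic @ q)  # TD error from the critic
    w_critic += alpha_c * td_error * q                                # critic update
    W_actor += alpha_a * td_error * np.outer(noise, q)                # reinforce noise that raised value
    q = q_next
```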

  13. Reaching Task [slide figure: reaching from a start position to a goal region G; a region S is marked]. Reward model: r = 1 − cE·rE when the goal G is reached, r = −cE·rE otherwise, and r = −1 for S, with the energy term rE = Σ ui² summed over all 6 muscles.
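
Written out as a piecewise expression (a hedged reconstruction; the conditions attached to the first two cases are assumed from context, since they are not fully legible in the transcript):

```latex
% Hedged reconstruction of the reward model on slide 13.
r \;=\;
\begin{cases}
  1 - c_E\, r_E & \text{inside the goal region } G,\\
  -\,c_E\, r_E  & \text{otherwise (during the movement)},\\
  -1            & \text{for } S,
\end{cases}
\qquad
r_E = \sum_{i=1}^{6} u_i^{2}.
```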

  14. Some implementation facts • extended input q, since the reward model needs u too • stiffness R is set to rather "high" values • a neural network (as proposed by Shibata) serves as function approximator (trained with backpropagation) • in a second experiment, a load with arbitrary orientation (kept fixed within a trial) is applied within a certain region • parameters (like the noise parameters and cE of the reward model) have to be tuned.

  15. Results The proposed architecture (compared to a standard approach) • gets more reward • the cumulative reward does not tend to zero • the energy does not change in the early stage and decreases after the target is hit • with the extra force applied: the peak of the stiffness moves towards this area.

  16. Conclusions • can deal with redundant systems (the typical case in nature) • the search noise is restricted to a subspace • a robust controller has been achieved • some extra tuning was needed (done by evolution?) Future outlook: • applying the approach to hierarchical systems (more stages) • how to avoid the extra tuning?

  17. Literature • "Biological Robot Arm Motion through Reinforcement Learning", Jun Izawa, Toshiyuki Kondo, Koji Ito, Proceedings of the 2002 IEEE International Conference on Robotics & Automation • "Motor Learning Model using Reinforcement Learning with Neural Internal Model", Jun Izawa, Toshiyuki Kondo, Koji Ito, Department of Computational Intelligence and Systems • "Biological Arm Motion through Reinforcement Learning", Jun Izawa, Toshiyuki Kondo, Koji Ito, Biol. Cybern. 91, 10-22 (2004), Springer-Verlag 2004
