Task Encoding and Strategy Learning in Large Worlds, Dagstuhl Workshop, 29 July 2010


Task Encoding and Strategy Learning in Large Worlds
Dagstuhl Workshop, 29 July 2010

Subramanian Ramamoorthy

Institute of Perception, Action and Behaviour

School of Informatics

University of Edinburgh


Motivation: Simon’s Ant

What does this tell you about robust autonomy in large worlds?


A Robust Autonomous Agent

  • 1. How does she represent the task in order to be able to deploy it in a wide variety of previously unseen and unmodelled environments?

  • 2. How is this representation efficiently utilized for learning?


Some Hypotheses (my approach)

  • Agent represents tasks/environments in terms of a hierarchy of abstractions, ranging from weak sufficient conditions (qualitative information) to detailed quantitative information

    • Qualitative descriptions define an abstract problem that is useful for coarse reasoning about the large world

    • Quantitative information can be dealt with locally or at a slower time scale

    • Variety of learning methods can be combined to leverage their strengths

  • An ideal abstraction is such that one can make many useful inferences at the abstract level, without recourse to quantitative details that are uncertain/unobserved/undefined

    • Different from ‘mere’ clustering of states, etc.

    • We want a decision making strategy to be fully defined at each level

So, What Does an Abstraction Look Like?
Worked Example: Global Control of Cart-Pole


Introducing the Cart-Pole System

  • The system consists of two subsystems: a pendulum and a cart on a finite track

  • Only one actuator: the cart

  • We want global asymptotic stability of the 4-dimensional system

    • The Game: the experimenter hits the pole with arbitrary velocity at any time; the system picks controls

    • What are the weak sufficient conditions defining this task?
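For concreteness, the dynamics can be sketched with the textbook cart-pole equations of motion. The parameter values below are illustrative, not necessarily those of the system in the talk:

```python
import math

# Illustrative parameters: cart mass, pole mass, pole half-length, gravity
M, m, l, g = 1.0, 0.1, 0.5, 9.81

def cartpole_step(state, force, dt=0.01):
    """One Euler step of the classic cart-pole equations of motion
    (Barto, Sutton & Anderson formulation); theta = 0 is upright."""
    x, x_dot, th, th_dot = state
    sin_th, cos_th = math.sin(th), math.cos(th)
    temp = (force + m * l * th_dot ** 2 * sin_th) / (M + m)
    th_acc = (g * sin_th - cos_th * temp) / (
        l * (4.0 / 3.0 - m * cos_th ** 2 / (M + m)))
    x_acc = temp - m * l * th_acc * cos_th / (M + m)
    return (x + dt * x_dot, x_dot + dt * x_acc,
            th + dt * th_dot, th_dot + dt * th_acc)

# Unforced and slightly perturbed from upright, the pole falls:
# this is the instability any global controller must handle.
s, max_th = (0.0, 0.0, 0.05, 0.0), 0.0
for _ in range(200):                 # 2 seconds of simulated time
    s = cartpole_step(s, 0.0)
    max_th = max(max_th, abs(s[2]))
```

The upright equilibrium is unstable, so without control the small initial angle grows until the pole swings down; the controller's job is to make the upright state globally attractive.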

Phase Space of the Pendulum

Dealing with the ‘Adversary’: Global Structure

The adversary could push the system anywhere in the phase space. The global strategy can be described as a qualitative transition graph. Larger disturbances could truly change the quantitative details, e.g., cause any number of rotations around the origin. The uncontrolled system converges to one fixed point (the hanging equilibrium); we want to reach and stay at the other (the upright equilibrium).


Describing Local Behaviour: Templates

Lemma (Spring–Mass–Positive Damping):

Let a system be described by ẍ + f(ẋ) + g(x) = 0, where y·f(y) > 0 and x·g(x) > 0 for all x, y ≠ 0.

Then it is asymptotically stable at (0,0).

Lemma (Spring–Mass–Negative Damping):

Let a system be described by ẍ + f(ẋ) + g(x) = 0, where y·f(y) < 0 and x·g(x) > 0 for all x, y ≠ 0.

Then it has an unstable fixed point at (0,0), and no limit cycle.
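The positive-damping template can be sanity-checked numerically: pick any f and g satisfying the sign conditions (the particular f and g below are examples I chose, assuming the standard second-order form ẍ + f(ẋ) + g(x) = 0) and observe that trajectories spiral into the origin:

```python
# Example instances satisfying the qualitative sign conditions:
def f(y): return 0.5 * y + 0.1 * y ** 3   # positive damping: y*f(y) > 0
def g(x): return x + 0.2 * x ** 3         # spring term:      x*g(x) > 0

def simulate(x, v, dt=0.001, steps=20000):
    """Euler-integrate x'' = -f(x') - g(x) for `steps` steps."""
    for _ in range(steps):
        a = -f(v) - g(x)
        x, v = x + dt * v, v + dt * a
    return x, v

# Any initial condition decays toward the fixed point (0, 0)
xf, vf = simulate(2.0, -1.0)
```

The point of the lemma is that this conclusion holds for the whole family of f and g with those sign properties, not just the instances simulated here.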

Global Controller for Pendulum

The control law:

  • if … then Balance

  • else if … then Pump

  • else Spin

The Global Control Law

The switching strategy:

  • if … then Balance

  • else if … then Pump

  • else Spin
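The switching conditions were equations on the original slide; a hypothetical energy-based version of the three-mode switch can be sketched as follows (the thresholds and energy test are mine, for illustration only):

```python
import math

def switching_mode(theta, theta_dot, g=9.81, l=1.0, eps=0.3):
    """Pick a control mode for the pendulum (theta = 0 is upright).
    Thresholds are illustrative, not those from the talk."""
    # Energy relative to the upright equilibrium (unit mass)
    E = 0.5 * (l * theta_dot) ** 2 + g * l * (math.cos(theta) - 1.0)
    if abs(theta) < eps and abs(theta_dot) < eps:
        return "Balance"   # near upright: local balancing controller
    elif E < 0.0:
        return "Pump"      # not enough energy to reach the top: pump energy in
    else:
        return "Spin"      # excess energy: dissipate it and capture the pole
```

Each mode corresponds to one of the local templates above, and the switch itself is the qualitative transition graph made executable.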


Demonstration on a physical set-up

S. Ramamoorthy, B.J. Kuipers, Qualitative heterogeneous control of higher order systems, Hybrid Systems: Computation and Control (2003)


A Few Points to Take Away

  • No learning in this example but we can still learn things from it

  • ‘Symbol’ ↔ a local system with well-defined dynamical properties

    • could also do this using automated formal methods [Shults+Kuipers,AIJ97]

  • We can talk about task achievement for an entire family of dynamical systems

    • Weak commitment to functional forms of f and g (also very large parameter intervals, etc.)

    • Possibility for composition and interactive strategies at symbolic level

  • Current work: how does one make general relational/logical statements about the behaviour of such models – so that we can use ‘reasoning’ tools at abstract levels

    • Could enable greedy learning of local models with interesting predicates


What does this have to do with learning control strategies?


Task: Walking on Irregular Terrain

  • No detailed models of dynamics

  • Precisely specified footfalls

  • Height/length variations

  • Hard to represent & achieve with state-of-the-art methods!


Good Ideas in Two-legged Locomotion


Compass Gait Walking: A Conceptual View

[Kuo, Science ’05]


Natural Parameterization of Gait


Abstract Plan

  • Define a qualitative strategy in low dimensions (finite-horizon optimal control)

  • Lift the resulting strategy to the more complex c-space (presently unknown!)

S. Ramamoorthy, B.J. Kuipers, Qualitative hybrid control of dynamic bipedal walking, Robotics: Science and Systems II, pp. 89-96 (2006)


Trajectory Generation: Multi-link Legged Robot

  • Random actions

  • Imperfect gait

  • Active learning



Approximating Unknown Manifold from Data

Organize the data in a k-NN graph

Where is the manifold in the graph?

  • Manifold ↔ the set of geodesic trajectories restricted to it

  • If the manifold encodes the task, every geodesic must behave like the template plan

  • The diagram must commute!

    • Minimize the commutativity error
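The k-NN graph construction and geodesic extraction can be sketched generically, with Dijkstra shortest paths standing in for graph geodesics (this is a stand-in for, not a reproduction of, the paper's commutativity-error formulation):

```python
import heapq
import math

def knn_graph(points, k=5):
    """Undirected k-nearest-neighbour graph: {i: [(j, dist), ...]}."""
    n = len(points)
    graph = {i: [] for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i].append((j, d))
            graph[j].append((i, d))   # symmetrize
    return graph

def geodesic(graph, src, dst):
    """Dijkstra shortest path: approximates a geodesic on the sampled manifold."""
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Toy data: 40 samples of a 1-D manifold (a circle) embedded in 2-D
points = [(math.cos(2 * math.pi * i / 40), math.sin(2 * math.pi * i / 40))
          for i in range(40)]
graph = knn_graph(points, k=4)
path = geodesic(graph, 0, 10)
```

As the sampling density grows, graph shortest paths converge to true geodesics on the underlying manifold, which is what makes the k-NN graph a usable surrogate.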


Result: Controlled Dynamic Walking

S. Ramamoorthy, B. Kuipers, Trajectory generation for dynamic bipedal walking through qualitative model based manifold learning, ICRA 2008


Can We Proceed Without the Low-dim Model?

  • Consider high-dim data drawn from an unknown low-dim manifold

  • We can approximate the tangent space:

  • This can be learnt with a pair of optimization steps

  • Simple example: 3-link arm

  • The following error term defines the manifold:

  • Another error minimization defines geodesic paths:
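The tangent-space equations were images on the original slide. One common way to estimate a tangent space from data is local PCA over a point's neighbourhood; the sketch below shows that generic technique, not necessarily the exact pair of optimizations used in the slides:

```python
import numpy as np

def tangent_space(points, idx, k=10, d=1):
    """Estimate the d-dim tangent space at points[idx] by local PCA
    over its k nearest neighbours (a standard manifold-learning step)."""
    X = np.asarray(points, dtype=float)
    dists = np.linalg.norm(X - X[idx], axis=1)
    nbrs = X[np.argsort(dists)[:k + 1]]    # neighbourhood incl. the point itself
    centred = nbrs - nbrs.mean(axis=0)
    # Principal directions of the local neighbourhood span the tangent space
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[:d]                          # rows are tangent basis vectors

# Toy data: a 1-D manifold (circle) embedded in 2-D
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
T = tangent_space(circle, 0, k=8, d=1)
# At the point (1, 0) the circle's tangent direction is (0, ±1)
```

The estimated basis can then feed the two error terms on the slide: one penalizing departure from the manifold, the other defining geodesic paths along it.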


Learnt Skill Manifold: 3-link Arm

  • The grey mesh is the Delaunay triangulation of the 100 data points, shown for visualization of the desired manifold (from which the curves in fig. c are drawn)

I. Havoutis, S. Ramamoorthy, Geodesic trajectory generation on learnt skill manifolds, ICRA 2010


Constrained Trajectory Generation on Skill Manifolds

Constrained Walking: Variable Foot Placement

Following the unconstrained geodesics, oblivious to obstacles

Constrained geodesic trajectory – avoid obstacles, while staying within demonstrated class

I. Havoutis, S. Ramamoorthy, Constrained geodesic trajectory generation on approximately optimal skill manifolds, IROS 2010
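A toy analogue of constrained geodesic generation: on a graph, constraining the trajectory amounts to forbidding obstacle nodes and re-running the shortest-path search. The grid world and BFS below are purely illustrative, not the method of the paper:

```python
from collections import deque

def constrained_path(w, h, blocked, src, dst):
    """Shortest path on a w-by-h grid graph that never enters `blocked`
    cells: a toy analogue of steering geodesics away from obstacles."""
    prev = {src: None}
    q = deque([src])
    while q:
        cell = q.popleft()
        if cell == dst:
            break
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < w and 0 <= ny < h and nxt not in blocked and nxt not in prev:
                prev[nxt] = cell
                q.append(nxt)
    path, cell = [], dst
    while cell is not None:   # walk predecessors back to the source
        path.append(cell)
        cell = prev[cell]
    return path[::-1]

# A wall blocking column x=2 except for a gap at y=4 forces a detour
wall = {(2, y) for y in range(4)}
route = constrained_path(5, 5, wall, (0, 0), (4, 0))
```

The unconstrained geodesic would cut straight across; the constrained one detours through the gap while remaining a shortest path within the admissible set, mirroring "avoid obstacles while staying within the demonstrated class."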


What about really dynamic environments?


An Adversarial Navigation Problem

  • Let us make the abstract spaces concrete

    • you are driving over a network of highways

  • Two sources of uncertainty:

    • Oncoming traffic (changing goals)

    • Changing dynamics, navigability/costs


Solution Strategy

  • In ‘simple’/reasonably well understood worlds, acquire basis strategies

    • e.g., imitation learning

    • Could also be more bottom-up exploratory learning

  • In a continually changing complex world, learn strategies in a game against a (fictitious) adversary


One Way to Learn Primitive Strategies

  • Learn policy from expert:

    • RL problem

    • Reward as weighted combination of features

  • 2-player zero-sum game

    • select a distribution over actions to maximise V(ψ) − V(π_E)

    • nature varies R(s) through weights w

[Syed & Schapire 2008]


Game-theoretic Strategy Learning

  • Environment picks transition function, reward

  • You pick mixture over basis strategies (finite horizon)

  • Online regret minimization to compute strategies

    • Composing elemental strategies in response to changing environment
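Online regret minimization over a fixed set of basis strategies can be sketched with the multiplicative-weights (Hedge) algorithm, a standard regret-minimizing scheme shown here as a generic stand-in for the talk's strategy-composition step; the payoff sequence and learning rate are illustrative:

```python
import math

def hedge(payoffs, eta=0.5):
    """Multiplicative-weights (Hedge) over basis strategies.
    payoffs: list of rounds, each a list of per-strategy payoffs in [0, 1].
    Returns (total earned by the mixture, payoff of the best fixed strategy)."""
    n = len(payoffs[0])
    w = [1.0] * n
    total = 0.0
    for round_payoffs in payoffs:
        z = sum(w)
        probs = [wi / z for wi in w]          # current mixture over strategies
        total += sum(p * r for p, r in zip(probs, round_payoffs))
        # Exponentially reward strategies that did well this round
        w = [wi * math.exp(eta * r) for wi, r in zip(w, round_payoffs)]
    best_fixed = max(sum(col) for col in zip(*payoffs))
    return total, best_fixed

# Environment fluctuates between two regimes; strategy 0 is best overall
rounds = [[0.9, 0.1, 0.3] if t % 3 else [0.7, 0.2, 0.4] for t in range(300)]
earned, best = hedge(rounds)
```

The guarantee is that `best - earned` (the regret against the best fixed basis strategy in hindsight) grows only sublinearly in the number of rounds, even though the environment's payoffs change over time.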


Learning to Drive in Novel Scenarios

B. Rosman, S. Ramamoorthy, A game theoretic procedure for learning hierarchically structured strategies, ICRA 2010.


Learning to Drive in Novel Scenarios



  • Many autonomous agent behaviours admit efficient descriptions in terms of a consistent hierarchy of abstractions

  • Challenges for learning:

    • What unsupervised learning methods can we use to extract base concepts (how descriptive are these models)?

    • Are there principled ways to refine these models over time?

    • Efficient methods for online strategy learning: how best to define games at the abstract level so they are consistent with fully quantitative local problems?

    • Life-long and social learning


One Future Direction

  • Complex manipulation problems in terms of hierarchies of abstractions

    • At one level, one is only thinking of relational concepts: single hole

    • At another level, one is faced with the full challenge of robotics – grasping, etc.

  • Ability to work with partial specifications and concepts

  • Ability to refine representations as we have more and more experience

B. Rosman, S. Ramamoorthy, Learning spatial relationships between objects (Under Review)
