
An Object-oriented Representation for Efficient Reinforcement Learning

Carlos Diuk, Andre Cohen and Michael L. Littman. Rutgers Laboratory for Real-Life Reinforcement Learning (RL³), Department of Computer Science, Rutgers University (New Jersey, USA). ICML 2008 – Helsinki, Finland.


Presentation Transcript


  1. An Object-oriented Representation for Efficient Reinforcement Learning
     Carlos Diuk, Andre Cohen and Michael L. Littman
     Rutgers Laboratory for Real-Life Reinforcement Learning (RL³)
     Department of Computer Science, Rutgers University (New Jersey, USA)
     ICML 2008 – Helsinki, Finland

  2. Motivation How would YOU play this game?

  3. What’s in a state?
     • A simple hash code that tells you if you’ve been “there” before:
       s1 -> a0 -> s5
       s5 -> a2 -> s24
       s24 -> a1 -> s1
     • What we (the agent) can actually “see”: objects, interactions, spatial relationships.
     • If we know that our agents are interacting in a spatial relation with objects, let’s just tell them so.

  4. What we did
     • Grab ideas from Relational RL and come up with a representation that:
       • is suitable for a wide-enough range of domains
       • is tractable
       • provides opportunities for generalization
       • enables smart exploration
     • Strike a balance between generality and tractability.

  5. OO representation
     • Problem defined by a set of objects and their attributes.
     • Example: Objects in Pitfall defined by a bounding box on a set of pixels based on color.
       Man.<x,y> Log.<x,y> Hole.<x,y> Ladder.<x,y> Wall.<x,y>
     • State is the union of all objects’ attribute values.
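A minimal Python sketch of the representation on slide 5, for illustration only. The class name `GameObject`, the helper `state_of`, and the coordinate values are assumptions; the slides only say that each object carries `<x,y>` attributes and that the state is the union of all attribute values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameObject:
    """One object instance (hypothetical type, not from the slides)."""
    cls: str  # object class, e.g. "Man" or "Ladder"
    x: int    # bounding-box position, horizontal
    y: int    # bounding-box position, vertical


def state_of(objects):
    """Collect every object's attribute values into one hashable state."""
    return tuple((o.cls, o.x, o.y) for o in objects)


# Made-up coordinates, purely for illustration.
objects = [GameObject("Man", 10, 40), GameObject("Ladder", 10, 48)]
state = state_of(objects)
```

Because the state is built from named attributes rather than an opaque identifier, two states that differ only in one object's position remain visibly similar to the learner.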

  6. OO representation
     • For any given state s, there is a function c(s) that tells us which relations occur under s.
     • Dynamics defined by preconditions and effects.
     • Preconditions are conjunctions of terms:
       • Relations between objects:
         • touchN/S/E/W(object_i, object_j)
         • on(object_i, object_j)
       • Any (boolean) function on the attributes.
       • Any other function encoding prior knowledge.
     • Actions have effects that determine how objects’ attributes get modified.
     • Example: condition on(Man, Ladder), action Up, effect Man.y = Man.y + 8.
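The example on slide 6 (condition on(Man, Ladder), action Up, effect Man.y = Man.y + 8) can be sketched as follows. The name `c` comes from the slide; the dictionary-based state, the string encoding of terms, the "same position" definition of `on`, and the coordinates are all assumptions made for this sketch.

```python
def on(a, b):
    """Hypothetical relation: a and b occupy the same position."""
    return a["x"] == b["x"] and a["y"] == b["y"]


def c(state):
    """Return the set of relation terms that hold in `state`."""
    terms = set()
    if on(state["Man"], state["Ladder"]):
        terms.add("on(Man,Ladder)")
    return terms


def step_up(state):
    """Apply action Up: the effect fires only if its precondition holds."""
    man = dict(state["Man"])  # copy so the input state is not mutated
    if "on(Man,Ladder)" in c(state):
        man["y"] += 8  # effect from the slide: Man.y = Man.y + 8
    return {**state, "Man": man}


# Made-up coordinates: the man stands on the ladder.
state = {"Man": {"x": 10, "y": 40}, "Ladder": {"x": 10, "y": 40}}
next_state = step_up(state)
```

When the precondition does not hold, `step_up` returns a state with the same attribute values, which corresponds to observing no effect.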

  7. DOORMax
     • An algorithm for efficient learning of deterministic OO-MDPs.
     • When objects interact, and an effect is observed, DOORMax learns the conjunction of terms that enabled the effect.
     • Belongs to the R-Max family of algorithms:
       • Guides exploration to make objects interact
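The slide does not spell out how the conjunction is learned. One standard way to learn a conjunction from observations in which the effect occurred is to keep every term value that all such observations agree on and to mark the rest as "don't care". The sketch below shows that idea only; it is not claimed to be the authors' exact update, and the function names, term ordering, and observations are hypothetical.

```python
def generalize(hypothesis, observation):
    """Merge one observation (tuple of 0/1 term values) into the hypothesis.

    Positions where the hypothesis and the observation disagree become "*"
    (don't care). A hypothesis of None means nothing has been observed yet.
    """
    if hypothesis is None:
        return tuple(observation)
    return tuple(h if h == o else "*" for h, o in zip(hypothesis, observation))


def matches(hypothesis, observation):
    """True if the observation satisfies every non-wildcard term."""
    return all(h == "*" or h == o for h, o in zip(hypothesis, observation))


# Hypothetical term order: (on(Man,Ladder), touchN(Man,Wall), touchE(Man,Wall)).
# Each tuple is c(s) for a state in which action Up produced the effect.
observed = [(1, 0, 0), (1, 0, 1)]

condition = None
for obs in observed:
    condition = generalize(condition, obs)
```

After these two observations the hypothesis requires on(Man,Ladder) to hold and touchN(Man,Wall) not to hold, and it ignores touchE(Man,Wall); further observations can only relax it.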

  8. Pitfall video

  9. DOORMax Analysis
     • Let n be the number of terms.
     • Assume that:
       • The number of effects per action is bounded by a (small) constant m.
       • Each effect has a unique conjunctive condition.
     • As long as effects are observed (that is, some effect occurs given an action a), DOORMax will learn the condition-effect pairs that determine the dynamics of a in O(nm).
     • There is a worst-case bound, when lots of no-effects are observed, of O(nm).

  10. Results What about this game? Videogame

  11. Representations in Taxi

  12. Bigger Taxi

  13. Conclusions and future work
     • OO-MDPs provide a natural way of modeling an interesting set of domains, while enabling generalization and smart exploration.
     • DOORMax learns deterministic OO-MDPs, outperforming state-of-the-art algorithms for factored-state representations.
     • DOORMax scales very nicely with respect to the size of the state space, as long as transition dynamics between objects do not change.
     • We do not have a provably efficient algorithm for stochastic OO-MDPs.
     • We do not yet handle inheritance between classes of objects.
