Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso). Basia Korel, Brown University, cs2950-z, February 15, 2010. Outline: Overview, Demonstration Learning Algorithm, Confident Execution, Corrective Demonstration, Limitations, Option Class Algorithm


Presentation Transcript


  1. Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso) • Basia Korel, Brown University, cs2950-z • February 15, 2010

  2. Outline • Overview • Demonstration Learning Algorithm • Confident Execution • Corrective Demonstration • Limitations • Option Class Algorithm • Experiments and Results • Conclusion

  3. Overview • Addressing: equivalent action choices • The context: learning from demonstration • In the real world: equivalent actions demonstrated arbitrarily and inconsistently

  4. Overview • Resulting problem: labeled training data lacks consistency • Contribution: identify, represent and enact equivalent action choices • Identify conflicting demonstrations • Represent choice of multiple actions in the policy • Common assumption of previous approaches: each state maps to one best action

  5. Demonstration Learning Algorithm • Learning equivalent actions is built upon: • Confident Execution: to obtain teacher demonstrations and learn the action policy • Corrective Demonstration: to correct execution mistakes by additional demonstrations

  6. Confident Execution • An interactive learning algorithm. Given the current world state, the robot: • Determines the need for a demonstration based on a confidence • May request demonstrations to improve policy

  7. Confident Execution • Robot’s policy represented by classifier C : s → (a, c, db) • Trained using states as inputs and actions as labels • Measure of action selection confidence
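
The interaction described on slides 6 and 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which classifier or confidence measure is used, so the k-nearest-neighbour vote, the fixed `threshold`, and the names `classify`, `confident_execution_step`, and `teacher` are all hypothetical.

```python
import math
from collections import Counter


def classify(dataset, state, k=3):
    """Return (action, confidence) from a k-nearest-neighbour vote.

    Assumption: confidence is the fraction of the k neighbours that
    agree with the winning action. The slides do not specify this.
    """
    if not dataset:
        return None, 0.0
    neighbours = sorted(dataset, key=lambda d: math.dist(d[0], state))[:k]
    votes = Counter(action for _, action in neighbours)
    action, count = votes.most_common(1)[0]
    return action, count / len(neighbours)


def confident_execution_step(dataset, state, teacher, threshold=0.8):
    """Act autonomously when confident, otherwise request a demonstration.

    Returns (action, asked_teacher). `teacher` is a hypothetical callback
    that maps a state to the demonstrated action.
    """
    action, confidence = classify(dataset, state)
    if action is None or confidence < threshold:
        action = teacher(state)           # request a demonstration
        dataset.append((state, action))   # new labelled training point
        return action, True
    return action, False
```

With an empty dataset the first call always asks the teacher; later calls near that state act autonomously once the neighbours agree.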

  8. Corrective Demonstration • An algorithm to correct unwanted actions by providing the teacher with supplementary corrective demonstrations
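
A minimal sketch of the correction step, assuming the policy is trained from a list of `(state, action)` pairs. The function name and the `teacher_correction` callback are hypothetical; the slide only states that the teacher supplies supplementary demonstrations for unwanted actions.

```python
def corrective_demonstration(dataset, state, executed_action, teacher_correction):
    """Record a corrected label when the teacher rejects the executed action.

    `teacher_correction(state, executed_action)` is a hypothetical callback
    that returns the action that should have been taken, or None when the
    executed action was acceptable.
    """
    corrected = teacher_correction(state, executed_action)
    if corrected is not None and corrected != executed_action:
        # The correction becomes an ordinary training point for retraining.
        dataset.append((state, corrected))
        return corrected
    return None
```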

  9. Limitations • Assumptions made: • One-to-one state-action mapping • Consistent demonstrations • A complete policy given enough demonstrations • Assumptions may fail in the real world! • Multiple equivalent actions cause ambiguity • Robot sensor noise may cause inconsistency

  10. Option Class Algorithm • Option class: a cluster of data points that have been labeled with at least two different actions • Algorithm: extracts and explicitly models option classes in the robot’s policy

  11. Option Class Algorithm
      given demonstration dataset D
      M ← PointsInLowConfidenceRegion(D)
      d ← MeanNearestNeighborDist(D)
      C ← ConnectedComponents(M, d)
      for c ∈ C do
        A ← ActionClasses(c)
        if Size(c) > 3 and Size(A) > 1 then
          CreateClass(D, c, Option-A)
      UpdateClassifier(D)
      ResetClass(D)
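
The detection part of the pseudocode on slide 11 can be sketched in Python as follows. Several details are assumptions rather than the authors' method: the low-confidence points M are passed in as a list of indices (the slides do not say how they are computed), two points are linked into the same component when they lie within the mean nearest-neighbour distance d, and the classifier update and reset steps are left out. The function names are hypothetical.

```python
import math


def nearest_neighbour_dist(points, i):
    """Distance from points[i] to its closest other point."""
    return min(math.dist(points[i], p) for j, p in enumerate(points) if j != i)


def find_option_classes(dataset, low_confidence):
    """Return [(member_indices, actions)] for each detected option class.

    dataset: list of (state, action) pairs, the demonstration set D.
    low_confidence: indices of points in low-confidence regions (M),
    assumed to be supplied by the caller.
    """
    states = [s for s, _ in dataset]
    if len(states) < 2:
        return []
    # d: mean nearest-neighbour distance over the whole dataset
    d = sum(nearest_neighbour_dist(states, i) for i in range(len(states))) / len(states)

    unvisited = set(low_confidence)
    option_classes = []
    while unvisited:
        # Grow one connected component, linking points at most d apart (assumption).
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            linked = {j for j in unvisited if math.dist(states[i], states[j]) <= d}
            unvisited -= linked
            component |= linked
            frontier.extend(linked)
        actions = {dataset[i][1] for i in component}
        # Same test as the pseudocode: Size(c) > 3 and Size(A) > 1
        if len(component) > 3 and len(actions) > 1:
            option_classes.append((sorted(component), sorted(actions)))
    return option_classes
```

Each returned pair names the data points that would be relabelled with a combined option class, after which the classifier would be retrained on the updated labels.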

  12. Experiment • Obstacle avoidance domain: • Gathered data:

  13. Evaluation • Evaluation: Confident Execution with and without option classes • Metrics: • % of complete policies • # of demonstrations • NOT classification accuracy • Results (with respect to option classes): • Converge to complete policy with much higher frequency • Required demonstrations much lower

  14. Example Option Class Policies

  15. Conclusion • Multiple equivalent actions exist in the real world • Model action choices explicitly in the policy • Domain limitations: discrete action labels

  16. Thanks • Chad Jenkins, Brown RLAB and cs2950-z course staff/leaders
