
Learning Probabilistic Hierarchical Task Networks to Capture User Preferences






Presentation Transcript


  1. A riddle for you: What is the magic idea in planning that is at once more efficient and has higher complexity than vanilla planners?
  Learning Probabilistic Hierarchical Task Networks to Capture User Preferences
  Nan Li, Subbarao Kambhampati, and Sungwook Yoon
  School of Computing and Informatics, Arizona State University, Tempe, AZ 85281, USA
  nan.li.3@asu.edu, rao@asu.edu, Sungwook.Yoon@asu.edu
  Special thanks to William Cushing

  2. Two Tales of HTN Planning
  • Abstraction: efficiency, top-down (most prior work)
  • Preference handling: quality, bottom-up learning (our work)

  3. Hitchhike? No way! Learning User Plan Preferences
  • Pbus: Getin(bus, source), Buyticket(bus), Getout(bus, dest) (observed 2 times)
  • Ptrain: Buyticket(train), Getin(train, source), Getout(train, dest) (observed 8 times)
  • Phike: Hitchhike(source, dest) (observed 0 times)
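To make the learner's input concrete, the observed plans above could be written as plain action sequences repeated with their frequencies. The encoding below is a hypothetical illustration, not taken from the paper:

```python
# Hypothetical encoding of the observed plans: each plan is a list of ground
# actions, repeated as often as the user executed it.
observed_plans = (
    [["Getin(bus, source)", "Buyticket(bus)", "Getout(bus, dest)"]] * 2          # bus: 2 times
    + [["Buyticket(train)", "Getin(train, source)", "Getout(train, dest)"]] * 8  # train: 8 times
    # Hitchhike(source, dest) is never chosen, so it never appears in the data.
)
```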

  4. Learning User Preferences as pHTNs
  • Given a set O of plans executed by the user
  • Find a generative model Hl = argmax_H p(O | H)
  Probabilistic Hierarchical Task Networks (pHTNs), example:
  S → A1 B1 (0.2)
  S → A2 B2 (0.8)
  B1 → A2 A3 (1.0)
  B2 → A1 A3 (1.0)
  A1 → Getin (1.0)
  A2 → Buyticket (1.0)
  A3 → Getout (1.0)
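A minimal sketch (not the authors' code) of the generative-model view: the example pHTN above is encoded as weighted production rules, and p(plan | H) is obtained by summing over all ways the start symbol can derive the plan. The rule encoding and the function plan_probability are my own illustration.

```python
# The example pHTN from the slide as weighted production rules.
# Rule format: head -> list of (probability, children); primitives are actions.
pHTN = {
    "S":  [(0.2, ["A1", "B1"]), (0.8, ["A2", "B2"])],
    "B1": [(1.0, ["A2", "A3"])],
    "B2": [(1.0, ["A1", "A3"])],
    "A1": [(1.0, ["Getin"])],
    "A2": [(1.0, ["Buyticket"])],
    "A3": [(1.0, ["Getout"])],
}

PRIMITIVES = {"Getin", "Buyticket", "Getout"}

def plan_probability(symbol, plan):
    """Probability that `symbol` derives exactly the action sequence `plan`."""
    if symbol in PRIMITIVES:
        return 1.0 if plan == [symbol] else 0.0
    total = 0.0
    for prob, children in pHTN.get(symbol, []):
        if len(children) == 1:
            total += prob * plan_probability(children[0], plan)
        else:
            # binary rule: try every split of the plan between the two children
            left, right = children
            for i in range(1, len(plan)):
                total += (prob
                          * plan_probability(left, plan[:i])
                          * plan_probability(right, plan[i:]))
    return total

# The "train" plan: buy a ticket first, then get in and out.
print(plan_probability("S", ["Buyticket", "Getin", "Getout"]))  # 0.8
```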

  5. LEARNING pHTNs
  • HTNs can be seen as providing a grammar of desired solutions:
    Actions ↔ Words, Plans ↔ Sentences, HTNs ↔ Grammar, HTN learning ↔ Grammar induction
  • pHTN learning by probabilistic context-free grammar (pCFG) induction
  • Assumptions: parameter-less, unconditional

  6. A Two-Step Algorithm
  • Greedy Structure Hypothesizer: hypothesizes the schema structure
  • Expectation-Maximization (EM) phase: refines the schema probabilities and removes redundant schemas
  • Generalizes the Inside-Outside algorithm (Lari & Young, 1990)

  7. Greedy Structure Hypothesizer
  • Bottom-up structure learning
  • Prefers recursive schemas to non-recursive ones (see the sketch below)
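The following is only a much-simplified stand-in for the bottom-up spirit of the Greedy Structure Hypothesizer: repeatedly introduce a new schema for the most frequent adjacent symbol pair until every plan reduces to a single symbol. The real GSH also prefers recursive schemas and ties the result to a start symbol; those refinements are omitted, and all names here are mine.

```python
from collections import Counter

def hypothesize_structure(plans):
    """Greedily chunk the most frequent adjacent pair into a new schema."""
    plans = [list(p) for p in plans]
    schemas = {}                      # new symbol -> (left child, right child)
    counter = 0
    while any(len(p) > 1 for p in plans):
        pairs = Counter()
        for p in plans:
            for a, b in zip(p, p[1:]):
                pairs[(a, b)] += 1
        (a, b), _ = pairs.most_common(1)[0]
        counter += 1
        new_sym = f"N{counter}"
        schemas[new_sym] = (a, b)
        # replace every occurrence of the chosen pair with the new symbol
        for p in plans:
            i = 0
            while i < len(p) - 1:
                if p[i] == a and p[i + 1] == b:
                    p[i:i + 2] = [new_sym]
                else:
                    i += 1
    return schemas

corpus = ([["Buyticket", "Getin", "Getout"]] * 8
          + [["Getin", "Buyticket", "Getout"]] * 2)
print(hypothesize_structure(corpus))
```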

  8. EM Phase
  • E step: compute the parse trees of the observed plans (the most probable parse tree for each plan)
  • M step: update the selection probability p of each schema s: ai → aj ak
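A hedged sketch of the M-step idea, assuming the E step has already produced one parse tree per observed plan: each schema's selection probability is re-estimated as the fraction of times it is used among all schemas sharing its head symbol. The data structures and names below are illustrative, not the paper's.

```python
from collections import Counter, defaultdict

# A parse tree is listed as the schemas it uses, each written (head, body).
train_parse = [("S", ("A2", "B2")), ("B2", ("A1", "A3")),
               ("A2", ("Buyticket",)), ("A1", ("Getin",)), ("A3", ("Getout",))]
bus_parse = [("S", ("A1", "B1")), ("B1", ("A2", "A3")),
             ("A1", ("Getin",)), ("A2", ("Buyticket",)), ("A3", ("Getout",))]
parse_trees = [train_parse] * 8 + [bus_parse] * 2

def m_step(parse_trees):
    """Re-estimate each schema's probability from its usage in the parse trees."""
    schema_counts = Counter()
    head_counts = Counter()
    for tree in parse_trees:
        for head, body in tree:
            schema_counts[(head, body)] += 1
            head_counts[head] += 1
    probs = defaultdict(dict)
    for (head, body), c in schema_counts.items():
        probs[head][body] = c / head_counts[head]
    return probs

print(m_step(parse_trees)["S"])   # {('A2', 'B2'): 0.8, ('A1', 'B1'): 0.2}
```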

  9. Evaluation
  [Diagram: H* generates plans P1, P2, …, Pn; the learner produces Hl from them]
  • Ideal: user studies (too hard)
  • Our approach:
    • Assume H* represents the user preferences
    • Generate observed plans O using H* (H* → O)
    • Learn Hl from O (O → Hl)
    • Compare H* and Hl via the plan distributions they induce (H* → T*, Hl → Tl)
  • Syntactic similarity is not important, only the distribution is
  • Use the KL divergence between the distributions T* and Tl (KL divergence measures the distance between distributions)
  • Domains: randomly generated; Logistics Planning; Gold Miner
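A minimal sketch of the evaluation measure: discrete KL divergence between the plan distribution T* induced by H* and the distribution Tl induced by Hl. The example probabilities below are invented purely for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)) over plans x."""
    return sum(p[x] * math.log(p[x] / max(q.get(x, 0.0), eps))
               for x in p if p[x] > 0.0)

t_true    = {"train_plan": 0.8,  "bus_plan": 0.2}    # distribution from H*
t_learned = {"train_plan": 0.75, "bus_plan": 0.25}   # distribution from Hl
print(kl_divergence(t_true, t_learned))  # small value => distributions are close
```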

  10. RATE OF LEARNING AND CONCISENESS (Randomly Generated Domains)
  [Plots: rate of learning; conciseness]
  • Rate of learning: more training plans yield better schemas
  • Conciseness: in small domains, only 1 or 2 more non-primitive actions; in large domains, many more non-primitive actions
  • Open question: refine structure learning?

  11. EFFECTIVENESS OF EM (Randomly Generated Domains)
  • Compare the greedy (structure-only) schemas with the learned schemas
  • The EM step is very effective in capturing user preferences

  12. “BENCHMARK” DOMAINS
  Logistics Planning
  • H*: move by plane or truck; prefer plane; prefer fewer steps
  • KL divergence: 0.04
  • Recovers plane > truck and fewer steps > more steps
  Gold Miner
  • H*: get the laser cannon; shoot rock until adjacent to the gold; get a bomb; use the bomb to remove the last wall
  • KL divergence: 0.52
  • Reproduces the basic strategy

  13. Conclusions & Extensions
  • Learn user plan preferences: the learned HTNs capture preferences rather than domain abstractions
  • Evaluate predictive power: compare distributions rather than structure
  • Preference obfuscation: a poor graduate student who prefers to travel by plane usually travels by car; see “Learning user plan preferences obfuscated by feasibility constraints” (ICAPS’09)
