
LEARNING USER PLAN PREFERENCES OBFUSCATED BY FEASIBILITY CONSTRAINTS



Presentation Transcript


  1. LEARNING USER PLAN PREFERENCES OBFUSCATED BY FEASIBILITY CONSTRAINTS Nan Li, William Cushing, Subbarao Kambhampati, and Sungwook Yoon. School of Computing and Informatics, Arizona State University, Tempe, AZ 85281, USA. nan.li.3@asu.edu, wcushing@asu.edu, rao@asu.edu, Sungwook.Yoon@asu.edu

  2. USER PLAN PREFERENCES OBFUSCATED BY FEASIBILITY CONSTRAINTS
  Preferred behavior:
  • Pbus: Getin(bus, source), Buyticket(bus), Getout(bus, dest) * 1
  • Ptrain: Buyticket(train), Getin(train, source), Getout(train, dest) * 4
  • Pplane: Buyticket(plane), Getin(plane, source), Getout(plane, dest) * 12
  "I prefer travel by train." "Train tickets are too expensive for me. Maybe I should just take the bus."
  Obfuscated behavior:
  • Pbus: Getin(bus, source), Buyticket(bus), Getout(bus, dest) * 6
  • Ptrain: Buyticket(train), Getin(train, source), Getout(train, dest) * 5
  • Pplane: Buyticket(plane), Getin(plane, source), Getout(plane, dest) * 3

  3. LEARNING USER PLAN PREFERENCES OBFUSCATED BY FEASIBILITY CONSTRAINTS
  • Rescale observed plans: undo the filtering caused by feasibility constraints (sketched below).
  • Base learner: acquires the true user preferences from the adjusted plan frequencies (IJCAI '09).
  Input plans: Pplane * 3, Ptrain * 5, Pbus * 6 → Rescaled plans: Pplane * 12, Ptrain * 4, Pbus * 1 → Base learner → User preference model.
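The rescaling idea can be illustrated with a short Python sketch (function names and the data encoding are assumptions for illustration, not the paper's code): within each situation, observed frequencies reflect preference only among the plans that were feasible there, so per-situation ratios are chained through situations that share a plan type to recover global preference weights.

```python
def rescale(situations, anchor):
    """Chain per-situation frequency ratios into one global weight per plan.

    situations: list of dicts mapping plan name -> observed count in that
    situation (only plans feasible there appear). Ratios are propagated
    outward from `anchor`, whose weight is fixed at 1, through situations
    that share a plan -- a sketch of the rescaling step, assuming each
    situation's counts reflect preference among its feasible plans only.
    """
    weight = {anchor: 1.0}
    changed = True
    while changed:                       # propagate until no new plan is reached
        changed = False
        for counts in situations:
            known = [p for p in counts if p in weight]
            if not known:
                continue
            ref = known[0]
            scale = weight[ref] / counts[ref]
            for plan, count in counts.items():
                if plan not in weight:
                    weight[plan] = scale * count
                    changed = True
    return weight

# Observed (obfuscated) behavior from the slides:
obs = [{"plane": 3, "train": 1},   # situation 1: plane and train feasible
       {"train": 4, "bus": 1},     # situation 2: train and bus feasible
       {"bus": 5}]                 # situation 3: only bus feasible
print(rescale(obs, anchor="bus"))  # -> {'bus': 1.0, 'train': 4.0, 'plane': 12.0}
```

On the slides' numbers this recovers exactly the preferred-behavior ratio 12 : 4 : 1 from the obfuscated counts 3 : 5 : 6.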

  4. RESCALE OBSERVED PLANS
  [Figure: observed plans grouped by situation; clustering and transitive closure chain the per-situation frequency ratios into one global rescaling.]
  • Situation 1: Pplane * 3, Ptrain * 1
  • Situation 2: Ptrain * 4, Pbus * 1
  • Situation 3: Pbus * 5
  • All situations, rescaled: Pplane * 12, Ptrain * 4, Pbus * 1
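A sketch of the clustering step under the same assumed encoding: treat situations as nodes, connect two situations whenever their feasible-plan sets overlap, and take connected components (the transitive closure) to find the groups within which frequency ratios can be chained.

```python
def comparable_groups(situations):
    """Group situations whose feasible-plan sets overlap, directly or
    transitively. A sketch: situations are nodes, sharing a plan type is
    an edge, and groups are connected components (found with union-find)."""
    n = len(situations)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if set(situations[i]) & set(situations[j]):
                parent[find(i)] = find(j)   # union situations sharing a plan
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

print(comparable_groups([{"plane": 3, "train": 1},
                         {"train": 4, "bus": 1},
                         {"bus": 5}]))      # -> [[0, 1, 2]]: one comparable group
```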

  5. EVALUATION
  • Ideal: user studies (too hard).
  • Our approach:
  • Assume H* represents the user's true preferences.
  • Generate "worst case" random solution plans using H* (H* → Sol).
  • Pick the selected plan using H* (Sol → O).
  • From O, learn HlO with the original algorithm and HlE with the extended algorithm (O → HlO, O → HlE).
  • Compare HlO and HlE: randomly generate plan pairs, ask HlO and HlE to pick the preferred plan, and use H* to check whether each answer is correct.
  [Figure: H* yields training data S1: P1, S2: P2, …, Sn: Pn; the learner produces Hl, which is compared against H*.]
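A minimal sketch of the comparison step, assuming each model is exposed as a scoring function from a plan to a preference score (a hypothetical interface; the real models assign probabilities via the learned pHTN):

```python
def preference_accuracy(h_learned, h_star, plan_pairs):
    """Fraction of plan pairs on which the learned model picks the same
    preferred plan as the reference model H*. `h_learned` and `h_star`
    are hypothetical callables mapping a plan to a score, higher meaning
    more preferred; `plan_pairs` is a list of randomly generated pairs."""
    correct = 0
    for p, q in plan_pairs:
        learned_pick = p if h_learned(p) >= h_learned(q) else q
        star_pick = p if h_star(p) >= h_star(q) else q
        correct += (learned_pick == star_pick)
    return correct / len(plan_pairs)
```

Under this protocol, HlO and HlE are each scored against H* on the same pairs, and the two accuracies are compared.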

  6. RATE OF LEARNING AND SIZE DEPENDENCE
  [Figures, on randomly generated domains: left panel, rate of learning; right panel, size dependence.]
  • With increasing training data, the extended algorithm captures nearly the full user preferences, while the original algorithm performs slightly worse than random chance.
  • The extended algorithm outperforms the original algorithm across various domain sizes.

  7. “BENCHMARK” DOMAINS
  Logistics Planning:
  • H*: move by plane or truck; prefer plane; prefer fewer steps.
  • Score: no rescaling 0.342; rescaling 0.847.
  Gold Miner:
  • H*: get the laser cannon; shoot rock until adjacent to the gold; get a bomb; use the bomb to remove the last wall.
  • Score: no rescaling 0.605; rescaling 0.706.

  8. CONCLUSIONS
  • Learn user plan preferences obfuscated by feasibility constraints.
  • Adjust the observed frequencies of plans to fit the user's true preferences.
  • Evaluate predictive power using a "worst case" model.
  • Show that rescaling before learning is significantly more effective.

  9. LEARNING USER PLAN PREFERENCES
  "Hitchhike? No way!"
  • Pbus: Getin(bus, source), Buyticket(bus), Getout(bus, dest) * 2
  • Ptrain: Buyticket(train), Getin(train, source), Getout(train, dest) * 8
  • Phike: Hitchhike(source, dest) * 0

  10. TWO TALES OF HTN PLANNING
  • Abstraction: efficiency, top-down (most existing work).
  • Preference handling: quality, bottom-up (our work).
  How should learning proceed?

  11. LEARNING USER PLAN PREFERENCES AS pHTNs
  • Given a set O of plans executed by the user, find a generative model Hl = argmax_H p(O | H).
  • Probabilistic Hierarchical Task Networks (pHTNs):
  S → 0.2, A1 B1
  S → 0.8, A2 B2
  B1 → 1.0, A2 A3
  B2 → 1.0, A1 A3
  A1 → 1.0, Getin
  A2 → 1.0, Buyticket
  A3 → 1.0, Getout
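The slide's pHTN can be written down directly as a weighted grammar. The Python sketch below (the dictionary encoding is an assumption for illustration) samples plans from it, showing the generative reading of Hl: a plan is produced by recursively expanding schemas according to their selection probabilities.

```python
import random

# The pHTN from the slide: nonterminal -> list of (probability, expansion).
PHTN = {
    "S":  [(0.2, ["A1", "B1"]), (0.8, ["A2", "B2"])],
    "B1": [(1.0, ["A2", "A3"])],
    "B2": [(1.0, ["A1", "A3"])],
    "A1": [(1.0, ["Getin"])],
    "A2": [(1.0, ["Buyticket"])],
    "A3": [(1.0, ["Getout"])],
}

def sample_plan(symbol="S"):
    """Sample one plan (action sequence) from the generative model."""
    if symbol not in PHTN:               # primitive action: emit it
        return [symbol]
    r, acc = random.random(), 0.0
    for prob, rhs in PHTN[symbol]:       # pick an expansion by its probability
        acc += prob
        if r <= acc:
            return [a for s in rhs for a in sample_plan(s)]
    return []                            # unreachable if probabilities sum to 1

print(sample_plan())  # e.g. ['Buyticket', 'Getin', 'Getout'] with probability 0.8
```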

  12. LEARNING pHTNs
  • HTNs can be seen as providing a grammar of desired solutions:
  Actions → Words
  Plans → Sentences
  HTNs → Grammar
  HTN learning → Grammar induction
  • pHTN learning by probabilistic context-free grammar (pCFG) induction.
  • Assumptions: parameter-less, unconditional schemas.
  [The pHTN example from the previous slide is repeated here.]

  13. A TWO-STEP ALGORITHM
  • Greedy Structure Hypothesizer (GSH): hypothesizes the schema structure.
  • Expectation-Maximization (EM) phase: refines schema probabilities and removes redundant schemas.
  Generalizes the inside-outside algorithm (Lari & Young, 1990).

  14. GREEDY STRUCTURE HYPOTHESIZER
  • Structure learning, bottom-up (a toy sketch follows).
  • Prefers recursive to non-recursive schemas.
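A toy illustration of the greedy, bottom-up idea only (not the paper's GSH, which additionally prefers recursive schemas where possible): repeatedly introduce a fresh nonterminal for the most frequent adjacent pair of symbols, building the schema hierarchy from the leaves upward.

```python
from collections import Counter

def hypothesize_structure(plans, max_schemas=10):
    """Greedy bottom-up structure sketch: replace the most frequent
    adjacent symbol pair with a fresh nonterminal, record a schema for
    it, and repeat until no pairs remain or the budget is exhausted."""
    seqs = [list(p) for p in plans]
    schemas = {}
    for i in range(max_schemas):
        pairs = Counter((s[j], s[j + 1]) for s in seqs for j in range(len(s) - 1))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        nonterminal = f"N{i}"
        schemas[nonterminal] = (a, b)
        for s in seqs:                       # rewrite every occurrence of the pair
            j = 0
            while j < len(s) - 1:
                if s[j] == a and s[j + 1] == b:
                    s[j:j + 2] = [nonterminal]
                j += 1
    return schemas

print(hypothesize_structure([["Buyticket", "Getin", "Getout"]] * 3))
# -> {'N0': ('Buyticket', 'Getin'), 'N1': ('N0', 'Getout')}
```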

  15. EM PHASE
  • E step: compute plan parse trees; keep the most probable parse tree.
  • M step: update the selection probabilities p of schemas s: ai → p, aj ak.
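A sketch of one iteration under an assumed rule encoding: the E step is a Viterbi-style CYK parse that finds the most probable parse of a plan, and the M step renormalizes rule-usage counts per nonterminal. The paper's actual procedure generalizes inside-outside; this shows only the most-probable-parse variant named on the slide.

```python
def viterbi_parse(plan, rules):
    """E step sketch: most probable parse of a plan under a binary pHTN.

    `rules` maps each nonterminal to (prob, rhs) choices, where rhs is a
    1-tuple (primitive action) or 2-tuple (two nonterminals) -- a
    hypothetical encoding of the grammar on slide 11. Returns the best
    (probability, backpointer) entry for S over the whole plan."""
    n = len(plan)
    best = {}                                        # (i, j) -> {nt: (prob, back)}
    for i, action in enumerate(plan):                # spans of length one
        best[(i, i + 1)] = {nt: (p, action)
                            for nt, alts in rules.items()
                            for p, rhs in alts if rhs == (action,)}
    for width in range(2, n + 1):                    # longer spans, bottom-up
        for i in range(n - width + 1):
            j, cell = i + width, {}
            for k in range(i + 1, j):                # split point
                for nt, alts in rules.items():
                    for p, rhs in alts:
                        if (len(rhs) == 2 and rhs[0] in best[(i, k)]
                                and rhs[1] in best[(k, j)]):
                            q = p * best[(i, k)][rhs[0]][0] * best[(k, j)][rhs[1]][0]
                            if q > cell.get(nt, (0.0,))[0]:
                                cell[nt] = (q, (k, rhs))
            best[(i, j)] = cell
    return best[(0, n)].get("S")

def m_step(usage):
    """M step sketch: turn rule-usage counts (collected from the best
    parse trees) into new probabilities, normalized per nonterminal."""
    return {nt: {rhs: c / sum(alts.values()) for rhs, c in alts.items()}
            for nt, alts in usage.items()}

rules = {"S":  [(0.2, ("A1", "B1")), (0.8, ("A2", "B2"))],
         "B1": [(1.0, ("A2", "A3"))], "B2": [(1.0, ("A1", "A3"))],
         "A1": [(1.0, ("Getin",))], "A2": [(1.0, ("Buyticket",))],
         "A3": [(1.0, ("Getout",))]}
print(viterbi_parse(["Buyticket", "Getin", "Getout"], rules))
# -> (0.8, (1, ('A2', 'B2'))): parsed via S -> A2 B2 with probability 0.8
```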
