Iteration Learning by Demonstration

Thomas J. Lee, Steven Eker, Melinda Gervasio
SRI International

LAPDOG Overview
• learning from demonstration
• "Watch me…"

[Figure: LAPDOG architecture diagram. Labels: USER, Instrumented App(s), actions, demonstration(s), LAPDOG (generalization, dataflow completion), generalized program, Task Registry, program, Task Manager, program execution, primitive execution, execution results, results.]

CPOF
• Command Post of the Future: a collaborative system for sharing and visualizing data
• 1500+ systems online and growing
• Purpose: automate routine tasks; share best practices
• Use Cases:
    • Workspace Configuration and Monitoring
    • Emergency Response Procedures
    • Mission Rehearsal and Briefing

WebTAS
• GOTS suite of data analysis, data mining, and visualization tools
• Installed on over 2000 workstations in theater (Iraq, Afghanistan)
• Purpose: improve response time; capture experience/expertise
• Use Cases:
    • Route analysis: 8 mins → 8 secs
    • Continuous IPB
    • Web data integration

Demonstration
    A(-a) B(-m) C(+a +m -[b c d]) D(+b -j) E(+d -k) F(-l) G(+[j k l])

Parameter generalization
• find all unifications/variablizations across values and functional expressions over values
    A(-$X) B(-$Y) C(+$X +$Y -$Z) D(+first($Z) -$U) E(+last($Z) -$V) F(-$W) G(+list($U $V $W))

Dataflow completion
• perform dynamic programming search in space of information-producing actions; search over KB relations
[Figure labels: A1, A2, Ak, Aj, unsupported input, inserted actions, alternative completion paths]

Structure generalization
• induce all possible loops:
    • over sets or lists
    • over multiple lists in parallel
    • over functional expressions over lists
    • generating lists
    ABCDBCDBCDBCDE → A(BCD)*E

Hypothesis Space: heuristic filtering and heuristic selection
• prefer fewer targets, common paths; remove redundant paths, longcuts, repeated subpaths
• prefer shorter procedures, direct supports, existing supports, closest support

Learned Procedure
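The loop induction step ABCDBCDBCDBCDE → A(BCD)*E can be illustrated with a small Python sketch. This is not LAPDOG's algorithm: it is a brute-force search for the adjacent repeated block that covers the most actions, it looks at action names only (no arguments), and the function names collapse_repeats and render are made up for this illustration.

```python
def collapse_repeats(actions):
    """Find the adjacent repeated block that covers the most actions.

    Returns (prefix, body, count, suffix), or None if nothing repeats.
    Works on any sliceable sequence of action names (a string or a list).
    """
    n = len(actions)
    best = None  # (covered, prefix, body, count, suffix)
    for start in range(n):
        for size in range(1, (n - start) // 2 + 1):
            body = actions[start:start + size]
            count = 1
            # Count how many times the body repeats back to back.
            while actions[start + count * size:start + (count + 1) * size] == body:
                count += 1
            if count >= 2:
                covered = count * size
                # Strict ">" keeps the earliest, shortest body on ties.
                if best is None or covered > best[0]:
                    best = (covered, actions[:start], body, count,
                            actions[start + covered:])
    return None if best is None else best[1:]


def render(actions):
    """Render a string of action names in the slide's A(BCD)*E notation."""
    found = collapse_repeats(actions)
    if found is None:
        return actions
    prefix, body, _count, suffix = found
    return f"{prefix}({body})*{suffix}"


print(render("ABCDBCDBCDBCDE"))
```

A name-only match like this would over-generate loops in practice; the slides indicate that LAPDOG also requires matching inputs and then filters the hypothesis space heuristically.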

Parameter Generalization

Action Model
• Convert(+Infile, +Format, -Outfile)
• Delete(+File)
• GetCreationDate(+File, -Date)
• List(+Directory, -FileList)

Demonstration
• Convert(+"manual.pdf", +"HTML", -"manual.html")
• GetCreationDate(+"manual.html", -"2009-03-23")

Generalization
• Convert(Infile, "HTML", File)
• GetCreationDate(File, Date)

Demonstration
• List(+"doc", -["manual.pdf", "other.pdf"])
• Convert(+"manual.pdf", +"HTML", -"manual.html")
• GetCreationDate(+"manual.html", -"2009-03-23")

Generalization
• List(+"doc", -List)
• Convert(first(List), "HTML", File)
• GetCreationDate(File, Date)
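A minimal Python sketch of the variablization idea, under stated assumptions: each action is a hypothetical (name, inputs, outputs) tuple, every output gets a fresh variable (V1, V2, and so on), and an input is replaced by the most recent earlier output that equals it or whose first or last element equals it. Unsupported inputs are simply left as literal constants; the slide turns "manual.pdf" into a procedure parameter in the first demonstration, and this sketch does not try to reproduce that decision.

```python
def find_support(value, bindings):
    """Return an expression over earlier outputs that yields `value`.

    `bindings` is a list of (produced_value, variable_name) pairs in
    demonstration order. The most recent support wins (an assumption).
    """
    for produced, var in reversed(bindings):
        if produced == value:
            return var
        if isinstance(produced, list) and produced:
            if produced[0] == value:
                return f"first({var})"
            if produced[-1] == value:
                return f"last({var})"
    return repr(value)  # unsupported input: keep it as a constant


def generalize(demo):
    """Variablize a demonstration given as (name, inputs, outputs) tuples."""
    bindings = []
    program = []
    for name, inputs, outputs in demo:
        new_inputs = [find_support(value, bindings) for value in inputs]
        new_outputs = []
        for value in outputs:
            var = f"V{len(bindings) + 1}"  # fresh variable per output
            bindings.append((value, var))
            new_outputs.append(var)
        program.append((name, new_inputs, new_outputs))
    return program


demo = [
    ("List", ["doc"], [["manual.pdf", "other.pdf"]]),
    ("Convert", ["manual.pdf", "HTML"], ["manual.html"]),
    ("GetCreationDate", ["manual.html"], ["2009-03-23"]),
]
for step in generalize(demo):
    print(step)
```

On the second demonstration from the slide, Convert's first input is rewritten as first(V1) and GetCreationDate's input as V2, which mirrors the slide's Convert(first(List), "HTML", File) followed by GetCreationDate(File, Date).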

Structure Generalization
• Learn iterations over (only) collections (lists or sets)
• Learn multiple sequential and/or parallel (but not nested) iterations

demonstration:
    A(-[1 2 3])
    B(-[a b c])
    C(+a, +1, -"a1")
    C(+b, +2, -"b2")
    C(+c, +3, -"c3")
    D(+["a1" "b2" "c3"])

generalization:
    A(-X)
    B(-Y)
    for U in X, V in Y building Z do
        C(+V, +U, -W)
        W accumulate Z
    od
    D(+Z)
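For readers who think in conventional code, the learned parallel loop can be read roughly as the Python below. This is only an interpretation of the slide's notation: the four actions are passed in as hypothetical callables, and the arguments to C follow the demonstration's order (element of B's list first, then element of A's list). Using zip, which stops at the shorter list, is an assumption; the slide does not say what happens when the lists differ in length.

```python
def run_learned_procedure(a, b, c, d):
    """Execute the generalized procedure with actions given as callables."""
    xs = a()                    # A(-X)
    ys = b()                    # B(-Y)
    zs = []                     # "building Z"
    for u, v in zip(xs, ys):    # for U in X, V in Y (in parallel)
        w = c(v, u)             # C takes B's element, then A's element
        zs.append(w)            # W accumulate Z
    return d(zs)                # D(+Z)


result = run_learned_procedure(
    a=lambda: [1, 2, 3],
    b=lambda: ["a", "b", "c"],
    c=lambda letter, number: f"{letter}{number}",
    d=lambda items: items,      # stand-in for D: just hand back its input
)
print(result)  # ['a1', 'b2', 'c3']
```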

Support Ambiguity

Simple example:
    A(-k) B(-k) C(+k)
Generalization:
    A(-U) B(-V) C(+{U,V})

With expression:
    A(-[l m n], -k) B(+k, -l) C(+l, +l) C(+l, +m) C(+l, +n)
Generalization:
    A(-W, -X) B(+X, -Y) for Z in W do C(+{Y,first(W)}, +Z) od
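The ambiguity can be made concrete by enumerating every earlier output that could have supplied a given input value. The sketch below is illustrative only: candidate_supports is a made-up helper, and it considers just three kinds of support (the output itself, first of a list output, last of a list output), which is an assumption rather than LAPDOG's full expression language.

```python
def candidate_supports(value, produced):
    """List every expression over earlier outputs that evaluates to `value`.

    `produced` is a list of (variable_name, output_value) pairs in the
    order the outputs appeared in the demonstration.
    """
    supports = []
    for var, out in produced:
        if out == value:
            supports.append(var)
        elif isinstance(out, list) and out:
            if out[0] == value:
                supports.append(f"first({var})")
            if out[-1] == value:
                supports.append(f"last({var})")
    return supports


# Simple example: both A and B output k, so C's input has two supports.
print(candidate_supports("k", [("U", "k"), ("V", "k")]))

# With expression: l is both B's output and the first element of A's list.
outputs = [("W", ["l", "m", "n"]), ("X", "k"), ("Y", "l")]
print(candidate_supports("l", outputs))
```

The two calls return ['U', 'V'] and ['first(W)', 'Y'], matching the alternative sets {U,V} and {Y,first(W)} in the generalizations above.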

Example (1)
Idea: Leverage known collections to find loop
    A(-[a,e],-g) B(-[a,b,c,d])
    C(+g) D(+a,-h) ...
• use of an element of a collection → anchor for potential iteration

Example (2)
Idea: Leverage known collections to find loop
    A(-[a,e],-g) B(-[a,b,c,d])
    C(+g) D(+a,-h) E(+h)
    C(+g) D(+b,-i) ...
• remove invalidated support hypothesis
• use of another element of the collection → potential iteration boundaries identified

Example (3)
Idea: Leverage known collections to find loop
    A(-[a,e],-g) B(-[a,b,c,d])
    C(+g) D(+a,-h) E(+h)
    C(+g) D(+b,-i) E(+i)
    C(+g) ...
• adjacent second sequence with matching actions and inputs → potential loop

Example (4)
Idea: Leverage known collections to find loop
    A(-[a,e],-g) B(-[a,b,c,d])
    C(+g) D(+a,-h) E(+h)
    C(+g) D(+b,-i) E(+i)
    C(+g) D(+c,-j) E(+j)
    C(+g) D(+d,-k) E(+k)
    F(+[h,i,j,k])
• matching iterations for remaining elements → loop
• remove invalidated iteration hypothesis

Learned procedure:
    A(-L1,-W)
    B(-L2)
    for X in L2 building L3 do
        C(+W)
        D(+X,-Y)
        E(+Y)
        Y accumulate L3
    od
    F(+L3)
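The anchoring idea in Examples (1) to (4) can be sketched in Python as follows. This is a simplified illustration, not LAPDOG's implementation: the trace encoding and the function find_loop are hypothetical, the sketch requires every element of the collection to be used in order at evenly spaced positions, and it compares action names only, whereas the slides also require matching inputs.

```python
def find_loop(trace, collection):
    """Return (start_index, body_length) of a loop over `collection`, or None.

    `trace` is a list of (name, inputs, outputs) tuples.
    """
    if len(collection) < 2:
        return None
    # Anchors: the first use of each element, searched in list order.
    anchors, pos = [], 0
    for element in collection:
        while pos < len(trace) and element not in trace[pos][1]:
            pos += 1
        if pos == len(trace):
            return None          # element never used: hypothesis invalidated
        anchors.append(pos)
        pos += 1
    # Iteration boundaries: anchors must be evenly spaced.
    stride = anchors[1] - anchors[0]
    if any(b - a != stride for a, b in zip(anchors, anchors[1:])):
        return None
    names = [name for name, _inputs, _outputs in trace]
    # The body may begin before the anchor; try each possible alignment.
    for offset in range(stride):
        start = anchors[0] - offset
        end = start + stride * len(collection)
        if start < 0 or end > len(trace):
            continue
        body = names[start:start + stride]
        if all(names[start + i * stride:start + (i + 1) * stride] == body
               for i in range(len(collection))):
            return start, stride
    return None


trace = [
    ("A", [], [["a", "e"], "g"]),
    ("B", [], [["a", "b", "c", "d"]]),
    ("C", ["g"], []), ("D", ["a"], ["h"]), ("E", ["h"], []),
    ("C", ["g"], []), ("D", ["b"], ["i"]), ("E", ["i"], []),
    ("C", ["g"], []), ("D", ["c"], ["j"]), ("E", ["j"], []),
    ("C", ["g"], []), ("D", ["d"], ["k"]), ("E", ["k"], []),
    ("F", [["h", "i", "j", "k"]], []),
]
print(find_loop(trace, ["a", "b", "c", "d"]))  # (2, 3): body is C, D, E
print(find_loop(trace, ["a", "e"]))            # None: e is never used
```

On the trace from Example (4), iterating over B's list yields a three-action body starting at the first C, while the competing hypothesis over A's list [a,e] is rejected because e is never consumed.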

Alternatives for Iteration Learning

Container identification
• Infer implicit C by seeing
    • all of its elements
    • some of its elements
• Require user to specify C
• Relax requirement of explicit container

Reduce number of demonstrations of body
• Use surrogate (smaller) container C': |C'| < |C|
• Proactive iteration completion (Eager 1991)

Encapsulate body into a subprogram
• Demonstrate |C| simple bodies (|B'| = 1)
• Specify the loop manually

No loop learning: wrap program in loop at runtime

Demonstration vs. Specification

LAPDOG uses pure PBD approach
• Advantages:
    • Intuitive to wide range of users
    • User performs normal workflow using familiar application
    • No knowledge of programming required
    • Programmers needn't switch to "programmer mentality"
• Disadvantages:
    • Long and/or multiple demonstrations required
    • Specification can be concise
    • Intent may be known only to the user

Many opportunities exist for eliciting guidance from the user

Summary of Loop Learning Capabilities

LAPDOG can learn
• loops (not nested) over sets or lists
• multiple sequential or parallel loops over lists
• multiple sequential loops over sets
• to generate list outputs, usable by subsequent loops
• from a single or multiple examples

Simultaneous parameter and structure generalization → LAPDOG can further learn
• loops with expressions over arbitrarily structured data
• to support subsequent inputs with loop outputs
• loop and non-loop alternatives for an action sequence
