
The Consolidation of Task Knowledge for Lifelong Machine Learning


Presentation Transcript


  1. The Consolidation of Task Knowledge for Lifelong Machine Learning. Daniel L. Silver, in collaboration with Ryan Poirier, Robert O’Quinn, Duane Currie, Rick Alisch, Peter McCracken, and Ben Fowler. Acadia University, Wolfville, NS, Canada. danny.silver@acadiau.ca

  2. Lifelong Machine Learning: Retention and Transfer
  [Diagram: training examples (x, f(x)) drawn from instance space X feed an inductive learning system (short-term memory) that builds a model of classifier h, with h(x) ~ f(x) evaluated on testing examples; domain knowledge held in long-term memory supports retention, knowledge transfer, and inductive bias selection.]

  3. Lifelong Machine Learning with MTL and Task Rehearsal
  [Diagram: training examples (x, f(x)) from instance space X train a Multiple Task Learning (MTL) network over inputs x1…xn with outputs f1(x), f2(x), f5(x), …; the individual models f1(x)…fk(x) retained as domain knowledge in long-term memory supply knowledge transfer and inductive bias selection, and h(x) ~ f(x) is evaluated on testing examples.]
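
  To make the MTL architecture in the diagram above concrete, here is a minimal sketch of a multiple task learning network in PyTorch: a shared hidden (common feature) layer feeding one output per task, so that back-propagation from every task shapes the shared internal representation. The layer sizes, task count, and placeholder data are illustrative assumptions, not values taken from the slides.

  import torch
  import torch.nn as nn

  class MTLNet(nn.Module):
      def __init__(self, n_inputs, n_hidden, n_tasks):
          super().__init__()
          # common feature layer shared by all tasks
          self.shared = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.Sigmoid())
          # one task-specific output head per task
          self.heads = nn.ModuleList(nn.Linear(n_hidden, 1) for _ in range(n_tasks))

      def forward(self, x):
          h = self.shared(x)  # common internal representation
          return torch.cat([head(h) for head in self.heads], dim=1)  # one column per task

  # Placeholder data: every example must supply a target for every task
  # (a limitation the deck returns to later when motivating csMTL).
  net = MTLNet(n_inputs=10, n_hidden=20, n_tasks=5)
  x = torch.randn(32, 10)
  y = torch.randint(0, 2, (32, 5)).float()
  loss = nn.BCEWithLogitsLoss()(net(x), y)
  loss.backward()  # inductive transfer happens through the shared weights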

  4. Lifelong Learning with MTL
  [Chart: mean percent misclassification for tasks A, B, C, and D on the Band domain, the Logic domain, and the Coronary Artery Disease domain.]

  5. Why Consolidated? (AI’2004) Retention is necessary but not sufficient.
  • Effective Retention: maintain or increase the accuracy of all tasks; facilitate practice of tasks over time
  • Efficient Retention: space (mitigate redundant representation) and time (must scale with the number of tasks)

  6. Why Consolidated? (AI’2004) Retention is necessary but not sufficient.
  • Effective Retrieval: must support a method of explicitly or implicitly selecting the most related prior knowledge
  • Efficient Retrieval: representational transfer is preferred; selecting prior knowledge must scale with the number of tasks retained

  7. Major Problem: Consolidation requires overcoming the stability-plasticity problem, the challenge of adding new information to a system without the loss of prior information (Grossberg 1987).

  8. Lifelong Machine Learning with MTL and Task Rehearsal
  [Diagram: repeats the MTL and task rehearsal architecture of slide 3, with the retained individual models f1(x)…fk(x) providing knowledge transfer and inductive bias selection to the MTL network.]

  9. Lifelong Machine Learning: Consolidation and Transfer (Silver and McCracken 2003), influenced by (McClelland, McNaughton, O’Reilly 1994)
  [Diagram: training examples (x, f(x)) from instance space X train an MTL network; its task knowledge is consolidated into a single consolidated MTL network holding f1(x), f2(x), f3(x), f9(x), …, fk(x), which supplies knowledge transfer and inductive bias selection back to new learning, with h(x) ~ f(x) on testing examples.]

  10. Lifelong Machine Learning via Consolidation and Transfer
  [Diagram: a short-term learning network over inputs x1…xn learns f1(x); virtual examples of f1(x) are passed to a long-term consolidated domain knowledge network holding f1(x), f2 … f6. Rehearsal of virtual examples for f2 – f6 ensures knowledge retention, while virtual examples from related prior tasks provide knowledge transfer.]
  • Four Requirements (Silver and Poirier 2004):
  • Lots of internal representation
  • Rich set of virtual training examples
  • Small learning rate = slow learning
  • Method to prevent growth of high-magnitude weights [Poirier04]
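
  A hedged sketch of consolidation through task rehearsal under the four requirements above: a large internal representation is assumed in the network, a rich set of virtual examples is generated, and a small learning rate plus weight decay limit high-magnitude weights. It assumes a long-term MTL-style network (such as the MTLNet sketched earlier) that already has an output for the new task; the function names and hyperparameter values are illustrative, not taken from the paper.

  import torch
  import torch.nn as nn

  def make_virtual_examples(long_term_net, n_examples, n_inputs):
      # Probe the consolidated network with random inputs and record its current
      # responses; these (input, target) pairs are the virtual examples that are
      # rehearsed so prior task knowledge is retained.
      x_virtual = torch.rand(n_examples, n_inputs)
      with torch.no_grad():
          y_virtual = torch.sigmoid(long_term_net(x_virtual))
      return x_virtual, y_virtual

  def consolidate(long_term_net, x_new, y_new, new_task, n_virtual=5000, steps=2000):
      x_virt, y_virt = make_virtual_examples(long_term_net, n_virtual, x_new.shape[1])
      prior = [i for i in range(y_virt.shape[1]) if i != new_task]  # prior task indices
      # Small learning rate (slow learning) and weight decay to curb large weights.
      opt = torch.optim.SGD(long_term_net.parameters(), lr=0.01, weight_decay=1e-4)
      bce = nn.BCEWithLogitsLoss()
      for _ in range(steps):
          opt.zero_grad()
          loss = bce(long_term_net(x_new)[:, new_task], y_new) \
               + bce(long_term_net(x_virt)[:, prior], y_virt[:, prior])
          loss.backward()
          opt.step()
      return long_term_net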

  11. Empirical Findings: consolidated task accuracy as new tasks are added.

  12. Effect of Curriculum (Poirier and Silver 2005)
  • Finding: curriculum matters per task; mean task accuracy converges as the number of tasks increases
  • Considered two curricula over a synthetic domain of tasks: rapid feature capture and gradual feature capture
  • [Chart: mean classification accuracy versus the number of tasks for eight curricula.]

  13. Continued Practice of Tasks
  • LML must alternate between model training and application on tasks that vary in relatedness
  • Interested in the conditions under which practice leads to accurate consolidated hypotheses
  • Factors considered: number of practice sessions, number of examples per practice session, mix of related/unrelated tasks being practiced, and sequence of practice (curriculum)

  14. Lifelong Machine Learning via Consolidation and Transfer (O’Quinn, Silver, and Poirier 2005)
  [Diagram: as in slide 10, the short-term learning network passes virtual examples of f1(x) to the long-term consolidated domain knowledge network; rehearsal of virtual examples for f2 – f6 ensures retention, and virtual examples from related prior tasks provide transfer.]
  • Study: practiced two tasks (T0, T1) that shared no features; varied the number of practice sessions; varied the size of the practice example sets

  15. Practice of T0 and T1: 10 sets of 50 examples
  [Chart: accuracy versus the number of training examples practiced.]
  • Pro: continued improvement through practice occurs
  • Con: loss of accuracy during consolidation
  • Problem: multiple outputs

  16. Context Sensitive MTL (csMTL) (Silver, Poirier and Currie, 2008)
  [Diagram: one output y’ = f’(c,x) for all tasks, computed from context inputs c1…ck and primary inputs x1…xn.]
  Overcomes limitations of standard MTL for long-term consolidation of tasks:
  • Eliminates redundant outputs for the same task
  • Facilitates accumulation of knowledge through practice
  • Examples can be associated with tasks directly by the environment
  • Develops a fluid domain of task knowledge indexed by the context inputs
  • Accommodates tasks that have multiple outputs
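
  The csMTL idea above can be sketched as follows: a single output y’ = f’(c, x) for every task, with one-hot context inputs c selecting the task and x carrying the primary inputs. The layer sizes and the one-hot context encoding are assumptions for illustration.

  import torch
  import torch.nn as nn

  class CsMTLNet(nn.Module):
      def __init__(self, n_context, n_primary, n_hidden):
          super().__init__()
          self.body = nn.Sequential(
              nn.Linear(n_context + n_primary, n_hidden), nn.Sigmoid(),
              nn.Linear(n_hidden, 1))  # one output shared by all tasks

      def forward(self, c, x):
          # the context inputs index the task within the consolidated domain knowledge
          return self.body(torch.cat([c, x], dim=1))

  # Each example is associated with its task directly through its context input,
  # so no per-task output head and no matching targets for every task are needed.
  net = CsMTLNet(n_context=5, n_primary=10, n_hidden=30)
  c = nn.functional.one_hot(torch.randint(0, 5, (32,)), num_classes=5).float()
  x = torch.randn(32, 10)
  y = torch.randint(0, 2, (32, 1)).float()
  loss = nn.BCEWithLogitsLoss()(net(c, x), y)
  loss.backward()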

  17. An LML system based on csMTL (work with Ben Fowler, 2010)
  [Diagram: a short-term learning network computes f1(c,x) from task context inputs c1…ck and standard inputs x1…xn, receiving representational transfer from the long-term consolidated domain knowledge network for rapid learning; functional transfer (virtual examples) flows back to the long-term network, which computes one output f’(c,x) for all tasks, addressing the stability-plasticity problem.]
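
  To make the two-network loop on this slide concrete, here is a hedged sketch of representational transfer (initialising the short-term learning network from a copy of the long-term consolidated domain knowledge network) and functional transfer (generating virtual examples of the newly learned task for later consolidation). The helper names and the use of whole-network weight copying are assumptions for illustration; it builds on the CsMTLNet sketch above.

  import copy
  import torch

  def new_short_term_net(long_term_net):
      # Representational transfer: start the short-term network from a copy of
      # the long-term network's weights so learning of the new task is rapid.
      return copy.deepcopy(long_term_net)

  def functional_transfer(short_term_net, task_id, n_virtual, n_context, n_primary):
      # Functional transfer: generate virtual (c, x, y) examples of the new task
      # from the trained short-term network; these are later rehearsed into the
      # long-term consolidated domain knowledge network.
      c = torch.zeros(n_virtual, n_context)
      c[:, task_id] = 1.0                   # context fixed to the new task
      x = torch.rand(n_virtual, n_primary)  # inputs drawn over the instance space
      with torch.no_grad():
          y = torch.sigmoid(short_term_net(c, x))
      return c, x, y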

  18. Consolidation based on csMTL: accuracy vs. the number of virtual examples (Fowler and Silver 2011)

  19. Consolidation based on csMTL: Virtual Instances vs. Real Instances. Recent efforts have focused on ways to obtain better results by improving the method of task rehearsal of prior tasks.

  20. Thank You! • danny.silver@acadiau.ca • http://plato.acadiau.ca/courses/comp/dsilver/ • http://LML.acadiau.ca

  21. EXTRA SLIDES

  22. Lifelong Machine Learning: Non-consolidated Knowledge
  [Diagram: as in slide 2, training examples (x, f(x)) from instance space X feed a short-term inductive learning system that builds a model of classifier h, with h(x) ~ f(x) on testing examples; domain knowledge in long-term memory provides retention and consolidation, knowledge transfer, and inductive bias selection.]

  23. Effect of Curriculum (Poirier and Silver 2005)
  • [Chart: mean classification accuracy versus the number of tasks for three curricula.]
  • Finding: curriculum matters per task

  24. Lifelong Machine Learning: Consolidation and Transfer
  [Diagram: repeats the consolidation and transfer architecture of slide 9, with the consolidated MTL network holding f2(x), f3(x), f9(x), …, fk(x) stored as domain knowledge in long-term memory.]

  25. An Environmental Example: stream flow rate prediction [Lisa Gaudette, 2006], where x = weather data and f(x) = flow rate.

  26. Limitations of MTL for LML
  [Diagram: an MTL network over inputs x1…xn with a common feature layer (common internal representation [Caruana, Baxter]) and task-specific representation feeding outputs f1(x), f2(x), …, fk(x).]
  • Problems with multiple outputs: training examples must have matching target values; redundant representation; frustrates practice of a task; prevents a fluid development of domain knowledge; no way to naturally associate examples with tasks
  • Inductive transfer limited to sharing of hidden node weights
  • Inductive transfer relies on selecting related secondary tasks

  27. csMTL and Tasks with Multiple Outputs (Liangliang Tu, 2010)
  • Image morphing: inductive transfer between tasks that have multiple outputs
  • Transforms 30x30 grey-scale images using inductive transfer

  28. csMTL and Tasks with Multiple Outputs

  29. csMTL and Tasks with Multiple Outputs Demo

  30. Challenges and Benefits
  • Stability-plasticity problem: how do we integrate new knowledge in with old? No loss of new knowledge, no loss of prior knowledge, and efficient methods of storage and recall
  • LML methods that can efficiently and effectively retain learned knowledge will suggest approaches to “common knowledge” representation, a “Big AI” problem

  31. Challenges and Benefits
  • Practice makes perfect! An LML system must be capable of learning from examples of tasks over a lifetime
  • Practice should increase model accuracy and overall domain knowledge
  • How can this be done? Research important to AI, psychology, and education

  32. Limitations of csMTL ML3
  • Consolidation is space and time complex: rehearsal of all tasks means many virtual training examples are required
  • Back-propagation of error computational complexity = O(W^3), where W = number of weights

  33. Context Sensitive MTL (csMTL)
  [Diagram: one output y’ for all tasks, computed from context inputs c1…ck and primary inputs x1…xn.]
  We are currently investigating the theory of Hints [Abu-Mostafa 93-96] to formalize:
  • How each task of the domain can be seen as a hint task for learning the domain of tasks
  • How the VC dimension for learning a particular task, fk(c,x), is reduced by learning others

  34. Requirements for an ML3 System: Long-term Retention
  • Effective Retention: resist the introduction and accumulation of error; retention of new task knowledge should improve related prior task knowledge (practice should improve performance)
  • Efficient Retention: minimize redundant use of memory via consolidation
  • Meta-knowledge Collection: e.g. example distribution over the input space
  • Ensures Effective and Efficient Indexing: selection of related prior knowledge for inductive bias should be accurate and rapid

  35. Requirements for an ML3 System: Short-term Learning
  • Effective (transfer) Learning: new learning should benefit from related prior task knowledge; ML3 hypotheses should meet or exceed the accuracy of hypotheses developed without the benefit of transfer
  • Efficient (transfer) Learning: transfer should reduce training time; any increase in space complexity should be minimized
  • Transfer versus Training Examples: must weigh the relevance and accuracy of prior knowledge against the number and accuracy of available training examples

  36. Benefits of csMTL ML3: Long-term Consolidation
  • Effective Retention: rehearsal overcomes the stability-plasticity problem [Robins95]; increases the accuracy of all related tasks [Silver04]; facilitates practice of the same task [O’Quinn05]
  • Efficient Retention: eliminates redundant, inaccurate, older hypotheses [Silver04]
  • Meta-knowledge Collection: the focus is on learning a continuous domain of tasks; changes in the context inputs select task domain knowledge
  • Ensures Effective and Efficient Indexing: conjecture is that prior knowledge selection is made implicitly by the training examples; indexing occurs as the connection weights between the long-term and short-term networks are learned

  37. Benefits of csMTL ML3: Short-term Learning
  • Effective (transfer) Learning: accurate inductive bias via transfer from the long-term net; a measure of task relatedness is not required
  • Efficient (transfer) Learning: rapid inductive bias via transfer from the long-term net; short-term network weights are reusable
  • Transfer versus Training Examples: if the new task was previously learned, the weights between the long-term and short-term networks are quickly learned; if the new task is different but related to a prior task, the most appropriate features from the long-term network will be selected; if the new task is unrelated to prior tasks, the supplemental hidden nodes will develop the needed features
