
Inductive Transfer and Lifelong Learning with Context-sensitive Neural Networks


Presentation Transcript


  1. Inductive Transfer and Lifelong Learning with Context-sensitive Neural Networks Danny Silver, Ryan Poirier, & Duane Currie Acadia University, Wolfville, NS, Canada danny.silver@acadiau.ca

  2. Outline • Machine Lifelong Learning (ML3) and Inductive Transfer • Multiple Task Learning (MTL) and its Limitations • csMTL – context sensitive MTL • Empirical Studies of csMTL • Conclusions and Future Work

  3. Machine Lifelong Learning (ML3) • Considers methods of retaining and using learned knowledge to improve the effectiveness and efficiency of future learning [Thrun97] • We investigate systems that must learn: • From impoverished training sets • For diverse domains of related/unrelated tasks • Where practice of the same task is possible • Applications: IA, User Modeling, Robotics, DM

  4. Knowledge-Based Inductive Learning: An ML3 Framework [Diagram: an inductive learning system (short-term memory) receives training examples (x, f(x)) drawn from instance space X and produces a model of classifier h, evaluated on testing examples so that h(x) ~ f(x); domain knowledge (long-term memory) provides retention & consolidation, knowledge transfer, and inductive bias selection.]

  5. Knowledge-Based Inductive Learning: An ML3 Framework [Diagram: as on the previous slide, with the inductive learning system shown as a Multiple Task Learning (MTL) network mapping inputs x1..xn to task outputs f1(x), f2(x), ..., fk(x).]

  6. Multiple Task Learning (MTL) [Diagram: inputs x1..xn feed a common feature layer (common internal representation), which feeds task-specific representations and outputs f1(x), f2(x), ..., fk(x).] • Multiple hypotheses develop in parallel within one back-propagation network [Caruana, Baxter 93-95] • An inductive bias occurs through shared use of a common internal representation • Knowledge or inductive transfer to the primary task f1(x) depends on the choice of secondary tasks
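The shared-representation idea above can be made concrete with a small sketch. The following is a hypothetical numpy forward pass (layer sizes are illustrative, not taken from the slides): one shared hidden layer feeds k task-specific output nodes.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n_inputs, n_hidden, n_tasks = 10, 20, 3
    rng = np.random.default_rng(0)
    W_shared = rng.normal(scale=0.1, size=(n_inputs, n_hidden))  # common internal representation
    W_tasks = rng.normal(scale=0.1, size=(n_hidden, n_tasks))    # one output column per task f1..fk

    def mtl_forward(x):
        h = sigmoid(x @ W_shared)      # shared hidden layer (source of the inductive bias)
        return sigmoid(h @ W_tasks)    # k parallel task outputs for the same input x

    print(mtl_forward(rng.random(n_inputs)))   # array of k predictions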

  7. Single Task Learning (STL) [Diagram: inputs x1..xn feed a network with a single output y = f(x).]

  8. Multiple Task Learning (MTL) [Caruana, Baxter] [Diagram: inputs x1..xn feed a common feature layer (common internal representation), then task-specific representations and outputs f1(x), f2(x), ..., fk(x).]

  9. Consolidation and Transfer via MTL & Task Rehearsal [Diagram: a short-term learning network (inputs I0..In, outputs f1(x), y2, y3, y5) supplies virtual examples of T0 for long-term consolidation to the long-term consolidated domain knowledge network (inputs I0..In, outputs f1(x), y2..y6); rehearsal of virtual examples for y1-y5 ensures knowledge retention, and virtual examples from related task sessions provide knowledge transfer.] • Lots of internal representation • Rich set of virtual training examples • Small learning rate = slow learning • Validation set to prevent growth of high-magnitude weights [Poirier04]
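As a rough illustration of the rehearsal idea, the sketch below generates virtual examples by passing inputs through an already-learned hypothesis and keeping its outputs as targets; the stand-in model and sizes are assumptions, not the system described in [Poirier04].

    import numpy as np

    def make_virtual_examples(model, n_examples, n_inputs, rng):
        # (x, model(x)) pairs let a previously learned task be rehearsed later
        X = rng.random((n_examples, n_inputs))
        y = np.array([model(x) for x in X])
        return X, y

    rng = np.random.default_rng(1)
    old_task = lambda x: float(x[0] > 0.5)          # stand-in for a learned hypothesis
    X_virtual, y_virtual = make_virtual_examples(old_task, 50, 10, rng)
    # During consolidation these virtual examples are mixed with the new task's
    # training data so that knowledge of the old task is retained.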

  10. Research Software

  11. Lifelong Learning with MTL [Charts: mean percent misclassification on the Band domain, the Logic domain, and the Coronary Artery Disease domain, with results labelled A, B, C, and D.]

  12. MTL – A Recent Example: stream flow rate prediction [Lisa Gaudette, 2006], where x = weather data and f(x) = flow rate.

  13. Limitations of MTL for ML3 [Diagram: the MTL network of inputs x1..xn, common feature layer (common internal representation), task-specific representations, and outputs f1(x), f2(x), ..., fk(x) [Caruana, Baxter].] • Problems with multiple outputs: • Training examples must have matching target values • Redundant representation • Frustrates practice of a task • Prevents a fluid development of domain knowledge • No way to naturally associate examples with tasks • Inductive transfer limited to sharing of hidden node weights • Inductive transfer relies on selecting related secondary tasks

  14. Context Sensitive MTL (csMTL) [Diagram: context inputs c1..ck and primary inputs x1..xn feed a network with one output for all tasks, y' = f'(c, x).] • Recently developed an alternative approach that is meant to overcome these limitations: • Uses a single-output neural network structure • Context inputs associate an example with a task • All weights are shared; focus shifts from learning separate tasks to learning a domain of tasks • No measure of task relatedness is required
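A minimal sketch of the csMTL encoding, assuming the task identity is given as a one-hot context vector concatenated with the primary inputs (the 2 primary / 7 context / 30 hidden sizes follow the Band domain figures quoted later; the weights here are random placeholders):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n_primary, n_context, n_hidden = 2, 7, 30
    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.1, size=(n_primary + n_context, n_hidden))  # all weights are shared
    W2 = rng.normal(scale=0.1, size=(n_hidden, 1))

    def csmtl_forward(x, task_id):
        c = np.zeros(n_context)
        c[task_id] = 1.0                             # context inputs associate the example with a task
        cx = np.concatenate([c, x])
        return sigmoid(sigmoid(cx @ W1) @ W2)[0]     # single output y' = f'(c, x) for all tasks

    print(csmtl_forward(rng.random(n_primary), task_id=3))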

  15. Context Sensitive MTL (csMTL) [Diagram: context inputs c1..ck and primary inputs x1..xn feed a network with one output for all tasks, y' = f'(c, x).]

  16. Context Sensitive MTL (csMTL) [Diagram: context inputs c1..ck and primary inputs x1..xn feed a network with one output for all tasks, y' = f'(c, x).] Recently, we have shown that csMTL has two important constraints: • Context and bias weights • Context and output weights As a result, VC(csMTL) < VC(MTL)

  17. csMTL Empirical Studies: Task Domains [Diagram: tasks T0-T6 of the Band domain, each defined by a band of positive examples.] • Band: 7 tasks, 2 primary inputs • Logic: T0 = (x1 > 0.5 ∧ x2 > 0.5) ∨ (x3 > 0.5 ∧ x4 > 0.5); 6 tasks, 10 primary inputs • fMRI: 2 tasks, 24 primary inputs
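To make the Logic domain concrete, here is a small sketch that generates labelled examples for the primary task T0 as written above (the uniform sampling of the 10 primary inputs is an assumption):

    import numpy as np

    def logic_t0(x):
        # T0 = (x1 > 0.5 and x2 > 0.5) or (x3 > 0.5 and x4 > 0.5)
        return float((x[0] > 0.5 and x[1] > 0.5) or (x[2] > 0.5 and x[3] > 0.5))

    rng = np.random.default_rng(0)
    X = rng.random((20, 10))                     # 20 examples, 10 primary inputs
    y = np.array([logic_t0(x) for x in X])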

  18. csMTL Empirical Studies: Method Objective: to compare hypotheses developed by csMTL to STL, MTL, and ηMTL • Three-layer networks used for all methods: • Example: Band domain • STL: 2 inputs, 30 hidden nodes, 1 output (a 2-30-1 architecture) • MTL and ηMTL: 2-30-7 architecture • csMTL: 9-30-1 (2 primary, 7 context inputs) • Number of training examples for main/secondary tasks: Band – 10/50, Logic – 20/50, fMRI – 48/48 • Tuning set is used to prevent over-fitting: Band – 10, Logic – 15, fMRI – 8

  19. csMTL Empirical Studies: Results

  20. csMTL Empirical Studies: Results (Logic domain)

  21. csMTL Empirical Studies: Results (2 more domains)

  22. Why is csMTL doing so well? [Diagram: the csMTL network with output y', context inputs c1..ck, and primary inputs x1..xn.] • Consider two unrelated tasks: • From a task relatedness perspective, correlation or mutual information over all examples is 0 • From an example-by-example perspective, 50% of examples have matching target values • csMTL transfers knowledge at the example level • Greater sharing of representation
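The distinction between task-level and example-level relatedness can be checked numerically; the toy sketch below uses two independent random binary tasks (an assumption chosen to mirror the "unrelated tasks" case on the slide):

    import numpy as np

    rng = np.random.default_rng(0)
    y1 = rng.integers(0, 2, size=10000)          # targets for task 1
    y2 = rng.integers(0, 2, size=10000)          # targets for an unrelated task 2
    print(np.corrcoef(y1, y2)[0, 1])             # near 0: unrelated at the task level
    print(np.mean(y1 == y2))                     # near 0.5: half the examples still match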

  23. csMTL Results – Same Task • Learn the primary task with transfer from 5 secondary tasks • 20 training examples per task, all examples drawn from the same function [Diagram: the MTL network with inputs x1..x10 and outputs f1(x), f2(x), ..., f5(x), beside the csMTL network with context inputs c1..c5, inputs x1..x10, and single output f'(c, x).]

  24. csMTL Results – Same Task • Learn the primary task with transfer from 5 secondary tasks • 20 training examples per task, all examples drawn from the same function

  25. Measure of Task Relatedness? [Diagram: the csMTL network with one output for all tasks y', context inputs c1..ck, and primary inputs x1..xn.] Early conjecture: context-to-hidden-node weight vectors can be used to measure task relatedness. Not true: two hypotheses for the same examples can develop that • have equivalent function • use different representation Transfer is functional in nature.

  26. Conclusions • csMTL is a method of inductive transfer using multiple tasks: • Single task output, additional context inputs • Shifts focus to learning a continuous domain of tasks • Eliminates redundant task representation (multiple outputs) • Empirical studies: • csMTL performs inductive transfer at or above the level of MTL • Without a measure of relatedness • A machine lifelong learning (ML3) system based on two csMTL networks is also proposed in the paper

  27. Future Work • Relationship between theory of Hints [Abu-Mostafa], secondary tasks (inductive bias, VCD) • Conditions under which csMTL ANNs succeed / fail • Exploring domains with real-valued context inputs • Will csMTL work with other ML methods? • Develop and test csMTL ML3 system

  28. csMTL Using IDT (Logic Domain)

  29. csMTL Using kNN (Logic Domain, k=5)
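One way the csMTL encoding might be combined with kNN: the sketch below simply appends the one-hot context vector to each feature vector and runs a plain nearest-neighbour vote with k=5 (the random data and encoding details are placeholders, not the experiment on this slide).

    import numpy as np

    def encode(x, task_id, n_tasks):
        c = np.zeros(n_tasks)
        c[task_id] = 1.0
        return np.concatenate([c, x])            # context inputs + primary inputs

    def knn_predict(X_train, y_train, query, k=5):
        d = np.linalg.norm(X_train - query, axis=1)
        nearest = np.argsort(d)[:k]
        return float(np.mean(y_train[nearest]) >= 0.5)

    rng = np.random.default_rng(0)
    n_tasks, n_primary = 6, 10
    X_train = np.stack([encode(rng.random(n_primary), t % n_tasks, n_tasks) for t in range(60)])
    y_train = rng.integers(0, 2, size=60).astype(float)
    print(knn_predict(X_train, y_train, encode(rng.random(n_primary), 0, n_tasks)))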

  30. An ML3 based on csMTL [Diagram: a short-term learning network with output f1(c,x) sits above a long-term consolidated domain knowledge network with one output for all tasks, f'(c,x), and inputs c1..ck (task context) and x1..xn (standard inputs); representational transfer flows from the CDK network for rapid learning, and functional transfer (virtual examples) flows back for slow consolidation.]
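Below is a toy, runnable sketch of the two-network data flow described on this slide, with a tiny logistic unit standing in for each network; the training details and random rehearsal inputs are assumptions, not the proposed implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in = 9                                        # context + primary inputs, as in csMTL

    class TinyNet:
        def __init__(self, w=None):
            self.w = np.zeros(n_in) if w is None else w.copy()
        def __call__(self, x):
            return 1.0 / (1.0 + np.exp(-(x @ self.w)))
        def train(self, X, y, lr=0.1, epochs=200):
            for _ in range(epochs):
                self.w += lr * (y - np.array([self(x) for x in X])) @ X

    long_term = TinyNet()                           # consolidated domain knowledge (CDK) network
    X_new, y_new = rng.random((20, n_in)), rng.integers(0, 2, 20)

    short_term = TinyNet(long_term.w)               # representational transfer from the CDK net
    short_term.train(X_new, y_new)                  # rapid short-term learning of the new task

    X_virtual = rng.random((50, n_in))              # virtual examples for functional transfer
    y_virtual = np.array([short_term(x) for x in X_virtual])
    X_rehearse = rng.random((50, n_in))             # rehearsal of tasks already in the CDK net
    y_rehearse = np.array([long_term(x) for x in X_rehearse])
    long_term.train(np.vstack([X_virtual, X_rehearse]),
                    np.concatenate([y_virtual, y_rehearse]))   # slow consolidation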

  31. Benefits of csMTL ML3: • Long-term Consolidation … • Effective retention (all tasks in DK net improve) • Efficient retention (redundancy eliminated) • Meta-knowledge collection (context cues) • Short-term Learning … • Effective learning (inductive transfer) • Efficient learning (representation + function) • Transfer / training examples used appropriately

  32. Limitations of csMTL ML3 • Consolidation is space and time complex: • Rehearsal of all tasks means lots of virtual training examples required • Back-propagation of error computational complexity = O(W^3), where W = # of weights

  33. Thank You! • danny.silver@acadiau.ca • http://plato.acadiau.ca/courses/comp/dsilver/ • http://birdcage.acadiau.ca:8080/ml3/
