
Inductive Transfer and Lifelong Learning with Context-sensitive Neural Networks


Presentation Transcript


  1. Inductive Transfer and Lifelong Learning with Context-sensitive Neural Networks Danny Silver, Ryan Poirier, & Duane Currie Acadia University, Wolfville, NS, Canada danny.silver@acadiau.ca

  2. Outline • Machine Lifelong Learning (ML3) and Inductive Transfer • Multiple Task Learning (MTL) and its Limitations • csMTL – context sensitive MTL • Empirical Studies of csMTL • Conclusions and Future Work

  3. Machine Lifelong Learning (ML3) • Considers methods of retaining and using learned knowledge to improve the effectiveness and efficiency of future learning [Thrun97] • We investigate systems that must learn: • From impoverished training sets • For diverse domains of related/unrelated tasks • Where practice of the same task is possible • Applications: IA, User Modeling, Robotics, DM

  4. Knowledge-Based Inductive Learning: An ML3 Framework [Diagram: an inductive learning system (short-term memory) receives training examples (x, f(x)) drawn from instance space X and produces a model of classifier h, evaluated on testing examples so that h(x) ~ f(x); domain knowledge (long-term memory) provides retention & consolidation, knowledge transfer, and inductive bias selection.]

  5. Knowledge-Based Inductive Learning: An ML3 Framework [Diagram: as on the previous slide, with the inductive learning system shown as a Multiple Task Learning (MTL) network mapping inputs x1..xn to task outputs f1(x), f2(x), ..., fk(x).]

  6. Multiple Task Learning (MTL) [Diagram: inputs x1..xn feed a common feature layer (common internal representation), which feeds task-specific representations and outputs f1(x), f2(x), ..., fk(x).] • Multiple hypotheses develop in parallel within one back-propagation network [Caruana, Baxter 93-95] • An inductive bias occurs through shared use of a common internal representation • Knowledge or inductive transfer to the primary task f1(x) depends on the choice of secondary tasks
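The shared-representation idea above can be made concrete with a small sketch. The following is a hypothetical numpy forward pass (layer sizes are illustrative, not taken from the slides): one shared hidden layer feeds k task-specific output nodes.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n_inputs, n_hidden, n_tasks = 10, 20, 3
    rng = np.random.default_rng(0)
    W_shared = rng.normal(scale=0.1, size=(n_inputs, n_hidden))  # common internal representation
    W_tasks = rng.normal(scale=0.1, size=(n_hidden, n_tasks))    # one output column per task f1..fk

    def mtl_forward(x):
        h = sigmoid(x @ W_shared)      # shared hidden layer (source of the inductive bias)
        return sigmoid(h @ W_tasks)    # k parallel task outputs for the same input x

    print(mtl_forward(rng.random(n_inputs)))   # array of k predictions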

  7. Single Task Learning (STL) [Diagram: inputs x1..xn feed a network with a single output y = f(x).]

  8. Multiple Task Learning (MTL) [Caruana, Baxter] [Diagram: inputs x1..xn feed a common feature layer (common internal representation), then task-specific representations and outputs f1(x), f2(x), ..., fk(x).]

  9. Consolidation and Transfer via MTL & Task Rehearsal [Diagram: a short-term learning network (inputs I0..In, outputs f1(x), y2, y3, y5) supplies virtual examples of T0 for long-term consolidation to the long-term consolidated domain knowledge network (inputs I0..In, outputs f1(x), y2..y6); rehearsal of virtual examples for y1-y5 ensures knowledge retention, and virtual examples from related task sessions provide knowledge transfer.] • Lots of internal representation • Rich set of virtual training examples • Small learning rate = slow learning • Validation set to prevent growth of high-magnitude weights [Poirier04]
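As a rough illustration of the rehearsal idea, the sketch below generates virtual examples by passing inputs through an already-learned hypothesis and keeping its outputs as targets; the stand-in model and sizes are assumptions, not the system described in [Poirier04].

    import numpy as np

    def make_virtual_examples(model, n_examples, n_inputs, rng):
        # (x, model(x)) pairs let a previously learned task be rehearsed later
        X = rng.random((n_examples, n_inputs))
        y = np.array([model(x) for x in X])
        return X, y

    rng = np.random.default_rng(1)
    old_task = lambda x: float(x[0] > 0.5)          # stand-in for a learned hypothesis
    X_virtual, y_virtual = make_virtual_examples(old_task, 50, 10, rng)
    # During consolidation these virtual examples are mixed with the new task's
    # training data so that knowledge of the old task is retained.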

  10. Research Software

  11. Lifelong Learning with MTL [Charts: mean percent misclassification on the Band domain, the Logic domain, and the Coronary Artery Disease domain, with results labelled A, B, C, and D.]

  12. MTL – A Recent Example: stream flow rate prediction [Lisa Gaudette, 2006], where x = weather data and f(x) = flow rate.

  13. Limitations of MTL for ML3 [Diagram: the MTL network of inputs x1..xn, common feature layer (common internal representation), task-specific representations, and outputs f1(x), f2(x), ..., fk(x) [Caruana, Baxter].] • Problems with multiple outputs: • Training examples must have matching target values • Redundant representation • Frustrates practice of a task • Prevents a fluid development of domain knowledge • No way to naturally associate examples with tasks • Inductive transfer limited to sharing of hidden node weights • Inductive transfer relies on selecting related secondary tasks

  14. Context Sensitive MTL (csMTL) [Diagram: context inputs c1..ck and primary inputs x1..xn feed a network with one output for all tasks, y' = f'(c, x).] • Recently developed an alternative approach that is meant to overcome these limitations: • Uses a single-output neural network structure • Context inputs associate an example with a task • All weights are shared; focus shifts from learning separate tasks to learning a domain of tasks • No measure of task relatedness is required
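A minimal sketch of the csMTL encoding, assuming the task identity is given as a one-hot context vector concatenated with the primary inputs (the 2 primary / 7 context / 30 hidden sizes follow the Band domain figures quoted later; the weights here are random placeholders):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n_primary, n_context, n_hidden = 2, 7, 30
    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.1, size=(n_primary + n_context, n_hidden))  # all weights are shared
    W2 = rng.normal(scale=0.1, size=(n_hidden, 1))

    def csmtl_forward(x, task_id):
        c = np.zeros(n_context)
        c[task_id] = 1.0                             # context inputs associate the example with a task
        cx = np.concatenate([c, x])
        return sigmoid(sigmoid(cx @ W1) @ W2)[0]     # single output y' = f'(c, x) for all tasks

    print(csmtl_forward(rng.random(n_primary), task_id=3))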

  15. Context Sensitive MTL (csMTL) [Diagram: context inputs c1..ck and primary inputs x1..xn feed a network with one output for all tasks, y' = f'(c, x).]

  16. Context Sensitive MTL (csMTL) [Diagram: context inputs c1..ck and primary inputs x1..xn feed a network with one output for all tasks, y' = f'(c, x).] Recently, we have shown that csMTL has two important constraints: • Context and bias weights • Context and output weights As a result, VC(csMTL) < VC(MTL)

  17. csMTL Empirical Studies: Task Domains [Diagram: tasks T0-T6 of the Band domain, each defined by a band of positive examples.] • Band: 7 tasks, 2 primary inputs • Logic: T0 = (x1 > 0.5 ∧ x2 > 0.5) ∨ (x3 > 0.5 ∧ x4 > 0.5); 6 tasks, 10 primary inputs • fMRI: 2 tasks, 24 primary inputs
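To make the Logic domain concrete, here is a small sketch that generates labelled examples for the primary task T0 as written above (the uniform sampling of the 10 primary inputs is an assumption):

    import numpy as np

    def logic_t0(x):
        # T0 = (x1 > 0.5 and x2 > 0.5) or (x3 > 0.5 and x4 > 0.5)
        return float((x[0] > 0.5 and x[1] > 0.5) or (x[2] > 0.5 and x[3] > 0.5))

    rng = np.random.default_rng(0)
    X = rng.random((20, 10))                     # 20 examples, 10 primary inputs
    y = np.array([logic_t0(x) for x in X])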

  18. csMTL Empirical Studies: Method Objective: to compare hypotheses developed by csMTL to STL, MTL, and ηMTL • Three-layer networks used for all methods: • Example: Band domain • STL: 2 inputs, 30 hidden nodes, 1 output (a 2-30-1 architecture) • MTL and ηMTL: 2-30-7 architecture • csMTL: 9-30-1 (2 primary, 7 context inputs) • Number of training examples for main/secondary tasks: Band – 10/50, Logic – 20/50, fMRI – 48/48 • Tuning set is used to prevent over-fitting: Band – 10, Logic – 15, fMRI – 8

  19. csMTL Empirical Studies: Results

  20. csMTL Empirical Studies: Results (Logic domain)

  21. csMTL Empirical Studies: Results (2 more domains)

  22. Why is csMTL doing so well? [Diagram: the csMTL network with output y', context inputs c1..ck, and primary inputs x1..xn.] • Consider two unrelated tasks: • From a task relatedness perspective, correlation or mutual information over all examples is 0 • From an example-by-example perspective, 50% of examples have matching target values • csMTL transfers knowledge at the example level • Greater sharing of representation
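The distinction between task-level and example-level relatedness can be checked numerically; the toy sketch below uses two independent random binary tasks (an assumption chosen to mirror the "unrelated tasks" case on the slide):

    import numpy as np

    rng = np.random.default_rng(0)
    y1 = rng.integers(0, 2, size=10000)          # targets for task 1
    y2 = rng.integers(0, 2, size=10000)          # targets for an unrelated task 2
    print(np.corrcoef(y1, y2)[0, 1])             # near 0: unrelated at the task level
    print(np.mean(y1 == y2))                     # near 0.5: half the examples still match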

  23. csMTL Results – Same Task • Learn the primary task with transfer from 5 secondary tasks • 20 training examples per task, all examples drawn from the same function [Diagram: the MTL network with inputs x1..x10 and outputs f1(x), f2(x), ..., f5(x), beside the csMTL network with context inputs c1..c5, inputs x1..x10, and single output f'(c, x).]

  24. csMTL Results – Same Task • Learn the primary task with transfer from 5 secondary tasks • 20 training examples per task, all examples drawn from the same function

  25. Measure of Task Relatedness? [Diagram: the csMTL network with one output for all tasks y', context inputs c1..ck, and primary inputs x1..xn.] Early conjecture: context-to-hidden-node weight vectors can be used to measure task relatedness. Not true: two hypotheses for the same examples can develop that • have equivalent function • use different representation Transfer is functional in nature.

  26. Conclusions • csMTL is a method of inductive transfer using multiple tasks: • Single task output, additional context inputs • Shifts focus to learning a continuous domain of tasks • Eliminates redundant task representation (multiple outputs) • Empirical studies: • csMTL performs inductive transfer at or above the level of MTL • Without a measure of relatedness • A machine lifelong learning (ML3) system based on two csMTL networks is also proposed in the paper

  27. Future Work • Relationship between theory of Hints [Abu-Mostafa], secondary tasks (inductive bias, VCD) • Conditions under which csMTL ANNs succeed / fail • Exploring domains with real-valued context inputs • Will csMTL work with other ML methods? • Develop and test csMTL ML3 system

  28. csMTL Using IDT (Logic Domain)

  29. csMTL Using kNN (Logic Domain, k=5)
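One way the csMTL encoding might be combined with kNN: the sketch below simply appends the one-hot context vector to each feature vector and runs a plain nearest-neighbour vote with k=5 (the random data and encoding details are placeholders, not the experiment on this slide).

    import numpy as np

    def encode(x, task_id, n_tasks):
        c = np.zeros(n_tasks)
        c[task_id] = 1.0
        return np.concatenate([c, x])            # context inputs + primary inputs

    def knn_predict(X_train, y_train, query, k=5):
        d = np.linalg.norm(X_train - query, axis=1)
        nearest = np.argsort(d)[:k]
        return float(np.mean(y_train[nearest]) >= 0.5)

    rng = np.random.default_rng(0)
    n_tasks, n_primary = 6, 10
    X_train = np.stack([encode(rng.random(n_primary), t % n_tasks, n_tasks) for t in range(60)])
    y_train = rng.integers(0, 2, size=60).astype(float)
    print(knn_predict(X_train, y_train, encode(rng.random(n_primary), 0, n_tasks)))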

  30. An ML3 based on csMTL [Diagram: a short-term learning network with output f1(c,x) sits above a long-term consolidated domain knowledge network with one output for all tasks, f'(c,x), and inputs c1..ck (task context) and x1..xn (standard inputs); representational transfer flows from the CDK network for rapid learning, and functional transfer (virtual examples) flows back for slow consolidation.]
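Below is a toy, runnable sketch of the two-network data flow described on this slide, with a tiny logistic unit standing in for each network; the training details and random rehearsal inputs are assumptions, not the proposed implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in = 9                                        # context + primary inputs, as in csMTL

    class TinyNet:
        def __init__(self, w=None):
            self.w = np.zeros(n_in) if w is None else w.copy()
        def __call__(self, x):
            return 1.0 / (1.0 + np.exp(-(x @ self.w)))
        def train(self, X, y, lr=0.1, epochs=200):
            for _ in range(epochs):
                self.w += lr * (y - np.array([self(x) for x in X])) @ X

    long_term = TinyNet()                           # consolidated domain knowledge (CDK) network
    X_new, y_new = rng.random((20, n_in)), rng.integers(0, 2, 20)

    short_term = TinyNet(long_term.w)               # representational transfer from the CDK net
    short_term.train(X_new, y_new)                  # rapid short-term learning of the new task

    X_virtual = rng.random((50, n_in))              # virtual examples for functional transfer
    y_virtual = np.array([short_term(x) for x in X_virtual])
    X_rehearse = rng.random((50, n_in))             # rehearsal of tasks already in the CDK net
    y_rehearse = np.array([long_term(x) for x in X_rehearse])
    long_term.train(np.vstack([X_virtual, X_rehearse]),
                    np.concatenate([y_virtual, y_rehearse]))   # slow consolidation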

  31. Benefits of csMTL ML3: • Long-term Consolidation … • Effective retention (all tasks in DK net improve) • Efficient retention (redundancy eliminated) • Meta-knowledge collection (context cues) • Short-term Learning … • Effective learning (inductive transfer) • Efficient learning (representation + function) • Transfer / training examples used appropriately

  32. Limitations of csMTL ML3 • Consolidation is space and time complex: • Rehearsal of all tasks means lots of virtual training examples required • Back-propagation of error computational complexity = O(W^3), where W = # of weights

  33. Thank You! • danny.silver@acadiau.ca • http://plato.acadiau.ca/courses/comp/dsilver/ • http://birdcage.acadiau.ca:8080/ml3/
