
Inductive Transfer Retrospective & Review

Inductive Transfer Retrospective & Review. Rich Caruana Computer Science Department Cornell University. Inductive Transfer: a.k.a. …. Bias Learning Multitask learning Learning (Internal) Representations Learning-to-learn Lifelong learning Continual learning Speedup learning Hints


Presentation Transcript


  1. Inductive Transfer Retrospective & Review Rich Caruana Computer Science Department Cornell University

  2. Inductive Transfer: a.k.a. … • Bias Learning • Multitask learning • Learning (Internal) Representations • Learning-to-learn • Lifelong learning • Continual learning • Speedup learning • Hints • Hierarchical Bayes • …

  3. Rich Sutton [1994] Constructive Induction Workshop: “Everyone knows that good representations are key to 99% of good learning performance. Why then has constructive induction, the science of finding good representations, been able to make only incremental improvements in performance? People can learn amazingly fast because they bring good representations to the problem, representations they learned on previous problems. For people, then, constructive induction does make a large difference in performance. … The standard machine learning methodology is to consider a single concept to be learned. That itself is the crux of the problem… This is not the way to study constructive induction! … The standard one-concept learning task will never do this for us and must be abandoned. Instead we should look to natural learning systems, such as people, to get a better sense of the real task facing them. When we do this, I think we find the key difference that, for all practical purposes, people face not one task, but a series of tasks. The different tasks have different solutions, but they often share the same useful representations. … If you can come to the nth task with an excellent representation learned from the preceding n-1 tasks, then you can learn dramatically faster than a system that does not use constructive induction. A system without constructive induction will learn no faster on the nth task than on the 1st. …”

  4. Transfer through the Ages • 1986: Sejnowski & Rosenberg – NETtalk • 1990: Dietterich, Hild, Bakiri – ID3 vs. NETtalk • 1990: Suddarth, Kergosien, & Holden – rule injection (ANNs) • 1990: Abu-Mostafa – hints (ANNs) • 1991: Dean Pomerleau – ALVINN output representation (ANNs) • 1991: Lorien Pratt – speedup learning (ANNs) • 1992: Sharkey & Sharkey – speedup learning (ANNs) • 1992: Mark Ring – continual learning • 1993: Rich Caruana – MTL (ANNs, KNN, DT) • 1993: Thrun & Mitchell – EBNN • 1994: Virginia de Sa – minimizing disagreement • 1994: Jonathan Baxter – representation learning (and theory) • 1994: Thrun & Mitchell – learning one more thing • 1994: J. Schmidhuber – learning how to learn learning strategies

  5. • 1994: Dietterich & Bakiri – ECOC outputs • 1995: Breiman & Friedman – Curds & Whey • 1995: Sebastian Thrun – LLL (learning-to-learn, lifelong-learning) • 1996: Danny Silver – parallel transfer (ANNs) • 1996: O’Sullivan & Thrun – task clustering (KNN) • 1996: Caruana & de Sa – inputs better as outputs (ANNs) • 1997: Munro & Parmanto – committee machines (ANNs) • 1998: Blum & Mitchell – co-training • 2002: Ben-David, Gehrke, Schuller – theoretical framework • 2003: Bakker & Heskes – Bayesian MTL (and task clustering) • 2004: Tony Jebara – MTL in SVMs (feature and kernel selection) • 2004: Pontil & Micchelli – Kernels for MTL • 2004: Lawrence & Platt – MTL in GP (info vector machine) • 2005: Yu, Tresp, Schwaighofer – MTL in GP • 2005: Liao & Carin – MTL for RBF Networks

  6. A Quick Romp Through Some Stuff

  7. 1 Task vs. 2 Tasks vs. 4 Tasks

  8. STL vs. MTL Learning Curves courtesy Joseph O’Sullivan

  9. STL vs. MTL Learning Curves

  10. A Different Kind of Learning Curve

  11. MTL for Bayes Net Structure Learning [Figure: three Bayes nets over nodes A, B, C, D, E, labeled Yeast 1, Yeast 2, Yeast 3] • Bayes Nets for these three species overlap significantly • Learn structures from data for each species separately? No. • Learn one structure for all three species? No. • Bias learning to favor shared structure while allowing some differences? Yes -- makes most of limited data.

  12. When to Use Inductive Transfer? • multiple tasks occur naturally • using future to predict present • time series • decomposable tasks • multiple error metrics • focus of attention • different data distributions for same/similar problems • hierarchical tasks • some input features work better as outputs • …

  13. Multiple Tasks Occur Naturally • Mitchell’s Calendar Apprentice (CAP) • time-of-day (9:00am, 9:30am, ...) • day-of-week (M, T, W, ...) • duration (30min, 60min, ...) • location (Tom’s office, Dean’s office, 5409, ...)

  14. Using Future to Predict Present • medical domains • autonomous vehicles and robots • time series • stock market • economic forecasting • weather prediction • spatial series • many more

  15. Decomposable Tasks • DireOutcome = ICU ∨ Complication ∨ Death [Figure: network diagram, labeled INPUTS]
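
The decomposition on this slide can be sketched as follows. This is a minimal illustration, assuming each training case carries three binary labels; the field names `icu`, `complication`, and `death` are hypothetical. The main target is the disjunction of the three, and the components can be kept as extra outputs next to it.

```python
from typing import Dict, List


def build_targets(case: Dict[str, int]) -> List[int]:
    """Return [DireOutcome, ICU, Complication, Death] for one case.

    Field names are hypothetical; the slide only gives the formula
    DireOutcome = ICU OR Complication OR Death.
    """
    icu, comp, death = case["icu"], case["complication"], case["death"]
    dire = int(bool(icu or comp or death))  # main task: logical OR of the parts
    return [dire, icu, comp, death]         # sub-tasks become extra outputs


# Toy cases, made up purely to exercise the function.
cases = [
    {"icu": 0, "complication": 0, "death": 0},
    {"icu": 1, "complication": 0, "death": 0},
    {"icu": 0, "complication": 1, "death": 1},
]
targets = [build_targets(c) for c in cases]
```

Each row of `targets` could then serve as the multi-output training signal for one case.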

  16. Focus of Attention Single-Task ALVINN Multi-Task ALVINN

  17. Different Data Distributions • Hospital 1: 50 cases, rural (Ithaca) • Hospital 2: 500 cases, mature urban (Des Moines) • Hospital 3: 1000 cases, elderly suburbs (Florida) • Hospital 4: 5000 cases, young urban (LA,SF)

  18. Some Inputs are Better as Outputs

  19. And many more uses of Xfer…

  20. A Few Issues That Arise With Xfer

  21. Issue #1: Interference

  22. Issue #1: Interference

  23. Issue #2: Task Selection/Weighting • Analogous to feature selection • Correlation between tasks • heuristic works well in practice • very suboptimal • Wrapper-based methods • expensive • benefit from single tasks can be too small to detect reliably • does not examine tasks in sets • Task weighting: MTL ≠ one model for all tasks • main task vs. all tasks • even harder than task selection • but yields best results
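
Task weighting can be read as scaling each task's error term before summing. The sketch below works under that assumption; the squared-error loss, the function name, and the weight values are illustrative and not taken from the slides.

```python
from typing import Sequence


def weighted_mtl_loss(
    preds: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Weighted sum of per-task mean squared errors.

    preds[k] and targets[k] hold the predictions and labels for task k;
    weights[k] says how much task k contributes to the total loss.
    """
    total = 0.0
    for task_preds, task_targets, w in zip(preds, targets, weights):
        sq_errors = [(p - t) ** 2 for p, t in zip(task_preds, task_targets)]
        total += w * sum(sq_errors) / len(sq_errors)
    return total


# Task 0 plays the main task (weight 1.0); task 1 is an extra task (weight 0.5).
preds = [[1.0, 0.0], [0.5, 0.5]]
targets = [[1.0, 1.0], [0.5, 1.5]]
loss = weighted_mtl_loss(preds, targets, weights=[1.0, 0.5])
```

Setting an extra task's weight to 0 removes it from the loss, so task selection is the special case where every weight is 0 or 1.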

  24. Issue #3: Parallel vs. Serial Transfer • Where possible, use parallel transfer • All info about a task is in the training set, not necessarily a model trained on that train set • Information useful to other tasks can be lost training one task at a time • Tasks often benefit each other mutually • When serial is necessary, implement via parallel task rehearsal • Storing all experience not always feasible
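
One way to read "serial via parallel task rehearsal" is to let models from earlier tasks label the new task's inputs, so the earlier tasks can be trained in parallel with the new one without storing the old training data. The sketch below assumes that reading; the function names and the stand-in model are hypothetical, and this is not necessarily the method behind the slide.

```python
from typing import Callable, List, Sequence, Tuple

Input = Sequence[float]
Model = Callable[[Input], float]


def add_rehearsal_targets(
    new_inputs: Sequence[Input],
    new_targets: Sequence[float],
    old_models: Sequence[Model],
) -> List[Tuple[Input, List[float]]]:
    """Pair each input with [new-task target, old-task pseudo-targets...]."""
    rows = []
    for x, y in zip(new_inputs, new_targets):
        pseudo = [model(x) for model in old_models]  # old models label new inputs
        rows.append((x, [y] + pseudo))
    return rows


def old_task_model(x: Input) -> float:
    """Hypothetical stand-in for a model trained earlier on another task."""
    return float(x[0] > 0.5)


inputs = [(0.2, 1.0), (0.9, 0.0)]
new_targets = [1.0, 0.0]
training_rows = add_rehearsal_targets(inputs, new_targets, [old_task_model])
```

A multi-output learner trained on `training_rows` sees the new task and the rehearsed old task side by side.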

  25. Issue #4: Psychological Plausibility ?

  26. Issue #5: Xfer vs. Hierarchical Bayes • Is Xfer just regularization/smoothing? • Yes and No • Yes: • Similar models for different problem instances, e.g., similar stocks, data distributions, … • No: • Focus of attention • Task selection/clustering/rehearsal

  27. Issue #6: What does Related Mean? • related ⇏ helps learning (e.g., copy task)

  28. Issue #6: What does Related Mean? • related ⇏ helps learning (e.g., copy task) • helps learning ⇏ related (e.g., noise task)

  29. Issue #6: What does Related Mean? • related ⇏ helps learning (e.g., copy task) • helps learning ⇏ related (e.g., noise task) • related ⇏ correlated (e.g., A+B, A-B)
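
The A+B, A-B example can be checked numerically. This is a minimal sketch, assuming A and B are independent fair bits (the slide does not specify a distribution): both tasks are functions of the same two inputs, yet their Pearson correlation is zero.

```python
from itertools import product
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Enumerate all four equally likely (A, B) pairs of bits.
pairs = list(product([0, 1], repeat=2))
task_sum = [a + b for a, b in pairs]   # task 1: A + B
task_diff = [a - b for a, b in pairs]  # task 2: A - B
r = pearson(task_sum, task_diff)
```

More generally, Cov(A+B, A-B) = Var(A) - Var(B), which vanishes whenever A and B have equal variance. A selection heuristic based only on correlation would therefore skip this pair of tasks.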

  30. Why Doesn’t Xfer Rule the Earth? • Tabula rasa learning surprisingly effective • the UCI problem

  31. Use Some Features as Outputs

  32. Why Doesn’t Xfer Rule the Earth? • Xfer opportunities abound in real problems • Somewhat easier with ANNs (and Bayes nets) • Death is in the details • Xfer often hurts more than it helps if not careful • Some important tricks counterintuitive • don’t share too much • give tasks breathing room • focus on one task at a time • Tabula rasa learning surprisingly effective • the UCI problem

  33. What Needs to be Done? • Have algs for ANN, KNN, DT, SVM, GP, BN, … • Better prescription of where to use Xfer • Public data sets • Comparison of Methods • Inductive Transfer Competition? • Task selection, task weighting, task clustering • Explicit (TC) vs. Implicit (backprop) Xfer • Theory/definition of task relatedness

  34. Kinds of Transfer • Human Expertise • Constraints • Hints (monotonicity, smoothness, …) • Parallel • Multitask Learning • Serial • Learning-To-Learn • Serial via parallel (rehearsal)

  35. Motivating Example • 4 tasks defined on eight bits B1-B8: • all tasks ignore input bits B7-B8

  36. Goals of MTL • improve predictive accuracy • not intelligibility • not learning speed • exploit “background” knowledge • applicable to many learning methods • exploit strength of current learning methods: • surprisingly good tabula rasa performance

  37. Problem 2: 1D-Doors • color camera on Xavier robot • main tasks: doorknob location and door type • 8 extra tasks (training signals collected by mouse): • doorway width • location of doorway center • location of left jamb, right jamb • location of left and right edges of door

  38. Predicting Pneumonia Risk [Figure: labels Pneumonia Risk; Pre-Hospital Attributes: Age, Gender, Chest X-Ray, Blood Pressure; In-Hospital Attributes: Albumin, Blood pO2, RBC Count, White Count]

  39. Predicting Pneumonia Risk [Figure: labels Pneumonia Risk; Pre-Hospital Attributes: Age, Gender, Chest X-Ray, Blood Pressure; In-Hospital Attributes: RBC Count, Blood pO2, Albumin, White Count]

  40. Pneumonia #1: Medis

  41. Pneumonia #1: Results [Figure: values shown are -10.8%, -11.8%, -6.2%, -6.9%, -5.7%]

  42. Use imputed values for missing lab tests as extra inputs?

  43. Pneumonia #1: Feature Nets

  44. Pneumonia #2: Results MTL reduces error >10%

  45. Related? • Ideal: Func (MainTask, ExtraTask, Alg) = 1 iff Alg (MainTask || ExtraTask) > Alg (MainTask) • unrealistic • try all extra tasks (or all combinations)? • need heuristics to help us find potentially useful extra tasks to use for MTL: Related Tasks
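
The ideal relatedness function on this slide can be written down directly, which also shows why it is unrealistic: every call requires training. In this minimal sketch, `alg` is a hypothetical callable that trains on a list of tasks and returns main-task performance; `toy_alg` is a made-up stand-in so the code runs, not a real learner or a real result.

```python
from typing import Callable, List, Sequence

# An "algorithm" here maps a list of task names to main-task performance.
Alg = Callable[[Sequence[str]], float]


def related(main_task: str, extra_task: str, alg: Alg) -> int:
    """Return 1 iff training with the extra task beats training alone."""
    return int(alg([main_task, extra_task]) > alg([main_task]))


def useful_extras(main_task: str, candidates: Sequence[str], alg: Alg) -> List[str]:
    """Try each candidate extra task on its own (one extra training run each)."""
    return [t for t in candidates if related(main_task, t, alg)]


def toy_alg(tasks: Sequence[str]) -> float:
    """Placeholder scores, invented for illustration only."""
    return 0.5 + (0.1 if "helpful" in tasks else 0.0)


chosen = useful_extras("main", ["helpful", "unhelpful"], toy_alg)
```

With n candidate extra tasks, testing them one at a time takes n + 1 training runs, and testing every subset takes 2^n, which is why heuristics are needed.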

  46. Related? • related ⇏ helps learning (e.g., copy tasks)
