Machine Learning: Making Computer Science Scientific

Presentation Transcript

  1. Machine Learning: Making Computer Science Scientific • Thomas G. Dietterich • Department of Computer Science, Oregon State University, Corvallis, Oregon 97331 • http://www.cs.orst.edu/~tgd

  2. Acknowledgements • VLSI Wafer Testing • Tony Fountain • Robot Navigation • Didac Busquets • Carles Sierra • Ramon Lopez de Mantaras • NSF grants IIS-0083292 and ITR-085836

  3. Outline • Three scenarios where standard software engineering methods fail • Machine learning methods applied to these scenarios • Fundamental questions in machine learning • Statistical thinking in computer science

  4. Scenario 1: Reading Checks • Find and read the “courtesy amount” on checks

  5. Possible Methods: • Method 1: Interview humans to find out what steps they follow in reading checks • Method 2: Collect examples of checks and the correct amounts. Train a machine learning system to recognize the amounts

  6. Scenario 2: VLSI Wafer Testing • Wafer test: Functional test of each die (chip) while on the wafer

  7. Which Chips (and how many) should be tested? • Tradeoff: • Test all chips on wafer? • Avoid cost of packaging bad chips • Incur cost of testing all chips • Test none of the chips on the wafer? • May package some bad chips • No cost of testing on wafer

  8. Possible Methods • Method 1: Guess the right tradeoff point • Method 2: Learn a probabilistic model that captures the probability that each chip will be bad • Plug this model into a Bayesian decision making procedure to optimize expected profit

  9. Scenario 3: Allocating a Mobile Robot Camera • Binocular camera • No GPS

  10. Camera tradeoff • Mobile robot uses camera both for obstacle avoidance and landmark-based navigation • Tradeoff: • If camera is used only for navigation, robot collides with objects • If camera is used only for obstacle avoidance, robot gets lost

  11. Possible Methods • Method 1: Manually write a program to allocate the camera • Method 2: Experimentally learn a policy for switching between obstacle avoidance and landmark tracking

  12. Challenges for SE Methodology • Standard SE methods fail when… • System requirements are hard to collect • The system must resolve difficult tradeoffs

  13. (1) System requirements are hard to collect • There are no human experts • Cellular telephone fraud • Human experts are inarticulate • Handwriting recognition • The requirements are changing rapidly • Computer intrusion detection • Each user has different requirements • E-mail filtering

  14. (2) The system must resolve difficult tradeoffs • VLSI Wafer testing • Tradeoff point depends on probability of bad chips, relative costs of testing versus packaging • Camera Allocation for Mobile Robot • Tradeoff depends on probability of obstacles, number and quality of landmarks

  15. Machine Learning: Replacing guesswork with data • In all of these cases, the standard SE methodology requires engineers to make guesses • Guessing how to do character recognition • Guessing the tradeoff point for wafer test • Guessing the tradeoff for camera allocation • Machine Learning provides a way of making these decisions based on data

  16. Outline • Three scenarios where software engineering methods fail • Machine learning methods applied to these scenarios • Fundamental questions in machine learning • Statistical thinking in computer science

  17. Basic Machine Learning Methods • Supervised Learning • Density Estimation • Reinforcement Learning

  18. Supervised Learning [Diagram: labeled training examples of handwritten digits (1, 0, 6, 3, 8) are fed to a learning algorithm, which outputs a classifier; the classifier labels new examples, e.g. “8”]

  19. AT&T/NCR Check Reading System • The recognition transformer is a neural network trained on 500,000 examples of characters • The entire system is trained with entire checks as input and dollar amounts as output • LeCun, Bottou, Bengio & Haffner (1998), Gradient-Based Learning Applied to Document Recognition

  20. Check Reader Performance • 82% of machine-printed checks correctly recognized • 1% of checks incorrectly recognized • 17% “rejected” – check is presented to a person for manual reading • Fielded by NCR in June 1996; reads millions of checks per month

  21. Supervised Learning Summary • Desired classifier is a function y = f(x) • Training examples are desired input-output pairs (xi,yi)
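As a concrete (if toy) illustration of learning y = f(x) from input–output pairs, here is a minimal 1-nearest-neighbor classifier; the 2-D “digit features” and labels are invented for the example and are not from the actual check reader:

```python
# Minimal supervised learning sketch: 1-nearest-neighbor classification.
# Training pairs (x_i, y_i) are stored; prediction copies the label of
# the closest stored example.

def train_1nn(examples):
    """'Training' for 1-NN just stores the labeled examples."""
    return list(examples)

def classify(model, x):
    """Predict the label of the stored example nearest to x."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = min(model, key=lambda ex: dist2(ex[0], x))
    return nearest[1]

# Hypothetical 2-D feature vectors for two digit classes.
training = [((0.0, 0.1), "0"), ((0.2, 0.0), "0"),
            ((0.9, 1.0), "8"), ((1.0, 0.8), "8")]
model = train_1nn(training)
print(classify(model, (0.1, 0.05)))   # near the "0" cluster
print(classify(model, (0.95, 0.9)))   # near the "8" cluster
```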

  22. Density Estimation [Diagram: training examples are fed to a learning algorithm, which outputs a density estimator; given a partially-tested wafer, the estimator reports, e.g., P(chip_i is bad) = 0.42]

  23. On-Wafer Testing System [Diagram: naïve Bayes model with a wafer node W and chip nodes C1, C2, C3, …, C209] • Trained density estimator on 600 wafers from a mature product (HP; Corvallis, OR) • Probability model is a “naïve Bayes” mixture model with four components (trained with EM)
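The EM-trained mixture named on this slide can be sketched in miniature. This toy version uses 2 components, 4 “chips” per wafer, and hand-built synthetic data (the fielded system used four components and wafers of roughly 200 chips); it illustrates EM for a Bernoulli mixture, not the production model:

```python
import random

def em_bernoulli_mixture(data, k=2, iters=50):
    """Fit a k-component Bernoulli mixture to binary vectors via EM.
    Each vector is a wafer; entry j is 1 if chip j tested bad."""
    d = len(data[0])
    random.seed(0)
    pi = [1.0 / k] * k                            # mixing weights
    theta = [[random.uniform(0.3, 0.7) for _ in range(d)]
             for _ in range(k)]                   # P(chip j bad | component c)
    for _ in range(iters):
        # E-step: responsibility of each component for each wafer
        resp = []
        for x in data:
            w = []
            for c in range(k):
                p = pi[c]
                for j in range(d):
                    p *= theta[c][j] if x[j] else 1 - theta[c][j]
                w.append(p)
            z = sum(w)
            resp.append([wi / z for wi in w])
        # M-step: re-estimate mixing weights and per-chip bad probabilities
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            pi[c] = n_c / len(data)
            theta[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / n_c
                        for j in range(d)]
    return pi, theta

# Synthetic "mostly-good" and "mostly-bad" wafers; EM should recover the
# two regimes as the two mixture components.
wafers = [[0, 0, 0, 0]] * 8 + [[1, 1, 1, 1]] * 8
pi, theta = em_bernoulli_mixture(wafers)
```

A fitted model like this is what supplies the P(chip_i is bad) values used by the decision procedure on the following slides.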

  24. One-Step Value of Information • Choose the larger of • Expected profit if we predict remaining chips, package, and re-test • Expected profit if we test chip Ci, then predict remaining chips, package, and re-test [for all Ci not yet tested]
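The one-step value-of-information comparison above can be sketched as follows. All the numbers (chip value V, packaging cost PKG, test cost TEST) are invented, and the `update` hook is a stand-in for conditioning the learned joint model on a test outcome:

```python
# Sketch of one-step value of information for on-wafer testing.
# Hypothetical economics: V = value of a packaged good chip,
# PKG = cost of packaging a chip, TEST = cost of one on-wafer test.
V, PKG, TEST = 10.0, 2.0, 0.5

def profit_package(p_bad):
    """Expected profit of packaging one chip without testing it."""
    return (1 - p_bad) * V - PKG

def profit_stop(p_bads):
    """Expected profit if we stop testing now: package every chip whose
    expected packaging profit is positive, discard the rest."""
    return sum(max(profit_package(p), 0.0) for p in p_bads)

def one_step_voi(p_bads, i, update):
    """Expected profit of testing chip i first. update(p_bads, i, bad)
    returns revised bad-probabilities given the outcome (a stand-in for
    conditioning the learned density model)."""
    p = p_bads[i]
    rest_bad = [q for j, q in enumerate(update(p_bads, i, True)) if j != i]
    rest_good = [q for j, q in enumerate(update(p_bads, i, False)) if j != i]
    # A chip that tests good is packaged for sure (profit V - PKG).
    return (-TEST
            + p * profit_stop(rest_bad)
            + (1 - p) * ((V - PKG) + profit_stop(rest_good)))
```

The decision rule then picks the larger of `profit_stop(p_bads)` and `max(one_step_voi(p_bads, i, update))` over the untested chips; with a trivial `update` that ignores the outcome (independent chips), testing can never pay for itself, so the gain comes entirely from the correlations the density model captures.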

  25. On-Wafer Chip Test Results 3.8% increase in profit

  26. Density Estimation Summary • Desired output is a joint probability distribution P(C1, C2, …, C203) • Training examples are points X= (C1, C2, …, C203) sampled from this distribution

  27. Reinforcement Learning [Diagram: the agent observes state s and reward r from the environment and sends back action a] • Agent’s goal: choose actions to maximize total reward • The action selection rule is called a “policy”: a = π(s)

  28. Reinforcement Learning for Robot Navigation • Learning from rewards and punishments in the environment • Give reward for reaching goal • Give punishment for getting lost • Give punishment for collisions

  29. Experimental Results [Chart: percentage of trials in which the robot reaches the goal] • Busquets, Lopez de Mantaras, Sierra, Dietterich (2002)

  30. Reinforcement Learning Summary • Desired output is an action selection policy π • Training examples are <s,a,r,s’> tuples collected by the agent interacting with the environment
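Learning a policy from <s,a,r,s'> tuples can be illustrated with tabular Q-learning, one standard reinforcement-learning algorithm (the robot work used its own method). The 5-state corridor environment below is invented for the sketch:

```python
import random

# Tiny corridor: states 0..4, agent starts at 0, goal at 4.
# Actions step left (-1) or right (+1); reward at the goal, small
# cost per move. Q-learning updates from each <s, a, r, s'> tuple.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    r = 1.0 if s2 == GOAL else -0.01
    return r, s2

random.seed(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2        # learning rate, discount, exploration
for _ in range(500):                      # episodes of interaction
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        r, s2 = step(s, a)
        # Q-learning update from the tuple <s, a, r, s'>
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

# The learned policy: the greedy action in each state.
policy = {s: max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES)}
```

After training, the greedy policy heads right toward the goal from every state, which is the analogue of the robot learning when to switch behaviors from reward and punishment alone.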

  31. Outline • Three scenarios where software engineering methods fail • Machine learning methods applied to these scenarios • Fundamental questions in machine learning • Statistical thinking in computer science

  32. Fundamental Issues in Machine Learning • Incorporating Prior Knowledge • Incorporating Learned Structures into Larger Systems • Making Reinforcement Learning Practical • Triple Tradeoff: accuracy, sample size, hypothesis complexity

  33. Incorporating Prior Knowledge • How can we incorporate our prior knowledge into the learning algorithm? • Difficult for decision trees, neural networks, support-vector machines, etc. • Mismatch between form of our knowledge and the way the algorithms work • Easier for Bayesian networks • Express knowledge as constraints on the network

  34. Incorporating Learned Structures into Larger Systems • Success story: Digit recognizer incorporated into check reader • Challenges: • Larger system may make several coordinated decisions, but learning system treated each decision as independent • Larger system may have complex cost function: Errors in thousands place versus the cents place: $7,236.07

  35. Making Reinforcement Learning Practical • Current reinforcement learning methods do not scale well to large problems • Need robust reinforcement learning methodologies

  36. The Triple Tradeoff • Fundamental relationship between • amount of training data • size and complexity of hypothesis space • accuracy of the learned hypothesis • Explains many phenomena observed in machine learning systems

  37. Learning Algorithms • Set of data points • Class H of hypotheses • Optimization problem: Find the hypothesis h in H that best fits the data [Diagram: training data fed to an optimizer that selects h from the hypothesis space]

  38. Triple Tradeoff: Amount of Data – Hypothesis Complexity – Accuracy [Plot: accuracy vs. hypothesis space complexity, one curve for each training set size N = 10, 100, 1000]

  39. Triple Tradeoff (2) [Plot: accuracy vs. number of training examples N, one curve for each hypothesis space H1, H2, H3 of increasing complexity]

  40. Intuition • With only a small amount of data, we can only discriminate between a small number of different hypotheses • As we get more data, we have more evidence, so we can consider more alternative hypotheses • Complex hypotheses give better fit to the data

  41. Fixed versus Variable-Sized Hypothesis Spaces • Fixed size • Ordinary linear regression • Bayes net with fixed structure • Neural networks • Variable size • Decision trees • Bayes nets with variable structure • Support vector machines

  42. Corollary 1: Fixed H will underfit [Plot: accuracy vs. number of training examples N; the more complex space H2 overtakes H1, which underfits as N grows]

  43. Corollary 2: Variable-sized H will overfit [Plot: accuracy vs. hypothesis space complexity at N = 100; past the optimum, accuracy falls off (overfitting)]

  44. Ideal Learning Algorithm: Adapt complexity to data [Plot: accuracy vs. hypothesis space complexity for N = 10, 100, 1000; the ideal algorithm tracks the peak of each curve]

  45. Adapting Hypothesis Complexity to Data Complexity • Find the hypothesis h that minimizes error(h) + λ·complexity(h) • Many methods for adjusting λ • Cross-validation • MDL (minimum description length)
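Choosing complexity by cross-validation can be sketched like this. The “complexity” knob here is the number of bins in a histogram regressor, and the noisy quadratic data set is synthetic; minimizing cross-validated error plays the role of tuning λ in error(h) + λ·complexity(h):

```python
import random

# Synthetic data: y = x^2 plus Gaussian noise, x uniform on [0, 1).
random.seed(0)
data = []
for _ in range(200):
    x = random.random()
    data.append((x, x * x + random.gauss(0, 0.05)))

def fit_histogram(train, bins):
    """Hypothesis: piecewise-constant function with `bins` pieces on [0,1)."""
    sums, counts = [0.0] * bins, [0] * bins
    for x, y in train:
        b = min(int(x * bins), bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(y for _, y in train) / len(train)   # fallback for empty bins
    return [sums[b] / counts[b] if counts[b] else overall for b in range(bins)]

def mse(model, test):
    bins = len(model)
    return sum((y - model[min(int(x * bins), bins - 1)]) ** 2
               for x, y in test) / len(test)

def cv_error(data, bins, folds=5):
    """Average held-out error of the `bins`-piece hypothesis over k folds."""
    n = len(data) // folds
    errs = []
    for f in range(folds):
        test = data[f * n:(f + 1) * n]
        train = data[:f * n] + data[(f + 1) * n:]
        errs.append(mse(fit_histogram(train, bins), test))
    return sum(errs) / folds

# Pick the complexity with the lowest cross-validated error.
best = min([1, 2, 4, 8, 16, 32, 64, 128], key=lambda b: cv_error(data, b))
```

Too few bins underfit and too many overfit, so the cross-validated optimum lands at an intermediate complexity, matching the peaked curves on the preceding slides.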

  46. Outline • Three scenarios where software engineering methods fail • Machine learning methods applied to these scenarios • Fundamental questions in machine learning • Statistical thinking in computer science

  47. The Data Explosion • NASA Data • 284 Terabytes (as of August, 1999) • Earth Observing System: 194 G/day • Landsat 7: 150 G/day • Hubble Space Telescope: 0.6 G/day http://spsosun.gsfc.nasa.gov/eosinfo/EOSDIS_Site/index.html

  48. The Data Explosion (2) • Google indexes 2,073,418,204 web pages • US Year 2000 Census: 62 Terabytes of scanned images • Walmart Data Warehouse: 7 (500?) Terabytes • Missouri Botanical Garden TROPICOS plant image database: 700 Gbytes

  49. Old Computer Science Conception of Data [Diagram: data is stored and retrieved]

  50. New Computer Science Conception of Data [Diagram: problems and data are stored, models are built from the data, and the models solve the problems to produce solutions]