# Machine Learning: Making Computer Science Scientific


Thomas G. Dietterich
Department of Computer Science, Oregon State University, Corvallis, Oregon 97331
http://www.cs.orst.edu/~tgd

## Acknowledgements

- VLSI Wafer Testing: Tony Fountain
- Robot Navigation: Didac Busquets, Carles Sierra, Ramon Lopez de Mantaras
- NSF grants IIS-0083292 and ITR-085836

## Outline

- Three scenarios where standard software engineering methods fail
- Machine learning methods applied to these scenarios
- Fundamental questions in machine learning
- Statistical thinking in computer science

## Scenario 1: Reading Checks

Find and read the “courtesy amount” on checks.

## Possible Methods

- Method 1: Interview humans to find out what steps they follow in reading checks
- Method 2: Collect examples of checks and the correct amounts; train a machine learning system to recognize the amounts

## Scenario 2: VLSI Wafer Testing

Wafer test: a functional test of each die (chip) while it is still on the wafer.

## Which Chips (and How Many) Should Be Tested?

The tradeoff:

- Test all chips on the wafer? Avoids the cost of packaging bad chips, but incurs the cost of testing every chip
- Test none of the chips on the wafer? No on-wafer testing cost, but some bad chips may be packaged

## Possible Methods

- Method 1: Guess the right tradeoff point
- Method 2: Learn a probabilistic model that captures the probability that each chip will be bad, then plug this model into a Bayesian decision-making procedure to optimize expected profit

## Scenario 3: Allocating a Mobile Robot's Camera

The robot has a binocular camera and no GPS.

## Camera Tradeoff

- The mobile robot uses its camera both for obstacle avoidance and for landmark-based navigation
- If the camera is used only for navigation, the robot collides with objects
- If the camera is used only for obstacle avoidance, the robot gets lost

## Possible Methods

- Method 1: Manually write a program to allocate the camera
- Method 2: Experimentally learn a policy for switching between obstacle avoidance and landmark tracking

## Challenges for SE Methodology

Standard SE methods fail when:

- System requirements are hard to collect
- The system must resolve difficult tradeoffs

## (1) System Requirements Are Hard to Collect

- There are no human experts (cellular telephone fraud)
- Human experts are inarticulate (handwriting recognition)
- The requirements are changing rapidly (computer intrusion detection)
- Each user has different requirements (e-mail filtering)

## (2) The System Must Resolve Difficult Tradeoffs

- VLSI wafer testing: the tradeoff point depends on the probability of bad chips and the relative costs of testing versus packaging
- Camera allocation for the mobile robot: the tradeoff depends on the probability of obstacles and on the number and quality of landmarks

## Machine Learning: Replacing Guesswork with Data

- In all of these cases, the standard SE methodology requires engineers to make guesses: how to do character recognition, the tradeoff point for wafer test, the tradeoff for camera allocation
- Machine learning provides a way of making these decisions based on data

## Outline

- Three scenarios where software engineering methods fail
- Machine learning methods applied to these scenarios
- Fundamental questions in machine learning
- Statistical thinking in computer science
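As a concrete illustration of Method 2 from the wafer-testing scenario, the sketch below plugs a learned probability that a chip is bad into an expected-profit comparison. The sale price, packaging cost, and test cost are made-up illustrative numbers, not figures from the talk:

```python
def profit_if_untested(p_bad, price, package_cost):
    """Package without testing: pay packaging always; earn the sale
    price only if the chip turns out to be good."""
    return (1 - p_bad) * price - package_cost

def profit_if_tested(p_bad, price, package_cost, test_cost):
    """Test on-wafer first: pay the test always; package and sell
    only the chips that pass."""
    return -test_cost + (1 - p_bad) * (price - package_cost)

def should_test(p_bad, price=10.0, package_cost=2.0, test_cost=0.5):
    """Bayesian decision rule: test exactly when testing has the
    higher expected profit (equivalently, when p_bad * package_cost
    exceeds test_cost). All costs here are illustrative."""
    return (profit_if_tested(p_bad, price, package_cost, test_cost)
            > profit_if_untested(p_bad, price, package_cost))
```

With these numbers the break-even point is p_bad = test_cost / package_cost = 0.25, so a chip is worth testing exactly when the learned density model rates it riskier than that.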
## Basic Machine Learning Methods

- Supervised learning
- Density estimation
- Reinforcement learning

## Supervised Learning

[Figure: training examples (handwritten digits such as 1, 0, 6, 3, 8) feed a learning algorithm, which produces a classifier; the classifier labels new examples, e.g., reading a new image as “8”.]

## AT&T/NCR Check Reading System

- The recognition transformer is a neural network trained on 500,000 examples of characters
- The entire system is trained with entire checks as input and dollar amounts as output
- LeCun, Bottou, Bengio & Haffner (1998), Gradient-Based Learning Applied to Document Recognition

## Check Reader Performance

- 82% of machine-printed checks correctly recognized
- 1% of checks incorrectly recognized
- 17% “rejected”: the check is presented to a person for manual reading
- Fielded by NCR in June 1996; reads millions of checks per month

## Supervised Learning Summary

- The desired classifier is a function y = f(x)
- Training examples are desired input-output pairs (x_i, y_i)

## Density Estimation

[Figure: training examples and a partially-tested wafer feed a learning algorithm, which produces a density estimator that outputs, e.g., P(chip_i is bad) = 0.42.]

## On-Wafer Testing System

[Figure: a wafer W with chips labeled C1, C2, C3, ..., C209.]

- Trained the density estimator on 600 wafers from a mature product (HP; Corvallis, OR)
- The probability model is a “naïve Bayes” mixture model with four components, trained with EM

## One-Step Value of Information

Choose the larger of:

- The expected profit if we predict the remaining chips, package, and re-test
- The expected profit if we test chip C_i, then predict the remaining chips, package, and re-test (for all C_i not yet tested)

## On-Wafer Chip Test Results

3.8% increase in profit.

## Density Estimation Summary

- The desired output is a joint probability distribution P(C1, C2, ..., C203)
- Training examples are points X = (C1, C2, ..., C203) sampled from this distribution

## Reinforcement Learning

[Figure: an agent observes state s and reward r from the environment and responds with action a.]

- The agent's goal: choose actions to maximize total reward
- The action-selection rule is called a “policy”: a = π(s)

## Reinforcement Learning for Robot Navigation

- Learning from rewards and punishments in the environment
- Give a reward for reaching the goal
- Give a punishment for getting lost
- Give a punishment for collisions

## Experimental Results: % of Trials in Which the Robot Reaches the Goal

[Figure: results from Busquets, Lopez de Mantaras, Sierra, Dietterich (2002).]

## Reinforcement Learning Summary

- The desired output is an action-selection policy π
- Training examples are <s, a, r, s'> tuples collected by the agent interacting with the environment

## Outline

- Three scenarios where software engineering methods fail
- Machine learning methods applied to these scenarios
- Fundamental questions in machine learning
- Statistical thinking in computer science

## Fundamental Issues in Machine Learning

- Incorporating prior knowledge
- Incorporating learned structures into larger systems
- Making reinforcement learning practical
- The triple tradeoff: accuracy, sample size, hypothesis complexity

## Incorporating Prior Knowledge

- How can we incorporate our prior knowledge into the learning algorithm?
- Difficult for decision trees, neural networks, support-vector machines, etc.
- There is a mismatch between the form of our knowledge and the way these algorithms work
- Easier for Bayesian networks: knowledge can be expressed as constraints on the network

## Incorporating Learned Structures into Larger Systems

- Success story: the digit recognizer incorporated into the check reader
- Challenges:
  - The larger system may make several coordinated decisions, but the learning system treated each decision as independent
  - The larger system may have a complex cost function, e.g., errors in the thousands place versus the cents place of $7,236.07

## Making Reinforcement Learning Practical

- Current reinforcement learning methods do not scale well to large problems
- We need robust reinforcement learning methodologies

## The Triple Tradeoff

- A fundamental relationship among the amount of training data, the size and complexity of the hypothesis space, and the accuracy of the learned hypothesis
- Explains many phenomena observed in machine learning systems

## Learning Algorithms

- A set of data points
- A class H of hypotheses
- An optimization problem: find the hypothesis h in H that best fits the data

[Figure: the learning algorithm maps training data to a hypothesis h drawn from the hypothesis space.]

## Triple Tradeoff: Amount of Data, Hypothesis Complexity, Accuracy

[Figure: accuracy versus hypothesis space complexity, with one curve for each training-set size N = 10, 100, 1000.]

## Triple Tradeoff (2)

[Figure: accuracy versus the number of training examples N, with one curve for each hypothesis space H1, H2, H3.]

## Intuition

- With only a small amount of data, we can only discriminate between a small number of different hypotheses
- As we get more data, we have more evidence, so we can consider more alternative hypotheses
- Complex hypotheses give a better fit to the data

## Fixed versus Variable-Sized Hypothesis Spaces

- Fixed size: ordinary linear regression, Bayes nets with fixed structure, neural networks
- Variable size: decision trees, Bayes nets with variable structure, support vector machines

## Corollary 1: A Fixed H Will Underfit

[Figure: accuracy versus the number of training examples N for H1 and H2; the simpler space levels off below the other as N grows (underfitting).]

## Corollary 2: A Variable-Sized H Will Overfit

[Figure: accuracy versus hypothesis space complexity at N = 100; accuracy falls off once complexity grows past the optimum (overfitting).]

## Ideal Learning Algorithm: Adapt Complexity to the Data

[Figure: accuracy versus hypothesis space complexity for N = 10, 100, 1000; the ideal algorithm picks the complexity at the peak of each curve.]

## Adapting Hypothesis Complexity to Data Complexity

- Find the hypothesis h that minimizes error(h) + λ · complexity(h)
- Many methods exist for adjusting λ: cross-validation, MDL

## Outline

- Three scenarios where software engineering methods fail
- Machine learning methods applied to these scenarios
- Fundamental questions in machine learning
- Statistical thinking in computer science

## The Data Explosion

- NASA data: 284 terabytes (as of August 1999)
  - Earth Observing System: 194 GB/day
  - Landsat 7: 150 GB/day
  - Hubble Space Telescope: 0.6 GB/day

http://spsosun.gsfc.nasa.gov/eosinfo/EOSDIS_Site/index.html

## The Data Explosion (2)

- Google indexes 2,073,418,204 web pages
- US Year 2000 Census: 62 terabytes of scanned images
- Walmart data warehouse: 7 (500?) terabytes
- Missouri Botanical Garden TROPICOS plant image database: 700 gigabytes

## Old Computer Science Conception of Data

Store → Retrieve

## New Computer Science Conception of Data

Problems → Store → Build Models → Solve Problems → Solutions
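The reinforcement-learning summary earlier says the training data are <s, a, r, s'> tuples and the desired output is a policy a = π(s). A minimal tabular Q-learning sketch on a made-up five-state corridor (an illustrative stand-in, not the robot navigation task from the talk) shows that loop:

```python
import random

N = 5                  # states 0..4; state 4 is the "goal"
ACTIONS = (-1, +1)     # step left / step right
alpha, gamma, eps = 0.5, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    """Made-up environment: move along the corridor; reward at the
    goal, a small step cost everywhere else."""
    s2 = min(max(s + a, 0), N - 1)
    return (1.0 if s2 == N - 1 else -0.01), s2

random.seed(1)
for _ in range(500):                       # training episodes
    s = 0
    while s != N - 1:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        r, s2 = step(s, a)
        # Q-learning update from the <s, a, r, s'> tuple
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

# Greedy policy a = pi(s) extracted from the learned Q-values
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N - 1)}
```

On this toy corridor the learned policy is "always step right," which is indeed the reward-maximizing behavior.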
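The "adapt complexity to data" idea — minimize error(h) + λ · complexity(h) and tune the balance on held-out data — can be sketched with polynomial regression, where the degree plays the role of hypothesis-space complexity. The synthetic data and the single held-out split (standing in for full cross-validation) are illustrative assumptions:

```python
import random

def polyfit(xs, ys, degree, lam=0.0):
    """Least-squares polynomial fit via the normal equations, with an
    optional ridge penalty lam playing the role of lambda."""
    n = degree + 1
    A = [[sum(x ** (i + j) for x in xs) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting, then back-substitution.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coefs = [0.0] * n
    for i in reversed(range(n)):
        coefs[i] = (b[i] - sum(A[i][j] * coefs[j]
                               for j in range(i + 1, n))) / A[i][i]
    return coefs

def mse(coefs, xs, ys):
    """Mean squared error of the fitted polynomial on (xs, ys)."""
    return sum((sum(c * x ** i for i, c in enumerate(coefs)) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

def truth(x):
    # Quadratic ground truth (an arbitrary illustrative choice).
    return 1.0 + 2.0 * x - 0.5 * x * x

random.seed(0)
tx = [i / 10 for i in range(20)]
ty = [truth(x) + random.gauss(0, 0.3) for x in tx]
vx = [i / 10 + 0.05 for i in range(20)]
vy = [truth(x) + random.gauss(0, 0.3) for x in vx]

# Model selection: try increasing complexity and keep the degree with
# the lowest held-out error (degree 0 underfits; high degrees overfit).
errors = {d: mse(polyfit(tx, ty, d), vx, vy) for d in range(6)}
best = min(errors, key=errors.get)
```

The held-out error curve traces exactly the triple tradeoff: it drops as the hypothesis space grows rich enough to capture the quadratic, then rises again once extra degrees only fit the noise.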