
### Machine Learning: Making Computer Science Scientific

Thomas G. Dietterich

Department of Computer Science

Oregon State University

Corvallis, Oregon 97331

http://www.cs.orst.edu/~tgd

Acknowledgements

- VLSI Wafer Testing
- Tony Fountain
- Robot Navigation
- Didac Busquets
- Carles Sierra
- Ramon Lopez de Mantaras
- NSF grants IIS-0083292 and ITR-085836

Outline

- Three scenarios where standard software engineering methods fail
- Machine learning methods applied to these scenarios
- Fundamental questions in machine learning
- Statistical thinking in computer science

Scenario 1: Reading Checks

Find and read the “courtesy amount” on checks.

Possible Methods:

- Method 1: Interview humans to find out what steps they follow in reading checks
- Method 2: Collect examples of checks and the correct amounts. Train a machine learning system to recognize the amounts

Scenario 2: VLSI Wafer Testing

- Wafer test: Functional test of each die (chip) while on the wafer

Which Chips (and how many) should be tested?

- Tradeoff:
- Test all chips on wafer?
- Avoid cost of packaging bad chips
- Incur cost of testing all chips
- Test none of the chips on the wafer?
- May package some bad chips
- No cost of testing on wafer

Possible Methods

- Method 1: Guess the right tradeoff point
- Method 2: Learn a probabilistic model that captures the probability that each chip will be bad
- Plug this model into a Bayesian decision making procedure to optimize expected profit

Scenario 3: Robot Camera Allocation

- Mobile robot uses camera both for obstacle avoidance and landmark-based navigation
- Tradeoff:
- If camera is used only for navigation, robot collides with objects
- If camera is used only for obstacle avoidance, robot gets lost

Possible Methods

- Method 1: Manually write a program to allocate the camera
- Method 2: Experimentally learn a policy for switching between obstacle avoidance and landmark tracking

Challenges for SE Methodology

- Standard SE methods fail when…
- System requirements are hard to collect
- The system must resolve difficult tradeoffs

(1) System requirements are hard to collect

- There are no human experts
- Cellular telephone fraud
- Human experts are inarticulate
- Handwriting recognition
- The requirements are changing rapidly
- Computer intrusion detection
- Each user has different requirements
- E-mail filtering

(2) The system must resolve difficult tradeoffs

- VLSI Wafer testing
- Tradeoff point depends on probability of bad chips, relative costs of testing versus packaging
- Camera Allocation for Mobile Robot
- Tradeoff depends on probability of obstacles, number and quality of landmarks

Machine Learning: Replacing guesswork with data

- In all of these cases, the standard SE methodology requires engineers to make guesses
- Guessing how to do character recognition
- Guessing the tradeoff point for wafer test
- Guessing the tradeoff for camera allocation
- Machine Learning provides a way of making these decisions based on data

Outline

- Three scenarios where software engineering methods fail
- Machine learning methods applied to these scenarios
- Fundamental questions in machine learning
- Statistical thinking in computer science

Basic Machine Learning Methods

- Supervised Learning
- Density Estimation
- Reinforcement Learning

AT&T/NCR Check Reading System

Recognition transformer is a neural network trained on 500,000 examples of characters

The entire system is then trained end-to-end, with whole checks as input and dollar amounts as output

LeCun, Bottou, Bengio & Haffner (1998) Gradient-Based Learning Applied to Document Recognition

Check Reader Performance

- 82% of machine-printed checks correctly recognized
- 1% of checks incorrectly recognized
- 17% “rejected” – check is presented to a person for manual reading
- Fielded by NCR in June 1996; reads millions of checks per month

Supervised Learning Summary

- Desired classifier is a function y = f(x)
- Training examples are desired input-output pairs (x_i, y_i)
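The setting can be sketched in a few lines of Python. A 1-nearest-neighbor classifier stands in here for the neural network the check reader actually uses, and the data and `train_1nn` helper are invented for illustration; only the shape of the problem, learning y = f(x) from (x_i, y_i) pairs, is taken from the slide.

```python
# Supervised learning in miniature: learn y = f(x) from (x_i, y_i) pairs.
# A 1-nearest-neighbor classifier stands in for the neural network used
# in the actual check reader; the setting is the same.

def train_1nn(examples):
    """'Training' for 1-NN is just storing the labeled examples."""
    def f(x):
        # Predict the label of the closest stored training input.
        def dist(ex):
            xi, _ = ex
            return sum((a - b) ** 2 for a, b in zip(xi, x))
        _, yi = min(examples, key=dist)
        return yi
    return f

# Toy data: 2-D points labeled by which cluster they came from.
training = [((0.0, 0.0), "zero"), ((0.1, 0.2), "zero"),
            ((5.0, 5.0), "five"), ((4.9, 5.2), "five")]
f = train_1nn(training)
print(f((0.05, 0.05)))  # a point near the "zero" cluster
print(f((5.1, 4.8)))    # a point near the "five" cluster
```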

Density Estimation

[Figure: training examples (fully tested wafers) feed a learning algorithm, which produces a density estimator; given a partially-tested wafer, the estimator outputs P(chip_i is bad) for each untested chip C1, C2, …, C209, e.g. P(chip_i is bad) = 0.42.]

On-Wafer Testing System

- Trained density estimator on 600 wafers from mature product (HP; Corvallis, OR)
- Probability model is a “naïve Bayes” mixture model with four components (trained with EM)
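As a sketch of the kind of model described above, here is EM for a mixture of independent Bernoullis (a “naïve Bayes” mixture) in plain Python. The two-component toy, the smoothing constant, and all helper names are assumptions for illustration; the fielded system used four components trained on 600 wafers.

```python
import random

# EM for a mixture of independent Bernoullis ("naive Bayes" mixture).
# Each wafer is a vector of 0/1 chip outcomes (1 = bad). Sketch with
# K = 2 components; the system on the slide used four.

def em_bernoulli_mixture(data, K=2, iters=50, seed=0):
    rng = random.Random(seed)
    D = len(data[0])
    pi = [1.0 / K] * K                           # mixing weights
    theta = [[rng.uniform(0.25, 0.75) for _ in range(D)] for _ in range(K)]
    for _ in range(iters):
        # E-step: responsibility of each component for each wafer.
        resp = []
        for x in data:
            w = []
            for k in range(K):
                p = pi[k]
                for j, xj in enumerate(x):
                    p *= theta[k][j] if xj else (1 - theta[k][j])
                w.append(p)
            z = sum(w)
            resp.append([wk / z for wk in w])
        # M-step: re-estimate weights and per-chip bad probabilities.
        for k in range(K):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            for j in range(D):
                s = sum(r[k] * x[j] for r, x in zip(resp, data))
                # Smooth lightly to keep probabilities off 0 and 1.
                theta[k][j] = (s + 1e-3) / (nk + 2e-3)
    return pi, theta

def prob_bad(x_partial, tested, j, pi, theta):
    """P(chip j is bad | outcomes of the chips tested so far)."""
    w = []
    for k in range(len(pi)):
        p = pi[k]
        for i in tested:
            p *= theta[k][i] if x_partial[i] else (1 - theta[k][i])
        w.append(p)
    z = sum(w)
    return sum((wk / z) * theta[k][j] for k, wk in enumerate(w))
```

After fitting on fully tested wafers, `prob_bad` predicts an untested chip's outcome from the partial test results, which is exactly the quantity the decision procedure on the next slide consumes.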

One-Step Value of Information

- Choose the larger of
- Expected profit if we predict remaining chips, package, and re-test
- Expected profit if we test chip Ci, then predict remaining chips, package, and re-test [for all Ci not yet tested]
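The one-step lookahead above can be sketched as follows. All costs and probabilities, and the `update` hook (which in the real system would come from the learned density model), are hypothetical illustration values, not figures from the deployed system.

```python
# One-step value of information for wafer test (sketch).
# All costs and probabilities below are hypothetical illustration values.

TEST_COST = 0.10      # cost of testing one chip on the wafer
PACKAGE_COST = 1.00   # cost of packaging one chip
CHIP_VALUE = 5.00     # revenue from a good packaged chip

def expected_profit_stop(p_bad):
    """Expected profit if we stop testing now: package every chip we
    predict to be good (p_bad < 0.5); bad ones waste the package cost."""
    profit = 0.0
    for p in p_bad:
        if p < 0.5:  # predict good, so package it
            profit += (1 - p) * CHIP_VALUE - PACKAGE_COST
    return profit

def expected_profit_test(p_bad, i, update):
    """Expected profit if we first test chip i, then stop.
    `update(p_bad, i, outcome)` returns revised probabilities."""
    p = p_bad[i]
    return (-TEST_COST
            + p * expected_profit_stop(update(p_bad, i, True))
            + (1 - p) * expected_profit_stop(update(p_bad, i, False)))

def choose_action(p_bad, untested, update):
    """Pick the larger of stopping now vs. testing one more chip."""
    best = ("stop", expected_profit_stop(p_bad))
    for i in untested:
        v = expected_profit_test(p_bad, i, update)
        if v > best[1]:
            best = (("test", i), v)
    return best
```

With a simple `update` that pins the tested chip's probability to 0 or 1, an uncertain chip (p = 0.42) is worth testing while a near-certain one (p = 0.05) is not, which is the tradeoff the slide describes.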

On-Wafer Chip Test Results

3.8% increase in profit

Density Estimation Summary

- Desired output is a joint probability distribution P(C1, C2, …, C203)
- Training examples are points X= (C1, C2, …, C203) sampled from this distribution

Reinforcement Learning

[Figure: the agent-environment loop; the agent observes state s and reward r from the environment and sends back action a.]

Agent’s goal: Choose actions to maximize total reward

The action selection rule is called a “policy”: a = π(s)

Reinforcement Learning for Robot Navigation

- Learning from rewards and punishments in the environment
- Give reward for reaching goal
- Give punishment for getting lost
- Give punishment for collisions

Experimental Results: % of trials in which the robot reaches the goal

Busquets, Lopez de Mantaras, Sierra, Dietterich (2002)

Reinforcement Learning Summary

- Desired output is an action selection policy π
- Training examples are <s,a,r,s’> tuples collected by the agent interacting with the environment
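The <s, a, r, s’> format can be made concrete with tabular Q-learning on a toy chain world. The environment, parameters, and reward scheme below are invented for illustration; the robot experiments used a far richer state space and learning method.

```python
import random
from collections import defaultdict

# Tabular Q-learning: learning a policy from <s, a, r, s'> tuples.
# Toy chain world: states 0..4, actions move left (-1) or right (+1),
# reward 1 for reaching state 4. Illustrates the data format only.

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        s = 0
        while s != 4:
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.choice([-1, +1])
            else:
                a = max([-1, +1], key=lambda a: Q[(s, a)])
            s2 = min(4, max(0, s + a))
            r = 1.0 if s2 == 4 else 0.0
            # Q-learning update from the tuple <s, a, r, s'>.
            best_next = max(Q[(s2, -1)], Q[(s2, +1)])
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    # The learned policy: pi(s) = argmax_a Q(s, a).
    return {s: max([-1, +1], key=lambda a: Q[(s, a)]) for s in range(4)}
```

After training, the greedy policy moves right in every state, the optimal behavior for this chain.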

Outline

- Three scenarios where software engineering methods fail
- Machine learning methods applied to these scenarios
- Fundamental questions in machine learning
- Statistical thinking in computer science

Fundamental Issues in Machine Learning

- Incorporating Prior Knowledge
- Incorporating Learned Structures into Larger Systems
- Making Reinforcement Learning Practical
- Triple Tradeoff: accuracy, sample size, hypothesis complexity

Incorporating Prior Knowledge

- How can we incorporate our prior knowledge into the learning algorithm?
- Difficult for decision trees, neural networks, support-vector machines, etc.
- Mismatch between form of our knowledge and the way the algorithms work
- Easier for Bayesian networks
- Express knowledge as constraints on the network

Incorporating Learned Structures into Larger Systems

- Success story: Digit recognizer incorporated into check reader
- Challenges:
- Larger system may make several coordinated decisions, but learning system treated each decision as independent
- Larger system may have complex cost function: Errors in thousands place versus the cents place: $7,236.07

Making Reinforcement Learning Practical

- Current reinforcement learning methods do not scale well to large problems
- Need robust reinforcement learning methodologies

The Triple Tradeoff

- Fundamental relationship between
- amount of training data
- size and complexity of hypothesis space
- accuracy of the learned hypothesis
- Explains many phenomena observed in machine learning systems

Learning Algorithms

- Set of data points
- Class H of hypotheses
- Optimization problem: Find the hypothesis h in H that best fits the data

[Figure: training data flows into the learning algorithm, which searches the hypothesis space H and outputs a hypothesis h.]

Triple Tradeoff

Amount of Data – Hypothesis Complexity – Accuracy

[Figure: accuracy as a function of hypothesis space complexity, plotted for N = 10, N = 100, and N = 1000 training examples; larger N moves the accuracy peak toward more complex hypothesis spaces.]

Intuition

- With only a small amount of data, we can only discriminate between a small number of different hypotheses
- As we get more data, we have more evidence, so we can consider more alternative hypotheses
- Complex hypotheses fit the data better, but without enough data they overfit

Fixed versus Variable-Sized Hypothesis Spaces

- Fixed size
- Ordinary linear regression
- Bayes net with fixed structure
- Neural networks
- Variable size
- Decision trees
- Bayes nets with variable structure
- Support vector machines

Ideal Learning Algorithm: Adapt complexity to data

[Figure: the same accuracy-vs-complexity curves for N = 10, 100, and 1000; an ideal algorithm tracks the peak of each curve, choosing more complex hypotheses as N grows.]

Adapting Hypothesis Complexity to Data Complexity

- Find hypothesis h to minimize

error(h) + λ · complexity(h)

- Many methods for adjusting λ
- Cross-validation
- MDL (minimum description length)
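A minimal sketch of this idea, using polynomial degree as the complexity knob and a holdout split as the cross-validation step (the data and the range of degrees are invented for illustration):

```python
import numpy as np

# Choosing hypothesis complexity (polynomial degree, standing in for
# lambda) by holdout validation: a sketch of tuning
# error(h) + lambda * complexity(h) via cross-validation.

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = x**2 + 0.1 * rng.normal(size=60)     # true function is quadratic
x_tr, y_tr = x[:30], y[:30]
x_va, y_va = x[30:], y[30:]

val_err = {}
for degree in range(9):                  # degree plays the role of complexity
    coefs = np.polyfit(x_tr, y_tr, degree)
    pred = np.polyval(coefs, x_va)
    val_err[degree] = float(np.mean((pred - y_va) ** 2))

best = min(val_err, key=val_err.get)     # complexity adapted to the data
```

The underfit constant model (degree 0) validates much worse than the quadratic, so the selected degree lands near the true complexity rather than at either extreme.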

Outline

- Three scenarios where software engineering methods fail
- Machine learning methods applied to these scenarios
- Fundamental questions in machine learning
- Statistical thinking in computer science

The Data Explosion

- NASA Data
- 284 Terabytes (as of August 1999)
- Earth Observing System: 194 GB/day
- Landsat 7: 150 GB/day
- Hubble Space Telescope: 0.6 GB/day

http://spsosun.gsfc.nasa.gov/eosinfo/EOSDIS_Site/index.html

The Data Explosion (2)

- Google indexes 2,073,418,204 web pages
- US Year 2000 Census: 62 Terabytes of scanned images
- Walmart Data Warehouse: 7 (500?) Terabytes
- Missouri Botanical Garden TROPICOS plant image database: 700 Gbytes

Machine Learning: Making Data Active

- Methods for building models from data
- Methods for collecting and/or sampling data
- Methods for evaluating and validating learned models
- Methods for reasoning and decision-making with learned models
- Theoretical analyses

Machine Learning and Computer Science

- Natural language processing
- Databases and data mining
- Computer architecture
- Compilers
- Computer graphics

Hardware Branch Prediction

Source: Jiménez & Lin (2000) Perceptron Learning for Predicting the Behavior of Conditional Branches
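The perceptron-predictor idea can be sketched in a few lines: one weight per bit of global branch history, predict “taken” when the weighted sum is non-negative, and train on mispredictions or low-confidence predictions. The history length, threshold, and toy branch pattern below are illustrative choices, not the paper's tuned values.

```python
# Perceptron branch predictor (sketch of the idea in Jimenez & Lin):
# weights over the global branch history predict the next outcome.
# Single branch, toy parameters.

HLEN = 8        # history length (illustrative)
THETA = 16      # training threshold (illustrative)

def run_predictor(outcomes):
    w = [0] * (HLEN + 1)            # w[0] is the bias weight
    hist = [1] * HLEN               # global history as +1/-1 values
    correct = []
    for taken in outcomes:
        t = 1 if taken else -1
        y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], hist))
        pred = 1 if y >= 0 else -1
        correct.append(pred == t)
        # Train if mispredicted or the confidence is below threshold.
        if pred != t or abs(y) <= THETA:
            w[0] += t
            for i in range(HLEN):
                w[i + 1] += t * hist[i]
        hist = [t] + hist[:-1]      # shift the new outcome into history
    return correct

# A loop branch that is taken 3 times, then falls through, repeatedly.
pattern = ([True] * 3 + [False]) * 100
acc = run_predictor(pattern)
```

Because the 4-long pattern is linearly separable over an 8-bit history, the predictor converges after a few periods and then predicts every branch correctly.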

Instruction Scheduler for New CPU

- The performance of modern microprocessors depends on the order in which instructions are executed
- Modern compilers rearrange instruction order to optimize performance (“instruction scheduling”)
- Each new CPU design requires modifying the instruction scheduler

Instruction Scheduling

- Moss et al. (1997): a machine-learning scheduler can beat the performance of commercial compilers and match the performance of a research compiler.
- Training examples: small basic blocks
- Experimentally determine optimal instruction order
- Learn preference function

Computer Graphics: Video Textures

- Generate new video by splicing together short stretches of old video

[Figure: original frame sequence A B C D E F; a new sequence such as B, D, E, D, E, F, A is synthesized by jumping between well-matched frames.]

Apply reinforcement learning to identify good transition points

Arno Schödl, Richard Szeliski, David H. Salesin, Irfan Essa (SIGGRAPH 2000)
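The transition-selection core can be sketched as follows: compute distances between frames and rank a jump from frame i to frame j by how closely frame i+1 matches frame j. Real video textures operate on images and additionally propagate future costs (the reinforcement-learning step); the tiny 1-D “frames” here are stand-ins.

```python
# Video textures (sketch): rank candidate transitions by frame distance.
# A jump from frame i to frame j looks smooth when frame i+1 resembles
# frame j. Frames are tiny synthetic vectors standing in for images.

def distance(f1, f2):
    return sum((a - b) ** 2 for a, b in zip(f1, f2)) ** 0.5

def best_transitions(frames, top=3):
    """Rank candidate jumps (i -> j) by D(i+1, j), smallest first."""
    jumps = []
    for i in range(len(frames) - 1):
        for j in range(len(frames)):
            if j != i + 1:                 # skip the normal successor
                jumps.append((distance(frames[i + 1], frames[j]), i, j))
    jumps.sort()
    return [(i, j) for _, i, j in jumps[:top]]

# A looping motion: frames move out and back, so the last frame
# resembles the first and a jump from the end to the start is smooth.
frames = [(0.0,), (1.0,), (2.0,), (1.0,), (0.1,)]
```

On this toy sequence the top-ranked jumps splice the oscillation back on itself, including the end-to-start transition that lets the clip loop forever.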

Video Textures: Arno Schödl, Richard Szeliski, David H. Salesin, Irfan Essa (SIGGRAPH 2000)


Learning to Predict Textures

Find p to minimize the Euclidean distance between A’(p) and B(q); then set B’(q) := A’(p).


Summary

- Standard Software Engineering methods fail in many application problems
- Machine Learning methods can replace guesswork with data to make good design decisions

Machine Learning and Computer Science

- Machine Learning is already at the heart of speech recognition and handwriting recognition
- Statistical methods are transforming natural language processing (understanding, translation, retrieval)
- Statistical methods are creating opportunities in databases, computer graphics, robotics, computer vision, networking, and computer security

Computer Power and Data Power

- Data is a new source of power for computer science
- Every computer science student should learn the fundamentals of machine learning and statistical thinking
- By combining engineered frameworks with models learned from data, we can develop the high-performance systems of the future
