Machine Learning: Making Computer Science Scientific

Thomas G. Dietterich

Department of Computer Science

Oregon State University

Corvallis, Oregon 97331

http://www.cs.orst.edu/~tgd

- VLSI Wafer Testing
- Tony Fountain

- Robot Navigation
- Didac Busquets
- Carles Sierra
- Ramon Lopez de Mantaras

- NSF grants IIS-0083292 and ITR-085836

- Three scenarios where standard software engineering methods fail
- Machine learning methods applied to these scenarios
- Fundamental questions in machine learning
- Statistical thinking in computer science

Find and read “courtesy amount” on checks:

- Method 1: Interview humans to find out what steps they follow in reading checks
- Method 2: Collect examples of checks and the correct amounts. Train a machine learning system to recognize the amounts

- Wafer test: Functional test of each die (chip) while on the wafer

- Tradeoff:
- Test all chips on wafer?
- Avoid cost of packaging bad chips
- Incur cost of testing all chips

- Test none of the chips on the wafer?
- May package some bad chips
- No cost of testing on wafer


- Method 1: Guess the right tradeoff point
- Method 2: Learn a probabilistic model that captures the probability that each chip will be bad
- Plug this model into a Bayesian decision making procedure to optimize expected profit

[Figure: mobile robot with binocular camera; no GPS]

- Mobile robot uses camera both for obstacle avoidance and landmark-based navigation
- Tradeoff:
- If camera is used only for navigation, robot collides with objects
- If camera is used only for obstacle avoidance, robot gets lost

- Method 1: Manually write a program to allocate the camera
- Method 2: Experimentally learn a policy for switching between obstacle avoidance and landmark tracking

- Standard SE methods fail when…
- System requirements are hard to collect
- The system must resolve difficult tradeoffs

- There are no human experts
- Cellular telephone fraud

- Human experts are inarticulate
- Handwriting recognition

- The requirements are changing rapidly
- Computer intrusion detection

- Each user has different requirements
- E-mail filtering

- VLSI Wafer testing
- Tradeoff point depends on probability of bad chips, relative costs of testing versus packaging

- Camera Allocation for Mobile Robot
- Tradeoff depends on probability of obstacles, number and quality of landmarks

- In all of these cases, the standard SE methodology requires engineers to make guesses
- Guessing how to do character recognition
- Guessing the tradeoff point for wafer test
- Guessing the tradeoff for camera allocation

- Machine Learning provides a way of making these decisions based on data

- Three scenarios where software engineering methods fail
- Machine learning methods applied to these scenarios
- Fundamental questions in machine learning
- Statistical thinking in computer science

- Supervised Learning
- Density Estimation
- Reinforcement Learning

[Diagram: handwritten digits (1, 0, 6, 3, 8) as Training Examples → Learning Algorithm → Classifier; a New Example is classified as "8"]

Recognition transformer is a neural network trained on 500,000 examples of characters

The entire system is trained given entire checks as input and dollar amounts as output

LeCun, Bottou, Bengio & Haffner (1998) Gradient-Based Learning Applied to Document Recognition

- 82% of machine-printed checks correctly recognized
- 1% of checks incorrectly recognized
- 17% “rejected” – check is presented to a person for manual reading
- Fielded by NCR in June 1996; reads millions of checks per month

- Desired classifier is a function y = f(x)
- Training examples are desired input-output pairs (xi,yi)
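To make the y = f(x) framing concrete, here is a minimal sketch using a 1-nearest-neighbor learner on invented two-feature "digit" data; the names and numbers are illustrative only (the actual check reader used a convolutional neural network):

```python
# Minimal sketch of supervised learning as fitting y = f(x) from
# training pairs (x_i, y_i); a 1-nearest-neighbor "learner" is used
# purely for illustration.

def train_1nn(examples):
    """'Training' for 1-NN just memorizes the (x, y) pairs."""
    def classify(x):
        # Predict the label of the closest stored example.
        nearest_x, nearest_y = min(
            examples,
            key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
        )
        return nearest_y
    return classify

# Toy 2-D "digit features": (ink density, aspect ratio) -> label
training_pairs = [((0.2, 0.9), "1"), ((0.8, 0.5), "8"), ((0.5, 0.5), "0")]
f = train_1nn(training_pairs)
print(f((0.75, 0.55)))  # closest to (0.8, 0.5) -> "8"
```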

[Diagram: Training Examples (tested wafers) → Learning Algorithm → Density Estimator; for a partially-tested wafer W with chips C1, C2, C3, …, C209, the estimator outputs, e.g., P(chip_i is bad) = 0.42]

- Trained density estimator on 600 wafers from mature product (HP; Corvallis, OR)
- Probability model is “naïve Bayes” mixture model with four components (trained with EM)

- Choose the larger of
- Expected profit if we predict remaining chips, package, and re-test
- Expected profit if we test chip Ci, then predict remaining chips, package, and re-test [for all Ci not yet tested]
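The stop-versus-test comparison above can be sketched with invented costs and probabilities (the real procedure evaluates this for every untested chip Ci; the constants here are not from the slides):

```python
# A toy sketch of the stop-vs-test decision (all numbers invented):
# compare expected profit from packaging now against expected profit
# from paying to test one more chip before packaging.

COST_TEST, COST_PACKAGE, VALUE_GOOD = 1.0, 4.0, 20.0

def profit_if_stop(p_bad_each):
    """Package everything untested; bad chips waste the packaging cost."""
    return sum((1 - p) * VALUE_GOOD - COST_PACKAGE for p in p_bad_each)

def profit_if_test_first(p_bad_each):
    """Test chip 0; skip packaging it if it turns out bad."""
    p0, rest = p_bad_each[0], p_bad_each[1:]
    chip0 = (1 - p0) * (VALUE_GOOD - COST_PACKAGE) - COST_TEST
    return chip0 + profit_if_stop(rest)

probs = [0.4, 0.05, 0.05]                     # chip 0 looks risky
print(round(profit_if_stop(probs), 2))        # 38.0
print(round(profit_if_test_first(probs), 2))  # 38.6 -> testing chip 0 wins
```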

3.8% increase in profit

- Desired output is a joint probability distribution P(C1, C2, …, C203)
- Training examples are points X= (C1, C2, …, C203) sampled from this distribution
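A toy sketch of the mixture idea (two components instead of the slides' four, with invented probabilities): observing a few tested chips shifts the posterior over mixture components, which in turn shifts P(bad) for every untested chip.

```python
# Illustrative naive-Bayes mixture (all numbers invented): each wafer
# comes from one of K latent components; within a component, chips
# fail independently. Bayes' rule updates the component posterior
# from the tested chips.

K = 2                              # components (the slides' model used 4)
prior = [0.7, 0.3]                 # P(component k) - assumed values
p_bad = [[0.05] * 6, [0.40] * 6]   # P(chip j bad | component k), 6 chips

def predict_bad(tested, chip):
    """P(chip is bad | results of tested chips {index: is_bad})."""
    # Posterior over components via Bayes' rule, naive-Bayes likelihood.
    post = []
    for k in range(K):
        like = prior[k]
        for j, bad in tested.items():
            like *= p_bad[k][j] if bad else 1.0 - p_bad[k][j]
        post.append(like)
    z = sum(post)
    post = [w / z for w in post]
    # Predictive probability mixes the per-component failure rates.
    return sum(post[k] * p_bad[k][chip] for k in range(K))

print(round(predict_bad({}, 5), 3))                  # prior mixture: 0.155
print(round(predict_bad({0: True, 1: True}, 5), 3))  # two failures: ~0.388
```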

[Diagram: agent–Environment loop — the agent emits action a; the Environment returns state s and reward r]

Agent’s goal: Choose actions to maximize total reward

Action Selection Rule is called a “policy”: a = π(s)

- Learning from rewards and punishments in the environment
- Give reward for reaching goal
- Give punishment for getting lost
- Give punishment for collisions

Busquets, Lopez de Mantaras, Sierra, Dietterich (2002)

- Desired output is an action selection policy π
- Training examples are <s,a,r,s’> tuples collected by the agent interacting with the environment
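As an illustration of how such tuples drive learning, here is a minimal tabular Q-learning sketch on an invented toy chain environment; the environment, rewards, and constants are stand-ins, and the robot experiments used a far richer state space and reward design:

```python
# Minimal tabular Q-learning on a toy 5-state chain: each interaction
# yields an <s, a, r, s'> tuple that updates the Q-table, and the
# greedy policy is read off at the end.
import random

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                     # move left / move right

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    r = 1.0 if s2 == GOAL else -0.01   # reward at goal, small step cost
    return r, s2

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1
for episode in range(200):
    s = 0
    while s != GOAL:
        if random.random() < epsilon:
            a = random.randrange(2)                     # explore
        else:
            a = max((0, 1), key=lambda i: Q[s][i])      # exploit
        r, s2 = step(s, ACTIONS[a])
        # Q-learning update from the observed <s, a, r, s'> tuple.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(N_STATES)]
print(policy[:4])  # learned policy moves right toward the goal: [1, 1, 1, 1]
```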

- Three scenarios where software engineering methods fail
- Machine learning methods applied to these scenarios
- Fundamental questions in machine learning
- Statistical thinking in computer science

- Incorporating Prior Knowledge
- Incorporating Learned Structures into Larger Systems
- Making Reinforcement Learning Practical
- Triple Tradeoff: accuracy, sample size, hypothesis complexity

- How can we incorporate our prior knowledge into the learning algorithm?
- Difficult for decision trees, neural networks, support-vector machines, etc.
- Mismatch between form of our knowledge and the way the algorithms work

- Easier for Bayesian networks
- Express knowledge as constraints on the network


- Success story: Digit recognizer incorporated into check reader
- Challenges:
- Larger system may make several coordinated decisions, but learning system treated each decision as independent
- Larger system may have complex cost function: Errors in thousands place versus the cents place: $7,236.07

- Current reinforcement learning methods do not scale well to large problems
- Need robust reinforcement learning methodologies

- Fundamental relationship between
- amount of training data
- size and complexity of hypothesis space
- accuracy of the learned hypothesis

- Explains many phenomena observed in machine learning systems

- Set of data points
- Class H of hypotheses
- Optimization problem: Find the hypothesis h in H that best fits the data

[Diagram: Training Data → search over Hypothesis Space → chosen hypothesis h]

Amount of Data – Hypothesis Complexity – Accuracy

[Plot: Accuracy vs. Hypothesis Space Complexity, one curve per training-set size N = 10, 100, 1000]

[Plot: Accuracy vs. Number of training examples N, one curve per hypothesis space H1, H2, H3]

- With only a small amount of data, we can only discriminate between a small number of different hypotheses
- As we get more data, we have more evidence, so we can consider more alternative hypotheses
- Complex hypotheses give better fit to the data

- Fixed size
- Ordinary linear regression
- Bayes net with fixed structure
- Neural networks

- Variable size
- Decision trees
- Bayes nets with variable structure
- Support vector machines

[Plot: Accuracy vs. Number of training examples N for spaces H1 and H2 — the simpler H1 underfits given ample data; the richer H2 overfits when data is scarce]

[Plot: Accuracy vs. Hypothesis Space Complexity for N = 10, 100, 1000 — the best complexity grows with N]

- Find hypothesis h to minimize
error(h) + λ complexity(h)

- Many methods for adjusting λ
- Cross-validation
- MDL
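As a sketch of the cross-validation option, the following fits ridge-penalized polynomial regressions on an invented dataset and keeps the penalty weight with the lowest held-out error; the data, grid of penalty values, and split are all illustrative:

```python
# Choosing the regularization weight lam (the slides' lambda) by
# validation: fit ridge-penalized degree-9 polynomials for each lam
# and keep the one with lowest held-out mean squared error.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + rng.normal(0, 0.1, 40)   # noisy target (invented)
X = np.vander(x, 10)                          # degree-9 polynomial features

train, val = slice(0, 30), slice(30, 40)      # simple holdout split

def fit_ridge(X, y, lam):
    # Minimize ||Xw - y||^2 + lam * ||w||^2 (closed form).
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def val_error(lam):
    w = fit_ridge(X[train], y[train], lam)
    return np.mean((X[val] @ w - y[val]) ** 2)

lams = [1e-6, 1e-4, 1e-2, 1.0, 100.0]
best = min(lams, key=val_error)
print(best, round(val_error(best), 4))
```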

- Three scenarios where software engineering methods fail
- Machine learning methods applied to these scenarios
- Fundamental questions in machine learning
- Statistical thinking in computer science

- NASA Data
- 284 Terabytes (as of August, 1999)
- Earth Observing System: 194 GB/day
- Landsat 7: 150 GB/day
- Hubble Space Telescope: 0.6 GB/day

http://spsosun.gsfc.nasa.gov/eosinfo/EOSDIS_Site/index.html

- Google indexes 2,073,418,204 web pages
- US Year 2000 Census: 62 Terabytes of scanned images
- Walmart Data Warehouse: 7 (500?) Terabytes
- Missouri Botanical Garden TROPICOS plant image database: 700 Gbytes

[Diagram: traditional computer science — Store data, Retrieve data, solve Problems; with machine learning — Store data, Build Models, Solve Problems, deliver Solutions]

- Methods for building models from data
- Methods for collecting and/or sampling data
- Methods for evaluating and validating learned models
- Methods for reasoning and decision-making with learned models
- Theoretical analyses

- Natural language processing
- Databases and data mining
- Computer architecture
- Compilers
- Computer graphics

Source: Jiménez & Lin (2000) Perceptron Learning for Predicting the Behavior of Conditional Branches

- The performance of modern microprocessors depends on the order in which instructions are executed
- Modern compilers rearrange instruction order to optimize performance (“instruction scheduling”)
- Each new CPU design requires modifying the instruction scheduler

- Moss, et al. (1997): a machine learning scheduler can beat the performance of commercial compilers and match the performance of a research compiler
- Training examples: small basic blocks
- Experimentally determine optimal instruction order
- Learn preference function
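As a hedged sketch of one way a preference function could be trained — a perceptron over invented instruction features, illustrative rather than the authors' exact method — consider pairs where instruction a experimentally belongs before instruction b:

```python
# Perceptron-style preference learning (features and data invented):
# score(a) > score(b) should mean "schedule a before b".

def train_preference(pairs, n_features, epochs=20):
    """pairs: list of (features_a, features_b) where a should precede b."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for fa, fb in pairs:
            # Perceptron update on the difference vector when misordered.
            diff = [a - b for a, b in zip(fa, fb)]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + di for wi, di in zip(w, diff)]
    return w

# Toy features: (critical-path length, number of dependents)
pairs = [((5, 2), (1, 1)), ((4, 3), (2, 0)), ((6, 1), (3, 2))]
w = train_preference(pairs, 2)
score = lambda f: sum(wi * fi for wi, fi in zip(w, f))
print(score((5, 2)) > score((1, 1)))  # True: learned order matches data
```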

- Generate new video by splicing together short stretches of old video

[Diagram: original frame sequence A B C D E F re-spliced into new sequences such as B D E and D E F A]
Apply reinforcement learning to identify good transition points

Arno Schödl, Richard Szeliski, David H. Salesin, Irfan Essa (SIGGRAPH 2000)

You can find this video at Virtual Fish Tank Movie

A : A′ :: B : ?

Hertzmann, Jacobs, Oliver, Curless, Salesin (2000) SIGGRAPH

A(p): patch in training image A; A′(p): corresponding patch in stylized image A′
B(q): patch in new image B; B′(q): patch in synthesized output B′

Find p to minimize the Euclidean distance between A(p) and B(q)

B′(q) := A′(p)

A : A′ :: B : B′

A video can be found at

Image Analogies Movie
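The matching step described above can be sketched in brute-force form; the tiny arrays here are invented stand-ins for real image feature patches:

```python
# Brute-force image-analogies matching: for each patch B(q), find the
# training patch A(p) nearest in Euclidean distance and copy the
# corresponding stylized patch A'(p) into B'.
import numpy as np

A  = np.array([[0.1, 0.2], [0.8, 0.9], [0.5, 0.4]])   # training patches A(p)
Ap = np.array([[1.0, 1.1], [0.0, 0.1], [0.6, 0.5]])   # stylized patches A'(p)
B  = np.array([[0.75, 0.85], [0.45, 0.45]])           # new-image patches B(q)

Bp = np.empty_like(B)
for q, bq in enumerate(B):
    # p = argmin_p ||A(p) - B(q)||, then B'(q) := A'(p)
    p = int(np.argmin(np.sum((A - bq) ** 2, axis=1)))
    Bp[q] = Ap[p]

print(Bp.tolist())  # [[0.0, 0.1], [0.6, 0.5]]
```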

- Standard Software Engineering methods fail in many application problems
- Machine Learning methods can replace guesswork with data to make good design decisions

- Machine Learning is already at the heart of speech recognition and handwriting recognition
- Statistical methods are transforming natural language processing (understanding, translation, retrieval)
- Statistical methods are creating opportunities in databases, computer graphics, robotics, computer vision, networking, and computer security

- Data is a new source of power for computer science
- Every computer science student should learn the fundamentals of machine learning and statistical thinking
- By combining engineered frameworks with models learned from data, we can develop the high-performance systems of the future