
Presentation Transcript



Machine Learning: Making Computer Science Scientific

Thomas G. Dietterich

Department of Computer Science

Oregon State University

Corvallis, Oregon 97331

http://www.cs.orst.edu/~tgd



Acknowledgements

  • VLSI Wafer Testing

    • Tony Fountain

  • Robot Navigation

    • Didac Busquets

    • Carles Sierra

    • Ramon Lopez de Mantaras

  • NSF grants IIS-0083292 and ITR-085836



Outline

  • Three scenarios where standard software engineering methods fail

  • Machine learning methods applied to these scenarios

  • Fundamental questions in machine learning

  • Statistical thinking in computer science



Scenario 1: Reading Checks

Find and read “courtesy amount” on checks:



Possible Methods:

  • Method 1: Interview humans to find out what steps they follow in reading checks

  • Method 2: Collect examples of checks and the correct amounts. Train a machine learning system to recognize the amounts



Scenario 2: VLSI Wafer Testing

  • Wafer test: Functional test of each die (chip) while on the wafer



Which Chips (and how many) should be tested?

  • Tradeoff:

    • Test all chips on wafer?

      • Avoid cost of packaging bad chips

      • Incur cost of testing all chips

    • Test none of the chips on the wafer?

      • May package some bad chips

      • No cost of testing on wafer



Possible Methods

  • Method 1: Guess the right tradeoff point

  • Method 2: Learn a probabilistic model that captures the probability that each chip will be bad

    • Plug this model into a Bayesian decision making procedure to optimize expected profit



Scenario 3: Allocating mobile robot camera

  • Binocular camera

  • No GPS



Camera tradeoff

  • Mobile robot uses camera both for obstacle avoidance and landmark-based navigation

  • Tradeoff:

    • If camera is used only for navigation, robot collides with objects

    • If camera is used only for obstacle avoidance, robot gets lost



Possible Methods

  • Method 1: Manually write a program to allocate the camera

  • Method 2: Experimentally learn a policy for switching between obstacle avoidance and landmark tracking



Challenges for SE Methodology

  • Standard SE methods fail when…

    • System requirements are hard to collect

    • The system must resolve difficult tradeoffs



(1) System requirements are hard to collect

  • There are no human experts

    • Cellular telephone fraud

  • Human experts are inarticulate

    • Handwriting recognition

  • The requirements are changing rapidly

    • Computer intrusion detection

  • Each user has different requirements

    • E-mail filtering



(2) The system must resolve difficult tradeoffs

  • VLSI Wafer testing

    • Tradeoff point depends on probability of bad chips, relative costs of testing versus packaging

  • Camera Allocation for Mobile Robot

    • Tradeoff depends on probability of obstacles, number and quality of landmarks



Machine Learning: Replacing guesswork with data

  • In all of these cases, the standard SE methodology requires engineers to make guesses

    • Guessing how to do character recognition

    • Guessing the tradeoff point for wafer test

    • Guessing the tradeoff for camera allocation

  • Machine Learning provides a way of making these decisions based on data



Outline

  • Three scenarios where software engineering methods fail

  • Machine learning methods applied to these scenarios

  • Fundamental questions in machine learning

  • Statistical thinking in computer science



Basic Machine Learning Methods

  • Supervised Learning

  • Density Estimation

  • Reinforcement Learning


Supervised Learning

[Diagram: handwritten-digit training examples (1, 0, 6, 3, 8) feed the Learning Algorithm, which outputs a Classifier; a new example is then classified as an 8.]



AT&T/NCR Check Reading System

The recognition transformer is a neural network trained on 500,000 examples of characters.

The entire system is then trained end-to-end, with entire checks as input and dollar amounts as output.

LeCun, Bottou, Bengio & Haffner (1998) Gradient-Based Learning Applied to Document Recognition



Check Reader Performance

  • 82% of machine-printed checks correctly recognized

  • 1% of checks incorrectly recognized

  • 17% “rejected” – check is presented to a person for manual reading

  • Fielded by NCR in June 1996; reads millions of checks per month



Supervised Learning Summary

  • Desired classifier is a function y = f(x)

  • Training examples are desired input–output pairs (x_i, y_i)
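This input–output view can be made concrete with a deliberately tiny sketch: a 1-nearest-neighbor rule stands in for the check reader's neural network, and the 2-D "digit features" and labels below are invented for illustration.

```python
# Minimal supervised-learning sketch: learn a classifier y = f(x) from
# labeled training pairs (x_i, y_i). A 1-nearest-neighbor rule stands in
# for the neural network used in the real check reader; the toy 2-D
# "digit features" below are invented for illustration.

def train_1nn(examples):
    """examples: list of (x, y) pairs, where x is a numeric feature tuple."""
    def classify(x):
        def sq_dist(pair):
            xi, _ = pair
            return sum((a - b) ** 2 for a, b in zip(xi, x))
        _, label = min(examples, key=sq_dist)
        return label
    return classify

# Two toy classes: features near (0, 0) are "0"s, features near (1, 1) are "8"s.
training = [((0.0, 0.1), 0), ((0.2, 0.0), 0), ((1.0, 0.9), 8), ((0.9, 1.1), 8)]
f = train_1nn(training)
```

Applied to a new example, f((0.95, 1.0)) returns 8 because the nearest stored pair carries that label.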


Density Estimation

[Diagram: training examples (fully tested wafers) feed the Learning Algorithm, which outputs a Density Estimator; given a partially-tested wafer, it outputs predictions such as P(chip_i is bad) = 0.42.]


On-Wafer Testing System

[Diagram: wafer W containing dies C1, C2, C3, …, C209.]

  • Trained density estimator on 600 wafers from mature product (HP; Corvallis, OR)

    • Probability model is “naïve Bayes” mixture model with four components (trained with EM)
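As a hedged sketch of such a model — a generic Bernoulli-product ("naïve Bayes") mixture fit by EM on invented four-chip wafers, not the 600-wafer HP model (component count, initialization, and data are all made up here):

```python
# Hedged sketch of a naive-Bayes (Bernoulli-product) mixture fit with EM.
# Each wafer is a tuple of chip outcomes (1 = pass, 0 = fail); the model is
# P(x) = sum_c pi[c] * prod_j theta[c][j]^x_j * (1 - theta[c][j])^(1 - x_j).

def em_bernoulli_mixture(data, k=2, iters=50):
    d = len(data[0])
    pi = [1.0 / k] * k
    # Deterministic, slightly asymmetric initialization to break symmetry.
    theta = [[0.3 + 0.4 * c / max(k - 1, 1)] * d for c in range(k)]
    for _ in range(iters):
        # E-step: responsibility of each component for each wafer.
        resp = []
        for x in data:
            w = []
            for c in range(k):
                p = pi[c]
                for j in range(d):
                    p *= theta[c][j] if x[j] else 1.0 - theta[c][j]
                w.append(p)
            s = sum(w) or 1e-12
            resp.append([v / s for v in w])
        # M-step: re-estimate mixing weights and per-chip pass probabilities.
        for c in range(k):
            nc = sum(r[c] for r in resp) or 1e-12
            pi[c] = nc / len(data)
            for j in range(d):
                theta[c][j] = sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
    return pi, theta

# Toy data: five all-good wafers and five all-bad wafers of 4 chips each.
wafers = [(1, 1, 1, 1)] * 5 + [(0, 0, 0, 0)] * 5
pi, theta = em_bernoulli_mixture(wafers)
```

On this toy data, EM separates the two wafer populations: one component's per-chip pass probabilities move toward 1, the other's toward 0.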



One-Step Value of Information

  • Choose the larger of

    • Expected profit if we predict remaining chips, package, and re-test

    • Expected profit if we test chip Ci, then predict remaining chips, package, and re-test [for all Ci not yet tested]
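The comparison can be sketched as follows; all dollar figures and the bad-chip probability are invented, and this simplified rule skips the belief update over the remaining chips that the full one-step procedure performs:

```python
# Toy version of the stop-vs-test comparison. All dollar figures and the
# bad-chip probability are invented; the real system plugged calibrated
# probabilities from the learned density model into this kind of rule.

REVENUE, PACKAGE_COST, TEST_COST = 10.0, 2.0, 0.5

def profit_stop(p_bad, n_chips):
    """Expected profit if we package all remaining chips untested:
    every chip is packaged, but only good chips earn revenue."""
    return n_chips * ((1 - p_bad) * REVENUE - PACKAGE_COST)

def profit_test_one(p_bad, n_chips):
    """Expected profit if we test one more chip first: pay the test cost,
    package that chip only if it passes, then package the rest untested."""
    tested = (1 - p_bad) * (REVENUE - PACKAGE_COST) - TEST_COST
    return tested + profit_stop(p_bad, n_chips - 1)

def decide(p_bad, n_chips):
    return "test" if profit_test_one(p_bad, n_chips) > profit_stop(p_bad, n_chips) else "stop"
```

With these particular costs, testing pays once the bad-chip probability exceeds 25%: decide(0.4, 10) chooses to test, while decide(0.05, 10) stops.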



On-Wafer Chip Test Results

3.8% increase in profit



Density Estimation Summary

  • Desired output is a joint probability distribution P(C1, C2, …, C203)

  • Training examples are points X= (C1, C2, …, C203) sampled from this distribution


Reinforcement Learning

[Diagram: the agent observes state s and reward r from the Environment and chooses action a.]

Agent’s goal: Choose actions to maximize total reward

Action Selection Rule is called a “policy”: a = π(s)



Reinforcement Learning for Robot Navigation

  • Learning from rewards and punishments in the environment

    • Give reward for reaching goal

    • Give punishment for getting lost

    • Give punishment for collisions



Experimental Results: % of trials in which the robot reaches the goal

Busquets, Lopez de Mantaras, Sierra, Dietterich (2002)



Reinforcement Learning Summary

  • Desired output is an action-selection policy π

  • Training examples are <s, a, r, s'> tuples collected by the agent interacting with the environment
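A minimal sketch of learning a policy from such tuples — tabular Q-learning on an invented four-state corridor, far simpler than the robot task:

```python
# Tabular Q-learning from <s, a, r, s'> tuples. The four-state corridor and
# its rewards are invented for illustration; the robot experiments used a
# much richer state space and reward structure.
from collections import defaultdict

ACTIONS = ["left", "right"]

def q_learning(tuples, alpha=0.5, gamma=0.9, passes=200):
    Q = defaultdict(float)
    for _ in range(passes):
        for s, a, r, s2 in tuples:
            best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

def policy(Q, s):
    """The learned action-selection rule a = pi(s)."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])

# States 0..3; the goal is state 3, reached from state 2 with reward +1.
experience = [(0, "right", 0, 1), (1, "right", 0, 2), (2, "right", 1, 3)]
Q = q_learning(experience)
```

After training, the learned values propagate the goal reward backward (discounted by γ per step), so the policy chooses "right" everywhere along the corridor.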



Outline

  • Three scenarios where software engineering methods fail

  • Machine learning methods applied to these scenarios

  • Fundamental questions in machine learning

  • Statistical thinking in computer science



Fundamental Issues in Machine Learning

  • Incorporating Prior Knowledge

  • Incorporating Learned Structures into Larger Systems

  • Making Reinforcement Learning Practical

  • Triple Tradeoff: accuracy, sample size, hypothesis complexity



Incorporating Prior Knowledge

  • How can we incorporate our prior knowledge into the learning algorithm?

    • Difficult for decision trees, neural networks, support-vector machines, etc.

      • Mismatch between form of our knowledge and the way the algorithms work

    • Easier for Bayesian networks

      • Express knowledge as constraints on the network



Incorporating Learned Structures into Larger Systems

  • Success story: Digit recognizer incorporated into check reader

  • Challenges:

    • Larger system may make several coordinated decisions, but learning system treated each decision as independent

    • Larger system may have complex cost function: Errors in thousands place versus the cents place: $7,236.07



Making Reinforcement Learning Practical

  • Current reinforcement learning methods do not scale well to large problems

  • Need robust reinforcement learning methodologies



The Triple Tradeoff

  • Fundamental relationship between

    • amount of training data

    • size and complexity of hypothesis space

    • accuracy of the learned hypothesis

  • Explains many phenomena observed in machine learning systems


Learning Algorithms

  • Set of data points

  • Class H of hypotheses

  • Optimization problem: Find the hypothesis h in H that best fits the data

[Diagram: Training Data drives a search over the Hypothesis Space, yielding the chosen hypothesis h.]


Triple Tradeoff

Amount of Data – Hypothesis Complexity – Accuracy

[Plot: accuracy versus hypothesis space complexity, with curves for N = 10, N = 100, and N = 1000 training examples.]


Triple Tradeoff (2)

[Plot: accuracy versus number of training examples N, with curves for hypothesis spaces H1, H2, H3 of increasing complexity.]



Intuition

  • With only a small amount of data, we can only discriminate between a small number of different hypotheses

  • As we get more data, we have more evidence, so we can consider more alternative hypotheses

  • Complex hypotheses give better fit to the data



Fixed versus Variable-Sized Hypothesis Spaces

  • Fixed size

    • Ordinary linear regression

    • Bayes net with fixed structure

    • Neural networks

  • Variable size

    • Decision trees

    • Bayes nets with variable structure

    • Support vector machines


Corollary 1: Fixed H will underfit

[Plot: accuracy versus number of training examples N for hypothesis spaces H1 and H2; the region where the fixed space underfits is marked.]


Corollary 2: Variable-sized H will overfit

[Plot: accuracy versus hypothesis space complexity for N = 100; the overfitting region is marked.]


Ideal Learning Algorithm: Adapt complexity to data

[Plot: accuracy versus hypothesis space complexity for N = 10, N = 100, and N = 1000, with the chosen complexity adapted to the amount of data.]



Adapting Hypothesis Complexity to Data Complexity

  • Find hypothesis h to minimize

    error(h) + λ·complexity(h)

  • Many methods for adjusting λ

    • Cross-validation

    • MDL
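For the simplest hypothesis class, y ≈ w·x with an L2 penalty, the minimizer of error(h) + λ·complexity(h) has a closed form, and cross-validation can choose λ. A sketch with invented data points:

```python
# Sketch of "minimize error(h) + lambda * complexity(h)" for one-parameter
# linear hypotheses y = w * x with an L2 penalty on w: the minimizer is
# w = sum(x*y) / (sum(x*x) + lam). Leave-one-out cross-validation picks lam.

def ridge_fit(data, lam):
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)

def loo_cv_error(data, lam):
    """Average squared error on each held-out point."""
    total = 0.0
    for i, (x, y) in enumerate(data):
        w = ridge_fit(data[:i] + data[i + 1:], lam)
        total += (y - w * x) ** 2
    return total / len(data)

def pick_lambda(data, candidates):
    return min(candidates, key=lambda lam: loo_cv_error(data, lam))

# Noisy samples of y = 2x (invented).
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
```

On this nearly-noiseless data, cross-validation rejects heavy regularization (a large λ shrinks w far below the true slope of 2).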



Outline

  • Three scenarios where software engineering methods fail

  • Machine learning methods applied to these scenarios

  • Fundamental questions in machine learning

  • Statistical thinking in computer science



The Data Explosion

  • NASA Data

    • 284 Terabytes (as of August, 1999)

    • Earth Observing System: 194 GB/day

    • Landsat 7: 150 GB/day

    • Hubble Space Telescope: 0.6 GB/day

http://spsosun.gsfc.nasa.gov/eosinfo/EOSDIS_Site/index.html



The Data Explosion (2)

  • Google indexes 2,073,418,204 web pages

  • US Year 2000 Census: 62 Terabytes of scanned images

  • Walmart Data Warehouse: 7 (500?) Terabytes

  • Missouri Botanical Garden TROPICOS plant image database: 700 Gbytes


Old Computer Science Conception of Data

[Diagram: data is simply stored and retrieved.]


New Computer Science Conception of Data

[Diagram: data is stored and used to build models; the models are applied to solve problems, turning problems into solutions.]



Machine Learning: Making Data Active

  • Methods for building models from data

  • Methods for collecting and/or sampling data

  • Methods for evaluating and validating learned models

  • Methods for reasoning and decision-making with learned models

  • Theoretical analyses



Machine Learning and Computer Science

  • Natural language processing

  • Databases and data mining

  • Computer architecture

  • Compilers

  • Computer graphics



Hardware Branch Prediction

Source: Jiménez & Lin (2000) Perceptron Learning for Predicting the Behavior of Conditional Branches
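The idea can be sketched as follows — a single perceptron over the global history register, trained when the prediction is wrong or weak. Real predictors keep a table of perceptrons indexed by branch address, and the history pattern here is invented:

```python
# Simplified perceptron branch predictor in the spirit of Jimenez & Lin:
# predict taken/not-taken from the recent branch history (+1 = taken,
# -1 = not taken) and train when the prediction is wrong or the output is
# weak. Real predictors index a table of perceptrons by branch address;
# this sketch uses one perceptron and an invented history pattern.

HIST_LEN = 4
THETA = HIST_LEN          # training threshold (the paper derives a tuned value)
w = [0] * (HIST_LEN + 1)  # w[0] is the bias weight

def predict(history):
    y = w[0] + sum(wi * h for wi, h in zip(w[1:], history))
    return (1 if y >= 0 else -1), y

def train(history, outcome):
    pred, y = predict(history)
    if pred != outcome or abs(y) <= THETA:
        w[0] += outcome
        for i, h in enumerate(history):
            w[i + 1] += outcome * h

# Invented workload: the branch outcome simply repeats the most recent
# history bit, so the perceptron should learn a large weight w[1].
patterns = [[1, 1, -1, 1], [-1, -1, 1, -1], [1, -1, 1, -1], [-1, 1, -1, 1]]
for _ in range(50):
    for h in patterns:
        train(h, h[0])
```

After training, the weight on the correlated history bit dominates, so the predictor follows that bit and ignores the distractor bits.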



Instruction Scheduler for New CPU

  • The performance of modern microprocessors depends on the order in which instructions are executed

  • Modern compilers rearrange instruction order to optimize performance (“instruction scheduling”)

  • Each new CPU design requires modifying the instruction scheduler



Instruction Scheduling

  • Moss et al. (1997): a machine-learned scheduler can beat the performance of commercial compilers and match the performance of a research compiler

  • Training examples: small basic blocks

    • Experimentally determine optimal instruction order

    • Learn preference function
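Learning a preference function from "a should precede b" examples can be sketched with a perceptron over feature differences; the two features and the training pairs below are invented, not those used by Moss et al.:

```python
# Sketch of a learned preference function for instruction scheduling: from
# pairs (a, b) meaning "a should be scheduled before b", learn a linear
# score over instruction features and schedule by sorting on the score.
# The features (critical-path length, a distractor) are invented.

def learn_preference(pairs, epochs=50):
    d = len(pairs[0][0])
    w = [0.0] * d
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [x - y for x, y in zip(better, worse)]
            # Perceptron step: the preferred instruction should score higher.
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + di for wi, di in zip(w, diff)]
    return w

def schedule(instructions, w):
    """instructions: dict name -> feature tuple; higher score goes first."""
    score = lambda name: sum(wi * fi for wi, fi in zip(w, instructions[name]))
    return sorted(instructions, key=score, reverse=True)

# Invented training pairs: longer critical path (feature 0) should come first.
pairs = [((3.0, 1.0), (1.0, 1.0)), ((2.0, 0.0), (1.0, 2.0)), ((4.0, 2.0), (2.0, 2.0))]
w = learn_preference(pairs)
```

Sorting by the learned score then yields a schedule consistent with the observed preferences.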


Computer Graphics: Video Textures

  • Generate new video by splicing together short stretches of old video

[Diagram: original frames A B C D E F are resequenced as B D E D E F A by splicing at good transition points.]

Apply reinforcement learning to identify good transition points

Arno Schödl, Richard Szeliski, David H. Salesin, Irfan Essa (SIGGRAPH 2000)



Video TexturesArno Schödl, Richard Szeliski, David H. Salesin, Irfan Essa (SIGGRAPH 2000)

You can find this video at Virtual Fish Tank Movie


Graphics: Image Analogies

[Diagram: A : A′ :: B : ? — given an example pair (A, A′) and a new image B, synthesize the analogous B′.]

Hertzmann, Jacobs, Oliver, Curless, Salesin (2001) SIGGRAPH


Learning to Predict Textures

Find p to minimize the Euclidean distance between the neighborhood features (A(p), A′(p)) and (B(q), B′(q)); then copy B′(q) := A′(p).
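The matching step can be sketched with flat feature tuples standing in for pixel neighborhoods — a toy version only; the full algorithm also matches against the partially synthesized B′ and works coarse-to-fine:

```python
# Sketch of the matching step: for each location q in the new image B, find
# the location p in the example image A whose neighborhood features A(p) are
# closest in Euclidean distance, and copy the filtered patch: B'(q) := A'(p).
# Flat feature tuples stand in for pixel neighborhoods here.

def best_match(A, b_feat):
    def sq_dist(item):
        _, a_feat = item
        return sum((u - v) ** 2 for u, v in zip(a_feat, b_feat))
    p, _ = min(A.items(), key=sq_dist)
    return p

def analogize(A, A_prime, B):
    return {q: A_prime[best_match(A, b_feat)] for q, b_feat in B.items()}

# Toy example: two source patches with known filtered versions.
A = {0: (0.0, 0.0), 1: (1.0, 1.0)}
A_prime = {0: "dark", 1: "bright"}
B = {10: (0.1, 0.0), 11: (0.9, 1.0)}
```

Each target location receives the filtered patch of its nearest source neighborhood, which is the B′(q) := A′(p) rule above.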


Image Analogies

[Example images: A : A′ :: B : B′.]



A video can be found at: Image Analogies Movie



Summary

  • Standard Software Engineering methods fail in many application problems

  • Machine Learning methods can replace guesswork with data to make good design decisions



Machine Learning and Computer Science

  • Machine Learning is already at the heart of speech recognition and handwriting recognition

  • Statistical methods are transforming natural language processing (understanding, translation, retrieval)

  • Statistical methods are creating opportunities in databases, computer graphics, robotics, computer vision, networking, and computer security



Computer Power and Data Power

  • Data is a new source of power for computer science

  • Every computer science student should learn the fundamentals of machine learning and statistical thinking

  • By combining engineered frameworks with models learned from data, we can develop the high-performance systems of the future

