Introduction to Machine Learning
Learning
  • Agent has made observations (data)
  • Now must make sense of it (hypotheses)
    • Hypotheses alone may be important (e.g., in basic science)
    • For inference (e.g., forecasting)
    • To take sensible actions (decision making)
  • A basic component of economics, social and hard sciences, engineering, …
Last Time
  • Learning using statistical models relating an unknown hypothesis to observed data
  • 3 techniques
    • Maximum likelihood
    • Maximum a posteriori
    • Bayesian inference
A Review
  • HW4 asks you to learn parameters of a Bayes Net (Naïve Bayes)
  • Suppose we wish to estimate P(C|A,B)
  • The unknown parameters in the conditional probability distribution are “coin flips”
A Review
  • Example data (table shown in slide figure)
  • ML estimates (shown in slide figure)
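The ML estimates on this slide can be computed directly from counts. A minimal sketch in Python; the rows below are purely illustrative (the HW4 data is not reproduced in this transcript):

```python
# Toy dataset of (A, B, C) observations -- illustrative only.
data = [
    (1, 1, 1), (1, 1, 1), (1, 1, 0),
    (1, 0, 0), (0, 1, 1), (0, 0, 0),
]

def ml_estimate(data, a, b):
    """Maximum-likelihood estimate of P(C=1 | A=a, B=b): the fraction of
    matching rows in which C=1 -- each row is treated as a 'coin flip'."""
    matching = [c for (x, y, c) in data if x == a and y == b]
    if not matching:
        return None  # no data for this parent configuration
    return sum(matching) / len(matching)

print(ml_estimate(data, 1, 1))  # 2 of the 3 rows with A=1, B=1 have C=1
```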
Maximum A Posteriori
  • Example data
  • MAP estimates assuming virtual counts a = b = 2 (virtual counts 2H, 2T)
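A sketch of the MAP estimate with virtual counts. This assumes the convention that the virtual counts are added directly to the observed counts (texts that parameterize a Beta prior directly differ by one in each count):

```python
def map_estimate(heads, tails, a=2, b=2):
    """MAP-style estimate of P(heads) with a virtual heads and b virtual
    tails added to the observed counts (this is the mode of a
    Beta(a+1, b+1) posterior)."""
    return (heads + a) / (heads + tails + a + b)

# With no real data the estimate is just the prior guess a/(a+b) = 0.5:
print(map_estimate(0, 0))  # 0.5
# 3 heads, 1 tail, smoothed toward 0.5 by the virtual counts 2H, 2T:
print(map_estimate(3, 1))  # (3+2)/(4+4) = 0.625
```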
Topics in Machine Learning

Techniques
  • Bayesian learning
  • Decision trees
  • Neural networks
  • Support vector machines
  • Boosting
  • Case-based reasoning
  • Dimensionality reduction
  • …

Tasks & settings
  • Classification
  • Ranking
  • Clustering
  • Regression
  • Decision-making
  • Supervised
  • Unsupervised
  • Semi-supervised
  • Active
  • Reinforcement learning

Applications
  • Document retrieval
  • Document classification
  • Data mining
  • Computer vision
  • Scientific discovery
  • Robotics
  • …
What is Learning?
  • Mostly generalization from experience:

“Our experience of the world is specific, yet we are able to formulate general theories that account for the past and predict the future” (M.R. Genesereth and N.J. Nilsson, Logical Foundations of AI, 1987)

  •  Concepts, heuristics, policies
  • Supervised vs. un-supervised learning
Supervised Learning
  • Agent is given a training set of input/output pairs (x, y), with y = f(x)
  • Task: build a model that will allow it to predict f(x) for a new x
Unsupervised Learning
  • Agent is given a training set of data points x
  • Task: learn “patterns” in the data (e.g., clusters)
Reinforcement Learning
  • Agent acts sequentially in the real world, chooses actions a1,…,an, receives reward R
  • Must decide which actions were most responsible for R
Deductive vs. Inductive Reasoning
  • Deductive reasoning:
    • General rules (e.g., logic) to specific examples
  • Inductive reasoning:
    • Specific examples to general rules
Inductive Learning
  • Basic form: learn a function from examples
  • f is the unknown target function
  • An example is a pair (x, f(x))
  • Problem: find a hypothesis h
    • such that h ≈ f
    • given a training set of examples D
  • Instance of supervised learning
    • Classification task: f ∈ {0, 1, …, C} (usually C = 1)
    • Regression task: f takes values in the reals
Inductive learning method
  • Construct/adjust h to agree with f on training set
  • (h is consistent if it agrees with f on all examples)
  • E.g., curve fitting:
h = D is a trivial, but perhaps uninteresting, solution (caching)
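The curve-fitting progression on these slides can be reproduced numerically. A sketch with NumPy, assuming a quadratic target function (the slides' actual curves are not specified): a degree-9 polynomial through 10 points interpolates them all, driving training error to (near) zero, the numerical analogue of the h = D caching solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of an unknown target function f; a quadratic is assumed
# here purely for illustration.
x = np.linspace(0.0, 1.0, 10)
y = x**2 + rng.normal(0.0, 0.05, size=x.size)

# Hypotheses of increasing complexity, fit by least squares.
train_mse = {}
for deg in (1, 2, 9):
    h = np.polynomial.Polynomial.fit(x, y, deg)
    train_mse[deg] = float(np.mean((h(x) - y) ** 2))
    print(f"degree {deg}: training MSE = {train_mse[deg]:.6f}")

# Degree 9 fits all 10 points essentially exactly, but it has merely
# memorized the noisy data and will generalize poorly.
```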

Overfitting
  • An issue that can arise in all learning tasks when the hypothesis matches the known data too closely
  • Essentially, we incorporate the “noise” of the data into our prediction strategy for unseen data; this is bad!
  • Extreme cases shown on previous two slides
Classification Task
  • The target function f(x) takes on values True and False
  • An example is positive if f is True, else it is negative
  • The set X of all examples is the example set
  • The training set is a subset of X (a small one!)
Logic-Based Inductive Learning
  • Here, examples (x, f(x)) take on discrete values
  • Note that the training set does not say whether an observable predicate is pertinent or not

Rewarded Card Example
  • Deck of cards, with each card designated by [r,s], its rank and suit, and some cards “rewarded”
  • Background knowledge KB:
    ((r=1) ∨ … ∨ (r=10)) ⇒ NUM(r)
    ((r=J) ∨ (r=Q) ∨ (r=K)) ⇒ FACE(r)
    ((s=S) ∨ (s=C)) ⇒ BLACK(s)
    ((s=D) ∨ (s=H)) ⇒ RED(s)
  • Training set D:
    REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S])
  • Possible inductive hypothesis:
    h ≡ (NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s]))
  • There are several possible inductive hypotheses

Learning a Logical Predicate (Concept Classifier)
  • Set E of objects (e.g., cards)
  • Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD)
  • Example: CONCEPT describes the precondition of an action, e.g., Unstack(C,A)
    • E is the set of states
    • CONCEPT(x) ⇔ HANDEMPTY ∈ x ∧ BLOCK(C) ∈ x ∧ BLOCK(A) ∈ x ∧ CLEAR(C) ∈ x ∧ ON(C,A) ∈ x
  • Learning CONCEPT is a step toward learning an action description
Learning a Logical Predicate (Concept Classifier)
  • Set E of objects (e.g., cards)
  • Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD)
  • Observable predicates A(x), B(x), … (e.g., NUM, RED)
  • Training set: values of CONCEPT for some combinations of values of the observable predicates
  • Find a representation of CONCEPT in the form CONCEPT(x) ⇔ S(A,B,…), where S(A,B,…) is a sentence built with the observable predicates, e.g.: CONCEPT(x) ⇔ A(x) ∧ (B(x) ∨ C(x))
Hypothesis Space
  • A hypothesis is any sentence of the form CONCEPT(x) ⇔ S(A,B,…), where S(A,B,…) is a sentence built using the observable predicates
  • The set of all hypotheses is called the hypothesis space H
  • A hypothesis h agrees with an example if it gives the correct value of CONCEPT
Inductive Learning Scheme

[Figure: the example set X = {[A, B, …, CONCEPT]} contains positive (+) and negative (−) examples; a training set D drawn from X is given to the learner, which selects an inductive hypothesis h from the hypothesis space H = {[CONCEPT(x) ⇔ S(A,B,…)]}]
Size of Hypothesis Space
  • n observable predicates
  • 2^n entries in the truth table defining CONCEPT, and each entry can be filled with True or False
  • In the absence of any restriction (bias), there are 2^(2^n) hypotheses to choose from
  • n = 6 ⇒ about 2×10^19 hypotheses!
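The count is easy to verify:

```python
def hypothesis_space_size(n):
    # 2**n truth-table rows for CONCEPT, each independently True or False.
    return 2 ** (2 ** n)

print(hypothesis_space_size(6))  # 18446744073709551616, about 2 x 10**19
```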
Multiple Inductive Hypotheses

h1 ≡ NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s])
h2 ≡ BLACK(s) ∧ ¬(r=J) ⇔ REWARD([r,s])
h3 ≡ ([r,s]=[4,C]) ∨ ([r,s]=[7,C]) ∨ ([r,s]=[2,S]) ⇔ REWARD([r,s])
h4 ≡ ¬([r,s]=[5,H]) ∧ ¬([r,s]=[J,S]) ⇔ REWARD([r,s])

All four agree with all the examples in the training set

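A quick check that all four hypotheses are consistent with the training set (ranks and suits are encoded as strings for illustration):

```python
# Training set from the slides: 4C, 7C, 2S are rewarded; 5H, JS are not.
D = [(('4', 'C'), True), (('7', 'C'), True), (('2', 'S'), True),
     (('5', 'H'), False), (('J', 'S'), False)]

def NUM(r):   return r in {'1', '2', '3', '4', '5', '6', '7', '8', '9', '10'}
def BLACK(s): return s in {'S', 'C'}

hypotheses = {
    'h1': lambda r, s: NUM(r) and BLACK(s),
    'h2': lambda r, s: BLACK(s) and r != 'J',
    'h3': lambda r, s: (r, s) in {('4', 'C'), ('7', 'C'), ('2', 'S')},
    'h4': lambda r, s: (r, s) not in {('5', 'H'), ('J', 'S')},
}

# A hypothesis is consistent if it reproduces the label of every example.
consistent = {name: all(h(r, s) == label for (r, s), label in D)
              for name, h in hypotheses.items()}
print(consistent)  # all four agree with every training example
```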
Multiple Inductive Hypotheses
  • Need for a system of preferences, called an inductive bias, to compare possible hypotheses
Notion of Capacity
  • Capacity refers to the ability of a machine to learn any training set without error
  • A machine with too much capacity is like a botanist with photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything he has seen before
  • A machine with too little capacity is like the botanist’s lazy brother, who declares that if it’s green, it’s a tree
  • Good generalization can only be achieved when the right balance is struck between the accuracy attained on the training set and the capacity of the machine
Keep-It-Simple (KIS) Bias
  • Examples
    • Use far fewer observable predicates than the training set
    • Constrain the learnt predicate, e.g., to use only “high-level” observable predicates such as NUM, FACE, BLACK, and RED, and/or to have simple syntax
  • Motivation
    • If a hypothesis is too complex, it is not worth learning it (data caching does the job as well)
    • There are far fewer simple hypotheses than complex ones, hence the hypothesis space is smaller
Einstein: “A theory must be as simple as possible, but not simpler than this”
If the bias allows only sentences S that are conjunctions of k ≪ n predicates picked from the n observable predicates, then the size of H is O(n^k)

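A rough comparison of the biased and unbiased space sizes. This counts only positive conjunctions; allowing negated predicates would multiply the count by 2^k, which is still O(n^k):

```python
from math import comb

n, k = 6, 2
unbiased = 2 ** (2 ** n)   # every possible truth table over n predicates
biased = comb(n, k)        # conjunctions of k of the n predicates
print(unbiased, biased)    # 18446744073709551616 vs. 15
```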
Supervised Learning Flow Chart

[Figure: the target function (the unknown concept we want to approximate) generates datapoints; the observations we have seen form the training set, which, given a choice of learning algorithm and hypothesis space, is fed to the learner; the learner outputs an inductive hypothesis, whose predictions on a test set (observations we will see in the future) give better quantities to assess performance]
Capacity is Not the Only Criterion
  • Accuracy on the training set isn't the best measure of performance

[Figure: a hypothesis is learned from the training set D and tested on the rest of the example set X, within hypothesis space H]

Generalization Error
  • A hypothesis h is said to generalize well if it achieves low error on all examples in X

[Figure: the hypothesis learned from the training set is tested against the full example set X]
Overfitting (Restated)
  • If our training error continues to decrease while our generalization error begins to increase, our model has fallen victim to overfitting
  • What does this have to do with capacity?
Assessing Performance of a Learning Algorithm
  • Samples from X are typically unavailable
  • Take out some of the training set
    • Train on the remaining training set
    • Test on the excluded instances
    • Cross-validation
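The hold-out scheme above can be sketched as a k-fold splitter. The learner and error measure are omitted; a real run would train on each `train` list and evaluate on the matching held-out `test` list, then average:

```python
def k_fold_splits(examples, k):
    """Yield (train, test) pairs: each of k folds is held out once."""
    folds = [examples[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, test

# Usage with stand-in data: 10 examples, 5 folds of 2 held out in turn.
data = list(range(10))
splits = list(k_fold_splits(data, 5))
print(len(splits), [len(test) for _, test in splits])  # 5 [2, 2, 2, 2, 2]
```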
Cross-Validation
  • Split the original set of examples, train

[Figure: the examples D (a mix of + and −) are split; a hypothesis is trained on one portion, within hypothesis space H]
Cross-Validation
  • Evaluate hypothesis on testing set

[Figure: the learned hypothesis assigns + or − to each example in the held-out testing set]
Cross-Validation
  • Compare true concept against prediction: 9/13 correct

[Figure: predicted labels vs. true labels on the 13 testing examples]
Tennis Example
  • Evaluate learning algorithm: PlayTennis = S(Temperature, Wind)
Tennis Example
  • Evaluate learning algorithm: PlayTennis = S(Temperature, Wind)
  • Trained hypothesis: PlayTennis ⇔ (T=Mild ∨ T=Cool) ∧ (W=Weak)
  • Training errors = 3/10
  • Testing errors = 4/4
Tennis Example
  • Evaluate learning algorithm: PlayTennis = S(Temperature, Wind)
  • Trained hypothesis: PlayTennis ⇔ (T=Mild ∨ T=Cool)
  • Training errors = 3/10
  • Testing errors = 1/4
Tennis Example
  • Evaluate learning algorithm: PlayTennis = S(Temperature, Wind)
  • Trained hypothesis: PlayTennis ⇔ (T=Mild ∨ T=Cool)
  • Training errors = 3/10
  • Testing errors = 2/4
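The error counts above are just fractions of misclassified examples. A small helper; the slides' actual train/test tables are in figures not reproduced here, so the example rows below are hypothetical:

```python
def error_rate(h, examples):
    """Fraction of labeled examples on which hypothesis h is wrong."""
    return sum(h(x) != label for x, label in examples) / len(examples)

# Hypothesis from the slide: play iff T is Mild or Cool and the wind is Weak.
h = lambda tw: tw[0] in ('Mild', 'Cool') and tw[1] == 'Weak'

# Two hypothetical labeled examples, for illustration only.
examples = [(('Mild', 'Weak'), True), (('Hot', 'Weak'), False)]
print(error_rate(h, examples))  # 0.0
```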
Ten(ish) Commandments of Machine Learning
  • Thou shalt not:
    • Train on examples in the testing set
    • Form assumptions by “peeking” at the testing set, then formulating inductive bias
    • Overfit the training data
Reinforcement Learning
  • Enough talk, let’s see some machine learning in action!
Klondike Solitaire
  • Difficulties: size of the state space; large number of stochastic elements; variability of the utility of actions
  • Strategy: Reinforcement Learning, driven by reward and long-term learning
Representation
  • State: want to find a way to effectively reduce/collapse the state space
  • Action: need to find a way of incorporating the key components of an action and its potential results
Algorithm
  • SARSA: State1-Action1-Reward-State2-Action2
  • Different learning policies vary in:
    • State representation
    • Action representation
    • Reward function

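A minimal sketch of the SARSA update that the name spells out (the state and action names below are hypothetical, not the project's actual representation):

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """One SARSA step: move Q(s, a) toward r + gamma * Q(s2, a2)."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

Q = defaultdict(float)  # Q-values default to 0
# Hypothetical transition: taking 'draw' in state s1 yields reward 1.0,
# leading to state s2 where 'move' is the next chosen action.
sarsa_update(Q, 's1', 'draw', 1.0, 's2', 'move')
print(Q[('s1', 'draw')])  # 0.1 * (1.0 + 0.9 * 0 - 0) = 0.1
```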
Clear Winner Emerges
  • Slow, but steady progress

Keep Playing...
  • Seems to converge, BUT: average win rates for the last 10,000 games were 3.13, 3.08, 3.4, 3, and 3.2, all ≥ 3!

Next Time
  • Decision tree learning
  • Read R&N 18.1-3