
Introduction to Learning

ECE457 Applied Artificial Intelligence

Fall 2007

Lecture #11



Outline

  • Overview of learning

  • Supervised learning

    • Russell & Norvig, sections 18.1-18.3, 19.1

  • Unsupervised learning

    • Russell & Norvig, section 20.3

  • Reinforcement learning

    • Russell & Norvig, sections 21.1, 21.3, 21.5

      CS 498 & CS 698 (Prof. Ben-David)


Limit of Predefined Knowledge

  • Many of the algorithms and techniques we saw relied on predefined information

    • Probability distributions

    • Heuristics

    • Utility functions

  • This only works if the information is readily available

  • For real-world applications, it is often preferable to have the agent learn this information automatically


Overview of Learning

  • Learn what?

    • Facts about the world (KB)

    • Decision-making strategy

    • Probabilities, costs, functions, states, …

  • Learn from what?

    • Training data

      • Often freely available for common problems

    • Real world

  • Learn how?

    • Need some form of feedback to the agent


Learning and Feedback

  • Supervised learning

    • Training data includes correct output

    • Learn relationship between data and output

    • Evaluate with statistics

  • Unsupervised learning

    • Training data with no correct output

    • Learn patterns in the data

    • Evaluate with fitness of the pattern

  • Reinforcement learning

    • Set of actions with rewards and punishments

    • Learn to maximize reward & minimize punishment

    • Evaluate with value of reward


Supervised Learning

  • Given a training corpus of data-output pairs

    • x & y values

    • Email & spam/not spam

    • Variable values & decision

  • Learn the relationship mapping the data to the output

    • f(x)

    • Spam features

    • Decision rules

Supervised Learning Example

[Figure: a scatter of + and − training points in a 2D state space]

  • 2D state space with binary classification

  • Learn function to separate both classes

Decision Tree

[Figure: the scatter with threshold lines at X = 2, X = 5, Y = 4 and Y = 8, and the corresponding decision tree: X > 5? yes → −; else X < 2? yes → −; else Y > 8? yes → −; else Y < 4? yes → −; else +]

Decision Tree

[Figure: the same data with an equivalent multiway tree: X < 2 → −; X > 5 → −; otherwise test Y: Y < 4 → −; Y > 8 → −; otherwise +]


Decision Tree

  • Multiple variables and values used to decide whether an email is spam


Decision Tree

  • Build decision tree

[Figure: the decision tree built from these attributes: UW? yes → Not Spam; else Multi? yes → Not Spam; else Attach? yes → Spam; else Pictures? yes → Spam, no → Not Spam]

Supervised Learning Algorithm

[Figure: the same 2D scatter of + and − points]

  • Many possible learning techniques, depending on the problem and the data

    • Start with inaccurate initial hypothesis

    • Refine to reduce error or increase accuracy

    • End with trade-off between accuracy and simplicity


Trade-Off

  • Why trade-off between accuracy and simplicity?

    • Noisy data

    • Special cases

    • Measurement errors

[Figure: the scatter with a few noisy points lying on the wrong side of the class boundary]

Supervised Learning Algorithm

  • Learning a decision tree follows the same general algorithm

    • Start with all emails at root

    • Pick attribute that will teach us the most

      • Highest information gain, i.e. the attribute whose split most reduces uncertainty about the class

    • Branch using that attribute

    • Repeat until a trade-off is reached between the accuracy of the leaves and the depth limit / relevance of the attributes (a sketch of the attribute choice appears below)

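To make the attribute-selection step concrete, here is a minimal sketch of the information-gain computation, assuming plain Shannon entropy over the class labels; the toy emails and attribute names are invented for illustration, not the lecture's dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Entropy reduction obtained by branching on `attribute`."""
    remainder = 0.0
    for v in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels)
                  if ex[attribute] == v]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical email data.
emails = [{"attach": True,  "pictures": True},
          {"attach": True,  "pictures": False},
          {"attach": False, "pictures": True},
          {"attach": False, "pictures": False}]
spam = [True, True, False, False]

# The attribute with the highest gain becomes the next branch.
print(max(emails[0], key=lambda a: information_gain(emails, spam, a)))  # attach
```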

Supervised Learning Evaluation

  • Statistical measures of agent’s performance

    • RMS error between f(x) and y

    • Making correct decision

      • With as few decision rules as possible

      • Shallowest tree possible

    • Accuracy of a classification

    • Precision and recall of a classification


Precision and Recall

  • Binary classification: distinguish + (our target) from – (everything else)

  • Classifier makes mistakes

    • Classifies some + as – and some – as +

  • Define four categories:

    • True Positive (TP): a + classified as +

    • False Positive (FP): a − classified as +

    • False Negative (FN): a + classified as −

    • True Negative (TN): a − classified as −


Precision and Recall

  • Precision

    • Proportion of selected items the classifier got right

    • TP / (TP + FP)

  • Recall

    • Proportion of target items the classifier selected

    • TP / (TP + FN)

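As a concrete illustration, a small sketch computing both measures from parallel lists of predicted and true labels (the numbers are made up):

```python
def precision_recall(predicted, truth):
    """Precision and recall for a binary classifier.
    `predicted` and `truth` are parallel lists of booleans (True = +)."""
    tp = sum(p and t for p, t in zip(predicted, truth))      # + classified as +
    fp = sum(p and not t for p, t in zip(predicted, truth))  # - classified as +
    fn = sum(not p and t for p, t in zip(predicted, truth))  # + classified as -
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# The classifier selects 3 items, 2 of them actually +, and misses 1 of the 3 real +:
predicted = [True, True, True, False, False]
truth     = [True, True, False, True, False]
print(precision_recall(predicted, truth))  # (0.666..., 0.666...)
```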

Precision and Recall

  • Why ignore True Negatives?

  • Typically, there are a lot more negatives than positives

    • Internet searches: + are target websites, - are all other websites

  • Counting TN would skew the statistics and favour a system that classifies everything as negative


Overfitting

  • A common problem with supervised learning is over-specializing the relation learned to the training data

  • Learning from irrelevant features of the data

    • Email features such as: paragraph indentation, number of typos, letter “x” in sender address, …

  • Works well on training data

    • Because of poor sampling or random chance

  • Fails in real-world tests


Testing Data

  • Evaluate the relation learned using unseen test data

    • i.e. data that was not used during training

    • The system is therefore not overfitted to it

  • Split training data beforehand, keep part away for testing

    • Only works once!

    • If you reuse testing data, you are overfitting your system for that test!!

    • Never do that!!!


Cross-Validation

  • Shortcomings of holding out test data

    • Test only works once

    • Training on less data, so the result is less accurate

  • n-fold cross-validation

    • Split the training corpus into n parts

    • Train with n-1, test with 1

    • Run n tests, each time using a different test part

    • Final training with all data and best features

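A minimal sketch of the splitting step, assuming a simple shuffle-then-slice partition; `fit` and `score` in the usage comment are placeholders for whatever training and evaluation procedure the problem uses:

```python
import random

def n_fold_splits(corpus, n, seed=0):
    """Yield (train, test) pairs for n-fold cross-validation;
    each of the n parts serves as the test set exactly once."""
    data = list(corpus)
    random.Random(seed).shuffle(data)
    folds = [data[i::n] for i in range(n)]   # n roughly equal parts
    for i in range(n):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]

# Hypothetical usage:
#   scores = []
#   for train, test in n_fold_splits(examples, n=10):
#       model = fit(train)                 # placeholder training function
#       scores.append(score(model, test))  # placeholder evaluation function
#   print(sum(scores) / len(scores))       # average accuracy over the 10 runs
```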

Naïve Bayes Classifier

  • P(Cx|F1,…,Fn) = P(Cx) ∏i P(Fi|Cx)

    • Classify item in class Cx with maximum probability

  • Weighted Naïve Bayes Classifier

    • Paper on website

    • Give each feature Fi a weight wi

    • Learn the proper weight values

  • P(Cx|F1,…,Fn) = P(Cx) ∏i P(Fi|Cx)^wi

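A sketch of the weighted decision rule, computed in log space for numerical stability; the priors, conditional probabilities and weights below are invented for illustration and are not from the paper the slide refers to:

```python
import math

def classify(features, classes, prior, cond_prob, weights):
    """Weighted Naive Bayes: pick the class Cx maximizing
    P(Cx) * prod_i P(Fi|Cx)^wi, evaluated as a sum of logs."""
    def score(c):
        s = math.log(prior[c])
        for f in features:
            s += weights.get(f, 1.0) * math.log(cond_prob[(f, c)])
        return s
    return max(classes, key=score)

classes = ["spam", "ham"]
prior = {"spam": 0.4, "ham": 0.6}
cond_prob = {("attach", "spam"): 0.7, ("attach", "ham"): 0.2,
             ("pictures", "spam"): 0.6, ("pictures", "ham"): 0.3}
weights = {"attach": 1.5, "pictures": 0.8}  # learned as on the next slides
print(classify(["attach", "pictures"], classes, prior, cond_prob, weights))  # spam
```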

Learning the Weights

  • Start with initial weight values

  • At each iteration, for each feature

    • Measure the impact of that feature on the accuracy of the classification

    • Modify the weight to increase the accuracy of the classification

  • End if

    • Iteration limit is reached

    • Accuracy increase less than threshold


Learning the Weights

  • Define the following

    • Initial weight values: wi(0) = 1

    • Learning rate: η

    • Measure of the accuracy of the classification using feature Fi at iteration n: Ai(n)

    • Function to convert Ai(n) into a weight variation: σ(Ai(n))

      • σ(Ai(n)) = (1 + e^(−Ai(n)))^(−1) · [1 − (1 + e^(−Ai(n)))^(−1)]²

    • Threshold improvement in accuracy: ε

    • Iteration limit: nmax


Learning the Weights

  • Start with wi(0)

  • At iteration n, for feature Fi

    • Measure Ai(n)

    • Compute Δwi(n) = η · σ(Ai(n))

    • wi(n) = wi(n−1) + Δwi(n)

  • End if

    • n = nmax (entire algorithm)

    • ΔAi(n) < ε (feature Fi)

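A sketch of the update loop described on the last three slides, under two assumptions: the weight variation is scaled by the learning rate η (`eta`), and `measure_accuracy` is a placeholder for the problem-specific measurement of Ai(n).

```python
import math

def sigma(a):
    """Weight-variation function from the slides:
    (1 + e^-A)^-1 * [1 - (1 + e^-A)^-1]^2."""
    s = 1.0 / (1.0 + math.exp(-a))
    return s * (1.0 - s) ** 2

def learn_weights(features, measure_accuracy, eta=0.5, eps=1e-3, n_max=100):
    """Adjust one weight per feature until the accuracy stops improving
    by at least eps, or the iteration limit n_max is reached."""
    w = {f: 1.0 for f in features}        # w_i(0) = 1
    prev = {f: 0.0 for f in features}
    for n in range(1, n_max + 1):
        converged = True
        for f in features:
            a = measure_accuracy(f, w)    # A_i(n)
            if abs(a - prev[f]) >= eps:   # still improving enough
                w[f] += eta * sigma(a)    # w_i(n) = w_i(n-1) + eta * sigma(A_i(n))
                converged = False
            prev[f] = a
        if converged:
            break
    return w
```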

Unsupervised Learning

  • Given a training corpus of data points

    • Observed value of random variables in Bayesian network

    • Series of data points

    • Orbits of planets

  • Learn underlying pattern in the data

    • Existence and conditional probability of hidden variables

    • Number of classes and classification rules

    • Kepler’s laws of planetary motion


Unsupervised Learning Example

  • 2D state space with unclassified observations

  • Learn number and form of clusters

  • Problem of unsupervised clustering

    • Many algorithms proposed for it

    • More research is still being done on better algorithms, different kinds of data, …

[Figure: unlabelled data points (*) scattered in a 2D state space, forming a few natural clusters]

Unsupervised Learning Algorithm

  • Define a similarity measure, to compare pairs of elements

  • Starting with no clusters

    • Pick a seed element

    • Group elements similar to the seed with it, until a threshold is reached

    • Pick a new seed from the free elements and start again (see the sketch below)

[Figure: clusters grown around seed elements]
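A minimal sketch of this seed-based procedure, assuming Euclidean distance as the similarity measure and a fixed distance threshold:

```python
import math

def seed_clustering(points, threshold):
    """Pick a seed, attach every free point within `threshold` of it,
    then pick a new seed from the remaining points and repeat."""
    free = list(points)
    clusters = []
    while free:
        seed = free.pop(0)                        # pick a seed element
        cluster = [seed]
        for p in free[:]:
            if math.dist(seed, p) <= threshold:   # similar enough to the seed
                cluster.append(p)
                free.remove(p)
        clusters.append(cluster)
    return clusters

points = [(0, 0), (0.5, 0.4), (5, 5), (5.2, 4.8), (9, 0)]
print(seed_clustering(points, threshold=1.0))
# -> a cluster around (0,0), a cluster around (5,5), and the lone point (9,0)
```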

Unsupervised Learning Algorithm

  • Starting with one all-encompassing cluster

    • Find cluster with highest internal dissimilarity

    • Find most dissimilar pair of elements inside cluster

    • Split into two clusters

    • Repeat until all clusters have internal homogeneity

    • Merge homogeneous clusters

[Figure: the single starting cluster split recursively into homogeneous clusters]

Unsupervised Learning Evaluation

  • Need to evaluate fitness of relationship learned

    • Number of clusters vs. their internal properties

    • Difference between clusters vs. internal homogeneity

    • Number of parameters vs. number of hidden variables in Bayesian network

  • No way of knowing what is the optimal solution


K-Means

  • Popular unsupervised clustering algorithm

  • Data represented as cloud of points in state space

  • Target

    • Group points in k clusters

    • Minimize intra-cluster variance


K-Means

  • Start with k random cluster centers

  • For each iteration

    • For each data point

      • Associate the point to the nearest cluster center

      • Add to variance

    • Move each cluster center to the center of mass of associated data point cloud

    • End when

      • Variance less than threshold

      • Cluster centers stabilize


K-Means

  • We have:

    • Data points: x1, …, xi, …, xn

    • Clusters: C1, …, Cj, … Ck

    • Cluster centers: μ1, …, μj, …, μk

  • Minimize intra-cluster variance

    • V = Σj Σxi∈Cj |xi − μj|²

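A plain k-means sketch following the loop above; initializing the centers by sampling k data points is one common choice among several:

```python
import math
import random

def k_means(points, k, n_iter=100, seed=0):
    """Assign each point to its nearest center, move each center to the
    center of mass of its points, and repeat until the centers stabilize."""
    centers = random.Random(seed).sample(points, k)   # k random cluster centers
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                              # nearest-center association
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        new_centers = [tuple(sum(v) / len(cl) for v in zip(*cl)) if cl
                       else centers[j] for j, cl in enumerate(clusters)]
        if new_centers == centers:                    # centers have stabilized
            break
        centers = new_centers
    variance = sum(math.dist(p, centers[j]) ** 2      # intra-cluster variance V
                   for j, cl in enumerate(clusters) for p in cl)
    return centers, variance

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
print(k_means(pts, k=2))  # centers near (0.33, 0.33) and (9.33, 9.33)
```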

K-Means Example

[Figure: the data points (*) with k cluster centers (o); over successive iterations each center moves to the center of mass of its associated point cloud]

Reinforcement Learning

  • Given a set of possible actions, the resulting state of the environment, and rewards or punishment for each state

    • Taxi driver: tips, car repair costs, tickets

    • Checkers: advantage in number of pieces

  • Learn to maximize the rewards and/or minimize the punishments

    • Maximize tip, minimize damage to car and police tickets: drive properly

    • Protect own pieces, take enemy pieces: good play strategy


Reinforcement Learning

  • Learning by trial and error

  • Try something, see the result

    • Speeding results in tickets, going through a red light results in car damage, quick and safe drive results in tips

    • Checkers pieces in the center of the board are soon lost, pieces on the side are kept longer, sacrifice some pieces to take a greater number of enemy pieces

  • Sacrifice known rewarding actions to explore new, potentially more rewarding actions

  • Develop strategies to maximize rewards while minimizing penalties over the long term


Q-Learning

  • Each state has

    • A reward or punishment

    • A list of possible actions, which lead to other states

  • Learn value of state-action pairs

    • Q-value


Q-Learning

  • Update value of previous (t-1) state-action pair based on current (t) state-action value

  • ΔQ(st−1,at−1) = α · [ Rt−1 + γ · maxa Q(st,a) − Q(st−1,at−1) ]

    • Q(s,a): estimated value of state-action pair (s,a)

    • Rt: reward of state st

    • α: learning rate

    • γ: discount factor of future rewards

      • γ = 0 (future rewards are irrelevant), γ = 1 (future rewards count as much as current rewards)

    • Update: Q(st−1,at−1) = Q(st−1,at−1) + ΔQ(st−1,at−1)

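A sketch of a single update step; the states, actions and reward values are made up for illustration:

```python
from collections import defaultdict

def q_update(Q, s_prev, a_prev, r_prev, s_now, actions, alpha=0.1, gamma=0.9):
    """dQ = alpha * (R_prev + gamma * max_a Q(s_now, a) - Q(s_prev, a_prev))"""
    best_next = max(Q[(s_now, a)] for a in actions)
    dq = alpha * (r_prev + gamma * best_next - Q[(s_prev, a_prev)])
    Q[(s_prev, a_prev)] += dq               # Q <- Q + dQ
    return Q

Q = defaultdict(float)                      # all Q-values start at 0
actions = ["left", "right"]
q_update(Q, s_prev="s1", a_prev="right", r_prev=5.0, s_now="s2", actions=actions)
print(Q[("s1", "right")])                   # 0.5 = 0.1 * (5 + 0.9*0 - 0)
```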

Exploration Function

  • If agent always does action with max Q(s,a), it always evaluates the same state-action pairs

  • Need exploration function

    • Trade-off greed vs. curiosity

    • Try rarely-explored low-payoff actions instead of well-known high-payoff actions

    • Many possible functions


Exploration Function

  • Define:

    • Q(s,a): estimated value of (s,a)

    • N(s,a): Number of times (s,a) has been tried

    • Rmax: maximum possible value of Q(s,a)

    • Nmin: minimum number of times we want the agent to try (s,a)

  • f( Q(s,a), N(s,a) ) =

    • Rmax if N(s,a) < Nmin

    • Q(s,a) otherwise

  • Agent picks action with maximum f(.) value

  • Guarantees each (s,a) pair is explored at least Nmin times

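A sketch of this exploration function; the Rmax and Nmin values are illustrative:

```python
def f(q, n, r_max=10.0, n_min=5):
    """Optimistic value: report Rmax until (s,a) has been tried Nmin times."""
    return r_max if n < n_min else q

def pick_action(state, actions, Q, N):
    """The agent picks the action with the maximum f(Q(s,a), N(s,a))."""
    return max(actions, key=lambda a: f(Q.get((state, a), 0.0),
                                        N.get((state, a), 0)))

# A rarely-tried action wins over a well-known high-payoff one:
Q = {("s1", "left"): 8.0, ("s1", "right"): 1.0}
N = {("s1", "left"): 20, ("s1", "right"): 2}       # "right" tried only twice
print(pick_action("s1", ["left", "right"], Q, N))  # right (valued at Rmax)
```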

Limits of RL

  • Search

    • Number of state-action pairs can be very large

    • Intermediate rewards can be noisy

  • Real-world search

    • Initial policy can have very poor reward

    • Necessary exploration of suboptimal actions can be costly

    • Some states are hard to reach


Policy

  • Learn the optimal policy π in a decision network

  • π: S → A

  • EU(π) = Σt=0..∞ γ^t · Rt

  • Greedy search

    • Modify policy until EU stops increasing

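As a small illustration, the discounted utility of one reward trajectory; for a stochastic environment, EU(π) would be an average over many such trajectories:

```python
def expected_utility(rewards, gamma=0.9):
    """EU = sum over t of gamma^t * R_t for one observed trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

print(expected_utility([0, 0, 10]))  # ~8.1 = 0.9^2 * 10
```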

Helicopter Flight Control

  • Sustained stable inverted flight

    • Very difficult for humans

    • First AI able to do it


Helicopter Flight Control

  • Collect flight data with human pilot

  • Learn model of helicopter dynamics

    • Stochastic and nonlinear

    • Supervised learning

  • Learn policy for helicopter controller

    • Reinforcement learning


Helicopter Dynamics

  • States

    • Position, orientation, velocity, angular velocity

    • 12 variables

  • 391 seconds of flight data

    • Time step 0.1s

    • 3910 triplets (st, at, st+1)

  • Learn probability distribution P(st+1|st,at)

  • Implemented in simulator and tested by pilot


Helicopter Controller

  • Problem definition

    • S: set of possible states

    • s0: initial state (s0 ∈ S)

    • A: set of possible actions

    • P(S|S,A): state transition probabilities

    • γ: discount factor

    • R: reward function mapping states to values

  • At state st, controller picks action at, system transitions to random state st+1 with probability P(st+1|st,at)


Helicopter Controller

  • Reward function

    • Punish deviation from desired helicopter position and velocity

    • R → [−∞, 0]

  • Policy learning

    • Reinforcement learning

    • EU() = t=0t Rt

  • Problem

    • Stochastic state transitions

    • Impossible to compare several policies!


PEGASUS algorithm

  • Predefined series of random numbers

    • The length of the series is a function of the complexity of the policy

  • Use the same series to test all policies

    • At time t, each policy encounters the same random event

  • Simulate stochastic environment

    • Environment stochastic from point of view of agent

    • Environment deterministic from our point of view

    • Makes comparison between policies possible

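A sketch of the idea, assuming a learned simulator `simulate_step(s, a, rng)` (a placeholder) and a fixed list of seeds shared by every policy being compared:

```python
import random

def evaluate_policy(policy, simulate_step, s0, horizon, seeds, gamma=0.99):
    """Roll the policy out against predefined random numbers, so every
    policy sees the same random events and utilities are comparable."""
    utilities = []
    for seed in seeds:
        rng = random.Random(seed)      # same random series for every policy
        s, eu = s0, 0.0
        for t in range(horizon):
            a = policy(s)
            s, r = simulate_step(s, a, rng)   # learned dynamics model
            eu += gamma ** t * r
        utilities.append(eu)
    return sum(utilities) / len(utilities)

# Because `seeds` is fixed, evaluate_policy(p1, ...) and evaluate_policy(p2, ...)
# face identical stochastic transitions, so the two scores can be compared.
```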

Summary of Learning
