
Machine Learning



  1. Machine Learning • Motivation for machine learning • How to set up a problem • How to design a learner • Introduce one class of learners (ANN) • Perceptrons • Feed-forward networks • Back-prop • Other types of networks

  2. Components of a “Well-Posed” Learning Problem • Task: the domain of the problem • Experience: information about the domain • Performance measure: a metric to judge how well the trained system can solve the problem • Learner: a computer program whose performance on the task improves (according to the metric) with more experience

  3. Example: Classification • Task: Predict whether the user will like a given movie or not • Experience: a database of movies the user has seen and the user’s ratings for them • Performance Measure: percent of times the system correctly predicts the user’s preference
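
A minimal Python sketch of this performance measure; the function name and the example ratings below are invented for illustration.

# Percent of predictions that match the user's actual preference.
# The data here is made up; a real system would use held-out ratings.

def accuracy(predictions, actual):
    """Fraction of predictions that agree with the true labels."""
    correct = sum(p == a for p, a in zip(predictions, actual))
    return correct / len(actual)

predicted = ["like", "dislike", "like", "like"]      # system's guesses
observed  = ["like", "dislike", "dislike", "like"]   # user's real ratings
print(accuracy(predicted, observed))                 # 0.75, i.e. 75% correct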

  4. Example: Speech Recognition • Task: take dictation from the user • Experience: a collection of recordings of acoustic utterances with their transcriptions • Performance Measure: percent of words correctly identified
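
A simplified sketch of this measure, assuming the recognized and reference transcripts are already word-aligned; real systems first align them with an edit-distance computation. The transcripts are invented for illustration.

# Percent of words correctly identified, assuming one-to-one alignment.

def word_accuracy(recognized, reference):
    hits = sum(r == t for r, t in zip(recognized, reference))
    return hits / len(reference)

reference  = "set an alarm for seven".split()
recognized = "set an alarm for eleven".split()
print(word_accuracy(recognized, reference))  # 0.8, i.e. 80% of words correct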

  5. Example: Function Modeling • Task: approximate an unknown function f(x) • Experience: a set of data points {(xi, f(xi))} • Performance Measure: average error between f(x), the target function, and h(x), the function the system learned, over m test points, e.g. E = (1/m) Σi=1..m (f(xi) − h(xi))^2
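
A minimal sketch of this measure in Python; the target f, the hypothesis h, and the test points are invented for illustration.

# Average squared error of the learned hypothesis h against the target f
# over m held-out test points (f is known here only for the demo).

def avg_squared_error(f, h, test_points):
    m = len(test_points)
    return sum((f(x) - h(x)) ** 2 for x in test_points) / m

f = lambda x: 2 * x + 1          # the unknown target function
h = lambda x: 1.9 * x + 1.2      # what the learner produced
print(avg_squared_error(f, h, [0, 1, 2, 3, 4]))   # 0.02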

  6. Designing a Learner • Training experience: What kind of feedback? Is the sample representative? Does the learner control its own experience? • Target function: specify the expected behavior • Function representation: specify its form and parameters • Learning algorithm

  7. Artificial Neural Networks • Inspired by neurobiology • A network is made up of massively interconnected “neurons” • Good for some learning problems: • Noisy training examples (contain errors) • Target function input is best described by a vector (e.g., robot sensor data) • Target function is continuous (differentiable)

  8. Perceptron • n weighted inputs plus a bias weight w0 (paired with the constant input 1): In = w0 + x1w1 + x2w2 + … + xnwn = x·w • An activation (threshold) function g applied to In, giving output O ∈ {−1, +1}: O = g(In) = g(x·w) = +1 if In > 0, −1 otherwise
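
A minimal Python sketch of this unit; the weights below are an illustrative choice that happens to implement Boolean AND.

# Threshold unit: +1 if x.w > 0, else -1; w[0] is the bias weight,
# paired with a constant input of 1.

def perceptron_output(weights, inputs):
    in_ = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if in_ > 0 else -1

w = [-1.5, 1.0, 1.0]                 # weights implementing Boolean AND
print(perceptron_output(w, (1, 1)))  # +1
print(perceptron_output(w, (1, 0)))  # -1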

  9. Training a Perceptron • Quantify error: compare the output o with the correct answer t • Update weights to minimize error: wi ← wi + α(t − o)xi, where α is a constant, the learning rate
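
A sketch of this update rule in Python; the dataset format, learning rate, and epoch count are illustrative choices.

# Perceptron training rule: w_i <- w_i + alpha * (t - o) * x_i.

def train_perceptron(examples, n_inputs, alpha=0.1, epochs=25):
    w = [0.0] * (n_inputs + 1)               # w[0] is the bias weight
    for _ in range(epochs):
        for x, t in examples:                # t is the correct answer
            in_ = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            o = 1 if in_ > 0 else -1
            err = t - o                      # 0 when the output is right
            w[0] += alpha * err              # bias input is the constant 1
            for i in range(n_inputs):
                w[i + 1] += alpha * err * x[i]
    return w

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
print(train_perceptron(AND, 2))   # learned weights classify all four cases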

  10. How Powerful Are Perceptrons? • A perceptron can represent simple Boolean functions: AND, OR, NOT • A network of perceptrons can represent any Boolean function • A single perceptron cannot represent XOR • Why? (see the argument below)
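
A short worked argument, not on the slide, for why no weights make a single perceptron compute XOR. Suppose weights w0, w1, w2 did. Then:

w0 + w1 > 0 and w0 + w2 > 0 (inputs (1,0) and (0,1) must output +1)
w0 <= 0 and w0 + w1 + w2 <= 0 (inputs (0,0) and (1,1) must output −1)

Adding the first pair gives 2w0 + w1 + w2 > 0, while adding the second pair gives 2w0 + w1 + w2 <= 0 — a contradiction. So no weight vector works: XOR is not linearly separable.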

  11. Linearly Separable • A perceptron outputs +1 exactly when x·w > 0, so it can only separate the positive from the negative examples with a hyperplane (a line, in two dimensions) • Refer to pictures from R&N Fig. 19.9

  12. Gradient Descent • Define error as a continuous, differentiable function of the weights • Search through the weight space by following the negative gradient of the error • Guarantees convergence to a (locally) minimum-error weight vector • Works even when the data is not linearly separable, yielding a best-fit approximation
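
A minimal sketch of gradient descent on a single linear unit, minimizing squared error (this incremental version is often called the delta rule); the data, learning rate, and epoch count are illustrative.

# Error E(w) = 1/2 * sum over examples of (t - w.x)^2; each step moves w
# a small distance down the gradient of the error for one example.

def gradient_descent(examples, n_inputs, alpha=0.05, epochs=200):
    w = [0.0] * (n_inputs + 1)                    # w[0] is the bias weight
    for _ in range(epochs):
        for x, t in examples:
            o = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            err = t - o
            w[0] += alpha * err                   # -dE/dw0 = err * 1
            for i in range(n_inputs):
                w[i + 1] += alpha * err * x[i]    # -dE/dwi = err * xi
    return w

data = [((0.0,), 1.0), ((1.0,), 3.1), ((2.0,), 4.9), ((3.0,), 7.0)]
print(gradient_descent(data, 1))   # approaches w ~ [1.0, 2.0], i.e. h(x) ~ 2x + 1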

  13. Multilayer Network • [Figure: a feed-forward network; input units x1 … xn feed hidden units ui through weights wni, and the hidden units feed output units Oj through weights wij]

  14. Training a Multilayer Network • Need to update weights to minimize error, but… • How to fairly assign a portion of the “blame” to each weight? • In a multilayer network, a weight may (eventually) contribute to multiple outputs • Need to back-propagate the error

  15. Back-Propagation • Between a hidden unit j and an output unit i: wji ← wji + α·aj·Δi, where Δi = (ti − oi)·g′(ini) • Between an input unit k and a hidden unit j: wkj ← wkj + α·ak·Δj, where Δj = g′(inj)·Σi wji·Δi • Each hidden unit receives a share of the blame weighted by how much it contributes to each output unit’s error
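
A minimal sketch of back-propagation for a network with one hidden layer of sigmoid units, trained on XOR; the network size, learning rate, epoch count, and seed are illustrative choices. For the sigmoid g, g′(in) = g(in)·(1 − g(in)), which is where the o·(1 − o) factors below come from.

import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(alpha=0.5, epochs=5000, n_hidden=2, seed=1):
    rng = random.Random(seed)
    w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    for _ in range(epochs):
        for (x1, x2), t in data:
            a_in = [1, x1, x2]                 # constant bias input + inputs
            a_h = [1] + [sigmoid(sum(w * a for w, a in zip(ws, a_in)))
                         for ws in w_h]        # hidden activations (+ bias)
            o = sigmoid(sum(w * a for w, a in zip(w_o, a_h)))
            delta_o = (t - o) * o * (1 - o)    # output unit's blame
            for j in range(n_hidden):          # propagate blame backwards
                delta_h = a_h[j + 1] * (1 - a_h[j + 1]) * w_o[j + 1] * delta_o
                for k in range(3):
                    w_h[j][k] += alpha * delta_h * a_in[k]
            for j in range(n_hidden + 1):
                w_o[j] += alpha * delta_o * a_h[j]
    return w_h, w_o

w_h, w_o = train_xor()
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a_in = [1, *x]
    a_h = [1] + [sigmoid(sum(w * a for w, a in zip(ws, a_in))) for ws in w_h]
    print(x, round(sigmoid(sum(w * a for w, a in zip(w_o, a_h))), 2))
# For most random starts the printed outputs approach the XOR targets
# 0, 1, 1, 0; a two-hidden-unit net can occasionally stick in a local minimum.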

  16. Artificial Neural Network Summary • Expressiveness: can approximate any function of a set of attributes • Computational efficiency: training to convergence may take a long time • Generalization: generalizes well to unseen examples • Sensitivity to noise: very tolerant of noisy training data • Transparency: low; a trained network can only be used as a black box, since the learned weights are hard to interpret • Prior knowledge: difficult to incorporate
