Machine Learning Intro iCAMP 2012

Machine Learning Intro iCAMP 2012 Max Welling UC Irvine

Machine Learning • Algorithms that learn to make predictions from examples (data)

Types of Machine Learning
• Supervised learning
  • Labels are provided, so there is a strong learning signal.
  • e.g. classification, regression.
• Semi-supervised learning
  • Only part of the data have labels.
  • e.g. a child growing up.
• Reinforcement learning
  • The learning signal is a (scalar) reward and may come with a delay.
  • e.g. trying to learn to play chess, a mouse in a maze.
• Unsupervised learning
  • There is no direct learning signal. We are simply trying to find structure in data.
  • e.g. clustering, dimensionality reduction.

Unsupervised Learning • Dimensionality reduction (LLE – Roweis & Saul) • Clustering

Supervised Learning • Classification • Regression

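Collaborative Filtering (Netflix Dataset)
• Movies: approx. 17,770
• Users: approx. 240,000
• Total of approx. 400,000,000 nonzero entries (99% sparse)
• (Figure: ratings matrix of users and movies, with known ratings such as 1 and 4 and unknown entries marked "?".)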

Generalization
• Consider the following regression problem:
• Predict the real value on the y-axis from the real value on the x-axis.
• You are given 6 examples: {Xi, Yi}.
• What is the y-value for a new query point X*?

Generalization

Generalization • Which curve is best?

Generalization • Ockham’s razor: prefer the simplest hypothesis consistent with data.
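To make the comparison of curves concrete, here is a minimal sketch that fits two hypotheses to six points and queries both at a new X*. The data values, the query point, and the assumption that the underlying trend is y = x are all made up for illustration; they are not taken from the slides.

```python
# Hypothetical data: six noisy samples of the trend y = x (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 4.9]


def fit_line(xs, ys):
    """Least-squares straight line; returns (offset, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def interpolate(xs, ys, x):
    """Polynomial passing exactly through all points (Lagrange form), evaluated at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


offset, slope = fit_line(xs, ys)
x_query = 6.0  # hypothetical new query point X*
line_prediction = offset + slope * x_query
poly_prediction = interpolate(xs, ys, x_query)
print("simple hypothesis (line):", line_prediction)
print("complex hypothesis (interpolating polynomial):", poly_prediction)
```

The interpolating polynomial reproduces every training point exactly, yet on this toy data its prediction at X* lands far from the assumed trend, while the straight line stays close to it.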

Generalization • Learning is concerned with accurate prediction of future data, not accurate prediction of training data. • Question: Design an algorithm that is perfect at predicting training data.
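One possible answer to the question, sketched under the assumption that inputs can be used as dictionary keys: simply memorize the training set. The function names and data below are hypothetical.

```python
def train_memorizer(examples):
    """Store every (input, target) training pair in a lookup table."""
    return dict(examples)


def predict(table, x, default=None):
    """Return the memorized target, or `default` for inputs never seen in training."""
    return table.get(x, default)


# Hypothetical training set of (x, y) pairs.
training = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2)]
table = train_memorizer(training)

train_errors = [abs(predict(table, x) - y) for x, y in training]
unseen = predict(table, 1.5)  # an input that was not in the training set
print(train_errors, unseen)
```

The lookup table has zero training error but has nothing to say about any input it has not seen, which is why training accuracy alone is not evidence of learning.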

Learning as Compression
• Imagine a game where Bob needs to send a dataset to Alice.
• They are allowed to meet once before they see the data.
• They agree on a precision level (quantization level).
• Bob learns a model (red line).
• Bob sends the model parameters (offset and slant) only once.
• For every datapoint, Bob sends:
  - the distance along the line (large number)
  - the orthogonal distance from the line (small number)
  (small numbers are cheaper to encode than large numbers)
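The game can be sketched with a toy cost model. Everything here is an assumption made for illustration: the bit-cost function (one sign bit plus the bits of the quantized magnitude), the data points, the precision, and the line parameters, which are taken as already learned by Bob.

```python
import math


def code_length(value, precision):
    """Toy cost in bits: one sign bit plus the bits of the quantized magnitude."""
    n = round(abs(value) / precision)
    return 1 + max(1, n.bit_length())


# Hypothetical dataset lying close to the line y = offset + slope * x.
points = [(10.0, 6.1), (20.0, 10.9), (30.0, 16.2), (40.0, 20.8)]
offset, slope = 1.0, 0.5  # assumed to be the model Bob has learned
precision = 0.1           # quantization level agreed on beforehand

# Option 1: send the raw coordinates of every point.
raw_bits = sum(code_length(x, precision) + code_length(y, precision)
               for x, y in points)

# Option 2: send the model once, then two numbers per point.
norm = math.sqrt(1.0 + slope ** 2)
model_bits = code_length(offset, precision) + code_length(slope, precision)
for x, y in points:
    along = (x + slope * (y - offset)) / norm        # distance along the line
    orthogonal = (-slope * x + (y - offset)) / norm  # distance from the line
    model_bits += code_length(along, precision) + code_length(orthogonal, precision)

print("raw:", raw_bits, "bits; with model:", model_bits, "bits")
```

Under this cost model the orthogonal distances quantize to very small integers, so the per-point message shrinks, and the one-time cost of the model parameters is amortized over the dataset.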

Generalization • learning = compression = abstraction • The man who couldn’t forget …

Classification: nearest neighbor • Example: Imagine you want to classify monkeys versus humans. • Data: 100 monkey images and 200 human images, with labels saying which is which. • Task: Here is a new image: monkey or human?

1 nearest neighbor
• Idea:
  • Find the picture in the database which is closest to your query image.
  • Check its label.
  • Declare the class of your query image to be the same as that of the closest picture.
• (Figure: query image and its closest image in the database.)
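The three steps above can be sketched as follows. The sketch assumes each image has already been turned into a numeric feature vector and that "closest" means smallest Euclidean distance; the two-dimensional features and the tiny database are hypothetical.

```python
import math


def nearest_neighbor(database, query):
    """Return the label of the stored example closest to the query."""
    # Step 1: find the closest stored example (Euclidean distance).
    closest_features, closest_label = min(
        database, key=lambda item: math.dist(item[0], query)
    )
    # Steps 2 and 3: read its label and assign it to the query.
    return closest_label


# Hypothetical database of (feature vector, label) pairs.
database = [
    ((0.9, 0.2), "monkey"),
    ((0.8, 0.3), "monkey"),
    ((0.2, 0.9), "human"),
    ((0.3, 0.7), "human"),
]

label = nearest_neighbor(database, (0.25, 0.8))
print(label)
```

The choice of distance matters as much as the algorithm: a different feature representation or metric can change which stored picture counts as closest.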

kNN Decision Surface • (Figure: decision curve separating the classes.)
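Extending the idea to k neighbors, a minimal sketch takes a majority vote among the k closest stored examples. The data, including one deliberately mislabeled point, and the Euclidean distance are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_classify(database, query, k):
    """Majority vote among the k stored examples closest to the query."""
    neighbors = sorted(database, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical (feature vector, label) pairs; the third entry is a mislabeled outlier.
database = [
    ((0.90, 0.20), "monkey"),
    ((0.80, 0.30), "monkey"),
    ((0.85, 0.25), "human"),
    ((0.20, 0.90), "human"),
    ((0.30, 0.70), "human"),
]

query = (0.84, 0.26)
with_one_neighbor = knn_classify(database, query, k=1)
with_three_neighbors = knn_classify(database, query, k=3)
print(with_one_neighbor, with_three_neighbors)
```

With k = 1 the mislabeled point decides the answer; with k = 3 it is outvoted by its two neighbors. Larger k therefore tends to give a smoother decision curve, at the cost of blurring fine structure.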

Bayes Rule(s)
• Riddle: Joe goes to the doctor and tells the doctor he has a stiff neck and a rash. The doctor is worried about meningitis and performs a test that is 80% correct, that is, for 80% of the people that have meningitis it will turn out positive. If 1 in 100,000 people have meningitis in the population and 1 in 1000 people will test positive (sick or not sick), what is the probability that Joe has meningitis?
• Answer: Bayes Rule.
  P(meningitis | positive test) = P(positive test | meningitis) P(meningitis) / P(positive test)
  = 0.8 * 0.00001 / 0.001 = 0.008 < 1%
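The arithmetic of the riddle can be checked with a few lines. The function name is hypothetical; the three probabilities are the ones given in the riddle.

```python
def posterior(likelihood, prior, evidence):
    """Bayes rule: P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence


p_positive_given_meningitis = 0.8   # test sensitivity
p_meningitis = 1 / 100000           # prevalence in the population
p_positive = 1 / 1000               # overall rate of positive tests

p_meningitis_given_positive = posterior(
    p_positive_given_meningitis, p_meningitis, p_positive
)
print(p_meningitis_given_positive)
```

The posterior stays below 1% because the disease is so rare: most positive tests come from the much larger healthy population.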

Naïve Bayes Classifier
• Variables: Y = meningitis, X1 = test result, X2 = stiff neck, rash.
• Naïve Bayes classifier: P(Y|X1,X2) = P(X1|Y) P(X2|Y) P(Y) / P(X1,X2)
• Conditional independence: P(X1,X2|Y) = P(X1|Y) P(X2|Y)
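A minimal sketch of the naïve Bayes computation for a binary Y, where the denominator P(X1,X2) is obtained by summing the numerator over both values of Y. The prior and the test probabilities come from the meningitis riddle; the symptom probabilities are hypothetical values chosen only for illustration.

```python
def naive_bayes_posterior(prior, likelihoods):
    """P(Y=1 | x_1..x_n), assuming the x_i are conditionally independent given Y.

    `likelihoods` holds one pair (P(x_i | Y=1), P(x_i | Y=0)) per observation.
    """
    joint_sick = prior
    joint_healthy = 1.0 - prior
    for p_given_sick, p_given_healthy in likelihoods:
        joint_sick *= p_given_sick
        joint_healthy *= p_given_healthy
    # The sum of the two joints is P(x_1, ..., x_n).
    return joint_sick / (joint_sick + joint_healthy)


prior = 1 / 100000      # P(meningitis), from the riddle
p_pos_given_sick = 0.8  # P(positive test | meningitis), from the riddle
# P(positive test | healthy), derived so that P(positive test) = 1/1000 overall.
p_pos_given_healthy = (1 / 1000 - p_pos_given_sick * prior) / (1 - prior)
test = (p_pos_given_sick, p_pos_given_healthy)

# Hypothetical: P(stiff neck and rash | meningitis), P(stiff neck and rash | healthy).
symptoms = (0.9, 0.01)

test_only = naive_bayes_posterior(prior, [test])
test_and_symptoms = naive_bayes_posterior(prior, [test, symptoms])
print(test_only, test_and_symptoms)
```

With the test alone the result matches the Bayes rule answer of 0.008. Adding the symptoms as a second, conditionally independent observation raises the posterior substantially under these assumed symptom probabilities.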

Bayesian Networks & Graphical Models • Main modeling tool for modern machine learning • Reasoning over large collections of random variables with intricate relations
